Technical Report · January 2025

Aggregating Federal Legislative Data at Scale

A technical overview of integrating 15+ government APIs into a unified civic intelligence platform, covering data architecture, caching strategies, and the challenges of maintaining accuracy across disparate federal data sources.

1. Introduction

CIV.IQ aggregates legislative data from over 15 federal and state government APIs into a single queryable interface. This report documents the technical architecture, data integration challenges, and performance optimizations required to serve comprehensive civic information at scale.

The core challenge: government APIs are inconsistent in format, update frequency, rate limits, and data quality. Building a reliable platform requires abstracting these differences while maintaining data accuracy and freshness.

2. Data Sources

CIV.IQ integrates the following primary data sources:

| Source | Data Provided | Update Frequency |
| --- | --- | --- |
| Congress.gov API | Members, bills, votes, committees | Daily |
| FEC.gov API | Campaign finance, contributions, expenditures | Quarterly filings |
| Census Bureau | Demographics, boundaries, geocoding | Annual/Decennial |
| OpenStates v3 | State legislators, bills, votes | Session-dependent |
| USAspending.gov | Federal contracts, grants | Daily |
| Federal Register | Executive orders, regulations | Daily |
| GovInfo | Hearings, committee reports | As published |
| Wikidata | Biographical data, state executives | Continuous |

3. Architecture

3.1 API Layer

CIV.IQ exposes 101 API endpoints, organized into logical domains.

3.2 Data Normalization

Each upstream API returns data in different formats. Congress.gov uses XML with nested structures. FEC returns paginated JSON. Census provides GeoJSON for boundaries. OpenStates uses a different member ID scheme than Congress.gov.

The normalization layer maps all sources to a unified schema:

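// Unified Representative Schema
// SocialLinks is not defined in this excerpt; a platform-to-URL map is assumed here.
type SocialLinks = Record<string, string>;
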
interface Representative {
  id: string;              // Internal CIV.IQ ID
  bioguideId: string;      // Congress.gov identifier
  fecCandidateId: string;  // FEC identifier
  openStatesId?: string;   // OpenStates identifier (state legislators only)
  
  name: {
    first: string;
    last: string;
    official: string;
    nickname?: string;
  };
  
  position: {
    chamber: 'house' | 'senate';
    state: string;
    district?: number;
    party: string;
    startDate: string;
    isVoting: boolean;     // Distinguishes territorial delegates
  };
  
  contact: {
    website: string;
    phone: string;
    office: string;
    socialMedia: SocialLinks;
  };
}
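To illustrate the mapping step, the sketch below normalizes one already-parsed upstream member record into a subset of the unified schema. The input shape `RawMemberRecord`, the `IdCrosswalk` map, and `normalizeMember` are hypothetical names; real Congress.gov and FEC payloads use their own field names. Unlike the schema above, this sketch treats the FEC ID as optional, since a crosswalk may lack an entry.

```typescript
// Hypothetical parsed upstream record; field names are illustrative only and
// do not reproduce the real Congress.gov response schema.
interface RawMemberRecord {
  bioguideId: string;
  firstName: string;
  lastName: string;
  officialName: string;
  state: string;
  district?: number;
  partyName: string;
  chamber: "House of Representatives" | "Senate";
}

// Subset of the unified Representative schema used in this sketch.
interface NormalizedMember {
  bioguideId: string;
  fecCandidateId?: string;
  name: { first: string; last: string; official: string };
  position: { chamber: "house" | "senate"; state: string; district?: number; party: string };
}

// Hypothetical crosswalk from Bioguide ID to FEC candidate ID, e.g. loaded
// from a maintained mapping file when the service starts.
type IdCrosswalk = Map<string, string>;

// Maps one upstream record onto the shared schema.
function normalizeMember(raw: RawMemberRecord, fecIds: IdCrosswalk): NormalizedMember {
  return {
    bioguideId: raw.bioguideId,
    // Left undefined when the crosswalk has no entry, rather than guessed.
    fecCandidateId: fecIds.get(raw.bioguideId),
    name: { first: raw.firstName, last: raw.lastName, official: raw.officialName },
    position: {
      chamber: raw.chamber === "Senate" ? "senate" : "house",
      state: raw.state,
      district: raw.district,
      party: raw.partyName,
    },
  };
}
```

Keeping identifier crosswalks in one place means each source adapter only has to produce a known key (such as a Bioguide ID) for records to be joined.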

3.3 Geographic Resolution

ZIP code to congressional district mapping is non-trivial. ZIP codes are postal routes, not geographic boundaries — they can cross district lines. CIV.IQ uses the Census Bureau's ZIP Code Tabulation Areas (ZCTAs) crosswalked against congressional district shapefiles.

The crosswalk covers 39,495 searchable ZIP codes.
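A minimal sketch of a crosswalk lookup that returns every overlapping district instead of a single answer. It assumes a hypothetical row format with a `share` column (the fraction of the ZCTA inside each district) and uses the five-digit ZIP directly as a ZCTA code; ZCTAs only approximate ZIP service areas, and some ZIPs, such as PO-box-only ones, have no ZCTA at all. `buildIndex` and `lookupDistricts` are illustrative names.

```typescript
// One row of a hypothetical ZCTA-to-district crosswalk. `share` is the
// fraction of the ZCTA (for example, by population) inside the district.
interface CrosswalkRow {
  zcta: string;
  state: string;
  district: number;
  share: number;
}

interface DistrictMatch {
  state: string;
  district: number;
  share: number;
}

// Groups crosswalk rows by ZCTA, largest overlap first.
function buildIndex(rows: CrosswalkRow[]): Map<string, DistrictMatch[]> {
  const index = new Map<string, DistrictMatch[]>();
  for (const row of rows) {
    const matches = index.get(row.zcta) ?? [];
    matches.push({ state: row.state, district: row.district, share: row.share });
    index.set(row.zcta, matches);
  }
  index.forEach((matches) => matches.sort((a, b) => b.share - a.share));
  return index;
}

// Returns every district the ZIP's ZCTA overlaps; an empty array means the
// ZIP has no ZCTA entry and needs another fallback (e.g. a street address).
function lookupDistricts(index: Map<string, DistrictMatch[]>, zip: string): DistrictMatch[] {
  return index.get(zip) ?? [];
}
```

When more than one match comes back, showing all candidate districts (or asking for a street address) avoids silently assigning a user to the wrong representative.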

4. Caching Strategy

Government APIs enforce strict rate limits (Congress.gov: 5,000 requests/hour; FEC: 1,000 requests/hour). A naive implementation that queried them on every page view would exhaust these limits while serving only a few hundred users. CIV.IQ instead implements tiered caching with Next.js ISR (Incremental Static Regeneration) and Redis.
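To stay inside a per-key budget such as FEC's 1,000 requests per hour, outbound calls can pass through a rate limiter before reaching the upstream API. The sketch below is a generic token bucket, not CIV.IQ's actual limiter; the `TokenBucket` class and its injectable clock are illustrative assumptions.

```typescript
// Token bucket: holds up to `capacity` tokens and refills continuously,
// regaining `capacity` tokens every `windowMs` milliseconds.
class TokenBucket {
  private readonly capacity: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, windowMs: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.windowMs = windowMs;
    this.now = now;
    this.tokens = capacity; // start with a full budget
    this.lastRefill = now();
  }

  // Adds the tokens earned since the last refill, capped at capacity.
  private refill(): void {
    const t = this.now();
    const earned = ((t - this.lastRefill) * this.capacity) / this.windowMs;
    this.tokens = Math.min(this.capacity, this.tokens + earned);
    this.lastRefill = t;
  }

  // Consumes one token and returns true if an upstream request may be sent now.
  tryRemove(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Example: a budget matching a 1,000 requests/hour key.
const fecBucket = new TokenBucket(1000, 60 * 60 * 1000);
```

A caller that gets `false` from `tryRemove()` would serve cached data instead of calling upstream. Because this bucket lives in a single process, several server instances sharing one API key would need a shared counter (for example in Redis) to respect the limit.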

4.1 Revalidation Tiers

| Data Type | Revalidation Period | Rationale |
| --- | --- | --- |
| Member biographical data | 1 week | Rarely changes mid-term |
| Committee assignments | 1 day | Changes at session start |
| Voting records | 1 hour | Updates during session |
| Bill status | 1 hour | Active legislation moves fast |
| Campaign finance | 1 day | FEC filings are periodic |
| News/GDELT | 5 minutes | Breaking coverage |
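The tiers can live in a single typed config so that ISR and Redis use the same numbers. This is a sketch: the key names and the `isStale` helper are hypothetical, while the periods are the ones in the table.

```typescript
// Revalidation periods from the table, expressed in seconds.
// The key names are hypothetical labels for each data type.
const REVALIDATE_SECONDS = {
  memberBio: 7 * 24 * 60 * 60, // 1 week
  committees: 24 * 60 * 60, // 1 day
  votes: 60 * 60, // 1 hour
  billStatus: 60 * 60, // 1 hour
  campaignFinance: 24 * 60 * 60, // 1 day
  news: 5 * 60, // 5 minutes
} as const;

type DataType = keyof typeof REVALIDATE_SECONDS;

// True once an entry fetched at `fetchedAtMs` has reached its tier's age limit.
function isStale(type: DataType, fetchedAtMs: number, nowMs: number): boolean {
  return nowMs - fetchedAtMs >= REVALIDATE_SECONDS[type] * 1000;
}
```

In a Next.js App Router project the same values could be passed per request, e.g. `fetch(url, { next: { revalidate: REVALIDATE_SECONDS.votes } })`, and reused as Redis expiry times so both cache layers agree.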

4.2 Cache Warming

On deployment, a background job pre-fetches and caches data for all 540 members of Congress. This keeps first requests fast and reduces upstream API load during traffic spikes.

// Cache warming on deploy
async function warmCache() {
  const members = await fetchAllMembers();
  
  for (const member of members) {
    await Promise.all([
      cache.set(`member:${member.bioguideId}:profile`, ...),
      cache.set(`member:${member.bioguideId}:votes`, ...),
      cache.set(`member:${member.bioguideId}:bills`, ...),
      cache.set(`member:${member.bioguideId}:finance`, ...),
    ]);
  }
}
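The loop above processes members one at a time. A variant could keep a fixed number of members in flight and collect failed keys instead of caching anything for them, in line with the error-handling principle in Section 7. `warmCacheBounded`, the `Loader` type, and the `Cache` interface are hypothetical names; the key format follows the `member:<bioguideId>:<kind>` pattern used above.

```typescript
// Minimal cache interface; a Redis client or in-memory Map could back it.
interface Cache {
  set(key: string, value: unknown): Promise<void>;
}

// Hypothetical per-data-type loader, e.g. profile, votes, bills, finance.
type Loader = (bioguideId: string) => Promise<unknown>;

// Warms every member's keys with at most `concurrency` members in flight.
// Returns the keys that failed so they can be retried later.
async function warmCacheBounded(
  bioguideIds: string[],
  loaders: Record<string, Loader>,
  cache: Cache,
  concurrency = 4,
): Promise<string[]> {
  const entries = Object.entries(loaders);
  const failed: string[] = [];
  let next = 0;

  // Each worker claims the next unprocessed member until none remain. The
  // index is read and incremented synchronously, so no member is claimed twice.
  async function worker(): Promise<void> {
    while (next < bioguideIds.length) {
      const id = bioguideIds[next++];
      const results = await Promise.allSettled(
        entries.map(async ([kind, load]) => {
          await cache.set(`member:${id}:${kind}`, await load(id));
        }),
      );
      // A failed load is recorded, never replaced with placeholder data.
      results.forEach((result, i) => {
        if (result.status === "rejected") failed.push(`member:${id}:${entries[i][0]}`);
      });
    }
  }

  const poolSize = Math.max(1, Math.min(concurrency, bioguideIds.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return failed;
}
```

The concurrency limit trades warm-up time against how quickly the upstream rate-limit budget is spent.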

5. Constitutional Accuracy

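A key design principle: CIV.IQ must accurately represent constitutional distinctions that other civic tools flatten. The 540 members of Congress are not equivalent: senators and voting representatives differ from the non-voting members who represent the District of Columbia and the U.S. territories.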

The UI explicitly labels voting status and cites the relevant constitutional provisions. ZIP code lookups in territories return delegates with appropriate context about their limited floor voting rights.
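The labeling rule can be expressed as a small pure function over the `isVoting` flag from the unified schema. This is a sketch: the `MemberSeat` and `VotingStatus` types, the label wording, and the citation strings are illustrative, though the underlying provisions are real (Article I, Section 2 for the House; Article I, Section 3 for the Senate; non-voting House seats exist by statute).

```typescript
// Minimal member shape for this sketch; `isVoting` mirrors position.isVoting.
interface MemberSeat {
  chamber: "house" | "senate";
  isVoting: boolean;
  title?: "Delegate" | "Resident Commissioner";
}

interface VotingStatus {
  label: string; // short UI label
  basis: string; // legal basis shown alongside the label
}

// Derives the voting-status label and citation displayed in the UI.
function votingStatus(seat: MemberSeat): VotingStatus {
  if (seat.chamber === "senate") {
    return { label: "Senator", basis: "U.S. Const. art. I, § 3" };
  }
  if (seat.isVoting) {
    return { label: "Representative", basis: "U.S. Const. art. I, § 2" };
  }
  // Non-voting House seats are created by statute rather than by Article I.
  return {
    label: `${seat.title ?? "Delegate"} (non-voting)`,
    basis: "Established by federal statute; no vote on final passage",
  };
}
```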

6. Performance Optimizations

6.1 Image Optimization

Official congressional photos (432 current members with available photos) were batch-converted from JPEG to WebP, reducing total asset size by 83%. Images are served through the Next.js Image component, which negotiates the output format with each browser automatically.

6.2 Map Tiles

District boundaries are visualized with the PMTiles format, a single-file tile archive that clients read over HTTP range requests, instead of a traditional vector tile server. The complete dataset (7,383 state legislative districts plus 435 congressional districts) compresses to 24 MB, a 75% reduction from the source shapefiles. Maps render client-side via MapLibre GL.

6.3 Static Data Generation

Slowly changing reference data is pre-generated at build time.

7. Error Handling

Government APIs have varying reliability. CIV.IQ's error handling philosophy: real data or explicit absence. Every endpoint either returns verified government data or a clear "Data unavailable" response with source attribution.

Design Principle
No mock data. No generated placeholders. No "example" content. If the upstream API fails, the user sees an honest error rather than fabricated information.
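The "real data or explicit absence" rule maps naturally onto a discriminated union. In this sketch, which is an assumption rather than CIV.IQ's actual types, `SourcedResult` either carries data with its source attribution or an explicit "Data unavailable" marker, and the hypothetical `fromSource` helper converts any upstream failure into the latter.

```typescript
// Either verified upstream data with its source, or an explicit absence.
type SourcedResult<T> =
  | { ok: true; data: T; source: string; fetchedAt: string }
  | { ok: false; message: "Data unavailable"; source: string };

// Wraps an upstream call so that failures become an explicit "unavailable"
// result instead of mock or placeholder data.
async function fromSource<T>(source: string, load: () => Promise<T>): Promise<SourcedResult<T>> {
  try {
    const data = await load();
    return { ok: true, data, source, fetchedAt: new Date().toISOString() };
  } catch {
    return { ok: false, message: "Data unavailable", source };
  }
}
```

Because `ok` discriminates the union, TypeScript will not let a caller read `data` without first checking that the fetch succeeded.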

8. Future Work

9. Conclusion

Building a reliable civic intelligence platform requires treating government data aggregation as a serious engineering problem. The challenges — inconsistent APIs, rate limits, constitutional nuance, geographic complexity — are solvable with careful architecture. The result is a platform that makes democratic participation more informed without sacrificing accuracy for convenience.