Aggregating Federal Legislative Data at Scale

1. Introduction

CIV.IQ aggregates legislative data from over 15 federal and state government APIs into a single queryable interface. This report documents the technical architecture, data integration challenges, and performance optimizations required to serve comprehensive civic information at scale.

The core challenge: government APIs are inconsistent in format, update frequency, rate limits, and data quality. Building a reliable platform requires abstracting these differences while maintaining data accuracy and freshness.

2. Data Sources

CIV.IQ integrates the following primary data sources:

Source	Data Provided	Update Frequency
Congress.gov API	Members, bills, votes, committees	Daily
FEC.gov API	Campaign finance, contributions, expenditures	Quarterly filings
Census Bureau	Demographics, boundaries, geocoding	Annual/Decennial
OpenStates v3	State legislators, bills, votes	Session-dependent
USAspending.gov	Federal contracts, grants	Daily
Federal Register	Executive orders, regulations	Daily
GovInfo	Hearings, committee reports	As published
Wikidata	Biographical data, state executives	Continuous

3. Architecture

3.1 API Layer

CIV.IQ exposes 101 API endpoints, organized into logical domains:

Federal Representatives — 35 endpoints covering member profiles, contact info, committee assignments, voting records, sponsored legislation
Campaign Finance — 8 endpoints for FEC data including contributors, expenditures, industry breakdowns, geographic distribution
Legislative Tracking — 15 endpoints for bills, amendments, cosponsors, status timelines
State Government — 25 endpoints for state legislators, district maps, state bills
Civic Engagement — 18 endpoints for hearings, comment periods, federal spending by district

3.2 Data Normalization

Each upstream API returns data in a different format. Congress.gov uses XML with nested structures. FEC returns paginated JSON. Census provides GeoJSON for boundaries. OpenStates uses a different member ID scheme than Congress.gov.

The normalization layer maps all sources to a unified schema:

// Unified Representative Schema
interface Representative {
  id: string;              // Internal CIV.IQ ID
  bioguideId: string;      // Congress.gov identifier
  fecCandidateId: string;  // FEC identifier
  openStatesId?: string;   // OpenStates identifier (state legislators only)
  
  name: {
    first: string;
    last: string;
    official: string;
    nickname?: string;
  };
  
  position: {
    chamber: 'house' | 'senate';
    state: string;
    district?: number;
    party: string;
    startDate: string;
    isVoting: boolean;     // Distinguishes territorial delegates
  };
  
  contact: {
    website: string;
    phone: string;
    office: string;
    socialMedia: SocialLinks;
  };
}
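
As a rough illustration of the normalization step, the sketch below maps one upstream record onto a trimmed subset of the unified schema. The `UpstreamMember` shape and the `normalizeMember` function are assumptions made for this example; they do not reflect the actual Congress.gov response format or CIV.IQ's code.

```typescript
// Hypothetical upstream record. Field names are illustrative assumptions,
// not the real Congress.gov response shape.
interface UpstreamMember {
  bioguideId: string;
  firstName: string;
  lastName: string;
  chamber: string;            // e.g. "House of Representatives" or "Senate"
  state: string;              // two-letter postal code, any casing
  district?: number | null;   // null or missing for senators
  party: string;
}

// Trimmed subset of the unified schema, enough for the example.
interface NormalizedPosition {
  chamber: 'house' | 'senate';
  state: string;
  district?: number;
  party: string;
}

interface NormalizedMember {
  bioguideId: string;
  name: { first: string; last: string };
  position: NormalizedPosition;
}

function normalizeMember(raw: UpstreamMember): NormalizedMember {
  const chamber = raw.chamber.toLowerCase().includes('senate') ? 'senate' : 'house';
  const position: NormalizedPosition = {
    chamber,
    state: raw.state.trim().toUpperCase(),
    party: raw.party.trim(),
  };
  // Senators have no district: leave the field absent rather than null.
  if (chamber === 'house' && typeof raw.district === 'number') {
    position.district = raw.district;
  }
  return {
    bioguideId: raw.bioguideId,
    name: { first: raw.firstName.trim(), last: raw.lastName.trim() },
    position,
  };
}
```

Keeping optional fields absent instead of null means downstream consumers only have to handle one "missing" case.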

3.3 Geographic Resolution

Mapping ZIP codes to congressional districts is non-trivial. ZIP codes describe postal delivery routes, not geographic areas, so a single ZIP code can cross district lines. CIV.IQ uses the Census Bureau's ZIP Code Tabulation Areas (ZCTAs), which approximate ZIP codes as areas, crosswalked against congressional district shapefiles.

For the 39,495 searchable ZIP codes:

87% map to a single congressional district
11% span 2 districts (user sees both representatives)
2% span 3+ districts (major metro ZIPs)
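
The lookup behind these figures can be sketched as an index over crosswalk rows, one row per ZCTA/district overlap. The `CrosswalkRow` shape, including the `share` field used for ordering, is an assumption for illustration and not the actual crosswalk format.

```typescript
// Hypothetical crosswalk row: one per (ZCTA, district) overlap.
interface CrosswalkRow {
  zcta: string;      // five-digit ZCTA code
  state: string;     // two-letter postal code
  district: number;  // congressional district number
  share: number;     // assumed fraction of the ZCTA inside the district
}

// Group rows by ZCTA so a lookup is a single Map access.
function buildIndex(rows: CrosswalkRow[]): Map<string, CrosswalkRow[]> {
  const index = new Map<string, CrosswalkRow[]>();
  for (const row of rows) {
    const bucket = index.get(row.zcta);
    if (bucket) bucket.push(row);
    else index.set(row.zcta, [row]);
  }
  // Put the district with the largest overlap first.
  index.forEach((bucket) => bucket.sort((a, b) => b.share - a.share));
  return index;
}

// Returns every district a ZIP code touches, e.g. ["XX-2", "XX-1"].
function districtsForZip(index: Map<string, CrosswalkRow[]>, zip: string): string[] {
  const zcta = zip.trim().slice(0, 5); // tolerate ZIP+4 input
  const matches = index.get(zcta) ?? [];
  return matches.map((row) => `${row.state}-${row.district}`);
}
```

Returning a list rather than a single district keeps the split-ZIP cases explicit: the caller decides how to present two or more representatives.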

4. Caching Strategy

Government APIs enforce strict rate limits (Congress.gov: 5,000 requests/hour; FEC: 1,000 requests/hour). A naive implementation that calls upstream on every page view would exhaust these limits while serving only a few hundred users. CIV.IQ implements tiered caching with Next.js ISR (Incremental Static Regeneration) and Redis.
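
A minimal sketch of the read-through pattern this implies: serve from cache while an entry is fresh, and call upstream only on a miss or after expiry. The `ReadThroughCache` class is hypothetical, and an in-memory `Map` stands in for Redis so the example is self-contained.

```typescript
type Loader<T> = () => Promise<T>;

interface Entry {
  value: unknown;
  expiresAt: number; // epoch milliseconds
}

class ReadThroughCache {
  private store = new Map<string, Entry>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async get<T>(key: string, ttlSeconds: number, load: Loader<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value as T; // fresh entry: no upstream request
    }
    const value = await load(); // miss or stale entry: one upstream request
    this.store.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
    return value;
  }
}
```

This sketch does not deduplicate concurrent misses for the same key; a production version would likely share one in-flight request per key.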

4.1 Revalidation Tiers

Data Type	Revalidation Period	Rationale
Member biographical data	1 week	Rarely changes mid-term
Committee assignments	1 day	Changes at session start
Voting records	1 hour	Updates during session
Bill status	1 hour	Active legislation moves fast
Campaign finance	1 day	FEC filings are periodic
News/GDELT	5 minutes	Breaking coverage
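
The tiers above can be expressed as a single lookup table in seconds. The key names and the `fetchOptionsFor` helper are hypothetical; the `next.revalidate` option is the one Next.js adds to `fetch()` in the App Router, and whether CIV.IQ wires its tiers this way is an assumption.

```typescript
const MINUTE = 60;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;
const WEEK = 7 * DAY;

// Revalidation period per data type, in seconds.
// Key names are hypothetical labels for the rows of the table.
const REVALIDATE_SECONDS = {
  memberBio: WEEK,
  committeeAssignments: DAY,
  votingRecords: HOUR,
  billStatus: HOUR,
  campaignFinance: DAY,
  news: 5 * MINUTE,
} as const;

type DataType = keyof typeof REVALIDATE_SECONDS;

// Builds the options object that Next.js reads from fetch() calls.
function fetchOptionsFor(type: DataType): { next: { revalidate: number } } {
  return { next: { revalidate: REVALIDATE_SECONDS[type] } };
}
```

Inside a Next.js route this would be used as `fetch(url, fetchOptionsFor('billStatus'))`, which keeps every period in one place instead of scattering literals across endpoints.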

4.2 Cache Warming

On deployment, a background job pre-fetches and caches data for all 541 federal representatives. This ensures first-hit performance and reduces upstream API load during traffic spikes.

// Cache warming on deploy
async function warmCache() {
  const members = await fetchAllMembers();
  
  for (const member of members) {
    await Promise.all([
      cache.set(`member:${member.bioguideId}:profile`, ...),
      cache.set(`member:${member.bioguideId}:votes`, ...),
      cache.set(`member:${member.bioguideId}:bills`, ...),
      cache.set(`member:${member.bioguideId}:finance`, ...),
    ]);
  }
}
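
Because warming runs against the same rate-limited upstream APIs, it can help to cap how many members are processed at once and to keep one failure from aborting the run. The helper below is a sketch of that idea; `mapWithLimit` is a hypothetical name and not part of CIV.IQ's code.

```typescript
// Runs `task` over `items` with at most `limit` tasks in flight.
// Failed items are collected instead of aborting the whole run.
async function mapWithLimit<T>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<void>,
): Promise<{ ok: number; failed: T[] }> {
  let next = 0;
  let ok = 0;
  const failed: T[] = [];

  // Each worker pulls the next unclaimed item until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const item = items[next++];
      try {
        await task(item);
        ok += 1;
      } catch {
        failed.push(item);
      }
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return { ok, failed };
}
```

A warming job could then call `mapWithLimit(members, 5, warmMember)`, where `warmMember` is an assumed function that fills the four cache keys for one member, and log or retry the `failed` list afterwards.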

5. Constitutional Accuracy

A key design principle: CIV.IQ must accurately represent constitutional distinctions that other civic tools flatten. The 541 members of Congress are not equivalent:

435 House Representatives: voting members per Article I
100 Senators: voting members per Article I
5 Delegates: non-voting, with seats created by statute under Article IV's Territory Clause and, for DC, Article I's District Clause (DC, Guam, US Virgin Islands, American Samoa, Northern Mariana Islands)
1 Resident Commissioner: Puerto Rico's non-voting member, elected to a 4-year term

The UI explicitly labels voting status and cites the relevant constitutional provisions. ZIP code lookups in territories return delegates with appropriate context about their limited floor voting rights.
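
One way to derive the voting label is from chamber and jurisdiction alone. In the sketch below the `classifyMember` function and the `MemberKind` type are hypothetical; the postal codes are those of the five jurisdictions that send a Delegate to the House, plus Puerto Rico for the Resident Commissioner.

```typescript
type MemberKind = 'senator' | 'representative' | 'delegate' | 'resident-commissioner';

// Postal codes of jurisdictions represented by a non-voting Delegate:
// DC, Guam, US Virgin Islands, American Samoa, Northern Mariana Islands.
const DELEGATE_JURISDICTIONS = ['DC', 'GU', 'VI', 'AS', 'MP'];

function classifyMember(
  chamber: 'house' | 'senate',
  state: string,
): { kind: MemberKind; isVoting: boolean } {
  const code = state.trim().toUpperCase();
  if (chamber === 'senate') {
    return { kind: 'senator', isVoting: true };
  }
  if (code === 'PR') {
    // Puerto Rico elects a Resident Commissioner rather than a Delegate.
    return { kind: 'resident-commissioner', isVoting: false };
  }
  if (DELEGATE_JURISDICTIONS.indexOf(code) !== -1) {
    return { kind: 'delegate', isVoting: false };
  }
  return { kind: 'representative', isVoting: true };
}
```

Deriving the flag from one table avoids hand-maintaining `isVoting` per record, at the cost of updating the table if Congress ever changes these seats.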

6. Performance Optimizations

6.1 Image Optimization

Official congressional photos (432 current members with available photos) were batch-converted from JPEG to WebP, reducing total asset size by 83%. Images are served via the Next.js Image component with automatic format negotiation.

6.2 Map Tiles

District boundary visualization uses the PMTiles format, a single-file tile archive, instead of a traditional vector tile server. The complete dataset (7,383 state legislative districts + 435 congressional districts) compresses to 24 MB, a 75% reduction from the source shapefiles. Maps render client-side via MapLibre GL.

6.3 Static Data Generation

Slowly-changing reference data is pre-generated at build time:

Census Gazetteer (ZIP to coordinate mapping)
Committee metadata and Wikipedia descriptions
State executive biographical data from Wikidata
District demographic summaries

7. Error Handling

Government APIs have varying reliability. CIV.IQ's error handling philosophy: real data or explicit absence. Every endpoint either returns verified government data or a clear "Data unavailable" response with source attribution.

Design Principle

No mock data. No generated placeholders. No "example" content. If the upstream API fails, the user sees an honest error rather than fabricated information.
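
The "real data or explicit absence" rule maps naturally onto a two-case result type, so that a handler cannot return data without a source or hide a failure behind a default. The `SourceResult` type and `fromSource` wrapper below are a sketch under that assumption, not CIV.IQ's actual response format.

```typescript
// Either verified data with attribution, or an explicit absence.
// There is deliberately no third case carrying placeholder data.
type SourceResult<T> =
  | { status: 'ok'; data: T; source: string; fetchedAt: string }
  | { status: 'unavailable'; source: string; message: string };

async function fromSource<T>(
  source: string,
  load: () => Promise<T>,
): Promise<SourceResult<T>> {
  try {
    const data = await load();
    return { status: 'ok', data, source, fetchedAt: new Date().toISOString() };
  } catch (err) {
    // No fallback content: report the absence and name the source.
    const reason = err instanceof Error ? err.message : String(err);
    return { status: 'unavailable', source, message: `Data unavailable (${reason})` };
  }
}
```

Because the two cases are distinguished by `status`, the TypeScript compiler forces UI code to check for absence before reading `data`.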

8. Future Work

Historical data — Voting records and campaign finance for previous congresses
Local expansion — City council data beyond the current 10 major cities
Alert system — Notifications for bill status changes, upcoming votes, comment period deadlines
API access — Public API for researchers and civic developers

9. Conclusion

Building a reliable civic intelligence platform requires treating government data aggregation as a serious engineering problem. The challenges — inconsistent APIs, rate limits, constitutional nuance, geographic complexity — are solvable with careful architecture. The result is a platform that makes democratic participation more informed without sacrificing accuracy for convenience.