Political Transparency Project

What Political Transparency Project does

PTP aggregates campaign finance, congressional voting records, lobbying disclosures, stock trades, and live election odds into one source-linked, non-partisan profile per candidate. This page documents where every piece of data comes from, how it is processed, and the limits of what it shows.

Data sources

Campaign finance

Federal campaign finance data comes from the Federal Election Commission (FEC) — totals, contribution breakdowns, and outside-spending records for every federal candidate and committee. Currently fetched via the FEC public API; migration to FEC bulk S3 downloads is in progress for production-scale loads.

Coverage: federal races only. State races (governor, attorney general, etc.) are outside FEC jurisdiction and currently lack a unified national source.

Congressional voting & legislation

Bill data, sponsorship, cosponsorship, and member metadata come from the Congress.gov API. Roll-call vote records come from the Senate.gov Legislative Information System (Senate XML) and the House Clerk (House XML). Coverage: 119th Congress and forward.

Stock trades (STOCK Act disclosures)

House member trade data comes from House Clerk Periodic Transaction Reports (PTRs), parsed from PDF disclosures. The STOCK Act (2012) requires members of Congress to disclose securities trades no later than 45 days after the transaction. Disclosed amounts are ranges (e.g., $1,001–$15,000), not exact values.
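Because amounts arrive as ranges, any downstream code has to carry a lower and an upper bound rather than a single number. The sketch below shows one way that could look; the function name `parse_amount_range` and the regular expression are illustrative assumptions, not PTP's actual parser, and open-ended brackets (for example "Over $50,000,000") are deliberately rejected rather than guessed at.

```python
import re

# Matches a dollar figure such as "$1,001" or "$ 15,000".
_AMOUNT = re.compile(r"\$\s*([\d,]+)")


def parse_amount_range(text: str) -> tuple[int, int]:
    """Return (low, high) in whole dollars for a disclosed range string.

    Hypothetical helper: accepts any separator between the two figures
    (en dash, hyphen, "to") and raises ValueError for anything that is
    not exactly two ordered dollar amounts.
    """
    values = [int(m.replace(",", "")) for m in _AMOUNT.findall(text)]
    if len(values) != 2 or values[0] > values[1]:
        raise ValueError(f"not a closed dollar range: {text!r}")
    return values[0], values[1]


low, high = parse_amount_range("$1,001–$15,000")
print(low, high)
```

Keeping both bounds makes it harder to report a midpoint as if it were the disclosed value.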

Senate Electronic Financial Disclosure (eFD) coverage is in development. Senate member trades are not yet shown.

Lobbying disclosures

Quarterly LD-2 filings from the Senate Office of Public Records and the House Clerk are ingested to identify which industries are actively lobbying on which bills. These filings underpin the revolving-door tracker and other lobbying-related surfaces.

Election odds

Live prediction-market prices come from Polymarket. Prices reflect real-money markets and update continuously. These are market-implied probabilities, not PTP editorial forecasts.

State-level data

State-level officeholder metadata comes from the Google Civic Information API.

Revolving door methodology

The revolving-door tracker links people who have moved between U.S. government roles and the private sector — lobbying firms, law firms, trade associations, advocacy groups, and corporate offices. Every claim links back to a sworn federal disclosure.

Primary source

The current data layer is built on quarterly LD-2 lobbying disclosures from lda.gov. Each filing identifies the registered lobbying firm, the client they represented, the bills lobbied on, and — for every individual lobbyist named — a freetext covered_position field where the lobbyist self-discloses any prior covered government employment. These filings are submitted to the federal government under penalty of perjury (2 U.S.C. § 1606), which is what makes them suitable as a high-confidence primary source.

How extraction works

Filing archive. Every LDA filing payload is hashed (sha256 of its canonical JSON form) and stored in a source_documents table. Re-fetching the same filing returns the existing archive row — no duplicates. Every role and transition row links to one or more source documents; the schema enforces this at write time (a role cannot be inserted without at least one source).
Lobbyist roles. For every named lobbyist on a filing, a current “Lobbyist at firm” role is upserted, linking the person to the registered lobbying firm.
Historical government roles. The covered_position string is parsed with a heuristic-first, LLM-fallback pipeline. Heuristics handle common shapes (“Senate Foreign Relations Committee”, “Office of Senator X”, “Department of Energy”, etc.). Strings that no heuristic handles fall through to a Claude Haiku call with a tightly constrained JSON-only prompt at temperature 0. Both paths produce a structured record: the government organization, the role title, and any explicit dates.
Transitions. A separate compute step walks each person’s roles in chronological order and emits transition rows for consecutive (from, to) pairs, annotated with direction (gov-to-private, private-to-gov, gov-to-advocacy, etc.), gap days where computable, and a notability score from 0–100.
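The archive and transition steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not PTP's schema or code: `SourceArchive`, `Role`, and `compute_transitions` are hypothetical names, the canonical JSON form (sorted keys, compact separators) is an assumption, and roles without a start date are simply skipped here.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional


def filing_hash(payload: dict) -> str:
    """sha256 over an assumed canonical JSON form: sorted keys, compact separators."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SourceArchive:
    """In-memory stand-in for the source_documents table, keyed by content hash."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def archive(self, payload: dict) -> str:
        digest = filing_hash(payload)
        self._rows.setdefault(digest, payload)  # a re-fetch reuses the existing row
        return digest

    def __len__(self) -> int:
        return len(self._rows)


@dataclass(frozen=True)
class Role:
    org: str
    sector: str  # e.g. "gov", "private", "advocacy"
    start: Optional[date]
    end: Optional[date]
    source_ids: tuple[str, ...]

    def __post_init__(self) -> None:
        # Mirrors the write-time rule: no role without at least one source.
        if not self.source_ids:
            raise ValueError("a role needs at least one source document")


def compute_transitions(roles: list[Role]) -> list[dict]:
    """Emit one row per consecutive pair of dated roles, in chronological order."""
    dated = sorted((r for r in roles if r.start is not None), key=lambda r: r.start)
    rows = []
    for prev, nxt in zip(dated, dated[1:]):
        gap = (nxt.start - prev.end).days if prev.end is not None else None
        rows.append({
            "from": prev.org,
            "to": nxt.org,
            "direction": f"{prev.sector}-to-{nxt.sector}",
            "gap_days": gap,  # None where the gap is not computable
        })
    return rows
```

Hashing a canonical form means key order in the fetched payload does not matter, so two fetches of the same filing map to the same archive row.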

Confidence tiers

Every role is tagged with a confidence value. The public site only surfaces confirmed rows; lower tiers exist in the database for internal review and future data-quality work but are not displayed.

Confirmed. Direct documentary evidence, primarily LDA filings (sworn under penalty of perjury). The lobbyist or filing entity has themselves attested to the information. This is the only tier shown publicly.
Probable. Inferred from a secondary documented source such as SEC EDGAR 8-K filings, official press releases, or congressional bio archives. Higher specificity than network-adjacent inference but not self-attested. Reserved for Tier-2 ingestion (in development).
Unconfirmed. Network-adjacent or LLM-extracted from less authoritative sources. Internal use only; never published.
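As a rough illustration of the publishing rule, the tiers can be modeled as an enumeration with a single filter in front of the public site. The names `Confidence` and `public_rows`, and the dictionary row shape, are assumptions made for this sketch only.

```python
from enum import Enum


class Confidence(Enum):
    CONFIRMED = "confirmed"      # self-attested in a sworn filing
    PROBABLE = "probable"        # secondary documented source
    UNCONFIRMED = "unconfirmed"  # internal use only


def public_rows(rows: list[dict]) -> list[dict]:
    """Keep only the rows that may be shown publicly (confirmed tier)."""
    return [r for r in rows if r["confidence"] is Confidence.CONFIRMED]


roles = [
    {"person": "A", "confidence": Confidence.CONFIRMED},
    {"person": "B", "confidence": Confidence.PROBABLE},
    {"person": "C", "confidence": Confidence.UNCONFIRMED},
]
print([r["person"] for r in public_rows(roles)])
```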

Entity resolution

Matching the same human across multiple filings is the central technical challenge. Two filings might list “Bob Smith” and “Robert J. Smith” — same person or two different lobbyists? PTP’s policy is conservative resolution:

When a new sighting matches an existing person on first+last name and no other field disagrees (middle name, suffix), the rows are merged.
When a normalized-name match exists but middle name or suffix conflicts (e.g., “Robert James Smith” vs “Robert Paul Smith”), the system inserts a new persons row and writes the candidate pair to a human-review queue. It does not auto-merge under uncertainty.
When two or more existing rows already share the normalized name, the system always inserts new and queues for review, regardless of other fields.

The trade-off is intentional: the published graph will sometimes show two near-duplicate rows for the same person rather than a single wrongly-merged row. False merges can be defamatory; false splits are a transparency annoyance that gets cleaned up via review.
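The three resolution rules can be sketched as a single decision function. This is an illustration under stated assumptions, not PTP's implementation: `Person`, `resolve`, and the list-based review queue are hypothetical, names are normalized only by trimming and lowercasing (no nickname handling, so "Bob" and "Robert" never match here), and a field counts as conflicting only when both sides are present and differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first: str
    last: str
    middle: Optional[str] = None
    suffix: Optional[str] = None


def norm_key(p: Person) -> tuple[str, str]:
    return (p.first.strip().lower(), p.last.strip().lower())


def _conflict(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both values are present and differ after normalization."""
    if a is None or b is None:
        return False
    return a.strip().lower().rstrip(".") != b.strip().lower().rstrip(".")


def resolve(sighting: Person, persons: list, review_queue: list) -> Person:
    matches = [p for p in persons if norm_key(p) == norm_key(sighting)]
    if not matches:
        persons.append(sighting)
        return sighting
    if len(matches) >= 2:
        # Ambiguous name: always insert new and queue every candidate pair.
        persons.append(sighting)
        review_queue.extend((sighting, m) for m in matches)
        return sighting
    existing = matches[0]
    if _conflict(existing.middle, sighting.middle) or _conflict(existing.suffix, sighting.suffix):
        # Conflicting detail: insert new, never auto-merge under uncertainty.
        persons.append(sighting)
        review_queue.append((sighting, existing))
        return sighting
    # No disagreement: merge, filling in fields the existing row lacks.
    existing.middle = existing.middle or sighting.middle
    existing.suffix = existing.suffix or sighting.suffix
    return existing
```

Note that once a name has split into two rows, every later sighting of that name also inserts a new row until a reviewer merges them, which is the false-split cost the policy accepts.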

Notability scoring

Each transition gets a score from 0–100. The score is heuristic, not statistical, and is used only for sorting and filtering, not as an editorial judgment of significance. The base is 30, with additive bonuses:

Gov-to-private direction: +20.
From-role at a legislative body: +15; at an executive-agency body: +12.
Senior title markers (Chief, Director, Secretary, Counsel, etc.) on the from-role: +10.
To-role at a lobbying firm, or with lobbying tells in the org name: +10.
A gap of less than one year between the gov role ending and the private role beginning: +10.
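A minimal sketch of the additive scoring described above, using the stated base and bonuses. The function signature, the marker and tell lists, the body labels, the reading of "+15 / +12" as legislative and executive-agency respectively, and the treatment of a negative gap as not qualifying are all assumptions made for illustration.

```python
from typing import Optional

SENIOR_MARKERS = ("chief", "director", "secretary", "counsel")
# Hypothetical "lobbying tells" looked for in an organization name.
LOBBYING_TELLS = ("lobby", "government relations", "public affairs")


def notability(
    direction: str,
    from_body: str,            # assumed labels: "legislative", "executive_agency", other
    from_title: str,
    to_org: str,
    to_is_lobbying_firm: bool,
    gap_days: Optional[int],
) -> int:
    score = 30  # base
    if direction == "gov-to-private":
        score += 20
    if from_body == "legislative":
        score += 15
    elif from_body == "executive_agency":
        score += 12
    if any(marker in from_title.lower() for marker in SENIOR_MARKERS):
        score += 10
    if to_is_lobbying_firm or any(tell in to_org.lower() for tell in LOBBYING_TELLS):
        score += 10
    if gap_days is not None and 0 <= gap_days < 365:
        score += 10
    return min(score, 100)  # bonuses listed here top out at 95


print(notability("gov-to-private", "legislative", "Chief of Staff",
                 "Example Lobbying Group", True, 90))
```

For the example call, every bonus applies: 30 + 20 + 15 + 10 + 10 + 10 = 95.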

Known limitations

LDA-only coverage. Only people who have appeared as a registered lobbyist on an LDA filing are in the graph. Former officials who left government for non-lobbying private roles, board seats, consulting, or the trade press are absent until SEC EDGAR and press-release ingestion ships.
Date resolution is quarterly at best. The covered_position freetext gives years where present and nothing where not. Day/month accuracy is generally unavailable.
Self-disclosure bias. A lobbyist who omits their covered_position is missing from the graph for that filing. We currently treat absence as “no prior gov role”, not “undisclosed.”
Same-name collision. Conservative resolution means common-name pairs like “John Smith” can split into multiple rows pending review. This will shrink as the human-review queue is processed.

Legal posture

Public LDA filings are federal records (17 U.S.C. § 105): public domain, not copyrightable. Reproducing factual statements from sworn disclosures is well-established ground (Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991)). Every claim shown on a person or organization page links back to the originating LDA filing on lda.gov for independent verification. PTP does not editorialize about motives; the data describes what people did, not why.

Editorial standards

PTP is non-partisan and source-linked. We do not endorse candidates or take positions on bills. Where the data shows a meaningful pattern, we describe it; we do not interpret it.

Editorial voice

PTP applies structural parity rather than false balance: candidates and parties receive proportional coverage based on the data available, not based on a quota. A senator with $50M in tracked PAC money receives more analytical depth than a long-shot challenger with no FEC filings — we do not inflate the latter to match the former.

Source-linking

Every data point on a candidate profile links back to its primary source: FEC filing pages, Congress.gov bill pages, House/Senate clerk PDFs, Polymarket markets. If you cannot trace a number to its source in two clicks, please report it as a bug.

Known limitations

We publish these openly because they affect what conclusions you should draw:

House PTR only. Stock trade data currently covers House members. Senate eFD coverage is in development.
State-level finance is incomplete. FEC does not cover state races, and disclosure quality varies widely by state.
Outside-spending matching is conservative. We err on the side of false negatives — better to under-match than to misattribute spending to the wrong race.

Updating & corrections

Live API data (FEC totals, Polymarket odds): refreshed within 1–24 hours.
Cached data (vote records, bills, lobbying filings): refreshed daily.
Editorial content: updated on review.

To report a data error or request a correction, email PolTraPro@proton.me. Corrections are logged publicly with the date and nature of the change.

Privacy & legal

For information on what we collect from site visitors, see the Privacy Policy. For commercial API terms, see the Developer Portal.
