Methodology & Sources

Last updated: May 9, 2026

What Political Transparency Project does

PTP aggregates campaign finance, congressional voting records, lobbying disclosures, stock trades, and live election odds into one source-linked, non-partisan profile per candidate. This page documents where every piece of data comes from, how it is processed, and the limits of what it shows.

Data sources

Campaign finance

Federal campaign finance data comes from the Federal Election Commission (FEC): totals, contribution breakdowns, and outside-spending records for every federal candidate and committee. The data is currently fetched via the FEC public API; migration to FEC bulk S3 downloads is in progress for production-scale loads.

Coverage: federal races only. State races (governor, attorney general, etc.) are outside FEC jurisdiction and currently lack a unified national source.

Congressional voting & legislation

Bill data, sponsorship, cosponsorship, and member metadata come from the Congress.gov API. Roll-call vote records come from the Senate.gov Legislative Information System (Senate XML) and the House Clerk (House XML). Coverage: 119th Congress and forward.

Stock trades (STOCK Act disclosures)

House member trade data comes from House Clerk Periodic Transaction Reports (PTRs), parsed from PDF disclosures. The STOCK Act (2012) requires members of Congress to disclose securities trades within 45 days of the transaction. Disclosed amounts are ranges (e.g., $1,001–$15,000), not exact values.
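Because disclosed amounts are ranges, any downstream arithmetic has to carry both bounds rather than a single figure. The following is a minimal sketch of parsing one range string; the function name `parse_amount_range` and the accepted separators are illustrative assumptions, not PTP's actual parser.

```python
import re
from typing import Tuple

# Two dollar amounts separated by a hyphen, en dash, or em dash (assumed formats).
_RANGE_RE = re.compile(r"\$([\d,]+)\s*[-\u2013\u2014]\s*\$([\d,]+)")


def parse_amount_range(text: str) -> Tuple[int, int]:
    """Return (low, high) in whole dollars for a disclosed range string."""
    match = _RANGE_RE.search(text)
    if match is None:
        raise ValueError(f"not a recognizable amount range: {text!r}")
    low, high = (int(part.replace(",", "")) for part in match.groups())
    if low > high:
        raise ValueError(f"range bounds are reversed: {text!r}")
    return low, high


low, high = parse_amount_range("$1,001–$15,000")
print(low, high)  # 1001 15000
```

Keeping the pair intact makes it harder to accidentally report a midpoint as if it were the disclosed value.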

Senate Electronic Financial Disclosure (eFD) coverage is in development. Senate member trades are not yet shown.

Lobbying disclosures

Quarterly LD-2 filings from the Senate Office of Public Records and the House Clerk are used to identify which industries are actively lobbying on which bills. These filings are the basis of the Money-Vote Gap’s donor-position layer.

Election odds

Live prediction-market prices come from Polymarket. Prices reflect real-money markets and update continuously. These are market-implied probabilities, not PTP editorial forecasts.

State-level data

State-level officeholder metadata comes from the Google Civic API.

Money-Vote Gap (MVG) methodology

The Money-Vote Gap is PTP’s flagship analytical metric. It compares the industries that fund a member of Congress against how that member votes on legislation those same industries care about.

How it is computed

  1. PAC contributions are aggregated per candidate and tagged by industry (oil & gas, pharmaceutical, defense, etc.).
  2. Roll-call votes are tagged by which industries have a direct financial stake in the outcome — regulation, spending, subsidies, tax treatment, licensing, or liability rules.
  3. Alignment is computed where a documented or inferred industry position exists: did the member vote with the industry’s stated position, or against it?
  4. Reference medians (two views): Chamber median = median Yes% on industry-tagged bills across all members of the chamber. Party median = median Yes% across members of the same chamber and same party. The party-conditional median is the primary baseline for ranking; chamber median is shown as secondary context. See the note below on why.
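The two reference medians in step 4 can be sketched for a single (chamber, industry) cell as follows. The input rows, the tuple layout, and the function names are hypothetical; only the definitions (Yes% per member, then a median across the chamber and a median per party) come from the steps above.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows for one (chamber, industry) cell:
# (member, party, yes_votes, total_votes)
rows = [
    ("A", "R", 9, 10),
    ("B", "R", 8, 10),
    ("C", "R", 7, 10),
    ("D", "D", 3, 10),
    ("E", "D", 2, 10),
]


def yes_pct(yes: int, total: int) -> float:
    """Share of industry-tagged votes cast as Yes, in percent."""
    return 100.0 * yes / total


def reference_medians(rows):
    """Return (chamber_median, {party: party_median}) of member Yes%."""
    all_rates = []
    by_party = defaultdict(list)
    for _member, party, yes, total in rows:
        rate = yes_pct(yes, total)
        all_rates.append(rate)
        by_party[party].append(rate)
    party_medians = {party: median(rates) for party, rates in by_party.items()}
    return median(all_rates), party_medians


chamber_median, party_medians = reference_medians(rows)
print(chamber_median)  # 70.0
print(party_medians)   # {'R': 80.0, 'D': 25.0}
```

In this toy cell the chamber median (70) sits inside the majority party's range, which is the effect the party-conditional baseline is meant to remove.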

Why party-conditional median is the primary baseline

The chamber-wide median is partisan-biased whenever the chamber has a partisan majority. In the House of the current 119th Congress, the median member votes around 80% Yes on industry-tagged bills, because the median member is structurally a Republican and Republicans whip votes along industry-aligned lines. A typical Democrat voting 25% Yes shows a chamber deviation of −55, but that measures “this person is a Democrat,” not “money distorted this person’s votes.”

A 2026-05-14 methodology audit showed that all of the top-20 negative chamber-deviation cells collapsed to within ±5 points of the party median once measured against it: they were partisan-baseline artifacts, not money-vote divergence.

The fix is to rank by party deviation = member’s Yes% − median Yes% among same-party members in the same chamber, same industry, and same cycle. Chamber deviation is preserved as a secondary metric for context. Cells where fewer than 5 same-party members met the vote-count floor display “low-n” and fall back to chamber deviation for ranking.
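As a worked illustration of the numbers above, the sketch below computes the ranking deviation for a Democrat voting 25% Yes, with the low-n fallback. The pools are made-up values chosen so the chamber median is 80; `ranking_deviation` and `MIN_PARTY_POOL` are hypothetical names.

```python
from statistics import median
from typing import Sequence, Tuple

MIN_PARTY_POOL = 5  # fewer same-party members than this is treated as "low-n"


def ranking_deviation(
    member_rate: float,
    party_pool: Sequence[float],
    chamber_pool: Sequence[float],
) -> Tuple[str, float]:
    """Return (basis, deviation) where basis is 'party' or 'chamber'."""
    if len(party_pool) < MIN_PARTY_POOL:
        # Low-n: the party median is not trusted, so fall back to the chamber.
        return "chamber", member_rate - median(chamber_pool)
    return "party", member_rate - median(party_pool)


# Illustrative Yes% pools (not real data).
dem_pool = [20, 25, 25, 30, 35]
rep_pool = [78, 80, 80, 80, 82, 85, 85, 90]
chamber_pool = dem_pool + rep_pool

print(25 - median(chamber_pool))                       # -55
print(ranking_deviation(25, dem_pool, chamber_pool))   # ('party', 0)
print(ranking_deviation(25, [20, 25, 30], chamber_pool))  # ('chamber', -55)
```

The same member moves from a deviation of −55 to 0 purely by changing the baseline, which is the artifact described in the audit.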

Donor position tiers

PTP tracks the source and confidence of every industry position used in alignment scoring:

  • Tier 1: Explicit organizational position. Scorecards from the US Chamber of Commerce, AFL-CIO, NFIB, League of Conservation Voters, and trade-association press releases. Highest defensibility: the organization itself said it.
  • Tier 2: Lobbying disclosure with inferred direction. LDA filings show which bills an industry lobbied on; direction (support / oppose) is inferred from the filing’s “specific issues” text using a constrained language model. Labeled as “from lobbying disclosures.”
  • Tier 3: Inferred from bill text. When no Tier 1 or Tier 2 source exists, an industry’s likely position is inferred from the bill’s title and summary using a constrained prompt (temperature 0, output limited to support / oppose / unknown). Always labeled as inferred. Do not cite as proof of an industry’s position.
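One way to read the tiers is as a precedence order: use the lowest-numbered tier that yields a definite stance. The sketch below assumes that reading; the `Position` record, its field names, and the rule that an "unknown" stance is skipped are illustrative assumptions rather than documented PTP behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Position:
    tier: int    # 1, 2, or 3
    stance: str  # "support", "oppose", or "unknown"
    label: str   # provenance label shown to readers


def best_position(candidates: Iterable[Position]) -> Optional[Position]:
    """Pick the most defensible definite stance, or None if there is none."""
    usable = [c for c in candidates if c.stance in ("support", "oppose")]
    if not usable:
        return None
    return min(usable, key=lambda c: c.tier)


candidates = [
    Position(3, "support", "inferred from bill text"),
    Position(2, "oppose", "from lobbying disclosures"),
]
chosen = best_position(candidates)
print(chosen.tier, chosen.stance)  # 2 oppose
```

Carrying the label alongside the stance keeps the tier visible wherever the position is used.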

Statistical confidence (Wilson CI)

Each member’s Yes-rate on an industry’s tagged bills is a proportion (yes-votes / total-votes), and at small total-vote counts the point estimate is fragile. A member who voted Yes on 0 of 7 oil bills has a wide range of plausible true Yes-rates even before we ask what their colleagues did. PTP computes a Wilson 95% confidence interval on every cell’s Yes-rate at write time and stores both bounds on money_vote_gap.
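For reference, the standard Wilson score interval for a proportion can be computed as below. This is a sketch of the textbook formula, not PTP's code; the function name and the clamping to [0, 1] are assumptions.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def wilson_interval(yes: int, total: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for the proportion yes/total, as fractions in [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = yes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to guard against tiny floating-point overshoot at p = 0 or p = 1.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(0, 7)
print(round(low, 3), round(high, 3))  # roughly 0.0 and 0.354
```

For the 0-of-7 example in the text, the interval runs from 0% to about 35%, which shows how little a 0% point estimate pins down at that sample size.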

The Party Δ shown on the leaderboard inherits this uncertainty: the confidence interval on Party Δ is the Wilson interval shifted by the party median. When that interval crosses zero, the direction of the deviation isn’t confidently signed and the row is marked CI ∋ 0; the leaderboard ranks all confidently-signed cells above all CI-crosses-zero cells, then orders within each group by the lower bound of |Party Δ| — the minimum deviation we can defend at 95% confidence. This kills the small-sample noise that point-estimate ranking lets through. See docs/audits/mvg-wilson-ci-2026-05-15.md for the methodology audit.
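The ranking rule described above can be sketched as a two-part sort key. The `Cell` record and its field names are hypothetical, values are in percentage points, and ties inside the CI-crosses-zero group (where the defensible deviation is 0 for every row) are simply left in input order here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    member: str
    ci_low: float        # Wilson lower bound on Yes%
    ci_high: float       # Wilson upper bound on Yes%
    party_median: float  # same-party median Yes%

    @property
    def delta_ci(self) -> Tuple[float, float]:
        """Wilson interval shifted by the party median."""
        return self.ci_low - self.party_median, self.ci_high - self.party_median

    @property
    def crosses_zero(self) -> bool:
        low, high = self.delta_ci
        return low <= 0.0 <= high

    @property
    def defensible_abs_delta(self) -> float:
        """Lower bound of |Party delta|; zero when the sign is not confident."""
        if self.crosses_zero:
            return 0.0
        low, high = self.delta_ci
        return min(abs(low), abs(high))


def rank(cells: List[Cell]) -> List[Cell]:
    # Confidently signed cells first, then larger defensible deviation first.
    return sorted(cells, key=lambda c: (c.crosses_zero, -c.defensible_abs_delta))


cells = [
    Cell("X", 60.0, 95.0, 40.0),  # delta CI [20, 55]
    Cell("Y", 0.0, 35.0, 80.0),   # delta CI [-80, -45]
    Cell("Z", 30.0, 70.0, 50.0),  # delta CI [-20, 20], crosses zero
]
print([c.member for c in rank(cells)])  # ['Y', 'X', 'Z']
```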

Sample-size policy

To prevent statistical noise from misleading readers:

  • Fewer than 3 votes — the industry row is hidden entirely.
  • 3–4 votes — raw vote counts only; no rate or deviation is displayed.
  • 5 or more votes — full statistical display including party and chamber deviation.
  • Fewer than 5 same-party members in the cell’s (chamber, industry) pool — Party Δ is suppressed (“low-n”); the row ranks by chamber deviation only.
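The thresholds above map naturally onto a small decision function. The mode names returned here are invented for illustration; only the cut-offs come from the policy.

```python
def display_mode(total_votes: int, same_party_members: int) -> str:
    """Map a cell's sample sizes to a display mode (mode names are hypothetical)."""
    if total_votes < 3:
        return "hidden"            # row is not shown at all
    if total_votes < 5:
        return "counts_only"       # raw vote counts, no rate or deviation
    if same_party_members < 5:
        return "full_low_n"        # Party delta suppressed; rank by chamber deviation
    return "full"                  # party and chamber deviation both shown


print(display_mode(2, 40))  # hidden
print(display_mode(4, 40))  # counts_only
print(display_mode(9, 3))   # full_low_n
print(display_mode(9, 40))  # full
```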

Revolving door methodology

The revolving-door tracker links people who have moved between U.S. government roles and the private sector — lobbying firms, law firms, trade associations, advocacy groups, and corporate offices. Every claim links back to a sworn federal disclosure.

Primary source

The current data layer is built on quarterly LD-2 lobbying disclosures from lda.gov. Each filing identifies the registered lobbying firm, the client they represented, the bills lobbied on, and — for every individual lobbyist named — a freetext covered_position field where the lobbyist self-discloses any prior covered government employment. These filings are submitted to the federal government under penalty of perjury (2 U.S.C. § 1606), which is what makes them suitable as a high-confidence primary source.

How extraction works

  1. Filing archive. Every LDA filing payload is hashed (sha256 of its canonical JSON form) and stored in a source_documents table. Re-fetching the same filing returns the existing archive row — no duplicates. Every role and transition row links to one or more source documents; the schema enforces this at write time (a role cannot be inserted without at least one source).
  2. Lobbyist roles. For every named lobbyist on a filing, a current “Lobbyist at firm” role is upserted, linking the person to the registered lobbying firm.
  3. Historical government roles. The covered_position string is parsed with a heuristic-first, LLM-fallback pipeline. Heuristics handle common shapes (“Senate Foreign Relations Committee”, “Office of Senator X”, “Department of Energy”, etc.). Strings that no heuristic handles fall through to a Claude Haiku call with a tightly constrained JSON-only prompt at temperature 0. Both paths produce a structured record: the government organization, the role title, and any explicit dates.
  4. Transitions. A separate compute step walks each person’s roles in chronological order and emits transition rows for consecutive (from, to) pairs, annotated with direction (gov-to-private, private-to-gov, gov-to-advocacy, etc.), gap days where computable, and a notability score from 0–100.
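Steps 1 and 4 can be sketched in a few lines. Everything here is an assumption made for illustration: "canonical JSON" is taken to mean sorted keys with compact separators, the archive is an in-memory dict rather than a source_documents table, roles without a start date are skipped, and the example dates are invented at day precision even though real filings rarely provide it.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


def canonical_sha256(payload: dict) -> str:
    """Hash a payload so that key order and whitespace do not affect the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SourceArchive:
    """In-memory stand-in for the archive: one row per distinct payload."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def store(self, payload: dict) -> str:
        digest = canonical_sha256(payload)
        self._rows.setdefault(digest, payload)  # re-fetch reuses the existing row
        return digest

    def __len__(self) -> int:
        return len(self._rows)


@dataclass
class Role:
    org: str
    sector: str  # e.g. "gov", "private", "advocacy"
    start: Optional[date]
    end: Optional[date]


def transitions(roles: List[Role]) -> List[dict]:
    """Emit one row per consecutive (from, to) pair in chronological order."""
    dated = sorted((r for r in roles if r.start is not None), key=lambda r: r.start)
    rows = []
    for prev, nxt in zip(dated, dated[1:]):
        gap = (nxt.start - prev.end).days if prev.end is not None else None
        rows.append({
            "from": prev.org,
            "to": nxt.org,
            "direction": f"{prev.sector}-to-{nxt.sector}",
            "gap_days": gap,  # None when not computable
        })
    return rows


archive = SourceArchive()
archive.store({"filing": "example", "quarter": 1})
archive.store({"quarter": 1, "filing": "example"})  # same content, different key order
print(len(archive))  # 1

roles = [
    Role("Example Lobbying LLC", "private", date(2019, 3, 1), None),
    Role("Senate Foreign Relations Committee", "gov", date(2015, 1, 1), date(2018, 12, 31)),
]
print(transitions(roles)[0]["direction"], transitions(roles)[0]["gap_days"])  # gov-to-private 60
```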

Confidence tiers

Every role is tagged with a confidence value. The public site only surfaces confirmed rows; lower tiers exist in the database for internal review and future data-quality work but are not displayed.

  • Confirmed: Direct documentary evidence, primarily LDA filings (sworn under penalty of perjury). The lobbyist or filing entity has attested to the information directly. This is the only tier shown publicly.
  • Probable: Inferred from a secondary documented source, such as SEC EDGAR 8-K filings, official press releases, or congressional bio archives. Higher specificity than network-adjacent inference but not self-attested. Reserved for Tier-2 ingestion (in development).
  • Unconfirmed: Network-adjacent or LLM-extracted from less authoritative sources. Internal use only; never published.

Entity resolution

Matching the same human across multiple filings is the central technical challenge. Two filings might list “Bob Smith” and “Robert J. Smith” — same person or two different lobbyists? PTP’s policy is conservative resolution:

  • When a new sighting matches an existing person on first+last name and no other field disagrees (middle name, suffix), the rows are merged.
  • When a normalized-name match exists but middle name or suffix conflicts (e.g., “Robert James Smith” vs “Robert Paul Smith”), the system inserts a new persons row and writes the candidate pair to a human-review queue. It does not auto-merge under uncertainty.
  • When two or more existing rows already share the normalized name, the system always inserts new and queues for review, regardless of other fields.

The trade-off is intentional: the published graph will sometimes show two near-duplicate rows for the same person rather than a single wrongly-merged row. False merges can be defamatory; false splits are a transparency annoyance that gets cleaned up via review.
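The three rules above reduce to a small decision function. This sketch assumes a very simple normalization (trim, lowercase, drop a trailing period) and treats a missing middle name or suffix as "no disagreement"; nickname handling such as “Bob” versus “Robert” is not modeled, and the `Person` record and action names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    first: str
    last: str
    middle: Optional[str] = None
    suffix: Optional[str] = None


def _norm(value: Optional[str]) -> Optional[str]:
    """Trim, lowercase, and drop a trailing period; empty values become None."""
    if not value or not value.strip():
        return None
    return value.strip().lower().rstrip(".")


def _conflicts(a: Optional[str], b: Optional[str]) -> bool:
    """Two fields conflict only when both are present and differ."""
    na, nb = _norm(a), _norm(b)
    return na is not None and nb is not None and na != nb


def resolve(sighting: Person, existing: List[Person]) -> str:
    """Return 'insert', 'merge', or 'insert_and_queue' for a new sighting."""
    matches = [
        p for p in existing
        if _norm(p.first) == _norm(sighting.first) and _norm(p.last) == _norm(sighting.last)
    ]
    if not matches:
        return "insert"
    if len(matches) >= 2:
        return "insert_and_queue"  # name already ambiguous: never auto-merge
    match = matches[0]
    if _conflicts(match.middle, sighting.middle) or _conflicts(match.suffix, sighting.suffix):
        return "insert_and_queue"
    return "merge"


known = [Person("Robert", "Smith", middle="James")]
print(resolve(Person("Robert", "Smith"), known))                 # merge
print(resolve(Person("Robert", "Smith", middle="Paul"), known))  # insert_and_queue
print(resolve(Person("Jane", "Doe"), known))                     # insert
```

Note that under this normalization an initial such as “J.” would conflict with “James” and route to review, which errs in the same conservative direction as the policy.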

Notability scoring

Each transition gets a score from 0–100. The score is heuristic, not statistical, and is used only for sort/filter, not as an editorial judgment of significance. The base is 30, with additive bonuses for: gov-to-private direction (+20); from-role at a legislative or executive-agency body (+15 / +12); senior title markers (Chief, Director, Secretary, Counsel, etc.) on the from-role (+10); to-role at a lobbying firm or with lobbying tells in the org name (+10); a gap of less than one year between the gov role ending and the private role beginning (+10).
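The additive scheme can be written out directly. Two readings are assumed here and should be treated as such: “+15 / +12” is taken to mean +15 for a legislative body and +12 for an executive agency, and “less than one year” is taken as fewer than 365 days. Only the four title markers named in the text are checked, and all parameter names are hypothetical.

```python
from typing import Optional

SENIOR_MARKERS = ("chief", "director", "secretary", "counsel")  # named markers only


def has_senior_marker(title: str) -> bool:
    """True when the from-role title contains one of the named senior markers."""
    lowered = title.lower()
    return any(marker in lowered for marker in SENIOR_MARKERS)


def notability(
    direction: str,
    from_body: str,            # "legislative", "executive_agency", or anything else
    from_title: str,
    to_is_lobbying: bool,
    gap_days: Optional[int],   # None when the gap is not computable
) -> int:
    score = 30  # base
    if direction == "gov-to-private":
        score += 20
    if from_body == "legislative":
        score += 15
    elif from_body == "executive_agency":
        score += 12
    if has_senior_marker(from_title):
        score += 10
    if to_is_lobbying:
        score += 10
    if gap_days is not None and gap_days < 365:
        score += 10
    return min(score, 100)


print(notability("gov-to-private", "legislative", "Chief Counsel", True, 60))  # 95
print(notability("private-to-gov", "other", "Analyst", False, None))           # 30
```

Under these assumptions the maximum reachable score is 95, so the cap at 100 never binds; it is kept only to make the stated 0–100 range explicit.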

Known limitations

  • LDA-only coverage. Only people who have appeared as a registered lobbyist on an LDA filing are in the graph. Former officials who left government for non-lobbying private roles, board seats, consulting, or trade-press are absent until SEC EDGAR + press-release ingestion ships.
  • Date resolution is quarterly at best. The covered_position freetext gives years where present and nothing where not. Day/month accuracy is generally unavailable.
  • Self-disclosure bias. A lobbyist who omits their covered_position is missing from the graph for that filing. We currently treat absence as “no prior gov role”, not “undisclosed.”
  • Same-name collision. Conservative resolution means common-name pairs like “John Smith” can split into multiple rows pending review. This will shrink as the human-review queue is processed.

Legal posture

Public LDA filings are federal records (17 U.S.C. § 105) — public domain, not copyrightable. Reproducing factual statements from sworn disclosures is well-established ground (Feist v. Rural Telephone). Every claim shown on a person or organization page links back to the originating LDA filing on lda.gov for independent verification. PTP does not editorialize about motives; the data describes what people did, not why.

Editorial standards

PTP is non-partisan and source-linked. We do not endorse candidates or take positions on bills. Where the data shows a meaningful pattern, we describe it; we do not interpret it.

Editorial voice

PTP applies structural parity rather than false balance: candidates and parties receive proportional coverage based on the data available, not based on a quota. A senator with $50M in tracked PAC money receives more analytical depth than a long-shot challenger with no FEC filings — we do not inflate the latter to match the former.

AI assistance — disclosure

Parts of PTP are AI-assisted. We disclose where:

  • Per-candidate narrative summaries are generated from the structured data already on the same page by a large language model using a constrained prompt. They contain no claims not directly supported by the linked sections below them, and are labeled on each profile.
  • Candidate biographies for the majority of profiles are AI-drafted from FEC, Congress.gov, and public record sources, then reviewed and approved by a human editor before publication.
  • Bill industry classification and lobbying position extraction use Claude Haiku with deterministic prompts (temperature 0, output constrained to a fixed taxonomy).
  • Long-form analysis, weekly reports, and editorial commentary are human-written. Anything requiring interpretation, judgment, or narrative voice is produced by a human editor.

Source-linking

Every data point on a candidate profile links back to its primary source: FEC filing pages, Congress.gov bill pages, House/Senate clerk PDFs, Polymarket markets. If you cannot trace a number to its source in two clicks, please report it as a bug.

Known limitations

We publish these openly because they affect what conclusions you should draw:

  • House PTR only. Stock trade data currently covers House members. Senate eFD coverage is in development.
  • State-level finance is incomplete. FEC does not cover state races, and disclosure quality varies widely by state.
  • Outside-spending matching is conservative. We err on the side of false negatives — better to under-match than to misattribute spending to the wrong race.
  • Inferred positions are not documented positions. Tier 3 LLM-inferred positions are useful for breadth but should not be cited as proof of an industry’s stance on a bill. Always check the tier label.

Updating & corrections

  • Live API data (FEC totals, Polymarket odds): refreshed within 1–24 hours.
  • Cached data (vote records, bills, lobbying filings): refreshed daily.
  • Editorial content: updated on review.

To report a data error or request a correction, email data@poltrapro.com. Corrections are logged publicly with the date and nature of the change.

For information on what we collect from site visitors, see the Privacy Policy. For commercial API terms, see the Developer Portal.