Nonpartisan Government Accountability
PolicyLogic
How We Apply Our Methodology
INDEPENDENT & TRANSPARENT METHODOLOGY
Known Limitations
Every accountability system has gaps. These are ours — documented transparently so that users and researchers understand where to apply caution.
AI Training Data Bias
Mitigation: human review + cross-party audits
Claude's training data has an unknown partisan distribution. The model may have been trained on more critical coverage of some officials than others, which could produce systematic scoring differences unrelated to actual delivery.
Source Availability Asymmetry
Mitigation: Limited Evidence flag + minimum source requirements
Governors of large states have extensive press coverage. Mayors of small cities may have one or two local sources. Low-coverage officials will tend toward lower scores because evidence of delivery is harder to find.
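As a rough illustration of how a minimum source requirement and a Limited Evidence flag could work together, the sketch below flags any promise that rests on fewer than a threshold number of distinct sources. The threshold `MIN_SOURCES`, the function name `limited_evidence`, and the example URLs are assumptions made for this example; the actual minimum is not stated on this page.

```python
# Hypothetical threshold; the real minimum source requirement is not given here.
MIN_SOURCES = 3

def limited_evidence(source_urls, min_sources=MIN_SOURCES):
    """Return True when too few distinct sources back a promise's score."""
    # Ignore blank entries and count each URL once.
    distinct = {url.strip() for url in source_urls if url.strip()}
    return len(distinct) < min_sources

# Illustrative inputs only: a well-covered governor and a low-coverage mayor.
governor_sources = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.com/d",
]
mayor_sources = ["https://example.com/local-paper"]

print(limited_evidence(governor_sources))  # False
print(limited_evidence(mayor_sources))     # True
```

A flag like this does not correct the underlying asymmetry; it only tells the reader that a low score may reflect thin coverage rather than poor delivery.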
Their Role Subjectivity
Mitigation: lookup table + published rationale per promise
Assigning a 0.6 vs. 0.8 Their Role score involves judgment even with the lookup table. Inconsistency across AI runs remains possible for situations that don't match an anchor value.
Impact Scores Embed Value Judgments
Mitigation: Magnitude defined in terms of documented population affected
Designating a promise as M3 vs. M2 requires a judgment about significance. The framework defines Magnitude in terms of population affected, but application embeds assumptions about what matters.
High-Impact Failures Penalize More
Intentional design — disclosed on every scorecard
Impact is not modified by delivery: a promise carries the same Impact whether it is kept or broken. As a result, a failed high-impact promise drags down a grade more than a failed low-impact one, and ambitious officials who fail are graded more harshly than cautious ones who deliver.
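A small arithmetic sketch shows the effect. It assumes, purely for illustration, that a grade is an impact-weighted average of delivery scores between 0 and 1. The impact values 1 and 3 and the function name `weighted_grade` are hypothetical and are not the published scoring formula.

```python
def weighted_grade(promises):
    """Impact-weighted mean of delivery; promises is a list of (impact, delivery)."""
    total_impact = sum(impact for impact, _ in promises)
    if total_impact == 0:
        raise ValueError("at least one promise must have nonzero impact")
    return sum(impact * delivery for impact, delivery in promises) / total_impact

# Two records that each contain one failed promise; only the failed
# promise's impact differs.
failed_high_impact = [(3, 0.0), (1, 1.0), (1, 1.0)]
failed_low_impact = [(3, 1.0), (1, 0.0), (1, 1.0)]

print(weighted_grade(failed_high_impact))  # 0.4
print(weighted_grade(failed_low_impact))   # 0.8
```

Under these assumed weights, one failure on the high-impact promise halves the grade relative to one failure on a low-impact promise.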
Non-English-Language Coverage Gap
Mitigation: community-submitted promise nominations accepted
Promises made in small-market radio interviews or at community events in languages other than English may not surface in web searches. This systematically weakens accountability for commitments made to non-English-speaking constituencies.
Long-Horizon Promises Mid-Term
Mitigation: Time Pressure adjustment + prominent term stage display
Officials scored mid-term can appear to underdeliver on long-term promises simply because they have not yet had time to complete them. The Time Pressure adjustment partially addresses this but does not eliminate it.
Model Version Drift
Mitigation: version logging + periodic consistency audits
As Claude is updated, scoring behavior may shift. Without periodic re-scoring audits, grade comparisons across scorecards generated by different model versions may reflect model drift rather than genuine differences in delivery.
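One way a periodic consistency audit could be run is to re-score a fixed sample of promises with each model version and compare the results. The sketch below assumes scores on a 0 to 1 scale keyed by promise ID. The tolerance value, the promise IDs, and the function name `drift_report` are hypothetical.

```python
from statistics import mean

def drift_report(old_scores, new_scores, tolerance=0.1):
    """Compare two scoring runs over the same promises.

    Returns the mean absolute score difference and the IDs of promises
    whose score moved by more than `tolerance`.
    """
    shared = sorted(set(old_scores) & set(new_scores))
    if not shared:
        raise ValueError("the two runs have no promises in common")
    diffs = {pid: abs(new_scores[pid] - old_scores[pid]) for pid in shared}
    flagged = [pid for pid in shared if diffs[pid] > tolerance]
    return mean(list(diffs.values())), flagged

# Illustrative scores from two model versions for the same three promises.
run_old = {"p1": 0.8, "p2": 0.5, "p3": 0.2}
run_new = {"p1": 0.8, "p2": 0.75, "p3": 0.25}

average_shift, flagged = drift_report(run_old, run_new)
print(flagged)  # ['p2']
```

Promises flagged this way would be candidates for human review before grades from the two versions are compared.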
Difficulty Is Coupled to Delivery
Intentional design — disclosed in the scoring framework
Difficulty points are earned in proportion to delivery, not independently. An ambitious promise that is never delivered earns zero difficulty credit, so a failed high-difficulty promise loses points on two axes at once. This is deliberate — it prevents officials from banking credit for promising hard things they never did — but it means Difficulty is not a standalone measure of ambition.
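A minimal sketch of the coupling, assuming difficulty credit is simply the difficulty points multiplied by the fraction delivered. The point values and the name `difficulty_credit` are hypothetical; the framework's actual formula may differ.

```python
def difficulty_credit(difficulty_points, delivery):
    """Difficulty credit earned in proportion to delivery (delivery in [0, 1])."""
    if not 0.0 <= delivery <= 1.0:
        raise ValueError("delivery must be between 0 and 1")
    return difficulty_points * delivery

print(difficulty_credit(10, 0.0))  # 0.0  hard promise, never delivered
print(difficulty_credit(10, 0.5))  # 5.0  hard promise, half delivered
print(difficulty_credit(2, 1.0))   # 2.0  easy promise, fully delivered
```

Under this assumption an undelivered hard promise earns less difficulty credit than a fully delivered easy one, which is the behavior described above.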

Found an Error?

If you find a factual error, a missing promise, or a score you believe is wrong, use the Report an Error link on any scorecard. Every submission is reviewed. Corrections are published in the Error Log.
