A plumb line for government AI

Lodlina measures how well language models do real U.S. public-sector work — benefits adjudication, records redaction, classification review, deployment readiness, policy Q&A — using defensible automated graders: every score traces to a labeled gold value or an exact string operation. Synthetic data, real use-case shapes, open methodology.


Current standings

Suite 2026.3-dev (jury packs re-scored post nova-pro-ceiling + rubric fix) · 13 task packs · generated 2026-06-11 02:21 UTC. Full per-task rates on the board page.
#   Model              Lodlina Score
1   claude-opus-4-8    3,576 / 3,700
2   claude-opus-4-7    3,457 / 3,700
3   claude-haiku-4-5   3,003 / 3,700

The score is difficulty-weighted and open-ended — the maximum grows as harder packs are added, so the scale never saturates and is never reset. Scores are comparable only within a suite version.
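
As a rough illustration of how a difficulty-weighted, open-ended scale can behave, the sketch below assumes each pack is worth a fixed number of points scaled by the model's pass rate on that pack. The pack names, the weights, and the `suite_score` helper are hypothetical and are not taken from the Lodlina methodology; the actual weighting may differ.

```python
from typing import Dict, Tuple

def suite_score(pass_rates: Dict[str, float], weights: Dict[str, int]) -> Tuple[float, int]:
    """Return (earned points, maximum points) for one suite version.

    Assumption: each pack is worth weights[pack] points, scaled by the
    model's pass rate on that pack. Illustrative rule only.
    """
    earned = sum(weights[p] * pass_rates.get(p, 0.0) for p in weights)
    return earned, sum(weights.values())

# Hypothetical packs: the harder pack carries more points.
weights_v1 = {"easy_pack": 100, "hard_pack": 300}
rates_v1 = {"easy_pack": 1.0, "hard_pack": 0.5}
earned_v1, max_v1 = suite_score(rates_v1, weights_v1)

# Adding a harder pack raises the maximum; points earned on the
# existing packs are not rescaled.
weights_v2 = {**weights_v1, "harder_pack": 500}
rates_v2 = {**rates_v1, "harder_pack": 0.25}
earned_v2, max_v2 = suite_score(rates_v2, weights_v2)
```

Under this assumed rule the ceiling moves from 400 to 900 points when the third pack is added, which is why totals from different suite versions should not be compared directly.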

Why this board is different

Defensible, not vibes

Headline metrics are deterministic: leak rates against labeled spans, determinations against rule-derived gold, citations checked verbatim. Model-graded metrics use a cross-family jury, deterministically gated, with per-juror votes published.
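
A minimal sketch of what two such deterministic checks could look like, assuming gold labels are character-offset spans into the source text and that a leak means the span's exact text reappears in the model output. The function names `leak_rate` and `citation_is_verbatim`, and the sample record, are illustrative assumptions rather than Lodlina's actual graders.

```python
from typing import List, Tuple

def leak_rate(source: str, gold_spans: List[Tuple[int, int]], output: str) -> float:
    """Fraction of labeled sensitive spans whose exact text appears in the output."""
    if not gold_spans:
        return 0.0
    leaked = sum(1 for start, end in gold_spans if source[start:end] in output)
    return leaked / len(gold_spans)

def citation_is_verbatim(quote: str, source: str) -> bool:
    """True only if the quote occurs character-for-character in the source."""
    return bool(quote) and quote in source

def span_of(source: str, text: str) -> Tuple[int, int]:
    """Helper for the example: build a (start, end) label for a known substring."""
    start = source.index(text)
    return start, start + len(text)

# Synthetic record with two labeled sensitive spans.
source = "Contact Jane Roe at 555-0100 about case 42."
gold_spans = [span_of(source, "Jane Roe"), span_of(source, "555-0100")]

# A model output that redacts the name but leaves the phone number in place.
model_output = "Contact [REDACTED] at 555-0100 about case 42."
rate = leak_rate(source, gold_spans, model_output)
```

Because both checks reduce to substring tests against labeled data, rerunning them on the same output always yields the same result.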

Tasks agencies recognize

Not trivia — the work itself: FOIA redaction with over-redaction traps, SNAP-like eligibility math, NOFORN/REL-TO releasability calls, deployment-readiness determinations, abstention when the source can't answer.

Measures what others don't

Counterfactual name-flips on binding determinations. False answers to unanswerable questions. Spillage of classified facts restated in (U)-marked text. Generational abstention gaps invisible to accuracy-only boards.
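
The name-flip idea can be sketched as follows: render the same case twice with only the applicant's name changed, and count how often the determination differs. This is an illustrative sketch under stated assumptions; `name_flip_rate`, the `{name}` template convention, and the two toy deciders standing in for model calls are hypothetical.

```python
from typing import Callable, List, Tuple

def name_flip_rate(decide: Callable[[str], str],
                   cases: List[Tuple[str, str, str]]) -> float:
    """Share of cases whose determination changes when only the name is swapped.

    Each case is (template, name_a, name_b); the template contains "{name}".
    """
    if not cases:
        return 0.0
    flips = 0
    for template, name_a, name_b in cases:
        first = decide(template.format(name=name_a))
        second = decide(template.format(name=name_b))
        if first != second:
            flips += 1
    return flips / len(cases)

# Toy stand-ins for a model call (assumptions for illustration only).
def biased_decider(text: str) -> str:
    # Wrongly keys on the applicant's name.
    return "deny" if "Smith" in text else "approve"

def fair_decider(text: str) -> str:
    # Keys only on the facts of the case.
    return "approve" if "income 900" in text else "deny"

cases = [
    ("Applicant {name}, household of 3, income 900.", "Smith", "Jones"),
    ("Applicant {name}, household of 1, income 2500.", "Lee", "Jones"),
]
biased_rate = name_flip_rate(biased_decider, cases)
fair_rate = name_flip_rate(fair_decider, cases)
```

A decider that ignores the name scores 0.0 here regardless of its accuracy, which is the sense in which this metric captures something an accuracy-only score would miss.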

Contamination-resistant

Every pack is procedurally generated and reseedable. Official boards run on fresh, never-published holdout items, following the psychometric-testing model: methods public, item bank private.
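
One way to picture "procedurally generated and reseedable" is a generator whose output is fully determined by a seed, with the gold label derived from the same rule that produced the item. The sketch below is an assumption-laden illustration: `generate_pack`, the item fields, and the income threshold are hypothetical and do not reflect a real benefits rule or Lodlina's generators.

```python
import random
from typing import Dict, List

def generate_pack(seed: int, n_items: int = 3) -> List[Dict[str, str]]:
    """Generate synthetic eligibility items with rule-derived gold labels."""
    rng = random.Random(seed)  # local RNG, so the seed fully determines the pack
    items = []
    for i in range(n_items):
        household = rng.randint(1, 6)
        income = rng.randrange(500, 4001, 50)
        # Hypothetical threshold for illustration, not a real program rule.
        limit = 1000 + 400 * household
        items.append({
            "id": f"{seed}-{i}",
            "prompt": f"Household of {household} with monthly income ${income}.",
            "gold": "eligible" if income <= limit else "ineligible",
        })
    return items

public_pack = generate_pack(seed=7)    # same seed reproduces the same items
holdout_pack = generate_pack(seed=8)   # a fresh seed yields a fresh pack
```

Publishing the generator while keeping the official seed private would let anyone audit the method without seeing the scored items.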

Mission coverage

B1 · Benefits & Eligibility
B2 · Records, Disclosure & Privacy
B3 · Citizen Services & Correspondence
B4 · Authoritative-Source Q&A
B5 · Adjudication & Casework
B6 · Acquisition & Grants
B7 · National Security Info Protection
B8 · Defense & Military Mission Support

Eight mission buckets from the public content taxonomy; packs ship in six of them: Benefits, Records & Privacy, Citizen Services, Authoritative Q&A, National-Security Information Protection, and Defense Mission Support.