Goodhart’s Law in LLM Evaluation

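“When a measure becomes a target, it ceases to be a good measure.” This is the popular phrasing of Goodhart’s law, usually credited to Marilyn Strathern (1997); Charles Goodhart’s original 1975 observation was about monetary policy.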

In LLM evaluation: if you repeatedly tune a prompt to maximize a specific eval score, the prompt ends up fitted to the quirks of that eval rather than to the underlying task. The score keeps rising, but it stops reflecting real quality.

How it manifests

  1. You write 50 eval cases and iteratively improve your prompt against them
  2. By iteration 10, the prompt scores 95% — but only on those 50 cases
  3. On unseen inputs, quality may be no better, or even worse, because the prompt has over-specialized to those 50 cases
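
The pattern in the steps above can be made visible by scoring the same prompt on the cases it was tuned against and on cases it has never seen. The following is a minimal sketch, not a reference implementation: `pass_rate`, `generalization_gap`, the `(input, expected)` case shape, and the exact-match scoring are all assumptions made for illustration, and `run_prompt` stands in for whatever function calls your model with the prompt under test.

```python
from typing import Callable, Sequence

# Hypothetical case shape: (input text, expected output).
EvalCase = tuple[str, str]


def pass_rate(run_prompt: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Fraction of cases whose output exactly matches the expected output."""
    if not cases:
        raise ValueError("cases must not be empty")
    passed = sum(1 for text, expected in cases if run_prompt(text) == expected)
    return passed / len(cases)


def generalization_gap(
    run_prompt: Callable[[str], str],
    tuned_cases: Sequence[EvalCase],
    unseen_cases: Sequence[EvalCase],
) -> float:
    """Score on the tuned cases minus score on unseen cases.

    A large positive gap suggests the prompt was fitted to the eval set.
    """
    return pass_rate(run_prompt, tuned_cases) - pass_rate(run_prompt, unseen_cases)


# Toy stand-in for an over-specialized prompt: it only "knows" the tuned cases.
tuned = [("2+2", "4"), ("3+3", "6")]
unseen = [("4+4", "8"), ("5+5", "10")]
memorized = dict(tuned)


def overfit_prompt(text: str) -> str:
    return memorized.get(text, "?")


gap = generalization_gap(overfit_prompt, tuned, unseen)
print(f"gap between tuned and unseen pass rate: {gap:.2f}")
```

In practice the scoring function would rarely be exact match, but the comparison is the same: a score that is high on the tuned cases and much lower elsewhere is the symptom described above.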

Mitigations

  • Frozen test split: hold out a portion of the eval set that is never used during iteration, and score it only as a final check
  • Regular refresh: periodically add new examples drawn from production traffic
  • Multiple eval dimensions: a prompt is harder to over-fit when you measure accuracy AND conciseness AND safety simultaneously
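
As a rough illustration of the first and third mitigations, the sketch below assigns each case to a split from a hash of its ID, so a case never moves between splits when new examples are added, and it accepts a prompt only if every dimension meets its own threshold. The names `split_for` and `passes_all`, the 20% frozen fraction, and the threshold values are assumptions chosen for this example, not an established convention.

```python
import hashlib


def split_for(case_id: str, frozen_fraction: float = 0.2) -> str:
    """Return 'frozen' or 'dev' for a case, based only on a hash of its ID.

    Because the result depends on the ID alone, adding or removing other
    cases never changes which split an existing case belongs to.
    """
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "frozen" if bucket < frozen_fraction else "dev"


def passes_all(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """True only if every tracked dimension meets its own minimum.

    A dimension missing from `scores` counts as a failure.
    """
    return all(
        name in scores and scores[name] >= minimum
        for name, minimum in thresholds.items()
    )


# Hypothetical thresholds and scores, for illustration only.
thresholds = {"accuracy": 0.90, "conciseness": 0.80, "safety": 0.99}
scores = {"accuracy": 0.95, "conciseness": 0.50, "safety": 1.00}

# High accuracy alone is not enough: conciseness is below its minimum.
print(passes_all(scores, thresholds))
```

Gating on each dimension separately, instead of on an average, is a deliberate choice here: an average would let a gain on one metric hide a regression on another.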

See also