Goodhart’s Law in LLM Evaluation
“When a measure becomes a target, it ceases to be a good measure.” This popular wording is Marilyn Strathern's 1997 paraphrase of an observation Charles Goodhart made in 1975 about monetary policy.
In LLM evaluation: if you repeatedly tune a prompt to maximize a specific eval score, the prompt overfits to that eval's cases and scoring rules instead of improving in general. The score then stops reflecting real quality.
How it manifests
- You write 50 eval cases and iteratively improve your prompt against them
- By iteration 10, the prompt scores 95% — but only on those 50 cases
- On unseen inputs, quality may have degraded because the prompt over-specialized
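Part of this effect can be reproduced with a toy simulation. The sketch below is an illustration under stated assumptions, not a model of any real eval: it invents 10 prompt variants that all have the same true pass rate (0.8), scores each on 50 cases, and keeps the best scorer. All names and numbers (`simulate_selection`, `true_quality`, `n_fresh`) are hypothetical. It shows only the selection effect, where the winning score is inflated by noise; it does not model a prompt actually getting worse on unseen inputs.

```python
import random


def simulate_selection(n_variants=10, n_cases=50, true_quality=0.8,
                       n_fresh=5000, seed=0):
    """Pick the best of several equally good variants on a small eval set.

    Assumption: every variant passes each case independently with
    probability `true_quality`, so any difference in dev scores is noise.
    """
    rng = random.Random(seed)

    def pass_rate(n):
        # Fraction of n simulated cases that pass.
        return sum(rng.random() < true_quality for _ in range(n)) / n

    # Score every variant on the same-sized dev set and keep the winner.
    dev_scores = [pass_rate(n_cases) for _ in range(n_variants)]
    best_dev = max(dev_scores)

    # The winner is no better than the rest, so on fresh inputs it
    # simply performs at the true pass rate (up to sampling noise).
    fresh_score = pass_rate(n_fresh)
    return dev_scores, best_dev, fresh_score


dev_scores, best_dev, fresh_score = simulate_selection()
print(f"best score on the 50 dev cases: {best_dev:.2f}")
print(f"same variant on fresh inputs:   {fresh_score:.2f}")
```

The more variants you try against the same small set, the larger the gap between the winning dev score and the fresh-input score tends to be.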
Mitigations
- Frozen test split — a portion of the eval set is never used for iteration
- Regular refresh — add new production examples periodically
- Multiple eval dimensions — harder to game if you’re measuring accuracy AND conciseness AND safety simultaneously
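One possible way to keep a frozen split stable while the eval set is refreshed is to assign each case to a split by hashing its ID, so the assignment never depends on list order or on which cases are added later. The following is a minimal sketch under that assumption; the function names, the `case-000` style IDs, and the 20% frozen fraction are all hypothetical choices, not something the note prescribes.

```python
import hashlib


def split_of(case_id: str, frozen_fraction: float = 0.2) -> str:
    """Deterministically assign an eval case to 'dev' or 'frozen'.

    The assignment depends only on the case ID, so adding new cases
    later never moves an existing case between splits.
    """
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")     # integer in [0, 2**64)
    threshold = int(frozen_fraction * 2**64)
    return "frozen" if bucket < threshold else "dev"


def partition(case_ids, frozen_fraction: float = 0.2):
    """Group case IDs into the two splits."""
    splits = {"dev": [], "frozen": []}
    for case_id in case_ids:
        splits[split_of(case_id, frozen_fraction)].append(case_id)
    return splits


# Hypothetical IDs for an initial set of 50 cases.
case_ids = [f"case-{i:03d}" for i in range(50)]
splits = partition(case_ids)

# A later refresh adds production examples; earlier assignments are unchanged.
refreshed = partition(case_ids + [f"prod-{i:03d}" for i in range(20)])

print(len(splits["dev"]), "dev cases,", len(splits["frozen"]), "frozen cases")
```

Iterate only against `splits["dev"]` and score the frozen cases rarely, for example before shipping a prompt change. Looking at frozen results after every iteration turns them into a target too. With a small set, the hash will not produce an exact 80/20 division, which is usually acceptable for this purpose.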
See also
- Prompt Evaluation — where this failure mode appears in practice