Goodhart’s Law in LLM Evaluation

“When a measure becomes a target, it ceases to be a good measure.” (Marilyn Strathern’s 1997 paraphrase of Charles Goodhart, who stated the idea in 1975 about monetary policy)

In LLM evaluation: if you repeatedly tune a prompt to maximize one specific eval score, the prompt ends up fitting the quirks of that eval set rather than genuinely improving, and the score stops reflecting real quality.

How it manifests

You write 50 eval cases and iteratively improve your prompt against them
By iteration 10, the prompt scores 95%, but only on those 50 cases
On unseen inputs, quality may not have improved at all, or may even have degraded, because the prompt over-specialized to those 50 cases
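A toy simulation makes the over-specialization concrete. It is a sketch under deliberately simplified, hypothetical assumptions: every candidate prompt has the same true pass rate, eval cases pass independently, and all names and numbers are made up for illustration. Picking the best of 30 prompt variants on 50 cases reports a score above the true rate, while the same winner re-tested on fresh cases falls back to it. It isolates only the selection-noise part of the effect and does not model a prompt actually getting worse.

```python
import random
import statistics

# Toy assumptions (hypothetical, for illustration only):
# every candidate prompt has the SAME true pass rate, and each
# eval case passes independently with that probability.
TRUE_PASS_RATE = 0.70
N_DEV = 50          # cases you iterate against
N_HELDOUT = 1000    # unseen inputs the prompt was never tuned on
N_CANDIDATES = 30   # prompt variants tried while tuning


def run_eval(pass_rate: float, n_cases: int, rng: random.Random) -> float:
    """Fraction of n_cases that pass, each with probability pass_rate."""
    return sum(rng.random() < pass_rate for _ in range(n_cases)) / n_cases


def tune_once(rng: random.Random) -> tuple[float, float]:
    """Pick the candidate with the best dev score, then re-test it on held-out cases."""
    dev_scores = [run_eval(TRUE_PASS_RATE, N_DEV, rng) for _ in range(N_CANDIDATES)]
    winner_dev = max(dev_scores)
    # The winner is no better than any other candidate, so its held-out
    # score is drawn from the same true pass rate.
    winner_heldout = run_eval(TRUE_PASS_RATE, N_HELDOUT, rng)
    return winner_dev, winner_heldout


def simulate(trials: int = 200, seed: int = 0) -> tuple[float, float]:
    """Average the winner's dev and held-out scores over many tuning runs."""
    rng = random.Random(seed)
    results = [tune_once(rng) for _ in range(trials)]
    mean_dev = statistics.mean(d for d, _ in results)
    mean_heldout = statistics.mean(h for _, h in results)
    return mean_dev, mean_heldout


if __name__ == "__main__":
    dev, heldout = simulate()
    print(f"winner on dev set:  {dev:.2f}")
    print(f"winner on held-out: {heldout:.2f}")
```

With these settings the winner's dev score sits well above 0.70 while its held-out score stays close to 0.70. The gap is pure selection effect, since no candidate is actually better than any other.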

Mitigations

Frozen test split — a portion of the eval set is never used for iteration
Regular refresh — add new production examples periodically
Multiple eval dimensions — harder to game if you’re measuring accuracy AND conciseness AND safety simultaneously
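The three mitigations can be combined in a few lines. The sketch below is one possible approach, not a prescribed one. The 30% frozen fraction, the per-dimension floors, the tolerance, and every function name are hypothetical. Hashing each case ID gives a frozen split that stays fixed as new production examples are added. A multi-dimension gate then rejects a prompt change that buys accuracy at the cost of conciseness or safety.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical share of cases reserved for final checks only (never iterated on).
FROZEN_FRACTION = 0.3


def is_frozen(case_id: str, fraction: float = FROZEN_FRACTION) -> bool:
    """Deterministically place a case in the frozen split by hashing its ID.

    The same ID always maps to the same split, so refreshing the eval set
    with new production examples never moves an existing case.
    """
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform between 0 and 1
    return bucket < fraction


def split(case_ids: list[str]) -> tuple[list[str], list[str]]:
    """Return (iteration_set, frozen_set)."""
    iteration = [c for c in case_ids if not is_frozen(c)]
    frozen = [c for c in case_ids if is_frozen(c)]
    return iteration, frozen


@dataclass(frozen=True)
class Scores:
    accuracy: float
    conciseness: float
    safety: float


# Hypothetical minimum acceptable value for each dimension.
FLOORS = Scores(accuracy=0.85, conciseness=0.70, safety=0.99)
DIMENSIONS = ("accuracy", "conciseness", "safety")


def passes_gate(candidate: Scores, baseline: Scores,
                floors: Scores = FLOORS, tolerance: float = 0.01) -> bool:
    """Accept a prompt change only if every dimension clears its floor
    and none regresses by more than `tolerance` versus the baseline."""
    for dim in DIMENSIONS:
        new = getattr(candidate, dim)
        if new < getattr(floors, dim) or new < getattr(baseline, dim) - tolerance:
            return False
    return True


if __name__ == "__main__":
    ids = [f"case-{i}" for i in range(100)]
    iteration, frozen = split(ids)
    print(len(iteration), "iteration cases,", len(frozen), "frozen cases")

    baseline = Scores(accuracy=0.86, conciseness=0.75, safety=0.995)
    # Higher accuracy bought with much wordier answers: rejected.
    print(passes_gate(Scores(0.92, 0.60, 0.995), baseline))  # False
    # Modest gains with no regression: accepted.
    print(passes_gate(Scores(0.90, 0.76, 0.995), baseline))  # True
```

Hash-based assignment is used here instead of a random shuffle so that a refresh never moves an existing case from the frozen split into the iteration set, which would quietly leak it into tuning.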
