Cosine Similarity
A measure of how similar two vectors are in direction, regardless of their magnitude. It is the cosine of the angle between them, computed as their dot product divided by the product of their lengths, and ranges from -1.0 to 1.0:
- 1.0 = vectors point in the same direction (identical meaning in embedding space)
- 0.0 = vectors are perpendicular (unrelated)
- -1.0 = vectors point in opposite directions (rare in practice: embedding components can be negative, but scores between real text embeddings seldom approach -1.0)
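The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `cosine_similarity` and the example vectors are assumptions chosen for this note, and a real pipeline would more likely use a vectorized library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined when either vector has zero length
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative vectors (made up for this example)
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # same direction, different magnitude
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 3.0])     # 90 degrees apart
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])        # 180 degrees apart
```

Up to floating-point rounding, the three results correspond to the 1.0, 0.0, and -1.0 cases listed above. Note that doubling the length of a vector does not change its score, which is what "regardless of magnitude" means in practice.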
In NLP and LLM evaluation, cosine similarity is used to compare Embeddings — if two texts have high cosine similarity between their embedding vectors, they are considered semantically similar.
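As a rough sketch of how such a comparison might be turned into a pass/fail check, the snippet below applies a threshold to the similarity score. Everything specific here is an assumption for illustration: the helper name `is_semantically_similar`, the 0.8 threshold, and the three-dimensional vectors, which stand in for embeddings that a real model would produce with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_semantically_similar(emb_a, emb_b, threshold=0.8):
    """Hypothetical eval check: True when the score meets the threshold.

    The default of 0.8 is an arbitrary illustrative value, not a recommendation.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold

# Made-up vectors standing in for real embedding model output
reference = [0.9, 0.1, 0.3]
paraphrase = [0.8, 0.2, 0.35]
unrelated = [-0.1, 0.9, -0.2]

paraphrase_passes = is_semantically_similar(reference, paraphrase)
unrelated_passes = is_semantically_similar(reference, unrelated)
```

With these toy vectors the paraphrase passes and the unrelated vector does not. A suitable threshold depends on the embedding model, so it would need to be calibrated per model rather than reused.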
Limitations
- Cannot distinguish between “similar topic” and “same meaning” — two sentences about the same subject but making opposite claims may score high
- Sensitive to the embedding model used — different models produce different similarity scores for the same pair
- Poor at detecting numerical differences (“$100” vs “$10,000” can still score as highly similar)
See also
- Embeddings — the vectors being compared
- Prompt Evaluation — where cosine similarity is used as an eval assertion