Cosine Similarity

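A measure of how similar two vectors are in direction, regardless of their magnitude. It is computed as the cosine of the angle θ between vectors A and B, which equals their dot product divided by the product of their lengths: cos θ = (A · B) / (‖A‖ ‖B‖). Scores range from -1 to 1: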

  • 1.0 = vectors point in the same direction (maximal similarity in embedding space, though not a guarantee of identical meaning)
  • 0.0 = vectors are perpendicular (unrelated)
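  • -1.0 = vectors point in opposite directions (uncommon with many text embedding models, whose vectors tend to occupy a narrow region of the space; impossible with non-negative representations such as TF-IDF vectors, where scores fall between 0 and 1)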
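
A minimal pure-Python sketch of the formula: the dot product of the two vectors divided by the product of their lengths. The function name cosine_similarity is chosen for illustration, and rejecting zero vectors is a design choice of this sketch, since the angle is undefined when one vector has no length. The three calls correspond to the three cases listed above; the first also shows that scaling a vector does not change the score.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle (and so the cosine) is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Same direction, different lengths: the score is (up to rounding) 1.0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Perpendicular vectors.
print(cosine_similarity([1, 0], [0, 1]))
# Opposite directions.
print(cosine_similarity([1, 2], [-1, -2]))
```

Floating-point rounding can make the first result differ from 1.0 in the last few digits, so comparisons against exact values should use a tolerance (for example math.isclose).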

In NLP and LLM evaluation, cosine similarity is used to compare Embeddings — if two texts have high cosine similarity between their embedding vectors, they are considered semantically similar.
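
A common usage pattern is to embed a query and several candidate texts, then rank the candidates by cosine similarity to the query. The sketch below assumes NumPy and uses made-up 4-dimensional vectors in place of real model output; the names query, documents, and normalize are hypothetical. Normalizing every vector to unit length first reduces cosine similarity to a plain dot product, which is a common trick when many vectors are compared against the same query.

```python
import numpy as np

# Hypothetical 4-dimensional "embeddings" invented for illustration;
# real embedding models output hundreds or thousands of dimensions.
query = np.array([0.9, 0.1, 0.3, 0.0])
documents = {
    "doc_a": np.array([0.8, 0.2, 0.4, 0.1]),
    "doc_b": np.array([0.1, 0.9, 0.0, 0.4]),
    "doc_c": np.array([0.7, 0.0, 0.5, 0.0]),
}


def normalize(v: np.ndarray) -> np.ndarray:
    """Scale v to unit length so that a dot product equals cosine similarity."""
    return v / np.linalg.norm(v)


q = normalize(query)
scores = {name: float(np.dot(q, normalize(vec))) for name, vec in documents.items()}

# Rank documents from most to least similar to the query.
ranking = sorted(scores, key=lambda name: scores[name], reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

With these toy vectors, doc_a and doc_c point in roughly the same direction as the query and rank above doc_b, whose weight sits mostly in other dimensions.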

Limitations

  • Cannot distinguish between “similar topic” and “same meaning” — two sentences about the same subject but making opposite claims may score high
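  • Sensitive to the embedding model used: different models produce different scores for the same pair, so scores and similarity thresholds are not directly comparable across models
  • Poor at detecting numerical differences (“$100” and “$10,000” in otherwise identical sentences can score as highly similar)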
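
The first and third limitations can be illustrated without an embedding model. The sketch below swaps in raw word-count vectors (a bag-of-words representation, not a neural embedding; bow_cosine is a hypothetical helper) to show the underlying mechanism: when two texts share most of their components, the score stays high even if the one differing component carries the decisive difference. Real embedding models produce different numbers, and how well they separate such pairs depends on the model.

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between whitespace-token count vectors.

    A crude bag-of-words stand-in for an embedding model, for illustration only.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Counter returns 0 for missing tokens, so shared tokens drive the dot product.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


# Opposite claims about the same subject share most of their tokens.
print(bow_cosine("the drug is safe", "the drug is not safe"))
# Very different amounts differ in a single token.
print(bow_cosine("the ticket costs $100", "the ticket costs $10,000"))
```

Under this representation the first pair scores about 0.89 and the second exactly 0.75, even though each pair differs in the one detail that matters.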

See also