225. Embedding collapse detection

medium
General
senior

Detect embedding collapse in an embeddings-and-retrieval pipeline, where many items unintentionally map to nearly the same vector and retrieval quality degrades. You'll compute a simple "collapse score" based on the average cosine similarity across embedding pairs.

The cosine similarity between two vectors u and v is:

$$\cos(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$
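As a quick illustration of the formula, here is a minimal sketch for a single pair of vectors. The helper name `cosine_similarity` is hypothetical (it is not part of the problem statement), and the sketch assumes neither vector has zero norm.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity of two 1-D vectors (hypothetical helper).

    Assumes both vectors have non-zero norm; a zero vector would
    cause a division by zero here.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Dot product divided by the product of the L2 norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Parallel vectors score 1, orthogonal vectors score 0, and opposite vectors score -1, so a pipeline whose pairs all score close to 1 is a candidate for collapse.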

Requirements

Implement the function

python

Rules:

  • Convert the input to a NumPy array if necessary and L2-normalize each embedding vector.
  • Compute the average cosine similarity over all unique pairs (i, j) where i < j.
  • Return the average similarity and a boolean is_collapsed based on threshold.
  • Don't use any prebuilt similarity utilities (e.g., sklearn.metrics.pairwise).
  • Keep it in a single Python function using only NumPy (and Python built-ins if needed).
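The rules above could be put together roughly as follows. This is a sketch under several assumptions that the problem text does not confirm: the function name `detect_embedding_collapse`, the argument order, the strict `>` comparison against `threshold`, returning `(0.0, False)` when there are fewer than two embeddings, and the small `eps` guard against zero-norm rows.

```python
import numpy as np

def detect_embedding_collapse(embeddings, threshold):
    """Return (average pairwise cosine similarity, is_collapsed).

    Name, argument order, and edge-case behavior are assumptions.
    """
    X = np.asarray(embeddings, dtype=float)
    if X.ndim != 2:
        raise ValueError("embeddings must be a 2-D array of shape (n, d)")

    n = X.shape[0]
    if n < 2:
        # Assumption: with no pairs, report 0.0 and "not collapsed".
        return 0.0, False

    # L2-normalize each row; eps avoids division by zero (assumption).
    eps = 1e-12
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, eps)

    # After normalization, dot products are cosine similarities.
    S = X @ X.T

    # Keep only unique pairs (i, j) with i < j.
    rows, cols = np.triu_indices(n, k=1)
    avg = float(S[rows, cols].mean())

    # Assumption: "collapsed" means strictly above the threshold.
    return avg, bool(avg > threshold)
```

For instance, three identical embeddings give an average similarity of 1.0, which any threshold below 1.0 would flag as collapsed under these assumptions.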

Example

python

Output:

python
Input Signature

Argument      Type
threshold     float
embeddings    np.ndarray

Output Signature

Return Name   Type
value         tuple

Constraints

  • Use NumPy only; no sklearn similarities

  • L2-normalize each embedding row

  • Average i<j cosine pairs only

  • Input is np.ndarray

Hint 1

Normalize first. Ensure embeddings is a 2D NumPy array and L2-normalize each row so cosine similarity becomes a dot product.
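A minimal sketch of the normalization step might look like this. The helper name `l2_normalize_rows` and the `eps` guard for zero-norm rows are assumptions added for illustration; the problem does not say how zero vectors should be handled.

```python
import numpy as np

def l2_normalize_rows(embeddings, eps=1e-12):
    """Scale each row to unit L2 norm (hypothetical helper)."""
    # atleast_2d lets a single vector be treated as one row.
    X = np.atleast_2d(np.asarray(embeddings, dtype=float))
    # keepdims=True gives shape (n, 1) so the division broadcasts per row.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Assumption: clamp tiny norms so zero rows stay zero instead of NaN.
    return X / np.maximum(norms, eps)
```

Once every row has unit norm, the denominator in the cosine formula is 1, so the similarity of two rows is just their dot product.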

Hint 2

Vectorize pairwise cosine. After normalization, compute S = X @ X.T to get all pair cosine similarities at once.
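To make the matrix product concrete, the following sketch builds `S` for three small, made-up vectors. The example data is purely illustrative and assumes no zero-norm rows.

```python
import numpy as np

# Illustrative data: three 2-D embeddings, none with zero norm.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Normalize rows so that dot products equal cosine similarities.
X = X / np.linalg.norm(X, axis=1, keepdims=True)

# S[i, j] is the cosine similarity between embedding i and embedding j.
S = X @ X.T
```

The resulting matrix is symmetric with ones on the diagonal (each vector compared with itself), which is why only the entries above the diagonal carry distinct pair information.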

Hint 3

Average only unique pairs. Use np.triu_indices(n, k=1) to select i < j, then take mean; handle n < 2 (no pairs) explicitly.
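The pair selection could be sketched as below. The helper name `mean_unique_pairs` is hypothetical, and returning 0.0 when there are no pairs is an assumption, since the hint only says to handle that case explicitly.

```python
import numpy as np

def mean_unique_pairs(S):
    """Mean of the strictly upper-triangular entries of a square matrix."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    if n < 2:
        # Assumption: no pairs exist, so return 0.0 rather than NaN.
        return 0.0
    # k=1 skips the diagonal, leaving exactly the pairs with i < j.
    rows, cols = np.triu_indices(n, k=1)
    return float(S[rows, cols].mean())
```

For n embeddings this averages n(n-1)/2 values, so the diagonal self-similarities of 1 and the mirrored lower triangle do not inflate or double-count the score.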

Roles
ML Engineer
AI Engineer
Companies
General
Levels
senior
entry
Tags
cosine-similarity
numpy-vectorization
pairwise-metrics
embedding-quality