Early stopping can save compute by stopping a similarity search once you’ve already found “good enough” matches for a query embedding. In this question, you’ll implement a simple early-stopping rule for cosine-similarity retrieval over a list of candidate embeddings.
The cosine similarity between a query vector $q$ and a candidate vector $c$ is defined as:

$$\cos(q, c) = \frac{q \cdot c}{\lVert q \rVert \, \lVert c \rVert}$$
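As a quick worked example of this formula, take the illustrative vectors $q = (1, 2, 2)$ and $c = (2, 1, 2)$ (chosen here only because their norms are whole numbers):

$$q \cdot c = 1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 = 8, \qquad \lVert q \rVert = \sqrt{1 + 4 + 4} = 3, \qquad \lVert c \rVert = \sqrt{4 + 1 + 4} = 3$$

$$\cos(q, c) = \frac{8}{3 \cdot 3} = \frac{8}{9} \approx 0.889$$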
Implement the function whose arguments and return value are listed in the tables below.
Rules:
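- Compute the cosine similarity between `query` and each candidate in the given order.
- Keep track of the best `top_k` results (indices + scores) as you scan.
- Stop early once the best similarity seen so far has been `>= threshold` for `patience` consecutive candidate checks.
- Return the `top_k` matches found up to the point you stop (or after scanning all candidates).

Output: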
| Argument | Type |
|---|---|
| query | np.ndarray |
| top_k | int |
| patience | int |
| threshold | float |
| candidates | np.ndarray |

| Return Name | Type |
|---|---|
| value | np.ndarray |
- Do not use FAISS or scikit-learn nearest-neighbor utilities; compute cosine similarity manually.
- Return a list of `(index, similarity)` pairs sorted by similarity in descending order.
- Handle zero-norm vectors to avoid division by zero.
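The rules and constraints above can be combined into one sequential scan. The sketch below is one possible implementation, not a reference solution. The name `early_stop_top_k` is hypothetical, since the original function name is not given. Two conventions are assumptions: a zero-norm query or candidate gets similarity `0.0`, and `patience < 1` disables early stopping. It returns a Python list of `(index, similarity)` tuples, following the constraint list; the return table lists `np.ndarray`, so a conversion may be needed depending on what the grader expects.

```python
import numpy as np


def early_stop_top_k(query, candidates, top_k, patience, threshold):
    """Hypothetical name. Scan candidates in order, keep the top_k
    (index, sim) pairs sorted by sim descending, and stop once the best
    similarity seen so far has been >= threshold for `patience`
    consecutive candidate checks."""
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    q_norm = np.linalg.norm(query)  # computed once, reused for every candidate

    top = []  # (index, sim) pairs, sorted by sim descending
    best_sim = float("-inf")
    streak = 0

    for idx, cand in enumerate(candidates):
        c_norm = np.linalg.norm(cand)
        # Assumed convention: similarity is 0.0 when either vector has zero norm.
        if q_norm == 0.0 or c_norm == 0.0:
            sim = 0.0
        else:
            sim = float(np.dot(query, cand) / (q_norm * c_norm))

        # Insert only if there is room, or the new sim strictly beats the worst kept.
        if top_k > 0 and (len(top) < top_k or sim > top[-1][1]):
            if len(top) == top_k:
                top.pop()  # drop the current worst to make room
            pos = 0
            # Place after entries with sim >= new sim (earlier index wins ties).
            while pos < len(top) and top[pos][1] >= sim:
                pos += 1
            top.insert(pos, (idx, sim))

        # Early-stop rule driven by the best similarity seen so far.
        best_sim = max(best_sim, sim)
        streak = streak + 1 if best_sim >= threshold else 0
        if patience >= 1 and streak == patience:
            break

    return top


cands = [[0, 1], [1, 1], [1, 0], [0, 1], [1, 0]]
print(early_stop_top_k([1.0, 0.0], cands, top_k=2, patience=2, threshold=0.99))
# stops after checking index 3, so it keeps indices 2 and 1 (index 4 is never seen)
```

With an unreachable threshold the same call scans everything and keeps indices 2 and 4 instead, which shows how early stopping trades recall for compute.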
Precompute the query norm once; the cosine similarity for each candidate is then `dot(query, cand) / (||query|| * ||cand||)`.
While scanning candidates in order, maintain a top_k list of (index, sim) sorted descending; only insert/replace when the new sim beats the current worst in top_k.
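The hint keeps a sorted list, where each insertion costs O(top_k). A common alternative, swapped in here and not described by the hint, is a min-heap of size `top_k`: the worst kept match always sits at `heap[0]`, so the "beats the current worst" check is O(1) and each replacement is O(log top_k). The helper name `top_k_with_heap` is hypothetical, and it takes already-scored `(index, sim)` pairs in scan order.

```python
import heapq


def top_k_with_heap(scored, k):
    """Hypothetical helper: keep the k highest-similarity (index, sim) pairs
    from a stream of (index, sim) pairs using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # entries are (sim, index); the smallest sim is at heap[0]
    for idx, sim in scored:
        if len(heap) < k:
            heapq.heappush(heap, (sim, idx))
        elif sim > heap[0][0]:  # strictly beats the current worst kept match
            heapq.heapreplace(heap, (sim, idx))  # pop worst, push new, O(log k)
    # Back to (index, sim), sorted by sim descending, earlier index first on ties.
    return [(idx, sim) for sim, idx in sorted(heap, key=lambda e: (-e[0], e[1]))]


print(top_k_with_heap([(0, 0.5), (1, 0.9), (2, 0.1), (3, 0.7)], k=2))
# [(1, 0.9), (3, 0.7)]
```

For small `top_k` the sorted list is usually just as fast in practice; the heap matters mainly when `top_k` is large.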
Early-stop uses the best similarity seen so far, not the current candidate’s similarity: keep best_sim and a streak counter; after each candidate, if best_sim >= threshold increment streak else reset; stop when streak == patience.
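The stopping rule can be traced on its own, given per-candidate similarities in scan order. This sketch uses a hypothetical helper, `early_stop_index`, that returns the index of the last candidate checked before stopping, or `None` if the rule never fires; treating `patience < 1` as "never stop early" is an assumption.

```python
def early_stop_index(sims, threshold, patience):
    """Hypothetical helper: index of the candidate after which the scan stops,
    or None if the early-stop rule never fires."""
    if patience < 1:
        return None  # assumption: patience < 1 disables early stopping
    best_sim = float("-inf")
    streak = 0
    for i, sim in enumerate(sims):
        best_sim = max(best_sim, sim)  # best similarity seen so far
        streak = streak + 1 if best_sim >= threshold else 0
        if streak == patience:
            return i
    return None


print(early_stop_index([0.2, 0.5, 0.95, 0.1, 0.3, 0.4], threshold=0.9, patience=3))  # 4
```

Because `best_sim` never decreases, the streak cannot reset once it first reaches `threshold`. The scan therefore stops exactly `patience - 1` candidates after the first candidate whose similarity reaches the threshold (index 2 + 3 - 1 = 4 in the trace above).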