Join Our 5-Week ML/AI Engineer Interview Bootcamp 🚀 led by ML Tech Leads at FAANGs

Back to Questions

81. Early stopping retrieval

medium
GeneralGeneral
staff

Early stopping can save compute by stopping a similarity search once you’ve already found “good enough” matches for a query embedding. In this question, you’ll implement a simple early-stopping rule for cosine-similarity retrieval over a list of candidate embeddings.

The cosine similarity between a query vector q\mathbf{q} and a candidate vector c\mathbf{c} is defined as:

sim(q,c)=qcqc\text{sim}(\mathbf{q}, \mathbf{c}) = \frac{\mathbf{q} \cdot \mathbf{c}}{\|\mathbf{q}\| \|\mathbf{c}\|}

Requirements

Implement the function

python

Cosine similarity is:

cos(q,x)=qxqx\cos(q, x) = \frac{q \cdot x}{\|q\|\|x\|}

Rules:

  • Compute cosine similarity between query and each candidate in the given order.
  • Maintain the current top_k results (indices + scores) as you scan.
  • Early stop when the current best similarity seen so far is >= threshold for patience consecutive candidate checks.
  • Return the best top_k matches found up to the point you stop (or after scanning all candidates).
  • Do not use any prebuilt nearest-neighbor or retrieval utilities (e.g., FAISS, sklearn neighbors).

Example

python

Output:

python
Input Signature
ArgumentType
querynp.ndarray
top_kint
patienceint
thresholdfloat
candidatesnp.ndarray
Output Signature
Return NameType
valuenp.ndarray

Constraints

  • No FAISS/sklearn neighbors; manual cosine similarity

  • Return list of (index, similarity) sorted desc

  • Handle zero-norm vectors to avoid division by zero

Hint 1

Precompute the query norm once; cosine similarity for each candidate is (dot(query,cand)) / (||query|| * ||cand||).

Hint 2

While scanning candidates in order, maintain a top_k list of (index, sim) sorted descending; only insert/replace when the new sim beats the current worst in top_k.

Hint 3

Early-stop uses the best similarity seen so far, not the current candidate’s similarity: keep best_sim and a streak counter; after each candidate, if best_sim >= threshold increment streak else reset; stop when streak == patience.

Roles
ML Engineer
AI Engineer
Companies
GeneralGeneral
Levels
staff
senior
entry
Tags
cosine-similarity
top-k selection
early stopping
vector retrieval
20 people are solving this problem
Python LogoPython Editor
Ln 1, Col 1

Input Arguments

Edit values below to test with custom inputs

You need tolog in/sign upto run or submit