

138. Batched query retrieval

Difficulty: medium
Company: General
Level: staff

Implement batched query retrieval to find, for each query, the top-k most similar items (by cosine similarity) in a small embedding index. You’ll take queries and documents as NumPy arrays and return, for every query, the indices of the best matches.

The cosine similarity between vectors q and d is:

\cos(q, d) = \frac{q \cdot d}{\|q\|_2 \, \|d\|_2}

Requirements

Implement a function that takes queries and docs as NumPy arrays and k as an integer, and returns a NumPy array of shape (num_queries, k) containing document indices (see the signature tables below).

Rules:

  • Compute cosine similarities between every query and every document (batched).
  • Return only the document indices, not the similarity scores.
  • Break ties by returning smaller document indices first.
  • Don’t use any prebuilt retrieval or similarity search libraries (e.g., FAISS, sklearn neighbors).
  • Use NumPy operations for the main computation (avoid Python double for-loops over queries×docs).
  • Return a NumPy array.

Example

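The original example values are not shown here, so the inputs below are illustrative ones I chose myself; the expected result follows from the cosine-similarity definition above.

```python
import numpy as np

# Assumed inputs (illustrative; not the original example values)
queries = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
docs = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
k = 2

# Cosine similarities (rounded):
#   query 0 vs docs -> [1.0, 0.0, 0.7071]  => top-2 indices [0, 2]
#   query 1 vs docs -> [0.0, 1.0, 0.7071]  => top-2 indices [1, 2]
# Expected return value: np.array([[0, 2], [1, 2]])
```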
Input Signature

  • queries: np.ndarray
  • docs: np.ndarray
  • k: int

Output Signature

  • value: np.ndarray

Constraints

  • Use NumPy; avoid nested query×doc loops.

  • Stable tie-break: smaller index first.

  • Return indices only; length k per query.

Hint 1

Compute all dot products at once with Q @ D.T.

Hint 2

Cosine similarity needs norms: compute q_norms = norm(Q, axis=1) and d_norms = norm(D, axis=1), then divide by q_norms[:,None] * d_norms[None,:].
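The broadcasting in this hint can be sketched concretely (the values below are made up for illustration):

```python
import numpy as np

Q = np.array([[3.0, 4.0],
              [1.0, 0.0]])           # 2 queries
D = np.array([[6.0, 8.0],
              [0.0, 2.0]])           # 2 documents

dots = Q @ D.T                       # shape (2, 2): all query-doc dot products
q_norms = np.linalg.norm(Q, axis=1)  # shape (2,)
d_norms = np.linalg.norm(D, axis=1)  # shape (2,)

# q_norms[:, None] * d_norms[None, :] broadcasts to a (2, 2) grid
# of norm products, so the division normalizes every pair at once.
cos = dots / (q_norms[:, None] * d_norms[None, :])
# cos == [[1.0, 0.8], [0.6, 0.0]]
```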

Hint 3

For top‑k with tie-breaking by smaller index, use a stable sort: np.argsort(-sims, axis=1, kind='mergesort'), then take [:, :k].
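Putting the three hints together, one possible implementation looks like this (the function name batched_retrieve is my own; the arguments match the signature tables above):

```python
import numpy as np

def batched_retrieve(queries: np.ndarray, docs: np.ndarray, k: int) -> np.ndarray:
    # All pairwise dot products in one batched matmul: (num_queries, num_docs)
    sims = queries @ docs.T
    # Normalize by the outer product of row norms to get cosine similarity
    q_norms = np.linalg.norm(queries, axis=1)
    d_norms = np.linalg.norm(docs, axis=1)
    sims = sims / (q_norms[:, None] * d_norms[None, :])
    # Stable descending sort: equal scores keep ascending-index order
    order = np.argsort(-sims, axis=1, kind='mergesort')
    return order[:, :k]
```

The tie-break works because mergesort is stable and negating sims maps equal scores to equal keys, so tied documents stay in their original smaller-index-first order.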

Roles
ML Engineer
AI Engineer
Companies
General
Levels
staff
senior
entry
Tags
cosine-similarity
top-k-retrieval
numpy-vectorization
stable-sorting