Implement batched query retrieval to find the top-k most similar items (by cosine similarity) in a small embedding index for each query. You'll take queries and documents as NumPy arrays and return, for every query, the indices of the best matches.
The cosine similarity between vectors \(q\) and \(d\) is:

$$\text{cosine}(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}$$
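As a quick sanity check, the formula can be computed directly with NumPy (the helper name `cosine` and the example vectors are illustrative, not part of the prompt):

```python
import numpy as np

def cosine(q: np.ndarray, d: np.ndarray) -> float:
    # cosine(q, d) = (q . d) / (||q|| * ||d||)
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

q = np.array([1.0, 0.0])
d = np.array([1.0, 1.0])
print(round(cosine(q, d), 4))  # 1/sqrt(2) ≈ 0.7071
```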
Implement a function with the following inputs and output.

Input:

| Argument | Type |
|---|---|
| k | int |
| docs | np.ndarray |
| queries | np.ndarray |

Output:

| Return Name | Type |
|---|---|
| value | np.ndarray |

Rules:

- Use NumPy; avoid nested query×doc loops.
- Break ties stably: the smaller document index comes first.
- Return indices only, exactly k per query.
Hints:

- Compute all dot products at once with `Q @ D.T`.
- Cosine similarity needs norms: compute `q_norms = np.linalg.norm(Q, axis=1)` and `d_norms = np.linalg.norm(D, axis=1)`, then divide by `q_norms[:, None] * d_norms[None, :]`.
- For top-k with tie-breaking by smaller index, use a stable sort: `np.argsort(-sims, axis=1, kind='mergesort')`, then take `[:, :k]`.
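Putting the hints together, here is a minimal vectorized sketch. The function name `retrieve_top_k` and the argument order are assumptions (the prompt lists the arguments but does not name the function), and the small docs/queries arrays are illustrative:

```python
import numpy as np

def retrieve_top_k(k: int, docs: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Return, for each query row, the indices of the k most cosine-similar docs.

    Hypothetical name and argument order; the prompt only lists the
    arguments k, docs, and queries.
    """
    # All pairwise dot products in one matmul: shape (n_queries, n_docs).
    dots = queries @ docs.T
    # Divide by the outer product of norms to get cosine similarities.
    q_norms = np.linalg.norm(queries, axis=1)
    d_norms = np.linalg.norm(docs, axis=1)
    sims = dots / (q_norms[:, None] * d_norms[None, :])
    # Stable descending sort: equal similarities keep the smaller doc index first.
    order = np.argsort(-sims, axis=1, kind='mergesort')
    return order[:, :k]

docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[1.0, 1.0], [1.0, 0.0]])
print(retrieve_top_k(2, docs, queries))  # [[2 0]
                                         #  [0 2]]
```

Negating `sims` before the ascending stable sort is what makes the tie-break work: among equal values, mergesort preserves the original (smaller-index-first) order.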