
248. Top-k embeddings retrieval

easy
General
senior

Retrieve the top‑k most similar embeddings for a query embedding using cosine similarity, which is a common building block in embedding search. You’ll compute similarity scores and return the indices of the best matches.

Cosine similarity is defined as:

$$\text{cos\_sim}(q, x_i) = \frac{q \cdot x_i}{\|q\|\,\|x_i\|}$$
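For example, with q = (1, 0) and x_i = (1, 1), cos_sim = 1 / (1 · √2) ≈ 0.707; a similarity of 1 means the vectors point in the same direction, and 0 means they are orthogonal.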

Requirements

Implement the function


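A stub consistent with the Input and Output Signature tables below (the function name `top_k_embeddings` is an assumption):

```python
import numpy as np

def top_k_embeddings(k: int, query: np.ndarray, embeddings: np.ndarray) -> list:
    # Return indices of the k rows of `embeddings` most cosine-similar
    # to `query`, ordered from most to least similar.
    ...
```
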
Rules:

  • Compute cosine similarity between query and every row in embeddings.
  • Return the indices of the top k most similar embeddings, sorted from most similar to least similar.
  • Do not use any prebuilt nearest-neighbor/search libraries (e.g., FAISS, sklearn NearestNeighbors).
  • Use NumPy for vectorized computation.
  • If there are ties in similarity, break ties by smaller index first.

Example

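An illustrative input, assuming the stub above (the values here are chosen for demonstration):

```python
import numpy as np

k = 2
query = np.array([1.0, 0.0])
embeddings = np.array([
    [1.0, 0.0],   # same direction as query -> sim = 1.0
    [0.0, 1.0],   # orthogonal to query     -> sim = 0.0
    [1.0, 1.0],   # 45 degrees from query   -> sim ≈ 0.707
])

top_k_embeddings(k, query, embeddings)
```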

Output:

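```python
[0, 2]   # index 0 (sim = 1.0) first, then index 2 (sim ≈ 0.707)
```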

Input Signature

  Argument      Type
  k             int
  query         np.ndarray
  embeddings    np.ndarray

Output Signature

  Return Name   Type
  value         list

Constraints

  • Use NumPy for vectorization.
  • Return indices sorted from most to least similar; break ties by smaller index first.
  • Assume k <= n, where embeddings has shape (n, d).

Hint 1

Compute norms: q_norm = np.linalg.norm(query) and e_norms = np.linalg.norm(embeddings, axis=1).

Hint 2

Compute dot products: dots = embeddings @ query.

Hint 3

Calculate similarities: sims = dots / (q_norm * e_norms); create pairs (sim, idx) and sort with key (-sim, idx).
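
Putting the three hints together, here is one vectorized sketch (assuming the `top_k_embeddings` stub above; a stable argsort on the negated scores stands in for the explicit (sim, idx) pairs):

```python
import numpy as np

def top_k_embeddings(k: int, query: np.ndarray, embeddings: np.ndarray) -> list:
    # Hint 1: norm of the query and of each embedding row.
    q_norm = np.linalg.norm(query)
    e_norms = np.linalg.norm(embeddings, axis=1)

    # Hint 2: one matrix-vector product yields all n dot products.
    dots = embeddings @ query

    # Hint 3: cosine similarity of the query with every row.
    sims = dots / (q_norm * e_norms)

    # Stable sort on the negated scores: equal similarities keep their
    # original order, so ties break toward the smaller index, as required.
    order = np.argsort(-sims, kind="stable")
    return order[:k].tolist()
```

For large n, np.argpartition can select the top k in O(n) before fully sorting only those k entries, though preserving the smaller-index tie-break then takes extra care.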

Roles
ML Engineer
AI Engineer
Companies
General
Levels
senior
entry
Tags
cosine-similarity
top-k-retrieval
numpy-vectorization
similarity-search