
54. Similarity ranking

Difficulty: easy
Company: General
Level: senior

Build a similarity-based ranking function for an embeddings retrieval system, where you rank candidate items by how close their embedding vectors are to a query embedding. Use cosine similarity to score each candidate:

$$\text{cos\_sim}(q, x_i) = \frac{q \cdot x_i}{\|q\|_2 \, \|x_i\|_2}$$

Requirements

Implement the function.
Rules:

  • Compute cosine similarity between query and each vector in candidates.
  • Return the indices of the top_k most similar candidates, sorted from most to least similar.
  • Break ties by smaller index first (stable ranking).
  • Don't use any prebuilt similarity or nearest-neighbor utilities (e.g., sklearn, scipy).
  • Keep it in a single Python function using only NumPy + built-ins.
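A minimal sketch satisfying the rules above might look like the following (the function name `similarity_ranking` is an assumption, since the exact stub is not shown here):

```python
import numpy as np

def similarity_ranking(query: np.ndarray, candidates: np.ndarray, top_k: int) -> np.ndarray:
    # Dot product of the query with every candidate row at once: shape (n,)
    dots = candidates @ query
    # L2 norm of the query and per-row L2 norms of the candidates
    query_norm = np.linalg.norm(query)
    cand_norms = np.linalg.norm(candidates, axis=1)
    # Cosine similarity for every candidate
    sims = dots / (query_norm * cand_norms)
    # Stable sort on the negated scores: descending similarity,
    # with smaller indices first on ties
    order = np.argsort(-sims, kind='stable')
    return order[:top_k]
```

Negating the scores before a stable ascending `argsort` is what preserves the smaller-index-first tie-break that a plain descending sort would not guarantee.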

Input Signature

  • query: np.ndarray
  • top_k: int
  • candidates: np.ndarray

Output Signature

  • value: np.ndarray

Constraints

  • Input query and candidates are NumPy arrays.

  • top_k is a positive integer.

  • Return indices as a NumPy array.

Hint 1

Use matrix multiplication candidates @ query to compute dot products for all items at once.

Hint 2

To normalize, divide the dot-product vector by (‖query‖ * ‖candidates‖), where ‖candidates‖ is the vector of per-row L2 norms; NumPy broadcasting handles the elementwise division.
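As a sketch of that normalization step (the input values here are illustrative, not from the problem):

```python
import numpy as np

query = np.array([3.0, 4.0])                         # shape (2,), norm 5
candidates = np.array([[1.0, 0.0], [0.0, 2.0]])      # shape (2, 2)

dots = candidates @ query                            # shape (2,): one dot product per row
cand_norms = np.linalg.norm(candidates, axis=1)      # shape (2,): per-row L2 norms
sims = dots / (np.linalg.norm(query) * cand_norms)   # elementwise divide via broadcasting
print(sims)  # [0.6 0.8]
```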

Hint 3

Use np.argsort(-sims, kind='stable') to sort indices by descending similarity while preserving original order for ties.
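For instance, with two tied scores, the stable sort on the negated array keeps the smaller index first (scores below are illustrative):

```python
import numpy as np

# Candidates at indices 0 and 2 tie at similarity 0.9
sims = np.array([0.9, 0.1, 0.9])
# Negating turns a stable ascending sort into a stable descending ranking
order = np.argsort(-sims, kind='stable')
print(order)  # [0 2 1]: index 0 before index 2 despite the tie
```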

Roles: ML Engineer, AI Engineer
Companies: General
Levels: senior, entry
Tags: cosine-similarity, numpy-vectorization, stable-sorting, embeddings-retrieval