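Build a similarity-based ranking function for an embeddings retrieval system that ranks candidate items by how close their embedding vectors are to a query embedding. Use cosine similarity to score each candidate:

$$\text{sim}(q, c_i) = \frac{q \cdot c_i}{\lVert q \rVert \, \lVert c_i \rVert}$$

Here $q$ is the query vector and $c_i$ is the $i$-th row of candidates.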
Implement the function
Rules:
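- Compute the cosine similarity between query and each vector in candidates.
- Return the indices of the top_k most similar candidates, sorted from most to least similar.
- Do not use prebuilt similarity functions (e.g., sklearn, scipy).

Output: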
| Argument | Type |
|---|---|
| query | np.ndarray |
| top_k | int |
| candidates | np.ndarray |
| Return Name | Type |
|---|---|
| value | np.ndarray |
Input query and candidates are NumPy arrays.
top_k is a positive integer.
Return indices as a NumPy array.
Use matrix multiplication candidates @ query to compute dot products for all items at once.
To normalize, divide the dot product vector by (‖query‖ * ‖candidates‖), where ‖query‖ is a scalar and ‖candidates‖ holds one Euclidean norm per row, broadcasting where necessary.
Use np.argsort(-sims, kind='stable') to sort indices by descending similarity while preserving original order for ties.
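The hints above can be combined into a short sketch. The function name `rank_by_cosine_similarity` and the argument order are hypothetical, since the problem statement does not show the expected signature. Scoring zero-norm vectors as 0 is also an assumption, not something the rules specify.

```python
import numpy as np


def rank_by_cosine_similarity(query: np.ndarray, candidates: np.ndarray, top_k: int) -> np.ndarray:
    """Return indices of the top_k candidates most similar to query.

    The name and argument order are illustrative assumptions.
    """
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)

    # Dot product of every candidate row with the query, shape (n,).
    dots = candidates @ query

    # Euclidean norms: a scalar for the query, one value per candidate row.
    query_norm = np.linalg.norm(query)
    cand_norms = np.linalg.norm(candidates, axis=1)

    # Assumption: a zero-norm vector gets similarity 0 instead of dividing by zero.
    denom = query_norm * cand_norms
    sims = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

    # Stable sort on negated scores: descending similarity, ties keep input order.
    order = np.argsort(-sims, kind="stable")
    return order[:top_k]


query = np.array([1.0, 0.0])
candidates = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 1.0], [3.0, 0.0]])
print(rank_by_cosine_similarity(query, candidates, top_k=3))  # prints [1 3 2]
```

Rows 1 and 3 both point in the same direction as the query and tie at similarity 1.0, so the stable sort keeps row 1 ahead of row 3. If top_k exceeds the number of candidates, the slice simply returns every index.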