Retrieve the top‑k most similar embeddings for a query embedding using cosine similarity, which is a common building block in embedding search. You’ll compute similarity scores and return the indices of the best matches.
Cosine similarity is:

cos_sim(a, b) = (a · b) / (‖a‖ ‖b‖)
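As a quick sanity check of the formula, a minimal NumPy computation (the example vectors are illustrative):

```python
import numpy as np

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])

# cos_sim = (a · b) / (||a|| ||b||)
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # ≈ 0.5 for these vectors
```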
Implement the function: compute the cosine similarity between `query` and every row in `embeddings`, and return the indices of the `k` most similar embeddings, sorted from most similar to least similar.

Input:
| Argument | Type |
|---|---|
| k | int |
| query | np.ndarray |
| embeddings | np.ndarray |
Output:

| Return Name | Type |
|---|---|
| value | list |
Rules:
- Use NumPy for vectorization.
- Return indices sorted from most similar to least similar; break ties by preferring the smaller index.
- You may assume `k <= n`, where `embeddings` has shape `(n, d)`.
Compute norms: q_norm = np.linalg.norm(query) and e_norms = np.linalg.norm(embeddings, axis=1).
Compute dot products: dots = embeddings @ query.
Calculate similarities: sims = dots / (q_norm * e_norms); create pairs (sim, idx), sort with key (-sim, idx), and return the first k indices.
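The steps above can be sketched as follows (the function name `top_k_similar` and the sample data are illustrative; the problem does not fix a name):

```python
import numpy as np

def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int) -> list:
    # Norms of the query and of each embedding row.
    q_norm = np.linalg.norm(query)
    e_norms = np.linalg.norm(embeddings, axis=1)
    # Dot product of the query against every row at once.
    dots = embeddings @ query
    # Cosine similarity for each row.
    sims = dots / (q_norm * e_norms)
    # Sort by (-similarity, index) so ties go to the smaller index.
    pairs = sorted(((s, i) for i, s in enumerate(sims)), key=lambda p: (-p[0], p[1]))
    return [i for _, i in pairs[:k]]

query = np.array([1.0, 0.0])
embeddings = np.array([
    [1.0, 0.0],   # same direction -> sim 1.0
    [0.0, 1.0],   # orthogonal -> sim 0.0
    [2.0, 0.0],   # same direction, larger norm -> sim 1.0 (ties with index 0)
    [-1.0, 0.0],  # opposite direction -> sim -1.0
])
print(top_k_similar(query, embeddings, 2))  # [0, 2]
```

Note that cosine similarity ignores vector magnitude, so rows 0 and 2 tie exactly and the smaller index wins, as the rules require.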