Implement a numerically stable softmax, as used inside Transformer attention, so you can safely turn attention logits into probabilities without overflow. The softmax is defined as:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
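In practice the row maximum is subtracted before exponentiating. Because softmax is invariant to adding a constant to every logit, this leaves the result unchanged while keeping the exponentials from overflowing:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i - \max_j x_j}}{\sum_{k} e^{x_k - \max_j x_j}}$$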
Implement the function with the input and output specified below.
Rules:
- Use NumPy only.
- Compute the softmax row-wise over the last dimension.
- Return a NumPy array.

Input:

| Argument | Type |
|---|---|
| logits | np.ndarray |

Output:

| Return Name | Type |
|---|---|
| value | np.ndarray |
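For example, a 2×3 input such as `[[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]` should map to a 2×3 output whose rows each sum to 1, approximately `[[0.659, 0.242, 0.099], [0.333, 0.333, 0.333]]` (illustrative values, not part of the problem's official examples).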
1. Ensure `logits` is a float array, and operate row-wise with `axis=1`.
2. For numerical stability, subtract each row's maximum before `np.exp`: `shifted = arr - arr.max(axis=1, keepdims=True)`.
3. Compute `exp = np.exp(shifted)`, then normalize per row: `probs = exp / exp.sum(axis=1, keepdims=True)`.
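A minimal sketch of these steps, assuming the function is named `softmax` and that `logits` is a 2D array (one row of attention logits per query):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Assumed name and signature; adapt to the exact spec you are given.
    arr = np.asarray(logits, dtype=float)           # ensure float dtype
    shifted = arr - arr.max(axis=1, keepdims=True)  # subtract row max: largest exponent is 0
    exp = np.exp(shifted)                           # no overflow after the shift
    return exp / exp.sum(axis=1, keepdims=True)     # normalize so each row sums to 1
```

For example, `softmax(np.array([[1000.0, 1001.0, 1002.0]]))` returns roughly `[[0.0900, 0.2447, 0.6652]]`, whereas exponentiating the raw logits directly would overflow to `inf`.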