

208. Stable softmax

easy
General
senior

Implement a numerically stable softmax as used inside Transformer attention, so you can safely turn attention logits into probabilities without overflow. The softmax is defined as:

\text{softmax}(z_i)=\frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}
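
The stability trick relies on a standard identity: subtracting the same constant c from every logit leaves the result unchanged, so choosing c = max_j z_j keeps every exponent at or below zero:

\frac{e^{z_i-c}}{\sum_{j=1}^{n} e^{z_j-c}}=\frac{e^{-c}\,e^{z_i}}{e^{-c}\sum_{j=1}^{n} e^{z_j}}=\text{softmax}(z_i)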

Requirements

Implement the function described below.

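A plausible starter stub, matching the Input Signature and Output Signature tables further down (the exact function name, `softmax` here, is an assumption):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn attention logits of shape (queries, keys) into row-wise probabilities."""
    ...
```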

Rules:

  • Compute softmax row-wise: each query’s logits should sum to 1 across keys.
  • Use the stability trick: subtract the row max before exponentiating (see the sketch after this list).
  • Use only NumPy operations (no SciPy / deep learning frameworks).
  • Return the output as a NumPy array.
  • Keep it numerically stable for large positive/negative logits.
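
A minimal sketch that follows these rules (the function name `softmax` is an assumption, and `axis=-1` is used so the same code covers the 2-D case from the hints as well as any extra batch dimensions):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last dimension."""
    arr = np.asarray(logits, dtype=float)            # ensure a float array
    shifted = arr - arr.max(axis=-1, keepdims=True)  # subtract row max: exponents <= 0
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)   # each row now sums to 1
```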

Example

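An illustrative call, assuming the `softmax` sketch above (these particular values are placeholders, not the original example):

```python
import numpy as np

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])

probs = softmax(logits)
print(probs)
# [[0.09003057 0.24472847 0.66524096]
#  [0.33333333 0.33333333 0.33333333]]

print(probs.sum(axis=1))   # each row sums to 1
# [1. 1.]
```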

Input Signature

  • Argument: logits (np.ndarray)

Output Signature

  • Return name: value (np.ndarray)

Constraints

  • Use NumPy only.
  • Row-wise softmax over the last dimension.
  • Return a NumPy array.

Hint 1

Ensure logits is a float array so you can operate row-wise with axis=1.

Hint 2

For numerical stability, subtract each row’s maximum: shifted = arr - arr.max(axis=1, keepdims=True) before np.exp.

Hint 3

Compute exp = np.exp(shifted) then normalize per row: probs = exp / exp.sum(axis=1, keepdims=True).
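
Chaining the three hints on extreme logits (values chosen here purely for illustration) shows why the shift matters: the naive version overflows to inf and collapses to nan, while the shifted version stays finite:

```python
import numpy as np

arr = np.array([[1000.0, 999.0, 998.0]])

# Naive softmax: np.exp(1000) overflows to inf (NumPy emits a RuntimeWarning),
# and inf / inf evaluates to nan.
naive = np.exp(arr) / np.exp(arr).sum(axis=1, keepdims=True)
print(naive)
# [[nan nan nan]]

# Stable softmax: subtract the row max first so every exponent is <= 0.
shifted = arr - arr.max(axis=1, keepdims=True)
exps = np.exp(shifted)
print(exps / exps.sum(axis=1, keepdims=True))
# [[0.66524096 0.24472847 0.09003057]]
```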

Roles
ML Engineer
AI Engineer
Companies
General
Levels
senior
entry
Tags
softmax
numerical-stability
numpy
attention