

156. Layer norm forward pass

Difficulty: easy

Implement the forward pass of Layer Normalization, a common deep learning building block that normalizes each sample across its feature dimensions. Layer norm is defined per sample as:

y_i = \gamma \odot \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta, \qquad \mu = \frac{1}{d}\sum_{j=1}^{d} x_{ij}, \qquad \sigma^2 = \frac{1}{d}\sum_{j=1}^{d} (x_{ij} - \mu)^2

Requirements

Implement the function

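A plausible skeleton, using the argument names and types from the Input/Output Signature below (the function name and argument order are assumptions, not given by the problem):

```python
import numpy as np

def layer_norm(X: np.ndarray, eps: float, beta: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Normalize each row of X over its features, then scale by gamma and shift by beta."""
    ...  # implementation goes here
```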

Rules:

  • Normalize each row (each sample) independently across its d features.
  • Use gamma and beta for per-feature scaling/shifting (broadcast across the batch).
  • Compute mean and variance for each sample (row), not across the batch.
  • Return a NumPy array.
  • Don’t use any prebuilt layer norm utilities (e.g., torch.nn.LayerNorm).

Example

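One illustrative set of inputs (values assumed for demonstration, not taken from the original example):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3): two samples, three features
gamma = np.ones(3)                 # per-feature scale
beta = np.zeros(3)                 # per-feature shift
eps = 1e-5
```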

Output:

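With unit gamma and zero beta, each row above normalizes to approximately:

```python
array([[-1.2247,  0.    ,  1.2247],
       [-1.2247,  0.    ,  1.2247]])
```
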
Input Signature

  • X: np.ndarray
  • eps: float
  • beta: np.ndarray
  • gamma: np.ndarray

Output Signature

  • value: np.ndarray

Constraints

  • Return NumPy array

  • Normalize per sample; mean/var over features (axis=-1)

  • No prebuilt LayerNorm utilities

Hint 1

Process along the feature dimension (usually axis 1) for each sample. LayerNorm computes mean/variance across the feature dimension d, not across the batch.

Hint 2

Use np.mean(X, axis=1, keepdims=True) and np.var(X, axis=1, keepdims=True) so the per-row statistics have shape (n, 1) and broadcast against the (n, d) input.
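
For example, on a small (2, 3) array the reduced statistics keep a trailing axis of size 1:

```python
import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)   # shape (2, 3)
mean = np.mean(X, axis=1, keepdims=True)       # shape (2, 1)
var = np.var(X, axis=1, keepdims=True)         # shape (2, 1)
print(mean.shape, var.shape)                   # (2, 1) (2, 1)
```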

Hint 3

Normalize using (X - mean) / np.sqrt(var + eps), then apply gamma * ... + beta. Gamma and beta will broadcast from shape (d,) to (n, d).
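
Putting the hints together, a minimal vectorized sketch (argument order follows the Input Signature table; the function name is assumed) could look like this:

```python
import numpy as np

def layer_norm(X: np.ndarray, eps: float, beta: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    # Per-sample statistics over the feature axis; keepdims gives shape (n, 1) for broadcasting.
    mean = np.mean(X, axis=-1, keepdims=True)
    var = np.var(X, axis=-1, keepdims=True)
    # Normalize each row, then apply the per-feature scale and shift (gamma, beta broadcast from (d,)).
    X_hat = (X - mean) / np.sqrt(var + eps)
    return gamma * X_hat + beta
```

Calling this sketch on the illustrative inputs in the Example section reproduces the approximate output shown there.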

Roles
ML Engineer
AI Engineer
Companies
General
Levels
senior
entry
Tags
layer-normalization
numerical-stability
vectorization
deep-learning-basics