Implement the forward pass of Layer Normalization, a common deep learning building block that normalizes each sample across its feature dimensions. For a single sample x with d features, layer norm is defined as y = gamma * (x - mu) / sqrt(var + eps) + beta, where mu and var are the mean and variance of x computed over its d features.
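As a quick hand check (assuming gamma = 1, beta = 0, and a negligibly small eps): a sample x = [1, 2, 3] has mean 2 and variance 2/3, so its normalized output is approximately [-1.22, 0.00, 1.22].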
Implement the function described by the argument and return tables below.
Rules:
- X contains n samples, each with d features; normalize each sample over its d features.
- Use gamma and beta for per-feature scaling/shifting (broadcast across the batch).
- Do not use prebuilt layer-norm utilities (e.g., torch.nn.LayerNorm).
- Output: a NumPy array of the same shape as X.
| Argument | Type |
|---|---|
| X | np.ndarray |
| eps | float |
| beta | np.ndarray |
| gamma | np.ndarray |
| Return Name | Type |
|---|---|
| value | np.ndarray |
- Return a NumPy array
- Normalize per sample; mean/var over features (axis=-1)
- No prebuilt LayerNorm utilities
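Based on the argument and return tables above, a starter stub might look like the sketch below (the function name layer_norm and the eps default are assumptions; the problem statement does not fix them):

```python
import numpy as np

def layer_norm(X: np.ndarray, gamma: np.ndarray, beta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each sample of X over its features, then scale by gamma and shift by beta."""
    # TODO: compute per-sample mean/variance over axis=-1, normalize, apply gamma and beta.
    raise NotImplementedError
```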
Process along the feature dimension (usually axis 1) for each sample. LayerNorm computes mean/variance across the feature dimension d, not across the batch.
Use np.mean(X, axis=1, keepdims=True) and np.var(X, axis=1, keepdims=True) to convert (n, d) -> (n, 1) for broadcasting.
Normalize using (X - mean) / np.sqrt(var + eps) to get X_hat, then apply gamma * X_hat + beta. gamma and beta will broadcast from shape (d,) to (n, d).
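Putting the hints together, here is a minimal NumPy sketch under the same assumptions as the stub above (the name layer_norm and the eps default are not given in the problem), followed by a small sanity check:

```python
import numpy as np

def layer_norm(X: np.ndarray, gamma: np.ndarray, beta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Per-sample statistics over the feature dimension, kept as (n, 1) for broadcasting.
    mean = np.mean(X, axis=-1, keepdims=True)
    var = np.var(X, axis=-1, keepdims=True)
    # Normalize each sample, then scale/shift per feature; gamma/beta broadcast from (d,) to (n, d).
    X_hat = (X - mean) / np.sqrt(var + eps)
    return gamma * X_hat + beta

# Sanity check on a tiny batch: each row should come out roughly zero-mean, unit-variance.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
gamma = np.ones(3)
beta = np.zeros(3)
out = layer_norm(X, gamma, beta)
print(out)                  # each row ≈ [-1.2247, 0.0, 1.2247]
print(out.mean(axis=-1))    # ≈ [0.0, 0.0]
```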