Residual connections are a simple but crucial part of Transformers that help gradients flow and keep representations stable as depth increases. In this task, you’ll implement the residual “Add & Norm” step used around Transformer sublayers.
The operations are:

$$z = x + \text{sublayer\_out}$$

$$\text{LayerNorm}(z) = \gamma \odot \frac{z - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $\mu$ and $\sigma^2$ are the mean and variance of $z$ taken across the feature dimension.
Implement a function that performs this Add & Norm step.
Rules:
- Compute the residual z = x + sublayer_out elementwise.
- Apply layer normalization across the feature dimension (d_model) for each token independently.
- Use gamma and beta to scale and shift the normalized values.
- Do not use a prebuilt implementation (e.g., torch.nn.LayerNorm).
| Argument | Type |
|---|---|
| x | np.ndarray |
| eps | float |
| beta | np.ndarray |
| gamma | np.ndarray |
| sublayer_out | np.ndarray |
| Return Name | Type |
|---|---|
| value | np.ndarray |
- Normalize across d_model (axis=1), per token.
- No prebuilt LayerNorm; use NumPy ops only.
- Return a NumPy array.
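Putting the tables and constraints together, the interface might look like the sketch below; the function name add_and_norm and the parameter order are placeholders, since the problem statement does not fix them:

```python
import numpy as np

# Hypothetical signature assembled from the argument/return tables above;
# the name and parameter order are assumptions, not given by the problem.
def add_and_norm(x: np.ndarray,
                 sublayer_out: np.ndarray,
                 gamma: np.ndarray,
                 beta: np.ndarray,
                 eps: float) -> np.ndarray:
    """Return LayerNorm(x + sublayer_out) with affine parameters gamma and beta."""
    ...
```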
Start by computing the residual: z = x + sublayer_out (broadcasting works naturally).
LayerNorm here is per token: for each row z[i], compute mean and var across the feature dimension (axis=1). Use eps inside the square root.
After normalization norm = (z - mean) / sqrt(var + eps), apply the affine transform out = gamma * norm + beta and return out.
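Following those three hints, a minimal NumPy sketch might look like this (assuming x and sublayer_out are (seq_len, d_model) arrays and gamma and beta are (d_model,) vectors; the function name is the same placeholder as above):

```python
import numpy as np

def add_and_norm(x: np.ndarray,
                 sublayer_out: np.ndarray,
                 gamma: np.ndarray,
                 beta: np.ndarray,
                 eps: float = 1e-5) -> np.ndarray:
    # 1. Residual connection: elementwise sum (broadcasting handles matching shapes).
    z = x + sublayer_out

    # 2. Per-token statistics across the feature dimension (axis=1, i.e. d_model).
    mean = z.mean(axis=1, keepdims=True)
    var = z.var(axis=1, keepdims=True)

    # 3. Normalize, then scale and shift with the affine parameters.
    norm = (z - mean) / np.sqrt(var + eps)
    return gamma * norm + beta
```

A quick sanity check with random data (shapes here are illustrative): with gamma = 1 and beta = 0, each row of the output should have mean roughly 0 and variance roughly 1.

```python
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))              # 4 tokens, d_model = 8
sub = rng.normal(size=(4, 8))
out = add_and_norm(x, sub, gamma=np.ones(8), beta=np.zeros(8), eps=1e-5)
print(out.shape)            # (4, 8)
print(out.mean(axis=1))     # each entry close to 0
print(out.var(axis=1))      # each entry close to 1
```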