Positional embeddings help a vision model keep track of where each patch/token came from in an image, so spatial layout isn’t lost when you flatten patches into a sequence. In this task, you’ll generate 2D sinusoidal positional embeddings for an image grid, similar to what’s used in Vision Transformers.
The 2D sinusoidal embedding is formed by concatenating the 1D embeddings of the row and column positions:

$$\mathrm{emb}_{2D}(y, x) = \big[\,\mathrm{emb}_{1D}(y)\;;\;\mathrm{emb}_{1D}(x)\,\big]$$

where each 1D embedding of size $h = d_{model}/2$ uses

$$\mathrm{emb}_{1D}(pos)_{2i} = \sin\!\left(\frac{pos}{10000^{2i/h}}\right),\qquad \mathrm{emb}_{1D}(pos)_{2i+1} = \cos\!\left(\frac{pos}{10000^{2i/h}}\right)$$
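As a quick sanity check of the formula, here is a minimal sketch (the variable names are illustrative) computing a single sin/cos pair for one position:

```python
import numpy as np

# Sanity-check the sinusoidal formula for a single position.
half = 8  # size of one 1D embedding, i.e. d_model // 2
pos = 3   # a row or column index
i = 2     # pair index: fills dimensions 2*i and 2*i + 1

denom = 10000 ** (2 * i / half)  # = 100 here
even_val = np.sin(pos / denom)   # goes to dimension 2*i
odd_val = np.cos(pos / denom)    # goes to dimension 2*i + 1
print(even_val, odd_val)
```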
Implement a function that builds these embeddings for a `height × width` grid.
Rules:
- Each 1D embedding (row and column) has size `d_model / 2`.
- The output has shape `(height * width, d_model)`.
- The grid is flattened row-major: index `idx = y * width + x` corresponds to position `(y, x)`.

Input:
| Argument | Type |
|---|---|
| width | int |
| height | int |
| d_model | int |
Output:

| Return Name | Type |
|---|---|
| value | np.ndarray |
- Return a NumPy array of shape `(height * width, d_model)`.
- `d_model` must be even; each half is filled with alternating sin/cos values.
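The row-major flattening rule above can be sanity-checked in a few lines:

```python
import numpy as np

width, height = 4, 3
idx = np.arange(height * width)   # flattened token indices, row-major
y, x = idx // width, idx % width  # recover (row, col) from each idx
# Check that idx = y * width + x holds for every token
print(np.array_equal(idx, y * width + x))  # True
```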
Split d_model into two halves (half = d_model // 2): one half encodes the row position and the other the column position. Check first that d_model is even.
Implement a reusable 1D sinusoidal embedding builder: precompute denom = 10000 ** (2*i/half) for i=0..half/2-1, then fill even indices with sin(pos/denom) and odd with cos(pos/denom).
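The 1D builder described in this hint could be sketched as follows (the name `sincos_1d` is illustrative; this sketch assumes `half` itself is even so the sin/cos pairs fill it exactly):

```python
import numpy as np

def sincos_1d(positions: np.ndarray, half: int) -> np.ndarray:
    """1D sinusoidal embedding of size `half` for each position.

    Even indices 2*i get sin, odd indices 2*i + 1 get cos.
    Assumes `half` is even.
    """
    assert half % 2 == 0, "half must be even"
    i = np.arange(half // 2)                      # pair index i = 0 .. half//2 - 1
    denom = 10000 ** (2 * i / half)               # shape (half//2,)
    angles = positions[:, None] / denom[None, :]  # shape (len(positions), half//2)
    emb = np.empty((len(positions), half))
    emb[:, 0::2] = np.sin(angles)                 # even dimensions
    emb[:, 1::2] = np.cos(angles)                 # odd dimensions
    return emb

# Example: embeddings for 3 positions with half = 4
print(sincos_1d(np.arange(3), 4).shape)  # (3, 4)
```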
Precompute row_emb for all y and col_emb for all x. Use np.repeat and np.tile to expand them across the grid, then concatenate the two halves along the feature axis.
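Putting the hints together, one possible assembly looks like this. The function name `get_2d_sincos_embedding` is an assumption, and this sketch requires `d_model` divisible by 4 (stricter than "even") so that each half splits cleanly into sin/cos pairs:

```python
import numpy as np

def get_2d_sincos_embedding(width: int, height: int, d_model: int) -> np.ndarray:
    """2D sinusoidal embedding of shape (height * width, d_model).

    Row idx = y * width + x holds [row_emb(y) ; col_emb(x)],
    each part of size d_model // 2.
    """
    # Assumption: d_model % 4 == 0 so each half has whole sin/cos pairs.
    assert d_model % 4 == 0, "d_model must be divisible by 4 in this sketch"
    half = d_model // 2

    def sincos_1d(positions, dim):
        i = np.arange(dim // 2)
        denom = 10000 ** (2 * i / dim)
        angles = positions[:, None] / denom
        emb = np.empty((len(positions), dim))
        emb[:, 0::2] = np.sin(angles)
        emb[:, 1::2] = np.cos(angles)
        return emb

    row_emb = sincos_1d(np.arange(height), half)  # (height, half)
    col_emb = sincos_1d(np.arange(width), half)   # (width, half)

    # Expand over the grid in row-major order, then concatenate halves.
    rows = np.repeat(row_emb, width, axis=0)      # (height*width, half)
    cols = np.tile(col_emb, (height, 1))          # (height*width, half)
    return np.concatenate([rows, cols], axis=1)   # (height*width, d_model)

print(get_2d_sincos_embedding(4, 3, 8).shape)  # (12, 8)
```

`np.repeat` duplicates each row embedding `width` times consecutively, while `np.tile` cycles the column embeddings, which matches the row-major rule `idx = y * width + x`.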