Join Our 5-Week ML/AI Engineer Interview Bootcamp 🚀 led by ML Tech Leads at FAANGs

Back to Questions

227. Sequence padding and truncation

easy
GeneralGeneral
senior

Preprocessing text for NLP models often requires every token sequence to have the same length. Write a function that pads shorter sequences and truncates longer ones to a fixed max_len.

For a sequence xx of length LL and target length MM, define L=min(L,M)L'=\min(L, M) and produce an output yy of length MM by copying LL' tokens (from the start or end) and filling the rest with pad_value.

Requirements

Implement the function

python

Rules:

  • Return a NumPy array of shape (batch_size, max_len).
  • The input seqs is a standard Python list of lists.
  • If a sequence is longer than max_len, truncate according to truncating ("pre" keeps the last max_len, "post" keeps the first max_len).
  • If a sequence is shorter than max_len, pad using pad_value according to padding ("pre" pads on the left, "post" pads on the right).
  • Do not modify the input seqs in-place.
  • Use only NumPy and Python built-in libraries.

Example

python

Output:

python
Input Signature
ArgumentType
seqslist
max_lenint
paddingstr
pad_valueint
truncatingstr
Output Signature
Return NameType
valuenp.ndarray

Constraints

  • Use only NumPy and Python built-ins

  • Do not modify input seqs in-place

  • Each output sequence length equals max_len

Hint 1

Start by deciding how to handle a single sequence: if len(seq) > max_len, slice it based on truncating (seq[:max_len] vs seq[-max_len:]).

Hint 2

Once you have the tokens to keep, compute how many pads you need: pad_count = max_len - len(tokens). Then build the final sequence by adding [pad_value] * pad_count either before or after depending on padding.

Hint 3

Using NumPy can simplify this: pre-allocate np.full((len(seqs), max_len), pad_value) and copy tokens into either padded[i, :n] (post) or padded[i, -n:] (pre). Return the NumPy array.

Roles
ML Engineer
AI Engineer
Data Scientist
Quantitative Analyst
Companies
GeneralGeneral
Levels
senior
entry
Tags
sequence-padding
truncation
numpy
40 people are solving this problem
Python LogoPython Editor
Ln 1, Col 1

Input Arguments

Edit values below to test with custom inputs

You need tolog in/sign upto run or submit