
253. Whitespace tokenization

easy
General
senior

Implement whitespace tokenization for a simple NLP preprocessing step, where you split text into tokens by runs of whitespace and discard empty tokens. Return the tokens in order so they can be fed into later steps like vocabulary building.

Requirements

Implement the function

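The original code stub did not survive the page export. A signature consistent with the input/output tables below would look like the following; the name `whitespace_tokenize` is an assumption:

```python
import numpy as np

def whitespace_tokenize(text: str) -> np.ndarray:
    """Split text on runs of whitespace and return the tokens as a NumPy string array."""
    ...
```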

Rules:

  • Tokenize by splitting on one or more whitespace characters (spaces, tabs, newlines).
  • Do not include any empty tokens in the output.
  • Keep punctuation attached to words (e.g., "hello," stays as "hello,").
  • Do not lowercase or normalize; preserve the original characters.
  • Use only NumPy and Python built-in libraries.
  • Return a NumPy array of strings.

Example

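The original example values were lost in the page export, so the input/output pair below is only an illustration consistent with the rules above, not the official test case:

```python
text = "Hello,  world!\tThis is\na test"

# Expected tokens: whitespace runs collapse, punctuation stays attached,
# and no empty strings appear in the result.
# array(['Hello,', 'world!', 'This', 'is', 'a', 'test'], dtype='<U6')
```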
Input Signature
  • text: str
Output Signature
  • value: np.ndarray

Constraints

  • Input is a Python str.
  • Split on any run of whitespace.
  • Return an np.ndarray of strings with no empty tokens.

Hint 1

Python already has a built-in way to split on any whitespace (spaces, tabs, newlines).

Hint 2

If you call split() without an explicit delimiter, it treats runs of whitespace as one separator and drops empty strings automatically.

Hint 3

Use np.array(text.split()) to convert the list of tokens to a NumPy array.
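Putting the hints together, a minimal sketch (reusing the assumed `whitespace_tokenize` name from the stub above) can be a single return statement. Passing `dtype=str` goes slightly beyond Hint 3, but it guards the empty-input case, where `np.array([])` would otherwise default to a float array:

```python
import numpy as np

def whitespace_tokenize(text: str) -> np.ndarray:
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and never yields empty tokens.
    # dtype=str keeps the result a string array even when text is empty.
    return np.array(text.split(), dtype=str)

print(whitespace_tokenize("Hello,  world!\tThis is\na test"))
# ['Hello,' 'world!' 'This' 'is' 'a' 'test']
```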

Roles: ML Engineer, AI Engineer, Data Scientist, Quantitative Analyst
Companies: General
Levels: senior, entry
Tags: string-processing, tokenization, whitespace-splitting, python-basics