Whitespace tokenization - ML & AI Coding

253. Whitespace tokenization

easy

General

senior

Implement whitespace tokenization for a simple NLP preprocessing step, where you split text into tokens by runs of whitespace and discard empty tokens. Return the tokens in order so they can be fed into later steps like vocabulary building.

Requirements

Implement the function

python

Rules:

Tokenize by splitting on one or more whitespace characters (spaces, tabs, newlines).
Do not include any empty tokens in the output.
Keep punctuation attached to words (e.g., "hello," stays as "hello,").
Do not lowercase or normalize; preserve the original characters.
Use only NumPy and Python built-in libraries.
Return a NumPy array of strings.

Example

python

Output:

python

Input Signature

Argument	Type
text	str

Output Signature

Return Name	Type
value	np.ndarray

Constraints

Input is a Python str.
Split on any whitespace runs.
Return np.ndarray of strings, no empty tokens.

Hint 1

Python already has a built-in way to split on any whitespace (spaces, tabs, newlines).

Hint 2

If you call split() without an explicit delimiter, it treats runs of whitespace as one separator and drops empty strings automatically.

Hint 3

Use np.array(text.split()) to convert the list of tokens to a NumPy array.

Roles

ML Engineer

AI Engineer

Data Scientist

Quantitative Analyst

Companies

General

Levels

senior

entry

Input Arguments

Edit values below to test with custom inputs

You need tolog in/sign upto run or submit