Join Our 5-Week ML/AI Engineer Interview Bootcamp 🚀 led by ML Tech Leads at FAANGs

Back to Questions

4. Text normalization

medium
GeneralGeneral
staff

Text normalization is a common NLP preprocessing step that converts raw text into a consistent, “clean” form so downstream models see fewer superficial variations. In this task, you’ll implement a simple normalizer with a few standard rules.

Requirements

Implement the function

python

Rules:

  • For each string, lowercase it.
  • Remove punctuation characters: .,!?;:\'"()-
  • Replace any sequence of whitespace (spaces, tabs, newlines) with a single space.
  • Trim leading and trailing spaces.
  • Report the normalization rate as: r=total removed characterstotal original charactersr = \frac{\text{total removed characters}}{\text{total original characters}} and return it along with the normalized texts as (normalized_texts, r).

Example

python

Output:

python
Input Signature
ArgumentType
textsnp.ndarray
Output Signature
Return NameType
valuetuple

Constraints

  • Input must be a NumPy array of strings.

  • Return tuple: (np.ndarray, float).

Hint 1

Start by iterating over each string in the input array.

Hint 2

Use a set of punctuation to remove and build cleaned_chars in one pass while lowercasing.

Hint 3

Use np.array to convert the list of normalized strings before returning.

Roles
ML Engineer
AI Engineer
Data Scientist
Quantitative Analyst
Companies
GeneralGeneral
Levels
staff
senior
entry
Tags
string-processing
text-normalization
whitespace-collapsing
character-filtering
13 people are solving this problem
Python LogoPython Editor
Ln 1, Col 1

Input Arguments

Edit values below to test with custom inputs

You need tolog in/sign upto run or submit