Join Our 5-Week ML/AI Engineer Interview Bootcamp 🚀 led by ML Tech Leads at FAANGs
Text normalization is a common NLP preprocessing step that converts raw text into a consistent, “clean” form so downstream models see fewer superficial variations. In this task, you’ll implement a simple normalizer with a few standard rules.
Implement the function
Rules:
.,!?;:\'"()-(normalized_texts, r).Output:
| Argument | Type |
|---|---|
| texts | np.ndarray |
| Return Name | Type |
|---|---|
| value | tuple |
Input must be a NumPy array of strings.
Return tuple: (np.ndarray, float).
Start by iterating over each string in the input array.
Use a set of punctuation to remove and build cleaned_chars in one pass while lowercasing.
Use np.array to convert the list of normalized strings before returning.