Detect duplicate (or perfectly correlated) features during feature engineering so you can safely drop redundant columns. You’ll treat two features as duplicates if they are exactly equal for every row.
The duplicate check is defined as:

$$\text{duplicate}(i, j) \iff X_{r,i} = X_{r,j} \ \text{for every row } r$$
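The exact-equality definition can be illustrated on a single pair of columns. The snippet below is a minimal sketch; the toy matrix is made up for the example, and it contrasts a true duplicate with a column that is perfectly correlated but not equal row by row.

```python
import numpy as np

# Toy data (made up for illustration): columns 0 and 2 match in every row.
X = np.array([
    [1.0, 2.0, 1.0],
    [3.0, 6.0, 3.0],
    [5.0, 10.0, 5.0],
])

# Exact duplicates: every row agrees.
same_0_2 = np.array_equal(X[:, 0], X[:, 2])

# Column 1 is 2 * column 0: perfectly correlated, but not equal row by row,
# so it does not count as a duplicate under the exact-equality definition.
same_0_1 = np.array_equal(X[:, 0], X[:, 1])

print(same_0_2, same_0_1)  # True False
```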
Implement the function
Return the pairs (feature_names[i], feature_names[j]) for i < j that are duplicates.

| Argument | Type |
|---|---|
| X | np.ndarray |
| feature_names | np.ndarray |

Output:

| Return Name | Type |
|---|---|
| value | np.ndarray |

Rules:
- Use NumPy; no pandas
- Return pairs in i < j scan order
- Compare columns exactly; no string conversion
Slice columns directly from X as X[:, i].
Scan feature pairs in stable order with nested loops (i < j) and compare full columns.
Use np.array_equal(X[:, i], X[:, j]) for exact, row-wise identity; append [feature_names[i], feature_names[j]] immediately to preserve discovery order.
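Put together, the hints above might look like the following sketch. The function name `find_duplicate_features` is hypothetical (the problem statement's signature is not shown here), and returning an empty array of shape (0, 2) when no duplicates exist is an assumption, not something the problem specifies.

```python
import numpy as np


def find_duplicate_features(X, feature_names):
    """Return [name_i, name_j] pairs (i < j) whose columns are exactly equal.

    Hypothetical name and signature, chosen for illustration only.
    """
    feature_names = np.asarray(feature_names)
    n_features = X.shape[1]
    pairs = []

    # Nested loops with i < j give a stable scan order.
    for i in range(n_features - 1):
        for j in range(i + 1, n_features):
            # Exact, row-wise comparison of the two full columns.
            if np.array_equal(X[:, i], X[:, j]):
                # Append immediately to preserve discovery order.
                pairs.append([feature_names[i], feature_names[j]])

    # Assumption: with no duplicates, return an empty (0, 2) array.
    if not pairs:
        return np.empty((0, 2), dtype=feature_names.dtype)
    return np.array(pairs)


# Usage on made-up data: columns a/c and b/d are identical.
X_demo = np.array([[1, 2, 1, 2],
                   [3, 4, 3, 4]])
names_demo = np.array(["a", "b", "c", "d"])
result = find_duplicate_features(X_demo, names_demo)
print(result.tolist())  # [['a', 'c'], ['b', 'd']]
```

One caveat worth knowing: by default np.array_equal treats NaN as unequal to NaN, so two columns with NaN in the same positions are not reported as duplicates unless equal_nan=True is passed (available in NumPy 1.19 and later). Whether that is desirable depends on how missing values should be handled.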