Feature correlation filtering is a simple feature-engineering step that removes redundant input columns that are strongly correlated with each other. You'll implement a function that drops features whose absolute correlation with any already-kept feature exceeds a threshold, using Pearson correlation.
The Pearson correlation between two features $x$ and $y$ is:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
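As a quick sanity check, here is a minimal NumPy sketch (the values are made up) that evaluates the formula directly and compares it against `np.corrcoef`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.0, 8.2, 10.0])

# Pearson correlation straight from the formula above.
xc, yc = x - x.mean(), y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r_xy.
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)  # the two values agree up to floating-point error
```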
Implement the function:
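The exact signature is not reproduced here; a reasonable stub, with the hypothetical name `filter_correlated_features` taken from the argument and return tables below, might look like:

```python
import numpy as np

def filter_correlated_features(X: np.ndarray, threshold: float, feature_names: list) -> list:
    # Hypothetical name and signature, inferred from the tables below.
    ...
```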
Rules:

- Compute pairwise Pearson correlations between the columns of X.
- Compare abs(corr) to the threshold; a feature is dropped when it exceeds the threshold with any already-kept feature.
- Return the kept feature_names (don't return indices or a modified X).
| Argument | Type |
|---|---|
| X | np.ndarray |
| threshold | float |
| feature_names | list |

Output:

| Return Name | Type |
|---|---|
| value | list |
- Use NumPy only; no feature-selection utilities.
- Return kept feature_names; preserve original order.
- Drop if abs(Pearson corr) > threshold (see the toy example after this list).
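For intuition, here is a small hand-checked example (feature names and values are made up): the third column is an exact multiple of the first, so its absolute correlation with an already-kept feature is 1.0 and it should be dropped, while the surviving names keep their original order.

```python
import numpy as np

X = np.array([
    [1.0, 10.0, 2.0],
    [2.0,  9.5, 4.0],
    [3.0, 11.0, 6.0],
    [4.0,  9.0, 8.0],
])
feature_names = ["a", "b", "c"]
threshold = 0.9

corr = np.corrcoef(X, rowvar=False)
print(np.round(np.abs(corr), 3))
# Column "c" is 2 * column "a", so |corr(a, c)| == 1.0 > 0.9 -> drop "c".
# Column "b" is only weakly correlated with "a" (about 0.23), so it is kept.
# Expected result for this exercise: ["a", "b"], in the original order.
```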
Convert X to a NumPy float array so you can compute correlations column-wise.
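A one-line sketch of that conversion (assuming X may arrive as a nested list or an integer array):

```python
import numpy as np

X = [[1, 10], [2, 9], [3, 11]]   # e.g. a nested list of ints
X = np.asarray(X, dtype=float)   # now a (3, 2) float ndarray, ready for column-wise stats
```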
Use np.corrcoef(X, rowvar=False) to get an n_features × n_features Pearson correlation matrix; compare abs(corr[i, j]) to the threshold.
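A minimal sketch of that step, using random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # 50 samples, 4 features
corr = np.corrcoef(X, rowvar=False)   # (4, 4) Pearson correlation matrix
mask = np.abs(corr) > 0.9             # True where a pair of features is "too correlated"
print(corr.shape, mask[0, 1])
```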
Apply a greedy left-to-right rule: maintain a list of kept feature indices; for each new feature, drop it if it exceeds the threshold with any previously kept feature, otherwise keep it and continue.
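Putting the hints together, a complete sketch might look like the following; the function name `filter_correlated_features` is assumed (matching the stub above), not given by the problem statement:

```python
import numpy as np

def filter_correlated_features(X, threshold, feature_names):
    """Greedily keep features whose |Pearson corr| with every kept feature is <= threshold.

    Hypothetical name/signature; returns the kept feature names in their original order.
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))   # (n_features, n_features)

    kept = []                                     # indices of features kept so far
    for j in range(X.shape[1]):
        # Drop feature j if it exceeds the threshold with any previously kept feature.
        if any(corr[j, i] > threshold for i in kept):
            continue
        kept.append(j)

    return [feature_names[j] for j in kept]


# Toy usage: column "c" duplicates "a" (|corr| = 1.0), so it is dropped.
X = [[1, 10, 2], [2, 9.5, 4], [3, 11, 6], [4, 9, 8]]
print(filter_correlated_features(X, 0.9, ["a", "b", "c"]))  # expected: ['a', 'b']
```

Checking a candidate only against previously kept features (rather than all features) is what makes the rule greedy and order-dependent: the first feature always survives, and the kept names naturally come out in their original order.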