113. Feature correlation filtering

medium
General
staff

Feature correlation filtering is a simple feature-engineering step that removes redundant input columns, that is, columns strongly correlated with one another. You'll implement a function that drops every feature whose absolute Pearson correlation with any already-kept feature exceeds a threshold.

The Pearson correlation between two features $x$ and $y$ is:

$$
\rho(x, y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
$$
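
To make the formula concrete, here is a minimal sketch that evaluates it term by term and compares the result with NumPy's built-in. The helper name `pearson` and the sample values are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two 1-D sequences (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))
    # Note: the denominator is zero for a constant input; not handled here.
    return numerator / denominator

# Made-up sample values for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.5]

r_manual = pearson(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2 x 2 matrix
```

Both values should agree up to floating-point rounding, which is why the hints below can rely on `np.corrcoef` instead of a hand-written loop.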

Requirements

Implement the function:

python

Rules:

  • Compute the feature-feature Pearson correlation matrix from X.
  • Use absolute correlation (i.e., compare abs(corr) to threshold).
  • Greedy rule: iterate features from left to right; keep a feature only if it is not too correlated with any previously kept feature.
  • Return only the kept feature_names (don't return indices or a modified X).
  • Don't use any prebuilt feature-selection utilities; just NumPy + basic Python.

Example

python

Output:

python

Input Signature

Argument        Type
X               np.ndarray
threshold       float
feature_names   list

Output Signature

Return Name     Type
value           list

Constraints

  • Use NumPy only; no feature-selection utilities.

  • Return kept feature_names; preserve original order.

  • Drop if abs(Pearson corr) > threshold.

Hint 1

Convert X to a NumPy float array so you can compute correlations column-wise.

Hint 2

Use np.corrcoef(X, rowvar=False) to get an n_features × n_features Pearson correlation matrix; compare abs(corr[i,j]) to the threshold.
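
As a quick illustration of this hint, the sketch below builds the correlation matrix for a small made-up array (5 samples, 3 features; the values are assumptions chosen for the example). With `rowvar=False`, NumPy treats each column as a variable, so the result is square in the number of features.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are features.
X = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.2, 1.0],
    [5.0, 9.9, 2.0],
])

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(X, rowvar=False)

# Only the magnitude matters for the threshold comparison.
abs_corr = np.abs(corr)

print(corr.shape)  # (3, 3)
```

The matrix is symmetric with ones on the diagonal, so only entries `abs_corr[i, j]` with `i != j` carry information for filtering.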

Hint 3

Apply a greedy left-to-right rule: maintain a list of kept feature indices; for each new feature, drop it if it exceeds the threshold with any previously kept feature, otherwise keep it and continue.
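
Putting the three hints together, one possible sketch of the greedy rule looks like this. The function name `filter_correlated_features` is hypothetical; the argument order follows the input signature (X, threshold, feature_names). It assumes X is 2-D with samples in rows, and it does not treat constant columns specially.

```python
import numpy as np

def filter_correlated_features(X, threshold, feature_names):
    """Greedy left-to-right correlation filter (hypothetical name)."""
    X = np.asarray(X, dtype=float)

    # Absolute feature-feature Pearson correlations.
    # atleast_2d covers the single-feature case, where corrcoef returns a scalar.
    corr = np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False)))

    kept = []  # indices of features kept so far, in original order
    for j in range(X.shape[1]):
        # Compare only against previously kept features.
        # A NaN correlation (constant column) compares False, so the feature is kept.
        if not any(corr[j, k] > threshold for k in kept):
            kept.append(j)

    return [feature_names[j] for j in kept]

# Made-up data: column "b" is exactly 2 * "a"; "c" has correlation -0.8 with "a".
X = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
    [5.0, 10.0, 2.0],
])

print(filter_correlated_features(X, 0.9, ["a", "b", "c"]))  # ['a', 'c']
```

Because a dropped feature never enters `kept`, later features are not compared against it, so the result depends on column order. Lowering the threshold to 0.7 in this example would also drop "c", since its absolute correlation with "a" is 0.8.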

Roles
ML Engineer
AI Engineer
Companies
General
Levels
staff
senior
entry
Tags
pearson-correlation
greedy-algorithm
feature-selection
numpy