Selecting useful features is a key step in feature engineering, and one simple baseline is to drop features that barely change across samples. In this task, you’ll implement variance-threshold feature selection for a tabular dataset represented as a NumPy array.
Implement a function that takes a data matrix `X` and a variance `threshold` and returns only the columns of `X` whose variance exceeds that threshold.
Rules:
- Variance uses 1/n (ddof=0), i.e., the population variance.
- Keep columns where variance > threshold (strictly greater), matching the cutoff rule of `sklearn.feature_selection.VarianceThreshold`.
- Return an `np.ndarray`.

Input:

| Argument | Type |
|---|---|
| X | np.ndarray |
| threshold | float |

Output:

| Return Name | Type |
|---|---|
| value | np.ndarray |
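As a concrete illustration (the numbers below are mine, not from the problem statement), consider a 3×3 matrix in which only the middle column varies. With `threshold=0.5`, the constant columns (variance 0) are dropped and only the middle column survives:

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.0],
              [1.0, 4.0, 0.0],
              [1.0, 6.0, 0.0]])

# Per-column population variances (ddof=0 is NumPy's default):
# column 0 -> 0.0, column 1 -> 8/3 ≈ 2.667, column 2 -> 0.0
variances = X.var(axis=0, ddof=0)

# Strictly-greater comparison against the threshold
mask = variances > 0.5          # [False, True, False]
selected = X[:, mask]           # shape (3, 1): the middle column
```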
Convert X to a NumPy array and compute per-column variances with `axis=0` (use the population variance definition, i.e., divide by n).
Create a boolean mask for columns where `var > threshold` (strictly greater), then slice with `X[:, mask]`.
The edge cases fall out naturally from the slicing: if no column passes, `X[:, mask]` is an empty array of shape `(n_samples, 0)`; if every column passes, it equals `X`.
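Putting the steps above together, a minimal sketch (the function name `variance_threshold_selection` is my choice; the problem statement does not fix one):

```python
import numpy as np

def variance_threshold_selection(X, threshold: float = 0.0) -> np.ndarray:
    """Keep only the columns of X whose population variance exceeds threshold."""
    X = np.asarray(X, dtype=float)       # accept lists or arrays
    variances = X.var(axis=0, ddof=0)    # per-column variance, divided by n
    mask = variances > threshold         # strictly greater than the threshold
    return X[:, mask]                    # (n_samples, 0) if no column passes
```

Because NumPy slicing with an all-False mask yields a `(n_samples, 0)` array and an all-True mask yields the full matrix, no special-case branches are needed.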