
86. Feature Selection: Variance Threshold

Difficulty: easy

Selecting useful features is a key step in feature engineering, and one simple baseline is to drop features that barely change across samples. In this task, you’ll implement variance-threshold feature selection for a tabular dataset represented as a NumPy array.

Requirements

Implement the following function.
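
A plausible stub, with types taken from the input/output signature tables below; the name `select_features` itself is an assumption, since the original stub is not shown:

```python
import numpy as np

def select_features(X: np.ndarray, threshold: float) -> np.ndarray:
    """Return X with low-variance columns removed."""
    ...
```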
Rules (a sketch implementing them follows this list):

  • Compute the variance per feature (per column) across all samples.
  • Keep a feature only if its variance is strictly greater than threshold.
  • Return the filtered dataset as a NumPy array (same row order, fewer columns).
  • Do not use prebuilt feature selection utilities (e.g., sklearn.feature_selection.VarianceThreshold).
  • Key considerations: vectorize with NumPy, treat columns independently, preserve original column order.
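
A minimal vectorized sketch of these rules, reusing the hypothetical `select_features` signature from above:

```python
import numpy as np

def select_features(X: np.ndarray, threshold: float) -> np.ndarray:
    # Coerce to a float ndarray so variance math is well-defined.
    X = np.asarray(X, dtype=float)
    # Population variance (ddof=0, NumPy's default) per column.
    variances = np.var(X, axis=0)
    # Keep columns whose variance is strictly greater than the threshold.
    mask = variances > threshold
    # Boolean column slicing preserves row order and original column order.
    return X[:, mask]
```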

Example

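An illustrative input, assuming the hypothetical `select_features` signature above (the per-column population variances here are 0, 8/3, and 0, so with threshold=0.5 only the middle column is kept):

```python
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 3.0],
              [1.0, 6.0, 3.0]])
print(select_features(X, threshold=0.5))
```

Output:

```python
[[2.]
 [4.]
 [6.]]
```
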
Input Signature

  Argument    Type
  X           np.ndarray
  threshold   float

Output Signature

  Return Name   Type
  value         np.ndarray

Constraints

  • Return np.ndarray.
  • Variance uses 1/n (ddof=0).
  • Keep columns where variance > threshold.

Hint 1

Convert X to a NumPy array and compute per-column variance with axis=0 (use the population variance definition, i.e., divide by n).
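
In NumPy terms (assuming X may arrive as a nested list):

```python
variances = np.var(np.asarray(X, dtype=float), axis=0)  # ddof=0 is the default
```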

Hint 2

Create a boolean mask for columns where var > threshold (strictly greater), then slice X[:, mask] to obtain the filtered array.
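
A sketch of this step, with variances from Hint 1:

```python
mask = variances > threshold  # one boolean per column; strict inequality
X_filtered = X[:, mask]       # rows untouched, column order preserved
```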

Hint 3

Handle edge cases: if the mask selects no columns, X[:, mask] already yields an array of shape (n_samples, 0), which is the correct empty result; if every column passes, the slice returns X unchanged. The return type is np.ndarray in both cases.
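
No special-casing is needed, since boolean column slicing already handles the empty selection:

```python
import numpy as np

X = np.ones((4, 3))              # every column is constant
mask = np.var(X, axis=0) > 0.0   # variance 0 is not > 0, so all False
print(X[:, mask].shape)          # (4, 0): an empty but valid ndarray
```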

Roles: ML Engineer, AI Engineer
Companies: General
Levels: senior, entry
Tags: feature-selection, variance, numpy, tabular-data