
86. Feature Selection: Variance Threshold

Difficulty: easy

Selecting useful features is a key step in feature engineering, and one simple baseline is to drop features that barely change across samples. In this task, you’ll implement variance-threshold feature selection for a tabular dataset represented as a NumPy array.

Requirements

Implement the following function.
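
A plausible stub, with types taken from the input/output signature tables below; the name `select_features` itself is an assumption, since the original stub is not shown:

```python
import numpy as np

def select_features(X: np.ndarray, threshold: float) -> np.ndarray:
    """Return X with low-variance columns removed."""
    ...
```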
Rules (a sketch implementing them follows this list):

  • Compute the variance per feature (per column) across all samples.
  • Keep a feature only if its variance is strictly greater than threshold.
  • Return the filtered dataset as a NumPy array (same row order, fewer columns).
  • Do not use prebuilt feature selection utilities (e.g., sklearn.feature_selection.VarianceThreshold).
  • Key considerations: vectorize with NumPy, treat columns independently, preserve original column order.
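
A minimal vectorized sketch of these rules, reusing the hypothetical `select_features` signature from above:

```python
import numpy as np

def select_features(X: np.ndarray, threshold: float) -> np.ndarray:
    # Coerce to a float ndarray so variance math is well-defined.
    X = np.asarray(X, dtype=float)
    # Population variance (ddof=0, NumPy's default) per column.
    variances = np.var(X, axis=0)
    # Keep columns whose variance is strictly greater than the threshold.
    mask = variances > threshold
    # Boolean column slicing preserves row order and original column order.
    return X[:, mask]
```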

Example

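An illustrative input, assuming the hypothetical `select_features` signature above (the per-column population variances here are 0, 8/3, and 0, so with threshold=0.5 only the middle column is kept):

```python
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 3.0],
              [1.0, 6.0, 3.0]])
print(select_features(X, threshold=0.5))
```

Output:

```python
[[2.]
 [4.]
 [6.]]
```
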
Input Signature

  Argument    Type
  X           np.ndarray
  threshold   float

Output Signature

  Return Name   Type
  value         np.ndarray

Constraints

  • Return np.ndarray.
  • Variance uses 1/n (ddof=0).
  • Keep columns where variance > threshold.

Hint 1

Convert X to a NumPy array and compute per-column variance with axis=0 (use the population variance definition, i.e., divide by n).
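
In NumPy terms (assuming X may arrive as a nested list):

```python
variances = np.var(np.asarray(X, dtype=float), axis=0)  # ddof=0 is the default
```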

Hint 2

Create a boolean mask for columns where var > threshold (strictly greater), then slice X[:, mask] to obtain the filtered array.
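
A sketch of this step, with variances from Hint 1:

```python
mask = variances > threshold  # one boolean per column; strict inequality
X_filtered = X[:, mask]       # rows untouched, column order preserved
```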

Hint 3

Handle edge cases: if the mask selects no columns, X[:, mask] already yields an array of shape (n_samples, 0), which is the correct empty result; if every column passes, the slice returns X unchanged. The return type is np.ndarray in both cases.
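
No special-casing is needed, since boolean column slicing already handles the empty selection:

```python
import numpy as np

X = np.ones((4, 3))              # every column is constant
mask = np.var(X, axis=0) > 0.0   # variance 0 is not > 0, so all False
print(X[:, mask].shape)          # (4, 0): an empty but valid ndarray
```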

Roles: ML Engineer, AI Engineer
Companies: General
Levels: senior, entry
Tags: feature-selection, variance, numpy, tabular-data