88. Detecting duplicate features

easy

General

senior

Detect duplicate features during feature engineering so you can safely drop redundant columns. Two features count as duplicates only if they are exactly equal in every row. Features that are perfectly correlated but not identical (for example, one column is a scaled copy of another) do not count.

The duplicate check for two feature columns x and y, each with n rows, is defined as:

x \equiv y \iff \forall i \in \{1,\dots,n\},\; x_i = y_i
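
As a minimal illustration of this definition, the check for a single pair of columns can be written with np.array_equal. The arrays below are made-up data, not part of the original problem. Note that NaN never compares equal to NaN by default, so columns containing NaN are not duplicates under a strict reading of the definition; whether the problem intends that is not stated.

```python
import numpy as np

# Hypothetical columns used only to illustrate the definition.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
z = np.array([2.0, 4.0, 6.0])  # a scaled copy of x, so not equal to x

same_xy = np.array_equal(x, y)  # True: every row matches
same_xz = np.array_equal(x, z)  # False: the values differ

# Equivalent explicit form of "for all i, x_i == y_i".
same_xy_explicit = bool(np.all(x == y))

# NaN is not equal to NaN, so these two columns do not match
# with the default equal_nan=False.
w1 = np.array([1.0, np.nan])
w2 = np.array([1.0, np.nan])
same_w = np.array_equal(w1, w2)  # False
```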

Requirements

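Implement a function that takes X and feature_names (see Input Signature) and returns the duplicate pairs (see Output Signature).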

Rules:

Two features are duplicates only if their entire columns match exactly across all rows.
Return all duplicate pairs (feature_names[i], feature_names[j]) for i < j that are duplicates.
Use only NumPy and Python built-in libraries (no pandas).
Keep the implementation in a single Python function.
Key considerations: compare columns efficiently, keep output ordering stable, avoid converting to strings for comparison, handle multiple duplicates of the same feature, and return names (not indices).

Example

python

Output:

python

Input Signature

Argument	Type
X	np.ndarray
feature_names	np.ndarray

Output Signature

Return Name	Type
value	np.ndarray

Constraints

Use NumPy; no pandas
Return pairs in i<j scan order
Compare columns exactly; no string conversion
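
The i<j scan order is the order produced by two nested loops, which is also the lexicographic order that itertools.combinations yields. A small sketch for a hypothetical feature count of four:

```python
from itertools import combinations

n_features = 4  # hypothetical number of columns

# Nested-loop scan: outer index i, inner index j > i.
nested = [(i, j) for i in range(n_features) for j in range(i + 1, n_features)]

# itertools.combinations emits the same pairs in the same order.
combo = list(combinations(range(n_features), 2))

print(nested)
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```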

Hint 1

Slice columns directly from X as X[:, i].

Hint 2

Scan feature pairs in stable order with nested loops (i < j) and compare full columns.

Hint 3

Use np.array_equal(X[:, i], X[:, j]) for exact, row-wise identity; append [feature_names[i], feature_names[j]] immediately to preserve discovery order.
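
Putting the three hints together, one possible sketch looks like the following. The function name find_duplicate_features, the sample data, and the shape of the empty result are assumptions made for illustration; the original signature and example are not shown in the text above. X is assumed to be a 2-D array with one column per feature.

```python
import numpy as np


def find_duplicate_features(X, feature_names):
    """Return [name_i, name_j] pairs (i < j) whose columns are identical.

    The name and the empty-result shape (0, 2) are assumptions.
    """
    X = np.asarray(X)
    n_features = X.shape[1]
    pairs = []
    for i in range(n_features):
        col_i = X[:, i]  # Hint 1: slice the column directly
        for j in range(i + 1, n_features):  # Hint 2: stable i < j scan
            # Hint 3: exact comparison of the full columns
            if np.array_equal(col_i, X[:, j]):
                pairs.append([feature_names[i], feature_names[j]])
    if not pairs:
        # Keep a consistent 2-column shape when nothing matches.
        return np.empty((0, 2), dtype=np.asarray(feature_names).dtype)
    return np.array(pairs)


# Hypothetical data: columns "a", "b", and "d" are identical.
X = np.array([[1, 1, 0, 1],
              [2, 2, 5, 2],
              [3, 3, 7, 3]])
names = np.array(["a", "b", "c", "d"])
result = find_duplicate_features(X, names)
print(result.tolist())
# [['a', 'b'], ['a', 'd'], ['b', 'd']]
```

With three identical columns, all three pairs are reported, which is one reading of "handle multiple duplicates of the same feature". The nested scan performs up to n(n-1)/2 column comparisons, which is usually acceptable for moderate feature counts.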

Roles

ML Engineer

AI Engineer

Companies

General

Levels

senior

entry
