Join Our 5-Week ML/AI Engineer Interview Bootcamp 🚀 led by ML Tech Leads at FAANGs

Back to Questions

88. Detecting duplicate features

easy
GeneralGeneral
senior

Detect duplicate (or perfectly correlated) features during feature engineering so you can safely drop redundant columns. You’ll treat two features as duplicates if they are exactly equal for every row.

The duplicate check is defined as:

x≡y  ⟺  ∀i∈{1,…,n},  xi=yix \equiv y \iff \forall i \in \{1,\dots,n\},\; x_i = y_i

Requirements

Implement the function

python

Rules:

  • Two features are duplicates only if their entire columns match exactly across all rows.
  • Return all duplicate pairs (feature_names[i], feature_names[j]) for i < j that are duplicates.
  • Use only NumPy and Python built-in libraries (no pandas).
  • Keep the implementation in a single Python function.
  • Key considerations: compare columns efficiently, keep output ordering stable, avoid converting to strings for comparison, handle multiple duplicates of the same feature, and return names (not indices).

Example

python

Output:

python
Input Signature
ArgumentType
Xnp.ndarray
feature_namesnp.ndarray
Output Signature
Return NameType
valuenp.ndarray

Constraints

  • Use NumPy; no pandas

  • Return pairs in i<j scan order

  • Compare columns exactly; no string conversion

Hint 1

Slice columns directly from X as X[:, i].

Hint 2

Scan feature pairs in stable order with nested loops (i < j) and compare full columns.

Hint 3

Use np.array_equal(X[:, i], X[:, j]) for exact, row-wise identity; append [feature_names[i], feature_names[j]] immediately to preserve discovery order.

Roles
ML Engineer
AI Engineer
Companies
GeneralGeneral
Levels
senior
entry
Tags
numpy
feature-engineering
duplicate-columns
array-equality
26 people are solving this problem
Python LogoPython Editor
Ln 1, Col 1

Input Arguments

Edit values below to test with custom inputs

You need tolog in/sign upto run or submit