12. Information gain

medium
General
senior

Compute the information gain for a candidate split in a decision tree: it measures how much uncertainty about the labels is reduced by the split. You’ll implement entropy and use it to score a split given arrays of feature values and labels.

The Information Gain for a binary split is defined as:

IG(Y) = H(Y) - \left( \frac{n_{left}}{n} H(Y_{left}) + \frac{n_{right}}{n} H(Y_{right}) \right)

where H(Y) is the entropy of the labels.
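
For instance (illustrative values), if y = [0, 0, 1, 1] and the split produces a pure left group [0, 0] and a pure right group [1, 1], then H(Y) = 1 bit, H(Y_{left}) = H(Y_{right}) = 0, and IG(Y) = 1 - (\frac{2}{4} \cdot 0 + \frac{2}{4} \cdot 0) = 1.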

Requirements

Implement the function described by the Input Signature and Output Signature below; a reference sketch that follows the rules is included after the list.

Rules:

  • Use entropy H(S) = -\sum_{c} p(c) \log_2 p(c) (ignore terms where p(c) = 0).
  • Split into left group L = \{i : x_i \le t\} and right group R = \{i : x_i > t\}.
  • Use only NumPy.
  • Return a single float.
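
A minimal reference sketch consistent with these rules; the function name information_gain is an assumption, since the required name is not shown here:

python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    # Shannon entropy in bits; an empty group contributes zero entropy.
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))  # every p is non-zero, so log2 is safe

def information_gain(x: np.ndarray, y: np.ndarray, threshold: float) -> float:
    # Split labels by x <= threshold (left) vs. x > threshold (right).
    mask = x <= threshold
    y_left, y_right = y[mask], y[~mask]
    n = y.size
    weighted = y_left.size / n * entropy(y_left) + y_right.size / n * entropy(y_right)
    return float(entropy(y) - weighted)

Because entropy only iterates over classes that actually appear in a group, the p(c) = 0 terms are ignored implicitly, as the rules require.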

Example

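An illustrative call (the values are not the platform’s own test case) using the information_gain sketch shown after the rules, reproducing the split from the worked example above:

python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

print(information_gain(x, y, threshold=2.5))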

Output:

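For the illustrative input above, the threshold 2.5 separates the two classes perfectly, so the gain equals the parent entropy:

python
1.0
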
Input Signature (argument: type)

  • x: np.ndarray
  • y: np.ndarray
  • threshold: float

Output Signature (return name: type)

  • value: float

Constraints

  • Use NumPy.
  • Return a float.

Hint 1

Use np.unique(y, return_counts=True) to get class counts for entropy calculation.
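
For example (values are illustrative):

python
import numpy as np

y = np.array([0, 0, 1, 1, 1])                 # example labels
_, counts = np.unique(y, return_counts=True)  # counts = [2, 3]
probs = counts / counts.sum()                 # probs  = [0.4, 0.6]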

Hint 2

Entropy is -sum(p * log2(p)). Handle p=0 implicitly by only iterating over non-zero counts.
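
Continuing the sketch above, probs already comes from non-zero counts, so no explicit zero check is needed:

python
import numpy as np

probs = np.array([0.4, 0.6])            # non-zero class probabilities
h = -np.sum(probs * np.log2(probs))     # ~0.971 bits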

Hint 3

Use boolean masking mask = x <= threshold to split y into y[mask] and y[~mask].
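
For instance (illustrative values):

python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
mask = x <= 2.5
y_left, y_right = y[mask], y[~mask]     # [0, 0] and [1, 1]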

Roles: ML Engineer, AI Engineer, Data Scientist, Quantitative Analyst
Companies: General
Levels: senior, entry
Tags: entropy, information-gain, decision-trees, numpy