12. Information gain

medium
General
senior

Compute the information gain for a candidate split in a decision tree: it measures how much uncertainty about the labels is reduced by the split. You’ll implement entropy and use it to score a split given arrays of feature values and labels.

The Information Gain for a binary split is defined as:

IG(Y) = H(Y) - \left( \frac{n_{left}}{n} H(Y_{left}) + \frac{n_{right}}{n} H(Y_{right}) \right)

where H(Y) is the entropy of the labels.
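
For instance (illustrative values), if y = [0, 0, 1, 1] and the split produces a pure left group [0, 0] and a pure right group [1, 1], then H(Y) = 1 bit, H(Y_{left}) = H(Y_{right}) = 0, and IG(Y) = 1 - (\frac{2}{4} \cdot 0 + \frac{2}{4} \cdot 0) = 1.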

Requirements

Implement the function described by the Input Signature and Output Signature below; a reference sketch that follows the rules is included after the list.

Rules:

  • Use entropy H(S) = -\sum_{c} p(c) \log_2 p(c) (ignore terms where p(c) = 0).
  • Split into left group L = \{i : x_i \le t\} and right group R = \{i : x_i > t\}.
  • Use only NumPy.
  • Return a single float.
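
A minimal reference sketch consistent with these rules; the function name information_gain is an assumption, since the required name is not shown here:

python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    # Shannon entropy in bits; an empty group contributes zero entropy.
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))  # every p is non-zero, so log2 is safe

def information_gain(x: np.ndarray, y: np.ndarray, threshold: float) -> float:
    # Split labels by x <= threshold (left) vs. x > threshold (right).
    mask = x <= threshold
    y_left, y_right = y[mask], y[~mask]
    n = y.size
    weighted = y_left.size / n * entropy(y_left) + y_right.size / n * entropy(y_right)
    return float(entropy(y) - weighted)

Because entropy only iterates over classes that actually appear in a group, the p(c) = 0 terms are ignored implicitly, as the rules require.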

Example

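An illustrative call (the values are not the platform’s own test case) using the information_gain sketch shown after the rules, reproducing the split from the worked example above:

python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

print(information_gain(x, y, threshold=2.5))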

Output:

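For the illustrative input above, the threshold 2.5 separates the two classes perfectly, so the gain equals the parent entropy:

python
1.0
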
Input Signature (argument: type)

  • x: np.ndarray
  • y: np.ndarray
  • threshold: float

Output Signature (return name: type)

  • value: float

Constraints

  • Use NumPy.
  • Return a float.

Hint 1

Use np.unique(y, return_counts=True) to get class counts for entropy calculation.
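
For example (values are illustrative):

python
import numpy as np

y = np.array([0, 0, 1, 1, 1])                 # example labels
_, counts = np.unique(y, return_counts=True)  # counts = [2, 3]
probs = counts / counts.sum()                 # probs  = [0.4, 0.6]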

Hint 2

Entropy is -sum(p * log2(p)). Handle p=0 implicitly by only iterating over non-zero counts.
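
Continuing the sketch above, probs already comes from non-zero counts, so no explicit zero check is needed:

python
import numpy as np

probs = np.array([0.4, 0.6])            # non-zero class probabilities
h = -np.sum(probs * np.log2(probs))     # ~0.971 bits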

Hint 3

Use boolean masking mask = x <= threshold to split y into y[mask] and y[~mask].
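
For instance (illustrative values):

python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
mask = x <= 2.5
y_left, y_right = y[mask], y[~mask]     # [0, 0] and [1, 1]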

Roles: ML Engineer, AI Engineer, Data Scientist, Quantitative Analyst
Companies: General
Levels: senior, entry
Tags: entropy, information-gain, decision-trees, numpy