Compute the information gain of a candidate split in a decision tree: how much uncertainty about the labels is reduced by splitting. You'll implement entropy and use it to score a split given arrays of feature values and labels.
The information gain for a binary split is defined as:

$$\mathrm{IG}(y,\; x \le t) = H(y) - \frac{n_{\text{left}}}{n}\,H(y_{\text{left}}) - \frac{n_{\text{right}}}{n}\,H(y_{\text{right}})$$

where $H(y) = -\sum_k p_k \log_2 p_k$ is the entropy of the labels, $n$ is the total number of samples, and $y_{\text{left}}$, $y_{\text{right}}$ are the labels on the two sides of the split (with sizes $n_{\text{left}}$ and $n_{\text{right}}$).
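For instance, splitting $x = [1, 2, 3, 4, 5, 6]$ with labels $y = [0, 0, 1, 1, 1, 1]$ at threshold $t = 2$ sends both 0-labels left and all four 1-labels right, so each side is pure:

$$H(y) = -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918, \qquad H(y_{\text{left}}) = H(y_{\text{right}}) = 0, \qquad \mathrm{IG} \approx 0.918.$$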
Implement the function with the inputs and output below.

Input:

| Argument | Type |
|---|---|
| x | np.ndarray |
| y | np.ndarray |
| threshold | float |

Output:

| Return Name | Type |
|---|---|
| value | float |

Rules:

- Use NumPy.
- Return a float.
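Taken together, the tables imply a signature along these lines; the name `information_gain` is assumed here, since the statement does not fix one:

```python
import numpy as np

def information_gain(x: np.ndarray, y: np.ndarray, threshold: float) -> float:
    """Information gain of splitting labels y by the test x <= threshold.

    Note: `information_gain` is a placeholder name; the problem statement
    does not specify one.
    """
    raise NotImplementedError  # see the reference sketch below
```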
Use np.unique(y, return_counts=True) to get class counts for entropy calculation.
Entropy is -sum(p * log2(p)). Handle p=0 implicitly by only iterating over non-zero counts.
Use boolean masking mask = x <= threshold to split y into y[mask] and y[~mask].
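Putting the hints together, a minimal reference sketch might look like the following (again assuming the placeholder name `information_gain`):

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """Shannon entropy of a label array in bits; 0.0 for an empty array."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)  # only non-zero counts
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(x: np.ndarray, y: np.ndarray, threshold: float) -> float:
    """Information gain of splitting labels y by the test x <= threshold."""
    mask = x <= threshold                 # boolean mask for the left side
    y_left, y_right = y[mask], y[~mask]
    n = y.size
    weighted = (y_left.size / n) * entropy(y_left) + (y_right.size / n) * entropy(y_right)
    return float(entropy(y) - weighted)

# Example (matches the worked example above): prints roughly 0.918
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 1, 1, 1, 1])
print(information_gain(x, y, threshold=2.0))
```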