Compute the Expected Calibration Error (ECE) to measure how well a classifier’s predicted probabilities match actual outcomes. ECE bins predictions by confidence and compares average confidence to empirical accuracy in each bin.
The metric is defined as:

\[
\mathrm{ECE} = \sum_{b=1}^{B} \frac{|S_b|}{n}\,\bigl|\,\mathrm{acc}(S_b) - \mathrm{conf}(S_b)\,\bigr|
\]

where \(S_b\) is the set of indices in bin \(b\), \(B\) is the number of bins, \(n\) is the total number of samples, \(\mathrm{acc}(S_b)\) is the average correctness in the bin, and \(\mathrm{conf}(S_b)\) is the average predicted confidence in the bin.
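As a quick worked example (the numbers here are illustrative, not part of the problem statement): with two equal-width bins, `y_prob = [0.3, 0.6, 0.8, 0.9]` and `y_true = [0, 1, 0, 1]`, the bin \([0, 0.5)\) holds one sample (correct, confidence 0.3) and the bin \([0.5, 1]\) holds three (two correct, average confidence \(2.3/3\)), giving

\[
\mathrm{ECE} = \frac{1}{4}\,\lvert 1 - 0.3 \rvert + \frac{3}{4}\,\Bigl\lvert \tfrac{2}{3} - \tfrac{0.6 + 0.8 + 0.9}{3} \Bigr\rvert = 0.175 + 0.075 = 0.25.
\]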
Implement the function:
Rules:
- A prediction counts as correct when `y_true == (y_prob >= 0.5)` (i.e., the predicted label uses a 0.5 threshold).
- Confidence for a bin is the average of `y_prob` in that bin.
- Output: return the ECE as a single float.
| Argument | Type |
|---|---|
| n_bins | int |
| y_prob | np.ndarray |
| y_true | np.ndarray |
| Return Name | Type |
|---|---|
| value | float |
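A minimal signature sketch consistent with these tables (the function name `expected_calibration_error` and the parameter order are assumptions, not given by the problem):

```python
import numpy as np

def expected_calibration_error(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int) -> float:
    """Return the ECE of predicted probabilities y_prob against labels y_true."""
    # Name and argument order are assumptions; types follow the tables above.
    ...
```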
Use NumPy only; no sklearn/scipy.
Equal-width bins on [0,1].
Threshold predictions at 0.5 for accuracy.
Start by converting y_true and y_prob to NumPy arrays, then create n_bins+1 equally spaced edges in [0,1] using np.linspace.
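For instance, a small sketch of this step (values chosen only for illustration):

```python
import numpy as np

# n_bins + 1 equally spaced edges covering [0, 1]
n_bins = 5
edges = np.linspace(0.0, 1.0, n_bins + 1)
# edges -> array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
```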
Assign each probability to a bin index (0..n_bins-1). Watch the edge case: y_prob == 1.0 must fall into the last bin rather than overflow past it. np.digitize plus clamping handles this well.
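One way to do the binning, shown as a sketch (the clamp keeps y_prob == 1.0 in bin n_bins - 1):

```python
import numpy as np

y_prob = np.array([0.0, 0.19, 0.5, 0.99, 1.0])
n_bins = 5
edges = np.linspace(0.0, 1.0, n_bins + 1)

# np.digitize returns 1-based indices i with edges[i-1] <= x < edges[i];
# subtract 1 for 0-based bins, then clamp so exactly 1.0 stays in the last bin.
bin_idx = np.digitize(y_prob, edges) - 1
bin_idx = np.clip(bin_idx, 0, n_bins - 1)
# bin_idx -> array([0, 0, 2, 4, 4])
```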
For each bin: compute count, acc = mean(y_true == (y_prob>=0.5)), and conf = mean(y_prob). Accumulate ece += (count/n) * abs(acc-conf); skip empty bins.
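Putting the steps together, a minimal sketch (again, the function name and parameter order are assumptions made for this illustration):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins: int) -> float:
    """ECE with equal-width bins on [0, 1] and a 0.5 decision threshold.

    Note: the function name and parameter order are assumptions for this sketch.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    n = y_prob.shape[0]

    # Equal-width bin edges and a 0-based bin index per sample;
    # clamp so that y_prob == 1.0 lands in the last bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)

    # Correctness of each prediction under the 0.5 threshold.
    correct = (y_prob >= 0.5) == y_true

    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        count = mask.sum()
        if count == 0:                  # skip empty bins
            continue
        acc = correct[mask].mean()      # empirical accuracy in the bin
        conf = y_prob[mask].mean()      # average confidence in the bin
        ece += (count / n) * abs(acc - conf)
    return float(ece)
```

On the toy example from earlier (`y_true = [0, 1, 0, 1]`, `y_prob = [0.3, 0.6, 0.8, 0.9]`, `n_bins = 2`), this sketch returns 0.25, matching the hand calculation.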