Optimize a binary-classification decision threshold to maximize an evaluation metric on a validation set, a common step in ML evaluation when converting predicted probabilities into hard labels.
Given predicted probabilities $p_i$ and a threshold $t$, the predicted label is $\hat{y}_i = \mathbb{1}[p_i \ge t]$, and you should choose the $t$ that maximizes the F1 score:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
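As a quick illustration of these definitions, here is a minimal NumPy sketch (the array values are made up for demonstration):

```python
import numpy as np

# Hypothetical example values, for illustration only.
y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.4, 0.6, 0.3, 0.5])
t = 0.5

y_pred = (y_prob >= t).astype(int)  # hard labels: 1[p_i >= t]

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
```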
Implement the function.

Rules:

- Consider candidate thresholds from the unique values of y_prob (plus 0.0 and 1.0 if you want), and pick the one with the highest F1.
- Use only NumPy (no sklearn.metrics).
- Return (best_threshold, best_f1) as Python floats.

Input:
| Argument | Type |
|---|---|
| y_prob | np.ndarray |
| y_true | np.ndarray |
Output:

| Return Name | Type |
|---|---|
| value | tuple |
Constraints:

- Use only NumPy; no sklearn metrics/utilities.
- Take thresholds from the unique values of y_prob; on F1 ties, choose the smallest threshold.
- Return Python floats (best_threshold, best_f1).
- Computations should handle NumPy array inputs.
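One way to satisfy these constraints is a brute-force scan over the unique probabilities. A minimal sketch follows, with `max_f1_threshold` as a hypothetical name since the statement does not fix one:

```python
import numpy as np

def max_f1_threshold(y_prob: np.ndarray, y_true: np.ndarray) -> tuple:
    """Brute-force O(n * k) sketch: evaluate F1 at every unique probability."""
    best_t, best_f1 = 0.0, -1.0
    # np.unique returns values sorted ascending, so scanning in order with a
    # strict '>' comparison keeps the smallest threshold on F1 ties.
    for t in np.unique(y_prob):
        y_pred = (y_prob >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
        if f1 > best_f1:  # strict: earlier (smaller) threshold wins ties
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1
```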
Approach:

- Sort by y_prob (descending) and sweep the threshold from high to low, updating TP/FP counts as more items become predicted-positive.
- Process equal-probability values as a group (since p >= t includes every item with p == t). Update TP/FP per group and compute precision/recall/F1 with divide-by-zero guards. A sketch of this sweep follows.
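Here is a sketch of that grouped sweep, again using the hypothetical name `max_f1_threshold` and assuming 0/1 integer labels:

```python
import numpy as np

def max_f1_threshold(y_prob: np.ndarray, y_true: np.ndarray) -> tuple:
    """O(n log n) sweep: sort descending, grow the predicted-positive set."""
    order = np.argsort(-y_prob)      # indices sorted by probability, descending
    p_sorted = y_prob[order]
    y_sorted = y_true[order]

    total_pos = int(np.sum(y_true))  # TP + FN is constant for every threshold
    tp, fp = 0, 0
    best_t, best_f1 = 0.0, -1.0

    i, n = 0, len(p_sorted)
    while i < n:
        t = p_sorted[i]
        # Group all items tied at this probability: p >= t includes every p == t.
        j = i
        while j < n and p_sorted[j] == t:
            tp += int(y_sorted[j])
            fp += 1 - int(y_sorted[j])
            j += 1
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / total_pos if total_pos > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
        # '>=' lets a later (smaller) threshold with equal F1 replace the earlier
        # one, matching the "ties choose smallest threshold" rule.
        if f1 >= best_f1:
            best_t, best_f1 = float(t), float(f1)
        i = j
    return best_t, best_f1
```

The grouping matters because raising the threshold past one tied value but not another is impossible under p >= t; flipping a whole tie group at once keeps TP/FP consistent with the actual predictions at that threshold.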