245. Threshold optimization

medium
General
staff

Optimize a binary-classification decision threshold to maximize an evaluation metric on a validation set, a common step in model evaluation when converting predicted probabilities into hard labels.

Given predicted probabilities \(p_i\) and a threshold \(t\), the predicted label is \(\hat{y}_i = \mathbb{1}[p_i \ge t]\). Choose the \(t\) that maximizes the F1 score:

\[
F1(t)=\frac{2\cdot \text{Precision}(t)\cdot \text{Recall}(t)}{\text{Precision}(t)+\text{Recall}(t)}
\]
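
As a minimal sketch of the formula above, the snippet below evaluates F1 at a single threshold with plain NumPy. The helper name `f1_at_threshold` is hypothetical, and returning 0.0 when F1 is undefined (no true positives, false positives, or false negatives) is an assumed convention that the problem statement does not specify.

```python
import numpy as np


def f1_at_threshold(y_prob, y_true, t):
    """F1 score when every sample with p >= t is labeled positive.

    Hypothetical helper for illustration; not part of the problem's API.
    """
    y_prob = np.asarray(y_prob, dtype=float)
    y_true = np.asarray(y_true).astype(bool)
    y_pred = y_prob >= t

    tp = int(np.sum(y_pred & y_true))    # predicted 1, actually 1
    fp = int(np.sum(y_pred & ~y_true))   # predicted 1, actually 0
    fn = int(np.sum(~y_pred & y_true))   # predicted 0, actually 1

    # 2PR / (P + R) simplifies algebraically to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0.0 when the denominator is zero.
    return 2.0 * tp / denom if denom > 0 else 0.0
```

Using the count form 2TP / (2TP + FP + FN) avoids computing precision and recall separately, so only one divide-by-zero guard is needed.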

Requirements

Implement the function

python

Rules:

  • Use thresholds drawn from the unique values in y_prob (optionally plus 0.0 and 1.0), and pick the one with the highest F1.
  • If multiple thresholds tie for best F1, return the smallest threshold.
  • Do not use any prebuilt metric/threshold utilities (e.g., sklearn.metrics).
  • Return (best_threshold, best_f1) as Python floats.
  • Use only NumPy and Python built-in libraries.
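
One straightforward way to satisfy these rules is a brute-force scan over the unique probabilities. The sketch below is an illustration, not the reference solution: the function name `best_f1_threshold_bruteforce` is hypothetical, 0.0 and 1.0 are not added as extra candidates, F1 is taken to be 0.0 when undefined, and an empty input raises `ValueError` (all assumptions).

```python
import numpy as np


def best_f1_threshold_bruteforce(y_prob, y_true):
    """Try every unique probability as a threshold; O(n * k) for k unique values."""
    y_prob = np.asarray(y_prob, dtype=float)
    y_true = np.asarray(y_true).astype(bool)
    if y_prob.size == 0:
        # Assumption: the problem does not define behavior for empty input.
        raise ValueError("y_prob must be non-empty")

    best_t, best_f1 = None, -1.0
    # np.unique returns values in ascending order, so a strict ">" comparison
    # keeps the smallest threshold whenever several thresholds tie.
    for t in np.unique(y_prob):
        y_pred = y_prob >= t
        tp = int(np.sum(y_pred & y_true))
        fp = int(np.sum(y_pred & ~y_true))
        fn = int(np.sum(~y_pred & y_true))
        denom = 2 * tp + fp + fn
        f1 = 2.0 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1

    # Convert NumPy scalars to plain Python floats, as the rules require.
    return float(best_t), float(best_f1)
```

Iterating in ascending order with a strict comparison handles the tie-break rule without any extra bookkeeping.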

Example

python

Output:

python

Input Signature

Argument    Type
y_prob      np.ndarray
y_true      np.ndarray

Output Signature

Return Name    Type
value          tuple

Constraints

  • Only NumPy; no sklearn metrics/utilities.

  • Thresholds from unique y_prob; ties choose smallest.

  • Return Python floats (best_threshold, best_f1).

  • Computations should handle NumPy array inputs.

Hint 1

Compute F1 for thresholds taken from unique y_prob values; tie-break by smallest threshold.

Hint 2

Sort by y_prob (descending) and sweep the threshold from high to low, updating TP/FP counts as more items become predicted-positive.

Hint 3

Process equal-probability values as a group (since p >= t includes all with p == t). Update TP/FP per group and compute precision/recall/F1 with divide-by-zero guards.
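
The sort-and-sweep idea from Hints 2 and 3 can be sketched as follows. This is one possible vectorized reading of the hints, not necessarily the intended solution: the name `best_f1_threshold_sweep` is hypothetical, labels are assumed to be 0/1, and an empty input raises `ValueError` by assumption.

```python
import numpy as np


def best_f1_threshold_sweep(y_prob, y_true):
    """Sort once, then sweep thresholds from high to low; O(n log n)."""
    y_prob = np.asarray(y_prob, dtype=float)
    y_true = np.asarray(y_true).astype(int)  # assumes labels are 0/1
    if y_prob.size == 0:
        # Assumption: the problem does not define behavior for empty input.
        raise ValueError("y_prob must be non-empty")

    # Sort by probability, descending.
    order = np.argsort(-y_prob, kind="stable")
    p_sorted = y_prob[order]
    y_sorted = y_true[order]

    # Running TP/FP counts if the top-k items are predicted positive.
    tp_cum = np.cumsum(y_sorted)
    fp_cum = np.cumsum(1 - y_sorted)

    # Index of the last element in each run of equal probabilities:
    # p >= t admits the whole group at once, so only these cut points are valid.
    is_group_end = np.append(p_sorted[1:] != p_sorted[:-1], True)
    group_end = np.flatnonzero(is_group_end)

    tp = tp_cum[group_end]
    fp = fp_cum[group_end]
    fn = tp_cum[-1] - tp  # positives not yet admitted
    # denom >= 1 here: each candidate threshold admits at least one sample.
    f1 = 2.0 * tp / (2 * tp + fp + fn)

    thresholds = p_sorted[group_end]  # descending order
    best = f1.max()
    # Thresholds are descending, so the last index at the maximum
    # corresponds to the smallest tied threshold.
    idx = np.flatnonzero(f1 == best)[-1]
    return float(thresholds[idx]), float(best)
```

Because every candidate threshold is itself a value in y_prob, at least one sample is always predicted positive, which is why the count-based F1 needs no explicit zero-division guard in this sketch.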

Roles
ML Engineer
AI Engineer
Data Scientist
Quantitative Analyst
Companies
General
Levels
staff
senior
entry
Tags
threshold-optimization
f1-score
binary-classification
sweep-line