Binning (a.k.a. discretization) turns a continuous feature into bucket IDs that you can feed into models or use for analysis. You’ll implement equal-width binning for one feature and optionally return one-hot encoded bins.
Implement the function
Rules:
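Bin IDs are integers from 0 to num_bins - 1; a value equal to the maximum is clamped into the last bin (num_bins - 1).
Return bin_ids and the one_hot encoding for each value.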
| Argument | Type |
|---|---|
| x | np.ndarray |
| num_bins | int |
| Return Name | Type |
|---|---|
| value | tuple |
Use only NumPy and Python built-in libraries
Output bin_ids array and one_hot matrix
Clamp max value to last bin
Compute x_min = np.min(x) and x_max = np.max(x), then bin width w = (x_max - x_min) / num_bins.
For each value v, compute idx = floor((v - x_min) / w); in NumPy use ((x - x_min) / w).astype(int), which truncates toward zero and therefore matches floor here because x - x_min is never negative.
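Edge case: when v == x_max, the formula yields num_bins, which is one past the last valid bin; clamp with np.clip(idx, 0, num_bins - 1). Then build the one-hot array with advanced indexing: start from np.zeros((len(x), num_bins)) and set one_hot[np.arange(len(x)), idx] = 1.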
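The steps above can be put together as a minimal sketch. The function name equal_width_binning is hypothetical, since the problem statement does not give one. Mapping a constant input (where w would be 0) to bin 0 is also an assumption; the rules do not say how that case should behave.

```python
import numpy as np


def equal_width_binning(x, num_bins):
    """Return (bin_ids, one_hot) for a 1-D array using equal-width bins.

    The name and the constant-input behaviour are illustrative assumptions,
    not part of the original problem statement.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]

    x_min, x_max = np.min(x), np.max(x)
    w = (x_max - x_min) / num_bins  # width of each bin

    if w == 0:
        # Assumption: all values are identical, so place everything in bin 0
        # instead of dividing by zero.
        bin_ids = np.zeros(n, dtype=int)
    else:
        # x - x_min >= 0, so truncation by astype(int) equals floor.
        bin_ids = ((x - x_min) / w).astype(int)
        # v == x_max lands on num_bins; clamp it into the last bin.
        bin_ids = np.clip(bin_ids, 0, num_bins - 1)

    # Advanced indexing: set one column per row to 1.
    one_hot = np.zeros((n, num_bins), dtype=int)
    one_hot[np.arange(n), bin_ids] = 1

    return bin_ids, one_hot


bin_ids, one_hot = equal_width_binning(np.array([0, 1, 2, 3, 4]), 2)
print(bin_ids)  # [0 0 1 1 1]
print(one_hot)
```

With x = [0, 1, 2, 3, 4] and num_bins = 2, the width is 2, so 0 and 1 fall in bin 0, 2 and 3 fall in bin 1, and the maximum 4 would map to index 2 before being clamped back to bin 1.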