Implement global norm gradient clipping, a common trick in optimization to prevent exploding gradients during training. Given a list of gradient vectors, you’ll scale them down only if their combined (global) L2 norm is too large.
The clipping rule is: compute the global L2 norm over all gradients; if it exceeds `max_norm`, multiply every gradient by the shared factor `max_norm / global_norm`, otherwise leave the gradients unchanged (see the formula below).

Implement a function that takes `grads` and `max_norm` and returns the clipped gradients.
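Spelled out as a formula (the notation below is mine, not from the prompt; $g_1, \dots, g_k$ are the gradient vectors in `grads` and $\hat g_i$ the clipped outputs):

$$
\text{global\_norm} = \sqrt{\sum_{i=1}^{k} \lVert g_i \rVert_2^{2}}, \qquad
s = \min\!\left(1,\; \frac{\texttt{max\_norm}}{\text{global\_norm}}\right), \qquad
\hat g_i = s \, g_i .
$$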
Rules:

- Compute the global L2 norm over all entries of `grads` (treat all entries as one long vector).
- If the global norm is <= `max_norm`, return the gradients unchanged (numerically, the same values).
- Otherwise, apply a single shared `scale` to every gradient.

Input:
| Argument | Type |
|---|---|
| grads | list of np.ndarray |
| max_norm | float |
Output:

| Return Name | Type |
|---|---|
| value | list of np.ndarray |
- Use NumPy; no deep-learning utilities.
- Return a list of `np.ndarray`.
- Use a single shared scale for all gradients.
- Compute the global norm by summing the squares of all entries across every gradient vector, then take the square root.
- Use a single shared scale: `scale = min(1.0, max_norm / global_norm)` (handle `global_norm == 0`). Apply it to every element.
- With NumPy: accumulate `total_sq += np.sum(g**2)` over the vectors, then `global_norm = np.sqrt(total_sq)`; the output must stay a list of arrays.
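A minimal reference sketch along these lines, assuming the function is named `clip_by_global_norm` (the prompt does not fix a name):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one shared factor so that their combined
    (global) L2 norm does not exceed max_norm. Returns a list of np.ndarray."""
    # Accumulate the sum of squares over every entry of every gradient vector.
    total_sq = 0.0
    for g in grads:
        total_sq += np.sum(np.asarray(g, dtype=float) ** 2)
    global_norm = np.sqrt(total_sq)

    # All-zero gradients: nothing to clip, and this avoids dividing by zero.
    if global_norm == 0.0:
        return [np.asarray(g, dtype=float) for g in grads]

    # One shared scale; equals 1.0 when the global norm is already <= max_norm.
    scale = min(1.0, max_norm / global_norm)
    return [np.asarray(g, dtype=float) * scale for g in grads]
```

For example, with `grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]` the global norm is 13, so `clip_by_global_norm(grads, 5.0)` scales every entry by 5/13 and the clipped global norm becomes exactly 5; with `max_norm = 20.0` the gradients come back numerically unchanged.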