Every calculus course teaches you that second-order terms vanish. When you expand $f(x + dx)$ as a Taylor series, you keep the $f'(x)\,dx$ term and throw away $(dx)^2$ because it's infinitesimally small compared to $dx$. That works perfectly in deterministic calculus. It fails completely for Brownian motion.
The reason is a single, strange fact: $(dW_t)^2 = dt$. Not zero. Not negligible. Exactly $dt$. Brownian motion is so rough, so non-differentiable, that its squared increments accumulate at a finite rate. This means when you try to differentiate a function of a stochastic process, the second-order term in your Taylor expansion refuses to die, and you're left with an extra correction that has no analogue in ordinary calculus. That correction is Ito's Lemma.
Informally, Ito's Lemma tells you how a smooth function $f(t, X_t)$ evolves when $X_t$ is driven by noise. If the function is convex, the noise pushes its average value up; if concave, down. This is the continuous-time version of Jensen's inequality, and it shows up as the $\frac{1}{2}\sigma^2 f_{xx}$ term that separates stochastic calculus from everything you learned before it.
This is the engine behind Black-Scholes. It's the foundation of the Heston model, the Hull-White model, and every continuous-time result you'll encounter in a quant interview. Interviewers at Goldman, Citadel, and Two Sigma routinely open with "derive Ito's Lemma from scratch" precisely because it separates candidates who understand the mathematics from those who memorized a formula. The derivation starts with quadratic variation, and that's where we start too.
Start with quadratic variation. That's the foundation everything else rests on.
Partition the interval $[0, t]$ into $n$ subintervals of width $\Delta t = t/n$. Over each subinterval, the Brownian increment $\Delta W_i = W_{t_{i+1}} - W_{t_i}$ is normally distributed with mean zero and variance $\Delta t$. Now sum the squared increments:
$$\sum_{i=0}^{n-1} (\Delta W_i)^2$$
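Each term has expected value $\mathbb{E}[(\Delta W_i)^2] = \Delta t$, so the sum has expected value $n \Delta t = t$. The fourth moment of a centered Gaussian is three times its squared variance, so $\mathbb{E}[(\Delta W_i)^4] = 3(\Delta t)^2$ and the variance of each term is $\mathbb{E}[(\Delta W_i)^4] - (\Delta t)^2 = 2(\Delta t)^2$. The increments are independent, so the variances add: the total variance of the sum is $2n(\Delta t)^2 = 2t^2/n$, which goes to zero as $n \to \infty$. The sum converges to $t$ in $L^2$. This is the quadratic variation result: $[W, W]_t = t$.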
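The convergence argument can be checked numerically. The following is a minimal sketch, using only the Python standard library, that simulates the sum of squared increments and compares its sample mean and variance to $t$ and $2t^2/n$. The function names `quadratic_variation` and `qv_stats`, the seed, and the path counts are illustrative choices, not part of the derivation.

```python
import math
import random


def quadratic_variation(t, n, rng):
    """Sum of squared Brownian increments over n equal steps on [0, t]."""
    sd = math.sqrt(t / n)  # each increment is N(0, dt) with dt = t / n
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n))


def qv_stats(t, n, n_paths, seed=0):
    """Sample mean and sample variance of the sum across independent paths."""
    rng = random.Random(seed)
    samples = [quadratic_variation(t, n, rng) for _ in range(n_paths)]
    mean = sum(samples) / n_paths
    var = sum((s - mean) ** 2 for s in samples) / (n_paths - 1)
    return mean, var


if __name__ == "__main__":
    t = 2.0
    for n in (10, 100, 1000):
        mean, var = qv_stats(t, n, n_paths=500)
        theory = 2.0 * t * t / n
        print(f"n={n:5d}  mean={mean:.4f}  var={var:.5f}  theory var={theory:.5f}")
```

As $n$ grows, the sample mean should stay near $t$ while the sample variance shrinks roughly like $1/n$, which is the $L^2$ convergence in action.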
That single fact is what breaks ordinary calculus. In a deterministic setting, $(\Delta x)^2$ is negligible compared to $\Delta x$ as increments shrink. For Brownian motion, $(\Delta W)^2$ is not negligible. It concentrates around $\Delta t$, not zero.
This gives you the Ito multiplication table, which is the bookkeeping device for the whole theory:
$$dW_t \cdot dW_t = dt, \qquad dt \cdot dW_t = 0, \qquad dt \cdot dt = 0$$
Think of it like significant figures: $dt$ is first-order small, $(dt)^2$ is second-order and vanishes, but $(dW_t)^2$ is also first-order because Brownian paths are rough enough to accumulate variation at rate $dt$.
Now take a smooth function $f(t, x) \in C^{1,2}$ and an Ito process $dX_t = \mu_t \, dt + \sigma_t \, dW_t$. You want to find $df(t, X_t)$.
Write the second-order Taylor expansion:
$$df = f_t \, dt + f_x \, dX_t + \frac{1}{2} f_{tt} \, (dt)^2 + f_{tx} \, dt \, dX_t + \frac{1}{2} f_{xx} \, (dX_t)^2 + \cdots$$
Now substitute $dX_t = \mu_t \, dt + \sigma_t \, dW_t$ and apply the multiplication table. The term $(dX_t)^2$ expands as:
$$(dX_t)^2 = (\mu_t \, dt + \sigma_t \, dW_t)^2 = \mu_t^2 (dt)^2 + 2\mu_t \sigma_t \, dt \, dW_t + \sigma_t^2 (dW_t)^2$$
Apply the table: $(dt)^2 = 0$, $dt \, dW_t = 0$, $(dW_t)^2 = dt$. Everything collapses to $\sigma_t^2 \, dt$.
The $f_{tt}(dt)^2$ and $f_{tx} \, dt \, dX_t$ terms also vanish by the same rules. What survives:
$$\boxed{df(t, X_t) = \left( f_t + \mu_t f_x + \frac{1}{2}\sigma_t^2 f_{xx} \right) dt + \sigma_t f_x \, dW_t}$$
That $\frac{1}{2}\sigma_t^2 f_{xx}$ term is the entire story. It comes directly from $(dW)^2 = dt$ surviving the Taylor expansion. In ordinary calculus, the second-order term dies. Here it lives.
In deterministic calculus, if $x(t)$ is a differentiable path, then $(dx)^2 \sim (x'(t))^2 (dt)^2$, which is second-order small and drops out. The chain rule $df = f_t \, dt + f_x \, dx$ is exact.
Brownian paths are nowhere differentiable. They're too rough for that. The squared increment doesn't vanish; it accumulates. So the second-order term in the Taylor expansion is not a rounding error you can ignore. It's a genuine first-order contribution.
Your interviewer cares about this contrast because it's the conceptual test. If you say "Ito's Lemma is just the chain rule with an extra term," you'll get a follow-up: "why is there an extra term?" The answer is quadratic variation, not a hand-wave about randomness.
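One way to see the extra term concretely is the simplest nontrivial case, $f(x) = x^2$ with $X_t = W_t$, where Ito's Lemma gives $d(W_t^2) = 2W_t \, dW_t + dt$. The sketch below simulates one path and measures the gap between $W_t^2$ and the left-endpoint sum approximating $\int_0^t 2W_s \, dW_s$. The ordinary chain rule would predict a gap of zero; Ito predicts $t$. The function name `ito_gap_for_square` and the parameters are illustrative assumptions.

```python
import math
import random


def ito_gap_for_square(t, n, seed=0):
    """Simulate one Brownian path on [0, t] with n steps.

    Returns (W_t**2, left-endpoint sum approximating the integral of 2*W dW).
    """
    rng = random.Random(seed)
    sd = math.sqrt(t / n)
    w = 0.0
    stochastic_integral = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        stochastic_integral += 2.0 * w * dw  # integrand taken at the left endpoint
        w += dw
    return w * w, stochastic_integral


if __name__ == "__main__":
    t = 1.5
    w_sq, integral = ito_gap_for_square(t, n=100_000)
    print(f"W_t^2 - sum(2 W dW) = {w_sq - integral:.4f}  (Ito correction t = {t})")
```

Algebraically the gap equals the sum of squared increments, so this is the quadratic variation result showing up inside the chain rule.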
When you have several correlated Ito processes $dX_t^{(i)} = \mu_i \, dt + \sigma_i \, dW_t^{(i)}$ with $dW_t^{(i)} \, dW_t^{(j)} = \rho_{ij} \, dt$ (and $\rho_{ii} = 1$), the cross-variation terms survive too. For $f(t, X_t^{(1)}, X_t^{(2)}, \ldots)$, the correction becomes a full Hessian sum:
$$df = f_t \, dt + \sum_i f_{x_i} \, dX_t^{(i)} + \frac{1}{2} \sum_{i,j} f_{x_i x_j} \, \sigma_i \sigma_j \rho_{ij} \, dt$$
Multi-asset desks at firms like Goldman or Citadel will probe this directly. If you're pricing a spread option or a basket, the $\rho_{ij}$ terms in the Hessian are where correlation enters the PDE. Candidates who only know the one-dimensional version get stuck the moment an interviewer writes down two correlated underlyings.
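The rule $dW^{(1)} dW^{(2)} = \rho \, dt$ can be checked the same way as quadratic variation. This is a small sketch, assuming two Brownian motions built from independent normals by the usual Cholesky-style mixing; the function name `cross_variation` and the parameter values are hypothetical.

```python
import math
import random


def cross_variation(t, n, rho, seed=0):
    """Sum of dW1 * dW2 over n steps on [0, t] for correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n)
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dw1 = sd * z1
        # dw2 has variance dt and correlation rho with dw1
        dw2 = sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        total += dw1 * dw2
    return total


if __name__ == "__main__":
    for rho in (0.0, 0.6, -0.9):
        print(f"rho={rho:+.1f}  sum(dW1*dW2)={cross_variation(1.0, 50_000, rho):+.4f}")
```

The sum should land close to $\rho t$, including near zero in the independent case, which is why the off-diagonal Hessian terms drop out only when $\rho_{ij} = 0$.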
The key property to internalize: the correction term scales with $\sigma^2$ (or $\sigma_i \sigma_j \rho_{ij}$ in multiple dimensions). Higher volatility means a larger curvature correction. This is the continuous-time analog of Jensen's inequality: a convex function of a random variable has a higher expectation than the function evaluated at the mean, and the gap grows with variance.
In an interview, you'll usually need to pick a specific approach. Here are the ones worth knowing.
This is the pattern you will be asked about. Every quant interviewer has it in their back pocket, and the trap is always the same: candidates write the drift as $\mu$ instead of $\mu - \frac{1}{2}\sigma^2$ and don't notice.
Start with $dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$ and apply Ito's Lemma to $f(S) = \ln S$. The partials are $f_S = 1/S$, $f_{SS} = -1/S^2$, and $f_t = 0$. Plugging into the lemma:
$$d(\ln S_t) = \frac{1}{S_t} dS_t + \frac{1}{2} \left(-\frac{1}{S_t^2}\right) \sigma^2 S_t^2 \, dt = \left(\mu - \frac{1}{2}\sigma^2\right) dt + \sigma \, dW_t$$
The $-\frac{1}{2}\sigma^2$ term is the entire point. It's the curvature correction from Jensen's inequality: because $\ln$ is concave, the expected log return is strictly less than the log of the expected return. Financially, it's the gap between arithmetic and geometric average returns. When your interviewer asks "why is the drift $\mu - \frac{1}{2}\sigma^2$ and not just $\mu$?", that's your answer.
When to reach for this: any time an interviewer asks you to find the distribution of $S_T$, simulate GBM paths correctly, or derive what $\mathbb{E}[\ln S_T]$ equals.
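The distinction between the mean log return and the log of the mean is easy to confirm by simulation. Below is a minimal sketch that draws $S_T$ from the log-normal solution implied by the SDE for $\ln S_t$ above; the function name `simulate_gbm_terminal` and the parameter values are illustrative assumptions.

```python
import math
import random


def simulate_gbm_terminal(s0, mu, sigma, t, n_paths, seed=0):
    """Draw S_T from the log-normal solution of GBM (no time stepping needed)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * t  # Ito-corrected drift of ln S
    vol = sigma * math.sqrt(t)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


def log_return_summary(s0, mu, sigma, t, n_paths, seed=0):
    """Return (mean of ln(S_T/S_0), ln of mean(S_T/S_0))."""
    paths = simulate_gbm_terminal(s0, mu, sigma, t, n_paths, seed)
    mean_log = sum(math.log(s / s0) for s in paths) / n_paths
    log_mean = math.log(sum(paths) / n_paths / s0)
    return mean_log, log_mean


if __name__ == "__main__":
    mean_log, log_mean = log_return_summary(100.0, 0.1, 0.2, 1.0, n_paths=200_000)
    print(f"E[ln(S_T/S_0)] ~ {mean_log:.4f}   ln E[S_T/S_0] ~ {log_mean:.4f}")
```

With $\mu = 0.1$, $\sigma = 0.2$, $T = 1$, the first number should sit near $\mu - \frac{1}{2}\sigma^2$ and the second near $\mu$, which is Jensen's gap for the concave logarithm.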

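Take a derivative with value $V(t, S_t)$ written on a stock that follows GBM, $dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$. Ito's Lemma on $V(t, S_t)$ gives you: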
$$dV = \left(V_t + \mu S V_S + \frac{1}{2}\sigma^2 S^2 V_{SS}\right) dt + \sigma S V_S \, dW_t$$
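Now construct a portfolio $\Pi = V - \Delta S$ where $\Delta = V_S$. The $dW_t$ terms cancel exactly (and so do the $\mu S V_S$ terms), leaving $d\Pi = \left(V_t + \frac{1}{2}\sigma^2 S^2 V_{SS}\right) dt$, a portfolio that is instantaneously riskless. No-arbitrage then forces $d\Pi = r\Pi \, dt = r(V - S V_S) \, dt$, and after equating the two expressions and rearranging you arrive at: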
$$V_t + \frac{1}{2}\sigma^2 S^2 V_{SS} + r S V_S - rV = 0$$
Notice that $\mu$ has completely disappeared. That's not a coincidence; the delta hedge eliminates all exposure to the actual drift of the stock. The PDE only depends on $\sigma$ and $r$, which is why two investors with different views on $\mu$ can still agree on the option price.
Interviewers want to see you derive this without skipping the delta-hedge step. If you jump straight to the PDE without explaining why the $dW_t$ term vanishes, expect a follow-up that forces you back to it anyway.
When to reach for this: whenever the question involves option pricing, the connection between SDEs and PDEs, or the Feynman-Kac theorem.
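As a sanity check on the PDE, one can plug in a known solution and confirm the left-hand side is numerically zero. The sketch below assumes the standard closed-form Black-Scholes price of a European call (which is not derived in this section) and evaluates the PDE residual with central finite differences. The helper names `norm_cdf`, `bs_call`, and `pde_residual`, and the step size `h`, are illustrative choices.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(t, s, k, r, sigma, maturity):
    """Closed-form price at time t of a European call with strike k."""
    tau = maturity - t
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def pde_residual(t, s, k, r, sigma, maturity, h=1e-3):
    """V_t + 0.5*sigma^2*S^2*V_SS + r*S*V_S - r*V using central differences."""
    def v(tt, ss):
        return bs_call(tt, ss, k, r, sigma, maturity)

    ds = h * s  # relative bump in the stock price
    v0 = v(t, s)
    v_t = (v(t + h, s) - v(t - h, s)) / (2.0 * h)
    v_s = (v(t, s + ds) - v(t, s - ds)) / (2.0 * ds)
    v_ss = (v(t, s + ds) - 2.0 * v0 + v(t, s - ds)) / ds ** 2
    return v_t + 0.5 * sigma ** 2 * s ** 2 * v_ss + r * s * v_s - r * v0


if __name__ == "__main__":
    res = pde_residual(t=0.0, s=100.0, k=105.0, r=0.03, sigma=0.2, maturity=1.0)
    print(f"PDE residual: {res:.2e}")
```

Note that no $\mu$ appears anywhere in the code: the price and the residual depend only on $\sigma$ and $r$, matching the observation that the drift drops out.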

Define $Z_t = \exp\!\left(\theta W_t - \frac{1}{2}\theta^2 t\right)$. Apply Ito's Lemma with $f(t, w) = e^{\theta w - \frac{1}{2}\theta^2 t}$:
$$dZ_t = \left(-\frac{1}{2}\theta^2 Z_t + \frac{1}{2}\theta^2 Z_t\right) dt + \theta Z_t \, dW_t = \theta Z_t \, dW_t$$
The $dt$ terms cancel perfectly. No drift means $Z_t$ is a local martingale, and because Novikov's condition holds trivially for constant $\theta$, a true martingale. This is not just a curiosity: $Z_t$ is the Radon-Nikodym derivative $\frac{d\mathbb{Q}}{d\mathbb{P}}\big|_{\mathcal{F}_t}$ of a change of measure, and for GBM the choice $\theta = -(\mu - r)/\sigma$ (minus the market price of risk) gives the risk-neutral measure. Under $\mathbb{Q}$, the process $\tilde{W}_t = W_t - \theta t$ is a standard Brownian motion, which is exactly Girsanov's theorem.
When to reach for this: risk-neutral pricing derivations, any question about changing probability measures, or when an interviewer asks you to verify that a given process is a martingale.
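A driftless SDE with $Z_0 = 1$ should mean $\mathbb{E}[Z_t] = 1$ for every $t$. The following sketch estimates that expectation by Monte Carlo, and for contrast also estimates $\mathbb{E}[e^{\theta W_t}]$ without the $-\frac{1}{2}\theta^2 t$ compensator. The function name `mean_exponential` and its `compensate` flag are hypothetical, introduced only for this illustration.

```python
import math
import random


def mean_exponential(theta, t, n_paths, compensate=True, seed=0):
    """Monte Carlo mean of exp(theta*W_t - c), with c = 0.5*theta^2*t if compensate else 0."""
    rng = random.Random(seed)
    sd = math.sqrt(t)  # W_t ~ N(0, t)
    shift = 0.5 * theta ** 2 * t if compensate else 0.0
    total = 0.0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, sd)
        total += math.exp(theta * w_t - shift)
    return total / n_paths


if __name__ == "__main__":
    theta, t = 0.5, 2.0
    print(f"with compensator:    {mean_exponential(theta, t, 200_000):.4f}")
    print(f"without compensator: {mean_exponential(theta, t, 200_000, compensate=False):.4f}")
    print(f"exp(0.5*theta^2*t):  {math.exp(0.5 * theta ** 2 * t):.4f}")
```

The uncompensated mean grows like $e^{\frac{1}{2}\theta^2 t}$, which is exactly the factor the $-\frac{1}{2}\theta^2 t$ term removes.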

Take two Ito processes $X_t$ and $Y_t$ and apply the two-dimensional Ito's Lemma to $f(x, y) = xy$. The second-order partials are $f_{xx} = f_{yy} = 0$ and $f_{xy} = 1$, so:
$$d(X_t Y_t) = X_t \, dY_t + Y_t \, dX_t + dX_t \, dY_t$$
That last term, $dX_t \, dY_t$, is the piece that ordinary calculus doesn't have. If $dX_t = \sigma_X \, dW_t^{(1)}$ and $dY_t = \sigma_Y \, dW_t^{(2)}$ with $dW^{(1)} dW^{(2)} = \rho \, dt$, then $dX_t \, dY_t = \rho \sigma_X \sigma_Y \, dt$. For independent processes it vanishes; for correlated ones it doesn't.
This pattern comes up constantly in proofs: deriving the dynamics of a ratio $X_t / Y_t$ (via $f(x,y) = x/y$), computing the dynamics of a discounted price process, or proving that a product of martingales is not generally a martingale.
When to reach for this: any time you need to find the SDE for a product or ratio of two stochastic processes, or when an interviewer asks about the dynamics of a discounted asset.
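As a worked sketch of the ratio case, assume $Y_t > 0$ and apply the two-dimensional lemma to $f(x, y) = x/y$. The partials are $f_x = 1/y$, $f_y = -x/y^2$, $f_{xx} = 0$, $f_{xy} = -1/y^2$, and $f_{yy} = 2x/y^3$, so:

$$d\!\left(\frac{X_t}{Y_t}\right) = \frac{dX_t}{Y_t} - \frac{X_t}{Y_t^2} \, dY_t - \frac{1}{Y_t^2} \, dX_t \, dY_t + \frac{X_t}{Y_t^3} \, (dY_t)^2$$

Dividing through by $X_t / Y_t$ (assuming $X_t \neq 0$) gives the form that is usually easier to remember:

$$\frac{d(X_t / Y_t)}{X_t / Y_t} = \frac{dX_t}{X_t} - \frac{dY_t}{Y_t} - \frac{dX_t}{X_t} \frac{dY_t}{Y_t} + \left(\frac{dY_t}{Y_t}\right)^2$$

For a discounted asset, take $Y_t$ to be a money-market account with $dY_t = r Y_t \, dt$. Then $(dY_t)^2 = 0$ and $dX_t \, dY_t = 0$ by the multiplication table, and the relative dynamics reduce to $dX_t / X_t - r \, dt$.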

The OU process $dX_t = -\alpha X_t \, dt + \sigma \, dW_t$ looks like it should have a closed-form solution, but you can't integrate it directly because of the $X_t$ on the right-hand side. The trick is an integrating factor.
Define $f(t, x) = e^{\alpha t} x$ and apply Ito's Lemma:
$$d\!\left(e^{\alpha t} X_t\right) = \alpha e^{\alpha t} X_t \, dt + e^{\alpha t} dX_t = \alpha e^{\alpha t} X_t \, dt + e^{\alpha t}(-\alpha X_t \, dt + \sigma \, dW_t) = \sigma e^{\alpha t} \, dW_t$$
The drift cancels entirely. Integrating both sides from $0$ to $t$:
$$X_t = X_0 e^{-\alpha t} + \sigma \int_0^t e^{-\alpha(t-s)} \, dW_s$$
This is the explicit solution: a deterministic decay toward zero plus a weighted sum of past noise. The distribution of $X_t$ is Gaussian with mean $X_0 e^{-\alpha t}$ and variance $\frac{\sigma^2}{2\alpha}(1 - e^{-2\alpha t})$. Short-rate models like Vasicek use exactly this structure.
When to reach for this: any mean-reverting SDE, Vasicek or Hull-White model questions, or when an interviewer asks you to characterize the stationary distribution of a process.
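The Gaussian mean and variance can be cross-checked against a direct simulation of the SDE. This is a rough sketch that uses a plain Euler-Maruyama discretization (a swapped-in numerical method, not the integrating-factor solution itself) and compares sample moments to the closed-form values. The function names `ou_euler_terminal` and `ou_exact_moments`, and all parameter values, are illustrative assumptions.

```python
import math
import random


def ou_euler_terminal(x0, alpha, sigma, t, n_steps, n_paths, seed=0):
    """Euler-Maruyama samples of X_t for dX = -alpha*X dt + sigma dW."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = sigma * math.sqrt(dt)
    out = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += -alpha * x * dt + rng.gauss(0.0, sd)
        out.append(x)
    return out


def ou_exact_moments(x0, alpha, sigma, t):
    """Mean and variance of X_t from the explicit solution."""
    mean = x0 * math.exp(-alpha * t)
    var = sigma ** 2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))
    return mean, var


if __name__ == "__main__":
    x0, alpha, sigma, t = 1.0, 1.5, 0.5, 1.0
    xs = ou_euler_terminal(x0, alpha, sigma, t, n_steps=200, n_paths=5000)
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    em, ev = ou_exact_moments(x0, alpha, sigma, t)
    print(f"simulated mean={m:.4f} var={v:.4f}   exact mean={em:.4f} var={ev:.4f}")
```

Letting $t \to \infty$ in `ou_exact_moments` recovers the stationary distribution, a Gaussian with mean zero and variance $\sigma^2 / (2\alpha)$.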

| Pattern | Function $f$ Applied To | Key Output | Primary Interview Context |
|---|---|---|---|
| Log transform of GBM | $\ln S_t$ | $d(\ln S) = (\mu - \frac{1}{2}\sigma^2)dt + \sigma \, dW$ | GBM distribution, Monte Carlo drift |
| Black-Scholes PDE | $V(t, S_t)$ | BS PDE via delta-hedge | Option pricing, Feynman-Kac |
| Exponential martingale | $e^{\theta W_t - \frac{1}{2}\theta^2 t}$ | $dZ = \theta Z \, dW$ (martingale) | Girsanov, measure changes |
| Stochastic integration by parts | $X_t Y_t$ | $d(XY) = X\,dY + Y\,dX + dX\,dY$ | Product/ratio dynamics, discounting |
| Ornstein-Uhlenbeck | $e^{\alpha t} X_t$ | Explicit solution via integrating factor | Mean reversion, short-rate models |
For most interview problems, you'll default to the log transform or the Black-Scholes PDE derivation; those two cover the majority of equity derivatives questions. Reach for the exponential martingale when the conversation turns to risk-neutral pricing or measure changes, and the OU pattern whenever you see mean reversion or hear the words "Vasicek" or "Hull-White." The integration by parts result is less likely to be the main question, but it appears constantly as a sub-step in longer derivations, so knowing it cold saves you from getting stuck mid-proof.
Here's where candidates lose points, and it's almost always one of these.
The most common whiteboard error I've seen, across every firm, is writing Ito's Lemma and either omitting the $\frac{1}{2}\sigma^2 f_{xx}$ term entirely or writing it as $\frac{1}{2} f_{xx}$ with the $\sigma^2$ quietly missing. The candidate looks confident, writes the SDE, and the interviewer just watches to see if the correction appears unprompted.
A bad answer sounds like: "So $df = f_t \, dt + f_x \, dX$, and then there's some extra term..." followed by a pause, a glance at the ceiling, and a half-remembered $\frac{1}{2} f_{xx} \, dt$ that's missing its $\sigma^2$.
Why it matters: the $\sigma^2$ is not cosmetic. It carries the units and the magnitude of the correction. Drop it and your Black-Scholes PDE is wrong, your log-normal drift is wrong, and your Monte Carlo paths are biased. The interviewer knows this and is watching for exactly that factor.
Ask a candidate to apply Ito's Lemma to $\ln S_t$ under geometric Brownian motion and a surprising number will write:
$$d(\ln S_t) = \mu \, dt + \sigma \, dW_t$$
That's wrong. The correct drift is $\mu - \frac{\sigma^2}{2}$, and the missing $\frac{\sigma^2}{2}$ is not a rounding error. It's the entire point of the exercise.
This mistake is especially costly because interviewers often follow up numerically. They'll ask: "If $\mu = 0.1$ and $\sigma = 0.2$, what's the expected log return over a year?" A candidate who wrote $\mu$ as the drift will say 10%. The right answer is $\mu - \frac{\sigma^2}{2} = 0.08$, or 8%. That gap is the difference between arithmetic and geometric returns, and it shows up directly in Monte Carlo drift corrections.
What to say instead: after computing the partials $f_S = 1/S$ and $f_{SS} = -1/S^2$, pause and say "the $-1/S^2$ term is what gives us the $-\frac{\sigma^2}{2}$ drift correction, which is why log returns have a lower mean than the arithmetic drift." Say it before they ask.
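Spelled out for a one-year horizon, assuming constant $\mu = 0.1$ and $\sigma = 0.2$, the two quantities that candidates tend to confuse are:

$$\mathbb{E}\!\left[\ln \frac{S_1}{S_0}\right] = \mu - \frac{\sigma^2}{2} = 0.10 - \frac{0.2^2}{2} = 0.10 - 0.02 = 0.08$$

$$\ln \mathbb{E}\!\left[\frac{S_1}{S_0}\right] = \mu = 0.10$$

The gap of $0.02$ is $\frac{\sigma^2}{2}$, and it scales with the variance: doubling $\sigma$ to $0.4$ would quadruple it to $0.08$.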
This one filters out more candidates than any other. The interview starts with "derive Ito's Lemma for me," and the candidate begins writing the final formula from memory. The interviewer interrupts: "Where does the $\frac{1}{2}$ come from?"
Silence.
The $\frac{1}{2}$ comes from the Taylor expansion. Specifically, the second-order term $\frac{1}{2} f_{xx} (dX)^2$ survives because $(dW_t)^2 = dt$ rather than vanishing the way $(dx)^2$ does in ordinary calculus. If you can't explain that on demand, you've signaled that you learned a formula, not a result.
Interviewers at Goldman, Citadel, and Two Sigma will push on this. The follow-up is almost always "and why does $(dW)^2 = dt$?" You need to be ready to sketch the quadratic variation argument: that $\sum (\Delta W_i)^2 \to t$ in $L^2$, so in the infinitesimal limit we treat $(dW_t)^2$ as exactly $dt$.
Single-asset Ito's Lemma is fine. Then the interviewer says "now suppose you have two correlated assets" and the candidate either freezes or writes the wrong Hessian sum.
The error usually looks like this: the candidate writes $df = \ldots + \frac{1}{2}(\sigma_1^2 f_{x_1 x_1} + \sigma_2^2 f_{x_2 x_2}) \, dt$ and stops. They've forgotten the cross term $\sigma_1 \sigma_2 \rho \, f_{x_1 x_2} \, dt$, which comes from $dW_1 \, dW_2 = \rho \, dt$.
This matters in practice for any multi-asset derivative: spread options, basket options, quanto products. The cross-partial term is where correlation enters the pricing PDE. Missing it in an interview for a multi-asset desk is a hard signal that you haven't worked with correlated processes.
The fix is to internalize the multiplication table as a two-by-two object, not just the diagonal entries. When $i = j$, $dW_i \, dW_j = dt$. When $i \neq j$, $dW_i \, dW_j = \rho_{ij} \, dt$. Write that table in the corner of your whiteboard before you start the derivation. It takes five seconds and prevents the error entirely.
Ito's Lemma isn't something you wait to be asked about. It's the foundation. But here are the specific cues that signal the interviewer wants you to go there:
Any time a smooth function gets applied to a stochastic process, Ito's Lemma is the tool. Say that out loud if the interviewer seems to be fishing.
This is the most common opening. It goes messier than you'd expect.
$$df = \left(\frac{\partial f}{\partial t} + \mu_t \frac{\partial f}{\partial x} + \frac{1}{2}\sigma_t^2 \frac{\partial^2 f}{\partial x^2}\right)dt + \sigma_t \frac{\partial f}{\partial x} \, dW_t$$
The $\frac{1}{2}\sigma^2 f_{xx}$ term is the entire point. It comes from the $(dX)^2$ term in the Taylor expansion, which survives because $(dW)^2 = dt$. In ordinary calculus, $(dx)^2 = 0$ and that term vanishes. Here it doesn't.
"Apply Ito's Lemma to $f(S_t) = \ln S_t$ where $dS = \mu S \, dt + \sigma S \, dW$."
Compute $f_S = 1/S$, $f_{SS} = -1/S^2$, $f_t = 0$, substitute, and get $d(\ln S) = (\mu - \sigma^2/2) \, dt + \sigma \, dW$. Then immediately say: "The $\sigma^2/2$ is the arithmetic-to-geometric drift correction. It's why the median of $S_T$ is $S_0 e^{(\mu - \sigma^2/2)T}$, not $S_0 e^{\mu T}$."
"What's the difference between the Ito and Stratonovich integrals?"
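Stratonovich uses a midpoint convention that preserves the ordinary chain rule. If the equation is read in the Stratonovich sense, $dS = \mu S \, dt + \sigma S \circ dW$, then $d(\ln S) = \mu \, dt + \sigma \, dW$ without the correction. That is a different process from the Ito SDE with the same coefficients: converting it to Ito form adds $\frac{1}{2}\sigma^2 S$ to the drift. But Stratonovich integrals are not martingales in general, which breaks the risk-neutral pricing machinery. Finance uses Ito because martingales are the backbone of no-arbitrage theory.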
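The two conventions can be compared directly on $\int_0^t W_s \, dW_s$. The sketch below evaluates the integrand at the left endpoint (Ito) and at the average of the two endpoints (one standard way to define the Stratonovich integral) along the same simulated path. The function name `ito_and_stratonovich_sums` is an illustrative choice.

```python
import math
import random


def ito_and_stratonovich_sums(t, n, seed=0):
    """Approximate the integral of W dW on [0, t] under both conventions.

    Returns (W_t, left-endpoint sum, endpoint-average sum).
    """
    rng = random.Random(seed)
    sd = math.sqrt(t / n)
    w = 0.0
    ito_sum = 0.0
    strat_sum = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        ito_sum += w * dw                 # integrand at the left endpoint
        strat_sum += (w + 0.5 * dw) * dw  # average of left and right endpoints
        w += dw
    return w, ito_sum, strat_sum


if __name__ == "__main__":
    t = 1.0
    w, ito_sum, strat_sum = ito_and_stratonovich_sums(t, n=100_000)
    print(f"Ito sum          = {ito_sum:+.4f}   (W_t^2 - t)/2 = {(w * w - t) / 2:+.4f}")
    print(f"Stratonovich sum = {strat_sum:+.4f}   W_t^2/2       = {w * w / 2:+.4f}")
```

The Stratonovich sum telescopes to $W_t^2 / 2$, the ordinary-calculus answer, while the Ito sum is lower by about $t/2$. Taking expectations, only the Ito version has mean zero, which is the martingale property the pricing theory relies on.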
"How does Ito's Lemma connect to the Black-Scholes PDE?"
Apply Ito's Lemma to $V(t, S_t)$, which gives you a $dW$ term proportional to $V_S$. Construct a portfolio $\Pi = V - V_S \cdot S$ that cancels that term. The resulting portfolio is instantaneously riskless, so by no-arbitrage it must earn the risk-free rate. Setting $d\Pi = r\Pi \, dt$ gives you the Black-Scholes PDE directly.
"How does this appear in Monte Carlo simulation?"
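The log-Euler scheme simulates $\ln S$ rather than $S$ directly: $\ln S_{t+\Delta t} = \ln S_t + (\mu - \sigma^2/2)\Delta t + \sigma \sqrt{\Delta t} \, Z$. For constant $\mu$ and $\sigma$ this step is exact in distribution for any $\Delta t$. The classic bug is stepping in log space with drift $\mu$ instead of $\mu - \sigma^2/2$: that omits the Ito correction and inflates $\mathbb{E}[S_T]$ by a factor of $e^{\sigma^2 T/2}$. Discretizing $dS = \mu S \, dt + \sigma S \, dW$ directly as $S_{t+\Delta t} = S_t(1 + \mu \Delta t + \sigma \sqrt{\Delta t} \, Z)$ is a valid Euler-Maruyama scheme for the Ito SDE, but it carries discretization error in the distribution of $S_T$ and can produce negative prices when $\Delta t$ is large, which is why the log scheme is preferred.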
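The effect of the drift term in log-space stepping can be seen in a few lines. This is a minimal sketch comparing the log scheme with and without the $-\sigma^2/2$ term against the known mean $\mathbb{E}[S_T] = S_0 e^{\mu T}$; the function name `terminal_mean_log_scheme` and its `ito_correction` flag are hypothetical, introduced only for illustration.

```python
import math
import random


def terminal_mean_log_scheme(s0, mu, sigma, t, n_steps, n_paths,
                             ito_correction=True, seed=0):
    """Monte Carlo mean of S_T when stepping ln S with or without the Ito correction."""
    rng = random.Random(seed)
    dt = t / n_steps
    drift_rate = mu - 0.5 * sigma ** 2 if ito_correction else mu
    drift = drift_rate * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        log_s = math.log(s0)
        for _ in range(n_steps):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
        total += math.exp(log_s)
    return total / n_paths


if __name__ == "__main__":
    s0, mu, sigma, t = 100.0, 0.1, 0.4, 1.0
    good = terminal_mean_log_scheme(s0, mu, sigma, t, 12, 50_000)
    bad = terminal_mean_log_scheme(s0, mu, sigma, t, 12, 50_000, ito_correction=False)
    print(f"target S0*exp(mu*T)   = {s0 * math.exp(mu * t):.2f}")
    print(f"with correction       = {good:.2f}")
    print(f"without correction    = {bad:.2f}")
```

Without the correction the simulated mean should overshoot the target by roughly a factor of $e^{\sigma^2 T / 2}$, regardless of how many time steps are used.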