Ito's Lemma and Stochastic Calculus

Every calculus course teaches you that second-order terms vanish. When you expand $f(x + dx)$ as a Taylor series, you keep the $f'(x)\,dx$ term and throw away $(dx)^2$ because it's infinitesimally small compared to $dx$. That works perfectly in deterministic calculus. It fails completely for Brownian motion.

The reason is a single, strange fact: $(dW_t)^2 = dt$. Not zero. Not negligible. Exactly $dt$. Brownian motion is so rough, so non-differentiable, that its squared increments accumulate at a finite rate. This means when you try to differentiate a function of a stochastic process, the second-order term in your Taylor expansion refuses to die, and you're left with an extra correction that has no analogue in ordinary calculus. That correction is Ito's Lemma.

Informally, Ito's Lemma tells you how a smooth function $f(t, X_t)$ evolves when $X_t$ is driven by noise. If the function is convex, the noise pushes its average value up; if concave, down. This is the continuous-time version of Jensen's inequality, and it shows up as the $\frac{1}{2}\sigma^2 f_{xx}$ term that separates stochastic calculus from everything you learned before it.

This is the engine behind Black-Scholes. It's the foundation of the Heston model, the Hull-White model, and every continuous-time result you'll encounter in a quant interview. Interviewers at Goldman, Citadel, and Two Sigma routinely open with "derive Ito's Lemma from scratch" precisely because it separates candidates who understand the mathematics from those who memorized a formula. The derivation starts with quadratic variation, and that's where we start too.

The Mechanism Behind the Correction Term

Start with quadratic variation. That's the foundation everything else rests on.

Partition the interval $[0, t]$ into $n$ subintervals of width $\Delta t = t/n$. Over each subinterval, the Brownian increment $\Delta W_i = W_{t_{i+1}} - W_{t_i}$ is normally distributed with mean zero and variance $\Delta t$. Now sum the squared increments:

$$\sum_{i=1}^{n} (\Delta W_i)^2$$

Each term has expected value $\mathbb{E}[(\Delta W_i)^2] = \Delta t$, so the sum has expected value $t$. The variance of each term is $\mathbb{E}[(\Delta W_i)^4] - (\Delta t)^2 = 2(\Delta t)^2$, so the total variance of the sum is $2n(\Delta t)^2 = 2t^2/n$, which goes to zero as $n \to \infty$. The sum converges to $t$ in $L^2$. This is the quadratic variation result: $[W, W]_t = t$.
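If you'd rather watch this convergence than take it on faith, a few lines of NumPy make it concrete. This is a quick sketch, not part of the derivation; the seed and trial counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
t, trials = 1.0, 2000

for n in (10, 100, 10_000):
    dt = t / n
    # Each Brownian increment is N(0, dt); draw `trials` independent paths
    dW = rng.normal(0.0, np.sqrt(dt), size=(trials, n))
    qv = (dW ** 2).sum(axis=1)  # realized quadratic variation per path
    # Mean -> t, spread -> sqrt(2) * t / sqrt(n): the L^2 convergence in action
    print(f"n={n:>6}  mean={qv.mean():.4f}  std={qv.std():.4f}")
```

As $n$ grows, the mean stays pinned at $t = 1$ while the cross-path spread collapses like $1/\sqrt{n}$, exactly as the variance calculation above predicts.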

That single fact is what breaks ordinary calculus. In a deterministic setting, $(\Delta x)^2$ is negligible compared to $\Delta x$ as increments shrink. For Brownian motion, $(\Delta W)^2$ is not negligible. It concentrates around $\Delta t$, not zero.

This gives you the Ito multiplication table, which is the bookkeeping device for the whole theory:

$$dW_t \cdot dW_t = dt, \qquad dt \cdot dW_t = 0, \qquad dt \cdot dt = 0$$

Think of it like significant figures: $dt$ is first-order small, $(dt)^2$ is second-order and vanishes, but $(dW_t)^2$ is also first-order because Brownian paths are rough enough to accumulate variation at rate $dt$.

The Taylor Expansion Proof

Now take a smooth function $f(t, x) \in C^{1,2}$ and an Ito process $dX_t = \mu_t \, dt + \sigma_t \, dW_t$. You want to find $df(t, X_t)$.

Write the second-order Taylor expansion:

$$df = f_t \, dt + f_x \, dX_t + \frac{1}{2} f_{tt} \, (dt)^2 + f_{tx} \, dt \, dX_t + \frac{1}{2} f_{xx} \, (dX_t)^2 + \cdots$$

Now substitute $dX_t = \mu_t \, dt + \sigma_t \, dW_t$ and apply the multiplication table. The term $(dX_t)^2$ expands as:

$$(dX_t)^2 = (\mu_t \, dt + \sigma_t \, dW_t)^2 = \mu_t^2 (dt)^2 + 2\mu_t \sigma_t \, dt \, dW_t + \sigma_t^2 (dW_t)^2$$

Apply the table: $(dt)^2 = 0$, $dt \, dW_t = 0$, $(dW_t)^2 = dt$. Everything collapses to $\sigma_t^2 \, dt$.

The $f_{tt}(dt)^2$ and $f_{tx} \, dt \, dX_t$ terms also vanish by the same rules. What survives:

$$\boxed{df(t, X_t) = \left( f_t + \mu_t f_x + \frac{1}{2}\sigma_t^2 f_{xx} \right) dt + \sigma_t f_x \, dW_t}$$

That $\frac{1}{2}\sigma_t^2 f_{xx}$ term is the entire story. It comes directly from $(dW)^2 = dt$ surviving the Taylor expansion. In ordinary calculus, the second-order term dies. Here it lives.
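The simplest sanity check: take $f(x) = x^2$ with $X_t = W_t$. The ordinary chain rule gives $d(W^2) = 2W\,dW$, a driftless integral with mean zero; Ito adds the correction $\frac{1}{2}f''\,(dW)^2 = dt$, predicting $\mathbb{E}[W_t^2] = t$. A short Monte Carlo sketch (assuming NumPy; seed arbitrary) shows which one nature picks:

```python
import numpy as np

rng = np.random.default_rng(1)
t, n, paths = 1.0, 1_000, 50_000
dt = t / n

# Simulate W_t as a sum of N(0, dt) increments across many paths
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W_t = dW.sum(axis=1)

# Naive chain rule predicts E[W_t^2] = 0; Ito predicts E[W_t^2] = t
print(np.mean(W_t ** 2))  # ~ 1.0 = t
```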

Here's what that flow looks like:

Ito's Lemma: From Taylor Expansion to Stochastic Differential

Why the Ordinary Chain Rule Isn't Enough

In deterministic calculus, if $x(t)$ is a differentiable path, then $(dx)^2 \sim (x'(t))^2 (dt)^2$, which is second-order small and drops out. The chain rule $df = f_t \, dt + f_x \, dx$ is exact.

Brownian paths are nowhere differentiable. They're too rough for that. The squared increment doesn't vanish; it accumulates. So the second-order term in the Taylor expansion is not a rounding error you can ignore. It's a genuine first-order contribution.

Your interviewer cares about this contrast because it's the conceptual test. If you say "Ito's Lemma is just the chain rule with an extra term," you'll get a follow-up: "why is there an extra term?" The answer is quadratic variation, not a hand-wave about randomness.

⚠️Common mistake
Candidates memorize the formula but can't explain where the $\frac{1}{2}$ comes from. It's the $\frac{1}{2}$ from the Taylor expansion, multiplied by $\sigma_t^2$ from $(dW_t)^2 = dt$. Both pieces matter.

The Multidimensional Case

When you have two correlated Ito processes $dX_t^{(i)} = \mu_i \, dt + \sigma_i \, dW_t^{(i)}$ with $dW_t^{(i)} \, dW_t^{(j)} = \rho_{ij} \, dt$, the cross-variation terms survive too. For $f(X_t^{(1)}, X_t^{(2)}, \ldots)$, the correction becomes a full Hessian sum:

$$df = f_t \, dt + \sum_i f_{x_i} \, dX_t^{(i)} + \frac{1}{2} \sum_{i,j} f_{x_i x_j} \, \sigma_i \sigma_j \rho_{ij} \, dt$$

Multi-asset desks at firms like Goldman or Citadel will probe this directly. If you're pricing a spread option or a basket, the $\rho_{ij}$ terms in the Hessian are where correlation enters the PDE. Candidates who only know the one-dimensional version get stuck the moment an interviewer writes down two correlated underlyings.

The key property to internalize: the correction term scales with $\sigma^2$ (or $\sigma_i \sigma_j \rho_{ij}$ in multiple dimensions). Higher volatility means a larger curvature correction. This is the continuous-time analog of Jensen's inequality: a convex function of a random variable has a higher expectation than the function evaluated at the mean, and the gap grows with variance.

⏱️Your 30-second explanation
"Ito's Lemma is the chain rule for stochastic processes. When you differentiate a smooth function of a Brownian-driven process, you get the usual first-order terms plus a correction: one-half sigma-squared times the second derivative. That correction comes from the fact that Brownian motion has nonzero quadratic variation, so $(dW)^2 = dt$ rather than zero. In ordinary calculus that term vanishes; in stochastic calculus it survives and becomes first-order. The result is $df = (f_t + \mu f_x + \frac{1}{2}\sigma^2 f_{xx}) \, dt + \sigma f_x \, dW$."

Patterns You Need to Know

In an interview, you'll usually need to pick a specific approach. Here are the ones worth knowing.

Pattern 1: Log Transform of Geometric Brownian Motion

This is the pattern you will be asked about. Every quant interviewer has it in their back pocket, and the trap is always the same: candidates write the drift as $\mu$ instead of $\mu - \frac{1}{2}\sigma^2$ and don't notice.

Start with $dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$ and apply Ito's Lemma to $f(S) = \ln S$. The partials are $f_S = 1/S$, $f_{SS} = -1/S^2$, and $f_t = 0$. Plugging into the lemma:

$$d(\ln S_t) = \frac{1}{S_t} dS_t + \frac{1}{2} \left(-\frac{1}{S_t^2}\right) \sigma^2 S_t^2 \, dt = \left(\mu - \frac{1}{2}\sigma^2\right) dt + \sigma \, dW_t$$

The $-\frac{1}{2}\sigma^2$ term is the entire point. It's the curvature correction from Jensen's inequality: because $\ln$ is concave, the expected log return is strictly less than the log of the expected return. Financially, it's the gap between arithmetic and geometric average returns. When your interviewer asks "why is the drift $\mu - \frac{1}{2}\sigma^2$ and not just $\mu$?", that's your answer.

When to reach for this: any time an interviewer asks you to find the distribution of $S_T$, simulate GBM paths correctly, or derive what $\mathbb{E}[\ln S_T]$ equals.
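You can also watch the correction emerge without ever writing it down: Euler-discretize the price SDE directly, with no log and no $-\sigma^2/2$ anywhere in the code, and then measure the mean log return. A sketch assuming NumPy; parameters and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, T = 0.10, 0.20, 1.0
n, paths = 500, 100_000
dt = T / n

# Euler step in price space: the -sigma^2/2 never appears explicitly
S = np.full(paths, 100.0)
for _ in range(n):
    S *= 1.0 + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(paths)

# Yet the mean log return converges to mu - sigma^2/2 = 0.08, not mu = 0.10
print(np.log(S / 100.0).mean())
```

The $-\sigma^2/2$ shows up in the measured drift even though it was never coded in; it is a property of the process, not of the parametrization.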

Pattern 2: Deriving the Black-Scholes PDE

Ito's Lemma on $V(t, S_t)$ gives you:

$$dV = \left(V_t + \mu S V_S + \frac{1}{2}\sigma^2 S^2 V_{SS}\right) dt + \sigma S V_S \, dW_t$$

Now construct a portfolio $\Pi = V - \Delta S$ where $\Delta = V_S$. The $dW_t$ terms cancel exactly, leaving a portfolio that is instantaneously riskless. No-arbitrage then forces $d\Pi = r\Pi \, dt$, and after substituting and rearranging you arrive at:

$$V_t + \frac{1}{2}\sigma^2 S^2 V_{SS} + r S V_S - rV = 0$$

Notice that $\mu$ has completely disappeared. That's not a coincidence; the delta hedge eliminates all exposure to the actual drift of the stock. The PDE only depends on $\sigma$ and $r$, which is why two investors with different views on $\mu$ can still agree on the option price.
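A useful self-check is to verify numerically that the closed-form call price actually satisfies this PDE. The sketch below (standard library only; the strike, rate, and evaluation point are arbitrary choices) differentiates the formula by central finite differences and confirms the residual vanishes:

```python
import math

def bs_call(t, S, K=100.0, r=0.05, sigma=0.2, T=1.0):
    """Black-Scholes European call value at time t and spot S."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return S * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)

t, S, h = 0.3, 110.0, 1e-3
r, sigma = 0.05, 0.2

# Central finite differences for V_t, V_S, V_SS at (t, S)
V = bs_call(t, S)
V_t = (bs_call(t + h, S) - bs_call(t - h, S)) / (2 * h)
V_S = (bs_call(t, S + h) - bs_call(t, S - h)) / (2 * h)
V_SS = (bs_call(t, S + h) - 2 * V + bs_call(t, S - h)) / h ** 2

residual = V_t + 0.5 * sigma ** 2 * S ** 2 * V_SS + r * S * V_S - r * V
print(residual)  # ~ 0: the closed-form price solves the PDE; note mu appears nowhere
```

Note that `mu` never enters the function or the residual, mirroring the point above: the hedged PDE depends only on $\sigma$ and $r$.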

Interviewers want to see you derive this without skipping the delta-hedge step. If you jump straight to the PDE without explaining why the $dW_t$ term vanishes, expect a follow-up that forces you back to it anyway.

When to reach for this: whenever the question involves option pricing, the connection between SDEs and PDEs, or the Feynman-Kac theorem.

Pattern 2: Black-Scholes PDE via Ito's Lemma and Delta Hedging

Pattern 3: Exponential Martingale and Girsanov's Theorem

Define $Z_t = \exp\!\left(\theta W_t - \frac{1}{2}\theta^2 t\right)$. Apply Ito's Lemma with $f(t, w) = e^{\theta w - \frac{1}{2}\theta^2 t}$:

$$dZ_t = \left(-\frac{1}{2}\theta^2 Z_t + \frac{1}{2}\theta^2 Z_t\right) dt + \theta Z_t \, dW_t = \theta Z_t \, dW_t$$

The $dt$ terms cancel perfectly. No drift means $Z_t$ is a local martingale (and under Novikov's condition, a true martingale). This is not just a curiosity: $Z_t$ is the Radon-Nikodym derivative $\frac{d\mathbb{Q}}{d\mathbb{P}}\big|_{\mathcal{F}_t}$ that defines the risk-neutral measure. Under $\mathbb{Q}$, the process $\tilde{W}_t = W_t - \theta t$ is a standard Brownian motion, which is exactly Girsanov's theorem.
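A direct Monte Carlo check of the martingale property (a sketch assuming NumPy; $\theta$ and $t$ are arbitrary): with the $-\frac{1}{2}\theta^2 t$ correction the mean stays pinned at 1, and without it the mean drifts to $e^{\theta^2 t/2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, t, paths = 0.7, 2.0, 500_000

W_t = rng.normal(0.0, np.sqrt(t), size=paths)
Z_t = np.exp(theta * W_t - 0.5 * theta ** 2 * t)

print(Z_t.mean())                  # ~ 1.0 for any t: Z is a martingale
print(np.exp(theta * W_t).mean())  # ~ exp(theta^2 t / 2) ~ 1.63: drift survives
```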

🔑Key insight
The $-\frac{1}{2}\theta^2 t$ term in the exponent is not arbitrary. It's precisely the correction that kills the drift in $dZ_t$, making $Z_t$ a martingale. Without it, you'd have an exponential with a nonzero $dt$ term and the measure change would break down.

When to reach for this: risk-neutral pricing derivations, any question about changing probability measures, or when an interviewer asks you to verify that a given process is a martingale.

Pattern 3: Exponential Martingale and Measure Change

Pattern 4: Stochastic Integration by Parts

Take two Ito processes $X_t$ and $Y_t$ and apply the two-dimensional Ito's Lemma to $f(x, y) = xy$. The second-order partials are $f_{xx} = f_{yy} = 0$ and $f_{xy} = 1$, so:

$$d(X_t Y_t) = X_t \, dY_t + Y_t \, dX_t + dX_t \, dY_t$$

That last term, $dX_t \, dY_t$, is the piece that ordinary calculus doesn't have. If $dX_t = \sigma_X \, dW_t^{(1)}$ and $dY_t = \sigma_Y \, dW_t^{(2)}$ with $dW^{(1)} dW^{(2)} = \rho \, dt$, then $dX_t \, dY_t = \rho \sigma_X \sigma_Y \, dt$. For independent processes it vanishes; for correlated ones it doesn't.
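The cross term is easy to verify numerically: build correlated increments with a Cholesky-style mix and sum $dX_t \, dY_t$ along one fine path. A sketch assuming NumPy; $\rho$ and the volatilities are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
rho, sig_x, sig_y, t, n = 0.6, 0.3, 0.5, 1.0, 200_000
dt = t / n

# Correlated Brownian increments: Corr(Z1, Z2) = rho by construction
Z1 = rng.standard_normal(n)
Z2 = rho * Z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
dX = sig_x * np.sqrt(dt) * Z1
dY = sig_y * np.sqrt(dt) * Z2

# Realized cross-variation converges to rho * sig_x * sig_y * t = 0.09
print((dX * dY).sum())
```

Set `rho = 0.0` and the sum collapses toward zero, which is the independent case in the paragraph above.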

This pattern comes up constantly in proofs: deriving the dynamics of a ratio $X_t / Y_t$ (via $f(x,y) = x/y$), computing the dynamics of a discounted price process, or proving that a product of martingales is not generally a martingale.

When to reach for this: any time you need to find the SDE for a product or ratio of two stochastic processes, or when an interviewer asks about the dynamics of a discounted asset.

Pattern 5: Solving the Ornstein-Uhlenbeck SDE

The OU process $dX_t = -\alpha X_t \, dt + \sigma \, dW_t$ looks like it should have a closed-form solution, but you can't integrate it directly because of the $X_t$ on the right-hand side. The trick is an integrating factor.

Define $f(t, x) = e^{\alpha t} x$ and apply Ito's Lemma:

$$d\!\left(e^{\alpha t} X_t\right) = \alpha e^{\alpha t} X_t \, dt + e^{\alpha t} dX_t = \alpha e^{\alpha t} X_t \, dt + e^{\alpha t}(-\alpha X_t \, dt + \sigma \, dW_t) = \sigma e^{\alpha t} \, dW_t$$

The drift cancels entirely. Integrating both sides from $0$ to $t$:

$$X_t = X_0 e^{-\alpha t} + \sigma \int_0^t e^{-\alpha(t-s)} \, dW_s$$

This is the explicit solution: a deterministic decay toward zero plus a weighted sum of past noise. The distribution of $X_t$ is Gaussian with mean $X_0 e^{-\alpha t}$ and variance $\frac{\sigma^2}{2\alpha}(1 - e^{-2\alpha t})$. Short-rate models like Vasicek use exactly this structure.
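The closed form is easy to check against a brute-force Euler-Maruyama simulation (a sketch assuming NumPy; parameters and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, sigma, X0, T = 2.0, 0.5, 1.0, 1.0
n, paths = 1_000, 50_000
dt = T / n

# Euler-Maruyama for dX = -alpha X dt + sigma dW
X = np.full(paths, X0)
for _ in range(n):
    X += -alpha * X * dt + sigma * np.sqrt(dt) * rng.standard_normal(paths)

mean_th = X0 * np.exp(-alpha * T)                                 # ~ 0.135
var_th = sigma ** 2 / (2 * alpha) * (1 - np.exp(-2 * alpha * T))  # ~ 0.061
print(X.mean(), mean_th)
print(X.var(), var_th)
```

The simulated mean and variance land on the integrating-factor solution's $X_0 e^{-\alpha t}$ and $\frac{\sigma^2}{2\alpha}(1 - e^{-2\alpha t})$ up to discretization and sampling error.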

When to reach for this: any mean-reverting SDE, Vasicek or Hull-White model questions, or when an interviewer asks you to characterize the stationary distribution of a process.

Pattern 5: Solving the Ornstein-Uhlenbeck SDE Explicitly

Pattern Comparison

| Pattern | Function $f$ Applied To | Key Output | Primary Interview Context |
|---|---|---|---|
| Log transform of GBM | $\ln S_t$ | $d(\ln S) = (\mu - \frac{1}{2}\sigma^2)\,dt + \sigma \, dW$ | GBM distribution, Monte Carlo drift |
| Black-Scholes PDE | $V(t, S_t)$ | BS PDE via delta hedge | Option pricing, Feynman-Kac |
| Exponential martingale | $e^{\theta W_t - \frac{1}{2}\theta^2 t}$ | $dZ = \theta Z \, dW$ (martingale) | Girsanov, measure changes |
| Stochastic integration by parts | $X_t Y_t$ | $d(XY) = X\,dY + Y\,dX + dX\,dY$ | Product/ratio dynamics, discounting |
| Ornstein-Uhlenbeck | $e^{\alpha t} X_t$ | Explicit solution via integrating factor | Mean reversion, short-rate models |

For most interview problems, you'll default to the log transform or the Black-Scholes PDE derivation; those two cover the majority of equity derivatives questions. Reach for the exponential martingale when the conversation turns to risk-neutral pricing or measure changes, and the OU pattern whenever you see mean reversion or hear the words "Vasicek" or "Hull-White." The integration by parts result is less likely to be the main question, but it appears constantly as a sub-step in longer derivations, so knowing it cold saves you from getting stuck mid-proof.

What Trips People Up

Here's where candidates lose points — and it's almost always one of these.

The Mistake: Dropping the Correction Term (or Mangling It)

The most common whiteboard error I've seen, across every firm, is writing Ito's Lemma and either omitting the $\frac{1}{2}\sigma^2 f_{xx}$ term entirely or writing it as $\frac{1}{2} f_{xx}$ with the $\sigma^2$ quietly missing. The candidate looks confident, writes the SDE, and the interviewer just watches to see if the correction appears unprompted.

A bad answer sounds like: "So $df = f_t \, dt + f_x \, dX$, and then there's some extra term..." followed by a pause, a glance at the ceiling, and a half-remembered $\frac{1}{2} f_{xx} \, dt$ that's missing its $\sigma^2$.

Why it matters: the $\sigma^2$ is not cosmetic. It carries the units and the magnitude of the correction. Drop it and your Black-Scholes PDE is wrong, your log-normal drift is wrong, and your Monte Carlo paths are biased. The interviewer knows this and is watching for exactly that factor.

💡Interview tip
Before you write anything, say out loud: "The correction term comes from $(dX)^2 = \sigma^2 \, dt$, so it picks up a full $\sigma^2$." Narrating this signals you understand the origin, not just the formula.

The Mistake: Writing $d(\ln S)$ with Drift $\mu$

Ask a candidate to apply Ito's Lemma to $\ln S_t$ under geometric Brownian motion and a surprising number will write:

$$d(\ln S_t) = \mu \, dt + \sigma \, dW_t$$

That's wrong. The correct drift is $\mu - \frac{\sigma^2}{2}$, and the missing $\frac{\sigma^2}{2}$ is not a rounding error. It's the entire point of the exercise.

This mistake is especially costly because interviewers often follow up numerically. They'll ask: "If $\mu = 0.1$ and $\sigma = 0.2$, what's the expected log return over a year?" A candidate who wrote $\mu$ as the drift will say 10%. The right answer is $\mu - \frac{\sigma^2}{2} = 0.08$, or 8%. That gap is the difference between arithmetic and geometric returns, and it shows up directly in Monte Carlo drift corrections.

⚠️Common mistake
Candidates conflate the drift of $S_t$ with the drift of $\ln S_t$. The interviewer hears: "This person has never actually simulated a GBM path."

What to say instead: after computing the partials $f_S = 1/S$ and $f_{SS} = -1/S^2$, pause and say "the $-1/S^2$ term is what gives us the $-\frac{\sigma^2}{2}$ drift correction, which is why log returns have a lower mean than the arithmetic drift." Say it before they ask.


The Mistake: Memorizing the Formula Instead of Owning the Derivation

This one filters out more candidates than any other. The interview starts with "derive Ito's Lemma for me," and the candidate begins writing the final formula from memory. The interviewer interrupts: "Where does the $\frac{1}{2}$ come from?"

Silence.

The $\frac{1}{2}$ comes from the Taylor expansion. Specifically, the second-order term $\frac{1}{2} f_{xx} (dX)^2$ survives because $(dW_t)^2 = dt$ rather than vanishing the way $(dx)^2$ does in ordinary calculus. If you can't explain that on demand, you've signaled that you learned a formula, not a result.

Interviewers at Goldman, Citadel, and Two Sigma will push on this. The follow-up is almost always "and why does $(dW)^2 = dt$?" You need to be ready to sketch the quadratic variation argument: that $\sum (\Delta W_i)^2 \to t$ in $L^2$, so in the infinitesimal limit we treat $(dW_t)^2$ as exactly $dt$.

💡Interview tip
Practice deriving Ito's Lemma starting from a second-order Taylor expansion, substituting the Ito multiplication table, and arriving at the final SDE. Do it three times on paper until the steps feel mechanical. The formula is the last thing you write, not the first.

The Mistake: Forgetting Cross-Variation in the Multidimensional Case

Single-asset Ito's Lemma is fine. Then the interviewer says "now suppose you have two correlated assets" and the candidate either freezes or writes the wrong Hessian sum.

The error usually looks like this: the candidate writes $df = \ldots + \frac{1}{2}(\sigma_1^2 f_{x_1 x_1} + \sigma_2^2 f_{x_2 x_2}) \, dt$ and stops. They've forgotten the cross term $\sigma_1 \sigma_2 \rho \, f_{x_1 x_2} \, dt$, which comes from $dW_1 \, dW_2 = \rho \, dt$.

This matters in practice for any multi-asset derivative: spread options, basket options, quanto products. The cross-partial term is where correlation enters the pricing PDE. Missing it in an interview for a multi-asset desk is a hard signal that you haven't worked with correlated processes.

The fix is to internalize the multiplication table as a two-by-two object, not just the diagonal entries. When $i = j$, $dW_i \, dW_j = dt$. When $i \neq j$, $dW_i \, dW_j = \rho_{ij} \, dt$. Write that table in the corner of your whiteboard before you start the derivation. It takes five seconds and prevents the error entirely.

How to Talk About This in Your Interview

When to Bring It Up

Ito's Lemma isn't something you wait to be asked about. It's the foundation. But here are the specific cues that signal the interviewer wants you to go there:

  • They ask you to "derive the Black-Scholes PDE from scratch"
  • They say "model the stock price as GBM and tell me the distribution of $\ln S_T$"
  • They ask "how do you simulate GBM in a Monte Carlo?" (the drift correction is the trap)
  • They mention "risk-neutral measure" or "change of measure" (Girsanov lives downstream of Ito)
  • They ask "what's the expected value of $S_T$ under the risk-neutral measure?" (you need the log-normal formula, which requires the $\sigma^2/2$ correction)

Any time a smooth function gets applied to a stochastic process, Ito's Lemma is the tool. Say that out loud if the interviewer seems to be fishing.

Sample Dialogue

This is the most common opening. It gets messier than you'd expect.


Interviewer: "Alright, let's start with something foundational. State Ito's Lemma and derive it."
You: "Sure. Before I write the formula, let me anchor it in why it's different from ordinary calculus. The key fact is that Brownian motion has nonzero quadratic variation. Specifically, $\sum (\Delta W_i)^2 \to t$ in $L^2$, which means we treat $(dW_t)^2 = dt$ rather than zero. That single fact is why the ordinary chain rule fails."
Interviewer: "Wait, back up. Why does $(dW_t)^2 = dt$? That seems like it's just being asserted."
You: "Fair point. It comes from the variance of the increments. Each $\Delta W_i \sim \mathcal{N}(0, \Delta t)$, so $(\Delta W_i)^2$ has mean $\Delta t$ and variance $2(\Delta t)^2$. When you sum $n$ of these over $[0,t]$ with $\Delta t = t/n$, the sum of means is $t$ and the total variance is $2t^2/n \to 0$. So the sum converges to $t$ in $L^2$. It's not an assertion, it's a limit theorem."
Interviewer: "Okay, good. Now give me the formula."
You: "For a function $f(t, X_t)$ where $dX_t = \mu_t \, dt + \sigma_t \, dW_t$, Taylor-expanding to second order and substituting the Ito table gives:

$$df = \left(\frac{\partial f}{\partial t} + \mu_t \frac{\partial f}{\partial x} + \frac{1}{2}\sigma_t^2 \frac{\partial^2 f}{\partial x^2}\right)dt + \sigma_t \frac{\partial f}{\partial x} \, dW_t$$

The $\frac{1}{2}\sigma^2 f_{xx}$ term is the entire point. It comes from the $(dX)^2$ term in the Taylor expansion, which survives because $(dW)^2 = dt$. In ordinary calculus, $(dx)^2 = 0$ and that term vanishes. Here it doesn't."

Interviewer: "And what does the $f_{xx}$ term mean financially?"
You: "It's a curvature correction. If $f$ is convex in $X$, Jensen's inequality tells you the expected value of $f(X)$ exceeds $f(\mathbb{E}[X])$. The Ito correction is the continuous-time version of that gap. It's why geometric Brownian motion has a lower drift in log space than in price space."

Follow-Up Questions to Expect

"Apply Ito's Lemma to $f(S_t) = \ln S_t$ where $dS = \mu S \, dt + \sigma S \, dW$."

Compute $f_S = 1/S$, $f_{SS} = -1/S^2$, $f_t = 0$, substitute, and get $d(\ln S) = (\mu - \sigma^2/2) \, dt + \sigma \, dW$. Then immediately say: "The $\sigma^2/2$ is the arithmetic-to-geometric drift correction. It's why the median of $S_T$ is $S_0 e^{(\mu - \sigma^2/2)T}$, not $S_0 e^{\mu T}$."

"What's the difference between the Ito and Stratonovich integrals?"

Stratonovich uses a midpoint convention that preserves the ordinary chain rule, so $d(\ln S) = \mu \, dt + \sigma \, dW$ without the correction. But Stratonovich integrals are not martingales in general, which breaks the risk-neutral pricing machinery. Finance uses Ito because martingales are the backbone of no-arbitrage theory.

"How does Ito's Lemma connect to the Black-Scholes PDE?"

Apply Ito's Lemma to $V(t, S_t)$, which gives you a $dW$ term proportional to $V_S$. Construct a portfolio $\Pi = V - V_S \cdot S$ that cancels that term. The resulting portfolio is instantaneously riskless, so by no-arbitrage it must earn the risk-free rate. Setting $d\Pi = r\Pi \, dt$ gives you the Black-Scholes PDE directly.

"How does this appear in Monte Carlo simulation?"

The log-Euler scheme simulates $\ln S$ rather than $S$ directly: $\ln S_{t+\Delta t} = \ln S_t + (\mu - \sigma^2/2)\Delta t + \sigma \sqrt{\Delta t} \, Z$. For GBM this is exact at any step size. If you instead discretize $dS = \mu S \, dt + \sigma S \, dW$ naively as $S_{t+\Delta t} = S_t(1 + \mu \Delta t + \sigma \sqrt{\Delta t} \, Z)$, the scheme is still consistent, but it carries discretization bias at any finite $\Delta t$ — bias the log scheme avoids because the Ito drift correction is built into each step.
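To see the difference concretely, take a single coarse step with $\Delta t = T$: the log scheme reproduces the exact lognormal terminal distribution, while the naive price-space step does not. A sketch assuming NumPy; parameters and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, T, S0, paths = 0.10, 0.20, 1.0, 100.0, 200_000
Z = rng.standard_normal(paths)

# Log-Euler in one step: exact for GBM, Ito correction sits in the drift
S_log = S0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
# Naive Euler in one step: consistent as dt -> 0, but biased at coarse dt
S_naive = S0 * (1.0 + mu * T + sigma * np.sqrt(T) * Z)

print(S_log.mean())    # ~ 110.52 = S0 * exp(mu * T), the true E[S_T]
print(S_naive.mean())  # ~ 110.00: the exponential compounding is missing
```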

What Separates Good from Great

  • A mid-level candidate states the formula correctly and applies it to GBM. A senior candidate derives it from the Taylor expansion, explains exactly where each term comes from, and connects the $\sigma^2/2$ correction to Jensen's inequality without being prompted.
  • A mid-level candidate knows that Stratonovich and Ito differ. A senior candidate explains the trade-off precisely: Stratonovich gives you the chain rule but loses the martingale property; Ito gives you martingales but requires the correction term. Then they say which one appears in the Girsanov theorem and why.
  • When asked about Monte Carlo, a mid-level candidate mentions the log-Euler scheme. A senior candidate explains that the naive Euler scheme for $S$ is not wrong per se, it just converges more slowly and introduces discretization bias that the log scheme eliminates exactly for GBM.
🎯Key takeaway
Ito's Lemma is not a formula to recite. It's a derivation to reproduce, and every step from quadratic variation through the Taylor expansion to the $\frac{1}{2}\sigma^2 f_{xx}$ correction should be something you can explain out loud, on a whiteboard, under pressure.