Most candidates who fail quant probability interviews at Jane Street, Citadel, and Two Sigma know the material. They can recite Bayes' theorem, derive the geometric distribution, and explain conditional expectation. They fail because they apply the right formula to the wrong problem, and they don't catch it until the interviewer's expression tells them something has gone sideways.
The traps fall into three categories. Cognitive traps arise when your intuition overrides your math: you "feel" that two events are independent, so you treat them as independent without checking. Mechanical traps are subtler: your reasoning is sound but your setup is wrong, like counting ordered outcomes when the problem requires unordered ones. Communication traps are the cruelest: you get the right answer but explain it in a way that reveals you don't know why it's right, which is often worse than a wrong answer with clean reasoning.
Interviewers at top quant firms are not just testing whether you know the tools. They are specifically probing for these failure modes. A Jane Street interviewer who asks a Monty Hall variant isn't curious whether you've seen it before. They want to watch how you handle a problem designed to make your intuition lie to you. This guide gives you a detection framework to run before you write a single equation, worked examples of each trap in action, and a pre-answer checklist you can execute in your head in under thirty seconds.
Most candidates dive straight into calculation. That's exactly when traps bite. The STOP-CHECK-SOLVE-SANITY framework forces a 60-second pause before you write anything, and that pause is where interviews are won.
Here's the structure. Memorize this table.
| Phase | Time | Goal |
|---|---|---|
| STOP | 15-20 sec | Classify the problem: discrete or continuous, counting or expectation, conditional or joint |
| CHECK | 30-40 sec | Run all five trap signatures; flag any that apply |
| SOLVE | Remaining time | Execute with a verified setup: notation first, derivation second, number last |
| SANITY | 15 sec | Boundary checks before you say your final answer out loud |
The whole pre-computation phase takes under a minute. The interviewer won't notice the pause. They'll notice the rigor.

What to say:
"Before I set up anything, let me make sure I understand what kind of problem this is. We're looking for a probability, and the event depends on the outcome of multiple draws, so I want to be careful about whether order matters here."
How the interviewer is evaluating you:
They're watching whether you rush. A candidate who immediately writes $P(A) = \frac{\text{favorable}}{\text{total}}$ without pausing has already signaled that they're pattern-matching, not reasoning. Interviewers at Jane Street and Two Sigma specifically look for the moment you slow down and classify, because it predicts whether you'll catch your own errors later.
This is the core of the framework. Before you commit to a setup, run through each of the following five checks in order. You're looking for any that apply, not just the first one.
Trap 1: Conditional vs. unconditional confusion. Ask yourself: is the problem giving me information that updates the sample space? If the problem says "given that..." or "you observe that..." or "a player reveals...", you're in conditional probability territory. The trap is computing $P(A \cap B)$ when you need $P(A \mid B)$, or worse, computing $P(A \mid B)$ when the problem is actually asking for $P(B \mid A)$.
Trap 2: Unverified independence. Before you multiply probabilities, ask: are these events actually independent, or am I assuming they are because it's convenient? Draws without replacement are never independent. Events defined on overlapping outcomes are rarely independent. If you can't state why independence holds, don't assume it.
Trap 3: Sample space miscounting. Ask: does order matter in my sample space? Am I sampling with or without replacement? A common error is counting ordered outcomes in the numerator and unordered outcomes in the denominator, or vice versa. Both need to be consistent. When in doubt, enumerate small cases explicitly.
Trap 4: Base rate neglect. Any time you see a conditional probability problem with a rare event (a disease, a signal, an anomaly), ask: what is the prior probability of the event I'm conditioning on? Ignoring the prior and computing only the likelihood is the defining error of base rate neglect. The posterior is never just the likelihood.
Trap 5: False symmetry. Symmetry arguments are powerful but fragile. Before invoking symmetry, ask: are the outcomes I'm treating as equivalent actually equally likely? Sequences of the same length are not always equally likely to appear first. Positions in a game are not always interchangeable. If you can't write down the formal argument for why symmetry holds, don't use it.
What to say:
"Let me just run through a quick check before I set up the calculation. I want to confirm whether these draws are independent, and I want to make sure I have the conditioning direction right."
That's it. You don't need to narrate every check. One sentence signals that you're doing it.
How the interviewer is evaluating you:
They may have designed the problem specifically to trigger one of these traps. When you say "let me check whether order matters in my sample space," you've demonstrated that you know the trap exists. Even if you then proceed correctly, that verbalization earns credit. If you skip the check and walk into the trap, no amount of correct algebra afterward fully recovers the impression.
Do this: Say one sentence out loud during the CHECK phase. It doesn't have to cover all five signatures. It just has to show you paused and thought about setup before solving.
Don't do this: Run the CHECK phase silently and then present your solution as if the setup were obvious. The interviewer can't see your internal process. If you don't verbalize it, it didn't happen, as far as they're concerned.
Once you've classified the problem and flagged any traps, you can actually compute. The order matters here too.
What to say:
"Okay, I'm confident in the setup. Let me define the events formally and then work through the calculation."
How the interviewer is evaluating you:
They want to see clean, auditable reasoning. If your notation is sloppy, they can't tell whether a wrong answer came from a conceptual error or an arithmetic slip. Explicit notation protects you: if the setup is right and the arithmetic is wrong, a good interviewer will tell you and let you continue. If the setup is wrong and the notation is ambiguous, they can't help you.
Never say your final answer the moment you compute it. Take 15 seconds.
What to say:
"Let me just do a quick sanity check. If $p = 1$, this should give probability 1, and it does. The answer looks right to me: $\boxed{2/3}$."
Example: "I'm getting $\frac{4}{3}$ here, which can't be right for a probability. Let me go back and check my denominator in the Bayes calculation."
That kind of self-correction, done calmly, is not a failure. It's exactly what the interviewer wants to see. The candidates who impress at Citadel and DE Shaw are not the ones who never make errors; they're the ones who catch their own errors before being prompted.
The framework is a real-time tool. You run it before you write a single equation, not after you've already committed to a setup. Treating it as a post-hoc check defeats the purpose entirely. The trap is already sprung by the time you're verifying a completed solution.
Each trap below comes with a full worked example: the problem, the wrong setup most candidates reach for, and the correct derivation with a numerical answer. Run the STOP-CHECK-SOLVE loop on each one before reading the solution.
The problem: You have two children. At least one is a boy. What is the probability both are boys?
Most candidates answer 1/2 immediately. That's wrong, and here's exactly why.
The mistake is treating "the other child is a boy" as an independent event with probability 1/2. But the problem has already conditioned on the family having at least one boy. You're computing $P(\text{both boys} \mid \text{at least one boy})$, not $P(\text{second child is boy})$.
Correct setup using Bayes' theorem:
Let $A$ = both children are boys. Let $B$ = at least one child is a boy.
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
The sample space for two children, equally likely: $\{BB, BG, GB, GG\}$.
$$P(A) = \frac{1}{4}, \quad P(B) = \frac{3}{4}, \quad P(B \mid A) = 1$$
$$P(A \mid B) = \frac{1 \cdot \frac{1}{4}}{\frac{3}{4}} = \frac{1}{3}$$
Answer: 1/3, not 1/2.
The flip that kills candidates: they set up $P(A \mid B)$ but compute $P(B \mid A)$ instead, or they forget to normalize by $P(B)$ entirely. Candidates fail the Monty Hall problem the same way. You're told the host opened door 3 (a goat). The posterior probability that door 1 hides the car is $P(\text{car at 1} \mid \text{host opens 3})$, which requires knowing the host's door-selection rule. Candidates who say "now it's 50/50" are computing an unconditional $P(\text{car at 1})$, ignoring the conditioning event.
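The 1/3 answer can be verified by brute-force enumeration of the four equally likely family types; a minimal sketch:

```python
from itertools import product
from fractions import Fraction

# All two-child families, equally likely: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Conditioning on "at least one boy" shrinks the sample space to 3 outcomes.
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]

print(Fraction(len(both_boys), len(at_least_one_boy)))  # 1/3
```

Enumerating small cases like this is also the fastest way to settle a dispute with your own intuition mid-interview.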
Do this: Before writing any conditional probability, write out $P(\,\cdot \mid \cdot\,)$ explicitly with both slots filled. If you can't name what's in each slot, you haven't set up the problem yet.
The problem: A standard deck has 52 cards. You draw 5 without replacement. What is the probability that exactly 2 are aces?
The wrong setup: treat each draw as independent with $p = 4/52 = 1/13$, then apply the binomial formula.
$$P(\text{wrong}) = \binom{5}{2} \left(\frac{1}{13}\right)^2 \left(\frac{12}{13}\right)^3 \approx 0.0465$$
This is incorrect. Draws without replacement are not independent. Removing an ace changes the probability of drawing an ace on the next draw. The binomial model requires independent, identically distributed trials. Neither condition holds here.
Correct setup using the hypergeometric distribution:
You're sampling $n = 5$ cards from a population of $N = 52$, where $K = 4$ are "successes" (aces).
$$P(X = 2) = \frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}}$$
Computing each term:
$$\binom{4}{2} = 6, \quad \binom{48}{3} = \frac{48 \cdot 47 \cdot 46}{6} = 17296, \quad \binom{52}{5} = 2598960$$
$$P(X = 2) = \frac{6 \cdot 17296}{2598960} = \frac{103776}{2598960} \approx 0.0399$$
Answer: approximately 3.99%.
The binomial gave 4.65%. Close enough that you might not catch the error on intuition alone, which is exactly why interviewers use this problem. The numerical difference is small; the conceptual error is large.
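Both numbers can be checked directly with `math.comb`; a short sketch:

```python
from math import comb

# Correct: hypergeometric probability of exactly 2 aces in 5 cards
# drawn without replacement from a 52-card deck with 4 aces.
p_hyper = comb(4, 2) * comb(48, 3) / comb(52, 5)

# Incorrect: binomial model treating the 5 draws as independent, p = 4/52.
p_binom = comb(5, 2) * (4 / 52) ** 2 * (48 / 52) ** 3

print(round(p_hyper, 4), round(p_binom, 4))  # 0.0399 0.0465
```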
The problem: You roll two fair six-sided dice. What is the probability the sum equals 4?
Here's where candidates go wrong. They list the "ways to get 4" as $\{(1,3), (2,2), (3,1)\}$ and say there are 3 outcomes. Then they count total outcomes as 21 (the number of unordered pairs from $\{1, \ldots, 6\}$). That gives $3/21 = 1/7$.
Wrong.
The sample space of two dice is ordered pairs. $(1,3)$ and $(3,1)$ are distinct outcomes because die 1 and die 2 are distinguishable objects. The correct sample space has $6 \times 6 = 36$ equally likely outcomes.
Correct enumeration:
Outcomes summing to 4: $(1,3), (2,2), (3,1)$. That's 3 outcomes.
$$P(\text{sum} = 4) = \frac{3}{36} = \frac{1}{12} \approx 0.0833$$
The wrong answer was $1/7 \approx 0.1429$. A 70% relative error from a sample space mistake.
The unordered approach fails because unordered pairs are not equally likely. The pair $\{1,3\}$ corresponds to two ordered outcomes ($(1,3)$ and $(3,1)$), while $\{2,2\}$ corresponds to only one. Assigning equal probability to unordered pairs violates the uniform distribution assumption.
For more complex dice problems, multinomial coefficients handle the counting cleanly. The number of ordered outcomes of rolling $k$ dice that produce a specific multiset $\{n_1, n_2, \ldots\}$ is $\frac{k!}{n_1! n_2! \cdots}$, and you weight each multiset by this factor before summing.
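A quick enumeration of the ordered sample space confirms both the 1/12 answer and the multinomial weighting:

```python
from itertools import product
from fractions import Fraction
from math import factorial

# The correct sample space: 36 ordered, equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 4]
print(Fraction(len(favorable), len(outcomes)))  # 1/12

# Multinomial weights recover the same count: the multiset {1,3} has
# 2!/(1!1!) = 2 ordered outcomes, {2,2} has 2!/2! = 1, and 2 + 1 = 3.
w_13 = factorial(2) // (factorial(1) * factorial(1))
w_22 = factorial(2) // factorial(2)
print(w_13 + w_22)  # 3
```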
The problem: Your trading signal predicts an up-move correctly 90% of the time when an up-move occurs. It also fires a false positive 30% of the time on non-up-move days. Up-moves happen on 5% of trading days. The signal fires today. What is the probability there's actually an up-move?
Candidates who anchor on "90% accurate" say the answer is around 90%. It's not even close.
Correct Bayes setup:
Let $U$ = up-move occurs. Let $S$ = signal fires.
$$P(U \mid S) = \frac{P(S \mid U) \cdot P(U)}{P(S)}$$
Expanding the denominator using total probability:
$$P(S) = P(S \mid U)P(U) + P(S \mid U^c)P(U^c)$$
$$P(S) = (0.90)(0.05) + (0.30)(0.95) = 0.045 + 0.285 = 0.330$$
$$P(U \mid S) = \frac{(0.90)(0.05)}{0.330} = \frac{0.045}{0.330} \approx 0.136$$
Answer: approximately 13.6%.
A signal that's "90% accurate" gives you a posterior of 13.6% because up-moves are rare. The false positive rate of 30% on 95% of days swamps the true positive rate of 90% on 5% of days. This is the base rate neglect trap in its purest form.
Do this: In any Bayes problem, write down $P(\text{prior})$ before you write anything else. If the prior is extreme (close to 0 or 1), your posterior will be dominated by it. If you haven't accounted for the prior, your answer is wrong regardless of how carefully you computed the likelihood.
In a live interview, say this out loud: "Before I set up Bayes, let me note the base rate is 5%, which is low, so I'd expect the posterior to be much lower than the raw accuracy figure suggests." That one sentence signals you know exactly which trap is in play.
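The trading-signal arithmetic takes four lines to verify, with the denominator expanded via total probability exactly as in the derivation above:

```python
# Numbers from the signal example above.
p_up = 0.05             # prior: up-moves on 5% of trading days
p_fire_given_up = 0.90  # true positive rate
p_fire_given_not = 0.30 # false positive rate

# Law of total probability for the denominator.
p_fire = p_fire_given_up * p_up + p_fire_given_not * (1 - p_up)
posterior = p_fire_given_up * p_up / p_fire

print(round(posterior, 3))  # 0.136
```

Note how the false-positive term, $0.30 \times 0.95 = 0.285$, dominates the denominator; that is the base rate doing its work.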
The problem: What is the expected number of fair coin flips to see the pattern HH? What about HT? Are they the same?
Most candidates say they're the same. Both are length-2 sequences. The coin is fair. Feels symmetric. It's not.
Setting up the Markov chain for HH:
Define states by progress toward HH: $S_0$ (no useful progress), $S_1$ (last flip was H), $S_2$ (HH observed, absorbing).
Let $e_i$ = expected flips to absorption from state $S_i$.
From $S_0$: flip once. With probability 1/2 go to $S_1$, with probability 1/2 stay in $S_0$.
$$e_0 = 1 + \frac{1}{2}e_1 + \frac{1}{2}e_0$$
From $S_1$: flip once. With probability 1/2 go to $S_2$ (done), with probability 1/2 go to $S_0$.
$$e_1 = 1 + \frac{1}{2}(0) + \frac{1}{2}e_0$$
Solving: from the second equation, $e_1 = 1 + \frac{1}{2}e_0$. Substituting into the first:
$$e_0 = 1 + \frac{1}{2}\left(1 + \frac{1}{2}e_0\right) + \frac{1}{2}e_0 = 1 + \frac{1}{2} + \frac{1}{4}e_0 + \frac{1}{2}e_0$$
$$e_0 - \frac{3}{4}e_0 = \frac{3}{2} \implies \frac{1}{4}e_0 = \frac{3}{2} \implies e_0 = 6$$
Expected flips to HH: 6.
Setting up the Markov chain for HT, with states $S_0$ (last flip was not H), $S_1$ (last flip was H), $S_2$ (HT observed, absorbing):
From $S_0$: flip once. With probability 1/2 go to $S_1$ (got H), with probability 1/2 stay in $S_0$ (got T).
$$e_0 = 1 + \frac{1}{2}e_1 + \frac{1}{2}e_0$$
From $S_1$: flip once. With probability 1/2 go to $S_2$ (got T, done), with probability 1/2 stay in $S_1$ (got H, still have H as last flip).
$$e_1 = 1 + \frac{1}{2}(0) + \frac{1}{2}e_1$$
Solving: $e_1 - \frac{1}{2}e_1 = 1 \implies e_1 = 2$. Then:
$$e_0 = 1 + \frac{1}{2}(2) + \frac{1}{2}e_0 \implies \frac{1}{2}e_0 = 2 \implies e_0 = 4$$
Expected flips to HT: 4.
The answers are 6 and 4. Not equal.
Why does symmetry fail? When you're hunting for HH and you see an H followed by a T, you've wasted progress: you're back to $S_0$ with nothing. When you're hunting for HT and you see an H followed by another H, you haven't wasted it: you're still in $S_1$ because you have a fresh H. The overlap structure of the target pattern determines how much "credit" you carry after a mismatch, and that's fundamentally asymmetric between HH and HT.
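As a cross-check, the Markov-chain answers match a classical closed form for fair coins (the pattern-overlap result, sometimes attributed to Conway): the expected waiting time is the sum of $2^k$ over every $k$ for which the length-$k$ prefix of the pattern equals its length-$k$ suffix. A minimal sketch:

```python
def expected_flips(pattern: str) -> int:
    """Expected fair-coin flips to first see `pattern`, via the
    pattern-overlap formula: sum 2^k over every k (1..n) where the
    length-k prefix of the pattern equals its length-k suffix."""
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1)
               if pattern[:k] == pattern[-k:])

print(expected_flips("HH"), expected_flips("HT"))    # 6 4
print(expected_flips("HHH"), expected_flips("HTH"))  # 14 10
```

HH self-overlaps at lengths 1 and 2 ($2 + 4 = 6$); HT only at length 2 ($4$). The formula makes the "carried credit" asymmetry explicit.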
These aren't edge cases. Every one of these shows up constantly in quant interviews, and every one of them is avoidable if you know what to watch for.
You hear the problem, you recognize the pattern, and you start calculating. That's the trap.
Here's what it looks like in practice:
Interviewer: "Two dice are rolled. What's the probability the sum is 7?"
Candidate: "There are 6 ways to get a sum of 7, and 36 total outcomes, so 6/36 = 1/6."
Interviewer: "What if I told you one of the dice already shows a 3?"
Candidate: "...still 1/6?"
Interviewer: "Are you sure? Which outcomes are still in your sample space?"
The candidate answered the unconditional question. The interviewer was asking a conditional one. Once you know one die shows a 3, the sample space shrinks from 36 to 11: the six outcomes where the first die is 3, plus the five outcomes where the second die is 3 and the first isn't (to avoid double-counting the (3,3) case). The favorable outcomes for a sum of 7 are (3,4) and (4,3), so the correct answer is $P(\text{sum}=7 \mid \text{one die shows } 3) = 2/11$, not $1/6$. The candidate was wrong, and confidently so.
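The shrunken sample space is easy to enumerate, which is exactly what a careful candidate should do at the whiteboard; a sketch:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))
shows_a_3 = [o for o in outcomes if 3 in o]      # conditioning: 11 outcomes
sum_is_7 = [o for o in shows_a_3 if sum(o) == 7] # (3,4) and (4,3)

print(len(shows_a_3), Fraction(len(sum_is_7), len(shows_a_3)))  # 11 2/11
```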
Interviewers plant this trap deliberately. They want to see if you'll commit before you've confirmed what you're computing.
Do this: Before writing a single symbol, say out loud: "Let me confirm the sample space. Are we conditioning on anything here?" It takes five seconds and it signals exactly the kind of rigor these firms hire for.
Read the problem again. Slowly. Does it say "at least one" or "exactly one"? These are different calculations, and candidates swap them constantly.
"At least one" means one or more. The clean way to compute it is almost always the complement:
$$P(\text{at least one}) = 1 - P(\text{none})$$
"Exactly one" requires inclusion-exclusion or direct counting, and it's messier. If you set up the complement method on an "exactly one" problem, you'll get the wrong answer with no obvious error to catch.
The failure mode looks like this: the problem says "at least one of three components fails," the candidate computes $1 - P(\text{all three work})$, gets a clean number, and moves on. That's actually correct for "at least one." But if the problem said "exactly one fails," that same setup produces a number that's too large and the candidate has no idea.
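The two quantifiers give genuinely different numbers. A sketch with a hypothetical per-component failure probability of $p = 0.1$ (independent components assumed):

```python
p = 0.1  # hypothetical failure probability per component, independent

# "At least one fails": complement of "none fail".
p_at_least_one = 1 - (1 - p) ** 3

# "Exactly one fails": choose which component fails, the other two work.
p_exactly_one = 3 * p * (1 - p) ** 2

print(round(p_at_least_one, 4), round(p_exactly_one, 4))  # 0.271 0.243
```

The complement trick answers the first question only; applying it to the second silently overcounts the two- and three-failure cases.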
Don't do this: Skim the quantifier. "At least," "exactly," "at most," and "more than" are four different setups.
The fix: underline or repeat the quantifier back to the interviewer before you start. "So we want the probability that at least one of these events occurs, right?" You'll catch misreads before they cost you.
This one produces answers greater than 1. That should be an automatic alarm, but under time pressure, candidates miss it.
Bayes' theorem is:
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
What happens in interviews is that candidates correctly identify $P(B \mid A) \cdot P(A)$ as the numerator, write it down, and then either forget $P(B)$ entirely or wave at it with "and we normalize." That's not a derivation. That's a placeholder where a calculation should be.
The algebraic consequence is concrete. Suppose $P(B \mid A) = 0.9$ and $P(A) = 0.4$. The numerator is $0.36$. If $P(B) = 0.3$, then $P(A \mid B) = 1.2$. That's not a probability. If you'd computed $P(B)$ via the law of total probability, you'd have caught it:
$$P(B) = P(B \mid A)P(A) + P(B \mid A^c)P(A^c) = 0.9(0.4) + P(B \mid A^c)(0.6)$$
You need that second term. Skipping it is what produces nonsense.
Don't do this: Write the numerator and call it done. If your answer isn't in $[0, 1]$, you dropped the denominator.
Always expand $P(B)$ explicitly using the law of total probability. It's one extra line and it's the line that makes the answer valid.
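A small helper makes the discipline mechanical: the denominator is always expanded via total probability, and the result is asserted to be a valid probability. The $P(B \mid A^c) = 0.2$ figure below is hypothetical, chosen only to complete the numeric example from the text:

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) with P(B) expanded via the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    result = p_b_given_a * p_a / p_b
    assert 0.0 <= result <= 1.0  # valid by construction once P(B) is complete
    return result

# Numerator from the text: 0.9 * 0.4 = 0.36. With a hypothetical
# P(B|A^c) = 0.2, the denominator is 0.36 + 0.12 = 0.48.
print(round(posterior(0.9, 0.4, 0.2), 2))  # 0.75
```

The impossible $P(A \mid B) = 1.2$ in the text arose from pairing the 0.36 numerator with a $P(B) = 0.3$ that is inconsistent with it; a fully expanded denominator cannot produce that.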
The law of total expectation is powerful: $E[X] = \sum_i E[X \mid A_i] P(A_i)$. But it only works when the $A_i$ are mutually exclusive and exhaustive. Candidates forget the second condition constantly.
The failure looks like this: a candidate conditions on two events, $A$ and $B$, computes $E[X \mid A]P(A) + E[X \mid B]P(B)$, and presents that as $E[X]$. But if $A$ and $B$ overlap, or if $A \cup B \neq \Omega$, this double-counts or misses probability mass. The answer can be off by a lot, and it'll look completely reasonable on the page.
Interviewers at firms like Two Sigma and DE Shaw specifically probe this because it reveals whether you understand the theorem or just pattern-match to it.
Do this: Before conditioning, explicitly state your partition. "I'm going to condition on whether the first card is a heart or not a heart. These are mutually exclusive and exhaustive, so the law of total expectation applies." That sentence protects you.
If you can't name a clean partition, you don't have one yet. Find it before you start computing.
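The heart/not-heart partition mentioned above works as a concrete example. Let $X$ be the number of hearts in two cards drawn without replacement; conditioning on the first card gives a clean, exhaustive partition, and the tower property recovers the answer linearity of expectation predicts ($2 \times 13/52 = 1/2$). A sketch in exact arithmetic:

```python
from fractions import Fraction

# X = number of hearts in two cards drawn without replacement from 52.
# Partition on the first card: {heart, not heart} — mutually exclusive
# and exhaustive, so the law of total expectation applies.
p_h1 = Fraction(13, 52)
e_x_given_h1 = 1 + Fraction(12, 51)  # first is a heart; 12 hearts remain in 51
e_x_given_not = Fraction(13, 51)     # first is not; 13 hearts remain in 51

e_x = e_x_given_h1 * p_h1 + e_x_given_not * (1 - p_h1)
print(e_x)  # 1/2
```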
You find an approach that works. You run with it. You get an answer. The interviewer says, "Interesting. Is there another way to see this?"
If your answer is a blank stare, you've already lost points.
At Jane Street and Optiver, finding a second solution path isn't a bonus. It's part of the evaluation. A direct counting argument that takes 12 steps is objectively worse than a symmetry argument that takes 2, and interviewers know which one you should have seen. When you anchor on the first method that comes to mind, you often miss the elegant approach entirely.
The practical cost is also real: direct counting on a complex problem is slow and error-prone. A recursion or generating function argument is faster and harder to mess up. Candidates who only know one gear make arithmetic mistakes under pressure that candidates with multiple approaches avoid.
Don't do this: Finish your calculation and stop thinking. The first method you find is a floor, not a ceiling.
After reaching an answer, spend 30 seconds asking yourself: "Could I solve this with symmetry? With a recursion? With a clever conditioning argument?" Say that out loud. Even if you don't pursue the second method fully, naming it shows the interviewer you're thinking like a quant, not just executing a procedure.
| Trap | Warning Sign in Problem Wording | Corrective Action | Key Formula/Technique |
|---|---|---|---|
| Conditional probability confusion | "Given that," "knowing that," "after observing" | Identify which event is conditioned on; write $P(A \mid B)$ explicitly before computing | $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ |
| Independence assumption error | Drawing cards, sampling people, sequential events | Ask: is this with or without replacement? If without, draws are dependent | Hypergeometric, not binomial |
| Sample space miscounting | Dice rolls, arrangements, combinations | Decide ordered vs. unordered, with vs. without replacement before counting | Multinomial coefficients; explicit enumeration for small cases |
| Base rate neglect | "The signal is accurate X% of the time," any Bayesian setup | Write down the prior $P(H)$ before touching the likelihood | $P(H \mid E) = \frac{P(E \mid H)P(H)}{P(E)}$; compute $P(E)$ via total probability |
| False symmetry | "Expected time until sequence," patterns of equal length | Build Markov states explicitly; symmetry only holds when state spaces match | State equations: $E[T] = 1 + p \cdot E[\text{next}] + (1-p) \cdot E[\text{restart}]$ |
| Phase | When | What You're Doing |
|---|---|---|
| STOP | First 15-20 seconds | Classify: discrete or continuous, counting or expectation, conditional or joint |
| CHECK | Before writing any equation | Run the five trap signatures above |
| FLAG | As soon as a trap is detected | Name it out loud; state the corrective action |
| SOLVE | After setup is verified | Notation first, then derivation, then numerical answer |
| SANITY | After you have a number | Boundary checks (see below) before saying "my answer is" |
Verbalized checks like "let me confirm the sample space" signal rigor. Say them before committing to a setup, not after.
Run these sanity checks on every answer before you say it out loud: a probability must land in $[0, 1]$; boundary values like $p = 0$ and $p = 1$ should produce the obvious degenerate answers; and the result should move in the right direction as the inputs change.
| Problem Type | Most Likely Trap |
|---|---|
| Dice and coin games | Sample space miscounting (ordered vs. unordered) |
| Bayesian inference / signal accuracy | Base rate neglect |
| Card drawing, urn models | Independence assumption (without replacement) |
| Coin sequence / pattern waiting times | False symmetry between sequences |
| Conditional expectation, tower property | Partition error (non-exhaustive or overlapping conditioning events) |
| "At least one" problems | Forgetting the complement; setting up $P(\text{exactly one})$ instead |
You will catch yourself mid-calculation sometimes. That's fine. Here's exactly what to say.
"Actually, I set up the conditioning backwards. Let me rewrite this with $P(B \mid A)$ on the correct side of Bayes and redo the denominator."
"I think I treated these draws as independent, but they're without replacement, so let me switch to the hypergeometric setup."
"I counted ordered pairs but the problem is asking for unordered outcomes. Let me divide through by the number of orderings."
"I conditioned on event $A$, but I haven't checked that my partition covers the full sample space. Let me list the cases explicitly before continuing."
Saying these things clearly, without panic, is what separates a strong candidate from a great one. Interviewers at Jane Street and Citadel are not penalizing you for catching your own error. They're penalizing you for not catching it.