Probability is the mathematical foundation that separates real data scientists from code monkeys who memorize pandas syntax. Top-tier companies like Two Sigma, Jane Street, and Citadel use probability questions to test whether you can think rigorously about uncertainty, model complex systems, and derive insights from first principles. If you cannot quickly compute conditional probabilities or explain why a Poisson model makes sense for your feature, you will not make it past the first round.
What makes probability interviews brutal is that a single conceptual mistake cascades through your entire solution. You might confidently set up a Bayesian calculation for fraud detection, nail the formula structure, but forget that the base rate of fraud is 0.1% rather than 10% and conclude that flagged transactions are definitely fraudulent. Interviewers watch for this exact type of error because it reflects how you will reason about real business problems where getting the setup wrong costs millions.
Here are the top 29 probability questions organized by the core concepts that repeatedly appear in technical interviews at quantitative firms and tech companies.
Probability Interview Questions
Top Probability interview questions covering the key areas tested at leading tech companies. Practice with real questions and detailed solutions.
Combinatorics and Counting
Combinatorics questions test whether you can count outcomes correctly under realistic constraints, which is fundamental to any probability calculation. Most candidates fail because they either overcomplicate the setup with unnecessary formulas or miss subtle restrictions that change the count entirely.
The key insight is recognizing problem patterns quickly: permutations with identical objects use multinomial coefficients, selections with constraints require inclusion-exclusion, and arrangements with restrictions often need indirect counting. Practice identifying these patterns in 15 seconds, not 5 minutes.
You start here because most probability interview questions collapse into counting the right sample space. You are tested on setting up cases cleanly, avoiding double counting, and choosing the right counting tool under time pressure.
You are logging user sessions. In a day you observe 10 events labeled by type, with counts: 4 clicks, 3 scrolls, 2 views, 1 purchase. How many distinct event-type sequences are possible across the 10 positions?
Sample Answer
Most candidates default to $4^{10}$ or $10!$, but both fail here because events of the same type are indistinguishable, so those counts massively overcount. You want the number of distinct permutations of a multiset. The count is $$\frac{10!}{4!\,3!\,2!\,1!}.$$ This is the clean sample space size for any probability you build on top of these event types.
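The multinomial count above is easy to sanity check in a couple of lines. This is a minimal sketch using only Python's `math.factorial`; the variable names are illustrative:

```python
from math import factorial

# Distinct orderings of a multiset: 10! / (4! * 3! * 2! * 1!)
counts = [4, 3, 2, 1]  # clicks, scrolls, views, purchase
total = factorial(sum(counts))
for c in counts:
    total //= factorial(c)  # divide out permutations within each type

print(total)  # 12600
```

Verifying a closed-form count this way is a good habit before you build any probability on top of it.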
A quant team is choosing a 6-person on-call rotation from 10 researchers and 6 engineers, with the constraint that at least 2 engineers are included. How many distinct 6-person teams are possible?
In an order book snapshot you see 12 orders, 5 are buys and 7 are sells. You randomly permute the 12 time stamps. How many permutations have no two buys adjacent?
Google is bucketing 20 user IDs into 5 labeled shards of equal size 4. How many distinct shard assignments are possible, assuming each shard must get exactly 4 users?
Citadel is designing a 10 bit feature flag vector with exactly 4 ones, and the additional constraint that no run of consecutive ones is longer than 2. How many valid vectors exist?
Conditional Probability and Bayes
Bayes' theorem separates candidates who understand conditional probability from those who just memorize formulas. Interviewers love these questions because they mirror real ML problems like fraud detection, A/B testing, and classification where base rates matter enormously.
The common mistake is forgetting to weight by prior probabilities when computing posterior odds. When Jane Street asks about market regime detection or Meta asks about user engagement modeling, they want to see you instinctively account for how rare the target condition actually is.
In interviews, you are often given partial information and asked to update probabilities correctly. Candidates struggle when they mix up conditioning direction, forget base rates, or fail to define events precisely before computing.
In a payment system, 2% of transactions are fraudulent. A fraud model catches 92% of truly fraudulent transactions and has a 5% false positive rate on legitimate transactions. If a transaction is flagged, what is the probability it is actually fraud?
Sample Answer
About $27.3\%$. Use Bayes: $P(F\mid +)=\frac{P(+\mid F)P(F)}{P(+\mid F)P(F)+P(+\mid \neg F)P(\neg F)}$. Plug in: $\frac{0.92\cdot 0.02}{0.92\cdot 0.02+0.05\cdot 0.98}=\frac{0.0184}{0.0674}\approx 0.273$. The base rate is doing most of the work, so you cannot ignore the 98% legitimate mass.
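The same Bayes calculation takes four lines of code, which is a useful cross-check when the arithmetic is done under pressure. A minimal sketch, with the probabilities from the question hard-coded:

```python
# Posterior P(fraud | flagged) via Bayes' rule
p_fraud = 0.02        # base rate of fraud
p_flag_fraud = 0.92   # P(flagged | fraud), the sensitivity
p_flag_legit = 0.05   # P(flagged | legitimate), the false positive rate

numerator = p_flag_fraud * p_fraud
denominator = numerator + p_flag_legit * (1 - p_fraud)
posterior = numerator / denominator

print(round(posterior, 3))  # 0.273
```

Note how small the posterior is despite the 92% detection rate: the 98% legitimate mass dominates the denominator.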
You have two data sources. Source A is chosen with probability 0.7 and generates a positive label with probability 0.1. Source B is chosen with probability 0.3 and generates a positive label with probability 0.6. Given you observed a positive label, what is the probability it came from Source B?
A/B test traffic is split 50/50. Variant A has conversion probability 4%, variant B has conversion probability 5%. You sample one random converter from the combined population. What is the probability they came from variant B?
A market data feed has two independent failure modes: network drop with probability 1% and decoder bug with probability 0.5%. If at least one failure occurs, an alert fires. Given you saw an alert, what is the probability the decoder bug occurred?
A quant researcher models signals as follows. With probability $p$, the regime is "trending" and then $P(S=1\mid T)=0.8$; with probability $1-p$, the regime is "mean reverting" and then $P(S=1\mid M)=0.3$. You observe $n$ independent days with $S=1$ on all days. Express $P(T\mid S_1=\cdots=S_n=1)$ as a function of $p$ and $n$.
You have three coins: one fair, one double headed, one biased with $P(H)=0.75$. You pick one uniformly at random, flip it three times, and observe HHT. What is the posterior probability you picked the biased coin?
Random Variables and Common Distributions
Distribution questions reveal whether you understand the assumptions behind statistical models you use every day. Google and Facebook will ask you to justify why you chose Poisson for event counts or exponential for waiting times, then immediately follow up with parameter estimation or tail probability calculations.
Your goal is connecting the story to the math seamlessly. If arrivals are independent with constant rate, say Poisson immediately. If you are waiting for the first success, say geometric. Hesitating on these fundamentals signals that you do not really understand the models you claim to use.
A common test is whether you can map a story to a random variable and pick a distributional model that matches the mechanism. You are evaluated on recognizing when Poisson, binomial, geometric, normal, and exponential assumptions are justified and when they are not.
You are modeling the number of support tickets that arrive to a product team in a 10 minute window. Historical data suggests arrivals are independent and the average rate is 3 per 10 minutes. What distribution would you use, and what is $P(N \ge 5)$?
Sample Answer
You could do a binomial model by chopping time into tiny slots, or a Poisson model directly on counts. Poisson wins here because the story is count of independent arrivals in a fixed interval with a stable rate. Use $N \sim \text{Poisson}(\lambda=3)$, so $$P(N\ge 5)=1-\sum_{k=0}^{4} e^{-3}\frac{3^k}{k!}.$$ If you compute it numerically, you just evaluate that partial sum and subtract from 1.
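The partial sum is quick to evaluate numerically. A minimal sketch using only the standard library:

```python
from math import exp, factorial

lam = 3.0  # average arrivals per 10 minute window

# P(N >= 5) = 1 - P(N <= 4) for N ~ Poisson(lam)
p_le_4 = sum(exp(-lam) * lam**k / factorial(k) for k in range(5))
p_ge_5 = 1 - p_le_4

print(round(p_ge_5, 4))  # 0.1847
```

So roughly 18.5% of 10 minute windows will see 5 or more tickets under this model.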
A trading system receives market data updates as a Poisson process with rate 120 per minute. What is the distribution of the waiting time until the next update, and what is the probability you wait more than 1 second?
You run an A/B experiment where each user converts independently with probability $p=0.02$. For a day with 50,000 users, what distribution would you use for the number of conversions, and how would you approximate the probability of seeing at least 1,100 conversions?
You model the number of independent login attempts a user makes until the first successful login, where each attempt succeeds with probability $p$ and attempts are independent. What distribution fits, what are $\mathbb{E}[N]$ and $\mathrm{Var}(N)$, and how would the answer change if $p$ increases after each failure due to a CAPTCHA hint?
A PM claims daily incident counts are Poisson because the mean equals the variance in the dashboard. You notice incidents often come in bursts during outages and are quiet otherwise. What assumption is being violated, what empirical symptom would you expect, and name a more appropriate distributional model you might propose?
Expectation, Variance, and Concentration
Expectation and variance calculations test your ability to work with random variables algebraically, which underlies everything from experimental design to risk management. Citadel and DE Shaw particularly focus on these because trading strategies require precise understanding of return distributions and tail risks.
The trick is recognizing when to use linearity of expectation versus when you need to account for dependence structure. Master the standard formulas for sampling without replacement, negative binomial processes, and concentration inequalities so you can derive sample sizes on the spot.
What interviewers want is your ability to compute expectations fast using linearity, indicator variables, and variance tricks. You tend to lose time if you over integrate, or if you do not know when to use bounds like Markov, Chebyshev, Hoeffding, or CLT approximations.
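As a concrete instance of the Hoeffding bound mentioned above: for i.i.d. data bounded in $[0,1]$, $\Pr(|\bar X-\mu|\ge\varepsilon)\le 2e^{-2n\varepsilon^2}$, so any $n \ge \ln(2/\delta)/(2\varepsilon^2)$ suffices. A minimal sketch, where the function name is illustrative:

```python
from math import ceil, log

def hoeffding_n(eps, delta):
    """Smallest n guaranteeing 2*exp(-2*n*eps**2) <= delta
    for i.i.d. data bounded in [0, 1]."""
    return ceil(log(2 / delta) / (2 * eps**2))

# e.g. mean within 0.05 of truth with 99% probability
print(hoeffding_n(0.05, 0.01))  # 1060
```

Being able to produce a number like this on the spot is exactly what the sample-size questions in this section are testing.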
You sample $n$ users uniformly at random from a product with $N$ users, and there are exactly $K$ users in a target segment. Let $X$ be the number of target users in your sample without replacement. Compute $\mathbb{E}[X]$ and $\mathrm{Var}(X)$ quickly.
Sample Answer
Reason through it: define indicators $I_i$ for whether the $i$th draw is a target user, so $X=\sum_{i=1}^n I_i$. By symmetry, $\mathbb{E}[I_i]=K/N$, so $\mathbb{E}[X]=nK/N$ by linearity. For variance, use $\mathrm{Var}(X)=\sum \mathrm{Var}(I_i)+2\sum_{i<j}\mathrm{Cov}(I_i,I_j)$, where $\mathrm{Var}(I_i)=p(1-p)$ with $p=K/N$ and without replacement gives negative covariance. The known hypergeometric result is $$\mathrm{Var}(X)=n\frac{K}{N}\Bigl(1-\frac{K}{N}\Bigr)\frac{N-n}{N-1}.$$
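Both moments can be verified exactly by enumerating every equally likely sample in a small case. This sketch checks the formulas for assumed toy values $N=10$, $K=4$, $n=3$:

```python
from itertools import combinations

# Exhaustive check of E[X] and Var(X) for a small hypergeometric case:
# N = 10 users, K = 4 in the target segment, sample size n = 3.
N, K, n = 10, 4, 3
target = set(range(K))  # label users 0..3 as the target segment

xs = [len(target & set(s)) for s in combinations(range(N), n)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

print(round(mean, 2))  # n*K/N = 1.2
print(round(var, 2))   # n*(K/N)*(1-K/N)*(N-n)/(N-1) = 0.56
```

The finite population correction $(N-n)/(N-1)$ is what the enumeration confirms: without it, the binomial variance $n\,p(1-p)$ would give 0.72, not 0.56.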
You flip a biased coin with $\Pr(H)=p$ until you see $r$ heads. Let $T$ be the total number of flips. Find $\mathbb{E}[T]$ and $\mathrm{Var}(T)$, and give a one line justification for each.
A monitoring system counts events per minute. Assume you observe i.i.d. bounded counts $X_1,\dots,X_n$ with $0\le X_i\le 1$ and mean $\mu$. You want $\Pr\left(\left|\bar X-\mu\right|\ge \varepsilon\right)\le \delta$. Give a tight, interview-ready sample size $n$ using a concentration bound, and state when you would switch to a CLT approximation instead.
You are estimating a click through rate $p$ by sampling $n$ impressions with Bernoulli outcomes. Product asks for a guarantee that the relative error is at most $10\%$ with probability at least $99\%$, meaning $\Pr(|\hat p-p|\ge 0.1p)\le 0.01$. Using only Markov or Chebyshev style tools and variance tricks, give a conservative sufficient $n$ in terms of $p$.
You have i.i.d. mean zero, variance one random variables $X_1,\dots,X_n$. Let $S_n=\sum_{i=1}^n X_i$. Give two different upper bounds on $\Pr(S_n\ge t)$, one using Markov on $S_n^2$ and one using a CLT style approximation, and state when each is appropriate.
A hash function maps $m$ items independently and uniformly into $n$ buckets. Let $C$ be the number of collisions, defined as the number of unordered item pairs that land in the same bucket. Compute $\mathbb{E}[C]$ and give a usable variance or concentration statement that would let you argue $C$ is close to its mean for large $m,n$.
Stochastic Processes, Markov Chains, and Stopping Times
Stochastic processes and Markov chains test advanced probabilistic thinking that quant firms use for modeling user behavior, market dynamics, and system performance. These questions separate senior candidates from junior ones because they require understanding how randomness evolves over time.
Success here means fluently working with transition matrices, hitting probabilities, and stopping times without getting bogged down in notation. Two Sigma wants to see you set up the recursive equations for absorption problems, then solve them cleanly using first-step analysis or matrix methods.
Later-round quant and research interviews push you into dynamics, where state, memorylessness, and long run behavior matter. You are tested on setting up transitions, stationary distributions, hitting times, and using tools like recursion or optional stopping without handwaving.
You are modeling a user as switching between two states each day: Active (A) and Inactive (I). The transition matrix is $P=\begin{pmatrix}0.9&0.1\\0.3&0.7\end{pmatrix}$ with rows A, I. If the user starts Active, what is the stationary distribution and what fraction of days do you expect them to be Active in the long run?
Sample Answer
This question is checking whether you can set up and solve the stationarity equations and interpret them as long run time fractions. You solve $\pi=\pi P$ with $\pi_A+\pi_I=1$: $\pi_A=0.9\pi_A+0.3\pi_I\Rightarrow 0.1\pi_A=0.3\pi_I\Rightarrow \pi_A=3\pi_I$. Normalize to get $\pi_I=1/4$, $\pi_A=3/4$. Since the chain is irreducible and aperiodic, your long run fraction of Active days converges to $\pi_A=0.75$, regardless of the initial state.
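You can confirm the stationary distribution numerically by repeatedly multiplying the distribution row vector by $P$. A minimal sketch with plain lists, no linear algebra library assumed:

```python
# Power-iterate pi <- pi * P until it converges to the stationary distribution.
P = [[0.9, 0.1],
     [0.3, 0.7]]  # rows: Active, Inactive

pi = [1.0, 0.0]  # start Active
for _ in range(200):  # second eigenvalue is 0.6, so convergence is fast
    pi = [pi[0] * P[0][j] + pi[1] * P[1][j] for j in range(2)]

print([round(x, 4) for x in pi])  # [0.75, 0.25]
```

Starting from Inactive instead gives the same limit, which is the "regardless of the initial state" claim in action.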
A stock microprice model uses a Markov chain on states $\{0,1,2\}$ representing imbalance buckets. From state 1 you move to 0 or 2 with probability $1/2$ each, and 0 and 2 are absorbing. Starting at 1, what is the expected time to absorption, and how would you write the recursion that gets you there?
You implement an epsilon-greedy policy that randomly explores with probability $\varepsilon$ each step. You model the exploration phase length as a stopping time $\tau$ for a Markov chain on exploration states. Give a concrete condition under which $\mathbb{E}[\tau]$ is finite, and show how you would compute or bound it using hitting probabilities in a finite Markov chain.
A random walk on $\{0,1,2,3,4\}$ moves from $i$ to $i+1$ with probability $p$ and to $i-1$ with probability $1-p$, reflecting at 4 (from 4 it goes to 3 with prob 1) and absorbing at 0. Starting at 2, compute the probability of hitting 4 before 0, and describe the key equation you use.
You work with a martingale in a pricing model: since the simple symmetric random walk $S_t$ starting at 0 has zero drift, $M_t=S_t$ is itself a martingale. Let $\tau$ be the first time $S_t$ hits $+a$ or $-b$ for positive integers $a,b$. Use optional stopping to compute $\mathbb{P}(S_\tau=+a)$, and state the integrability condition you need to justify the stop.
You model user churn with a continuous time Markov chain with two transient states (Engaged, At Risk) and an absorbing state (Churned). Given generator matrix entries, how do you compute the expected time to churn starting from Engaged, and what linear system do you solve?
A market making model uses a birth death chain for inventory $X_t\in\{-L,\dots,L\}$ with reflecting boundaries. You are asked to find the stationary distribution in closed form under asymmetric buy and sell arrival rates, and then compute the long run probability of being at the limits $\pm L$.
How to Prepare for Probability Interviews
Drill Basic Distributions Daily
Memorize the PMF, mean, and variance for binomial, Poisson, geometric, and negative binomial distributions until you can write them instantly. Practice connecting real scenarios to these distributions in under 10 seconds. Use datainterview.com/questions to test your pattern recognition speed.
Master Bayes with Concrete Numbers
Always work Bayes problems with specific numbers first, then generalize to symbolic form if needed. Write out the full probability tree with branches labeled clearly. This prevents the classic error of confusing P(A|B) with P(B|A) under pressure.
Practice Matrix Operations by Hand
For Markov chain problems, you need to compute matrix powers and solve linear systems quickly without a calculator. Drill 2x2 and 3x3 matrix multiplication until it becomes automatic. Focus on finding stationary distributions and absorption probabilities using eigenvalue methods.
Connect Every Problem to Business Context
When you solve probability problems, always explain why the mathematical result matters for decision-making. If you calculate a 15% false positive rate, immediately discuss how that affects user experience or operational costs. Interviewers want to see business intuition, not just mathematical correctness.
How Ready Are You for Probability Interviews?
You need to estimate collision risk for a feature that assigns users a 6 character code using A to Z and 0 to 9, where characters can repeat. Roughly how many distinct codes exist, and which expression would you use?
Frequently Asked Questions
How deep does my Probability knowledge need to be for Data Scientist or Quantitative Researcher interviews?
You should be fluent with core probability rules, conditional probability, Bayes' theorem, common distributions, expectation, variance, and independence. You will often need to derive results on the spot, not just quote formulas. For quant roles, expect deeper work with random variables, moment generating functions, and asymptotics, plus more proof-style reasoning.
Which companies tend to ask the most Probability interview questions?
Market making firms, hedge funds, and high frequency trading shops tend to ask Probability most heavily, especially for quantitative researcher roles. Big Tech data science interviews include Probability too, but it is often tied to A/B testing, experimentation, and product metrics. If a role mentions stochastic modeling, risk, or pricing, expect frequent Probability questions.
Do Probability interviews require coding, or are they purely math?
Many Probability interviews are whiteboard style and focus on clean reasoning, but coding can still show up. You may be asked to simulate a random process, estimate a probability via Monte Carlo, or write code to validate an analytic result. If coding is expected, practice implementing sampling routines, avoiding RNG pitfalls, and writing vectorized simulations at datainterview.com/coding.
How do Probability interview questions differ between Data Scientist and Quantitative Researcher roles?
Data Scientist interviews emphasize applied Probability, like interpreting p-values, likelihood, Bayesian updates for conversion rates, and uncertainty in experiments. Quantitative Researcher interviews emphasize theory and derivations, like conditioning arguments, distributions of functions of random variables, stopping times, and sometimes measure-theory flavored intuition. You should tailor your practice accordingly, applied inference for DS and deeper stochastic reasoning for quant.
How can I prepare for Probability interviews if I have no real-world experience using Probability?
You can build experience by solving probability puzzles and then validating them with simulations; this bridges theory and practical intuition. Create a small portfolio of notebook-style analyses, like simulating the birthday problem, coupon collector, or random walks, and compare to closed-form expectations. For targeted practice, drill probability interview prompts at datainterview.com/questions and verify your answers by simulation.
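The birthday problem is a good first entry for that portfolio, because the closed form and a Monte Carlo estimate are each a few lines. A minimal sketch, where the function names and the seed are illustrative:

```python
import random

def exact_birthday(n, days=365):
    """P(at least one shared birthday among n people), closed form."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (days - i) / days
    return 1 - p_unique

def simulated_birthday(n, trials=100_000, days=365, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(
        len({rng.randrange(days) for _ in range(n)}) < n
        for _ in range(trials)
    )
    return hits / trials

print(round(exact_birthday(23), 4))  # 0.5073
print(simulated_birthday(23))        # close to the exact value
```

Seeing the simulated value land on the closed-form answer is exactly the theory-to-practice bridge interviewers like to hear you describe.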
What are common mistakes to avoid in Probability interview questions?
You should not assume independence without justifying it, and you should explicitly state the sample space and what is being conditioned on. Another common mistake is mixing up conditional probabilities, like confusing P(A|B) with P(B|A), or forgetting to normalize in Bayes' rule. Also watch for off-by-one counting errors, and always sanity check results against extremes and symmetry.
