Statistics questions are the backbone of data science interviews at top tech companies. Meta asks about A/B testing power and sample size calculations. Netflix probes your understanding of distribution assumptions and confidence intervals. Uber wants to see if you can identify confounding variables and design robust experiments. Every FAANG company expects you to think statistically about uncertainty, causation, and experimental design.
What makes statistics interviews particularly brutal is that they test both mathematical rigor and business judgment simultaneously. You might nail the probability calculation for a binomial distribution, but then fumble when asked whether you'd actually ship the feature based on that result. Or you'll correctly identify multiple testing problems but struggle to propose a practical solution that stakeholders will accept. The best candidates seamlessly blend technical precision with pragmatic decision-making.
Here are the top 22 statistics questions organized by the core areas that matter most: probability fundamentals, hypothesis testing, regression analysis, and Bayesian thinking.
Statistics Interview Questions
Top statistics interview questions covering the key areas tested at leading tech companies. Practice with real questions and detailed solutions.
Descriptive Statistics and Data Summaries
Start here: you are tested on whether you can summarize messy product or experiment data into a few reliable numbers and explain what they mean. Candidates usually stumble when outliers, skew, missingness, or shaky metric definitions make averages and percentiles misleading.
Probability, Random Variables, and Core Distributions
Candidates consistently underestimate how much probability theory matters in day-to-day data work, then get blindsided when interviewers ask them to model user behavior with specific distributions. You need to recognize when arrivals follow a Poisson process, when sampling without replacement calls for a hypergeometric distribution, and when normal approximations are valid.
The key insight interviewers look for is whether you can connect abstract probability models to real business scenarios. Don't just memorize formulas: practice explaining why a Poisson model makes sense for ride requests, or when you'd use a binomial versus a normal approximation for conversion rates.
In interviews, you need to translate a story problem into a probability model and pick the right distribution under time pressure. Candidates tend to struggle when independence assumptions are implicit, sampling is without replacement, or discrete counts get confused with continuous rates.
At Meta, you sample 5 users uniformly at random from a pool of 100 users, where 20 are in a treatment group and 80 are in control. What is the probability you pick exactly 2 treatment users, sampling without replacement?
Sample Answer
Use a hypergeometric model: $$P(X=2)=\frac{\binom{20}{2}\binom{80}{3}}{\binom{100}{5}}.$$ You are counting successes in a fixed-size sample drawn without replacement, so the draws are not independent and the binomial is the wrong default. The numerator picks 2 of the 20 treatment users and 3 of the 80 control users, and the denominator counts all possible 5-user samples.
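A quick numeric check of the formula above, using only Python's standard library:

```python
from math import comb

def hypergeom_pmf(k, n_success, n_total, n_draws):
    """P(exactly k successes) when drawing n_draws without replacement
    from a pool of n_total items containing n_success successes."""
    return comb(n_success, k) * comb(n_total - n_success, n_draws - k) / comb(n_total, n_draws)

# exactly 2 treatment users among 5 drawn from 20 treatment / 80 control
p = hypergeom_pmf(2, 20, 100, 5)
print(round(p, 4))  # 0.2073
```

Being able to sanity-check the closed-form answer with `math.comb` in a few lines is also a good habit for live coding rounds.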
At Uber, ride requests arrive at an average rate of 12 per hour for a city zone. What is the probability of getting at least 3 requests in the next 10 minutes, assuming the usual model for arrivals?
At Netflix, 2% of streams experience a startup failure. You independently sample 200 streams from logs. Approximate the probability you see at least 8 failures, and state the distributional approximation you are using.
At Google, you A/B test two ranking models and measure click-through rate as clicks divided by impressions for each user. Why is a Poisson model for CTR usually a mismatch, and what distributional model would you pick for clicks given impressions?
At Spotify, you randomly shuffle a playlist of 30 songs that contains 6 songs by the same artist. What is the probability that no two of those 6 songs end up adjacent in the shuffled order?
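For the Uber arrivals question above, here is a minimal numeric sketch assuming the usual Poisson-process model: 12 requests per hour implies $\lambda = 2$ for a 10-minute window.

```python
import math

lam = 12 * (10 / 60)  # 12 per hour -> expected 2 arrivals in 10 minutes

def poisson_pmf(k, lam):
    """P(exactly k events) under a Poisson(lam) model."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# P(X >= 3) = 1 - P(0) - P(1) - P(2)
p_at_least_3 = 1 - sum(poisson_pmf(k, lam) for k in range(3))
print(round(p_at_least_3, 4))  # 0.3233
```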
Hypothesis Testing, A/B Testing, and Power
A/B testing questions reveal whether you understand the statistical reasoning behind product decisions, not just the mechanics of running tests. Interviewers want to see if you can spot multiple-comparisons problems, recognize when stopping rules invalidate p-values, and distinguish statistical significance from practical importance.
Most candidates can run a t-test, but they struggle with the subtler issues that matter in practice: early-stopping bias, confidence interval interpretation, and power analysis. The strongest answers connect statistical concepts back to business risk and decision-making frameworks.
You will be asked to design, critique, and interpret A/B tests as you would at Meta, Google, or Netflix, including edge cases like peeking and multiple metrics. Many candidates miss the practical implications of Type I and Type II errors, power, and p-value interpretation when business stakes are attached.
At Netflix, you run an A/B test on a new recommendation UI. The dashboard shows a live p-value each hour, and stakeholders want to stop the test as soon as $p < 0.05$. How do you critique this, and what would you do instead?
Sample Answer
You could keep checking the same fixed-horizon p-value and stop early, or you could use a design that supports interim looks. The first option inflates your Type I error because repeated peeking increases the chance you see $p < 0.05$ under the null. The second option wins here because group sequential boundaries or always-valid p-values control error while allowing early stopping. If you cannot change the method, you should commit to a pre-registered horizon and only read the p-value at the end.
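A simulation makes the peeking problem concrete. The sketch below is a simplified one-sample z-test under the null (not Netflix's actual setup): it checks the statistic after every 100 observations and stops at the first $|z| > 1.96$, which inflates the false positive rate well above 5%.

```python
import math
import random

random.seed(7)

def peeked_false_positive_rate(n_sims=2000, n_max=1000, look_every=100):
    """Fraction of null simulations that ever cross |z| > 1.96 at an interim look."""
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for i in range(1, n_max + 1):
            total += random.gauss(0, 1)  # data generated under the null
            # z-statistic for the running mean with known unit variance
            if i % look_every == 0 and abs(total / math.sqrt(i)) > 1.96:
                rejections += 1
                break
    return rejections / n_sims

rate = peeked_false_positive_rate()
print(rate)  # typically around 0.2 with 10 looks, far above the nominal 0.05
```

With a single look at the end, the same procedure would reject about 5% of the time; ten peeks roughly quadruple the false positive rate.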
At Meta, an experiment lifts click-through rate with $p = 0.03$ and a 95% CI of $[0.1\%, 0.9\%]$ relative. A PM says, "There is a 97% chance this will be positive in production." How do you respond, and what does the p-value actually mean?
At Uber, you test a new driver incentive and track 12 metrics. You plan to call the test a win if any metric is significant at $\alpha=0.05$. What is the statistical problem, and how would you redesign the decision rule?
At Airbnb, your primary metric is bookings per user, but it is highly skewed with many zeros. The team wants a t-test on the mean and a standard power calculation, then to ship if $p<0.05$. What is your approach, and what pitfalls do you call out?
At Google, you are asked to design an A/B test with 80% power to detect a 0.5% relative lift in conversion at $\alpha=0.05$. What inputs do you need, and how do you explain the tradeoff between MDE, sample size, and duration to a PM?
At Spotify, you observe a statistically significant lift in the overall metric, but the effect is negative for a key high-value segment. How do you decide whether to ship, and how do you avoid a misleading conclusion due to segment slicing?
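For the Google power-analysis question above, here is a sketch of the standard two-proportion sample-size formula. The 4% baseline conversion rate is an assumed input (the question only gives a relative MDE), which is exactly the kind of input you should ask the PM for.

```python
import math
from statistics import NormalDist

def n_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    p_treat = p_base * (1 + rel_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p_treat - p_base) ** 2)

# assumed 4% baseline; detect a 0.5% *relative* lift (4.00% -> 4.02%)
n = n_per_arm(0.04, 0.005)
print(f"{n:,} users per arm")  # about 15 million per arm with these inputs
```

The takeaway for the PM conversation: halving the MDE quadruples the required sample size, which is why tiny relative lifts translate into very long test durations.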
Regression, Causal Thinking, and Model Assumptions
Regression and causality questions separate junior analysts from senior data scientists because they test your ability to think critically about confounding, selection bias, and model assumptions. You'll need to diagnose when coefficients can't be interpreted causally and propose experimental or quasi-experimental alternatives.
The biggest mistake candidates make is treating regression as a black-box prediction tool instead of a causal inference framework. Interviewers want to see you question whether the data-generating process matches your modeling assumptions, especially around endogeneity and omitted-variable bias.
Expect questions that force you to connect regression outputs to decisions, diagnose assumption violations, and reason about confounding. Candidates often get tripped up when correlation is mistaken for causation, when multicollinearity muddies interpretation, or when they cannot explain what the model is actually identifying.
At Uber, you regress weekly rider churn on promo exposure (yes/no), rider tenure, and number of support tickets, and the promo coefficient is negative and significant. What decision would you make about scaling the promo, and what is the key identification risk in interpreting that coefficient causally?
Sample Answer
Reason through it: start by asking whether promo exposure is exogenous, because the coefficient only reflects a causal effect if exposure is as good as random given the controls. If the business targets promos at riders who are already at high churn risk, you have selection bias, so the negative coefficient could be understating the true benefit, or the sign could even be flipped, depending on the targeting rules. Before scaling, check how assignment happens and look for quasi-random variation, for example eligibility thresholds or randomized holdouts. If you cannot defend identification, treat the estimate as predictive, not causal, and avoid ROI claims.
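The identification risk can be demonstrated with a toy simulation (assumed numbers, not Uber data): the promo truly cuts churn by 5 points, but because targeting sends it to high-risk riders, the naive promo/no-promo comparison flips sign and makes a helpful promo look harmful.

```python
import random

random.seed(42)

true_effect = -0.05  # promo truly reduces churn probability by 5 points
churn_promo, churn_no_promo = [], []

for _ in range(100_000):
    risk = random.random()                # unobserved churn risk
    gets_promo = random.random() < risk   # targeting: riskier riders get the promo
    p_churn = 0.10 + 0.40 * risk + (true_effect if gets_promo else 0.0)
    churned = random.random() < p_churn
    (churn_promo if gets_promo else churn_no_promo).append(churned)

naive = sum(churn_promo) / len(churn_promo) - sum(churn_no_promo) / len(churn_no_promo)
print(f"true effect: {true_effect:+.3f}, naive estimate: {naive:+.3f}")
# naive estimate comes out positive: selection into treatment masks the benefit
```

A regression with the same omitted risk variable suffers the same bias; the fix is better identification (randomized holdouts, eligibility thresholds), not more controls you happen to have.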
At Netflix, you model watch time as a function of the number of recommendations shown, homepage load time, and device type. The coefficient on recommendations is positive, but you suspect reverse causality. How would you explain the problem, and what design or modeling change would you propose?
At Spotify, you run a linear regression to predict daily listening minutes from ad load, subscription status, and prior-week minutes. Residual plots show a fan shape, and a Breusch-Pagan test rejects homoskedasticity. What do you do, and when is the standard fix not enough?
At Meta, you fit a regression for ad conversion on bid, ad quality score, and predicted click-through rate, and you see high VIFs for quality score and predicted CTR. The quality score coefficient changes sign when you add predicted CTR. How do you interpret the coefficients, and what should you do next?
At Airbnb, you estimate the effect of enabling instant book on booking rate using a regression with controls for price, reviews, and host response time. What variables could be bad controls here, and how would you explain the bias direction using a causal graph?
At Microsoft, you build a linear model for customer spend with features including tenure, number of seats, and sales touchpoints. A few accounts have extreme spend and high leverage, and Cook's distance flags them. What is your approach to deciding whether to keep, cap, or remove them, and how do you justify it to stakeholders?
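For the Spotify residual-diagnostics question above, here is a stdlib-only sketch of how a fan shape arises and how to detect it numerically (toy data with noise that grows with the predictor, not Spotify's): the slope estimate stays roughly unbiased, but |residual| correlates with x, which is the pattern a Breusch-Pagan test formalizes.

```python
import math
import random

random.seed(0)

# toy data: error scale grows with x, i.e. heteroskedastic noise
xs = [random.uniform(1, 10) for _ in range(5000)]
ys = [2.0 * x + random.gauss(0, 0.5 * x) for x in xs]

# ordinary least squares slope and intercept in closed form
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def corr(a, b):
    """Pearson correlation, computed by hand."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# |residual| rising with x is the numeric signature of the fan shape
print(round(corr(xs, [abs(r) for r in resid]), 2))
```

Note what the diagnosis implies: the coefficient is still consistent, but the default standard errors are wrong, which is why robust (sandwich) standard errors are the usual first fix.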
Bayesian Inference and Decision Making Under Uncertainty
Bayesian thinking is becoming essential at companies that need to make decisions under uncertainty with limited data, especially in areas like experimentation, forecasting, and recommendation systems. Interviewers test whether you can construct reasonable priors, interpret posterior distributions, and communicate uncertainty to non-technical stakeholders.
The challenge with Bayesian questions isn't the math, it's the business judgment: when to trust your prior versus the data, how to incorporate historical information from similar contexts, and what decision framework to use when the posterior still leaves substantial uncertainty.
Beyond frequentist testing, you may need to update beliefs with data and communicate uncertainty in a way stakeholders can act on. Candidates commonly struggle to choose priors sensibly, interpret credible intervals, and connect posterior results to concrete decisions like ranking, experimentation, or risk control.
At Netflix you are choosing between two recommendation ranking models. Model A has 520 clicks out of 10,000 impressions, Model B has 500 clicks out of 10,000, and you assume independent Beta priors $\text{Beta}(1,1)$ for each CTR. What is the posterior probability that Model A has higher CTR than Model B, and how would you communicate that to a product manager?
Sample Answer
This question is checking whether you can go from a Bayesian model to an actionable probability statement, not just a p-value. You update to posteriors $p_A \sim \text{Beta}(1+520, 1+9480)$ and $p_B \sim \text{Beta}(1+500, 1+9500)$. Then you estimate $P(p_A > p_B)$, typically via Monte Carlo draws from both Betas and counting the fraction where $p_A > p_B$. You tell the PM, "Given our prior and the data, there is about X% chance A is better than B on CTR," and you pair it with expected lift and decision thresholds.
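The Monte Carlo step in this answer can be sketched with the standard library's `random.betavariate`:

```python
import random

random.seed(1)

def prob_a_beats_b(clicks_a, n_a, clicks_b, n_b, draws=100_000):
    """P(CTR_A > CTR_B) under independent Beta(1,1) priors, by Monte Carlo."""
    wins = 0
    for _ in range(draws):
        p_a = random.betavariate(1 + clicks_a, 1 + n_a - clicks_a)
        p_b = random.betavariate(1 + clicks_b, 1 + n_b - clicks_b)
        wins += p_a > p_b
    return wins / draws

p_win = prob_a_beats_b(520, 10_000, 500, 10_000)
print(round(p_win, 2))  # roughly 0.74
```

So the PM-facing statement would be: "there is about a 74% chance Model A is better on CTR," which is usually not enough certainty on its own and argues for more data or a decision rule based on expected lift.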
At Uber you are launching in a new city and need a prior for conversion rate from app open to ride request. You have three similar cities with past conversion rates around 2.0%, 2.4%, and 2.2% on about 50,000 opens each. How would you construct a prior, and when would you avoid using that historical information?
At Meta you run an experiment on ad relevance and measure revenue per user, which is heavy-tailed and noisy. Stakeholders want a single number they can act on tomorrow morning. Which Bayesian model would you use, what posterior summary would you report, and how would you turn that into a ship or no-ship decision with asymmetric risk?
At Google you monitor a rare event rate, false positive reports per million searches, and you get 0 events in the last week. Using a Bayesian approach, how do you produce an upper bound on the rate that is still meaningful, and what prior would you choose to avoid overconfidence?
At Airbnb you are ranking listings and you have sparse data for new hosts. You model booking probability with a Bayesian hierarchical model. How would you explain to a nontechnical stakeholder why the model intentionally shrinks extreme estimates, and how that affects the tradeoff between exploration and exploitation?
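The shrinkage intuition in the last question can be shown with a simple Beta-Binomial partial-pooling sketch (the prior mean and strength below are made-up illustrative numbers): a brand-new host with 2 bookings out of 2 requests does not get ranked as a 100% converter.

```python
def shrunk_rate(bookings, requests, prior_mean=0.10, prior_strength=50):
    """Posterior mean booking rate under a Beta prior: raw rates are pulled
    toward the marketplace average, more strongly when data is sparse."""
    a = prior_mean * prior_strength
    b = (1 - prior_mean) * prior_strength
    return (a + bookings) / (a + b + requests)

print(round(shrunk_rate(2, 2), 3))       # 0.135: near the 10% prior, not 100%
print(round(shrunk_rate(300, 2000), 3))  # 0.149: with lots of data, close to the raw 15%
```

That asymmetry is the exploration-exploitation tradeoff in miniature: new hosts are neither buried nor over-promoted until their own data accumulates.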
How to Prepare for Statistics Interviews
Practice distribution recognition with real scenarios
Don't just memorize probability formulas. For each major distribution (binomial, Poisson, normal, exponential), write out three different business scenarios where it would apply. Practice explaining your reasoning out loud.
Master the confidence interval interpretation
Most candidates botch this fundamental concept. Practice explaining what a 95% confidence interval actually means, why it's not a probability statement about the parameter, and how you'd communicate uncertainty to a product manager.
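A short simulation is a good way to internalize the correct interpretation: roughly 95% of the intervals produced by the procedure, not any single interval, cover the true mean. This sketch uses a known-sigma z-interval to keep it simple.

```python
import math
import random

random.seed(3)

true_mean, sigma, n = 10.0, 2.0, 50
reps = 2000
covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    half_width = 1.96 * sigma / math.sqrt(n)  # known-sigma 95% z-interval
    covered += (m - half_width) <= true_mean <= (m + half_width)

coverage = covered / reps
print(coverage)  # close to 0.95: the guarantee is about the procedure, not one interval
```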
Build your causal inference vocabulary
Learn to spot and name common threats to causal inference: selection bias, omitted variable bias, reverse causality, and collider bias. Practice proposing specific solutions like instrumental variables, difference-in-differences, or randomized experiments.
Connect every statistical result to a business decision
Never end your answer with just a number or p-value. Always explain what action you'd recommend, what additional information you'd want, and how you'd communicate the uncertainty to stakeholders who need to make decisions.
How Ready Are You for Statistics Interviews?
You are analyzing customer spend; the distribution is strongly right-skewed with a few extreme outliers. A product manager asks for a single number to represent typical spend and a way to summarize variability. What do you report and why?
Frequently Asked Questions
How much statistics depth do I need for a Data Analyst, Data Scientist, or Quantitative Researcher interview?
You should be fluent in probability basics, sampling, hypothesis tests, confidence intervals, linear regression, and experiment design. For Data Scientist roles, expect deeper questions on bias and variance, regularization, model evaluation, and causal inference basics. For Quantitative Researcher roles, you often need stronger probability, distributions, stochastic processes basics, and rigorous reasoning about assumptions and tail risk.
Which companies tend to ask the most statistics-heavy interview questions?
Tech companies with strong experimentation culture, large marketplaces, and ads or ranking systems often ask a lot about A/B testing, p-values, power, and metric tradeoffs. Quant trading firms and market makers tend to push harder on probability, distributional thinking, and inference under uncertainty. Healthcare, insurance, and fintech also commonly emphasize statistical modeling, calibration, and validation.
Is coding required for statistics interviews, or is it mostly theory?
You may need light coding to compute summary statistics, run regressions, or simulate a sampling distribution, usually in SQL or Python/R. Many interviews mix conceptual questions with a small take-home or live exercise that tests whether you can translate statistical ideas into code. For coding practice aligned to these tasks, use datainterview.com/coding.
How do statistics interview questions differ across Data Analyst, Data Scientist, and Quantitative Researcher roles?
Data Analyst interviews focus on experimental design details, metric selection, interpreting confidence intervals, and common pitfalls like selection bias and Simpson’s paradox. Data Scientist interviews add modeling-related statistics, such as likelihood, regularization, cross-validation, and calibration, plus deeper discussion of assumptions. Quantitative Researcher interviews are typically the most math-forward, emphasizing probability, distributions, estimators, and rigorous derivations or proofs of key results.
How can I prepare for statistics interviews if I have no real-world experience?
Use simulated projects: design a mock A/B test, generate synthetic data, and practice estimating power, confidence intervals, and false positive rates via simulation. Practice explaining assumptions and failure modes, for example what happens if samples are dependent or the metric is heavy-tailed. Drill common prompts and solutions at datainterview.com/questions, and be ready to justify each step with statistical reasoning.
What are the most common mistakes candidates make in statistics interviews, and how do I avoid them?
You often lose points by applying a test without checking assumptions like independence, randomization, equal variance, or sample size adequacy. Another common mistake is confusing statistical significance with practical significance, or ignoring multiple testing and peeking, which inflates false positives. You should state assumptions explicitly, discuss robustness checks, and tie results to effect sizes and uncertainty, not just p-values.

