Statistics questions are the backbone of data science interviews at top tech companies. Meta asks about A/B testing power and sample size calculations. Netflix probes your understanding of distribution assumptions and confidence intervals. Uber wants to see if you can identify confounding variables and design robust experiments. Every FAANG company expects you to think statistically about uncertainty, causation, and experimental design.
What makes statistics interviews particularly brutal is that they test both mathematical rigor and business judgment simultaneously. You might nail the probability calculation for a binomial distribution, but then fumble when asked whether you'd actually ship the feature based on that result. Or you'll correctly identify multiple testing problems but struggle to propose a practical solution that stakeholders will accept. The best candidates seamlessly blend technical precision with pragmatic decision-making.
Here are the top 22 statistics questions organized by the core areas that matter most: probability fundamentals, hypothesis testing, regression analysis, and Bayesian thinking.
Statistics Interview Questions
Top statistics interview questions covering the key areas tested at leading tech companies. Practice with real questions and detailed solutions.
Descriptive Statistics and Data Summaries
Start here: you are tested on whether you can summarize messy product or experiment data into a few reliable numbers and explain what they mean. Candidates usually stumble when outliers, skew, missingness, or shaky metric definitions make averages and percentiles misleading.
Probability, Random Variables, and Core Distributions
Candidates consistently underestimate how much probability theory matters in day-to-day data work, then get blindsided when interviewers ask them to model user behavior with specific distributions. You need to recognize when arrivals follow a Poisson process, when sampling without replacement calls for a hypergeometric distribution, and when normal approximations are valid.
The key insight interviewers look for is whether you can connect abstract probability models to real business scenarios. Don't just memorize formulas: practice explaining why a Poisson model makes sense for ride requests, or when you'd use a binomial versus a normal approximation for conversion rates.
In interviews, you need to translate a story problem into a probability model and pick the right distribution under time pressure. Candidates tend to struggle when independence assumptions are implicit, sampling is without replacement, or discrete counts get confused with continuous rates.
At Meta, you sample 5 users uniformly at random from a pool of 100 users, where 20 are in a treatment group and 80 are in control. What is the probability you pick exactly 2 treatment users, sampling without replacement?
Sample Answer
Use a hypergeometric model: $$P(X=2)=\frac{\binom{20}{2}\binom{80}{3}}{\binom{100}{5}}.$$ You are counting successes in a fixed-size sample drawn without replacement, so the draws are not independent and the binomial is the wrong default. The numerator picks 2 of the 20 treatment users and 3 of the 80 control users, and the denominator counts all possible 5-user samples.
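A quick numeric check of the formula above, using only Python's standard library:

```python
from math import comb

def hypergeom_pmf(k, n_success, n_total, n_draws):
    """P(exactly k successes) when drawing n_draws without replacement
    from a pool of n_total items containing n_success successes."""
    return comb(n_success, k) * comb(n_total - n_success, n_draws - k) / comb(n_total, n_draws)

# exactly 2 treatment users among 5 drawn from 20 treatment / 80 control
p = hypergeom_pmf(2, 20, 100, 5)
print(round(p, 4))  # 0.2073
```

Being able to sanity-check the closed-form answer with `math.comb` in a few lines is also a good habit for live coding rounds.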
At Uber, ride requests arrive at an average rate of 12 per hour for a city zone. What is the probability of getting at least 3 requests in the next 10 minutes, assuming the usual model for arrivals?
At Netflix, 2% of streams experience a startup failure. You independently sample 200 streams from logs. Approximate the probability you see at least 8 failures, and state the distributional approximation you are using.
At Google, you A/B test two ranking models and measure click-through rate as clicks divided by impressions for each user. Why is a Poisson model for CTR usually a mismatch, and what distributional model would you pick for clicks given impressions?
At Spotify, you randomly shuffle a playlist of 30 songs that contains 6 songs by the same artist. What is the probability that no two of those 6 songs end up adjacent in the shuffled order?
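For the Uber arrivals question above, here is a minimal numeric sketch assuming the usual Poisson-process model: 12 requests per hour implies $\lambda = 2$ for a 10-minute window.

```python
import math

lam = 12 * (10 / 60)  # 12 per hour -> expected 2 arrivals in 10 minutes

def poisson_pmf(k, lam):
    """P(exactly k events) under a Poisson(lam) model."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# P(X >= 3) = 1 - P(0) - P(1) - P(2)
p_at_least_3 = 1 - sum(poisson_pmf(k, lam) for k in range(3))
print(round(p_at_least_3, 4))  # 0.3233
```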
Hypothesis Testing, A/B Testing, and Power
A/B testing questions reveal whether you understand the statistical reasoning behind product decisions, not just the mechanics of running tests. Interviewers want to see if you can spot multiple-comparisons problems, recognize when stopping rules invalidate p-values, and distinguish statistical significance from practical importance.
Most candidates can run a t-test, but they struggle with the subtler issues that matter in practice: early-stopping bias, confidence interval interpretation, and power analysis. The strongest answers connect statistical concepts back to business risk and decision-making frameworks.
You will be asked to design, critique, and interpret A/B tests as you would at Meta, Google, or Netflix, including edge cases like peeking and multiple metrics. Many candidates miss the practical implications of Type I and Type II errors, power, and p-value interpretation when business stakes are attached.
At Netflix, you run an A/B test on a new recommendation UI. The dashboard shows a live p-value each hour, and stakeholders want to stop the test as soon as $p < 0.05$. How do you critique this, and what would you do instead?
Sample Answer
You could keep checking the same fixed-horizon p-value and stop early, or you could use a design that supports interim looks. The first option inflates your Type I error because repeated peeking increases the chance you see $p < 0.05$ under the null. The second option wins here because group sequential boundaries or always-valid p-values control error while allowing early stopping. If you cannot change the method, you should commit to a pre-registered horizon and only read the p-value at the end.
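A simulation makes the peeking problem concrete. The sketch below is a simplified one-sample z-test under the null (not Netflix's actual setup): it checks the statistic after every 100 observations and stops at the first $|z| > 1.96$, which inflates the false positive rate well above 5%.

```python
import math
import random

random.seed(7)

def peeked_false_positive_rate(n_sims=2000, n_max=1000, look_every=100):
    """Fraction of null simulations that ever cross |z| > 1.96 at an interim look."""
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for i in range(1, n_max + 1):
            total += random.gauss(0, 1)  # data generated under the null
            # z-statistic for the running mean with known unit variance
            if i % look_every == 0 and abs(total / math.sqrt(i)) > 1.96:
                rejections += 1
                break
    return rejections / n_sims

rate = peeked_false_positive_rate()
print(rate)  # typically around 0.2 with 10 looks, far above the nominal 0.05
```

With a single look at the end, the same procedure would reject about 5% of the time; ten peeks roughly quadruple the false positive rate.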
At Meta, an experiment lifts click-through rate with $p = 0.03$ and a 95% CI of $[0.1\%, 0.9\%]$ relative. A PM says, "There is a 97% chance this will be positive in production." How do you respond, and what does the p-value actually mean?
At Uber, you test a new driver incentive and track 12 metrics. You plan to call the test a win if any metric is significant at $\alpha=0.05$. What is the statistical problem, and how would you redesign the decision rule?
At Airbnb, your primary metric is bookings per user, but it is highly skewed with many zeros. The team wants a t-test on the mean and a standard power calculation, then to ship if $p<0.05$. What is your approach, and what pitfalls do you call out?
At Google, you are asked to design an A/B test with 80% power to detect a 0.5% relative lift in conversion at $\alpha=0.05$. What inputs do you need, and how do you explain the tradeoff between MDE, sample size, and duration to a PM?
At Spotify, you observe a statistically significant lift in the overall metric, but the effect is negative for a key high-value segment. How do you decide whether to ship, and how do you avoid a misleading conclusion due to segment slicing?
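For the Google power-analysis question above, here is a sketch of the standard two-proportion sample-size formula. The 4% baseline conversion rate is an assumed input (the question only gives a relative MDE), which is exactly the kind of input you should ask the PM for.

```python
import math
from statistics import NormalDist

def n_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    p_treat = p_base * (1 + rel_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p_treat - p_base) ** 2)

# assumed 4% baseline; detect a 0.5% *relative* lift (4.00% -> 4.02%)
n = n_per_arm(0.04, 0.005)
print(f"{n:,} users per arm")  # about 15 million per arm with these inputs
```

The takeaway for the PM conversation: halving the MDE quadruples the required sample size, which is why tiny relative lifts translate into very long test durations.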
Regression, Causal Thinking, and Model Assumptions
Regression and causality questions separate junior analysts from senior data scientists because they test your ability to think critically about confounding, selection bias, and model assumptions. You'll need to diagnose when coefficients can't be interpreted causally and propose experimental or quasi-experimental alternatives.
The biggest mistake candidates make is treating regression as a black-box prediction tool instead of a causal inference framework. Interviewers want to see you question whether the data-generating process matches your modeling assumptions, especially around endogeneity and omitted-variable bias.
Expect questions that force you to connect regression outputs to decisions, diagnose assumption violations, and reason about confounding. Candidates often get tripped up when correlation is mistaken for causation, when multicollinearity muddies interpretation, or when they cannot explain what the model is actually identifying.
At Uber, you regress weekly rider churn on promo exposure (yes/no), rider tenure, and number of support tickets, and the promo coefficient is negative and significant. What decision would you make about scaling the promo, and what is the key identification risk in interpreting that coefficient causally?
Sample Answer
Reason through it: start by asking whether promo exposure is exogenous, because the coefficient only reflects a causal effect if exposure is as good as random given the controls. If the business targets promos at riders who are already at high churn risk, you have selection bias, so the negative coefficient could be understating the true benefit, or the sign could even be flipped, depending on the targeting rules. Before scaling, check how assignment happens and look for quasi-random variation, for example eligibility thresholds or randomized holdouts. If you cannot defend identification, treat the estimate as predictive, not causal, and avoid ROI claims.
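The identification risk can be demonstrated with a toy simulation (assumed numbers, not Uber data): the promo truly cuts churn by 5 points, but because targeting sends it to high-risk riders, the naive promo/no-promo comparison flips sign and makes a helpful promo look harmful.

```python
import random

random.seed(42)

true_effect = -0.05  # promo truly reduces churn probability by 5 points
churn_promo, churn_no_promo = [], []

for _ in range(100_000):
    risk = random.random()                # unobserved churn risk
    gets_promo = random.random() < risk   # targeting: riskier riders get the promo
    p_churn = 0.10 + 0.40 * risk + (true_effect if gets_promo else 0.0)
    churned = random.random() < p_churn
    (churn_promo if gets_promo else churn_no_promo).append(churned)

naive = sum(churn_promo) / len(churn_promo) - sum(churn_no_promo) / len(churn_no_promo)
print(f"true effect: {true_effect:+.3f}, naive estimate: {naive:+.3f}")
# naive estimate comes out positive: selection into treatment masks the benefit
```

A regression with the same omitted risk variable suffers the same bias; the fix is better identification (randomized holdouts, eligibility thresholds), not more controls you happen to have.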
At Netflix, you model watch time as a function of the number of recommendations shown, homepage load time, and device type. The coefficient on recommendations is positive, but you suspect reverse causality. How would you explain the problem, and what design or modeling change would you propose?
At Spotify, you run a linear regression to predict daily listening minutes from ad load, subscription status, and prior-week minutes. Residual plots show a fan shape, and a Breusch-Pagan test rejects homoskedasticity. What do you do, and when is the standard fix not enough?
At Meta, you fit a regression for ad conversion on bid, ad quality score, and predicted click-through rate, and you see high VIFs for quality score and predicted CTR. The quality score coefficient changes sign when you add predicted CTR. How do you interpret the coefficients, and what should you do next?
At Airbnb, you estimate the effect of enabling instant book on booking rate using a regression with controls for price, reviews, and host response time. What variables could be bad controls here, and how would you explain the bias direction using a causal graph?
At Microsoft, you build a linear model for customer spend with features including tenure, number of seats, and sales touchpoints. A few accounts have extreme spend and high leverage, and Cook's distance flags them. What is your approach to deciding whether to keep, cap, or remove them, and how do you justify it to stakeholders?
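For the Spotify residual-diagnostics question above, here is a stdlib-only sketch of how a fan shape arises and how to detect it numerically (toy data with noise that grows with the predictor, not Spotify's): the slope estimate stays roughly unbiased, but |residual| correlates with x, which is the pattern a Breusch-Pagan test formalizes.

```python
import math
import random

random.seed(0)

# toy data: error scale grows with x, i.e. heteroskedastic noise
xs = [random.uniform(1, 10) for _ in range(5000)]
ys = [2.0 * x + random.gauss(0, 0.5 * x) for x in xs]

# ordinary least squares slope and intercept in closed form
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def corr(a, b):
    """Pearson correlation, computed by hand."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# |residual| rising with x is the numeric signature of the fan shape
print(round(corr(xs, [abs(r) for r in resid]), 2))
```

Note what the diagnosis implies: the coefficient is still consistent, but the default standard errors are wrong, which is why robust (sandwich) standard errors are the usual first fix.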
Bayesian Inference and Decision Making Under Uncertainty
Bayesian thinking is becoming essential at companies that need to make decisions under uncertainty with limited data, especially in areas like experimentation, forecasting, and recommendation systems. Interviewers test whether you can construct reasonable priors, interpret posterior distributions, and communicate uncertainty to non-technical stakeholders.
The challenge with Bayesian questions isn't the math, it's the business judgment: when to trust your prior versus the data, how to incorporate historical information from similar contexts, and what decision framework to use when the posterior still leaves substantial uncertainty.
Beyond frequentist testing, you may need to update beliefs with data and communicate uncertainty in a way stakeholders can act on. Candidates commonly struggle to choose priors sensibly, interpret credible intervals, and connect posterior results to concrete decisions like ranking, experimentation, or risk control.
At Netflix you are choosing between two recommendation ranking models. Model A has 520 clicks out of 10,000 impressions, Model B has 500 clicks out of 10,000, and you assume independent Beta priors $\text{Beta}(1,1)$ for each CTR. What is the posterior probability that Model A has higher CTR than Model B, and how would you communicate that to a product manager?
Sample Answer
This question is checking whether you can go from a Bayesian model to an actionable probability statement, not just a p-value. You update to posteriors $p_A \sim \text{Beta}(1+520, 1+9480)$ and $p_B \sim \text{Beta}(1+500, 1+9500)$. Then you estimate $P(p_A > p_B)$, typically via Monte Carlo draws from both Betas and counting the fraction where $p_A > p_B$. You tell the PM, "Given our prior and the data, there is about X% chance A is better than B on CTR," and you pair it with expected lift and decision thresholds.
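The Monte Carlo step in this answer can be sketched with the standard library's `random.betavariate`:

```python
import random

random.seed(1)

def prob_a_beats_b(clicks_a, n_a, clicks_b, n_b, draws=100_000):
    """P(CTR_A > CTR_B) under independent Beta(1,1) priors, by Monte Carlo."""
    wins = 0
    for _ in range(draws):
        p_a = random.betavariate(1 + clicks_a, 1 + n_a - clicks_a)
        p_b = random.betavariate(1 + clicks_b, 1 + n_b - clicks_b)
        wins += p_a > p_b
    return wins / draws

p_win = prob_a_beats_b(520, 10_000, 500, 10_000)
print(round(p_win, 2))  # roughly 0.74
```

So the PM-facing statement would be: "there is about a 74% chance Model A is better on CTR," which is usually not enough certainty on its own and argues for more data or a decision rule based on expected lift.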
At Uber you are launching in a new city and need a prior for conversion rate from app open to ride request. You have three similar cities with past conversion rates around 2.0%, 2.4%, and 2.2% on about 50,000 opens each. How would you construct a prior, and when would you avoid using that historical information?
At Meta you run an experiment on ad relevance and measure revenue per user, which is heavy-tailed and noisy. Stakeholders want a single number they can act on tomorrow morning. Which Bayesian model would you use, what posterior summary would you report, and how would you turn that into a ship or no-ship decision with asymmetric risk?
At Google you monitor a rare event rate, false positive reports per million searches, and you get 0 events in the last week. Using a Bayesian approach, how do you produce an upper bound on the rate that is still meaningful, and what prior would you choose to avoid overconfidence?
At Airbnb you are ranking listings and you have sparse data for new hosts. You model booking probability with a Bayesian hierarchical model. How would you explain to a nontechnical stakeholder why the model intentionally shrinks extreme estimates, and how that affects the tradeoff between exploration and exploitation?
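The shrinkage intuition in the last question can be shown with a simple Beta-Binomial partial-pooling sketch (the prior mean and strength below are made-up illustrative numbers): a brand-new host with 2 bookings out of 2 requests does not get ranked as a 100% converter.

```python
def shrunk_rate(bookings, requests, prior_mean=0.10, prior_strength=50):
    """Posterior mean booking rate under a Beta prior: raw rates are pulled
    toward the marketplace average, more strongly when data is sparse."""
    a = prior_mean * prior_strength
    b = (1 - prior_mean) * prior_strength
    return (a + bookings) / (a + b + requests)

print(round(shrunk_rate(2, 2), 3))       # 0.135: near the 10% prior, not 100%
print(round(shrunk_rate(300, 2000), 3))  # 0.149: with lots of data, close to the raw 15%
```

That asymmetry is the exploration-exploitation tradeoff in miniature: new hosts are neither buried nor over-promoted until their own data accumulates.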
How to Prepare for Statistics Interviews
Practice distribution recognition with real scenarios
Don't just memorize probability formulas. For each major distribution (binomial, Poisson, normal, exponential), write out three different business scenarios where it would apply. Practice explaining your reasoning out loud.
Master the confidence interval interpretation
Most candidates botch this fundamental concept. Practice explaining what a 95% confidence interval actually means, why it's not a probability statement about the parameter, and how you'd communicate uncertainty to a product manager.
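A short simulation is a good way to internalize the correct interpretation: roughly 95% of the intervals produced by the procedure, not any single interval, cover the true mean. This sketch uses a known-sigma z-interval to keep it simple.

```python
import math
import random

random.seed(3)

true_mean, sigma, n = 10.0, 2.0, 50
reps = 2000
covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    half_width = 1.96 * sigma / math.sqrt(n)  # known-sigma 95% z-interval
    covered += (m - half_width) <= true_mean <= (m + half_width)

coverage = covered / reps
print(coverage)  # close to 0.95: the guarantee is about the procedure, not one interval
```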
Build your causal inference vocabulary
Learn to spot and name common threats to causal inference: selection bias, omitted variable bias, reverse causality, and collider bias. Practice proposing specific solutions like instrumental variables, difference-in-differences, or randomized experiments.
Connect every statistical result to a business decision
Never end your answer with just a number or p-value. Always explain what action you'd recommend, what additional information you'd want, and how you'd communicate the uncertainty to stakeholders who need to make decisions.
How Ready Are You for Statistics Interviews?
You are analyzing customer spend; the distribution is strongly right-skewed with a few extreme outliers. A product manager asks for a single number to represent typical spend and a way to summarize variability. What do you report and why?
Frequently Asked Questions
How much statistics depth do I need for a Data Analyst, Data Scientist, or Quantitative Researcher interview?
You should be fluent in probability basics, sampling, hypothesis tests, confidence intervals, linear regression, and experiment design. For Data Scientist roles, expect deeper questions on bias and variance, regularization, model evaluation, and causal inference basics. For Quantitative Researcher roles, you often need stronger probability, distributions, stochastic processes basics, and rigorous reasoning about assumptions and tail risk.
Which companies tend to ask the most statistics-heavy interview questions?
Tech companies with strong experimentation culture, large marketplaces, and ads or ranking systems often ask a lot about A/B testing, p-values, power, and metric tradeoffs. Quant trading firms and market makers tend to push harder on probability, distributional thinking, and inference under uncertainty. Healthcare, insurance, and fintech also commonly emphasize statistical modeling, calibration, and validation.
Is coding required for statistics interviews, or is it mostly theory?
You may need light coding to compute summary statistics, run regressions, or simulate a sampling distribution, usually in SQL or Python/R. Many interviews mix conceptual questions with a small take-home or live exercise that tests whether you can translate statistical ideas into code. For coding practice aligned to these tasks, use datainterview.com/coding.
How do statistics interview questions differ across Data Analyst, Data Scientist, and Quantitative Researcher roles?
Data Analyst interviews focus on experimental design details, metric selection, interpreting confidence intervals, and common pitfalls like selection bias and Simpson’s paradox. Data Scientist interviews add modeling-related statistics, such as likelihood, regularization, cross-validation, and calibration, plus deeper discussion of assumptions. Quantitative Researcher interviews are typically the most math-forward, emphasizing probability, distributions, estimators, and rigorous derivations or proofs of key results.
How can I prepare for statistics interviews if I have no real-world experience?
Use simulated projects: design a mock A/B test, generate synthetic data, and practice estimating power, confidence intervals, and false positive rates via simulation. Practice explaining assumptions and failure modes, for example what happens if samples are dependent or the metric is heavy-tailed. Drill common prompts and solutions at datainterview.com/questions, and be ready to justify each step with statistical reasoning.
What are the most common mistakes candidates make in statistics interviews, and how do I avoid them?
You often lose points by applying a test without checking assumptions like independence, randomization, equal variance, or sample size adequacy. Another common mistake is confusing statistical significance with practical significance, or ignoring multiple testing and peeking, which inflates false positives. You should state assumptions explicitly, discuss robustness checks, and tie results to effect sizes and uncertainty, not just p-values.

