Causal Inference Interview Questions

Dan Lee, Data & AI Lead
Last updated: March 13, 2026

Causal inference questions are becoming mandatory at top tech companies, especially for senior data scientist roles at Meta, Google, Netflix, and Uber. These companies need to understand what drives user behavior, not just predict it. When you're asked to design an experiment or analyze observational data for causal effects, you're being tested on skills that directly impact billion-dollar product decisions.

What makes causal inference interviews brutal is that there's always a hidden trap. You might confidently propose an A/B test, only to realize users can share treatments with friends, violating SUTVA. Or you'll suggest difference-in-differences, then discover the rollout timing creates bias that standard two-way fixed effects can't handle. Interviewers love these gotchas because they separate candidates who memorized techniques from those who understand when methods break.

Here are the top 27 causal inference questions, organized by the core methodologies that dominate tech interviews.


Potential Outcomes and A/B Testing Assumptions

Most data scientists can run A/B tests, but senior candidates must understand the potential outcomes framework that makes causal inference possible. Interviewers probe whether you grasp SUTVA, unconfoundedness, and positivity because these assumptions determine if your estimates mean anything. The failure mode here is treating randomization as magic: you randomize, compare means, and assume you're done.

The critical insight is that each assumption maps to a specific threat in real product experiments. SUTVA breaks with social features, unconfoundedness fails with non-compliance, and positivity disappears with extreme propensity scores. Master how to spot these violations and you'll stand out from candidates who just know the formulas.


Start by nailing the potential outcomes setup, because interviewers want to see that you can translate product experiments into causal estimands like ATE, ATT, and CATE. You will be pushed on assumptions like SUTVA, consistency, and overlap, and many candidates struggle to explain what breaks when those assumptions fail in real experiments.

Meta runs an A/B test on a new notification ranking model. Users can forward notifications to friends, which changes what those friends see. Define the causal estimand you want, then explain which potential outcomes assumption is most at risk and what that does to your estimate.

Meta · Hard

Sample Answer

Most candidates default to treating this like a standard user-level ATE with independent units, but that fails here because interference violates SUTVA. Your potential outcomes $Y_i(1)$ and $Y_i(0)$ are not well-defined if they depend on other users' assignments, so the ATE is no longer identified by a simple difference in means. You either need a different estimand, for example a cluster-level ATE, or you need to redesign the experiment, for example by randomizing at the network or group level. If you ignore interference, your estimate can be biased in either direction, and the bias does not go away with more data.
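The cluster-level redesign described above can be sketched in a few lines. This is a minimal sketch assuming numpy; the function names (`cluster_randomize`, `cluster_ate`) are illustrative, not from any specific library:

```python
import numpy as np

def cluster_randomize(cluster_ids, p=0.5, seed=0):
    """Assign treatment at the cluster (e.g., friend-group) level so that
    interference stays within clusters and SUTVA can hold across them."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(cluster_ids)
    treated = rng.random(len(clusters)) < p
    assignment = dict(zip(clusters, treated))
    return np.array([assignment[c] for c in cluster_ids])

def cluster_ate(y, cluster_ids, treat):
    """Estimate a cluster-level ATE: mean of treated-cluster outcome means
    minus mean of control-cluster outcome means."""
    clusters = np.unique(cluster_ids)
    means = np.array([y[cluster_ids == c].mean() for c in clusters])
    t = np.array([treat[cluster_ids == c][0] for c in clusters]).astype(bool)
    return means[t].mean() - means[~t].mean()
```

Analyzing at the cluster level keeps units independent of each other's assignments, at the cost of fewer effective units and wider confidence intervals.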


Confounding and Propensity Score Methods

Observational causal inference separates advanced practitioners from beginners, yet most candidates crash on propensity score questions. The typical mistake is thinking propensity scores automatically solve confounding, when really they just make your confounding assumptions explicit and testable. Interviewers want to see you reason about what makes treatment assignment random conditional on covariates.

Your advantage comes from understanding that propensity scores are a preprocessing step, not a magic bullet. The real work is in covariate selection, model diagnostics, and choosing between matching versus weighting. Companies like Uber and Meta deal with massive selection bias in user behavior, so they need people who can navigate these choices thoughtfully.


In this section, you show you can reason about selection bias when randomization is not available, then choose a defensible adjustment strategy. You are expected to discuss matching, weighting, stratification, and diagnostics like balance checks, and candidates often miss how model misspecification and poor overlap can dominate results.

At Meta, you are estimating the effect of enabling a new notification setting on 7 day retention using observational logs. Users who enable it are heavier users at baseline. How would you use propensity scores to adjust, and what diagnostics would you run before trusting the estimate?

Meta · Medium

Sample Answer

Use propensity score weighting or matching to balance pre-treatment covariates between enabled and non-enabled users, then estimate the retention difference on the balanced sample. Fit $e(x)=P(T=1\mid X)$ using only pre-treatment features like prior sessions, tenure, device, and region, then check that standardized mean differences are near zero after adjustment. Verify overlap by inspecting the propensity score distributions, trimming or restricting to common support if needed. Finally, check weight stability, for example via the effective sample size, so that a few extreme weights are not driving the result.
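The diagnostics above can be sketched with numpy. Function names are illustrative, and the propensity model itself (e.g. a logistic regression on pre-treatment features) is assumed to be fitted separately, producing the scores `e`:

```python
import numpy as np

def ipw_weights(t, e):
    """Inverse propensity weights for the ATE: 1/e for treated, 1/(1-e) for control."""
    return np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))

def standardized_mean_diff(x, t, w=None):
    """(Weighted) standardized mean difference of covariate x between groups;
    values near 0 after weighting indicate balance."""
    if w is None:
        w = np.ones_like(x, dtype=float)
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    v1 = np.average((x[t == 1] - m1) ** 2, weights=w[t == 1])
    v0 = np.average((x[t == 0] - m0) ** 2, weights=w[t == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)

def effective_sample_size(w):
    """Kish effective sample size; a value far below n flags unstable extreme weights."""
    return w.sum() ** 2 / (w ** 2).sum()
```

In practice you would run `standardized_mean_diff` on every pre-treatment covariate before and after weighting, and compare `effective_sample_size(w)` to the raw sample size.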


Difference in Differences and Panel Data Pitfalls

Difference-in-differences questions reveal whether you understand modern panel data methods or just the textbook version. Many candidates know the basic setup but fall apart when treatment timing varies or when two-way fixed effects produces biased estimates. Tech companies frequently use staggered rollouts, making this knowledge essential for roles analyzing product launches.

The game-changer is recognizing that recent econometrics research has shown major problems with standard DiD approaches when treatment effects are heterogeneous. Candidates who mention Goodman-Bacon decomposition or propose event study designs demonstrate they're current with best practices, not stuck in 2010.


You will be asked to design and critique a DiD study for a feature rollout, policy change, or marketplace intervention using time series or panel data. Many candidates stumble on parallel trends validation, staggered adoption issues, and how to interpret coefficients when treatment timing varies across units.

Meta rolls out a new ranking feature to 30 percent of creators starting in week 10, leaving the rest unchanged. You plan a DiD on weekly creator revenue. How do you check parallel trends, and what do you do if pre-trends are not flat?

Meta · Medium

Sample Answer

You could validate parallel trends with a pre-period outcome regression on a treatment indicator and time, or you could run an event study with leads and lags. The event study wins here because it shows you the whole pre-trend pattern, not just a single slope test. If leads are non-zero, you either restrict to a window where trends look parallel, add unit-specific linear trends cautiously, or reweight or match units on pre-period outcomes to improve comparability. You should also sanity check with placebo rollout dates to see if you still get an effect.
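A minimal numpy sketch of the canonical 2x2 DiD estimate and a simple pre-trend slope check; the function names are illustrative, and a full event study with leads and lags would replace the slope check in a real analysis:

```python
import numpy as np

def did_estimate(y, treated, post):
    """Canonical 2x2 difference-in-differences:
    (treated post - treated pre) minus (control post - control pre)."""
    d_t = y[(treated == 1) & (post == 1)].mean() - y[(treated == 1) & (post == 0)].mean()
    d_c = y[(treated == 0) & (post == 1)].mean() - y[(treated == 0) & (post == 0)].mean()
    return d_t - d_c

def pretrend_gap(y, treated, week, rollout_week):
    """Pre-period check: slope of (treated mean - control mean) over pre-rollout
    weeks. A slope near zero supports parallel trends; otherwise a pre-trend exists."""
    pre = week < rollout_week
    weeks = np.unique(week[pre])
    gap = np.array([y[pre & (week == w) & (treated == 1)].mean()
                    - y[pre & (week == w) & (treated == 0)].mean() for w in weeks])
    # least-squares slope of the treated-control gap on week
    return np.polyfit(weeks.astype(float), gap, 1)[0]
```

The placebo check mentioned above amounts to calling `did_estimate` with a fake `post` indicator built from a pre-rollout date and confirming the estimate is near zero.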


Instrumental Variables and Encouragement Designs

Instrumental variables questions are where technical depth meets business intuition, and most candidates struggle with both sides. You need to argue convincingly that your instrument affects the outcome only through the treatment, while also explaining why LATE matters for product decisions. The common failure is proposing an instrument that obviously violates exclusion restrictions.

The key insight is that IV estimates a very specific parameter: the effect for compliers only. When a PM asks about the impact of a feature on all users, giving them a LATE estimate can lead to wrong decisions. Strong candidates always connect the economic interpretation back to the business question being asked.


Expect questions that test whether you can salvage causal identification with an instrument when confounding is severe and compliance is imperfect. You need to articulate relevance, exclusion, monotonicity, and what LATE means for product decisions, and candidates often hand wave the exclusion restriction in ways interviewers will challenge.

At Uber, you want the causal effect of a driver earnings guarantee on hours worked, but opt-in is heavily confounded by driver motivation. You propose using random assignment to receive a guarantee offer email as an instrument. How do you argue relevance and exclusion, and what estimand do you get with imperfect compliance?

Uber · Medium

Sample Answer

Reason through it step by step. First, relevance: the email must shift take-up, so you show a strong first stage, $E[D\mid Z=1] \neq E[D\mid Z=0]$, and quantify it. Next, exclusion: $Z$ affects hours only through taking the guarantee, so you argue the email itself does not change behavior via salience, morale, or information beyond the guarantee, and you probe this with balance checks and placebo outcomes. With imperfect compliance you do not identify the ATE; you identify the LATE for compliers, $$\tau_{LATE}=\frac{E[Y\mid Z=1]-E[Y\mid Z=0]}{E[D\mid Z=1]-E[D\mid Z=0]}.$$ You also state monotonicity: nobody is less likely to take the guarantee because they got the email; otherwise the LATE interpretation breaks.
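The Wald ratio above computes directly from the two intention-to-treat contrasts. A minimal numpy sketch, with an illustrative function name:

```python
import numpy as np

def late_wald(y, d, z):
    """Wald estimator: ITT effect on the outcome divided by ITT effect on take-up.
    Under relevance, exclusion, and monotonicity this identifies the LATE
    for compliers, not the population ATE."""
    itt_y = y[z == 1].mean() - y[z == 0].mean()   # reduced form
    itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage; should be far from 0
    return itt_y / itt_d
```

Before trusting the ratio, you would report the first stage (`itt_d`) on its own, since a weak first stage makes the estimator unstable.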


Regression Discontinuity and Threshold-Based Policies

Regression discontinuity questions test your ability to exploit policy rules for causal identification, but candidates often miss the nuanced decisions that make or break the analysis. Simply knowing that you compare units just above and below a threshold isn't enough when interviewers ask about bandwidth choice, functional form, or what to do with imperfect compliance. These design choices determine whether your estimates are credible.

The sophistication comes from understanding that RD is fundamentally a local experiment around the cutoff. You're not estimating effects for the whole population, just for units near the threshold. Companies like Uber and Netflix have many score-based policies, so they value candidates who can design rigorous RD studies and communicate the limitations clearly.


This area evaluates whether you can exploit a cutoff rule like eligibility thresholds, ranking scores, or risk bands to estimate local causal effects. Interviewers probe bandwidth choice, manipulation tests, functional form sensitivity, and how you would communicate that the effect is local, which is where candidates frequently overclaim generality.

At Uber, drivers with a risk score of 70 or higher are required to complete a safety training before they can go online. You have historical data on risk score and subsequent incidents. How would you estimate the causal effect of training using an RD design, and what validity checks would you run?

Uber · Medium

Sample Answer

This question is checking whether you can translate a cutoff policy into a credible local causal estimate and defend the assumptions. You would run a local RD around 70, typically local linear regression on either side with a kernel and data-driven bandwidth selection, estimating the jump in incidents at $x=70$. You would check manipulation with a density test at the cutoff and covariate balance near 70, plus a discontinuity check in pre-treatment outcomes if available. You would also state clearly that the estimand is a local average treatment effect for drivers near 70, not for low or very high risk drivers.
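A minimal sketch of the local linear RD estimate with a triangular kernel, assuming numpy. The fixed bandwidth and the function name are illustrative; a real analysis would use data-driven bandwidth selection and robust bias-corrected inference (e.g. the rdrobust approach):

```python
import numpy as np

def rd_estimate(x, y, cutoff=70.0, bandwidth=10.0):
    """Sharp RD: fit a local linear regression with a triangular kernel on each
    side of the cutoff and return the jump in the fitted outcome at the cutoff."""
    def local_fit(mask):
        xs, ys = x[mask] - cutoff, y[mask]
        w = np.maximum(0.0, 1.0 - np.abs(xs) / bandwidth)  # triangular kernel
        X = np.column_stack([np.ones_like(xs), xs])
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * ys))
        return beta[0]  # intercept = fitted value at the cutoff
    near = np.abs(x - cutoff) <= bandwidth
    above = near & (x >= cutoff)
    below = near & (x < cutoff)
    return local_fit(above) - local_fit(below)
```

Re-running `rd_estimate` across several bandwidths is the standard sensitivity check: if the jump moves a lot with the bandwidth, the estimate is fragile.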


How to Prepare for Causal Inference Interviews

Draw the causal graph first

Before jumping into methods, sketch out what causes what in the problem. This forces you to identify confounders, mediators, and colliders that determine which approach will work. Interviewers notice when you think causally from the start.

Connect assumptions to business reality

Don't just state SUTVA or unconfoundedness abstractly. Explain how social features violate SUTVA, or how user self-selection breaks unconfoundedness. Companies need people who spot these issues in real product settings.

Know when methods fail

Study the failure modes: when DiD gives biased estimates, when IV exclusion restrictions break, when propensity scores have poor overlap. Interviewers test whether you blindly apply methods or understand their limitations.

Practice explaining LATE to non-technical stakeholders

IV estimates are often misinterpreted in business contexts. Rehearse explaining why your IV result applies only to compliers, not all users. This skill separates senior candidates who can communicate with PMs from those who just crunch numbers.

Memorize the diagnostic tests

Know how to check parallel trends, test instrument strength, assess covariate balance, and validate RD assumptions. Interviewers expect you to propose specific validation checks, not just mention that you'd 'check assumptions somehow.'


Frequently Asked Questions

How deep do I need to go on Causal Inference for a Data Scientist interview?

You should be comfortable with core identification ideas: confounding, selection bias, counterfactuals, DAGs, and when assumptions make an effect identifiable. Expect to explain and defend common estimators like regression with controls, matching, inverse propensity weighting, difference in differences, synthetic control, and instrumental variables. You also need to interpret results, run sanity checks, and communicate assumptions, not just name methods.

Which companies tend to ask the most Causal Inference questions?

Product driven tech companies with mature experimentation and measurement teams ask it frequently, including Meta, Google, Amazon, Microsoft, Apple, Netflix, Uber, Lyft, DoorDash, Airbnb, and TikTok. Marketplaces, ads, and growth organizations also emphasize it because selection bias is common and randomized tests are not always feasible. Consulting and applied economics groups in fintech and healthcare can be similarly heavy on identification and quasi experiments.

Will I need to code for Causal Inference interviews?

Often yes, but it is usually applied coding rather than algorithm puzzles: estimating propensity scores, implementing IPW, running diff in diff regressions, checking balance, and writing clean analysis in Python or R. Some interviews include SQL to build cohorts and treatment timing for observational studies. For practice, use datainterview.com/coding for implementation style questions and datainterview.com/questions for causal reasoning prompts.

How do Causal Inference interviews differ across Data Scientist sub roles?

Product or experimentation Data Scientists get questions about A/B testing pitfalls, interference, noncompliance, and interpreting treatment effects across segments. Marketing or ads measurement roles focus more on attribution, incrementality, MMM limitations, and instruments or geo experiments. Economics or marketplace roles tend to go deeper on identification with IV, regression discontinuity, diff in diff assumptions, and robustness checks.

How can I prepare for Causal Inference interviews if I have no real world experience?

You can build a small portfolio by reproducing a quasi experimental study on a public dataset and writing a short memo that states the causal question, DAG, identification strategy, and sensitivity checks. Practice translating messy scenarios into assumptions and estimators, for example what to do when treatment timing varies or when selection into treatment is driven by user intent. Use datainterview.com/questions to drill scenario based identification and communication.

What are common mistakes to avoid in Causal Inference interviews?

Do not jump to an estimator without first stating what causal effect you want and what assumptions identify it. Avoid controlling for post-treatment variables, conditioning on colliders, or claiming causality from a predictive model without a design; these are classic failure modes. Also do not ignore diagnostics like parallel trends in diff in diff, overlap for propensity methods, or weak instrument concerns in IV.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.
