Data Analyst at a Glance
Total Compensation
$134k - $290k/yr
Interview Rounds
6 rounds
Difficulty
Levels
Entry - Principal
Education
Bachelor's
Experience
0–15+ yrs
Most candidates prep for McKinsey's Data Analyst interview the way they'd prep for a tech company loop. Then they hit a round where they're asked to structure a healthcare cost problem from scratch and recommend what to tell a hospital CFO. The candidates who wash out here aren't weak technically. They never practiced framing analytical work the way McKinsey's consulting teams actually consume it.
McKinsey & Company Data Analyst Role
Primary Focus
Skill Profile
Math & Stats
Medium: Strong foundation in quantitative thinking, statistical analysis, and hypothesis testing to derive meaningful insights from data.
Software Eng
Medium: Intermediate programming proficiency in Python or R for data manipulation and analysis.
Data & SQL
Medium: Proficiency in ETL concepts, data warehousing, and building and managing data pipelines (e.g., with Apache Airflow) to automate reporting and analysis.
Machine Learning
Low: A basic understanding of modeling techniques such as regression, clustering, classification, and causal inference.
Applied AI
Low: No explicit mention of modern AI or GenAI in the job description.
Infra & Cloud
Low: No mention of infrastructure or cloud deployment responsibilities for this role.
Business
High: Strong ability to translate data analysis into business insights, design dashboards for stakeholders, and address common business challenges through data.
Viz & Comms
High: Explicit need to present insights and work with stakeholders; data visualization tools and clear communication in Spanish and English are highlighted for the role.
Want to ace the interview?
Practice with real questions.
You're not shipping dashboards to a product manager. In McKinsey's Health Systems & Services practice, where DA hiring has been concentrated, you're building data marts from claims extracts in Databricks and packaging cost-driver analyses into exhibits that a partner presents to a payer's VP of Network Strategy. Success after year one looks like owning an analytics workstream end-to-end on a client engagement, from wrangling messy provider taxonomy files through to the steering committee slide that changed a decision.
A Typical Week
A Week in the Life of a Data Analyst
Weekly time split
What will surprise you is the reactive texture of the analytical blocks. Your calendar might show three hours of focus time, but an associate's Slack message can redirect your afternoon toward an entirely different dataset with a next-morning deadline. The writing-heavy split also catches people off guard, because "writing" at McKinsey means translating model outputs into exhibits that meet the firm's notoriously precise chart standards, not drafting Jira tickets.
Projects & Impact Areas
Claims data reconciliation is the unsexy foundation: deduplicating provider records, resolving NPI mismatches, and standardizing classification files before any real analysis can begin. That pipeline work, often orchestrated in Airflow with Databricks as the compute layer, feeds directly into the client-visible deliverables like propensity models for patient segmentation or enrollment forecasts prototyped with Prophet for scenario planning. McKinsey's published research on AI-powered customer interactions reflects the kind of applied ML you'd contribute to, where model selection and business framing matter more than novel architectures.
Skills & What's Expected
The skill scores rate ML low, and that's the right read: McKinsey's version of machine learning means picking the right regression or decision tree for a client problem and explaining the tradeoff to a partner who doesn't know what regularization is. The underrated dimension is communication: you're producing exhibits for C-suite healthcare executives, and candidates who can write complex Spark jobs but can't storyline a deck plateau fast.
Levels & Career Growth
Data Analyst Levels
Each level has different expectations, compensation, and interview focus.
$114k
$19k
$8k
What This Level Looks Like
You handle well-defined requests — pull data, build a chart, answer a specific question from a PM or ops lead. Someone senior decides what's worth analyzing; you execute the query and summarize the result.
Interview Focus at This Level
SQL dominates: window functions, CTEs, joins, and GROUP BY. Expect a basic product metrics question and a short behavioral round. Problems are well-defined.
Find your level
Practice with questions tailored to your target level.
Most external hires land at Analyst or Senior Analyst. What separates Senior Specialist from Specialist isn't technical chops; it's whether you can independently shape an engagement's analytical direction and manage stakeholder expectations without your engagement manager scaffolding every step. The promotion blocker worth knowing: McKinsey's headcount reductions reported in 2025 have made the path to Expert (Principal-equivalent) less predictable, and reaching that tier, candidates report, requires a sellable niche like healthcare analytics or GenAI strategy that partners can repeatedly staff you against.
Work Culture
McKinsey's "obligation to dissent" culture means you're expected to push back on a partner when your data contradicts their hypothesis, which is a genuinely different dynamic than most corporate DA seats where you confirm what leadership already believes. The intensity is spiky rather than constant: engagement deadlines can push weeks past 55 hours, but bench periods between studies offer real breathing room. On location, the firm's hybrid stance varies by practice and engagement manager, and some engagements may require client-site travel, so if you're assuming fully remote analytical work, recalibrate early in the recruiter screen.
McKinsey & Company Data Analyst Compensation
Because McKinsey is a private partnership, stock grants generally don't exist for data analyst roles. That shifts the math: your bonus becomes the only variable component, and at senior levels it carries real weight. The spread between floor and ceiling TC widens as you climb precisely because individual ratings and firm performance drive that bonus number up or down each year.
The single biggest negotiation lever most candidates overlook is the sign-on bonus. Base salary has some flex within a band (particularly if you're holding a competing offer), but sign-on is where recruiters have the most room to close you. Put your counter in writing, keep it to one or two specific asks, and anchor on scope: the client-facing travel expectations, the SQL/Python/experimentation skill set the role demands, and the fact that you'd be owning data engineering work alongside analysis. Location or office alignment is another lever worth raising if it matters to you.
McKinsey & Company Data Analyst Interview Process
6 rounds · ~4 weeks end to end
Initial Screen
2 rounds
Recruiter Screen
An initial phone call with a recruiter to discuss your background, interest in the role, and confirm basic qualifications. Expect questions about your experience, compensation expectations, and timeline.
Tips for this round
- Have a 60-second pitch that clearly states your analytics domain (e.g., ops, finance, marketing), top tools (SQL, Power BI/Tableau, Python/R), and 2 measurable outcomes.
- Be ready to describe your ETL exposure using concrete tooling (e.g., ADF/Informatica/SSIS/Airflow) even if you only consumed pipelines rather than built them end-to-end.
- Clarify constraints early: work authorization, preferred city, hybrid/onsite willingness, and earliest start date—these are common screen-out factors in services firms.
- Prepare a tight project summary using STAR, emphasizing stakeholder management and ambiguity handling (typical of consulting engagements).
Hiring Manager Screen
A deeper conversation with the hiring manager focused on your past projects, problem-solving approach, and team fit. You'll walk through your most impactful work and explain how you think about data problems.
Technical Assessment
2 rounds
SQL & Data Modeling
A hands-on round where you write SQL queries and discuss data modeling approaches. Expect window functions, CTEs, joins, and questions about how you'd structure tables for analytics.
Tips for this round
- Practice advanced SQL queries, including joins, window functions, aggregations, and subqueries.
- Focus on clarifying assumptions and edge cases before writing your SQL code.
- Think out loud as you solve the problem, explaining your logic and approach to the interviewer.
- Be prepared to discuss how you would validate your query results and optimize for performance.
Product Sense & Metrics
You'll be given a business problem or a product scenario and asked to define key metrics, analyze potential issues, or propose data-driven solutions. This round assesses your ability to translate business needs into analytical questions and derive actionable insights.
Onsite
2 rounds
Case Study
Often run as part of a final-round "Super Day," this round combines behavioral questions with a practical case study or group task. You might be presented with a business problem related to finance and asked to analyze it, propose solutions, or collaborate on a presentation.
Tips for this round
- Lead with a MECE structure (profit tree, 3Cs, or value chain) and signpost your roadmap before diving into math.
- Do accurate, clean calculations: write units, keep a visible equation, and sanity-check magnitude to catch errors early.
- When given charts/tables, summarize the 'so what' first (trend, driver, anomaly) then quantify and connect to the hypothesis.
- Synthesize frequently: after each section, state what you learned and how it changes your recommendation or what you’d test next.
Behavioral
Assesses collaboration, leadership, conflict resolution, and how you handle ambiguity. Interviewers look for structured answers (STAR format) with concrete examples and measurable outcomes.
Budget four to six weeks from application to offer. Gaps of 1-2 weeks between rounds are common because your interviewers are staffed on live client engagements at hospitals, banks, or insurers, and their availability bends to client timelines, not recruiting timelines. The rejection reason candidates report most often is unstructured problem solving: jumping into the McKinsey Solve assessment or the SQL round without framing the problem first, which in a firm that puts analysts in front of healthcare C-suites is a dealbreaker.
From what candidates describe, no single round can carry you. A strong SQL & Data Modeling session won't offset a flat behavioral showing, because McKinsey's PEI-style final round is specifically designed to test whether you can own a client-facing analytics narrative, not just write correct queries. If you only prep the technical dimensions, you're leaving the rounds that assess consulting judgment completely uncovered.
McKinsey & Company Data Analyst Interview Questions
SQL & Data Manipulation
Expect questions that force you to translate messy product prompts into correct SQL under time pressure. You’ll be evaluated on joins, window functions, cohorting, and debugging logic to produce decision-ready tables.
For each listing, compute the trailing 28-day booking revenue, excluding the current day, and return the top 50 listings by that metric for yesterday. Bookings can be refunded, so use net revenue per booking.
Sample Answer
Compute daily net revenue per listing, then sum it over the prior 28 days using a date-based window that excludes the current day. You avoid double counting by aggregating to listing-day before windowing, then filtering to yesterday at the end. Use $[d-28, d-1]$ as the window, not 28 rows, because missing days exist. Net revenue should incorporate refunds at the booking level before the listing-day rollup.
WITH booking_net AS (
  SELECT
    b.booking_id,
    b.listing_id,
    DATE(b.booking_ts) AS booking_day,
    COALESCE(b.gross_amount_usd, 0) - COALESCE(b.refund_amount_usd, 0) AS net_amount_usd
  FROM bookings b
  WHERE b.status IN ('confirmed', 'completed', 'refunded')
),
listing_day AS (
  SELECT
    listing_id,
    booking_day,
    SUM(net_amount_usd) AS net_revenue_usd
  FROM booking_net
  GROUP BY 1, 2
),
scored AS (
  SELECT
    listing_id,
    booking_day,
    SUM(net_revenue_usd) OVER (
      PARTITION BY listing_id
      ORDER BY booking_day
      RANGE BETWEEN INTERVAL '28' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING
    ) AS trailing_28d_net_revenue_excl_today_usd
  FROM listing_day
)
SELECT
  listing_id,
  trailing_28d_net_revenue_excl_today_usd
FROM scored
WHERE booking_day = CURRENT_DATE - INTERVAL '1' DAY
ORDER BY trailing_28d_net_revenue_excl_today_usd DESC NULLS LAST
LIMIT 50;

You need host-level cancellation rate for the last 90 days, where the numerator is guest-initiated cancellations and the denominator is all bookings that reached confirmed status. Hosts can have multiple listings, and booking status changes are tracked in an events table with one row per status transition.
Product Sense & Metrics
The bar here isn’t whether you know a metric name—it’s whether you can structure an analysis plan that maps to decisions. You’ll need to define success, identify leading vs lagging indicators, and anticipate confounders and data limitations.
How would you define and choose a North Star metric for a product?
Sample Answer
A North Star metric is the single metric that best captures the core value your product delivers to users. For Spotify it might be minutes listened per user per week; for an e-commerce site it might be purchase frequency. To choose one: (1) identify what "success" means for users, not just the business, (2) make sure it's measurable and movable by the team, (3) confirm it correlates with long-term business outcomes like retention and revenue. Common mistakes: picking revenue directly (it's a lagging indicator), picking something too narrow (e.g., page views instead of engagement), or choosing a metric the team can't influence.
Outbound delivery speed for a logistics client improved from 2.3 to 2.1 days, but CS contacts per 1,000 orders increased by 12% over the same period. Given order, shipment-scan, and contact-reason data, propose a metric framework to diagnose whether the speed win is causing the contact increase.
A company reduces the guest service fee by 1 percentage point in 5 countries, and Finance wants a metric tree that separates demand lift from margin impact and host behavior changes. Propose the primary success metric, the decomposition you would show (with formulas), and two guardrails that prevent gaming or long-run supply damage.
A/B Testing & Experiment Design
What is an A/B test and when would you use one?
Sample Answer
An A/B test is a randomized controlled experiment where you split users into two groups: a control group that sees the current experience and a treatment group that sees a change. You use it when you want to measure the causal impact of a specific change on a metric (e.g., does a new checkout button increase conversion?). The key requirements are: a clear hypothesis, a measurable success metric, enough traffic for statistical power, and the ability to randomly assign users. A/B tests are the gold standard for product decisions because they isolate the effect of your change from other factors.
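The "enough traffic for statistical power" requirement is the one candidates most often hand-wave. Here is a back-of-envelope sample-size check for two proportions, a sketch using the normal approximation; the hardcoded z-values assume a two-sided 5% alpha and 80% power, and the function name is ours:

```python
import math

def sample_size_per_arm(p_base, mde_abs):
    """Approximate users needed per arm to detect an absolute lift of
    `mde_abs` over a baseline conversion rate `p_base` (two-sided z-test)."""
    z_alpha = 1.96    # two-sided alpha = 0.05
    z_beta = 0.8416   # power = 0.80
    p_bar = p_base + mde_abs / 2           # average rate across the two arms
    variance = 2 * p_bar * (1 - p_bar)     # variance of the rate difference
    return math.ceil(variance * (z_alpha + z_beta) ** 2 / mde_abs ** 2)

# Detecting a 1-point absolute lift on a 10% baseline: ~14.8k users per arm.
print(sample_size_per_arm(0.10, 0.01))  # 14753
```

Note the inverse-square relationship: halving the detectable lift roughly quadruples the required sample per arm, which is usually the insight interviewers are probing for.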
You run an experiment on the guest cancellation flow and randomize by user_id, but a guest can book multiple trips and see both variants across devices. How do you detect and quantify interference, and what changes to the design or analysis would you make?
A company runs 8 simultaneous experiments on the host pricing page, and your experiment shows $p = 0.03$ on booking conversion and $p = 0.20$ on contribution margin. How do you decide whether this is a real win, and what correction or validation would you apply?
Statistics
Most candidates underestimate how much applied stats shows up in areas like fraud analytics, from thresholding to false-positive tradeoffs. You’ll need to reason clearly about distributions, sampling bias, and how to validate signals with limited labels.
What is a confidence interval and how do you interpret one?
Sample Answer
A 95% confidence interval is a range of values that, if you repeated the experiment many times, would contain the true population parameter 95% of the time. For example, if a survey gives a mean satisfaction score of 7.2 with a 95% CI of [6.8, 7.6], it means you're reasonably confident the true mean lies between 6.8 and 7.6. A common mistake is saying "there's a 95% probability the true value is in this interval" — the true value is fixed, it's the interval that varies across samples. Wider intervals indicate more uncertainty (small sample, high variance); narrower intervals indicate more precision.
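The interpretation above is easy to demonstrate in a few lines. This is a sketch using the normal approximation with illustrative survey data; for small samples you would swap the 1.96 for a t-quantile:

```python
import math
import statistics

def mean_ci_95(values):
    """95% confidence interval for a mean via the normal approximation."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean - 1.96 * se, mean + 1.96 * se

# 200 illustrative survey scores with mean 7.2
scores = [7, 8, 6, 7, 9, 7, 8, 6, 7, 7] * 20
low, high = mean_ci_95(scores)
print(f"mean 7.20, 95% CI [{low:.2f}, {high:.2f}]")
```

With this sample the interval comes out to roughly [7.08, 7.32]; quadrupling the sample size would halve its width, which is the sample-size/precision tradeoff the answer describes.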
A logistics company changed a routing rule and late deliveries dropped from $2.4\%$ to $2.1\%$ over 14 days, but shipment volume also increased and the mix shifted toward longer-distance lanes. How do you estimate whether the routing change reduced late deliveries, and which statistical model or adjustment would you use?
An AWS Console UI experiment shows a $+1.2\%$ lift in weekly active users, but the metric has heavy-tailed session counts and the variance doubled during the test. How do you decide whether to ship, and what statistical technique would you use to make the result decision-ready?
Data Modeling
When you design tables for analytics, you’re being tested on grain, keys, and how modeling choices impact BI performance and correctness. Expect star schema reasoning, fact/dimension tradeoffs, and how you’d model common product/usage datasets.
An ETL job builds fct_support_interactions from Zendesk tickets, chat transcripts, and on-chain deposit events, and you notice a sudden 12% drop in interactions after a schema change in chat. What data quality checks and pipeline safeguards do you add so this does not silently ship to dashboards again?
Sample Answer
Get this wrong in production and your CX dashboards underreport demand, so staffing and SLA decisions get made on fake stability. The right call is to add volume and freshness checks (row-count deltas by source, max event-timestamp lag), completeness checks on required keys (ticket_id, interaction_id, user_id), and distribution checks on critical dimensions (channel, product surface). Gate the publish step with alerting and fail-closed thresholds, plus backfill logic and schema versioning so a renamed field cannot null out a join unnoticed.
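A minimal sketch of the fail-closed volume gate described above, with hypothetical names and an illustrative 10% threshold; real pipelines would run one of these per source and per critical dimension before the publish step:

```python
def volume_check(today_count, trailing_counts, max_drop_pct=10.0):
    """Fail-closed volume check: compare today's row count for a source
    against its trailing average and block the publish step on a big drop.
    Threshold is illustrative; tune per source."""
    baseline = sum(trailing_counts) / len(trailing_counts)
    drop_pct = 100.0 * (baseline - today_count) / baseline
    return drop_pct <= max_drop_pct  # False => gate the publish, page the owner

# Chat interactions suddenly 12% below the 7-day average: the check fails.
history = [10_200, 9_900, 10_050, 10_100, 9_950, 10_000, 9_800]
print(volume_check(today_count=8_800, trailing_counts=history))  # False
```

The same shape works for freshness (max event-timestamp lag versus a cutoff) and completeness (null rate on required keys versus a ceiling); the important design choice is that a failed check stops the publish rather than merely logging.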
A company wants a single "gross bookings" metric used by Finance and Product, but your model has cancellations, modifications, partial refunds, and multiple payment captures per reservation. How do you model facts and keys so that gross bookings, net bookings, and revenue can be computed without double counting across these flows?
Visualization
When dashboards become the source of truth, small choices in charting and narrative can change decisions. You’ll be tested on picking the right visual, communicating insights to non-technical stakeholders, and proposing actionable next steps.
A Tableau dashboard for a retail client shows conversion rate by store, but the VP wants stores ranked and "actionable" by tomorrow. What is your default chart and sorting approach, and what adjustment do you make to avoid overreacting to small-sample stores?
Sample Answer
The standard move is a ranked bar chart of conversion with a reference line for the fleet median, plus a small table for traffic and transactions. But here, sample size matters because $n$ varies wildly by store, so the ranking is mostly noise for low-traffic locations. You either filter to a minimum volume threshold or plot a funnel plot (conversion versus sessions) with confidence bands, then call out only statistically stable outliers for action.
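One concrete way to implement that small-sample adjustment is the Wilson score interval, which behaves sensibly even for low-traffic stores where the naive ±1.96·SE interval breaks down. A sketch (z = 1.96 gives 95% intervals):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a conversion rate."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - margin, center + margin

# A 3-for-10 store beats a 250-for-1000 store on point estimate
# (30% vs 25%), but its interval is far too wide to act on.
print(wilson_interval(3, 10))      # roughly (0.11, 0.60)
print(wilson_interval(250, 1000))  # roughly (0.22, 0.28)
```

Ranking stores by interval lower bound, instead of by raw rate, is a simple way to surface only the statistically stable outliers the VP can actually act on.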
You ship an exec dashboard for iOS crash rate by build, but a new build rollout causes an apparent crash-rate jump. How do you redesign the dashboard so leadership can tell whether the build is worse versus the user mix changing due to staged rollout?
Data Pipelines & Engineering
In practice, you’ll be asked how you keep reporting accurate when pipelines break or definitions drift. Strong answers cover validation checks, anomaly detection, backfills, idempotency, and communicating data incidents to stakeholders.
What is the difference between a batch pipeline and a streaming pipeline, and when would you choose each?
Sample Answer
Batch pipelines process data in scheduled chunks (e.g., hourly, daily ETL jobs). Streaming pipelines process data continuously as it arrives (e.g., Kafka + Flink). Choose batch when: latency tolerance is hours or days (daily reports, model retraining), data volumes are large but infrequent, and simplicity matters. Choose streaming when you need real-time or near-real-time results (fraud detection, live dashboards, recommendation updates). Most companies use both: streaming for time-sensitive operations and batch for heavy analytical workloads, model training, and historical backfills.
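A toy sketch of the contrast, with a hypothetical event shape; a real batch job would read a closed warehouse partition, and a real streaming consumer would read from something like a Kafka topic:

```python
events = [{"order_id": i, "amount": 10.0 + i} for i in range(6)]

def run_batch(batch):
    """Batch: one scheduled job over a closed chunk of events; the
    result exists only after the whole chunk is processed."""
    return sum(e["amount"] for e in batch)

class StreamingSum:
    """Streaming: state is updated per event, so the aggregate is
    always current, at the cost of managing long-lived consumer state."""
    def __init__(self):
        self.total = 0.0

    def on_event(self, event):
        self.total += event["amount"]
        return self.total

print(run_batch(events))  # 75.0, available once per schedule tick

stream = StreamingSum()
for e in events:
    latest = stream.on_event(e)  # fresh after every event
print(latest)  # 75.0
```

Both arrive at the same number; the difference is when it becomes available and how much operational machinery (state, ordering, late data) you take on to get it earlier.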
You need a trustworthy daily metric for App Store subscriptions that powers Finance reporting and product dashboards, and events can arrive up to 72 hours late. How do you design the warehouse tables and the incremental rebuild logic so the metric is both stable and correct?
An Airflow DAG builds a daily fact table for payouts to hosts, partitioned by payout_date, and finance reports missing payouts for a two-week window after a backfill. How do you design the backfill and data quality safeguards so you avoid double counting, preserve idempotency, and keep downstream Superset dashboards stable?
Causal Inference
What is the difference between correlation and causation, and how do you establish causation?
Sample Answer
Correlation means two variables move together; causation means one actually causes the other. Ice cream sales and drowning rates are correlated (both rise in summer) but one doesn't cause the other — temperature is the confounder. To establish causation: (1) run a randomized experiment (A/B test) which eliminates confounders by design, (2) when experiments aren't possible, use quasi-experimental methods like difference-in-differences, regression discontinuity, or instrumental variables, each of which relies on specific assumptions to approximate random assignment. The key question is always: what else could explain this relationship besides a direct causal effect?
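The difference-in-differences estimator named above fits in a few lines. This sketch uses illustrative late-delivery rates; a real analysis would also plot pre-period trends to check the parallel-trends assumption before trusting the number:

```python
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences point estimate: the treated group's
    pre/post change minus the control group's pre/post change.
    Only credible under the parallel-trends assumption."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Illustrative late-delivery rates (%): both lanes drift down over time,
# but the treated lane drops more after the change.
effect = diff_in_diff(treat_pre=2.4, treat_post=2.1, ctrl_pre=2.5, ctrl_post=2.4)
print(f"estimated effect: {effect:+.2f} pts")  # -0.20 pts
```

The control group's drift (-0.1 points) is subtracted out, so the raw -0.3-point improvement shrinks to a -0.2-point causal estimate; that subtraction is exactly what a confounder like seasonality would otherwise contaminate.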
Hulu ad load was reduced for a subset of DMAs, but advertisers also shifted budgets toward those same DMAs mid-flight due to a sports schedule. You need the causal effect of the ad load reduction on ad revenue per hour. Would you use a geo-based diff-in-diff or an instrumental-variables approach, and why?
A company runs a retargeting campaign for lapsed subscribers, but exposure is highly selective because it targets users with high predicted return probability. How do you design a quasi-experiment to estimate incremental resubscription lift, and what diagnostics convince you the estimate is not driven by selection bias?
The weight toward pipeline and query work (Data Engineering/ETL plus SQL Analytics) signals that McKinsey's DA loop is structured to find people who can own the full path from raw claims ingestion in Databricks to polished metrics a partner can present at a client steering committee. Where it gets especially hard is the overlap between Healthcare Data Modeling and those two areas. You'll face questions about designing episode-of-care marts or PMPM cost schemas, then immediately need to write the SQL or Spark logic that makes them work against real payer data with late-arriving corrections. The prep mistake most candidates make is spending the bulk of their time on ML algorithms and statistical theory while underestimating how much of this loop tests whether you can model, build, and query healthcare-specific data end to end.
Practice McKinsey-style questions across all six areas at datainterview.com/questions.
How to Prepare for McKinsey & Company Data Analyst Interviews
McKinsey is betting heavily on embedding AI directly into client engagements, not just advising on it from the sidelines. Their Technology Trends Outlook 2025 maps where generative AI will reshape industries, while their research on AI-powered customer interactions shows the firm already building predictive models that ship inside client deliverables. For DAs, this means your day-to-day isn't exploratory analysis that sits in a notebook. You're constructing the data pipelines and models that partners present to healthcare CEOs and bank CFOs as the analytical backbone of a recommendation.
The single biggest "why McKinsey" mistake is talking about prestige or "working with the best." Partners hear that dozens of times a week. What lands is naming a specific piece of the firm's published research, like the State of Fashion report or the Global Banking Annual Review, and connecting it to analytical work you've actually done. "I noticed your demand-forecasting methodology in the fashion report and I've built similar pipelines from messy retail data" tells the interviewer you understand how DAs feed the firm's intellectual product, not just that you want the name on your LinkedIn.
Try a Real Interview Question
Experiment lift in booking conversion by market
Given users assigned to an experiment variant and their subsequent sessions with booking outcomes, compute booking conversion rate per market for each variant and the absolute lift delta = conv_treatment - conv_control. Output one row per market with conv_control, conv_treatment, and delta, using only sessions within 7 days after each user's assignment timestamp.
| user_id | experiment_name | variant | assigned_at | market |
|---|---|---|---|---|
| 101 | search_ranker_v2 | control | 2026-01-01 10:00:00 | US |
| 102 | search_ranker_v2 | treatment | 2026-01-02 09:00:00 | US |
| 103 | search_ranker_v2 | control | 2026-01-03 12:00:00 | FR |
| 104 | search_ranker_v2 | treatment | 2026-01-03 08:30:00 | FR |
| session_id | user_id | session_start | did_book |
|---|---|---|---|
| 9001 | 101 | 2026-01-02 11:00:00 | 1 |
| 9002 | 101 | 2026-01-10 09:00:00 | 0 |
| 9003 | 102 | 2026-01-05 14:00:00 | 0 |
| 9004 | 103 | 2026-01-04 13:00:00 | 0 |
| 9005 | 104 | 2026-01-06 07:00:00 | 1 |
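One way to sanity-check an answer to this question is to run the sample rows through SQLite. This is a sketch, not the official solution: table and column names follow the prompt, and the 7-day window is taken as [assigned_at, assigned_at + 7 days), an assumption the prompt leaves open.

```python
import sqlite3

# Per-user conversion flag (any booking session within 7 days of
# assignment), then AVG the flags per market/variant and take the delta.
QUERY = """
WITH user_conv AS (
    SELECT a.user_id, a.market, a.variant,
           MAX(COALESCE(s.did_book, 0)) AS converted
    FROM assignments a
    LEFT JOIN sessions s
      ON s.user_id = a.user_id
     AND julianday(s.session_start) >= julianday(a.assigned_at)
     AND julianday(s.session_start) <  julianday(a.assigned_at) + 7
    GROUP BY a.user_id, a.market, a.variant
)
SELECT market,
       AVG(CASE WHEN variant = 'control'   THEN converted END) AS conv_control,
       AVG(CASE WHEN variant = 'treatment' THEN converted END) AS conv_treatment,
       AVG(CASE WHEN variant = 'treatment' THEN converted END)
     - AVG(CASE WHEN variant = 'control'   THEN converted END) AS delta
FROM user_conv
GROUP BY market
ORDER BY market;
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (user_id INT, experiment_name TEXT, variant TEXT,
                          assigned_at TEXT, market TEXT);
CREATE TABLE sessions (session_id INT, user_id INT, session_start TEXT,
                       did_book INT);
INSERT INTO assignments VALUES
  (101, 'search_ranker_v2', 'control',   '2026-01-01 10:00:00', 'US'),
  (102, 'search_ranker_v2', 'treatment', '2026-01-02 09:00:00', 'US'),
  (103, 'search_ranker_v2', 'control',   '2026-01-03 12:00:00', 'FR'),
  (104, 'search_ranker_v2', 'treatment', '2026-01-03 08:30:00', 'FR');
INSERT INTO sessions VALUES
  (9001, 101, '2026-01-02 11:00:00', 1),
  (9002, 101, '2026-01-10 09:00:00', 0),  -- falls outside the 7-day window
  (9003, 102, '2026-01-05 14:00:00', 0),
  (9004, 103, '2026-01-04 13:00:00', 0),
  (9005, 104, '2026-01-06 07:00:00', 1);
""")

for row in conn.execute(QUERY):
    print(row)  # (market, conv_control, conv_treatment, delta)
```

On the sample rows this returns ('FR', 0.0, 1.0, 1.0) and ('US', 1.0, 0.0, -1.0); note that user 101's second session lands outside the 7-day window, which is exactly the edge case interviewers probe.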
700+ ML coding problems with a live Python executor.
Practice in the Engine
McKinsey's DA loop includes a case study round borrowed from their consulting track, which means even technical questions get framed around a client scenario (a hospital system's readmission rates, a payer's claims cost drivers) rather than abstract table joins. Practicing problems that force you to interpret business context before writing a query is the best way to prepare for that blend. Drill those at datainterview.com/coding.
Test Your Readiness
Data Analyst Readiness Assessment
1 / 10
Can you structure a stakeholder intake conversation to clarify the business problem, define success criteria, and document assumptions and constraints?
McKinsey's process spans case reasoning, SQL, statistics, and behavioral rounds, so a weak spot in any one area can sink you. Calibrate across all of them at datainterview.com/questions.
Frequently Asked Questions
What technical skills are tested in Data Analyst interviews?
Core skills tested are SQL (window functions, CTEs, joins), product metrics and dashboarding, basic statistics, and data visualization. SQL, Python, and R are the primary languages. Expect more weight on communication and metric interpretation than on ML or engineering.
How long does the Data Analyst interview process take?
Most candidates report 3 to 5 weeks from first recruiter call to offer. The process typically includes a recruiter screen, hiring manager screen, SQL round, product/case study, and behavioral interviews. Some companies combine SQL with the case study or use a take-home instead.
What is the total compensation for a Data Analyst?
Total compensation across the industry ranges from $85k to $534k depending on level, location, and company. This includes base salary, equity (RSUs or stock options), and annual bonus. Pre-IPO equity is harder to value, so weight cash components more heavily when comparing offers.
What education do I need to become a Data Analyst?
A Bachelor's degree in a quantitative field is the standard baseline. A Master's can help but is rarely required. Strong SQL skills and a portfolio of analytical projects often matter more than graduate credentials.
How should I prepare for Data Analyst behavioral interviews?
Use the STAR format (Situation, Task, Action, Result). Prepare 5 stories covering cross-functional collaboration, handling ambiguity, failed projects, technical disagreements, and driving impact without authority. Keep each answer under 90 seconds. Most interview loops include 1-2 dedicated behavioral rounds.
How many years of experience do I need for a Data Analyst role?
Entry-level positions typically require 0+ years (including internships and academic projects). Senior roles expect 7-15+ years of industry experience. What matters more than raw years is demonstrated impact: shipped models, experiments that changed decisions, or pipelines you built and maintained.



