Data Analyst Interview Prep

Dan Lee, Data & AI Lead
Last updated: March 16, 2026

Data Analyst at a Glance

  • Total Compensation: $134k – $290k/yr
  • Interview Rounds: 6
  • Difficulty
  • Levels: Entry – Principal
  • Education: Bachelor's
  • Experience: 0–15+ yrs
  • Focus areas: Product Analytics, Business Intelligence, Data Visualization, Fintech
  • Core languages: SQL, Python, R

Across 62 companies' interview data, the skill that separates Data Analyst candidates who get offers from those who don't isn't SQL or Python. It's metric decomposition: the ability to take a number like "WAU dropped 8%," break it into cohort-level drivers, and recommend a specific action in under two minutes. Candidates who spend their prep time on ML theory and algorithm complexity instead of this skill are solving the wrong problem entirely.
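For a concrete sense of what that decomposition looks like in SQL, here is a minimal sketch against a hypothetical events table (events with user_id, event_date, and signup_cohort_week; exact date functions vary by warehouse). It splits this week's WAU against last week's by signup cohort, so you can see which cohorts account for the drop before speculating about causes.

SQL
-- Hypothetical schema: events(user_id, event_date, signup_cohort_week)
-- Week-over-week WAU by signup cohort: which cohorts account for the drop?
WITH weekly_active AS (
  SELECT
    DATE_TRUNC('week', event_date) AS activity_week,
    signup_cohort_week,
    COUNT(DISTINCT user_id) AS wau
  FROM events
  WHERE event_date >= DATE_TRUNC('week', CURRENT_DATE) - INTERVAL '7' DAY
  GROUP BY 1, 2
)
SELECT
  signup_cohort_week,
  MAX(CASE WHEN activity_week = DATE_TRUNC('week', CURRENT_DATE) THEN wau END) AS wau_this_week,
  MAX(CASE WHEN activity_week = DATE_TRUNC('week', CURRENT_DATE) - INTERVAL '7' DAY THEN wau END) AS wau_last_week,
  MAX(CASE WHEN activity_week = DATE_TRUNC('week', CURRENT_DATE) THEN wau END)
    - MAX(CASE WHEN activity_week = DATE_TRUNC('week', CURRENT_DATE) - INTERVAL '7' DAY THEN wau END) AS wau_delta
FROM weekly_active
GROUP BY 1
ORDER BY wau_delta;

The same pattern extends to splits by platform, geography, or new versus returning users; the point is to turn "WAU dropped 8%" into a ranked list of where the drop is concentrated.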

What Data Analysts Actually Do

Primary Focus

Product Analytics, SQL, Business Intelligence, Python, Data Visualization, Fintech

Skill Profile

The role is rated across eight dimensions: Math & Stats, Software Eng, Data & SQL, Machine Learning, Applied AI, Infra & Cloud, Business, and Viz & Comms.

  • Math & Stats (Medium): Strong foundation in quantitative thinking, statistical analysis, and hypothesis testing to derive meaningful insights from data.
  • Software Eng (Medium): Intermediate programming skill in Python or R for data manipulation and analysis.
  • Data & SQL (Medium): Proficiency in ETL concepts, data warehousing, and building and managing data pipelines (e.g., with Apache Airflow) to automate reporting and analysis.
  • Machine Learning (Low): A basic understanding of modeling techniques such as regression, clustering, classification, and causal inference.
  • Applied AI (Low): No explicit mention of modern AI or GenAI in the job description.
  • Infra & Cloud (Low): No mention of infrastructure or cloud deployment responsibilities for this role.
  • Business (High): Strong ability to translate analysis into business insights, design dashboards for stakeholders, and address common business challenges with data.
  • Viz & Comms (High): Explicit need to present insights and work with stakeholders; data visualization tools and clear communication (Spanish/English) are highlighted for the role.

Languages

SQL, Python, R

Tools & Technologies

Tableau, Power BI, Looker, Excel, Snowflake, BigQuery


You'll find this role everywhere: FAANG product teams dissecting retention funnels, fintech startups tracking transaction fraud rates, retail companies optimizing inventory forecasts. The companies differ, but the core loop is the same: pull data from Snowflake or BigQuery, find the signal, convince someone to act on it. Success after year one means you own a metric area so thoroughly that stakeholders come to you before making decisions, not after.

A Typical Week

Weekly time split: Analysis 30%, Meetings 18%, Writing 18%, Coding 12%, Breaks 12%, Research 5%, Infrastructure 5%.

Writing and meetings together eat 36% of the week, more than analysis itself at 30%. Most candidates picture heads-down SQL all day, but you'll spend Thursday morning translating a Python significance test into a five-slide deck with plain-English takeaways for product leadership. The analyst who gets promoted isn't the one who writes the fastest window function; it's the one who can walk a game designer through a cohort retention chart in Looker and get a ship/no-ship decision before the meeting ends.

Skills & What's Expected

Machine learning won't come up in most DA interviews, but don't mistake that for the role being "easy mode." Interviewers want you to look at a Tableau dashboard showing a 12% WAU drop and immediately ask the right follow-up: is it seasonal, did a feature ship, is it concentrated in one geography or cohort? SQL proficiency (CTEs, window functions, self-joins) is non-negotiable at every level, and Python or R for data manipulation and reproducible analysis is expected from mid-level onward, not just at senior. Excel still shows up at finance-heavy shops and early-stage startups that haven't migrated off spreadsheets.
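To calibrate what "non-negotiable SQL" means in practice, here is the level of window-function query interviewers typically expect you to write without hesitation. The schema is hypothetical and the date-difference syntax varies by dialect; treat it as a sketch.

SQL
-- Hypothetical schema: sessions(user_id, session_id, session_start)
-- For each session, days since the user's previous session (NULL for their first)
WITH ordered_sessions AS (
  SELECT
    user_id,
    session_id,
    session_start,
    LAG(session_start) OVER (PARTITION BY user_id ORDER BY session_start) AS prev_session_start
  FROM sessions
)
SELECT
  user_id,
  session_id,
  session_start,
  DATEDIFF('day', prev_session_start, session_start) AS days_since_prev_session
FROM ordered_sessions;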

Levels & Career Growth

Data Analyst Levels

Each level has different expectations, compensation, and interview focus.

Base: $114k · Stock: $19k/yr · Bonus: $8k

0–2 yrs experience · Bachelor's or higher

What This Level Looks Like

You handle well-defined requests — pull data, build a chart, answer a specific question from a PM or ops lead. Someone senior decides what's worth analyzing; you execute the query and summarize the result.

Interview Focus at This Level

SQL dominates: window functions, CTEs, joins, and GROUP BY. Expect a basic product metrics question and a short behavioral round. Problems are well-defined.


Most hiring clusters at entry and mid levels, where the promotion gap comes down to several concrete shifts: owning a metric area end-to-end, running your own A/B test power calculations, and writing analysis that PMs reference in roadmap reviews instead of filing away. Senior analysts define what to measure and mentor others on methodology. Staff and Principal titles are rare outside a handful of companies like Meta, Google, and Airbnb.

The IC-versus-management fork usually appears around the senior level. Many analysts pivot into analytics engineering, product management, or data science by year four or five, especially at companies where the senior IC ceiling feels low relative to those adjacent roles.

Data Analyst Compensation

The wide min-to-max bands you see above reflect real structural differences in how companies pay. From what candidates report, a 4-year RSU schedule is standard at large public tech firms, but the details matter: Amazon's back-loaded 5/15/40/40 vesting means your Year 1 take-home is dramatically lower than the headline number, while Meta and Google spread grants more evenly. Pre-IPO startups hand you options with a 1-year cliff that could expire worthless, so discount that equity heavily when comparing offers.
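To make the back-loading concrete, take a hypothetical $100k RSU grant: a 5/15/40/40 schedule vests $0.05 \times \$100\text{k} = \$5\text{k}$ in year one, versus $0.25 \times \$100\text{k} = \$25\text{k}$ on an even four-year schedule, before any stock-price movement.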

Signing bonuses and equity grants are almost always more negotiable than base salary. If you have a competing offer, say so explicitly. Recruiters at most mid-to-large companies have authority to bump RSU grants 10 to 20% to close a candidate, but they won't volunteer that room unprompted.

Refresh grants can quietly reshape your comp trajectory at senior levels and above. Strong performers at large public tech companies can expect annual refreshers running 20 to 30% of the initial grant, which compounds into a meaningful gap between staying and job-hopping every two years.

Data Analyst Interview Process

6 rounds · ~4 weeks end to end

Initial Screen

2 rounds
Round 1 of 6

Recruiter Screen

30 min · Phone

An initial phone call with a recruiter to discuss your background, interest in the role, and confirm basic qualifications. Expect questions about your experience, compensation expectations, and timeline.

general, behavioral, product_sense, visualization, finance

Tips for this round

  • Have a 60-second pitch that clearly states your analytics domain (e.g., ops, finance, marketing), top tools (SQL, Power BI/Tableau, Python/R), and 2 measurable outcomes.
  • Be ready to describe your ETL exposure using concrete tooling (e.g., ADF/Informatica/SSIS/Airflow) even if you only consumed pipelines rather than built them end-to-end.
  • Clarify constraints early: work authorization, preferred city, hybrid/onsite willingness, and earliest start date—these are common screen-out factors in services firms.
  • Prepare a tight project summary using STAR, emphasizing stakeholder management and ambiguity handling (typical of client engagements at services firms).

Technical Assessment

2 rounds
Round 3 of 6

SQL & Data Modeling

60 min · Live

A hands-on round where you write SQL queries and discuss data modeling approaches. Expect window functions, CTEs, joins, and questions about how you'd structure tables for analytics.

database, data_modeling, data_warehouse, stats_coding, data_engineering

Tips for this round

  • Practice advanced SQL queries, including joins, window functions, aggregations, and subqueries.
  • Focus on clarifying assumptions and edge cases before writing your SQL code.
  • Think out loud as you solve the problem, explaining your logic and approach to the interviewer.
  • Be prepared to discuss how you would validate your query results and optimize for performance.

Onsite

2 rounds
Round 5 of 6

Case Study

60 min · Video call

Often run as part of a final-round "Super Day," this round combines behavioral questions with a practical case study or group task. You might be presented with a business problem related to finance and asked to analyze it, propose solutions, or collaborate on a presentation.

product_sense, visualization, statistics, guesstimate, behavioral

Tips for this round

  • Lead with a MECE structure (profit tree, 3Cs, or value chain) and signpost your roadmap before diving into math.
  • Do accurate, clean calculations: write units, keep a visible equation, and sanity-check magnitude to catch errors early.
  • When given charts/tables, summarize the 'so what' first (trend, driver, anomaly) then quantify and connect to the hypothesis.
  • Synthesize frequently: after each section, state what you learned and how it changes your recommendation or what you’d test next.

Four weeks is the typical timeline from recruiter call to offer, based on data aggregated from 54 companies. Big tech loops (Meta, Google, Amazon) often stretch to six or seven weeks because of hiring committee reviews and cross-team calibration. Startups and mid-market firms can compress to two weeks, though they rarely cut rounds so much as stack them into fewer days.

From what candidates report, the Hiring Manager Screen is where the most people quietly get cut. HMs want you to describe past impact with specifics ("I reduced churn by 3% after identifying a billing UX drop-off in a Tableau funnel dashboard"), and they screen out anyone who speaks only in generalities.

The behavioral round carries more weight than its final-round position suggests. By round six, the team has already scored your SQL and product sense; they're using STAR-formatted stories to judge whether you can defend a methodology to a skeptical VP or navigate conflicting priorities between, say, a PM who wants a launch metric and a finance lead who wants cost attribution. Candidates who prep five rehearsed stories (stakeholder pushback, ambiguous data, tight deadline, cross-team influence, a mistake you caught) consistently outperform those who wing it.

Data Analyst Interview Questions

SQL & Data Manipulation

Expect questions that force you to translate messy payments/product prompts into correct SQL under time pressure. You’ll be evaluated on joins, window functions, cohorting, and debugging logic to produce decision-ready tables.

For each listing, compute the trailing 28-day booking revenue, excluding the current day, and return the top 50 listings by that metric for yesterday. Bookings can be refunded, so use net revenue per booking.

Airbnb · Medium · Window Functions and Time Windows

Sample Answer

Compute daily net revenue per listing, then sum it over the prior 28 days using a date-based window that excludes the current day. You avoid double counting by aggregating to listing-day before windowing, then filtering to yesterday at the end. Use $[d-28, d-1]$ as the window, not 28 rows, because missing days exist. Net revenue should incorporate refunds at the booking level before the listing-day rollup.

SQL
WITH booking_net AS (
  SELECT
    b.booking_id,
    b.listing_id,
    DATE(b.booking_ts) AS booking_day,
    COALESCE(b.gross_amount_usd, 0) - COALESCE(b.refund_amount_usd, 0) AS net_amount_usd
  FROM bookings b
  WHERE b.status IN ('confirmed', 'completed', 'refunded')
),
listing_day AS (
  SELECT
    listing_id,
    booking_day,
    SUM(net_amount_usd) AS net_revenue_usd
  FROM booking_net
  GROUP BY 1, 2
),
scored AS (
  SELECT
    listing_id,
    booking_day,
    SUM(net_revenue_usd) OVER (
      PARTITION BY listing_id
      ORDER BY booking_day
      RANGE BETWEEN INTERVAL '28' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING
    ) AS trailing_28d_net_revenue_excl_today_usd
  FROM listing_day
)
SELECT
  listing_id,
  trailing_28d_net_revenue_excl_today_usd
FROM scored
WHERE booking_day = CURRENT_DATE - INTERVAL '1' DAY
ORDER BY trailing_28d_net_revenue_excl_today_usd DESC NULLS LAST
LIMIT 50;
Practice more SQL & Data Manipulation questions

Product Sense & Metrics

The bar here isn’t whether you know a metric name—it’s whether you can structure an analysis plan that maps to decisions. You’ll need to define success, identify leading vs lagging indicators, and anticipate confounders and data limitations.

How would you define and choose a North Star metric for a product?

Easy · Fundamentals

Sample Answer

A North Star metric is the single metric that best captures the core value your product delivers to users. For Spotify it might be minutes listened per user per week; for an e-commerce site it might be purchase frequency. To choose one: (1) identify what "success" means for users, not just the business, (2) make sure it's measurable and movable by the team, (3) confirm it correlates with long-term business outcomes like retention and revenue. Common mistakes: picking revenue directly (it's a lagging indicator), picking something too narrow (e.g., page views instead of engagement), or choosing a metric the team can't influence.

Practice more Product Sense & Metrics questions

A/B Testing & Experiment Design

What is an A/B test and when would you use one?

Easy · Fundamentals

Sample Answer

An A/B test is a randomized controlled experiment where you split users into two groups: a control group that sees the current experience and a treatment group that sees a change. You use it when you want to measure the causal impact of a specific change on a metric (e.g., does a new checkout button increase conversion?). The key requirements are: a clear hypothesis, a measurable success metric, enough traffic for statistical power, and the ability to randomly assign users. A/B tests are the gold standard for product decisions because they isolate the effect of your change from other factors.

Practice more A/B Testing & Experiment Design questions

Statistics

Most candidates underestimate how much applied stats shows up in fraud analytics, from thresholding to false-positive tradeoffs. You’ll need to reason clearly about distributions, sampling bias, and how to validate signals with limited labels.

What is a confidence interval and how do you interpret one?

Easy · Fundamentals

Sample Answer

A 95% confidence interval is a range of values that, if you repeated the experiment many times, would contain the true population parameter 95% of the time. For example, if a survey gives a mean satisfaction score of 7.2 with a 95% CI of [6.8, 7.6], it means you're reasonably confident the true mean lies between 6.8 and 7.6. A common mistake is saying "there's a 95% probability the true value is in this interval" — the true value is fixed, it's the interval that varies across samples. Wider intervals indicate more uncertainty (small sample, high variance); narrower intervals indicate more precision.
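As a worked version of that survey example (the standard deviation and sample size here are illustrative, not from the answer above): with $\bar{x} = 7.2$, $s = 2.0$, and $n = 96$, the interval is $7.2 \pm 1.96 \times \frac{2.0}{\sqrt{96}} \approx 7.2 \pm 0.40 = [6.8,\ 7.6]$.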

Practice more Statistics questions

Data Modeling

When you design tables for analytics, you’re being tested on grain, keys, and how modeling choices impact BI performance and correctness. Expect star schema reasoning, fact/dimension tradeoffs, and how you’d model common product/usage datasets.

An ETL job builds fct_support_interactions from Zendesk tickets, chat transcripts, and on-chain deposit events, and you notice a sudden 12% drop in interactions after a schema change in chat. What data quality checks and pipeline safeguards do you add so this does not silently ship to dashboards again?

Coinbase · Medium · ETL Monitoring, Data Quality

Sample Answer

Get this wrong in production and your CX dashboards underreport demand, so staffing and SLA decisions get made on a false sense of stability. The right call is to add volume and freshness checks (row-count deltas by source, max event timestamp lag), completeness checks on required keys (ticket_id, interaction_id, user_id), and distribution checks on critical dimensions (channel, product surface). Gate the publish step with alerting and fail-closed thresholds, plus backfill logic and schema versioning, so a renamed field cannot silently null out a join.
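A minimal sketch of the volume check, assuming a hypothetical loaded_at column and source dimension and an illustrative 10% tolerance (the real tables, columns, and thresholds would depend on the warehouse): it compares today's row count per source against a trailing average and returns rows only when something looks wrong, so the publish step can fail closed.

SQL
-- Hypothetical columns: fct_support_interactions(source, interaction_id, loaded_at, ...)
-- Flag any source whose daily row count drops more than 10% below its trailing average
WITH daily_counts AS (
  SELECT
    source,
    DATE(loaded_at) AS load_day,
    COUNT(*) AS row_count
  FROM fct_support_interactions
  WHERE loaded_at >= CURRENT_DATE - INTERVAL '7' DAY
  GROUP BY 1, 2
),
baseline AS (
  SELECT
    source,
    AVG(row_count) AS avg_prior_rows
  FROM daily_counts
  WHERE load_day < CURRENT_DATE
  GROUP BY 1
)
SELECT
  d.source,
  d.row_count AS today_rows,
  b.avg_prior_rows,
  d.row_count / NULLIF(b.avg_prior_rows, 0) AS ratio_vs_baseline
FROM daily_counts d
JOIN baseline b USING (source)
WHERE d.load_day = CURRENT_DATE
  AND d.row_count < 0.9 * b.avg_prior_rows;  -- any rows returned should block the publish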

Practice more Data Modeling questions

Visualization

When dashboards become the source of truth, small choices in charting and narrative can change decisions. You’ll be tested on picking the right visual, communicating insights to non-technical stakeholders, and proposing actionable next steps.

A Tableau dashboard for a retail company shows conversion rate by store, but the VP wants stores ranked and "actionable" by tomorrow. What is your default chart and sorting approach, and what adjustment do you make to avoid overreacting to small-sample stores?

Apple · Medium · Ranking, Variability, and Visualization Choice

Sample Answer

The standard move is a ranked bar chart of conversion with a reference line for the fleet median, plus a small table for traffic and transactions. But here, sample size matters because $n$ varies wildly by store, so the ranking is mostly noise for low-traffic locations. You either filter to a minimum volume threshold or switch to a funnel plot (conversion versus sessions) with confidence bands, then call out only statistically stable outliers for action.
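A sketch of the minimum-volume adjustment, with a hypothetical schema and an illustrative cutoff: rank only stores with enough sessions for the conversion estimate to be stable.

SQL
-- Hypothetical schema: store_sessions(store_id, session_id, converted), converted is a 0/1 flag
-- Ranked conversion by store, excluding low-traffic stores where the estimate is noise
SELECT
  store_id,
  COUNT(*) AS sessions,
  AVG(converted * 1.0) AS conversion_rate
FROM store_sessions
GROUP BY store_id
HAVING COUNT(*) >= 500  -- illustrative minimum-volume cutoff
ORDER BY conversion_rate DESC;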

Practice more Visualization questions

Data Pipelines & Engineering

In practice, you’ll be asked how you keep reporting accurate when pipelines break or definitions drift. Strong answers cover validation checks, anomaly detection, backfills, idempotency, and communicating data incidents to stakeholders.

What is the difference between a batch pipeline and a streaming pipeline, and when would you choose each?

Easy · Fundamentals

Sample Answer

Batch pipelines process data in scheduled chunks (e.g., hourly, daily ETL jobs). Streaming pipelines process data continuously as it arrives (e.g., Kafka + Flink). Choose batch when: latency tolerance is hours or days (daily reports, model retraining), data volumes are large but infrequent, and simplicity matters. Choose streaming when you need real-time or near-real-time results (fraud detection, live dashboards, recommendation updates). Most companies use both: streaming for time-sensitive operations and batch for heavy analytical workloads, model training, and historical backfills.

Practice more Data Pipelines & Engineering questions

Causal Inference

What is the difference between correlation and causation, and how do you establish causation?

Easy · Fundamentals

Sample Answer

Correlation means two variables move together; causation means one actually causes the other. Ice cream sales and drowning rates are correlated (both rise in summer) but one doesn't cause the other — temperature is the confounder. To establish causation: (1) run a randomized experiment (A/B test) which eliminates confounders by design, (2) when experiments aren't possible, use quasi-experimental methods like difference-in-differences, regression discontinuity, or instrumental variables, each of which relies on specific assumptions to approximate random assignment. The key question is always: what else could explain this relationship besides a direct causal effect?
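The difference-in-differences estimator mentioned above is, at its simplest, a double subtraction:

$\hat{\delta}_{\text{DiD}} = (\bar{Y}_{\text{treated,post}} - \bar{Y}_{\text{treated,pre}}) - (\bar{Y}_{\text{control,post}} - \bar{Y}_{\text{control,pre}})$

Subtracting the control group's pre/post change strips out the shared time trend, which is why the method leans entirely on the parallel-trends assumption: absent the treatment, both groups would have moved the same way.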

Practice more Causal Inference questions

The distribution is flat: eight areas each carry between 10% and 18% of questions, so no single topic dominates and you can't afford to skip any of them. Where candidates get burned is the overlap between A/B Testing and Statistics, because a question about randomization unit conflicts quickly demands you reason about power analysis, minimum detectable effect sizing, and multiple comparison corrections in the same breath. Most people over-index on SQL drills and treat Causal Inference as optional, yet difference-in-differences and instrumental variable questions show up at the same 10% rate as Data Pipelines, and they're far harder to improve on in a last-minute cram.

Browse 900+ real questions sorted by area and difficulty at datainterview.com/questions.

How to Prepare

Weeks one and two: SQL and product sense. Solve two window-function problems daily on realistic multi-table schemas, focusing on ROW_NUMBER, LAG, LEAD, self-joins, and CTEs with messy NULL handling. Alongside that, pick a product you actually use (Spotify, DoorDash, whatever), decompose its North Star metric into component drivers, and practice explaining a hypothetical 10% drop out loud in under two minutes.

That two-minute drill trains you to structure metric-decline answers the way interviewers actually score them: hypothesis breadth first, then segmentation logic, then data you'd pull to confirm. For product sense, the rubric rewards how quickly you prioritize plausible causes over how many you list. Weeks three and four, rotate into A/B testing, statistics, and behavioral prep.

For experimentation, drill the three traps that sink the most candidates: underpowered sample sizes, novelty effects that inflate early results, and one-tailed vs. two-tailed test selection (which depends on your pre-registered analysis plan and whether the loss function is symmetric, not just a gut feeling about direction). Expect interviewers at places like Airbnb or Uber to ask you to explain a confidence interval while walking through a real case study, or justify difference-in-differences over a naive pre/post comparison on non-randomized data.
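For the underpowered-sample trap specifically, it helps to have the standard two-proportion sample-size approximation ready:

$n \text{ per variant} \approx \dfrac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\bar{p}(1-\bar{p})}{\delta^2}$

With a 10% baseline conversion, a one-percentage-point minimum detectable effect, $\alpha = 0.05$, and 80% power, that works out to roughly $2 \times (1.96 + 0.84)^2 \times 0.09 / 0.0001 \approx 14{,}000$ users per variant, which is why small effects on low-traffic surfaces are so hard to test.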

On the behavioral side, write out five STAR stories before you need them, each covering a distinct scenario: pushing back on a stakeholder's flawed assumption, navigating incomplete data, delivering under a tight deadline, influencing a team you didn't manage, and catching your own mistake before it shipped. Rehearse each one out loud at least twice. Candidates who wing behavioral rounds sound vague, and vague gets cut.

Try a Real Interview Question

Experiment lift in booking conversion by market

sql

Given users assigned to an experiment variant and their subsequent sessions with booking outcomes, compute booking conversion rate per market for each variant and the absolute lift delta = conv_treatment - conv_control. Output one row per market with conv_control, conv_treatment, and delta, using only sessions within 7 days after each user's assignment timestamp.

experiment_assignments

user_id | experiment_name  | variant   | assigned_at         | market
101     | search_ranker_v2 | control   | 2026-01-01 10:00:00 | US
102     | search_ranker_v2 | treatment | 2026-01-02 09:00:00 | US
103     | search_ranker_v2 | control   | 2026-01-03 12:00:00 | FR
104     | search_ranker_v2 | treatment | 2026-01-03 08:30:00 | FR

sessions

session_id | user_id | session_start       | did_book
9001       | 101     | 2026-01-02 11:00:00 | 1
9002       | 101     | 2026-01-10 09:00:00 | 0
9003       | 102     | 2026-01-05 14:00:00 | 0
9004       | 103     | 2026-01-04 13:00:00 | 0
9005       | 104     | 2026-01-06 07:00:00 | 1


This type of multi-step SQL problem mirrors live-coding rounds, where you'll get an unfamiliar schema with a few joined tables and 20 minutes to produce a business-ready result set. The gap between knowing what a window function does and writing one correctly under time pressure is enormous, and only repetition on varied schemas closes it. Practice more problems like this at datainterview.com/coding.
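One way the solution could look (a sketch, not the only acceptable answer): join sessions to assignments within the 7-day window, aggregate to variant by market, then pivot control and treatment onto one row per market. Note it computes session-level conversion; calling out whether the interviewer wants user-level conversion instead (aggregate did_book per user first) is exactly the kind of assumption worth stating out loud.

SQL
-- Sketch: conversion per market and variant, sessions limited to 7 days after assignment
WITH windowed AS (
  SELECT
    a.market,
    a.variant,
    s.did_book
  FROM experiment_assignments a
  JOIN sessions s
    ON s.user_id = a.user_id
   AND s.session_start >= a.assigned_at
   AND s.session_start < a.assigned_at + INTERVAL '7' DAY
),
by_variant AS (
  SELECT
    market,
    variant,
    AVG(did_book * 1.0) AS conv_rate
  FROM windowed
  GROUP BY 1, 2
)
SELECT
  market,
  MAX(CASE WHEN variant = 'control'   THEN conv_rate END) AS conv_control,
  MAX(CASE WHEN variant = 'treatment' THEN conv_rate END) AS conv_treatment,
  MAX(CASE WHEN variant = 'treatment' THEN conv_rate END)
    - MAX(CASE WHEN variant = 'control' THEN conv_rate END) AS delta
FROM by_variant
GROUP BY 1
ORDER BY market;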

Test Your Readiness

Data Analyst Readiness Assessment

Sample item 1 of 10: Stakeholder Consulting

Can you structure a stakeholder intake conversation to clarify the business problem, define success criteria, and document assumptions and constraints?

If any topic area feels shaky, that's where to focus your remaining prep time. Drill those weak spots at datainterview.com/questions.

Frequently Asked Questions

What technical skills are tested in Data Analyst interviews?

Core skills tested are SQL (window functions, CTEs, joins), product metrics and dashboarding, basic statistics, and data visualization. SQL, Python, and R are the primary languages. Expect more weight on communication and metric interpretation than on ML or engineering.

How long does the Data Analyst interview process take?

Most candidates report 3 to 5 weeks from first recruiter call to offer. The process typically includes a recruiter screen, hiring manager screen, SQL round, product/case study, and behavioral interviews. Some companies combine SQL with the case study or use a take-home instead.

What is the total compensation for a Data Analyst?

Total compensation across the industry ranges from $85k to $534k depending on level, location, and company. This includes base salary, equity (RSUs or stock options), and annual bonus. Pre-IPO equity is harder to value, so weight cash components more heavily when comparing offers.

What education do I need to become a Data Analyst?

A Bachelor's degree in a quantitative field is the standard baseline. A Master's can help but is rarely required. Strong SQL skills and a portfolio of analytical projects often matter more than graduate credentials.

How should I prepare for Data Analyst behavioral interviews?

Use the STAR format (Situation, Task, Action, Result). Prepare 5 stories covering cross-functional collaboration, handling ambiguity, failed projects, technical disagreements, and driving impact without authority. Keep each answer under 90 seconds. Most interview loops include 1-2 dedicated behavioral rounds.

How many years of experience do I need for a Data Analyst role?

Entry-level positions typically require 0+ years (including internships and academic projects). Senior roles expect 7-15+ years of industry experience. What matters more than raw years is demonstrated impact: shipped models, experiments that changed decisions, or pipelines you built and maintained.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn