Amazon Data Scientist Guide (2026): Job, Salary & Interviews

Amazon Data Scientist at a Glance

Total Compensation

$182k - $763k/yr

Interview Rounds

6 rounds

Difficulty

Levels

L4 - L8

Education

Bachelor's / Master's / PhD

Experience

0–25+ yrs

Python R SQLE-commerceCloud ComputingHuman ResourcesMarketingSupply ChainCustomer ExperienceArtificial Intelligence

Prepping for an Amazon Data Scientist interview?

Grinding LeetCode, memorizing stats, and writing SQL is natural. Those skills matter. But they rarely cause rejections.

The Missing Piece: Leadership Principles I have coached over 40 candidates. The real killer is the 16 Leadership Principles. Candidates usually fail because they lack qualities like Customer Obsession or Ownership.

The Structure: Every interviewer is assigned 2 or 3 specific principles to evaluate you on.
The Challenge: You can answer every technical question correctly and still not get the offer if your behavioral answers do not align with these principles.

A Real Example: I worked with a Senior Data Scientist who was a brilliant coder with a strong background. He did well in the technical rounds but was ultimately turned down. The feedback was that he "seemed to wait for permission rather than taking initiative."

This guide is designed to help you bridge that gap. We will cover how to weave the Leadership Principles into your stories, the technical bar you need to meet, and the common pitfalls to avoid.

Amazon Data Scientist Role

Amazon has over 2,000 data scientists across retail, AWS, Alexa, advertising, and operations. This is an applied science role. You're not publishing papers. You're shipping models that move revenue.

A Typical Week

A Week in the Life of a Amazon Data Scientist

Typical L5 workweek · Amazon

Weekly time split

Analysis — 25%Writing — 20%Coding — 15%Meetings — 15%Research — 10%Break — 10%Infrastructure — 5%

Culture notes

Amazon runs at a high pace with a strong writing culture — expect to spend significant time on docs, and be prepared for direct, sometimes blunt feedback in review meetings rooted in Leadership Principles.
Most corporate roles follow a three-days-in-office policy (typically Tuesday through Thursday at HQ in Seattle), with Monday and Friday commonly worked from home.

What the job description won't tell you: a real chunk of your week goes to data quality fires. Upstream pipelines break, schemas change without notice, and you'll debug more ETL than you ever expected.

Forecasting and optimization. Demand forecasting, delivery routes, inventory placement. A 0.5% accuracy gain moves millions here.

Customer-facing ML. Rec engines, search ranking, Alexa NLU. You're A/B testing at a scale most companies never touch.

Internal tools and measurement. Causal inference, experimentation platforms, metric definitions. Less sexy, but this is where DS has the most org-wide leverage.

Skills & What's Expected

Primary Focus

E-commerceCloud ComputingHuman ResourcesMarketingSupply ChainCustomer ExperienceArtificial Intelligence

Skill Profile

Math & Stats

High

Strong foundation in statistical analysis, hypothesis testing, and quantitative methods for deriving business insights and developing predictive models. Required for interpreting data and solving analytical problems.

Software Eng

Medium

Ability to write clean, efficient code for data manipulation, analysis, and model development, primarily using scripting languages like Python or R. Not focused on large-scale software system design or production engineering.

Data & SQL

High

Expertise in SQL for complex querying, data analysis, reporting, and dashboarding. Comfort with data warehouses and big data technologies (e.g., Redshift, Snowflake, BigQuery) for creating and maintaining data pipelines.

Machine Learning

Medium

Ability to develop and apply predictive models (e.g., for attrition, performance, or demand forecasting). While the Twitch role emphasizes analytics, other Amazon DS roles clearly require this for 'creating models'.

Applied AI

Low

Not a primary focus for this specific Data Scientist role based on the provided job descriptions, which emphasize traditional analytics and predictive modeling. General awareness of AI trends might be beneficial but not explicitly required.

Infra & Cloud

Medium

Familiarity with cloud-based data warehousing and big data technologies (e.g., Redshift, Snowflake, BigQuery) for data access and analysis. The role focuses on using, rather than deploying or managing, cloud infrastructure.

Business

High

Strong ability to translate business problems into data questions, derive actionable insights, and collaborate effectively with business partners and product managers to drive strategic decisions. Essential for understanding viewer behavior and monetization.

Viz & Comms

High

Expertise in creating clear and impactful data visualizations and dashboards using tools like Tableau or Quicksight. Strong written and verbal communication skills are required to author narratives, summarize findings, and present complex analytical results to diverse audiences.

What You Need

3+ years of data science experience
Bachelor's degree in a quantitative field (e.g., Statistics, Business Analytics, Data Science, Mathematics, Economics, Engineering or Computer Science)
Expertise in using SQL for data analysis, reporting, and dashboarding
Expertise with visualization tools like Tableau or Quicksight
Comfort with data warehouses and big data technologies (e.g., Redshift, Snowflake, BigQuery)
Knowledge of Python or R or other scripting language
Experience solving analytical problems using statistical approaches
Comfort with developing predictive models
Ability to communicate complex quantitative analysis in a clear, precise, and actionable manner
Ability to create actionable insights from data
Collaboration with business partners and data team members

Nice to Have

Master’s or Doctorate degree in a quantitative field
Experience in gaming or digital media
Expertise with statistical languages like R, Stata, or SAS
Knowledge of Data Warehouse, Business Intelligence and reporting fundamentals

Languages

PythonRSQL

Tools & Technologies

TableauAmazon QuicksightRedshiftSnowflakeBigQueryRelational Databases

Want to ace the interview?

Practice with real questions.

Start Mock Interview

Statistics and ML get the headlines, but day-to-day you'll lean on SQL and communication just as much. Explaining a tradeoff to a PM who doesn't know what a p-value is matters as much as running the test itself.

Levels & Career Growth

Amazon Data Scientist Levels

Each level has different expectations, compensation, and interview focus.

Base

$137k

Stock/yr

$31k

Bonus

$14k

0–3 yrs Bachelor's degree in a quantitative field is required. Master's or PhD is common for this role.

What This Level Looks Like

Scope is limited to well-defined tasks and projects within a single team. The individual contributes to a component of a larger system or analysis, with direct guidance from senior scientists or a manager.

Day-to-Day Focus

→Developing technical proficiency in core data science skills (e.g., SQL, Python/R, statistics, ML).
→Learning the team's business domain, data sources, and technical infrastructure.
→Executing assigned tasks reliably and delivering results on time.

Interview Focus at This Level

Interviews emphasize fundamental knowledge. Candidates are tested on core statistics, probability, basic machine learning algorithms, SQL proficiency, and coding skills (Python/R). Behavioral questions are heavily weighted, focusing on Amazon's Leadership Principles, particularly 'Learn and Be Curious' and 'Deliver Results'.

Promotion Path

Promotion to L5 (Data Scientist II) requires demonstrating the ability to work with increasing independence on ambiguous problems. This includes scoping small projects, proactively identifying opportunities for analysis, and consistently delivering impactful results with minimal supervision. The scientist must show a deeper understanding of the business context and begin to influence their team's direction.

Find your level

Practice with questions tailored to your target level.

Start Practicing

Most people get stuck at the L5 to L6 jump. L5 means you execute well. L6 means you own a problem end-to-end and influence beyond your team. Fundamentally different skill set.

You can be the best modeler on the team and still not make Senior if you can't demonstrate Leadership Principles with concrete examples. Amazon rewards people who are both technically sharp and organizationally aware.

Work Culture

Amazon's culture is not for everyone, and that's by design. The pace is fast, feedback is direct, and two-pizza teams mean real ownership. When something breaks, it's yours.

Amazon Data Scientist Compensation

Amazon pays differently than most big tech companies. The numbers are in the chart above. Here is how the structure actually works and how to negotiate.

📅 The Back-Loaded Vesting Schedule

Most tech companies vest stock equally over four years. Amazon does not.

Year 1: 5%
Year 2: 15%
Year 3: 40%
Year 4: 40%

To make up for low stock in the first two years, Amazon offers large sign-on bonuses paid out monthly. This acts as a cash bridge until your stock grants hit in Year 3.

💰 Refresher Grants

Starting around your second year, Amazon issues additional RSU grants (refreshers) on top of your original package. This is how total comp stays high after your initial four-year grant runs out. The size of your refresher depends on performance and level. Strong performers often see their Year 3+ comp exceed the original offer.

🤝 How to Negotiate

Get the recruiter on the phone. Here is what moves and what does not.

Base Salary. Rigid. Amazon rarely moves on this.
Sign-on Bonus. Flexible. This is your best leverage point.
RSUs. Negotiable. Competing offers are the strongest lever here.
Level. The most important number in your offer. Pushing for L6 over L5 is worth more long-term than any base salary negotiation.

The Golden Handcuffs

They are real. If you stay past Year 2, the payout jumps significantly. Many people who plan to leave after two years end up staying because the math changes in Year 3.

Amazon Data Scientist Interview Process

6 rounds·~4 weeks end to end

Initial Screen

1 round

Recruiter Screen

30mPhone

This initial conversation assesses your background, experience, and interest in Amazon. The recruiter will also check for alignment with the role's basic qualifications and introduce Amazon's culture and the interview process.

behavioralgeneral

Tips for this round

Research Amazon's Leadership Principles (LPs): Be ready to discuss how your experience aligns with 2-3 LPs.
Articulate your career goals: Clearly explain why you're interested in Amazon and this specific Data Scientist role.
Prepare concise answers: Practice summarizing your resume and key projects in 1-2 minutes.
Ask insightful questions: Show genuine interest in the role, team, and company culture.
Highlight soft skills: Emphasize communication, teamwork, and problem-solving abilities.

Technical Assessment

1 round

Coding & Algorithms

60mLive

This round typically involves solving coding problems in Python and SQL on a shared editor. It assesses your foundational programming skills, data manipulation abilities, and understanding of data structures and algorithms.

algorithmsdata_structuresdatabaseml_coding

Tips for this round

Practice SQL queries: Focus on joins, aggregations, window functions, and subqueries, often involving real-world data scenarios.
Master Python fundamentals: Be proficient in data structures (lists, dicts, sets) and common algorithms for data processing.
Solve datainterview.com/coding-style problems: Practice medium-level problems, especially those involving data manipulation and string/array processing.
Think out loud: Explain your thought process, assumptions, and edge cases to the interviewer as you code.
Test your code: Walk through examples to demonstrate correctness and identify potential bugs, discussing time and space complexity.

Onsite

4 rounds

Machine Learning & Modeling

60mVideo Call

This interview delves into your theoretical and practical knowledge of machine learning. Expect questions on algorithm selection, model evaluation, feature engineering, and potentially conceptual system design for ML applications.

machine_learningdeep_learningml_system_design

Tips for this round

Review core ML algorithms: Understand their assumptions, strengths, weaknesses, and appropriate use cases (e.g., regression, classification, clustering).
Discuss past ML projects in detail: Be ready to explain your role, the problem, data used, model choices, challenges faced, and the impact of your work.
Understand model evaluation metrics: Know when to use different metrics (e.g., precision, recall, F1, AUC, RMSE) and their implications for business goals.
Prepare for ML system design: Think about data pipelines, model deployment strategies, monitoring, and scaling considerations for ML models in production.
Address Amazon LPs: Weave in examples demonstrating LPs like 'Invent and Simplify' or 'Dive Deep' when discussing project decisions.

Statistics & Probability

60mVideo Call

This round assesses your statistical foundations, experimental design skills, and ability to interpret data. You'll likely encounter questions on hypothesis testing, A/B testing, and probability concepts applied to business scenarios.

statisticsprobabilityab_testingcausal_inference

Tips for this round

Master A/B testing concepts: Understand experimental design, sample size calculation, statistical significance, and interpretation of results.
Review statistical tests: Know when to apply t-tests, chi-squared tests, ANOVA, and non-parametric tests, and their underlying assumptions.
Practice probability puzzles: Be able to solve common probability and conditional probability problems, explaining your reasoning clearly.
Explain statistical concepts clearly: Demonstrate your ability to communicate complex ideas simply to a non-technical audience.
Discuss potential biases and pitfalls: Show awareness of common issues in data analysis and experimentation, such as confounding variables or Simpson's Paradox.

Product Sense & Metrics

60mVideo Call

This interview evaluates your ability to apply data science to real-world business problems. You'll be given a product scenario or a case study to analyze, define metrics, and propose data-driven solutions, often including guesstimate questions.

product_senseguesstimateab_testing

Tips for this round

Structure your approach: Use frameworks (e.g., clarifying questions, problem breakdown, metric definition, solution proposal) to organize your thoughts.
Focus on customer impact: Align your solutions with Amazon's 'Customer Obsession' LP, demonstrating how data insights benefit the user.
Define clear metrics: Propose measurable and actionable metrics to track the success of your solutions, distinguishing between leading and lagging indicators.
Practice guesstimate questions: Be able to break down large, ambiguous problems into smaller, estimable parts, clearly stating your assumptions.
Consider trade-offs: Discuss the pros and cons of different approaches, potential risks, and how to mitigate them.

Bar Raiser

60mVideo Call

This crucial round focuses entirely on Amazon's 16 Leadership Principles, often conducted by a 'Bar Raiser' who is not from the hiring team. You will be asked to provide specific examples from your past experience that demonstrate how you embody these principles.

behavioral

Tips for this round

Prepare 2-3 STAR stories for each LP: Have diverse, detailed examples ready to draw from, covering both successes and failures.
Quantify your impact: Use numbers and metrics to highlight the results and scale of your actions in your STAR stories.
Focus on 'I' not 'we': While teamwork is important, emphasize your individual contributions and decisions within team projects.
Be authentic and reflective: Show self-awareness, discuss lessons learned, and how you've grown from experiences.
Demonstrate ownership and bias for action: These are highly valued LPs; ensure your stories showcase initiative and accountability.

Tips to Stand Out

Master Amazon's Leadership Principles (LPs). These are fundamental to every interview round, not just the behavioral one. Prepare specific STAR examples for each LP.
Practice the STAR method rigorously. For every behavioral question, structure your answer with Situation, Task, Action, and Result, quantifying results whenever possible.
Build a strong technical foundation. Be proficient in SQL, Python (data structures, algorithms), core Machine Learning concepts, and Statistics/Probability, as these are tested extensively.
Develop strong product sense. Understand how data science insights translate into business value and customer impact. Practice case studies and guesstimate questions.
Refine your communication skills. Clearly articulate your thought process, assumptions, and solutions, both technically and to a non-technical audience.
Ask clarifying questions. Don't jump to solutions; demonstrate critical thinking by asking questions to fully understand the problem scope and constraints.
Prepare insightful questions for your interviewers. This shows genuine interest in the role, team, and company culture, and helps you assess fit.

Common Reasons Candidates Don't Pass

✗Lack of Leadership Principle alignment. Candidates often fail to demonstrate how their past experiences align with Amazon's LPs, or provide generic answers without specific examples.
✗Weak technical skills. Inability to solve coding (Python/SQL), machine learning, or statistical problems effectively, or a lack of depth in technical knowledge.
✗Poor communication. Failing to articulate thoughts clearly, explain complex concepts simply, or structure problem-solving approaches logically.
✗Insufficient project depth. Candidates struggle to discuss their past data science projects in detail, including challenges, decisions, and the impact of their work.
✗Not asking clarifying questions. Jumping straight to a solution without fully understanding the problem's context, assumptions, or constraints, indicating a lack of critical thinking.
✗Inability to connect data to business impact. Failing to demonstrate product sense or how data science solutions drive measurable business outcomes and customer value.

Offer & Negotiation

Amazon's compensation typically comprises a base salary, Restricted Stock Units (RSUs), and a signing bonus. The RSU vesting schedule is heavily back-weighted (5% year 1, 15% year 2, 40% year 3, 40% year 4), which is offset by large signing bonuses in years 1 and 2. While base salary has some flexibility (up to $350K for senior roles), the most negotiable levers are often the signing bonus and the initial RSU grant. Focus on the total compensation over the first four years and leverage any competing offers to negotiate for a higher package.

Amazon Data Scientist Interview Questions

Alright, let's get into the actual questions. I'm going to organize these by type, but remember—most rounds blend technical and behavioral. Don't expect clean separation.

Statistics, Probability & Inference

Expect questions that force you to justify statistical choices under messy, real business constraints (non-normality, multiple comparisons, noisy labels). You’ll be evaluated on whether you can translate uncertainty into a clear decision and explain assumptions without hand-waving.

You are comparing average order value (AOV) between Prime and non-Prime customers in the US, but AOV is heavy-tailed and you have $n=500{,}000$ orders per group. Which test and interval would you report to decide if Prime truly has higher AOV, and what would you do about outliers?

EasyRobust Inference

Sample Answer

Most candidates default to a two-sample $t$-test on raw AOV, but that fails here because heavy tails and extreme orders dominate the mean and standard error, making the result fragile and not decision-safe. You should use a robust estimand like the difference in trimmed means or the log-transformed mean (reporting results back on the dollar scale), and form intervals via bootstrap or asymptotics on the transformed scale. If the business must decide on the mean AOV itself, winsorize at a pre-registered percentile or model the tail explicitly, then show sensitivity analyses so one whale order cannot flip the decision.

Marketing ran 30 simultaneous email subject line experiments and will ship the best performer by click-through rate (CTR); how do you adjust inference so the shipped subject line has a reliable lift, and what metric do you report to leadership?

HardMultiple Comparisons

Practice more Statistics, Probability & Inference questions

SQL / Database Querying

Most candidates underestimate how much signal Amazon expects you to extract with clean, performant SQL (window functions, CTEs, cohorting, metric definitions). The hard part is staying precise about joins, grain, and edge cases while moving fast.

In Redshift, you have orders(order_id, customer_id, order_ts, marketplace_id) and order_items(order_id, asin, quantity, item_price, is_refund). Write SQL to compute daily gross merchandise sales (GMS) per marketplace for the last 30 days, excluding refunded items, and include a 7-day trailing moving average of GMS.

MediumWindow Functions

Sample Answer

Compute daily GMS at the order-item grain, then roll up to day and marketplace, and finally apply a 7-day windowed average on the daily totals. Most people fail by double counting because they aggregate after joining without fixing grain. Filter refunded items at the item level (is_refund = false) before summing revenue. Use a date spine only if you need zero-sales days, otherwise you will silently drop them.

SQL

1/* Daily GMS and 7-day trailing moving average by marketplace, last 30 days */
2WITH filtered_items AS (
3  SELECT
4    oi.order_id,
5    (oi.quantity * oi.item_price) AS item_revenue
6  FROM order_items oi
7  WHERE COALESCE(oi.is_refund, FALSE) = FALSE
8),
9item_revenue_by_order AS (
10  /* Collapse to order grain so later joins cannot multiply revenue */
11  SELECT
12    fi.order_id,
13    SUM(fi.item_revenue) AS order_revenue
14  FROM filtered_items fi
15  GROUP BY fi.order_id
16),
17daily_gms AS (
18  SELECT
19    DATE_TRUNC('day', o.order_ts) AS order_day,
20    o.marketplace_id,
21    SUM(iro.order_revenue) AS gms
22  FROM orders o
23  JOIN item_revenue_by_order iro
24    ON o.order_id = iro.order_id
25  WHERE o.order_ts >= DATEADD(day, -30, CURRENT_DATE)
26  GROUP BY 1, 2
27)
28SELECT
29  dg.order_day,
30  dg.marketplace_id,
31  dg.gms,
32  AVG(dg.gms) OVER (
33    PARTITION BY dg.marketplace_id
34    ORDER BY dg.order_day
35    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
36  ) AS gms_7d_trailing_ma
37FROM daily_gms dg
38ORDER BY dg.order_day, dg.marketplace_id;

You own an HR attrition dashboard in Quicksight with employees(employee_id, hire_date, termination_date, org_id) and org_snapshots(employee_id, snapshot_date, org_id) taken daily. Write SQL to compute monthly attrition rate per org_id where attrition rate is terminations in month divided by active headcount at month start, and ensure each employee is counted once even if org changes mid-month.

HardCohorting and Metric Definitions

Practice more SQL / Database Querying questions

Product Sense & Metrics

Your ability to reason about customer and business outcomes is tested through metric design, tradeoffs, and diagnosing metric movement. You’ll need to pick the right north-star and guardrails for e-commerce-style funnels, marketing, HR, or supply chain scenarios and defend them crisply.

Amazon Retail launches a new default sort on search results meant to increase purchase conversion, but leadership worries about long term customer trust and delivery experience. Define one north star metric and 3 guardrails, include precise numerator and denominator for each, and explain what movement would make you roll back.

MediumMetric Design and Guardrails

Sample Answer

You could optimize for search-to-purchase conversion rate or for profit per search session. Conversion wins here because it is closer to customer value and less sensitive to pricing and mix shifts, then you protect business health with guardrails like cancellation rate, late delivery rate, and return rate. Roll back if the north star rises but any guardrail shows a statistically and practically meaningful regression that persists across key segments (Prime vs non-Prime, fast vs slow delivery promises).

In Amazon Ads, a new budget pacing algorithm ships and next-day revenue is flat, but CPC is up 12% and conversions are down 6%. Walk through, step by step, how you would diagnose whether this is measurement, auction dynamics, or true demand loss, and list the metrics and slices you would pull first.

HardMetric Movement Diagnosis

Practice more Product Sense & Metrics questions

A/B Testing & Experimentation

The bar here isn’t whether you know p-values, it’s whether you can design trustworthy experiments and avoid common failure modes (SRM, novelty, peeking, interference). You’ll be pushed to connect design decisions to power, bias, and practical rollout risk.

You run an A/B test on an Amazon retail search ranking tweak and see a 2% lift in clicks but a 1% drop in purchases; what is your decision framework for ship, iterate, or rollback, and which guardrail metrics do you require before calling it? Keep it to one primary metric, two guardrails, and one segmentation cut you would inspect for heterogeneous effects.

EasyExperiment Design and Metrics Tradeoffs

Sample Answer

Reason through it: Start by defining the single objective metric that maps to business value, typically $\text{conversion rate}$ or $\text{revenue per search session}$, not clicks. Then treat clicks as an input metric, and purchases, cancellations, returns, and latency as guardrails, because they capture downstream harm and customer experience. Next, check if the observed tradeoff is consistent with a funnel shift, for example more browsing with worse intent, and ensure the decision is based on the primary metric with a pre-registered rule (minimum detectable effect, sign, and duration). Finally, look at one segmentation cut where interference or intent differs (new vs returning customers, or high intent queries vs broad queries) to see if the effect is concentrated, which often flags a ranking bug or relevance regression.

A Prime signup experiment is randomized at the customer level, but households share devices and benefits; how do you detect and mitigate interference, and how does it change your variance, power, and unit of analysis? Give a concrete plan for what you would ship to production if the business insists on running it anyway.

HardInterference, Clustering, and Unit of Randomization

Practice more A/B Testing & Experimentation questions

Machine Learning & Applied Modeling

You’ll often be asked to choose and critique models for prediction problems like demand forecasting, churn/attrition, ranking, or propensity. Focus on feature leakage, evaluation metrics aligned to the business, calibration/thresholding, and interpreting errors rather than deep infrastructure details.

You are building a model to predict whether a Prime customer will churn in the next 30 days using clickstream and purchase data from the last 90 days. What are three concrete leakage risks in this setup, and how do you redesign features, labels, and splits to prevent them?

EasyML Theory

Sample Answer

This question is checking whether you can spot leakage that makes offline metrics meaningless, then fix it with time-correct data design. Call out label contamination (post-churn events), lookahead in aggregates (using data after prediction time), and splitting users randomly across time so future behavior trains past predictions. The fix is an explicit prediction timestamp, features computed strictly before it, labels in a forward window, and time-based validation that mimics deployment.

You built a purchase-propensity model for Amazon Marketing and the AUC is strong, but the campaign team needs a top-1% list to maximize incremental orders within a fixed budget. Which evaluation metrics do you report, how do you choose an operating threshold, and how do you check calibration before launch?

MediumModel Evaluation

Sample Answer

The standard move is to report AUC and maybe log loss, then pick a threshold like $0.5$. But here, ranking quality at the extreme tail matters because the business action is top-k targeting, so you report precision@k, recall@k, lift, and expected value under the budget constraint. Then you select the threshold by maximizing expected incremental orders or profit using a cost-benefit table, and you verify calibration with reliability curves and Brier score (or isotonic/Platt scaling) so scores map to real purchase probabilities.

You are forecasting daily demand for an Amazon supply chain node with strong weekly seasonality, intermittent SKUs, and frequent promotions that appear in the data after the day starts. What modeling approach do you choose, what validation scheme prevents optimistic bias, and how do you handle promo features so the model is usable for planning?

HardForecasting

Practice more Machine Learning & Applied Modeling questions

Coding & Algorithms (DS-leaning)

Coding rounds tend to probe whether you can implement correct, readable solutions under time pressure using core data structures and complexity intuition. Expect practical patterns (hash maps, sorting, two pointers) plus careful handling of edge cases and input constraints.

You have a Redshift export of Amazon retail search impressions as a list of tuples (customer_id, query, ts_epoch_seconds). Return the length of the longest streak of consecutive days where the same customer issued the same query at least once per day, treating multiple events in a day as one and using UTC days.

MediumHashing and Date Normalization

Sample Answer

The standard move is to normalize each event into a day key and then use a hash set to test consecutive days in $O(n)$. But here, duplicates within a day matter because they will inflate counts unless you dedupe per (customer, query, day) before streak logic.

Python

1from __future__ import annotations
2
3from collections import defaultdict
4from datetime import datetime, timezone
5from typing import Iterable, List, Tuple, Set
6
7
8def longest_same_query_daily_streak(
9    events: Iterable[Tuple[str, str, int]]
10) -> int:
11    """Return the longest consecutive-day streak where a customer issued the same query.
12
13    Each input is (customer_id, query, ts_epoch_seconds). Multiple events in the same
14    UTC day count once. Streak is measured in consecutive UTC dates.
15
16    Time: O(n) expected, Space: O(n)
17    """
18
19    # Step 1: Normalize timestamps into UTC dates and dedupe within a day.
20    days_by_pair: defaultdict[Tuple[str, str], Set[int]] = defaultdict(set)
21
22    for customer_id, query, ts in events:
23        # Convert epoch seconds to a UTC date, then map to an integer day index.
24        # Using day index makes consecutive checks cheap.
25        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
26        day_index = dt.toordinal()  # days since year 1-01-01
27        days_by_pair[(customer_id, query)].add(day_index)
28
29    # Step 2: For each (customer, query), compute longest consecutive run.
30    best = 0
31    for _, day_set in days_by_pair.items():
32        if not day_set:
33            continue
34
35        # Standard longest-consecutive-sequence pattern using a set.
36        for d in day_set:
37            # Only start counting from the beginning of a streak.
38            if (d - 1) in day_set:
39                continue
40
41            length = 1
42            cur = d
43            while (cur + 1) in day_set:
44                cur += 1
45                length += 1
46
47            if length > best:
48                best = length
49
50    return best
51
52
53if __name__ == "__main__":
54    sample = [
55        ("c1", "laptop stand", 1704067200),  # 2024-01-01
56        ("c1", "laptop stand", 1704070800),  # same day duplicate
57        ("c1", "laptop stand", 1704153600),  # 2024-01-02
58        ("c1", "laptop stand", 1704326400),  # 2024-01-04 (break)
59        ("c2", "headphones", 1704067200),
60        ("c2", "headphones", 1704153600),
61        ("c2", "headphones", 1704240000),
62    ]
63    print(longest_same_query_daily_streak(sample))  # c2 has 3-day streak
64

You are building an on-call dashboard for Amazon Prime delivery promises and need a streaming metric: given an iterator of events (order_id, ts_epoch_seconds, event_type) where event_type is one of {"PLACED","SHIP","DELIVER"}, compute for each order its delivery latency in seconds as delivered_ts minus placed_ts, dropping orders with missing endpoints and keeping only the earliest PLACED and earliest DELIVER per order.

MediumStreaming Aggregation and Edge Cases

Practice more Coding & Algorithms (DS-leaning) questions

Amazon Leadership Principles (Behavioral)

Amazon is unique in a lot of ways, but their 16 Leadership Principles are the biggest hurdle for technical candidates. These are not just corporate buzzwords to memorize. They form the exact grading rubric your interviewers use to evaluate your culture fit and problem solving as a Data Scientist.

⭐️ 16 Amazon Leadership Principles

Learn these by heart.

Customer Obsession: Leaders start with the customer and work backwards. They work vigorously to earn and keep customer trust.
Ownership: Leaders are owners. They think long term and do not sacrifice long-term value for short-term results.
Invent and Simplify: Leaders expect and require innovation and invention from their teams and always find ways to simplify.
Are Right, A Lot: Leaders are right a lot. They have strong judgment and good instincts.
Learn and Be Curious: Leaders are never done learning and always seek to improve themselves.
Hire and Develop the Best: Leaders raise the performance bar with every hire and promotion.
Insist on the Highest Standards: Leaders have relentlessly high standards. Many people may think these standards are unreasonably high, but leaders are continually raising the bar.
Think Big: Thinking small is a self-fulfilling prophecy. Leaders create and communicate a bold direction that inspires results.
Bias for Action: Speed matters in business. Many decisions and actions are reversible and do not need extensive study.
Frugality: Accomplish more with less. Constraints breed resourcefulness, self-sufficiency, and invention.
Earn Trust: Leaders listen attentively, speak candidly, and treat others respectfully.
Dive Deep: Leaders operate at all levels, stay connected to the details, audit frequently, and are skeptical when metrics and anecdote differ.
Have Backbone; Disagree and Commit: Leaders are obligated to respectfully challenge decisions when they disagree. Once a decision is determined, they commit wholly.
Deliver Results: Leaders focus on the key inputs for their business and deliver them with the right quality and in a timely fashion.
Strive to be Earth's Best Employer: Leaders work every day to create a safer, more productive, higher performing, more diverse, and more just work environment.
Success and Scale Bring Broad Responsibility: We are big, we impact the world, and we are far from perfect. We must be humble and thoughtful about even the secondary effects of our actions.

💡 The No-BS Insight: You can write flawless Python and know the ins-and-outs of A/B tests, but if you fail the Leadership Principles, you will be rejected. So, don't wing it. Use this secret sauce for prep:

Build a Story Matrix: Do not try to memorize 16 different stories. Instead, outline 8 to 12 strong, versatile projects from your past. Map each project to two or three different Leadership Principles so you can pivot on the fly.
Format with STAR: Every answer must follow the Situation, Task, Action, Result framework. Keep the setup (Situation and Task) brief. Spend the bulk of your time detailing exactly what you did (Action) and the measurable business impact (Result).
Say "I", Not "We": This is a critical trap. Amazon wants to know what you built, not what your team accomplished. If you say "we trained a model," the interviewer will stop you and ask exactly which part you coded.
Quantify the Impact: Data Scientists live and die by metrics. Your results must include hard numbers. Did your model reduce latency, increase conversion by 4 percent, or save $100,000? If a project failed, quantify the cost and explain the lesson learned.
Prepare for Pushback: The Bar Raiser will cross-examine your stories. They will ask why you chose a specific algorithm, what trade-offs you made, and what you would do differently today. Surface-level answers will fall apart here.

👎 The Bad Answer

"Users complained about irrelevant product recommendations. I retrained the model with new data. After that, the recommendations were much better and engagement went up."

Why it fails: Lacks context, avoids personal ownership, and completely ignores quantifiable metrics, offline evaluation, or A/B testing results.

👍 The Good Answer (STAR Framework)

Situation: At my previous e-commerce startup, VIP customers experienced a 20 percent drop in engagement because the recommendation engine surfaced irrelevant, out-of-season items.
Task: I needed to diagnose the model degradation, engineer a solution, and validate the improvement without disrupting the baseline user experience.
Action: I conducted an exploratory data analysis and discovered seasonal trend weights were decaying too slowly. I engineered new time-decay features and trained an XGBoost model. I then designed and launched a two-week A/B test to evaluate the new model against the control group.
Result: The A/B test achieved a statistically significant 5 percent lift in click-through rate. We launched the model globally, recovered the 20 percent engagement drop, and drove $500,000 in incremental quarterly revenue

Want to learn more? Click this video👇

Amazon Data Scientist Interview Prep

Understand the Business

Amazon interviewers expect you to know how the company makes money and where data science fits. You don't need to memorize annual reports, but you should be able to connect your work to a business segment and explain why it matters.

Know the Business

Updated Q1 2026

Official mission

“Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. We strive to be Earth’s most customer-centric company, Earth’s best employer, and Earth’s safest place to work.”

What it actually means

Amazon's core mission is to be the most customer-centric company on Earth, achieved through relentless innovation, operational excellence, and a long-term strategic outlook. It also aims to be Earth's best employer and safest place to work, though the consistent prioritization of these employee-focused goals is debated.

Seattle, WashingtonUnknown

Key Business Metrics

Revenue

$717B

+14% YoY

Market Cap

$2.2T

-12% YoY

Employees

1.6M

+1% YoY

Business Segments and Where DS Fits

AWS

Cloud platform that powers AI inference with custom chips, smart routing systems, and purpose-built infrastructure, making AI faster and more affordable. Offers services like Amazon Bedrock.

DS focus: Making AI faster and more affordable (inference), foundation model evaluation (via Amazon Bedrock with models like Claude Sonnet 4.6)

Amazon Stores

Encompasses Prime benefits, small businesses, retail stores, and other features. Focuses on improving delivery speed and expanding services like Amazon Pharmacy.

DS focus: Personalized product recommendations, tracking price history, automated purchasing based on target prices (via Rufus AI assistant)

Amazon Ads

Advertising platform for brands to connect with audiences, focusing on authenticated identity, AI-powered optimization, and integrated campaigns across streaming TV, online video, and display advertising. Offers solutions like Amazon Marketing Cloud and AWS Clean Rooms.

DS focus: AI-powered optimization, unified audience view across touchpoints, connecting media exposure to shopping behavior, AI for creative brief generation and storyboarding (Creative Agent), continuous optimization for full-funnel campaigns

Current Strategic Priorities

Continue to be a leading corporate purchaser of carbon-free energy
Make AI faster and more affordable via AWS infrastructure
Deploy initial low Earth orbit satellite internet constellation (Project Kuiper)
Expand Amazon Pharmacy Same-Day Delivery to nearly 4,500 cities
Improve Prime delivery speed (set new record in 2025)
Advance advertising solutions with authenticated identity, AI-powered optimization, and integrated campaigns
Simplify advertising for brands by leveraging AI to remove friction and accelerate insight-to-action

Competitive Moat

audience scaleextensive selectionglobal presenceconvenient buying experiencerapid delivery servicesSpeedTrustsearch engine

Before your loop, pick 2-3 segments that align with the team you're interviewing for. Be ready to propose a metric you'd track and explain why.

Leadership Principles Are Not Optional

Every single round at Amazon includes behavioral questions. The Bar Raiser round is entirely behavioral. You need 8-10 STAR stories mapped to specific Leadership Principles before you walk in.

The principles that come up most for data scientists: Customer Obsession, Dive Deep, Bias for Action, Have Backbone; Disagree and Commit, and Deliver Results. Rehearse each story out loud until you can tell it clearly in under 3 minutes.

Technical Preparation

Amazon DS technical rounds test breadth over depth. You won't build a transformer from scratch, but you will explain when logistic regression beats XGBoost and why.

SQL: Expect at least one live SQL question. Window functions (RANK, ROW_NUMBER, LAG), self-joins, and CTEs are fair game. Practice writing queries without an IDE.

Statistics: A/B testing comes up in nearly every loop. Know how to calculate sample size, pick the right test, and explain what to do when assumptions are violated.

Machine Learning: Focus on intuition over math. Be ready to walk through a full modeling pipeline: problem framing, feature selection, model choice, evaluation metrics, and how you'd deploy it.

sql

Given Prime signup events and order events, find for each user their first order placed after the latest Prime signup. Return only users whose first post-signup order occurs within $30$ days of signup, with columns user_id, signup_date, first_order_id, first_order_date, and days_to_first_order as an integer.

prime_signups

user_id	signup_date
U1	2024-01-01
U1	2024-02-01
U2	2024-01-10
U3	2024-01-05

orders

order_id	user_id	order_date	order_amount
O100	U1	2024-01-15	25.00
O101	U1	2024-02-10	40.00
O200	U2	2024-02-20	15.00
O300	U3	2024-01-25	60.00

SQL

1WITH latest_signup AS (
2  SELECT
3    user_id,
4    MAX(signup_date) AS signup_date
5  FROM prime_signups
6  GROUP BY user_id
7), post_signup_orders AS (
8  SELECT
9    o.user_id,
10    ls.signup_date,
11    o.order_id,
12    o.order_date,
13    ROW_NUMBER() OVER (
14      PARTITION BY o.user_id
15      ORDER BY o.order_date, o.order_id
16    ) AS rn
17  FROM orders o
18  JOIN latest_signup ls
19    ON o.user_id = ls.user_id
20   AND o.order_date > ls.signup_date
21)
22SELECT
23  user_id,
24  signup_date,
25  order_id AS first_order_id,
26  order_date AS first_order_date,
27  CAST(DATE_DIFF('day', signup_date, order_date) AS INTEGER) AS days_to_first_order
28FROM post_signup_orders
29WHERE rn = 1
30  AND DATE_DIFF('day', signup_date, order_date) <= 30
31ORDER BY user_id;

700+ ML coding problems with a live Python executor.

Practice in the Engine

The Product Sense Round

This is where Amazon DS interviews differ from other companies. You'll be given a vague product scenario and asked to define success metrics, identify tradeoffs, and propose an experiment.

Framework: Start with the business goal. Define a primary metric (what you're optimizing) and guardrail metrics (what you can't break). Then propose how you'd measure impact, whether that's an A/B test, quasi-experiment, or pre/post analysis.

How Ready Are You for Amazon Data Scientist?

1 / 10

Statistics

Can you choose an appropriate model evaluation approach and interpret results (for example, bias variance tradeoff, overfitting signals, cross validation choice) for a dataset with limited samples?

Alright, you know what to expect. Now let's talk about how to prepare. This is where most people either over-prepare the wrong things or under-prepare the right things.

Mock Interviews (Do Them, Seriously)

I'm obviously biased since I do interview coaching, but even if you don't hire a coach, do mocks with someone.

Why they matter:

You can know your stuff cold and still bomb under pressure. Mock interviews help you:

Practice articulating your thoughts clearly
Get comfortable with the pressure
Identify gaps in your stories
Learn to manage time
Handle unexpected follow-ups

Who to practice with:

Best: Interview coach who knows Amazon's process
Good: Friend who works in DS and can push back on your answers
Okay: Any technical friend who can hold you accountable to time limits
Not helpful: Your mom who thinks everything you say is perfect

What to practice:

Do at least 3-4 full mocks before your real interview:

1-2 focused on behavioral (full loop of LP questions)
1-2 focused on technical (SQL + coding + stats)
1 full simulation (both types mixed, 5 rounds)

Record yourself if you can. It's painful to watch but super helpful.

One client's experience:

Guy I worked with did zero mocks, felt confident, bombed the interview. His feedback: "Answers were too long and lacked structure."

We did 4 mock sessions. He got used to the pressure, tightened his answers, learned when to stop talking. Second interview at Amazon (different role, 6 months later), he got the offer.

Practice matters.

The Week Before

Final prep tips for the week leading up to your interview:

Monday-Wednesday:

Do one final mock if possible
Review your prepared stories one more time
Brush up on SQL with 10-15 practice problems
Read through Amazon's Leadership Principles again

Thursday-Friday:

Light review only, don't cram
Get sleep
Review the job description and team info
Prepare questions to ask your interviewers (yes, have these ready)

Day before:

Do nothing interview-related after 6pm
Get a good dinner
Sleep 8 hours
Seriously, sleep matters

Day of:

Test your tech setup 30 minutes early
Have water nearby
Take those 5-10 minute breaks between rounds

What not to do:

Don't stay up late cramming SQL. Don't try to memorize new concepts the day before. Don't have 3 cups of coffee right before (nervous energy ≠ good energy).

Frequently Asked Questions

How long does the Amazon Data Scientist interview process take?

Expect about 4 to 8 weeks from application to offer. The process typically starts with a recruiter screen, then a technical phone screen (usually SQL and stats), followed by an onsite loop of 4-5 interviews. Scheduling the onsite can take a couple weeks depending on team availability. If you get a referral, the initial screening phase moves faster.

What technical skills are tested in the Amazon Data Scientist interview?

SQL is non-negotiable. You'll face questions on joins, window functions, and aggregations. Python or R coding comes up in nearly every loop, often focused on data manipulation and writing clean functions. Beyond that, expect statistics, probability, A/B testing, and machine learning questions. For L6 and above, you'll also need to show depth in a specific domain like causal inference, econometrics, or optimization. Familiarity with data warehouses like Redshift and visualization tools like Tableau or QuickSight is expected too.

How should I tailor my resume for an Amazon Data Scientist role?

Amazon is obsessed with measurable impact, so every bullet point on your resume should tie back to a business outcome with real numbers. Use the format: what you did, how you did it, and what the result was. Mention SQL, Python, and any experience with big data tools like Redshift or Snowflake explicitly. If you've run A/B tests or built predictive models that drove decisions, put those front and center. Also, frame your work using language that maps to Amazon's Leadership Principles, especially Customer Obsession and Bias for Action.

What is the total compensation for Amazon Data Scientists by level?

At L4 (junior, 0-3 years experience), total comp averages around $182,000 with a range of $165K to $200K. L5 (mid-level, 3-8 years) averages $256,000 and can reach $310K. L6 (senior, 5-12 years) jumps to about $373,000 with a ceiling near $460K. L7 (staff) averages $639,000, and L8 (principal) hits $763,000 on average. One thing to know: Amazon's RSU vesting is back-loaded at 5% in year one and 15% in year two, with 40% each in years three and four. They offset the early gap with large signing bonuses.

How do I prepare for Amazon's behavioral interview as a Data Scientist?

Amazon's behavioral rounds are built entirely around their 16 Leadership Principles. Customer Obsession, Ownership, Dive Deep, and Bias for Action come up the most for data science roles. Prepare 8-10 detailed stories from your past work that you can adapt to different principles. Each story should cover a real situation where you made a measurable impact. I've seen candidates fail the behavioral loop even with perfect technical scores, so don't treat this as the easy part.

How hard are the SQL questions in Amazon Data Scientist interviews?

For L4, expect medium-difficulty SQL: multi-table joins, GROUP BY with HAVING, and basic window functions. At L5 and above, the questions get harder. Think self-joins, complex CTEs, running totals, and questions that require you to think about query performance on large datasets. The questions are practical, not abstract puzzles. They'll often be framed around real Amazon scenarios like customer purchase behavior or delivery metrics. You can practice similar problems at datainterview.com/questions.

What machine learning and statistics concepts should I know for Amazon's Data Scientist interview?

At a minimum, you need solid understanding of hypothesis testing, confidence intervals, probability distributions, and A/B testing methodology. For ML, know regression (linear and logistic), decision trees, random forests, and gradient boosting well enough to explain tradeoffs. L5+ candidates should expect deeper questions on topics like causal inference, uplift modeling, or time series forecasting depending on the team. At L6 and L7, you'll need to demonstrate real depth in a specialized area like deep learning or optimization. Practice applying these concepts to business problems at datainterview.com/questions.

What is the best format for answering Amazon behavioral interview questions?

Use the STAR format: Situation, Task, Action, Result. Amazon interviewers are trained to probe for specifics, so vague answers won't fly. Spend about 20% of your time on the situation and task, 50% on your specific actions (not the team's), and 30% on quantifiable results. Always end with a number. "Reduced churn by 12%" beats "improved retention" every time. When they ask follow-ups like "What would you do differently?", have a genuine answer ready. That maps to the Earn Trust principle.

What happens during the Amazon Data Scientist onsite interview?

The onsite (often virtual now) is a loop of 4-5 back-to-back interviews, each about 45-60 minutes. You'll typically get one or two technical rounds covering SQL, coding, and ML concepts. There's usually a case study or applied problem-solving round where you walk through how you'd approach a real business question. The remaining rounds focus on behavioral questions tied to Leadership Principles. For L6+, expect at least one round on system design for data science applications. Each interviewer writes independent feedback, and one person acts as the "Bar Raiser" to ensure hiring standards stay high.

What business metrics and concepts should I know for an Amazon Data Scientist interview?

Think like an Amazon PM. Know how to define and decompose metrics like conversion rate, customer lifetime value, churn, and revenue per user. You should be comfortable with experimentation frameworks: how to size an A/B test, pick the right metric, and handle tricky scenarios like network effects or novelty bias. Amazon cares deeply about customer-centric metrics, so practice framing everything in terms of customer impact. At L5 and above, you might get asked to design a measurement strategy for a new product feature or evaluate tradeoffs between competing metrics.

What are common mistakes candidates make in Amazon Data Scientist interviews?

The biggest one I see is under-preparing for behavioral rounds. Candidates spend all their time on SQL and ML, then give generic answers about teamwork. Amazon will reject strong technical candidates who bomb the Leadership Principles portion. Another common mistake is not quantifying impact in your answers. "I built a model" means nothing without the business result. Finally, at L5+, candidates often fail to show they can think beyond the model. Amazon wants data scientists who connect their work to business decisions, not just people who tune hyperparameters.

How does Amazon's RSU vesting schedule affect Data Scientist compensation?

Amazon's vesting schedule is unusual and catches people off guard. Your initial RSU grant vests 5% in year one, 15% in year two, then 40% in each of years three and four. That means your cash compensation looks very different in year one versus year three. To compensate, Amazon gives large signing bonuses in years one and two. So your total comp stays relatively flat, but the mix shifts from cash-heavy early on to equity-heavy later. Refresh grants are common after the initial four-year cliff but aren't guaranteed annually.

Amazon Data Scientist Interview

Amazon Data Scientist Role

A Typical Week

A Week in the Life of a Amazon Data Scientist

Weekly time split

Culture notes

Skills & What's Expected

Levels & Career Growth

Amazon Data Scientist Levels

Work Culture

Amazon Data Scientist Compensation

Amazon Data Scientist Interview Process

Initial Screen

Recruiter Screen

Technical Assessment

Coding & Algorithms

Onsite

Machine Learning & Modeling

Statistics & Probability

Product Sense & Metrics

Bar Raiser

Tips to Stand Out

Common Reasons Candidates Don't Pass

Amazon Data Scientist Interview Questions

Statistics, Probability & Inference

SQL / Database Querying

Product Sense & Metrics

A/B Testing & Experimentation

Machine Learning & Applied Modeling

Coding & Algorithms (DS-leaning)

Amazon Leadership Principles (Behavioral)

⭐️ 16 Amazon Leadership Principles

👎 The Bad Answer

👍 The Good Answer (STAR Framework)

Amazon Data Scientist Interview Prep

Understand the Business

Leadership Principles Are Not Optional

Technical Preparation

First purchase after Prime signup within 30 days

The Product Sense Round

Mock Interviews (Do Them, Seriously)

The Week Before

Frequently Asked Questions

Dan Lee

Related Articles

Product Data Scientist Interview Prep

xAI AI Engineer Interview Guide

Snap Machine Learning Engineer Interview Guide