eBay Data Scientist at a Glance
Total Compensation
$158k - $331k/yr
Interview Rounds
7 rounds
Difficulty
Levels
T23 - T27
Education
PhD
Experience
0–18+ yrs
eBay's data science interview barely touches deep learning, but it will punish you for not understanding two-sided marketplace dynamics. The candidates who struggle aren't missing technical chops. They can't articulate what happens to buyer conversion when you tweak a seller incentive, or why a search ranking win in Electronics might be a loss in Collectibles.
eBay Data Scientist Role
Primary Focus
Skill Profile
Math & Stats
High: Strong applied statistics expected, especially experimentation/A/B testing (hypothesis formulation, metric selection, statistical significance, confidence intervals) and solid statistical thinking for interpreting results. Evidence: eBay senior analytics DS role emphasizes A/B testing rigor; interview process includes basic statistics topics (distributions, hypothesis testing, confidence intervals, metrics design).
Software Eng
Medium: Hands-on coding in Python and SQL is required, plus automation of analytical workflows and code readability; however, heavy production-grade engineering is less central than analysis. Evidence: role requires Python for wrangling/modeling/automation and complex SQL; interview includes live SQL/Python coding. Uncertainty: exact SWE expectations vary by team (analytics vs ML platform).
Data & SQL
Medium: Ability to build/maintain SQL-based pipelines, analytical datasets, and dashboards; comfortable with large datasets in a data warehouse. Evidence: role explicitly calls out SQL-based data pipelines/dashboards/analytical datasets and large data warehouse experience.
Machine Learning
Medium: Expected to have exposure to statistical modeling/causal inference/predictive techniques, and interview loops may include an ML deep dive; likely more applied than research-heavy for analyst-leaning DS roles. Evidence: preferred qualifications mention predictive techniques; interview guide notes ML deep dive on model building/deployment. Uncertainty: specific model families and depth depend on org (payments analytics vs ranking/recs).
Applied AI
Low: No explicit GenAI/LLM requirements in provided sources for the referenced eBay data science/analytics role and interview outline. Conservative estimate: may be nice-to-have in 2026 but not required for this DS/analytics profile (uncertain).
Infra & Cloud
Low: Sources emphasize warehouse-based analytics and experimentation rather than cloud/ML ops deployment. Interview guide mentions possible ML deployment discussion, but not specific cloud stack requirements. Conservative estimate due to limited direct evidence.
Business
High: Strong product/business partnering and translating ambiguous problems into structured analyses; focus on conversion, reliability, monetization, and communicating actionable insights. Evidence: role highlights partnering with PM/business stakeholders, structuring ambiguous problems, and driving product decisions; interview guide emphasizes product sense and business impact.
Viz & Comms
High: Clear communication to technical and non-technical audiences via narratives, dashboards, and presentations; case study deliverables like a Jupyter notebook or slide deck. Evidence: role requires communicating through dashboards/presentations and clear insights; interview process includes a take-home case emphasizing storytelling and clarity.
What You Need
- Advanced SQL (complex queries, joins, window functions, performance optimization)
- Python for data analysis (data wrangling, EDA, statistical modeling, automation)
- Experimentation / A/B testing (hypotheses, metrics, significance, interpretation)
- Working with large datasets in data warehouse environments
- Analytical problem-solving and attention to detail
- Stakeholder collaboration (product/business partners) and ambiguity-to-structure problem framing
- Communicating actionable insights (written narratives, dashboards, presentations)
Nice to Have
- Domain experience in payments/fintech/e-commerce/platform analytics
- Statistical modeling, causal inference, predictive techniques
- Dashboarding experience (Tableau, Looker, or similar)
- Data modeling concepts and analytics engineering practices
- Experience in fast-paced, product-driven environments with global stakeholders
Languages
Tools & Technologies
Want to ace the interview?
Practice with real questions.
You sit inside a product team (Search & Discovery, Seller Experience, Trust & Safety, Ads & Monetization, or Buyer Success) and own the metrics, experiments, and analytical narratives for that area. The work runs on SQL against eBay's warehouse, Python in Jupyter notebooks, and dashboarding tools like Tableau or Looker. Success after year one means you've shipped experiment readouts that changed a PM's roadmap and earned enough context that stakeholders pull you in before they've decided what to build.
A Typical Week
A Week in the Life of an eBay Data Scientist
Typical L5 workweek · eBay
Weekly time split
Culture notes
- eBay runs at a steady but not frantic pace — most data scientists work roughly 9-to-5:30 with occasional crunches around quarterly business reviews or major experiment launches.
- eBay moved to a hybrid model requiring three days per week in the San Jose office, with most teams clustering Tuesday through Thursday on-site and keeping Monday or Friday flexible for remote deep work.
The breakdown won't shock you until you notice how much time goes to scoping, not executing. Monday afternoon you're translating a vague PM ask ("why did listing completions drop in this category?") into a concrete analysis plan with defined metrics and a timeline. That unglamorous translation work is where experienced DSs at eBay separate themselves from people who just run queries.
Projects & Impact Areas
Trust & Safety produces some of eBay's hardest DS problems because the open marketplace (anyone can list anything) creates adversarial scenarios like fake luxury goods and coordinated seller rings that a first-party retailer never faces. Search ranking presents a different puzzle: matching a buyer searching "1967 Camaro SS hood" to the right listing among billions of unique, long-tail items that commodity recommendation systems aren't designed for. Meanwhile, seller economics work (modeling how a promoted listings fee change affects participation and whether incremental ad revenue cannibalizes organic search quality) leans heavily on causal inference rather than prediction.
Skills & What's Expected
Statistics and business acumen are weighted far above ML sophistication here. The skill most candidates over-index on is deep learning, which matters only for specific ranking or NLP teams, not the broader DS org. The most underrated skill is designing experiments on a two-sided marketplace where treating sellers changes the buyer experience, a SUTVA violation that most textbook A/B testing prep ignores. SQL proficiency means window functions, self-joins for sessionization, and queries that perform well at warehouse scale. ML expectations scale with level and team: a fraud classifier or demand forecast model is sufficient for many roles, but senior loops (T25+) may include a deep dive on model evaluation, feature engineering, and production considerations.
Levels & Career Growth
eBay Data Scientist Levels
Each level has different expectations, compensation, and interview focus.
$125k
$22k
$11k
What This Level Looks Like
Owns well-scoped analyses or model components within a single product area; impacts a feature, experiment, or workflow through metrics definition, insights, and incremental modeling improvements under close mentorship.
Day-to-Day Focus
- Fundamentals of experimentation and causal reasoning
- Data quality, instrumentation, and metric correctness
- Strong SQL and pragmatic Python data workflows
- Clear communication of assumptions, limitations, and next steps
- Learning the eBay domain (marketplace, search, trust, payments, shipping)
Interview Focus at This Level
Emphasis on SQL and basic Python, statistics/experimentation fundamentals (hypothesis testing, confidence intervals, p-values, power, interpreting A/B tests), product sense and metrics, and ability to communicate a structured analysis. ML questions are typically applied and foundational rather than deep research-level.
Promotion Path
Demonstrate consistent end-to-end ownership of small-to-medium projects: independently framing ambiguous questions, selecting appropriate methods, delivering accurate analyses/models that influence decisions, improving stakeholder trust through clear communication, and showing increasing autonomy in experiment design and model iteration to reach Data Scientist II (T24).
Find your level
Practice with questions tailored to your target level.
The widget shows the level bands. What it won't tell you is that the T25-to-T26 jump is where most careers stall. The blocker, based on what the promo criteria describe, isn't technical depth. It's cross-team influence: setting measurement standards other DS teams adopt, or shaping a product roadmap through your analysis. T27 (Principal) roles are scarce and often resemble applied research leads focused on ranking algorithms or marketplace experimentation methodology.
Work Culture
eBay requires three days per week in the San Jose office, with most teams clustering Tuesday through Thursday on-site and keeping Monday or Friday flexible for remote deep work. The pace is calmer than Meta or Amazon: you'll work roughly 9-to-5:30 most weeks, with occasional crunches around quarterly business reviews or major experiment launches. Bureaucracy can slow experiment approvals, but the tradeoff is that your analysis gets the time and scrutiny to actually be defensible before it reaches a VP.
eBay Data Scientist Compensation
eBay's RSUs vest 25% per year over four years, paid out in quarterly chunks of 6.25%. No backloading, no cliff before your first quarterly vest hits. The simplicity is the selling point: your Year 1 comp is predictable, unlike Amazon's 5/15/40/40 structure where early-year cash has to fill the gap.
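The arithmetic behind that comparison is worth seeing once. A quick sketch, assuming a hypothetical $200k grant and the Amazon 5/15/40/40 split mentioned above (numbers are illustrative, not an actual offer):

```python
# Year-by-year vest value for a hypothetical $200k RSU grant.
# eBay: flat 25% per year (paid as 6.25% quarterly chunks).
# Amazon: backloaded 5/15/40/40 annual schedule.
grant = 200_000

ebay_by_year = [0.25] * 4
amazon_by_year = [0.05, 0.15, 0.40, 0.40]

for year, (e, a) in enumerate(zip(ebay_by_year, amazon_by_year), start=1):
    print(f"Year {year}: eBay ${grant * e:,.0f} vs Amazon ${grant * a:,.0f}")
# Year 1: eBay $50,000 vs Amazon $10,000 (the gap Amazon fills with cash)
```

Same four-year total, very different Year 1: that is why a larger initial grant at eBay pays off sooner than it would under a backloaded schedule.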
When negotiating, the offer notes themselves hint at the playbook: equity can often be rebalanced upward if base is capped by band. Ask for a larger initial RSU grant instead of grinding over a few thousand in base, because that grant pays out across four years of vesting. A signing bonus is also worth requesting explicitly, since it bridges the months before your first quarterly RSU vest lands in your brokerage account.
eBay Data Scientist Interview Process
7 rounds · ~4 weeks end to end
Initial Screen
2 rounds
Recruiter Screen
Kick off with a recruiter call focused on role alignment, location/leveling, and what kind of marketplace problems you’ve worked on. You’ll discuss your resume highlights, why this team, and practical logistics like compensation range and start date.
Tips for this round
- Prepare a 60–90 second story that maps your most relevant DS work to marketplace themes (search, pricing, trust/safety, seller tools, experimentation).
- Have a crisp inventory of your tool stack (SQL, Python, experimentation platforms, dashboards) and what you personally owned vs. supported.
- Use a concrete impact metric for each project (e.g., conversion, GMV, CTR, defect rate, latency, operational cost) and quantify lift where possible.
- Be ready to state work authorization, preferred office/remote setup, and earliest start date to avoid delays later.
- Ask what the core evaluation pillars are for this loop (SQL depth, experiment design, product sense, ML) so you can tailor prep.
Hiring Manager Screen
Next, you’ll speak with the hiring manager about the specific domain (e.g., search/recs, marketplace health, trust signals) and how data science drives decisions. Expect probing on how you frame ambiguous problems, choose success metrics, and work cross-functionally with PMs and engineers.
Technical Assessment
3 rounds
SQL & Data Modeling
Expect a live SQL round where you write queries against marketplace-style tables like users, listings, transactions, clicks, and experiments. You’ll be evaluated on correctness, edge-case handling, and how you reason about joins, window functions, and metric definitions.
Tips for this round
- Practice window functions (ROW_NUMBER, LAG, SUM OVER) for funnels, retention cohorts, and deduping events.
- State assumptions before coding: time zones, attribution windows, cancellations/returns, and how to treat multiple purchases.
- Use CTEs to keep logic readable and to separate data cleaning from aggregation steps.
- Double-check join cardinality to avoid metric inflation; call out one-to-many risks and how you prevent them.
- Be ready to discuss schema design choices (fact vs. dimension tables, experiment assignment tables, event logging granularity).
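The join-cardinality tip above is easy to demonstrate. A toy example using stdlib sqlite3 with hypothetical tables (one buyer with three orders inflates a naive count):

```python
import sqlite3

# A one-to-many join silently inflates a user count unless you
# aggregate with COUNT(DISTINCT ...). Tables are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE exposures(user_id TEXT, variant TEXT);
CREATE TABLE orders(user_id TEXT, order_id TEXT);
INSERT INTO exposures VALUES ('u1','t'), ('u2','t');
INSERT INTO orders VALUES ('u1','o1'), ('u1','o2'), ('u1','o3');
""")

naive = con.execute("""
    SELECT COUNT(e.user_id)
    FROM exposures e JOIN orders o USING(user_id)
""").fetchone()[0]   # 3: u1 is counted once per order row

correct = con.execute("""
    SELECT COUNT(DISTINCT e.user_id)
    FROM exposures e JOIN orders o USING(user_id)
""").fetchone()[0]   # 1: each converting buyer counted once

print(naive, correct)
```

Calling out exactly this risk before you write the query (and preventing it with `COUNT(DISTINCT ...)` or pre-aggregation) is what interviewers mean by "guarding against metric inflation."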
Statistics & Probability
You’ll be tested on the stats that power large-scale experimentation and marketplace measurement. The interviewer will probe hypothesis testing, confidence intervals, power, bias, and how you’d interpret noisy results in real product settings.
Machine Learning & Modeling
A modeling-focused session will cover how you build and evaluate ML for marketplace problems such as ranking, recommendations, fraud detection, or pricing signals. You should expect questions on feature design, leakage, offline vs. online metrics, and how you’d monitor model performance.
Onsite
2 rounds
Product Sense & Metrics
You’ll be given a business problem and asked to define success metrics, diagnose a metric change, or propose an experiment for a product surface like search, listing pages, or seller tools. The focus is on structured thinking, metric trees, and clear communication rather than heavy math.
Tips for this round
- Use a metric hierarchy: north star (e.g., GMV or successful transactions) → drivers (conversion, AOV, traffic quality) → guardrails (trust/safety, cancellations, latency).
- When diagnosing drops, segment systematically (platform, geo, category, new vs. returning, acquisition channel, seller cohort) and propose top hypotheses.
- Talk about marketplace two-sided effects: buyer experience vs. seller incentives, and how metrics can move in opposite directions.
- Sketch an experiment plan: unit of randomization, exposure definition, logging needs, ramp plan, and decision criteria.
- Communicate with an “executive summary first” style, then back it with supporting cuts and sanity checks.
Behavioral
The final conversation typically focuses on collaboration, ownership, and how you operate in cross-functional teams. Expect deep dives into past projects, disagreement handling, and how you influence decisions with data under tight timelines.
Tips to Stand Out
- Build a marketplace metric tree. Practice mapping buyer and seller journeys to measurable drivers (traffic quality, conversion, AOV, cancellations/returns, complaints) and explicitly call out guardrails tied to trust and operational health.
- Over-prepare SQL for event data. Drill window functions, deduping, sessionization, and attribution windows; most mistakes come from join cardinality and poorly defined denominators.
- Treat experimentation as a product skill. Be able to design an A/B test, justify the unit of randomization, estimate power, handle SRM, and explain how you’d make a ship/no-ship decision when results are mixed.
- Connect ML to online impact. Translate offline model metrics into an experiment plan and monitoring approach; explain how ranking/recs/fraud models affect user experience and marketplace outcomes.
- Communicate like an owner. Lead with the decision, then the evidence; show how you write clear readouts, align stakeholders, and iterate quickly when data is messy or incomplete.
- Prepare domain-specific stories. Have examples involving personalization, pricing/elasticity, trust signals, anomaly detection, or marketplace health monitoring to demonstrate relevance beyond generic DS work.
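The SRM check mentioned in the experimentation tip can be done with stdlib math alone. A sketch of one standard formulation, a two-sided z-test on the observed split (counts below are illustrative):

```python
from math import sqrt, erf

# Sample ratio mismatch (SRM) check for a designed 50/50 split.
def srm_z_test(n_control, n_treatment, expected_ratio=0.5):
    """Two-sided z-test that the observed split matches the design ratio."""
    n = n_control + n_treatment
    p_hat = n_treatment / n
    se = sqrt(expected_ratio * (1 - expected_ratio) / n)
    z = (p_hat - expected_ratio) / se
    # Normal CDF via erf; p-value is two-sided.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 50/50 design, but 10,000 vs 10,400 units landed in the two arms:
z, p = srm_z_test(10_000, 10_400)
print(f"z={z:.2f}, p={p:.4f}")  # a small p signals broken randomization
```

A tiny p-value here means the assignment itself is broken, so any treatment-effect readout from that experiment is suspect regardless of its own significance.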
Common Reasons Candidates Don't Pass
- ✗ Weak metric definitions. Candidates lose points when they can’t define precise denominators, time windows, and edge cases (returns, cancellations, bots), leading to untrustworthy analysis.
- ✗ Shallow experimentation reasoning. Getting p-values right isn’t enough; rejection often comes from missing SRM, multiple testing, interference, or ignoring guardrails and rollout risk.
- ✗ SQL correctness issues. Frequent failures include incorrect joins, double-counting, mishandled nulls, and inability to use window functions to answer retention/funnel questions.
- ✗ Modeling without product framing. Presenting algorithms without label clarity, leakage controls, or an online evaluation plan signals inability to drive real marketplace impact.
- ✗ Unclear communication and collaboration. Rambling explanations, inability to defend tradeoffs, or blaming stakeholders during behavioral deep dives can outweigh strong technical skills.
Offer & Negotiation
For Data Scientist roles at a company like eBay, total compensation typically includes base salary, an annual bonus target, and RSUs with multi-year vesting (commonly 4 years, often with a 1-year cliff and quarterly vesting thereafter). The most negotiable levers are base salary, sign-on bonus, and RSU refresh/signing equity; bonus target is often level-based and less flexible. Anchor with your leveled scope (IC level and expected ownership), bring competing offers or market ranges, and ask whether equity can be rebalanced upward if base is capped by band.
The number one reason candidates wash out is sloppy metric definitions. Not just in the Product Sense round, either. Interviewers across SQL, experimentation, and modeling all probe whether you can specify precise denominators, handle edge cases like returns and bot traffic, and reason about time windows for a marketplace where buyer and seller metrics often conflict. If you can't cleanly define "conversion rate" for a listing page that serves both auction and fixed-price formats, that weakness shows up everywhere.
Most candidates underestimate the Hiring Manager Screen's filtering power. At many companies this round is a vibe check. At eBay, from what candidates report, the hiring manager digs into how you framed ambiguous marketplace problems and chose success metrics in past work. Getting past this screen without a concrete example of driving a decision with data (not just producing an analysis) is unlikely. Prepare for it like a technical round, because it functions as one.
eBay Data Scientist Interview Questions
Experimentation & A/B Testing
Expect questions that force you to design and critique experiments end-to-end: hypothesis, unit of randomization, power/MDE, and interpreting messy results. Candidates often stumble on marketplace-specific pitfalls like interference, noncompliance, and choosing metrics that won’t be gamed.
You run an A/B test that adds a tougher login challenge for suspicious buyers to reduce chargebacks. What is your primary success metric and one guardrail metric, and what unit of randomization do you choose to avoid bias from repeat buyers?
Sample Answer
Most candidates default to chargeback rate per transaction, but that fails here because fraud controls change who is allowed to transact, so the denominator shifts and you can win by blocking good buyers. Use an intent-based metric like chargebacks per unique buyer (or per attempted checkout) plus a guardrail like completed checkout rate or GMV per eligible buyer to detect over-blocking. Randomize at buyer account (or buyer device-linked identity) to avoid within-user spillover from repeated attempts and to keep exposure consistent across sessions.
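For the power/MDE side of rounds like this, the textbook normal-approximation sample-size formula for two proportions is worth having cold. A sketch with illustrative numbers (not eBay's actual rates):

```python
from math import sqrt, ceil

# Per-arm sample size to detect an absolute lift `mde` on a baseline rate,
# at alpha=0.05 two-sided (z=1.96) and 80% power (z=0.84).
# Standard two-proportion normal approximation; inputs are illustrative.
def sample_size_per_arm(baseline, mde, z_alpha=1.96, z_beta=0.84):
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / mde ** 2)

# e.g. 2% baseline chargeback rate, want to detect a 0.2pp absolute drop:
n = sample_size_per_arm(0.02, -0.002)
print(f"~{n:,} buyers per arm")
```

Being able to produce a number like this on the spot, then discuss how a heavy-tailed metric or clustering would change it, is what "estimate power" means in practice.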
In an A/B test for a new seller risk banner, treatment assignment is at seller level, but buyers interact with many sellers in one week. How do you handle interference and what analysis method do you use to estimate impact on buyer-level conversion?
Your fraud model rollout is A/B tested by routing $50\%$ of checkout attempts through a stricter risk threshold, but $15\%$ of treated attempts bypass the model due to a timeout fallback. How do you estimate the causal effect on chargebacks and on completed checkouts, and which estimand do you report to leadership?
Product Sense & Metrics (Marketplace + Trust & Safety)
Most candidates underestimate how much you’ll be pushed to define success for ambiguous product changes using defensible KPIs and guardrails. You’ll need to connect buyer/seller incentives, trust signals (fraud, disputes, seller standards), and marketplace health into a coherent measurement plan.
eBay wants to show a new seller trust badge on listing pages for sellers that meet performance thresholds. What is your primary success metric and your top 3 guardrail metrics to ensure you do not increase fraud or bad buyer experiences?
Sample Answer
Primary success metric is completed buyer GMV per visitor (or per session), with guardrails on fraud, disputes, and seller health. GMV captures the marketplace goal only if it is tied to completion; otherwise you overcount canceled or refunded orders. Fraud and disputes can rise even when conversion rises, so you track order defect rate, item-not-received (INR) or significantly-not-as-described (SNAD) rate, and chargeback rate as hard guardrails. Add a seller-side guardrail like seller policy violations per active seller to avoid incentivizing short-term conversion via risky sellers.
You ship a new risk model that blocks a subset of high risk checkout attempts, and leadership asks for one KPI to put on a weekly dashboard for "trust impact". Define that KPI precisely and explain how you would make it robust to changes in traffic mix and seasonality.
A new "one click return" flow reduces buyer friction, and an A/B test shows higher conversion, but also higher return rates and slightly higher SNAD disputes. How do you decide whether to launch, and what decision framework and metrics do you put in front of the GM?
Applied Statistics for Product Analytics
Your ability to reason about uncertainty is evaluated beyond rote formulas—confidence intervals, variance reduction, multiple testing, and diagnosing biased measurement come up a lot. The goal is to show statistical judgment when data is noisy, skewed, and segmented across geos, devices, and cohorts.
eBay rolls out a new Trust & Safety interstitial that warns buyers about suspected counterfeit listings, and you want to measure impact on completed purchases per buyer in 7 days, a heavy-tailed metric with many zeros. What statistical approach would you use to estimate lift and a $95\%$ CI, and what tradeoffs matter here?
Sample Answer
You could do a parametric approach (difference in means with a $t$-interval) or a nonparametric approach (bootstrap the difference in means). The parametric route is fast and simple, but it is brittle when the per-buyer outcome is zero-inflated and heavy-tailed, which is common in marketplace purchase data. Bootstrap wins here because it better reflects the empirical skew and outliers without pretending normality at the user level. You still sanity check stability (enough buyers, no single whale dominating), or your CI is fake precision.
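A minimal version of the bootstrap approach in that answer, run on synthetic zero-inflated data (stdlib only; the Pareto draws are purely illustrative stand-ins for heavy-tailed purchase counts):

```python
import random
from statistics import mean

# Synthetic per-buyer outcomes: mostly zeros plus a heavy Pareto tail.
random.seed(7)
control   = [0] * 900 + [random.paretovariate(2.5) for _ in range(100)]
treatment = [0] * 880 + [random.paretovariate(2.5) for _ in range(120)]

def bootstrap_diff_ci(a, b, n_boot=1000, alpha=0.05):
    """Percentile-bootstrap CI for mean(b) - mean(a)."""
    diffs = []
    for _ in range(n_boot):
        resample_a = [random.choice(a) for _ in range(len(a))]
        resample_b = [random.choice(b) for _ in range(len(b))]
        diffs.append(mean(resample_b) - mean(resample_a))
    diffs.sort()
    return diffs[int(n_boot * alpha / 2)], diffs[int(n_boot * (1 - alpha / 2))]

lo, hi = bootstrap_diff_ci(control, treatment)
print(f"95% CI for lift: ({lo:.3f}, {hi:.3f})")
```

The sanity checks from the answer above still apply: if one whale buyer dominates the resamples, the percentile interval will be unstable across seeds, which is your cue to trim, cap, or switch estimands.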
In an A/B test on a new fraud model threshold, you monitor 20 metrics daily (chargeback rate, seller GMV, cancellation rate, buyer conversion, CS contacts) and on day 7 one metric shows $p=0.01$ but the rest are null. How do you decide whether to call it a real win, and what statistical corrections or validation would you apply?
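One common correction for exactly this 20-metric scenario is Benjamini-Hochberg. A minimal sketch with illustrative p-values (a lone $p=0.01$ among 19 nulls does not survive):

```python
# Benjamini-Hochberg procedure at FDR level q: sort p-values, find the
# largest rank k with p_(k) <= k*q/m, reject everything at or below it.
def bh_reject(p_values, q=0.05):
    m = len(p_values)
    ranked = sorted(enumerate(p_values), key=lambda kv: kv[1])
    threshold_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * q / m:
            threshold_rank = rank
    return {idx for rank, (idx, p) in enumerate(ranked, start=1)
            if rank <= threshold_rank}

day7_readout = [0.01] + [0.5] * 19   # hypothetical 20-metric dashboard
print(bh_reject(day7_readout))       # empty set: 0.01 > 1 * 0.05 / 20
```

In the interview, pair the correction with judgment: pre-registering one primary metric avoids the problem entirely, and a marginal result on a secondary metric is a reason to rerun, not to ship.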
SQL (Analytics Queries + Performance)
The bar here isn’t whether you can write a join, it’s whether you can compute metrics correctly at scale using windows, deduping, sessionization-style logic, and careful grain control. Expect realistic tables (events, transactions, users, listings) where small query mistakes silently break experiment readouts.
You need daily experiment readouts for a trust banner A/B test: for each experiment_id, variant, and event_date, compute exposed_users, purchasers_7d (purchase within 7 days of first exposure), and conversion_7d. Tables: experiment_exposures(user_id, experiment_id, variant, exposure_ts), orders(order_id, buyer_id, order_ts, gmv_usd). Deduplicate to each user’s first exposure per experiment before attributing orders.
Sample Answer
Reason through it: Start by collapsing exposure data to one row per user per experiment, keeping the earliest exposure timestamp and its variant. Then join orders on buyer_id and time, restricting to $[exposure\_ts, exposure\_ts + 7\text{ days}]$ so you do not leak pre-exposure purchases. Aggregate by experiment_id, variant, and exposure date; count distinct users exposed and distinct users with at least one qualifying order. Conversion is purchasers_7d divided by exposed_users; guard against divide-by-zero with NULLIF.
```sql
WITH first_exposure AS (
  SELECT
    user_id,
    experiment_id,
    -- Use the variant at the earliest exposure.
    MIN_BY(variant, exposure_ts) AS variant,
    MIN(exposure_ts) AS first_exposure_ts,
    CAST(MIN(exposure_ts) AS DATE) AS exposure_date
  FROM experiment_exposures
  GROUP BY 1, 2
), purchasers_7d AS (
  SELECT
    fe.experiment_id,
    fe.variant,
    fe.exposure_date,
    fe.user_id,
    1 AS purchased_7d
  FROM first_exposure fe
  JOIN orders o
    ON o.buyer_id = fe.user_id
    AND o.order_ts >= fe.first_exposure_ts
    AND o.order_ts < fe.first_exposure_ts + INTERVAL '7' DAY
  GROUP BY 1, 2, 3, 4
)
SELECT
  fe.experiment_id,
  fe.variant,
  fe.exposure_date AS event_date,
  COUNT(DISTINCT fe.user_id) AS exposed_users,
  COUNT(DISTINCT p.user_id) AS purchasers_7d,
  CAST(COUNT(DISTINCT p.user_id) AS DOUBLE) / NULLIF(COUNT(DISTINCT fe.user_id), 0) AS conversion_7d
FROM first_exposure fe
LEFT JOIN purchasers_7d p
  ON p.experiment_id = fe.experiment_id
  AND p.variant = fe.variant
  AND p.exposure_date = fe.exposure_date
  AND p.user_id = fe.user_id
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
```

A marketplace health dashboard needs listing-level daily defect rates: for each day, compute $\frac{\#\text{listings with at least 1 policy removal}}{\#\text{active listings}}$ where active means at least 1 view event that day. Tables: listing_events(event_ts, listing_id, event_type) with event_type in ('view','policy_removed','purchase'). Write a query that is correct and avoids double counting listings with multiple removal events.
You are debugging a slow query that computes buyer 30-day chargeback rate by signup cohort. Tables: users(user_id, signup_ts), payments(payment_id, buyer_id, paid_ts, amount_usd), chargebacks(payment_id, chargeback_ts). Compute, per signup_date, total_payments_30d, chargebacked_payments_30d, and rate, but write it in a way that scales (no correlated subqueries).
Causal Inference & Observational Studies
When experimentation isn’t possible, you’ll be assessed on how you approximate causal answers with imperfect data using quasi-experimental designs and strong assumptions. Interviewers look for clear identification strategies (DiD, matching/weighting, IV intuition) and crisp caveats tied to product decisions.
A new on-site delivery estimate banner launched only for high-volume sellers, and you need the causal impact on buyer conversion rate and cancellation rate using 8 weeks of daily seller-level data before and after launch. Describe a difference-in-differences design, what you would plot or test to support the parallel trends assumption, and one falsification check.
Sample Answer
This question is checking whether you can turn a biased rollout into a credible identification strategy, and state assumptions without hand-waving. You should define treated sellers (banner eligible) and controls (ineligible), use a pre and post window, and estimate a DiD on conversion and cancellations, ideally with seller and time fixed effects. You should show pre-trend plots or an event study with leads, and you should run a placebo test (for example, pretend the launch happened earlier) or use an outcome that should not move (for example, unrelated category traffic) to catch confounding.
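The core 2x2 DiD arithmetic in that answer reduces to four group means. A sketch on hypothetical seller-day conversion rates (a real readout would add seller and time fixed effects plus clustered standard errors):

```python
from statistics import mean

# Hypothetical seller-day conversion rates, pre vs post launch.
pre  = {"treated": [0.050, 0.052, 0.051], "control": [0.040, 0.041, 0.039]}
post = {"treated": [0.058, 0.060, 0.059], "control": [0.042, 0.043, 0.041]}

# DiD: (treated post - treated pre) minus (control post - control pre).
treated_delta = mean(post["treated"]) - mean(pre["treated"])
control_delta = mean(post["control"]) - mean(pre["control"])
did = treated_delta - control_delta
print(f"DiD estimate: {did:.4f} (lift attributable to the banner, "
      f"under parallel trends)")
```

The subtraction of the control trend is exactly what the parallel-trends plots are defending: if pre-period trends diverge, `control_delta` is not a valid counterfactual for the treated group.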
Trust & Safety introduces a stricter ML fraud score threshold that increases auto-cancellations, but the rollout is triggered when a listing’s fraud score crosses a fixed cutoff and you only observe outcomes for transacted listings. How would you estimate the causal effect on seller GMV and buyer trust signals, and what specific threats to validity do you need to address?
Applied Machine Learning (Fraud/Risk + Ranking/Forecasting)
Rather than deep research math, you’ll be probed on selecting and evaluating practical models for trust & safety and marketplace optimization. Focus on label quality, leakage, imbalanced metrics, calibration/thresholding, offline vs online evaluation, and how model outputs translate into policy actions.
You are building a model to flag risky listings (counterfeit risk) within 10 minutes of creation, using seller history, listing text, and early buyer signals; what offline metrics do you use, and how do you pick an operating threshold if enforcement capacity is fixed at 5,000 actions/day? Call out one concrete leakage trap in feature generation and how you would detect it.
Sample Answer
The standard move is to optimize PR-AUC (or precision at $k$) under heavy class imbalance, then choose a threshold that hits your daily review cap, for example the top 5,000 risk scores per day, and report calibrated precision, recall, and false positive rate at that point. But here, calibration matters because policy decisions and capacity constraints care about expected bad rate, not just ranking quality, so you should reliability-check scores by day, country, and seller segment and threshold per segment if base rates differ. Leakage trap: using post-enforcement fields (takedown reason, refund, chargeback) or any feature computed after the 10-minute window, even if it is “joined later”; detect it by enforcing feature timestamps $t_{feature} \le t_{prediction}$ and running a backtest where you rebuild features only from data available up to each historical prediction time.
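The capacity-constrained thresholding in that answer reduces to "sort scores, take the top k, read off the implied threshold." A sketch on synthetic scores, where a budget of 5 stands in for the 5,000-actions/day capacity:

```python
import random

# Synthetic model scores and (noisy) true labels for illustration.
random.seed(0)
DAILY_CAPACITY = 5   # stand-in for the 5,000-actions/day budget
scores = [random.random() for _ in range(100)]
labels = [1 if s > 0.8 and random.random() < 0.7 else 0 for s in scores]

# Flag the top-k listings by score; the k-th score is the operating threshold.
ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
flagged = ranked[:DAILY_CAPACITY]
threshold = flagged[-1][0]
precision_at_k = sum(label for _, label in flagged) / DAILY_CAPACITY
print(f"threshold={threshold:.3f}, precision@{DAILY_CAPACITY}={precision_at_k:.2f}")
```

In production you would recompute this per segment (country, seller tier) when base rates differ, which is the calibration point the answer above makes: the budget fixes k, but precision at k is only meaningful if the scores mean the same thing across segments.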
You are re-ranking search results with an ML model to improve GMV, but Trust & Safety wants to down-rank sellers with elevated dispute risk; how do you evaluate the tradeoff offline and online, and what is your plan for monitoring if base dispute rate drifts after launch? Be explicit about at least one metric that will look good while the marketplace gets worse.
Experimentation, product sense, and applied statistics together consume over 60% of the interview, which tells you eBay hires people who can reason about measurement on a two-sided marketplace, not people who can tune XGBoost hyperparameters. These areas compound in practice: a product metrics question about a seller trust badge will slide into designing the A/B test, then into whether your statistical approach holds when buyers browse listings from both treatment and control sellers in a single session. The biggest prep mistake this distribution implies is spending equal time on all six areas when ML and SQL together make up only a quarter of the weight.
Practice eBay-style questions across all six areas at datainterview.com/questions.
How to Prepare for eBay Data Scientist Interviews
Know the Business
Official mission
“We connect people and build communities to create economic opportunity for all.”
What it actually means
eBay's real mission is to facilitate global commerce by connecting millions of buyers and sellers, providing a platform for economic opportunity, and offering a vast and unique selection of goods. It aims to be the preferred destination for discovering value and unique items, particularly focusing on enthusiast buyers and high-value categories.
Key Business Metrics
$11B
+15% YoY
$39B
+26% YoY
12K
-6% YoY
Current Strategic Priorities
- Transform through innovation, investment, and powerful tools designed to fuel sellers’ growth
- Accelerate innovation using AI to make selling smarter, faster, and more efficient
- Enhance trust throughout the marketplace
- Connect the right buyers to unique inventory
- Create more personalized, inspirational shopping experiences for all
eBay's Q4 2025 earnings showed $11.1B in revenue, up 15% YoY, while headcount shrank roughly 6.5%. That math tells you where the company is headed: doing more with fewer people, powered by AI and sharper data science. Their focus category strategy channels investment into luxury, trading cards, refurbished goods, and auto parts, which means DS teams are building category-specific models for search relevance, pricing, and fraud rather than one-size-fits-all pipelines.
The "why eBay" answer that falls flat is any version of "I want to work on a marketplace at scale." eBay's interviewers perk up when you name the specific two-sided tension their platform creates: every listing is user-generated, so Trust & Safety problems look fundamentally different than on platforms with curated inventory. Reference how their seller tools launches aim to reduce friction while maintaining buyer trust, and explain why that tradeoff excites you analytically. Showing you've thought about how a fraud model's false positive rate directly erodes seller retention on an open marketplace will separate you from candidates who rehearsed a generic answer.
Try a Real Interview Question
A/B test lift on trust metric with 7-day conversion and significance
Given users assigned to an experiment variant and their post-exposure orders, compute for each variant the 7-day conversion rate $p$ and the lift $\Delta = p_t - p_c$. Also output the pooled-proportion $z$-test statistic $$z = \frac{p_t - p_c}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_t}+\frac{1}{n_c}\right)}}$$ where $\hat{p}$ is combined successes over combined users; count a user as converted if they place at least one order within $7$ days after exposure.
| user_id | experiment_id | variant | exposed_at |
|---|---|---|---|
| U1 | trust_checkout_v1 | control | 2026-01-01 |
| U2 | trust_checkout_v1 | treatment | 2026-01-01 |
| U3 | trust_checkout_v1 | control | 2026-01-02 |
| U4 | trust_checkout_v1 | treatment | 2026-01-02 |
| order_id | user_id | order_ts |
|---|---|---|
| O1 | U1 | 2026-01-03 |
| O2 | U2 | 2026-01-20 |
| O3 | U4 | 2026-01-05 |
| O4 | U4 | 2026-01-06 |
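One way to sanity-check your query before the interview is to run it against the toy tables in an in-memory SQLite database. This sketch mirrors the sample data above; the exact schema and the `julianday` date arithmetic are SQLite-specific choices, not eBay's warehouse dialect.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exposures (user_id TEXT, experiment_id TEXT, variant TEXT, exposed_at TEXT);
INSERT INTO exposures VALUES
  ('U1','trust_checkout_v1','control','2026-01-01'),
  ('U2','trust_checkout_v1','treatment','2026-01-01'),
  ('U3','trust_checkout_v1','control','2026-01-02'),
  ('U4','trust_checkout_v1','treatment','2026-01-02');
CREATE TABLE orders (order_id TEXT, user_id TEXT, order_ts TEXT);
INSERT INTO orders VALUES
  ('O1','U1','2026-01-03'),
  ('O2','U2','2026-01-20'),
  ('O3','U4','2026-01-05'),
  ('O4','U4','2026-01-06');
""")

# Per variant: total users and users with >=1 order within 7 days of exposure
rows = conn.execute("""
SELECT e.variant,
       COUNT(DISTINCT e.user_id) AS n_users,
       COUNT(DISTINCT CASE
                WHEN julianday(o.order_ts) - julianday(e.exposed_at)
                     BETWEEN 0 AND 7
                THEN e.user_id
              END) AS n_converted
FROM exposures e
LEFT JOIN orders o ON o.user_id = e.user_id
GROUP BY e.variant
""").fetchall()

counts = {variant: (n, x) for variant, n, x in rows}
n_c, x_c = counts["control"]
n_t, x_t = counts["treatment"]
p_c, p_t = x_c / n_c, x_t / n_t
p_hat = (x_c + x_t) / (n_c + n_t)  # pooled proportion
se = math.sqrt(p_hat * (1 - p_hat) * (1 / n_t + 1 / n_c))
z = (p_t - p_c) / se
print(p_c, p_t, p_t - p_c, z)  # on this toy data: 0.5 0.5 0.0 0.0
```

On the four sample users, both arms convert at 50% (U2's day-19 order falls outside the 7-day window), so the lift and $z$ both come out to 0, which is a useful reminder to eyeball the window logic rather than just the join.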
700+ ML coding problems with a live Python executor.
eBay's SQL rounds, from what candidates report, lean into transaction-level data where you need to reason about both sides of a marketplace interaction in a single query. The problem above captures that flavor: you're not just aggregating, you're thinking about relationships between entities. Build fluency with similar problems at datainterview.com/coding.
Test Your Readiness
How Ready Are You for eBay Data Scientist?
Question 1 of 10: Can you design an A/B test for a new search ranking tweak on eBay, including defining the primary metric, guardrail metrics, unit of randomization (user, session, listing), and the minimum detectable effect?
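If the minimum-detectable-effect part of a question like that feels shaky, it helps to have the standard per-arm sample-size approximation for a two-proportion test at your fingertips. This sketch uses the textbook normal-approximation formula; the baseline rate and MDE below are illustrative numbers, not eBay's.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_baseline, mde, alpha=0.05, power=0.80):
    """Normal-approximation per-arm n for a two-sided two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    p1, p2 = p_baseline, p_baseline + mde
    p_bar = (p1 + p2) / 2
    n = ((z_a * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / mde ** 2
    return math.ceil(n)

# Illustrative: detect a 5.0% -> 5.5% conversion lift at alpha=0.05, power=0.8
n_per_arm = sample_size_per_arm(0.05, 0.005)
print(n_per_arm)  # roughly 31k users per arm
```

The useful interview reflex is the inverse relationship: halving the MDE roughly quadruples the required sample, which is why "how small an effect do we actually care about?" is the first question to ask, not the last.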
Identify your weak spots, then drill the specific topic areas where you're least confident using datainterview.com/questions.
Frequently Asked Questions
How long does the eBay Data Scientist interview process take?
Most candidates report the full process taking about 4 to 6 weeks from first recruiter call to offer. You'll typically have a recruiter screen, a technical phone screen focused on SQL and stats, and then a virtual or onsite loop with 4 to 5 interviews. Some teams move faster, but don't be surprised if scheduling the final round adds a week or two.
What technical skills are tested in the eBay Data Scientist interview?
SQL is the backbone of every eBay DS interview, and I mean advanced SQL: window functions, complex joins, performance optimization. Python comes up for data wrangling and statistical modeling. You'll also get tested on A/B testing design and interpretation, working with large datasets, and your ability to frame ambiguous problems into structured analyses. At senior levels (T25+), expect deeper dives into causal inference and applied ML.
How should I tailor my resume for an eBay Data Scientist role?
Lead every bullet with measurable impact. eBay cares about connecting data work to business outcomes, so phrases like 'increased conversion by X%' or 'reduced churn by Y%' land well. Highlight experience with experimentation and A/B testing prominently since that's central to the role. If you've worked with large-scale data warehouses or collaborated cross-functionally with product teams, make that obvious. Keep it to one page for junior roles, two pages max for senior.
What is the total compensation for eBay Data Scientists by level?
Here's what I've seen from real data. T23 (Junior, 0-2 years): around $158K total comp with a $125K base. T24 (Mid, 3-6 years): about $187K TC on a $146K base. T25 (Senior, 5-10 years): roughly $230K TC with a $178K base. T26 (Staff, 8-18 years): around $329K TC on a $223K base. T27 (Principal): about $331K TC with a $252K base. Equity comes as RSUs on a standard 4-year vest, 25% per year.
How do I prepare for the behavioral interview at eBay?
eBay's core values are Customer Focus, Innovate Boldly, Be For Everyone, Deliver With Impact, and Act With Integrity. You need stories that map to these. Prepare 5 to 6 strong examples covering cross-functional collaboration, handling ambiguity, driving impact with data, and navigating disagreements. I've seen candidates get tripped up because they only prep technical stuff and treat behavioral rounds as an afterthought. Don't do that.
How hard are the SQL questions in eBay Data Scientist interviews?
They're solidly medium to hard. Expect multi-table joins, window functions like ROW_NUMBER and LAG, CTEs, and sometimes performance optimization questions. Junior candidates (T23) get slightly easier problems, but even those require comfort with subqueries and aggregations. I'd recommend practicing at datainterview.com/coding to get comfortable with the e-commerce style queries you'll likely see, things like calculating seller metrics or buyer retention.
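For the LAG-style retention queries mentioned above, a quick way to drill is again an in-memory SQLite database. This sketch computes days between consecutive orders per buyer; the toy schema and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT, buyer_id TEXT, order_date TEXT);
INSERT INTO orders VALUES
  ('O1','B1','2026-01-01'),
  ('O2','B1','2026-01-15'),
  ('O3','B2','2026-01-03'),
  ('O4','B1','2026-02-10');
""")

# Days since each buyer's previous order, via LAG over a per-buyer window
rows = conn.execute("""
SELECT buyer_id,
       order_date,
       julianday(order_date)
         - julianday(LAG(order_date) OVER (PARTITION BY buyer_id
                                           ORDER BY order_date)) AS days_since_prev
FROM orders
ORDER BY buyer_id, order_date
""").fetchall()
for row in rows:
    print(row)  # each buyer's first order has days_since_prev = None
```

Buyer B1's gaps come out as 14 and then 26 days; interviewers often follow up by asking you to bucket those gaps into retention cohorts, which is one more window function or CASE expression away.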
What machine learning and statistics concepts should I know for eBay?
A/B testing is the single most important topic. You need to understand hypothesis testing, p-values, confidence intervals, statistical power, and how to handle bias in experiments. For ML-focused roles, be ready to discuss model evaluation, tradeoffs between different algorithms, and when you'd choose one approach over another. At Staff level and above, causal inference methods beyond basic A/B testing become important. Brush up on practical interpretation, not just formulas.
What format should I use for behavioral answers at eBay?
Use the STAR format (Situation, Task, Action, Result) but keep it tight. I coach people to spend about 20% on setup and 80% on what you actually did and the outcome. Quantify your results whenever possible. eBay values 'Deliver With Impact,' so vague answers like 'the project went well' won't cut it. Say what moved: revenue, engagement, efficiency. Practice telling each story in under 2 minutes.
What happens during the eBay Data Scientist onsite interview?
The onsite (often virtual now) is typically 4 to 5 rounds spread across a day. You'll face a SQL/coding round, a statistics and experimentation round, a product sense or business case round, and at least one behavioral interview. For senior roles, expect a round focused on problem framing where you're given an ambiguous business question and need to structure an analytical approach. Each round is usually 45 to 60 minutes with different interviewers.
What business metrics and product concepts should I study for eBay interviews?
eBay is a two-sided marketplace, so you need to think about both buyer and seller metrics. Know concepts like GMV (gross merchandise volume), take rate, conversion rate, buyer retention, seller churn, and search relevance. Be ready to define success metrics for a new feature and propose guardrail metrics that ensure you're not hurting one side of the marketplace. Practice product case questions at datainterview.com/questions to build this muscle.
What education do I need to get hired as a Data Scientist at eBay?
For junior roles (T23), a BS in a quantitative field like CS, Statistics, Math, Economics, or Engineering is required. An MS or PhD is preferred for some teams but not always mandatory. At mid and senior levels, many hires have graduate degrees, especially on ML-focused teams. For Staff and Principal roles (T26, T27), a PhD or MS is typical, though strong equivalent industry experience can substitute. Bottom line: a BS can get you in the door, but advanced degrees help at higher levels.
What are common mistakes candidates make in eBay Data Scientist interviews?
The biggest one I see is treating the product sense round as optional prep. eBay wants data scientists who think like product partners, not just query writers. Another common mistake is giving textbook definitions of statistical concepts without connecting them to real decisions. When they ask about A/B testing, they want to hear how you'd actually design the test, pick metrics, and handle edge cases. Also, don't skip the 'why eBay' question. Show you understand their marketplace model and care about the mission of connecting buyers and sellers globally.