eBay Machine Learning Engineer Interview Guide

Dan Lee, Data & AI Lead
Last updated: March 16, 2026

eBay Machine Learning Engineer at a Glance

Total Compensation

$125k - $380k/yr

Interview Rounds

8 rounds

Difficulty

Levels

T22 - T26

Education

PhD

Experience

0–15+ yrs

Python, ecommerce-marketplace, search-ranking, ads-ml, recommendation-personalization, fraud-detection, trust-safety, real-time-inference, ml-platform-mle

eBay's ML engineers rank 2+ billion live listings where every piece of metadata is seller-generated and wildly inconsistent. That single constraint shapes everything about this role: the models you build, the validation cycles you run, and the infrastructure you operate. If you've only worked with clean, first-party catalog data, the adjustment is real.

eBay Machine Learning Engineer Role

Primary Focus

ecommerce-marketplace, search-ranking, ads-ml, recommendation-personalization, fraud-detection, trust-safety, real-time-inference, ml-platform-mle

Skill Profile


Math & Stats

High

Strong foundation expected in data analysis, model evaluation, and experimentation methodology (e.g., experiment design, validation, diagnosing regressions). Interview prep sources emphasize probability/statistics coverage. Not explicitly research-heavy, so typically not "expert".

Software Eng

High

Role emphasizes end-to-end productionization, designing reliable components, writing technical build/implementation plans, mentoring/reviewing, and integrating ML into live services; interview process includes DS&A/coding rounds.

Data & SQL

High

Explicit requirement to design/operate both batch and real-time inference pipelines with guarantees around correctness, reproducibility, and operational stability; handling noisy/large-scale data into reliable signals.

Machine Learning

High

Hands-on applied ML required: building and deploying predictive modeling solutions, modeling + experimentation + validation, accuracy evaluation, performance analysis, and iterative improvement in production.

Applied AI

Medium

Some evidence of transformer/modern model knowledge appearing in interview experiences; however, core posting focuses more broadly on predictive modeling and inference pipelines rather than explicit GenAI/LLM product work. Estimate is conservative.

Infra & Cloud

High

Production deployment is central (operationalizing models, high-traffic/high-reliability systems preferred). Close collaboration with platform/infra teams and ownership of operational debugging and architectural decisions are emphasized; specific cloud vendor details not explicit in provided sources.

Business

Medium

Must translate business/product intent into measurable production outcomes and operate from ambiguous problem statements; cross-team alignment and influencing stakeholders is repeatedly emphasized.

Viz & Comms

High

Strong technical communication explicitly required, including written design documentation and executive-level explanations; role involves aligning multiple teams and providing technical clarity.

What You Need

  • Production ML model development and deployment (predictive modeling solutions)
  • Model evaluation, performance analysis, and regression diagnosis
  • Experimentation methodology (design, validation, metrics/A-B style thinking)
  • Batch and real-time inference pipeline design and operation
  • Data analysis on imperfect/large-scale datasets; signal extraction from noisy data
  • Production-safe ML workflows enabling rapid experimentation
  • Integration of ML outputs into live systems with platform/application teams
  • Technical design documentation and implementation planning
  • Debugging and operational support for ML-based services
  • Cross-functional collaboration and influence across engineering/data/product

Nice to Have

  • Operationalizing ML in high-traffic, high-reliability systems
  • Distributed data processing familiarity (e.g., Spark-like patterns) (inferred from "distributed data processing" preference; specific tech varies)
  • Scalable inference architectures
  • Applied ML for classification/inference/decision-support systems
  • Ownership of shared ML workflows/platforms used by multiple teams

Languages

Python

Tools & Technologies

  • ML tooling in Python (unspecified in sources; likely common frameworks, uncertain)
  • Batch inference pipelines
  • Real-time/online inference pipelines
  • Experimentation and model validation tooling
  • Distributed data processing frameworks (preferred; specific stack not named in the job post source)
  • GPU inference stack (e.g., Triton) (mentioned in interview preparation source; may be team-dependent)
  • Workflow orchestration (e.g., Airflow) (mentioned in interview preparation source; may be team-dependent)
  • Feature store concepts (mentioned in interview preparation source; may be team-dependent)

You'll own production ML models for one of eBay's core surfaces: search ranking, promoted listings, recommendations, or fraud detection. These models serve a marketplace where sellers create their own listings with unpredictable quality, so your day-to-day involves as much data debugging and production validation as it does modeling. Year-one success looks like shipping a model change that moves an online metric (search relevance, ad click-through, fraud catch rate) while earning enough operational trust to own your deployment pipeline end-to-end.

A Typical Week

A Week in the Life of an eBay Machine Learning Engineer

Typical L5 workweek · eBay

Weekly time split

Coding 30% · Meetings 18% · Analysis 15% · Infrastructure 12% · Writing 10% · Break 10% · Research 5%

Culture notes

  • eBay runs at a steady large-company pace — weeks are structured but not frantic, and most engineers protect focused afternoon blocks for deep work without guilt.
  • eBay requires hybrid in-office attendance (typically Tuesday through Thursday at the San Jose campus), with Monday and Friday commonly worked from home.

The ratio of production validation to model architecture work will surprise you. Tracing why a listing quality score returns stale values from the feature store, or investigating why item condition fields are blank across an entire category like Collectibles, eats more hours than hyperparameter tuning. The mid-week deep modeling block (refactoring a two-tower retrieval model, adjusting negative sampling) is real, but you earn it by clearing Monday's operational debt first.

Projects & Impact Areas

Search ranking is the gravitational center, where retrieval and re-ranking models must handle seller-created inventory with no standardized catalog to fall back on. Promoted listings ML has grown increasingly strategic as eBay's advertising revenue expands, and those models must balance ad relevance against organic results without eroding buyer trust. Trust & Safety teams (based in Toronto and San Jose) tackle a different flavor of the problem: fraud detection where adversaries actively adapt, forcing retraining and monitoring cycles far tighter than what search ranking requires.

Skills & What's Expected

Deployment and infrastructure skills matter more here than at most peer companies. The provided data rates infrastructure/cloud deployment as "high" and notes that specific cloud vendor details aren't standardized across teams, so you can't assume a single managed platform in system design answers. Modern AI and GenAI knowledge is rated "medium," not negligible. eBay has shipped AI-powered listing tools and the role specialization explicitly includes transformer-based models, so dismissing GenAI entirely would be a mistake. The sweet spot is someone who pairs classical ranking and recommendation depth with real production engineering chops (pipeline orchestration, inference serving, monitoring).

Levels & Career Growth

eBay Machine Learning Engineer Levels

Each level has different expectations, compensation, and interview focus.

Base

$105k

Stock/yr

$12k

Bonus

$8k

0–2 yrs BS in Computer Science, Engineering, Mathematics, Statistics or related field (MS preferred) or equivalent practical experience.

What This Level Looks Like

Contributes to a single ML component or small end-to-end feature (model, data pipeline, evaluation, or serving integration) within an established ML system. Impact is typically limited to one team’s roadmap and measured via offline metrics and small online experiments under guidance.

Day-to-Day Focus

  • Strong software engineering fundamentals (readability, testing, reliability)
  • ML fundamentals (supervised learning, evaluation, overfitting, feature leakage)
  • Data proficiency (SQL, data validation, pipeline basics)
  • Production awareness (latency, scalability basics, monitoring, rollback)

Interview Focus at This Level

Emphasis on coding ability (Python and/or another backend language), core ML knowledge (model selection, evaluation, leakage, bias/variance), practical data/SQL skills, and ability to reason about taking a model from notebook to production with basic MLOps hygiene (testing, monitoring, reproducibility). System design is lightweight and focused on simple ML pipeline/serving components rather than large-architecture ownership.
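
The leakage point above is where early-career candidates most often stumble. A toy illustration (hypothetical events, not eBay data) of why features must be computed point-in-time relative to the impression:

```python
from datetime import datetime

# Hypothetical cancellation events for one seller.
cancels = [datetime(2026, 1, d) for d in (2, 5, 9, 20)]

def cancel_count_leaky(as_of: datetime) -> int:
    # Leaky: counts every cancel ever logged, including ones that happened
    # AFTER the impression we are building a training row for.
    return len(cancels)

def cancel_count_point_in_time(as_of: datetime) -> int:
    # Correct: only events known at feature-computation time.
    return sum(1 for t in cancels if t < as_of)

as_of = datetime(2026, 1, 10)
assert cancel_count_leaky(as_of) == 4          # sees the future
assert cancel_count_point_in_time(as_of) == 3  # what serving would have seen
```

The leaky version inflates offline metrics and then underperforms online, which is exactly the regression pattern interviewers probe.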

Promotion Path

Promotion to the next level typically requires independently owning a well-scoped ML feature end-to-end (data + model + deployment), consistently delivering high-quality production changes, demonstrating strong debugging/operational excellence (monitoring, incident follow-up), improving team velocity via reusable components or automation, and showing growing autonomy in scoping work and communicating tradeoffs to stakeholders.

T24 (Senior) and T25 (Staff) are the levels that appear most often in current external ML job postings. The jump from T24 to T25 is the hardest promotion because it demands multi-team influence and platform-level thinking, not just shipping great models within your own pod. eBay uses "MTS" (Member of Technical Staff) titles alongside T-levels, which confuses external candidates: MTS-1 maps roughly to T23, MTS-2 to T24, and T26 Principal MTS roles show up in active postings with real architectural scope.

Work Culture

eBay's hybrid model has most ML engineers in-office Tuesday through Thursday at the San Jose campus (Austin and Bengaluru are the other major hubs), with Monday and Friday commonly remote. With roughly 12,000 employees globally, you get more direct product influence than you would at a company ten times that size, though internal mobility options are narrower if your team's charter shifts. eBay's mission around "economic opportunity for all" shows up in practice: ML teams actively debate ranking fairness for small sellers rather than optimizing purely for GMV.

eBay Machine Learning Engineer Compensation

eBay does not publicly disclose its RSU vesting schedule. The data that exists for other large tech companies (Amazon's backloaded structure, Google's even quarterly vest) simply can't be projected onto eBay. Ask your recruiter directly about cliff periods, vesting cadence, and, most importantly, refresh grant policies before you sign anything.

The negotiation data provided in the widget shows wide bands at T25 and T26, with a $220k+ spread between min and max total comp. That range is your signal: at Staff and above, there's real room to move. Your strongest lever is anchoring on the specific scope of the role you're being hired for (owning search ranking models across 2B+ listings, for example, or leading fraud detection for a global marketplace) rather than relying on generic competing-offer tactics. Equity and sign-on amounts are where conversations tend to have the most flexibility, so push there with concrete numbers tied to what you'd be leaving behind.

eBay Machine Learning Engineer Interview Process

8 rounds · ~4 weeks end to end

Initial Screen

2 rounds
1

Recruiter Screen

30m · Phone

Kick off with a recruiter conversation focused on role fit, location/remote expectations, and compensation alignment. You’ll walk through your resume with emphasis on ML projects that shipped to production (search, ads, personalization, fraud). Expect light probing on availability, work authorization, and what team/domain you’re targeting inside the marketplace ecosystem.

general, behavioral

Tips for this round

  • Prepare a 60-second pitch that names 1-2 shipped ML systems, the metric moved (e.g., CTR, NDCG, fraud catch rate), and the scale (QPS, latency, data volume).
  • Have a crisp compensation range ready and ask how eBay structures base/bonus/RSUs for this level before giving a hard number.
  • Clarify domain preferences early (search ranking vs ads relevance vs trust/fraud) and tie your experience to that area.
  • Share concrete tooling you’ve used end-to-end (Python, Spark, Airflow, Kubernetes, Triton/TF-Serving) to signal production maturity.
  • Ask about interview format and whether a long final loop is expected so you can plan stamina and scheduling.

Technical Assessment

4 rounds
3

Coding & Algorithms

60m · Video Call

Expect a live coding session where you solve one or two problems under time pressure, typically in Python. The interviewer will care about correctness, complexity, and how you communicate your approach while debugging. Some prompts may be ML-adjacent (ranking, retrieval, string/array processing) but are graded like standard SWE problems.

algorithms, data_structures, engineering, ml_coding

Tips for this round

  • Practice writing clean Python with tests-in-head: edge cases, empty inputs, duplicates, and time/space complexity commentary.
  • Use a consistent problem-solving template: restate, list constraints, propose approach, then code and validate with examples.
  • Be fluent with hash maps, heaps, two pointers, BFS/DFS, and sorting-based techniques since these often appear in ranking/relevance contexts.
  • Narrate tradeoffs and add quick sanity checks (assertions, small dry-runs) to reduce mistakes.
  • If stuck, propose a baseline solution first, then optimize—showing progression matters in ambiguous setups.

Onsite

2 rounds
7

System Design

60m · Video Call

During the final loop, you’ll be asked to design an end-to-end ML system such as real-time search ranking, ads relevance, or fraud detection. Expect questions on data ingestion, feature stores, training cadence, online serving, latency budgets, and experimentation/rollout strategy. The interviewer will challenge scalability and reliability choices, especially around real-time inference and A/B testing infrastructure.

ml_system_design, system_design, cloud_infrastructure, data_pipeline

Tips for this round

  • Start with requirements: target metric, latency/QPS, freshness needs, and failure modes (fallback ranking, circuit breakers).
  • Propose a two-stage architecture (retrieval + ranking) and specify what runs online vs offline with clear boundaries.
  • Include an experimentation plan: logging, exposure attribution, guardrails, and rollback criteria.
  • Address observability: data quality checks, drift, model performance by segment, and alerting tied to business KPIs.
  • Mention inference optimization techniques (caching, batching, distillation, quantization, Triton-style serving) when latency is tight.

Tips to Stand Out

  • Tell one end-to-end production story. Pick a flagship ML project and be able to explain data collection, labeling, leakage prevention, offline metrics, online A/B design, deployment, monitoring, and rollback with numbers (traffic, latency, lift).
  • Bias toward marketplace-relevant examples. Frame answers around search relevance, ads ranking, personalization, or fraud/trust, using metrics like CTR/CVR, NDCG, GMV, chargeback rate, and seller/buyer experience guardrails.
  • Prepare for ambiguity in prompts. Practice clarifying questions (baseline rates, constraints, attribution windows, data availability) because candidates report unclear problem framing and you’ll be graded on how you impose structure.
  • Be fluent in experimentation details. Expect to discuss sample size/power, novelty effects, multiple testing, and segmentation; be ready with variance reduction ideas (CUPED/stratification) and robust guardrails.
  • Rehearse ML system design with latency constraints. Marketplace systems often require real-time inference; practice architectures with retrieval+ranking, feature stores, streaming updates, caching, and inference optimization.
  • Communicate while coding. In live coding, narrate tradeoffs, write small checks, and keep code readable; many rejections come from silent debugging and missed edge cases rather than lack of knowledge.
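
If CUPED is on your prep list, be able to write the adjustment from memory: $Y_{adj} = Y - \theta(X - \bar{X})$ with $\theta = \mathrm{cov}(X, Y)/\mathrm{var}(X)$. A minimal sketch on synthetic data (not any eBay pipeline):

```python
import random

random.seed(42)

def variance(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def cuped_adjust(pre, post):
    """CUPED: Y_adj = Y - theta * (X - mean(X)), theta = cov(X, Y) / var(X)."""
    n = len(post)
    mx = sum(pre) / n
    my = sum(post) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pre, post)) / n
    theta = cov / variance(pre)
    return [y - theta * (x - mx) for x, y in zip(pre, post)]

# Synthetic users: post-period metric correlates with pre-period activity.
pre = [random.gauss(10, 3) for _ in range(5000)]
post = [0.8 * x + random.gauss(0, 1) for x in pre]

adjusted = cuped_adjust(pre, post)
assert variance(adjusted) < variance(post)                     # variance shrinks
assert abs(sum(adjusted) / 5000 - sum(post) / 5000) < 1e-9     # mean preserved
```

The adjustment leaves the treatment-effect estimate unbiased (the mean is unchanged) while shrinking variance, which is why it shows up in marketplace experimentation discussions.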

Common Reasons Candidates Don't Pass

  • Unstructured answers under ambiguity. Candidates get rejected when they jump into solutions without clarifying assumptions, success metrics, or constraints, leading to misaligned designs and incorrect conclusions.
  • Weak production/MLOps depth. Not being able to discuss monitoring, drift, data quality, rollout strategies, or latency/QPS tradeoffs signals notebook-only experience and is a frequent blocker for ML engineer roles.
  • Experimentation gaps. Misinterpreting A/B results, ignoring guardrails, or failing to address multiple testing/peeking makes it hard to trust you with marketplace-impacting launches.
  • Coding fundamentals issues. Struggling with standard data structures, complexity analysis, or edge cases in live sessions can outweigh strong ML knowledge.
  • Metrics not tied to business impact. Talking only about model accuracy without connecting to online KPIs (CTR/CVR/GMV, fraud loss, user trust) suggests poor product sense for applied ML.

Offer & Negotiation

For Machine Learning Engineer offers at a large public tech company like eBay, compensation is typically a mix of base salary, annual cash bonus (often tied to company and individual performance), and RSUs that vest over multiple years (commonly 4 years with periodic vesting). The most negotiable levers are base (within level band), initial equity/RSU grant, sign-on bonus (especially if you’re leaving unvested equity), and sometimes level/title if interview feedback supports it. Ask for the exact leveling, equity vesting schedule, and bonus target, then negotiate by anchoring on competing offers and the scope/impact of the role (e.g., owning ranking models in a high-traffic surface) while staying within bands.

The Hiring Manager Screen is where most candidates lose momentum. eBay's ML hiring managers tend to probe deeply on one past project, asking you to walk through offline evaluation, online A/B results, and production constraints like latency budgets or monitoring. From what candidates report, surface-level answers about "building a model" without specific metrics (NDCG lift, fraud catch rate, serving latency) make it hard to advance past this stage.

The SQL & Data Modeling round is the other common stumble point, especially for ML engineers who've lived in notebook environments. eBay's marketplace data involves massive event tables (impressions, clicks, purchases across 2B+ listings) and interviewers expect fluent window functions and complex joins under time pressure. If you're strong on ML but rusty on SQL, that gap alone can cost you the offer, so budget real prep time there.

eBay Machine Learning Engineer Interview Questions

ML System Design (Ranking/Ads/Personalization)

Expect questions that force you to design an end-to-end ML-powered service (candidate generation + ranking, feature computation, online/offline parity) under real marketplace constraints like latency, freshness, and scale. You’ll be evaluated on tradeoffs, failure modes, and how you make the system measurable and debuggable.

Design an online search ranking service for eBay where query, user, and listing features come from both real-time events (clicks, add-to-cart) and batch aggregates, with a hard $P99 \le 120\text{ ms}$ budget for the ranking call. What is your feature computation and serving plan to guarantee offline/online parity and safe fallbacks when features are missing or stale?

Easy · Online Feature Serving and Parity

Sample Answer

Most candidates default to a single feature store call and assume the same features exist offline and online, but that fails here because latency and freshness constraints force different pipelines and create silent training/serving skew. You separate features into tiers: request-time cheap features computed in process, precomputed per-user and per-listing features served from low-latency stores, and expensive cross features approximated or dropped online. You enforce parity with a shared feature spec and logging of resolved feature values, plus offline backfills that replay the same resolution logic. You ship degradations: default values with missingness indicators, cached last-known values with TTLs, and a rules-based safe ranker when critical features fail.
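
That degradation path is easy to sketch in a few lines; the cache layout and function names below are illustrative, not an eBay API:

```python
import time

# Illustrative last-known-value cache with TTLs; names are hypothetical.
_cache: dict = {}
DEFAULTS = {"seller_7d_cancel_rate": 0.0}
TTL_SECONDS = 300

def resolve_feature(key: str, name: str, fetch) -> tuple:
    """Return (value, missing_indicator), degrading gracefully on failure."""
    try:
        value = fetch(key)  # e.g. a low-latency feature store lookup
        _cache[(key, name)] = (value, time.time())
        return value, 0.0
    except Exception:
        cached = _cache.get((key, name))
        if cached and time.time() - cached[1] < TTL_SECONDS:
            return cached[0], 0.0  # fall back to last-known value within TTL
        return DEFAULTS.get(name, 0.0), 1.0  # default + missingness flag

def failing_fetch(key):
    raise RuntimeError("feature store timeout")  # simulated outage

ok = resolve_feature("s1", "seller_7d_cancel_rate", lambda k: 0.12)
warm = resolve_feature("s1", "seller_7d_cancel_rate", failing_fetch)
cold = resolve_feature("s2", "seller_7d_cancel_rate", failing_fetch)
assert ok == (0.12, 0.0) and warm == (0.12, 0.0) and cold == (0.0, 1.0)
```

The missingness indicator matters: the model should be trained with the same flag so that degraded serving stays inside the training distribution.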

Practice more ML System Design (Ranking/Ads/Personalization) questions

MLOps & Production Inference

Most candidates underestimate how much the interview probes operational ownership: deployment patterns, rollouts/rollbacks, monitoring, drift/quality checks, and incident response for models in high-traffic services. The focus is on making model updates safe while keeping iteration speed high.

Your real-time search ranking model for eBay listings is rolled out to 10% traffic and CTR is flat, but conversion rate drops 0.8% and p99 latency increases by 25 ms. What production monitoring and rollback gates do you put in place to catch this within 10 minutes, and how do you decide whether to auto-rollback?

Easy · Monitoring and Rollback Gates

Sample Answer

Set SLO-based auto-rollback gates on conversion rate (primary), p99 latency, and critical error rates, then page on-call when any gate breaches for 2 to 3 consecutive windows. CTR is not a safety gate here because it can stay flat while you harm downstream conversion or buyer experience. Use near-real-time metrics keyed by experiment cell, query class, and device, with 1 to 2 minute aggregation windows and guardrails like $\Delta\mathrm{CVR} < -\tau$ and $\mathrm{p99} > \mathrm{p99}_{\text{baseline}} + \delta$. Add a canary holdback plus a hard stop if latency or error-budget burn exceeds threshold, because infra regressions can mask model quality signals.
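
The consecutive-window breach rule is worth being able to write on the spot. A minimal sketch with made-up thresholds:

```python
def should_rollback(windows, cvr_drop_limit=-0.005, p99_limit_ms=150.0,
                    consecutive=3):
    """Auto-rollback when any gate breaches for N consecutive windows.

    windows: per-window deltas vs. control, newest last, e.g.
    {"cvr_delta": -0.008, "p99_ms": 162.0}. Thresholds are illustrative.
    """
    if len(windows) < consecutive:
        return False
    recent = windows[-consecutive:]
    cvr_breach = all(w["cvr_delta"] < cvr_drop_limit for w in recent)
    lat_breach = all(w["p99_ms"] > p99_limit_ms for w in recent)
    return cvr_breach or lat_breach

windows = [
    {"cvr_delta": -0.001, "p99_ms": 120.0},
    {"cvr_delta": -0.008, "p99_ms": 122.0},
    {"cvr_delta": -0.009, "p99_ms": 119.0},
    {"cvr_delta": -0.007, "p99_ms": 121.0},
]
assert should_rollback(windows)  # CVR gate breached three windows running
```

Requiring consecutive breaches trades a few minutes of detection latency for far fewer false-positive rollbacks from noisy single windows.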

Practice more MLOps & Production Inference questions

Machine Learning (Applied Modeling & Evaluation)

Your ability to choose models, features, and metrics for ranking/ads/recs/trust & safety is tested through practical scenarios (sparse categorical signals, cold start, imbalance, delayed labels). You’ll need to explain validation strategy, diagnose regressions, and link offline metrics to online outcomes.

You are shipping a new learning-to-rank model for eBay Search and offline NDCG@10 improves by 1.2%, but the model reduces long-tail item exposure. What offline metric set and validation slice strategy do you use to decide whether to launch, and why?

Easy · Ranking Evaluation

Sample Answer

You could optimize on a pure relevance metric like NDCG@10, or you could use a balanced scorecard that includes relevance plus marketplace health metrics (long-tail exposure, coverage, seller diversity). A pure relevance focus wins if the business goal is narrowly clicks on head queries, but the scorecard wins here because eBay is a two-sided marketplace and long-tail exposure is a first-order constraint. Validate by slicing on query frequency buckets (head, torso, tail), cold-start items, and seller cohorts so you can see where the gain comes from and where you are harming the ecosystem. Ship only if gains are not concentrated in one bucket while tail exposure regresses beyond a predefined guardrail.
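
Slicing offline metrics by bucket is mechanical once you have per-query scores; a minimal sketch (the bucket labels and values are hypothetical):

```python
def sliced_mean(metric_by_query, slice_of):
    """Average a per-query metric within slices (e.g. head/torso/tail)."""
    sums, counts = {}, {}
    for qid, value in metric_by_query.items():
        s = slice_of.get(qid, "unknown")
        sums[s] = sums.get(s, 0.0) + value
        counts[s] = counts.get(s, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}

# Hypothetical per-query NDCG@10 and query-frequency buckets.
ndcg = {"q1": 0.9, "q2": 0.4, "q3": 0.8, "q4": 0.2}
buckets = {"q1": "head", "q2": "tail", "q3": "head", "q4": "tail"}
by_slice = sliced_mean(ndcg, buckets)
assert abs(by_slice["head"] - 0.85) < 1e-9
assert abs(by_slice["tail"] - 0.3) < 1e-9
```

An aggregate lift that hides a tail regression shows up immediately in a table like this, which is the point of the validation strategy above.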

Practice more Machine Learning (Applied Modeling & Evaluation) questions

Data Pipelines & Feature Engineering at Scale

Rather than trivia, you’re judged on whether you can build reproducible batch + streaming pipelines that keep training and serving consistent (feature stores, backfills, late data, idempotency). Candidates often struggle to articulate data contracts, lineage, and correctness guarantees.

Your search ranking model uses a feature "seller_7d_cancel_rate" computed daily in batch, but online inference reads real-time cancellations from Kafka and you see an offline AUC lift with an online CTR drop. What exact checks and pipeline changes do you make to prove and then eliminate training/serving skew for this feature?

Medium · Training/Serving Skew

Sample Answer

Start by validating that the feature definitions match byte for byte (same numerator, denominator, time zone, and inclusion rules), then compare offline feature values against online logged feature values for the same $(user, item, timestamp)$ samples. Next, check time-travel correctness: the batch job likely uses a full day of data while streaming uses event time with late arrivals, so your training set may be leaking future cancels relative to the impression time. Then enforce a single source of truth, either an online feature store with point-in-time reads for training or logged features from serving used directly for training. Finally, add automated skew monitors, distribution drift checks, and a canary that diffs batch versus streaming aggregates on a rolling window with thresholds that page.
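
The offline-versus-online value comparison can be sketched directly; the keys, feature values, and tolerance here are illustrative:

```python
def feature_skew_rate(offline, online, tol=1e-6):
    """Fraction of shared (entity, time) keys whose offline and online
    resolved feature values disagree beyond a tolerance."""
    shared = offline.keys() & online.keys()
    if not shared:
        return 0.0
    bad = sum(1 for k in shared if abs(offline[k] - online[k]) > tol)
    return bad / len(shared)

# Hypothetical logged values for seller_7d_cancel_rate keyed by (seller, hour).
offline = {("s1", 10): 0.10, ("s2", 10): 0.25, ("s3", 10): 0.05}
online = {("s1", 10): 0.10, ("s2", 10): 0.31, ("s3", 10): 0.05}
assert abs(feature_skew_rate(offline, online) - 1 / 3) < 1e-9
```

In practice this diff runs on a rolling window, and the rate (not individual mismatches) is what you alert on, since late-arriving events make some transient disagreement unavoidable.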

Practice more Data Pipelines & Feature Engineering at Scale questions

Experimentation & Metrics (A/B, Marketplace Constraints)

The bar here isn’t whether you know A/B testing terms, it’s whether you can design experiments for search/ads systems with interference, seasonality, and multiple objectives (CTR vs revenue vs buyer/seller trust). You’ll be pushed to define guardrails, success criteria, and debugging steps when results disagree.

You run an A/B test on eBay search ranking that increases CTR by 0.8% but decreases GMV per search by 0.3% and increases return rate by 0.1 pp. What is your primary success metric, what are your guardrails, and what decision do you ship?

Easy · Multi-metric Success Criteria and Guardrails

Sample Answer

This question is checking whether you can translate business intent into a measurable win condition, then stop the experiment from optimizing the wrong thing. Pick one primary metric tied to the goal (for search, typically GMV per search or buyer conversion), and treat CTR as a diagnostic, not the objective. Add guardrails like return rate, cancel rate, buyer complaints, latency, and zero-results rate, and require the primary metric to win while guardrails stay within pre-set thresholds. If the primary metric loses or a trust metric regresses, you do not ship; you segment and diagnose.
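
Pre-registering that win condition makes the ship call mechanical. The sketch below uses made-up thresholds, with all deltas oriented so negative means worse:

```python
def ship_decision(primary_delta, guardrail_deltas, guardrail_limits):
    """Ship only if the primary metric wins and every guardrail stays within
    its pre-registered limit. Deltas are oriented so negative means worse."""
    if primary_delta <= 0:
        return "no-ship"
    for name, limit in guardrail_limits.items():
        if guardrail_deltas.get(name, 0.0) < limit:
            return "no-ship"
    return "ship"

# The scenario from the question: GMV per search (primary) is down 0.3%,
# return rate is up 0.1 pp (expressed as a negative delta = regression).
decision = ship_decision(
    primary_delta=-0.003,
    guardrail_deltas={"return_rate": -0.001},
    guardrail_limits={"return_rate": -0.0005},  # illustrative pre-set limit
)
assert decision == "no-ship"
```

Writing the thresholds down before launch is the whole point: it removes the temptation to rationalize a CTR win after the fact.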

Practice more Experimentation & Metrics (A/B, Marketplace Constraints) questions

ML Coding (Python for Modeling/Signals)

Coding prompts typically mirror day-to-day applied work: compute ranking metrics, build a simple training/eval loop, implement calibration or thresholding, or process logged events into features. Clear, correct code plus reasoning about edge cases matters more than fancy algorithms.

You are evaluating an eBay search ranker and have per-query graded relevance labels $y \in \{0,1,2,3\}$ for returned items. Write a function to compute $\mathrm{NDCG@k}$ averaged over queries, handling ties and queries with all-zero relevance safely.

Easy · Ranking Metrics

Sample Answer

The standard move is to compute $\mathrm{DCG@k}=\sum_{i=1}^{k} \frac{2^{rel_i}-1}{\log_2(i+1)}$ and divide by the ideal $\mathrm{IDCG@k}$ per query, then average over queries. But here, the all-zero (or empty) query matters because $\mathrm{IDCG@k}=0$ makes the ratio undefined, so you must decide and implement a consistent policy (typically return $0.0$ for that query).

Python
from __future__ import annotations

import math
from typing import Dict, List, Sequence, Tuple


def ndcg_at_k(
    per_query_results: Dict[str, Sequence[Tuple[float, int]]],
    k: int,
    *,
    zero_idcg_policy: str = "zero",
) -> float:
    """Compute mean NDCG@k across queries.

    Args:
        per_query_results: Mapping query_id -> sequence of (score, relevance).
            The sequence can be in any order; it will be sorted by score desc.
        k: Cutoff.
        zero_idcg_policy: What to return for a query with IDCG@k == 0.
            "zero" -> NDCG = 0.0, "skip" -> exclude from mean.

    Returns:
        Mean NDCG@k.
    """
    if k <= 0:
        raise ValueError("k must be positive")

    def dcg(rels: List[int]) -> float:
        total = 0.0
        for i, rel in enumerate(rels[:k], start=1):
            gain = (2 ** rel) - 1
            denom = math.log2(i + 1)
            total += gain / denom
        return total

    ndcgs: List[float] = []

    for qid, items in per_query_results.items():
        # Sort by model score descending. Stable sort helps deterministic tie handling.
        sorted_by_score = sorted(items, key=lambda x: x[0], reverse=True)
        rels_ranked = [rel for _, rel in sorted_by_score]
        rels_ideal = sorted([rel for _, rel in items], reverse=True)

        dcg_k = dcg(rels_ranked)
        idcg_k = dcg(rels_ideal)

        if idcg_k == 0.0:
            if zero_idcg_policy == "skip":
                continue
            if zero_idcg_policy == "zero":
                ndcgs.append(0.0)
                continue
            raise ValueError(f"Unknown zero_idcg_policy: {zero_idcg_policy}")

        ndcgs.append(dcg_k / idcg_k)

    return float(sum(ndcgs) / len(ndcgs)) if ndcgs else 0.0


if __name__ == "__main__":
    # Example usage
    data = {
        "q1": [(0.9, 3), (0.2, 0), (0.1, 2)],
        "q2": [(0.3, 0), (0.2, 0)],  # all-zero relevance
    }
    print(ndcg_at_k(data, k=3))
Practice more ML Coding (Python for Modeling/Signals) questions

eBay's question mix rewards candidates who can trace a model from whiteboard sketch all the way through a canary rollout on live marketplace traffic, then diagnose why conversion dropped even though CTR held steady. Where this gets uniquely hard is the interplay between design and operations: a search reranker prompt doesn't end when you draw the architecture, because interviewers will pivot to asking how you'd detect training/serving skew when seller-generated listing metadata shifts daily in ways a curated catalog never would. The prep mistake that costs the most time is drilling applied modeling theory in isolation while ignoring the experimentation scenarios where eBay's two-sided marketplace creates interference (ranking changes alter seller pricing behavior, contaminating your control group) and the pipeline scenarios where late-arriving click data breaks feature parity.

Sharpen your prep with eBay-style ML interview questions at datainterview.com/questions.

How to Prepare for eBay Machine Learning Engineer Interviews

Know the Business

Updated Q1 2026

Official mission

We connect people and build communities to create economic opportunity for all.

What it actually means

eBay's real mission is to facilitate global commerce by connecting millions of buyers and sellers, providing a platform for economic opportunity, and offering a vast and unique selection of goods. It aims to be the preferred destination for discovering value and unique items, particularly focusing on enthusiast buyers and high-value categories.

San Jose, CaliforniaHybrid - Flexible

Key Business Metrics

Revenue

$11B

+15% YoY

Market Cap

$39B

+26% YoY

Employees

12K

-6% YoY

Current Strategic Priorities

  • Transform through innovation, investment, and powerful tools designed to fuel sellers’ growth
  • Accelerate innovation using AI to make selling smarter, faster, and more efficient
  • Enhance trust throughout the marketplace
  • Connect the right buyers to unique inventory
  • Create more personalized, inspirational shopping experiences for all

eBay's Q4 2025 earnings show $11.1 billion in revenue (up 15% YoY), and the company's north star priorities center on AI-powered seller tools, search personalization, and trust. Their engineering team published a candid breakdown of GenAI's actual impact on developer productivity, measuring what works and what doesn't rather than chasing hype. Read it before your loop, because eBay's ML culture skews empirical, and interviewers notice when candidates can speak to that measurement-first posture.

The "why eBay" answer that falls flat is any version of "marketplace scale excites me." What separates strong candidates: connecting your experience to a problem only eBay's two-sided marketplace creates. Maybe you've dealt with ranking under inconsistent metadata (eBay's 2B+ seller-generated listings have wildly uneven quality), or you've run A/B tests where treatment effects leak across user groups the way seller behavior shifts do when eBay changes ranking. eBay also designs its own server hardware and has open-sourced those designs, so if you've optimized model serving under hardware constraints rather than just scaling up cloud instances, say so.

Try a Real Interview Question

Streaming AUC for Click Prediction


Implement a function that computes ROC AUC for a binary label stream given $y_i \in \{0,1\}$ and predicted score $s_i \in \mathbb{R}$, where ties in $s$ must be handled by assigning average rank. Return the AUC as a float in $[0,1]$, and return $0.5$ if all labels are the same. Your implementation must be $O(n\log n)$ time and should not use external libraries.

Python
def roc_auc(y_true, y_score):
    """Compute ROC AUC for binary labels with tie-aware average ranks.

    Args:
        y_true: List[int] of 0/1 labels.
        y_score: List[float] of predicted scores.

    Returns:
        float AUC in [0, 1]. If y_true has no positive or no negative labels, return 0.5.
    """
    pass
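If you want to check your attempt afterward, here is one possible O(n log n) solution using the rank-sum (Mann-Whitney U) identity with average ranks for ties. Treat it as a sketch of one valid approach, not the canonical grader answer.

```python
def roc_auc(y_true, y_score):
    """Tie-aware ROC AUC via the rank-sum identity:
    AUC = (R_pos - n_pos*(n_pos+1)/2) / (n_pos * n_neg),
    where R_pos is the sum of average ranks of the positive examples."""
    n = len(y_true)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.5  # degenerate stream: all labels identical

    order = sorted(range(n), key=lambda i: y_score[i])  # O(n log n)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores.
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The rank-sum identity avoids the naive O(n²) comparison of every positive/negative pair, and assigning each tied run its average rank gives tied pairs the 0.5 credit the problem statement requires.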


eBay's ML coding round leans toward problems where you process marketplace signals or build scoring logic in Python, not isolated algorithm exercises. The widget above gives you a feel for that flavor. Build your muscle memory with more ML-oriented coding problems at datainterview.com/coding.

Test Your Readiness

How Ready Are You for eBay Machine Learning Engineer?

ML System Design

Can you design a ranking system for eBay search that combines candidate generation, feature retrieval, and learning to rank, and explain how you would optimize for both buyer satisfaction and marketplace health?

This quiz covers the mix of ranking, experimentation, and production ML topics that eBay's onsite emphasizes. Identify your weak spots, then go deeper at datainterview.com/questions.

Frequently Asked Questions

How long does the eBay Machine Learning Engineer interview process take?

From first recruiter call to offer, most candidates report the eBay MLE process takes about 4 to 6 weeks. You'll typically have a recruiter screen, a technical phone screen focused on coding and ML basics, and then a virtual or onsite loop with 4 to 5 rounds. Scheduling can stretch things out, especially if the team is busy, so don't be surprised if it takes closer to 7 weeks in some cases.

What technical skills are tested in the eBay Machine Learning Engineer interview?

Python coding is non-negotiable. Beyond that, expect questions on production ML model development and deployment, model evaluation and performance analysis, batch and real-time inference pipeline design, and experimentation methodology like A/B testing. They also care about your ability to work with large, noisy datasets and integrate ML outputs into live systems. At senior levels (T24+), system design for ML becomes a major focus.

How should I tailor my resume for an eBay MLE role?

Lead with production ML experience, not just research or Kaggle projects. eBay wants people who've deployed models, monitored them in production, and debugged issues in live systems. Quantify your impact with metrics (latency improvements, revenue lift, precision/recall gains). Mention Python explicitly, and call out experience with feature engineering, experimentation design, or real-time inference if you have it. If you're applying at T25 or T26, highlight cross-functional leadership and end-to-end ownership of ML systems.

What is the total compensation for eBay Machine Learning Engineers by level?

Here's what I've seen in the data. T22 (Junior, 0-2 years): total comp around $125K with a range of $95K to $160K. T23 (Mid, 2-5 years): about $165K, ranging $155K to $190K. T24 (Senior, 6-10 years): roughly $151K to $173K. T25 (Staff, 8-15 years): this is where it jumps, with total comp averaging $380K and a range of $300K to $520K. T26 (Principal): averages $276K but can reach $526K at the top end. Base salaries range from about $105K at junior to $254K at principal.

How do I prepare for the behavioral interview at eBay for a Machine Learning Engineer position?

eBay's core values are Customer Focus, Innovate Boldly, Be For Everyone, Deliver With Impact, and Act With Integrity. Prepare stories that map to these. They want to hear about times you shipped something that directly helped users, took a bold technical bet, collaborated across diverse teams, or made a tough ethical call. I'd have at least 6 to 8 stories ready that you can adapt to different prompts. Focus on cross-functional collaboration since MLE work at eBay involves tight partnership with engineering, data, and product teams.

How hard are the coding and SQL questions in the eBay MLE interview?

The coding questions are practical software engineering style, not pure competitive programming. Think data structures, algorithms, and writing clean Python that could go into production. Difficulty is roughly medium, occasionally medium-hard for senior levels. SQL comes up more at junior and mid levels (T22, T23) as part of data skills assessment. You should be comfortable with window functions, joins, and aggregations on messy datasets. Practice at datainterview.com/coding to get a feel for the style.

What ML and statistics concepts should I study for the eBay Machine Learning Engineer interview?

Bias-variance tradeoff comes up constantly, across every level. You also need solid understanding of model evaluation metrics (precision, recall, AUC, calibration), regularization techniques, feature engineering, and common model families (tree-based models, linear models, neural nets). At T23 and above, expect applied case studies where you'd improve a ranking or classification system. For staff and principal levels, they'll test your ability to reason about end-to-end ML systems, training/inference architecture, and production tradeoffs. Check datainterview.com/questions for ML concept practice.
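On the evaluation-metrics side, interviewers often ask you to compute thresholded precision and recall from scratch rather than recite definitions. A minimal sketch (the function name and example data are illustrative, not from any eBay material):

```python
def precision_recall(y_true, y_score, threshold=0.5):
    """Precision and recall at a fixed decision threshold.

    precision = TP / (TP + FP): of everything flagged positive, how much was right.
    recall    = TP / (TP + FN): of all true positives, how much was caught.
    """
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_score) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Being able to sweep `threshold` and explain how precision and recall trade off against each other covers a surprising share of the metrics discussion at T22-T23.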

What's the best format for answering behavioral questions at eBay?

Use the STAR format (Situation, Task, Action, Result) but keep it tight. I've seen candidates ramble for 10 minutes on a single story. Aim for 2 to 3 minutes max. Start with a one-sentence setup, spend most of your time on what you specifically did (not the team), and end with a measurable result. At eBay, they value impact and integrity, so always close with what changed because of your work and any lessons learned.

What happens during the eBay Machine Learning Engineer onsite interview?

The onsite (often virtual now) typically has 4 to 5 rounds. Expect at least one pure coding round in Python, one or two ML-focused rounds covering fundamentals and applied problem solving, a system design round (especially at T24 and above where you'll design production ML pipelines), and a behavioral round. At staff and principal levels, the system design round gets much deeper, covering data pipelines, feature management, training/inference architecture, and latency/reliability tradeoffs. There's usually a lunch or informal chat that isn't scored but still matters for culture fit.

What metrics and business concepts should I know for the eBay MLE interview?

eBay is a two-sided marketplace connecting buyers and sellers, generating $11.1B in revenue. You should understand marketplace metrics like GMV (gross merchandise volume), conversion rate, search relevance, and buyer/seller engagement. Know how to think about A/B testing in a marketplace context where treating one side affects the other. They also care about experimentation methodology, so be ready to discuss how you'd design an experiment, pick success metrics, and handle interference effects. Tying your ML solutions back to business outcomes will set you apart.
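For the experiment-design discussion, one baseline worth being able to write cold is a two-proportion z-test on conversion rate, before any interference correction. This is a standard-library sketch with illustrative numbers, and its independence assumption is exactly what marketplace interference violates, which is worth saying out loud in the interview.

```python
import math

def two_prop_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    control (a) and treatment (b). Assumes independent units -- a
    two-sided marketplace can violate this, shrinking the effective
    sample size, so treat the p-value as a starting point."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

For example, 100/1000 control conversions against 130/1000 treatment conversions clears the 5% significance bar, while identical rates return z = 0. The follow-up an interviewer wants is how cluster-level randomization or switchback designs mitigate the interference this test ignores.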

What education do I need for an eBay Machine Learning Engineer role?

A BS in Computer Science, Engineering, Mathematics, Statistics, or a related field is the baseline. An MS is preferred at most levels, and a PhD is common (though not required) for senior ML roles. That said, eBay explicitly notes that equivalent practical experience is acceptable. If you don't have a graduate degree but have shipped production ML systems and can demonstrate depth in your interviews, you're still a strong candidate. Your portfolio of real work matters more than the degree.

What are common mistakes candidates make in the eBay Machine Learning Engineer interview?

The biggest one I see is treating it like a pure software engineering interview and neglecting the ML depth. eBay wants you to reason about model selection, evaluation pitfalls, data leakage, and production deployment, not just write clean code. Another common mistake is being too theoretical. They care about practical tradeoffs: why would you choose batch over real-time inference, how would you debug a model regression in production. Finally, at senior levels, candidates often underestimate the system design round. Practice designing end-to-end ML systems with real constraints like latency, scale, and data freshness.

Dan Lee's profile image

Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn