Uber Data Engineer at a Glance
Interview Rounds
7 rounds
Difficulty
From mock interviews we've run for Uber DE candidates, one pattern keeps showing up: strong coders who ace the PySpark round but then can't sketch a real-time ingestion architecture for billions of Kafka events. Uber's data engineer role is platform engineering with a data flavor, and if you prep like it's just ETL work, you'll leave points on the table.
Uber Data Engineer Role
Primary Focus
Skill Profile
Math & Stats
Medium · A Bachelor's or Master's degree in Computer Science or a related field is required, implying a foundational understanding of mathematical and statistical concepts relevant to data analysis and engineering. While not explicitly focused on advanced statistical modeling, a solid grasp of data distributions and analytical principles is expected for structuring data for business insights.
Software Eng
Expert · This role demands extensive software engineering prowess, including technical leadership in architecting, implementing, testing, releasing, and monitoring data systems. Emphasis is placed on engineering best practices, producing high-quality code, documentation, and developing scripts and tools. The expectation for a 'Staff Engineer level or above' indicates a need for deep expertise in sustainable engineering and system design.
Data & SQL
Expert · This is a core competency, requiring extensive experience in designing and managing data pipelines, dimensional data models, and data warehouses. The role involves building and maintaining pipelines that process billions of events daily, ensuring scalability, reliability, and efficiency for real-time data processing and decision-making. Expertise in ETL, data quality, and monitoring for distributed data systems is paramount.
Machine Learning
Medium · While not primarily an ML model development role, Data Engineers at Uber are crucial architects of the data ecosystem that enables ML-driven solutions like fraud detection, dynamic pricing, and driver-rider matching. They need to understand the data requirements for machine learning models and build pipelines that serve these needs effectively, implying a strong understanding of ML data workflows.
Applied AI
Low · There is no explicit mention of modern AI or GenAI as a direct skill requirement in the provided sources for a Data Engineer role. While Uber likely leverages these technologies, the Data Engineer's primary focus, based on the sources, is on foundational data infrastructure. This is a conservative estimate, as the field is evolving rapidly heading into 2026.
Infra & Cloud
High · The role requires significant experience with distributed data systems for logging, storage, ETL, and monitoring. Familiarity with MPP databases (e.g., AWS Redshift, Teradata) and NoSQL databases like Cassandra is essential. Data Engineers are expected to handle petabytes of data, design for scalability, and understand trade-offs between consistency, availability, and latency in a global, real-time platform.
Business
High · A strong emphasis is placed on identifying and solving engineering and business problems with little guidance, seeing the 'big picture,' and driving alignment on strategically important improvements. The role requires building strong relationships, collaborating meaningfully with various stakeholders, and demonstrating excellent judgment and responsibility, indicative of high business acumen and leadership.
Viz & Comms
Medium · Excellent written and verbal communication skills are explicitly required, including the ability to write detailed technical documents and collaborate with cross-functional teams. The role involves structuring data for 'intuitive analytics and business insights,' suggesting an understanding of how data is consumed and presented, though direct data visualization might be handled by other roles.
What You Need
- Designing and managing data pipelines
- Dimensional data modeling
- Data warehousing
- Building and deploying production-quality ETL pipelines
- Working with end-to-end distributed data systems (logging, storage, data quality, monitoring)
- Real-time data processing
- Scalability engineering
- Technical leadership
- Problem-solving (engineering and business)
- Excellent written and verbal communication
- Understanding of consistency, availability, and latency trade-offs
Languages
Tools & Technologies
Want to ace the interview?
Practice with real questions.
You're joining a team that builds and operates the data infrastructure behind every ride, every Eats order, and every freight shipment. Success after year one means you own a domain's pipelines end-to-end (say, the trips fact table or driver earnings) and at least one downstream team like pricing or safety considers your tables their source of truth. You're building the distributed systems that power Uber's sub-second dispatch and surge pricing decisions, not writing SQL for dashboards.
A Typical Week
A Week in the Life of an Uber Data Engineer
Typical L5 workweek · Uber
Weekly time split
Culture notes
- Uber operates at high velocity with massive data scale — expect to own pipelines that hundreds of teams depend on, and the pager can be unforgiving during your on-call rotation.
- Uber requires three days per week in the San Francisco or Sunnyvale office (Tuesday, Wednesday, Thursday), with Monday and Friday as flexible remote days.
The split that catches most candidates off guard is how much infrastructure work you do relative to pure analysis. Your mornings aren't spent exploring data; they're spent triaging SLA breaches on Hive tables that feed surge pricing, writing data quality monitors, and reclaiming storage on shared Hadoop clusters. If you're coming from an analytics-heavy DE role, recalibrate: this is closer to backend platform engineering where your PySpark dedup job for Kafka trip events is the product, not a means to a dashboard.
Projects & Impact Areas
Surge pricing pipelines and Eats restaurant ranking both depend on tables your team builds, so a schema change you propose on Wednesday can affect how millions of riders get priced by Friday. Uber's open-source projects like Cadence (workflow orchestration) and AresDB (real-time analytics) aren't just resume decoration; DEs actively contribute to these tools, which means you'll spend real cycles on platform improvements that shape how the entire company processes data. Data quality is a first-class ownership area too: you write the anomaly detection monitors, you own the pages when thresholds breach, and no separate QA team exists to catch your misses.
Skills & What's Expected
Software engineering and data architecture are both rated expert-level for this role, and that pairing tells you everything. ML knowledge sits at medium, which doesn't mean irrelevant; Uber explicitly expects you to understand how your pipelines feed ML-driven systems like fraud detection and driver-rider matching. What's underrated in most candidates' prep is business acumen, scored high but often neglected. Uber wants you to articulate why a 15-minute freshness SLA on the trips table matters to surge pricing accuracy, not just how to hit it.
Levels & Career Growth
The widget shows the level bands, but here's what it can't tell you: the jump where most people stall requires visible cross-team impact, not just flawless execution within your own pipelines. Did you define a data contract that three other teams adopted? Did you drive a platform migration that changed how the org thinks about table formats? Scope is the differentiator at every transition, and promotion committees at Uber look for evidence that you shaped decisions beyond your immediate domain.
Work Culture
Uber requires three days per week in-office (Tuesday, Wednesday, Thursday) at SF or Sunnyvale, with Monday and Friday flexible. The work schedule is demanding, with expectations of flexibility beyond standard hours, especially during on-call rotations when hundreds of downstream teams depend on your tables. The upside is genuine: you get direct product impact, real autonomy over your domain, and the kind of scale problems (petabytes of data, billions of daily events) that most companies can only describe hypothetically.
Uber Data Engineer Compensation
Uber's comp structure for Data Engineers breaks into three pieces: base salary, annual performance bonus, and RSUs that vest over four years (from what candidates report, often at 25% per year). Both base salary and RSU grants are considered the most flexible levers in negotiation, so don't assume either is locked. A sign-on bonus is also on the table, especially when you're holding a competing offer.
Competing offers are your strongest card here. Uber's recruiters expect candidates to shop around, and bringing a credible counter-offer gives you real room to push on equity or sign-on. If you're negotiating without one, focus your energy on clearly articulating the specific pipeline and platform experience you'd bring to Uber's Kafka/Spark/Flink stack, since that's the kind of scarcity the hiring team can use to justify a bump internally. Practice these conversations with real scenarios at datainterview.com/questions.
Uber Data Engineer Interview Process
7 rounds · ~5 weeks end to end
Initial Screen
1 round · Recruiter Screen
This initial 30-minute phone call will cover your background, career aspirations, and why you're interested in Uber. You'll also discuss the specific Data Engineer role, team alignment, and compensation expectations.
Tips for this round
- Research Uber's mission and recent projects to show genuine interest.
- Prepare a concise summary of your relevant experience and career goals.
- Clearly articulate why you are a good fit for a Data Engineer role at Uber.
- Be ready to discuss your salary expectations and current compensation.
- Highlight any experience with real-time data processing or large-scale systems.
Technical Assessment
1 round · Coding & Algorithms
Expect a 60-minute live coding session focusing on data structures and algorithms. You'll be asked to solve one or two datainterview.com/coding-style problems, demonstrating your problem-solving abilities and coding proficiency.
Tips for this round
- Practice datainterview.com/coding medium and hard problems, especially those involving arrays, strings, trees, and graphs.
- Choose a programming language you are most proficient in and can write runnable code quickly.
- Think out loud, explaining your approach, thought process, and any trade-offs considered.
- Write clean, well-structured code and test it with various edge cases.
- Be prepared to discuss time and space complexity of your solution.
Onsite
5 rounds · System Design
You'll be challenged to design a scalable data system for a real-world Uber scenario, such as processing millions of concurrent events. This 60-minute session will assess your ability to architect robust, high-throughput data pipelines and infrastructure.
Tips for this round
- Focus on key data engineering principles like scalability, reliability, fault tolerance, and real-time processing.
- Discuss relevant technologies like Kafka, Spark, Flink, Hadoop, and various database types (NoSQL, OLAP).
- Clearly define requirements and constraints before diving into the design details.
- Explain trade-offs for different architectural choices and justify your decisions.
- Consider data modeling, storage solutions, and monitoring aspects of your design.
Coding & Algorithms
This 60-minute live coding interview will present more complex algorithmic challenges than the phone screen. You'll need to write efficient, bug-free code and demonstrate strong problem-solving skills, often involving data structures relevant to large-scale data.
SQL & Data Modeling
The interviewer will probe your expertise in SQL and data modeling during this 60-minute session. You'll likely be asked to write complex SQL queries, design database schemas, and discuss data warehousing concepts relevant to Uber's massive datasets.
Behavioral
This 60-minute conversation with a hiring manager or senior engineer will delve into your past projects, collaboration experiences, and leadership potential. You'll be expected to articulate your contributions, challenges faced, and lessons learned, aligning with Uber's 'hustle' culture.
Bar Raiser
This is Uber's version of a final culture and technical depth check, typically lasting 60 minutes. An interviewer from a different team will assess your overall fit, technical rigor, and potential to raise the bar for the organization, often through deep dives into your experience and challenging scenarios.
Tips to Stand Out
- Master datainterview.com/coding. Uber emphasizes runnable code in its technical rounds, so extensive practice with datainterview.com/coding-style problems, especially medium to hard difficulty, is crucial. Focus on understanding underlying data structures and algorithms.
- Prioritize System Design. Data Engineers at Uber build systems for massive scale and real-time processing. Be prepared to design robust, scalable data pipelines, discussing trade-offs and relevant technologies like Kafka, Spark, and distributed databases.
- Showcase 'Hustle' and Business Impact. Uber values candidates who are proactive and can demonstrate how their work drives business results. Frame your experiences to highlight initiative, problem-solving, and the tangible impact of your projects.
- Deep Dive into SQL and Data Modeling. As a Data Engineer, your ability to write complex SQL queries, design efficient database schemas, and understand data warehousing concepts will be thoroughly tested. Practice advanced SQL and schema design.
- Prepare Behavioral Stories. Use the STAR method to prepare detailed stories about your past experiences, focusing on collaboration, leadership, overcoming challenges, and learning from failures. Align these stories with Uber's culture.
- Leverage Referrals. A strong referral can significantly boost your chances, potentially even allowing you to bypass the technical phone screen. Network and seek out current Uber employees.
- Understand Uber's Scale. Throughout your interviews, demonstrate an awareness of the challenges and considerations involved in handling petabytes of data and billions of events daily, as this is central to Uber's data ecosystem.
Common Reasons Candidates Don't Pass
- ✗ Inability to write runnable code. Candidates often fail by providing pseudocode or incomplete solutions that don't execute correctly, indicating a lack of practical coding proficiency.
- ✗ Weak system design for scale. Many struggle to design data systems that can handle Uber's immense scale (real-time processing, petabytes of data), failing to consider critical aspects like fault tolerance, latency, and throughput.
- ✗ Lack of business impact or 'hustle'. Candidates who only focus on technical details without connecting their work to business outcomes or demonstrating a proactive, results-oriented mindset may not align with Uber's cultural expectations.
- ✗ Insufficient SQL and data modeling skills. For a Data Engineer role, a shallow understanding of advanced SQL, database design principles, and data warehousing concepts is a common reason for rejection.
- ✗ Poor communication during technical rounds. Failing to articulate thought processes, ask clarifying questions, or explain design choices clearly can lead interviewers to believe the candidate lacks problem-solving clarity.
- ✗ Inadequate behavioral responses. Generic or unprepared answers to behavioral questions that don't highlight specific achievements, collaboration skills, or alignment with Uber's values can be a red flag.
Offer & Negotiation
Uber's compensation packages for Data Engineers typically include a competitive base salary, an annual performance bonus, and Restricted Stock Units (RSUs) that vest over a four-year period, often with a 25% annual vesting schedule. When negotiating, focus on increasing the base salary or the RSU grant, as these are often the most flexible components. A sign-on bonus can also be a negotiable lever, especially if you have competing offers. Be prepared to articulate your value and leverage any other offers you may have to secure a more favorable package.
From what candidates report, the most common reason people wash out isn't a single weak round. It's failing to write code that actually runs. Uber's two coding rounds and the SQL session all demand executable solutions, not pseudocode or hand-wavy logic. If your Spark job or Python script wouldn't pass a basic test harness, that's a rejection, even if your system design was solid.
The Bar Raiser round is where confident candidates get blindsided. It's run by an engineer outside your prospective team, and the round blends behavioral depth with technical pressure testing on past projects you've shipped. Treating it as a relaxed culture chat (instead of preparing STAR stories around ownership, cross-team influence, and pushing back on bad technical decisions) is the mistake that sinks people who cleared every other round cleanly.
Uber Data Engineer Interview Questions
Data Pipeline & Platform Engineering
Expect questions that force you to design reliable batch and streaming pipelines end-to-end (ingestion, transformation, backfills, SLAs). Candidates often stumble on operational details like idempotency, late data, schema evolution, and data quality gates.
Your Spark job builds a daily Sales fact table for Uber Eats from Kafka order events, and retries sometimes double-count revenue. How do you make the pipeline idempotent across replays and backfills while keeping a 2 hour SLA?
Sample Answer
Most candidates default to just running a daily overwrite or using at-least-once writes, but that fails here because retries and late events create duplicates and silent metric inflation. You need a deterministic primary key (for example, order_id plus event_type plus event_version) and a merge-based sink that upserts on that key. Add a watermark and a bounded late-data window, then run periodic reconciliation for stragglers outside the window. For backfills, reprocess by partition range and keep the same upsert key so replays converge.
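The merge logic above can be sketched in plain Python. This is a minimal illustration, not the actual Spark job: the event shape, field names, and the in-memory dict standing in for a merge-capable sink (Delta/Hudi-style MERGE) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (order_id, event_type, event_version, revenue_usd)
Event = Tuple[str, str, int, float]


def upsert_events(sink: Dict[Tuple[str, str], Event], events: List[Event]) -> None:
    """Idempotent upsert: replaying the same batch leaves the sink unchanged.

    Key = (order_id, event_type); a higher event_version wins, an equal or
    lower version is a no-op, so at-least-once delivery cannot double-count.
    """
    for ev in events:
        order_id, event_type, version, _revenue = ev
        key = (order_id, event_type)
        current = sink.get(key)
        if current is None or version > current[2]:
            sink[key] = ev


sink: Dict[Tuple[str, str], Event] = {}
batch = [("o1", "completed", 1, 25.0), ("o2", "completed", 1, 12.5)]
upsert_events(sink, batch)
upsert_events(sink, batch)  # retry / replay: converges, no double counting
print(sum(ev[3] for ev in sink.values()))  # 37.5
```

The same property is what makes backfills safe: reprocessing a partition range with the same upsert key converges to the same table state no matter how many times it runs.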
A new field promo_funding_source is added to the order event schema, and downstream Hive tables and Redshift aggregates for Sales reporting start failing intermittently. What schema evolution strategy and validation gates do you put in place so producers can ship safely without breaking consumers?
You need a near real-time metric: gross bookings by city for Uber rides, updated within 5 minutes, but events can arrive up to 2 hours late and cancellations can happen after trip completion. How do you design the aggregation so dashboards stay stable and you can still correct history?
System Design for Distributed Data Systems
Most candidates underestimate how much your design must balance latency, consistency, and cost at Uber scale. You’ll be evaluated on clear component boundaries, failure modes, and how you’d monitor and evolve the system over time.
Design an end-to-end pipeline that produces an hourly Sales Ops dashboard for Uber Eats showing gross bookings, net revenue, refunds, and promo spend by city and merchant, with updates within 5 minutes of the hour. Specify ingestion, storage, compute, the dimensional model, and how you guarantee idempotency and backfills when late events arrive.
Sample Answer
Use a Lambda-style design: a streaming path for low-latency aggregates plus a batch path that recomputes authoritative hourly facts and reconciles late data. Stream orders, refunds, and promos into a durable log, write curated tables with stable business keys, then serve the dashboard from an hourly fact table joined to city, merchant, and time dimensions. Idempotency comes from deterministic event IDs and merge semantics; late events trigger reprocessing by hour partitions; and you monitor freshness, duplicate rate, and reconciliation deltas between stream and batch outputs.
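The reconciliation check mentioned above is simple to sketch. A hedged example, with made-up key names and a tolerance you would tune per metric:

```python
from typing import Dict


def reconciliation_deltas(
    stream_aggs: Dict[str, float],
    batch_aggs: Dict[str, float],
    tolerance: float = 0.01,
) -> Dict[str, float]:
    """Return per-key deltas where stream and batch disagree beyond tolerance.

    In a Lambda-style pipeline the batch path is authoritative; any key whose
    delta exceeds tolerance should page the on-call or trigger a repair job.
    """
    deltas: Dict[str, float] = {}
    for key in set(stream_aggs) | set(batch_aggs):
        diff = stream_aggs.get(key, 0.0) - batch_aggs.get(key, 0.0)
        if abs(diff) > tolerance:
            deltas[key] = diff
    return deltas


# Hypothetical city-hour keys; the NYC batch number includes late refunds.
stream = {"sf_2024010112": 10500.0, "nyc_2024010112": 8200.0}
batch = {"sf_2024010112": 10500.0, "nyc_2024010112": 8350.0}
print(reconciliation_deltas(stream, batch))  # {'nyc_2024010112': -150.0}
```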
Uber wants near real-time fraud and Sales anomaly monitoring on trips and Eats orders: you need to compute rolling 15-minute metrics per merchant and city and alert within 60 seconds, while also keeping a governed warehouse table for analytics. Design the distributed data system, call out consistency and latency trade-offs, failure modes, and how you prevent double counting during retries and partial outages.
Coding & Algorithms (DE-leaning)
The bar here isn't whether you know obscure tricks, it's whether you can write correct, efficient code under interview constraints. Expect data-engineering flavored problems (parsing, aggregation, streaming-like logic) with solid complexity reasoning and clean tests.
You ingest Uber Eats order events as (order_id, ts, status) where status is one of CREATED, ACCEPTED, PICKED_UP, DELIVERED, CANCELED; return the final status per order_id and the final timestamp. If two events for the same order_id share the same ts, the later one in the input list wins.
Sample Answer
You could sort all events by (order_id, ts) and take the last one per order, or do a single pass hash aggregation that keeps the best-so-far event per order. Sorting is simpler to reason about, but it is $O(n \log n)$ and costs memory for rearrangement. The single pass wins here because you can compare timestamps in $O(1)$ per event and handle tie break by input position, so total time is $O(n)$ with $O(k)$ memory for $k$ orders.
from __future__ import annotations
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    order_id: str
    ts: int
    status: str


def final_status_per_order(events: Iterable[Tuple[str, int, str]]) -> Dict[str, Tuple[int, str]]:
    """Return {order_id: (final_ts, final_status)}.

    Tie break: if ts is equal, later event in the input wins.
    """
    # Store (ts, index, status) so that (ts, index) defines a total order.
    best: Dict[str, Tuple[int, int, str]] = {}
    for idx, (order_id, ts, status) in enumerate(events):
        if order_id not in best:
            best[order_id] = (ts, idx, status)
            continue
        best_ts, best_idx, _ = best[order_id]
        # Later timestamp wins; if tied, later index wins.
        if ts > best_ts or (ts == best_ts and idx > best_idx):
            best[order_id] = (ts, idx, status)
    return {oid: (ts, status) for oid, (ts, _idx, status) in best.items()}


if __name__ == "__main__":
    sample = [
        ("o1", 10, "CREATED"),
        ("o1", 12, "ACCEPTED"),
        ("o2", 8, "CREATED"),
        ("o1", 12, "CANCELED"),  # same ts as ACCEPTED, later in input so wins
        ("o2", 9, "DELIVERED"),
    ]
    out = final_status_per_order(sample)
    assert out["o1"] == (12, "CANCELED")
    assert out["o2"] == (9, "DELIVERED")
    print(out)
Given a stream of Trip events (driver_id, ts, event_type) where event_type is START or END, compute for each driver the maximum number of concurrent active trips at any moment; assume events can arrive out of order, and if START and END share the same ts, END happens first. Return a dict mapping driver_id to max_concurrency.
You are building a sales analytics rollup and receive a list of updates (merchant_id, day, delta_sales) that can include duplicates; return the top $k$ merchants by total sales over a given day range [start_day, end_day], breaking ties by smaller merchant_id. Do it in better than $O(m \log m)$ where $m$ is number of distinct merchants, assuming $k \ll m$.
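For the trip-concurrency question above, one hedged sketch (not the only valid approach): group events per driver, sort with a key that puts END before START at equal timestamps, then sweep a running counter.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def max_concurrent_trips(events: List[Tuple[str, int, str]]) -> Dict[str, int]:
    """Max concurrent active trips per driver.

    Events may arrive out of order. At equal ts, END is processed before
    START: the sort key uses 0 for END and 1 for START, so (ts, order)
    defines the required processing order.
    """
    by_driver: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for driver_id, ts, event_type in events:
        if event_type == "START":
            by_driver[driver_id].append((ts, 1, +1))
        else:  # END
            by_driver[driver_id].append((ts, 0, -1))

    result: Dict[str, int] = {}
    for driver_id, evs in by_driver.items():
        evs.sort()
        active = peak = 0
        for _ts, _order, delta in evs:
            active += delta
            peak = max(peak, active)
        result[driver_id] = peak
    return result


sample = [("d1", 1, "START"), ("d1", 2, "START"), ("d1", 3, "END"), ("d1", 4, "END")]
print(max_concurrent_trips(sample))  # {'d1': 2}
```

Sorting per driver is O(n log n); if the interviewer pushes further, mention that a streaming variant needs a buffer bounded by the out-of-order window, which mirrors watermarking in real pipelines.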
SQL Querying & Optimization
Your ability to express complex analytics with joins, windows, and careful filtering is a primary signal in the DE loop. Strong answers anticipate edge cases (duplicates, slowly changing entities) and show awareness of performance implications in MPP warehouses.
You have table fact_trip(trip_id, rider_id, city_id, request_ts, trip_date, status, fare_usd). For each city and trip_date, return completed trips, unique riders, and completion_rate (completed requests divided by all requests), with completion_rate as a decimal and safe for days with zero requests.
Sample Answer
Reason through it: Filter nothing upfront, you need both completed and non-completed requests in the denominator. Aggregate by city_id and trip_date, compute total_requests as COUNT(*), completed_trips as SUM over status. Unique riders is COUNT(DISTINCT rider_id) across all requests for that day. Completion rate is completed_trips divided by total_requests, guard division by zero with NULLIF so you do not throw or lie.
/* Daily city-level completion funnel metrics */
SELECT
    city_id,
    trip_date,
    /* All requests, regardless of status */
    COUNT(*) AS total_requests,
    /* Completed requests only */
    SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed_trips,
    /* Unique riders who made a request that day */
    COUNT(DISTINCT rider_id) AS unique_riders,
    /* Safe decimal rate */
    (SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) * 1.0)
        / NULLIF(COUNT(*), 0) AS completion_rate
FROM fact_trip
GROUP BY
    city_id,
    trip_date
ORDER BY
    trip_date,
    city_id;

You have raw_event_trip_status(trip_id, event_ts, status, ingestion_ts) where the same trip_id can have duplicate statuses and late arrivals. Produce a snapshot table with exactly one latest status row per trip_id as of a given cutoff timestamp, and explain one change you would make to reduce the scan cost in an MPP warehouse.
You have fact_trip(trip_id, driver_id, city_id, request_ts, fare_usd, is_airport_pickup) and dim_driver(driver_id, effective_from_ts, effective_to_ts, status). Find the top 3 drivers per city for the last 7 days by airport revenue, but only counting trips where the driver status was 'active' at request_ts, and return driver_id, city_id, airport_revenue, and rank.
Dimensional Modeling & Warehousing
Rather than raw SQL skill, you’re judged on how you structure facts, dimensions, and metrics so downstream analytics stays stable. Watch for prompts around SCD types, grain definition, and metric consistency across Sales/Analytics consumers.
Uber Eats wants a star schema for Sales analytics with metrics like gross_bookings, net_revenue, promo_spend, and completed_orders. Define the fact table grain and name 5 dimensions you would include, and explain one metric that must not be stored as a fact column.
Sample Answer
This question is checking whether you can lock the grain before you model anything, and avoid mixing additive facts with derived ratios. Your fact grain should be something like order line or order, not "day", otherwise you cannot safely slice by store, eater, courier, or promo. Dimensions usually include date, eater, merchant, city, product SKU or menu item, and promo or campaign. A metric like take_rate is derived (net_revenue divided by gross_bookings), it should be computed in the semantic layer to avoid aggregation bugs.
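A toy illustration (with made-up numbers) of why a derived ratio like take_rate must not be stored as a fact column: averaging stored per-row ratios disagrees with the ratio of the additive sums, which is the number Finance expects.

```python
# Two hypothetical Eats orders with additive facts.
orders = [
    {"gross_bookings": 100.0, "net_revenue": 30.0},    # row-level take_rate 0.30
    {"gross_bookings": 1000.0, "net_revenue": 100.0},  # row-level take_rate 0.10
]

# Wrong: average the stored per-row take_rate column.
avg_of_ratios = sum(o["net_revenue"] / o["gross_bookings"] for o in orders) / len(orders)

# Right: keep only additive facts and derive the ratio at aggregation time.
ratio_of_sums = sum(o["net_revenue"] for o in orders) / sum(o["gross_bookings"] for o in orders)

print(round(avg_of_ratios, 4))  # 0.2
print(round(ratio_of_sums, 4))  # 0.1182
```

The big order dominates real revenue, so the true blended take rate is ~11.8%, not 20%; the semantic layer computing ratio-of-sums is what keeps every rollup consistent.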
In a Sales warehouse, the merchant dimension has attributes like merchant_name, category, chain_id, and onboarding_status, and downstream teams need both "current state" and "as-of order time" reporting. Which SCD type(s) do you use, and what surrogate key strategy keeps the fact table stable?
You are modeling trip and Eats order sales in one warehouse, and Finance insists that "gross bookings" must reconcile across products and cities while analysts want flexibility to drill into adjustments like refunds and chargebacks. Do you model a single fact table, multiple fact tables, or a fact plus an adjustments fact, and how do you enforce metric consistency across them?
Cloud Infrastructure & Data Stores
In practice, you’ll need to articulate why you’d pick Spark/Hive vs an MPP warehouse vs Cassandra for a specific workload. Interviewers look for pragmatic tradeoffs: throughput vs latency, partitioning/sharding choices, and operational constraints.
You need a daily Sales analytics table for trips and promos that powers dashboards and ad hoc SQL, and you also need a low latency lookup for the current promo eligibility by rider at request time. Which parts go to Spark plus Hive, an MPP warehouse (Redshift or Vertica), and Cassandra, and what partition or key design do you pick for each?
Sample Answer
The standard move is Spark plus Hive for raw data and heavy ETL, then publish curated facts and dims into an MPP warehouse for interactive analytics, and use Cassandra for serving lookups keyed by a single entity. But here, promo eligibility has sharp latency and availability constraints, so you denormalize into Cassandra keyed by rider_id (and maybe city_id) even if it duplicates warehouse data. Partition Hive by date and city for scan pruning, and model the warehouse with a partition or distribution strategy that aligns with your dominant joins (often date and city) to avoid expensive data movement.
You are building a near real-time Sales metrics pipeline for completed trips (gross bookings, net revenue) with a 5 minute SLA, consuming events that can arrive late or out of order by up to 2 hours. How do you choose between exactly-once semantics, idempotent writes, and upserts in your data store, and what consistency and compaction choices do you make if the serving layer is Cassandra?
Uber's loop punishes candidates who prep like it's a generic algorithms screen. The sample questions reference Uber Eats order idempotency, trip-level fraud detection on rolling windows, and star schemas built around ride/promo fact tables, so your answers need to reflect how Uber's marketplace actually moves data, not abstract whiteboard patterns. The single costliest mistake is grinding coding problems while ignoring pipeline design and system design, which together dominate the evaluation and frequently overlap in the same question (designing a near-real-time Sales anomaly system, for instance, tests both simultaneously).
Practice Uber-style questions across all six areas at datainterview.com/questions.
How to Prepare for Uber Data Engineer Interviews
Know the Business
Official mission
“to ignite opportunity by setting the world in motion.”
What it actually means
Uber's real mission is to be the global technology platform that powers and optimizes the movement of people and goods, creating economic opportunities and convenience across various sectors. The company also commits to sustainability and adapting its services to local needs.
Key Business Metrics
- $52B (+20% YoY)
- $153B (-14% YoY)
- 34K (+9% YoY)
- 137.0M
Current Strategic Priorities
- Bring a state-of-the-art robotaxi to market later in 2026
- Build a unique new option for affordable and scalable autonomous rides in the San Francisco Bay Area and beyond
- Introduce more riders to autonomous mobility
- Deploy at least 1,200 Robotaxis across the Middle East by 2027
- Help families navigate everyday transportation with greater ease, visibility, and confidence
Competitive Moat
Uber is making its biggest platform bet since Eats: autonomous mobility. The Lucid/Nuro partnership targets a robotaxi launch in 2026, while a separate WeRide deal aims to deploy at least 1,200 robotaxis across the Middle East by 2027. For data engineers, that translates to entirely new pipeline domains: sensor telemetry, vehicle state streams, and safety-critical SLAs that don't exist in human-driver trip data.
Meanwhile, the core business posted $52 billion in revenue, up roughly 20% year over year. Existing pipelines feeding pricing, dispatch, and Eats ranking still need to scale with that growth. You'd be building new autonomous data infrastructure while keeping a massive, revenue-critical foundation healthy.
When interviewers ask "why Uber," don't talk about robotaxis in the abstract. Talk about the specific data engineering problem they create. Autonomous trip events need different dimensional models than human-driver trips (sensor fusion dimensions, safety-incident fact tables, sub-second latency budgets for vehicle routing). Framing your answer around that gap, and why your background prepares you to close it, shows you've studied Uber's actual roadmap rather than skimming a press release.
Try a Real Interview Question
Sessionize Event Stream With Watermark
Python

Given a list of events (user_id, ts) where ts is an integer Unix timestamp in seconds and the list is not guaranteed to be sorted, compute per-user sessions using a gap threshold of g seconds. Two consecutive events for the same user belong to the same session if the time difference is at most g, and you must ignore late events with ts < watermark; return a list of sessions as tuples (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.
from typing import Iterable, List, Tuple

def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    """Sessionize per-user events with a gap threshold and a watermark.

    Args:
        events: Iterable of (user_id, ts) events. Not guaranteed to be sorted.
        gap_seconds: Gap threshold in seconds. Same session if next_ts - prev_ts <= gap_seconds.
        watermark: Ignore late events where ts < watermark.

    Returns:
        List of (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.
    """
    pass
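If you want to check your work, here is one possible reference solution, a straightforward sketch rather than the only valid approach: filter late events against the watermark, bucket timestamps per user, sort, then walk each user's timeline and cut a new session whenever the gap exceeds the threshold.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    # Bucket timestamps per user, dropping late events up front.
    by_user = defaultdict(list)
    for user_id, ts in events:
        if ts >= watermark:
            by_user[user_id].append(ts)

    sessions = []
    for user_id in sorted(by_user):
        timestamps = sorted(by_user[user_id])
        start = prev = timestamps[0]
        count = 1
        for ts in timestamps[1:]:
            if ts - prev <= gap_seconds:
                count += 1          # still inside the current session
            else:
                sessions.append((user_id, start, prev, count))
                start, count = ts, 1  # gap exceeded: open a new session
            prev = ts
        sessions.append((user_id, start, prev, count))  # flush last session
    return sessions
```

Sorting users and timestamps before the linear scan keeps the output ordering requirement trivial; the overall cost is dominated by the per-user sorts.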
Uber's job postings for data engineers call out strong Java/Python and distributed systems experience, so expect coding rounds that reward clean, well-structured code over brute-force solutions that merely pass. Practice with problems at datainterview.com/coding that let you build that muscle in a timed setting.
Test Your Readiness
How Ready Are You for Uber Data Engineer?
Question 1 of 10: Can you design an incremental ingestion pipeline (batch or streaming) that provides exactly-once semantics or effective deduplication using event_time, idempotent writes, and replay handling?
Gaps in SQL optimization or pipeline design are the fastest to close with targeted reps. Work through Uber-relevant scenarios at datainterview.com/questions.
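As a toy illustration of the deduplication half of that question (the function and field names here are illustrative, not a quiz answer key): if each record carries a stable event key, keeping only the latest version per key makes replayed batches idempotent, since reprocessing the same records leaves the output unchanged.

```python
def deduplicate(records):
    """Keep the latest record per event_id.

    records: iterable of dicts with 'event_id' and 'event_time' keys
    (hypothetical schema for illustration).
    """
    latest = {}
    for rec in records:
        key = rec["event_id"]
        # Last-writer-wins on event_time: replays and retries converge
        # to the same result no matter how many times they run.
        if key not in latest or rec["event_time"] > latest[key]["event_time"]:
            latest[key] = rec
    return list(latest.values())
```

In a real pipeline the same idea shows up as a MERGE/upsert keyed on the event ID, or a ROW_NUMBER() window over event_time in SQL, but the invariant being tested is the one above.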
Frequently Asked Questions
How long does the Uber Data Engineer interview process take?
From first recruiter call to offer, expect roughly 4 to 6 weeks. You'll start with a recruiter screen, then move to a technical phone screen focused on SQL and coding. After that comes the onsite (or virtual onsite), which typically includes 4 to 5 rounds in a single day. Scheduling can stretch things out, especially if the team is busy, so don't be surprised if it takes closer to 7 weeks in some cases.
What technical skills are tested in the Uber Data Engineer interview?
Uber tests you hard on data pipeline design, dimensional data modeling, and data warehousing. You should be comfortable building production-quality ETL pipelines and working with distributed data systems, including logging, storage, data quality, and monitoring. Real-time data processing and scalability engineering come up frequently. On the coding side, SQL is non-negotiable (advanced level, including window functions), and you'll also need solid Python skills. Java and Scala knowledge is a plus, especially for pipeline work.
How should I tailor my resume for an Uber Data Engineer role?
Lead with your data pipeline and ETL experience. Uber cares about scale, so quantify everything: how many records your pipelines processed, latency improvements you achieved, how many downstream consumers relied on your data. Call out specific technologies for distributed systems, real-time processing, and data warehousing. If you've done dimensional modeling or built monitoring/data quality frameworks, put that front and center. Keep it to one page if you have under 10 years of experience, and mirror the language from Uber's job description.
What is the total compensation for Uber Data Engineer roles?
Uber pays competitively for data engineers in San Francisco. For a mid-level Data Engineer (L4), total compensation typically falls in the $200K to $280K range including base, bonus, and RSUs. Senior Data Engineers (L5) can expect $280K to $380K total comp. Staff level (L5b/L6) pushes well above $400K. RSUs vest over four years and make up a significant chunk, so pay attention to the stock component when evaluating your offer.
How do I prepare for the Uber Data Engineer behavioral interview?
Uber's culture emphasizes integrity, customer obsession, and doing the right thing. Prepare stories that show you making tough tradeoffs, pushing back on bad ideas respectfully, and thinking about the end user. They want to see that you can operate with a global mindset while solving local problems. Have 5 to 6 strong stories ready that cover conflict resolution, technical leadership, and times you improved something without being asked. I've seen candidates get rejected despite strong technical rounds because they couldn't articulate how they collaborate across teams.
How hard are the SQL questions in the Uber Data Engineer interview?
They're genuinely hard. Expect advanced SQL with window functions, CTEs, self-joins, and multi-step aggregations. You won't get away with just knowing SELECT and GROUP BY. Uber's SQL questions often involve real-world scenarios like calculating rider metrics, driver utilization, or trip-level analytics. Practice writing complex queries from scratch without an IDE helping you. You can find similar difficulty questions at datainterview.com/questions to get a feel for the level they expect.
What happens during the Uber Data Engineer onsite interview?
The onsite typically has 4 to 5 rounds spread across one day. You'll face a SQL deep-dive round, a coding round (usually Python), a system design round focused on data pipeline architecture, and at least one behavioral round. The system design round is where many candidates struggle. You'll be asked to design end-to-end data systems covering ingestion, storage, transformation, and serving layers. Some loops also include a data modeling round where you design a dimensional schema from scratch.
What metrics and business concepts should I know for the Uber Data Engineer interview?
Understand Uber's two-sided marketplace. Know metrics like trip completion rate, surge pricing mechanics, driver utilization, rider retention, and ETA accuracy. Think about how these metrics flow through data pipelines and what data quality issues could arise. Uber generates $52B in revenue, so the data volumes are massive. Being able to talk about how you'd model ride data, payment events, or driver earnings at that scale shows you understand the business, not just the tech.
Are ML or statistics concepts tested in the Uber Data Engineer interview?
Data Engineer roles at Uber are more engineering-focused than ML-focused. You probably won't be asked to derive a gradient descent algorithm. But you should understand how your pipelines feed ML models and analytics. Concepts like A/B testing data pipelines, feature engineering at scale, and basic statistical awareness (distributions, sampling, aggregation bias) can come up in conversation. If you're interviewing for a more senior role, expect questions about how you'd build data infrastructure that supports ML workflows.
What format should I use to answer Uber behavioral interview questions?
Use the STAR format (Situation, Task, Action, Result) but keep it tight. Uber interviewers don't want a 10-minute monologue. Spend about 20% on setup and 60% on what you actually did. Always end with a measurable result. For example, don't say 'the pipeline was faster.' Say 'latency dropped from 45 minutes to 8 minutes, which unblocked the pricing team's daily refresh.' Tie your answers back to Uber's values when it feels natural, especially customer obsession and integrity.
What are common mistakes candidates make in the Uber Data Engineer interview?
The biggest one I see is underestimating the system design round. Candidates prep SQL and coding but walk into the design round without a framework for discussing data pipelines end to end. Another common mistake is being too theoretical. Uber wants people who've actually built things at scale, so vague answers about 'best practices' won't cut it. Also, don't skip behavioral prep. Uber takes culture fit seriously, and a weak behavioral round can sink an otherwise strong performance.
How should I practice coding for the Uber Data Engineer interview?
Focus on Python and SQL, in that order of coding priority. For Python, practice data manipulation, writing clean functions, and working with common libraries. For SQL, drill window functions, recursive CTEs, and complex joins until they're second nature. Write everything by hand or in a plain text editor to simulate interview conditions. I recommend practicing with the problems at datainterview.com/coding, which are calibrated to the difficulty level you'll actually face. Aim for at least 3 to 4 weeks of consistent daily practice before your onsite.