Uber Data Engineer Interview Guide

Dan Lee, Data & AI Lead
Last update: February 27, 2026

Uber Data Engineer at a Glance

Interview Rounds

7 rounds

Key Skills

SQL (advanced, including window functions) · Python · Java · Scala · Sales · Data Warehousing · ETL · Big Data · Data Governance · Analytics

From hundreds of mock interviews, here's the pattern that trips up Uber Data Engineer candidates: they prep like it's a SQL-heavy analytics role and get blindsided by the software engineering bar. Uber's DE org sits closer to platform engineering than to BI. You're expected to write production-grade PySpark jobs that deduplicate billions of Kafka trip events, then own the on-call pager for the pipelines those jobs feed.

Uber Data Engineer Role

Primary Focus

Sales · Data Warehousing · ETL · Big Data · Data Governance · Analytics

Skill Profile

Math & Stats · Software Eng · Data & SQL · Machine Learning · Applied AI · Infra & Cloud · Business · Viz & Comms

Math & Stats

Medium

A Bachelor's or Master's degree in Computer Science or a related field is required, implying a foundational understanding of mathematical and statistical concepts relevant to data analysis and engineering. While not explicitly focused on advanced statistical modeling, a solid grasp of data distributions and analytical principles is expected for structuring data for business insights.

Software Eng

Expert

This role demands extensive software engineering prowess, including technical leadership in architecting, implementing, testing, releasing, and monitoring data systems. Emphasis is placed on engineering best practices, producing high-quality code, documentation, and developing scripts and tools. The expectation for a 'Staff Engineer level or above' indicates a need for deep expertise in sustainable engineering and system design.

Data & SQL

Expert

This is a core competency, requiring extensive experience in designing and managing data pipelines, dimensional data models, and data warehouses. The role involves building and maintaining pipelines that process billions of events daily, ensuring scalability, reliability, and efficiency for real-time data processing and decision-making. Expertise in ETL, data quality, and monitoring for distributed data systems is paramount.

Machine Learning

Medium

While not primarily an ML model development role, Data Engineers at Uber are crucial architects of the data ecosystem that enables ML-driven solutions like fraud detection, dynamic pricing, and driver-rider matching. They need to understand the data requirements for machine learning models and build pipelines that serve these needs effectively, implying a strong understanding of ML data workflows.

Applied AI

Low

There is no explicit mention of modern AI or GenAI as a direct skill requirement in the provided sources for a Data Engineer role. While Uber likely leverages these technologies, the Data Engineer's primary focus, based on the sources, is on foundational data infrastructure. This is a conservative estimate, as the field is evolving rapidly by 2026.

Infra & Cloud

High

The role requires significant experience with distributed data systems for logging, storage, ETL, and monitoring. Familiarity with MPP databases (e.g., AWS Redshift, Teradata) and NoSQL databases like Cassandra is essential. Data Engineers are expected to handle petabytes of data, design for scalability, and understand trade-offs between consistency, availability, and latency in a global, real-time platform.

Business

High

A strong emphasis is placed on identifying and solving engineering and business problems with little guidance, seeing the 'big picture,' and driving alignment on strategically important improvements. The role requires building strong relationships, collaborating meaningfully with various stakeholders, and demonstrating excellent judgment and responsibility, indicative of high business acumen and leadership.

Viz & Comms

Medium

Excellent written and verbal communication skills are explicitly required, including the ability to write detailed technical documents and collaborate with cross-functional teams. The role involves structuring data for 'intuitive analytics and business insights,' suggesting an understanding of how data is consumed and presented, though direct data visualization might be handled by other roles.

What You Need

  • Designing and managing data pipelines
  • Dimensional data modeling
  • Data warehousing
  • Building and deploying production-quality ETL pipelines
  • Working with end-to-end distributed data systems (logging, storage, data quality, monitoring)
  • Real-time data processing
  • Scalability engineering
  • Technical leadership
  • Problem-solving (engineering and business)
  • Excellent written and verbal communication
  • Understanding of consistency, availability, and latency trade-offs

Languages

SQL (advanced, including window functions) · Python · Java · Scala

Tools & Technologies

Hadoop · Hive · Vertica · MPP databases (e.g., AWS Redshift, Teradata) · Cassandra · Apache Spark


At Uber, a Data Engineer builds and operates the pipelines behind the marketplace's nervous system. Trip events flow through Kafka, driver supply signals land in Hive, and Eats order fulfillment metrics refresh for downstream data science teams. Success after year one means product teams trust your tables enough to build surge pricing and ETA models on them without a second thought.

A Typical Week

A Week in the Life of an Uber Data Engineer

Typical L5 workweek · Uber

Weekly time split

Coding 30% · Infrastructure 25% · Meetings 15% · Writing 10% · Break 10% · Analysis 5% · Research 5%

Culture notes

  • Uber operates at high velocity with massive data scale — expect to own pipelines that hundreds of teams depend on, and the pager can be unforgiving during your on-call rotation.
  • Uber requires three days per week in the San Francisco or Sunnyvale office (Tuesday, Wednesday, Thursday), with Monday and Friday as flexible remote days.

The thing that catches most candidates off guard is how much infrastructure work rivals pure coding. You're not just writing Spark jobs. You're cleaning up orphaned HDFS partitions, writing null-rate checks that page the on-call, and doing storage hygiene on shared Hadoop clusters before quota reviews hit.

Projects & Impact Areas

The core work revolves around Uber's marketplace signals: real-time trip deduplication feeding the trips fact table, driver incentive calculations that consolidate six upstream sources, and the dimensional models powering surge pricing decisions. Data quality isn't a side quest you tackle when things break. You'll contribute to internal tooling like schema registries and anomaly detection on data freshness, and Uber expects you to treat these systems as first-class products.

Skills & What's Expected

SQL is necessary but nowhere near sufficient. Uber's stack (Spark, Kafka, Hive, HDFS, Presto, Cassandra) means you need to reason about exactly-once semantics and executor OOM errors caused by data skew on keys like city_id in São Paulo. Business acumen scores surprisingly high, too. You're expected to explain why a driver earnings fact table exists and how it affects marketplace balance, not just demonstrate that you can build it.
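To make the skew point concrete, here's a minimal plain-Python sketch of key salting, the standard fix for a hot city_id. The HOT_KEYS set, salt count, and event list are invented for illustration; in a real Spark job you'd salt the key column before the wide aggregation and merge the partial aggregates in a cheap second pass.

```python
from collections import defaultdict

HOT_KEYS = {"sao_paulo"}   # hypothetical: city_ids known to dominate the shuffle
NUM_SALTS = 4              # spread each hot key across 4 sub-keys


def salted_key(city_id: str, seq: int) -> str:
    """Append a rotating salt to hot keys; cold keys pass through unchanged."""
    if city_id in HOT_KEYS:
        return f"{city_id}#{seq % NUM_SALTS}"
    return city_id


def group_sizes(events, salt=False):
    """Count events per (possibly salted) key, mimicking shuffle partition sizes."""
    sizes = defaultdict(int)
    for seq, city_id in enumerate(events):
        key = salted_key(city_id, seq) if salt else city_id
        sizes[key] += 1
    return dict(sizes)


events = ["sao_paulo"] * 1000 + ["lisbon"] * 10
unsalted = group_sizes(events)             # one 1000-event group lands on one executor
salted = group_sizes(events, salt=True)    # four 250-event groups instead
```

The salted run turns one oversized partition into four even ones, which is exactly the trade you want when a single executor is OOMing on the hot key.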

Levels & Career Growth

From what candidates report on Blind, there's real confusion between L5ii and L6 offers for senior/staff candidates. Clarify your target level with the recruiter before the loop starts, because comp bands and interview expectations shift meaningfully between them. The blocker for L5-to-L6 promotion at Uber isn't technical skill; it's scope. Staff DEs drive alignment on platform-wide migrations (like evaluating Apache Iceberg as a replacement for Hive/ORC table formats), and if your impact stays within one team's pipelines, you'll stay at L5.

Work Culture

Uber mandates Tuesday, Wednesday, and Thursday in the San Francisco or Sunnyvale office, with Monday and Friday as flexible remote days. Remote-only arrangements for engineering roles are rare. On-call rotations are real, and the pager can be unforgiving when hundreds of downstream teams depend on your tables. The culture rewards ownership above all: you ship it, you monitor it, you fix it.

Uber Data Engineer Compensation

Uber's RSU grants vest over four years, often on a 25% annual schedule. That even cadence means no back-loading surprises, but it also means your initial grant size sets the trajectory for your total comp across the entire vesting window. Negotiate the RSU number hard before you sign, because there's no structural acceleration later to bail out a weak initial offer.

Both base salary and RSU grants are flexible levers at Uber, and a sign-on bonus can sweeten things further if you're holding a competing offer. Where most candidates leave money on the table: they fixate on one component instead of pushing across all three simultaneously. Come prepared to articulate your specific value (pipeline scale you've operated at, systems you've owned end-to-end) and let any competing offers do the rest of the talking.

Uber Data Engineer Interview Process

7 rounds · ~5 weeks end to end

Initial Screen

1 round

Recruiter Screen

30m · Phone

This initial 30-minute phone call will cover your background, career aspirations, and why you're interested in Uber. You'll also discuss the specific Data Engineer role, team alignment, and compensation expectations.

behavioral · general

Tips for this round

  • Research Uber's mission and recent projects to show genuine interest.
  • Prepare a concise summary of your relevant experience and career goals.
  • Clearly articulate why you are a good fit for a Data Engineer role at Uber.
  • Be ready to discuss your salary expectations and current compensation.
  • Highlight any experience with real-time data processing or large-scale systems.

Technical Assessment

1 round

Coding & Algorithms

60m · Video Call

Expect a 60-minute live coding session focusing on data structures and algorithms. You'll be asked to solve one or two datainterview.com/coding-style problems, demonstrating your problem-solving abilities and coding proficiency.

algorithms · data_structures · engineering

Tips for this round

  • Practice datainterview.com/coding medium and hard problems, especially those involving arrays, strings, trees, and graphs.
  • Choose a programming language you are most proficient in and can write runnable code quickly.
  • Think out loud, explaining your approach, thought process, and any trade-offs considered.
  • Write clean, well-structured code and test it with various edge cases.
  • Be prepared to discuss time and space complexity of your solution.

Onsite

5 rounds

System Design

60m · Live

You'll be challenged to design a scalable data system for a real-world Uber scenario, such as processing millions of concurrent events. This 60-minute session will assess your ability to architect robust, high-throughput data pipelines and infrastructure.

system_design · data_engineering · data_pipeline · cloud_infrastructure

Tips for this round

  • Focus on key data engineering principles like scalability, reliability, fault tolerance, and real-time processing.
  • Discuss relevant technologies like Kafka, Spark, Flink, Hadoop, and various database types (NoSQL, OLAP).
  • Clearly define requirements and constraints before diving into the design details.
  • Explain trade-offs for different architectural choices and justify your decisions.
  • Consider data modeling, storage solutions, and monitoring aspects of your design.

Tips to Stand Out

  • Master datainterview.com/coding. Uber emphasizes runnable code in its technical rounds, so extensive practice with datainterview.com/coding-style problems, especially medium to hard difficulty, is crucial. Focus on understanding underlying data structures and algorithms.
  • Prioritize System Design. Data Engineers at Uber build systems for massive scale and real-time processing. Be prepared to design robust, scalable data pipelines, discussing trade-offs and relevant technologies like Kafka, Spark, and distributed databases.
  • Showcase 'Hustle' and Business Impact. Uber values candidates who are proactive and can demonstrate how their work drives business results. Frame your experiences to highlight initiative, problem-solving, and the tangible impact of your projects.
  • Deep Dive into SQL and Data Modeling. As a Data Engineer, your ability to write complex SQL queries, design efficient database schemas, and understand data warehousing concepts will be thoroughly tested. Practice advanced SQL and schema design.
  • Prepare Behavioral Stories. Use the STAR method to prepare detailed stories about your past experiences, focusing on collaboration, leadership, overcoming challenges, and learning from failures. Align these stories with Uber's culture.
  • Leverage Referrals. A strong referral can significantly boost your chances, potentially even allowing you to bypass the technical phone screen. Network and seek out current Uber employees.
  • Understand Uber's Scale. Throughout your interviews, demonstrate an awareness of the challenges and considerations involved in handling petabytes of data and billions of events daily, as this is central to Uber's data ecosystem.

Common Reasons Candidates Don't Pass

  • Inability to write runnable code. Candidates often fail by providing pseudocode or incomplete solutions that don't execute correctly, indicating a lack of practical coding proficiency.
  • Weak system design for scale. Many struggle to design data systems that can handle Uber's immense scale (real-time processing, petabytes of data), failing to consider critical aspects like fault tolerance, latency, and throughput.
  • Lack of business impact or 'hustle'. Candidates who only focus on technical details without connecting their work to business outcomes or demonstrating a proactive, results-oriented mindset may not align with Uber's cultural expectations.
  • Insufficient SQL and data modeling skills. For a Data Engineer role, a shallow understanding of advanced SQL, database design principles, and data warehousing concepts is a common reason for rejection.
  • Poor communication during technical rounds. Failing to articulate thought processes, ask clarifying questions, or explain design choices clearly can lead interviewers to believe the candidate lacks problem-solving clarity.
  • Inadequate behavioral responses. Generic or unprepared answers to behavioral questions that don't highlight specific achievements, collaboration skills, or alignment with Uber's values can be a red flag.

Offer & Negotiation

Uber's compensation packages for Data Engineers typically include a competitive base salary, an annual performance bonus, and Restricted Stock Units (RSUs) that vest over a four-year period, often with a 25% annual vesting schedule. When negotiating, focus on increasing the base salary or the RSU grant, as these are often the most flexible components. A sign-on bonus can also be a negotiable lever, especially if you have competing offers. Be prepared to articulate your value and leverage any other offers you may have to secure a more favorable package.

Budget 5 weeks from recruiter screen to offer. The final round, Uber's Bar Raiser, is run by a senior engineer from a completely different team who pressure-tests your technical depth and cultural fit against Uber's company-wide standards. From what candidates report, this round carries outsized weight in the final decision, so don't treat it as a casual chat after surviving the technical gauntlet.

The most common rejection reason isn't a weak system design or a shaky star schema. It's submitting code that doesn't actually execute. Uber's loop puts real emphasis on runnable, edge-case-tested solutions, and interviewers penalize pseudocode heavily. Pair that with the fact that Uber's on-call culture means DEs ship production code in Go, Java, and Python daily, and you can see why they care. Sharpen that muscle at datainterview.com/coding with problems that mirror Uber's flavor: graph traversals for geospatial routing, streaming aggregation structures, and log parsing.

Uber Data Engineer Interview Questions

Data Pipeline & Platform Engineering

Expect questions that force you to design reliable batch and streaming pipelines end-to-end (ingestion, transformation, backfills, SLAs). Candidates often stumble on operational details like idempotency, late data, schema evolution, and data quality gates.

Your Spark job builds a daily Sales fact table for Uber Eats from Kafka order events, and retries sometimes double-count revenue. How do you make the pipeline idempotent across replays and backfills while keeping a 2-hour SLA?

Medium · Idempotency and Exactly-Once Semantics

Sample Answer

Most candidates default to just running a daily overwrite or using at-least-once writes, but that fails here because retries and late events create duplicates and silent metric inflation. You need a deterministic primary key (for example, order_id plus event_type plus event_version) and a merge-based sink that upserts on that key. Add a watermark and a bounded late-data window, then run periodic reconciliation for stragglers outside the window. For backfills, reprocess by partition range and keep the same upsert key so replays converge.
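Here's a minimal, hedged sketch of that merge-on-key idea in plain Python, standing in for a real MERGE INTO sink. The event shape, field names, and watermark value are invented for illustration:

```python
from typing import Dict, List, Tuple

# Hypothetical event: (order_id, event_type, event_version, revenue_usd, event_ts)
Event = Tuple[str, str, int, float, int]


def upsert_events(
    sink: Dict[Tuple[str, str], Tuple[int, float, int]],
    events: List[Event],
    watermark: int,
) -> None:
    """Merge events into the sink keyed by (order_id, event_type).

    Replays are harmless: a duplicate carries the same key and version, so it
    converges to the same row instead of double-counting revenue. Events below
    the watermark are skipped and left to a periodic reconciliation job.
    """
    for order_id, event_type, version, revenue, ts in events:
        if ts < watermark:
            continue  # late event: outside the bounded late-data window
        key = (order_id, event_type)
        current = sink.get(key)
        if current is None or version > current[0]:
            sink[key] = (version, revenue, ts)


sink: Dict[Tuple[str, str], Tuple[int, float, int]] = {}
batch = [("o1", "charge", 1, 25.0, 100), ("o1", "charge", 1, 25.0, 100)]  # retried duplicate
upsert_events(sink, batch, watermark=50)
upsert_events(sink, batch, watermark=50)  # a full replay also converges
```

Because the key and version are deterministic, running the batch twice (a retry or a backfill) leaves revenue at 25.0 instead of doubling it.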

Practice more Data Pipeline & Platform Engineering questions

System Design for Distributed Data Systems

Most candidates underestimate how much your design must balance latency, consistency, and cost at Uber scale. You’ll be evaluated on clear component boundaries, failure modes, and how you’d monitor and evolve the system over time.

Design an end to end pipeline that produces an hourly Sales Ops dashboard for Uber Eats showing gross bookings, net revenue, refunds, and promo spend by city and merchant, with updates within 5 minutes of the hour. Specify ingestion, storage, compute, dimensional model, and how you guarantee idempotency and backfills when late events arrive.

Easy · Streaming ETL and Warehouse Modeling

Sample Answer

Use a Lambda-style design: a streaming path for low-latency aggregates plus a batch path that recomputes authoritative hourly facts and reconciles late data. Stream orders, refunds, and promos into a durable log, write curated tables with stable business keys, then serve the dashboard from an hourly fact table joined to city, merchant, and time dimensions. Idempotency comes from deterministic event IDs and merge semantics; late events trigger reprocessing by hour partitions; and you monitor freshness, duplicate rate, and reconciliation deltas between stream and batch outputs.
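A toy version of the reconciliation step, with an invented event shape: the batch recompute is authoritative, and the per-key delta against the streaming view is what you'd alert on.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical order event: (hour, city, amount_usd)
Order = Tuple[int, str, float]


def hourly_gross(orders: List[Order]) -> Dict[Tuple[int, str], float]:
    """Authoritative aggregate: gross bookings per (hour, city)."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for hour, city, amount in orders:
        totals[(hour, city)] += amount
    return dict(totals)


def reconciliation_deltas(stream_view, batch_view):
    """Per-key gap between the low-latency stream view and the batch recompute."""
    keys = set(stream_view) | set(batch_view)
    return {k: batch_view.get(k, 0.0) - stream_view.get(k, 0.0) for k in keys}


on_time = [(9, "sf", 20.0), (9, "sf", 15.0)]
late = [(9, "sf", 5.0)]                     # arrived after the hour closed
stream_view = hourly_gross(on_time)         # what the dashboard showed at 9:05
batch_view = hourly_gross(on_time + late)   # hourly recompute with late data
deltas = reconciliation_deltas(stream_view, batch_view)
```

A nonzero delta on a key means the stream missed late data for that hour; you'd reprocess the affected hour partition and page only if the delta exceeds tolerance.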

Practice more System Design for Distributed Data Systems questions

Coding & Algorithms (DE-leaning)

The bar here isn't whether you know obscure tricks, it's whether you can write correct, efficient code under interview constraints. Expect data-engineering flavored problems (parsing, aggregation, streaming-like logic) with solid complexity reasoning and clean tests.

You ingest Uber Eats order events as (order_id, ts, status) where status is one of CREATED, ACCEPTED, PICKED_UP, DELIVERED, CANCELED; return the final status per order_id and the final timestamp. If two events for the same order_id share the same ts, the later one in the input list wins.

Easy · Event Aggregation

Sample Answer

You could sort all events by (order_id, ts) and take the last one per order, or do a single pass hash aggregation that keeps the best-so-far event per order. Sorting is simpler to reason about, but it is $O(n \log n)$ and costs memory for rearrangement. The single pass wins here because you can compare timestamps in $O(1)$ per event and handle tie break by input position, so total time is $O(n)$ with $O(k)$ memory for $k$ orders.

Python
from typing import Dict, Iterable, Tuple


def final_status_per_order(events: Iterable[Tuple[str, int, str]]) -> Dict[str, Tuple[int, str]]:
    """Return {order_id: (final_ts, final_status)}.

    Tie break: if ts is equal, later event in the input wins.
    """
    best: Dict[str, Tuple[int, int, str]] = {}
    # Store (ts, index, status) so that (ts, index) defines a total order.
    for idx, (order_id, ts, status) in enumerate(events):
        if order_id not in best:
            best[order_id] = (ts, idx, status)
            continue

        best_ts, best_idx, _ = best[order_id]
        # Later timestamp wins; if tied, later index wins.
        if ts > best_ts or (ts == best_ts and idx > best_idx):
            best[order_id] = (ts, idx, status)

    return {oid: (ts, status) for oid, (ts, _idx, status) in best.items()}


if __name__ == "__main__":
    sample = [
        ("o1", 10, "CREATED"),
        ("o1", 12, "ACCEPTED"),
        ("o2", 8, "CREATED"),
        ("o1", 12, "CANCELED"),  # same ts as ACCEPTED, later in input so wins
        ("o2", 9, "DELIVERED"),
    ]
    out = final_status_per_order(sample)
    assert out["o1"] == (12, "CANCELED")
    assert out["o2"] == (9, "DELIVERED")
    print(out)
Practice more Coding & Algorithms (DE-leaning) questions

SQL Querying & Optimization

Your ability to express complex analytics with joins, windows, and careful filtering is a primary signal in the DE loop. Strong answers anticipate edge cases (duplicates, slowly changing entities) and show awareness of performance implications in MPP warehouses.

You have table fact_trip(trip_id, rider_id, city_id, request_ts, trip_date, status, fare_usd). For each city and trip_date, return completed trips, unique riders, and completion_rate (completed requests divided by all requests), with completion_rate as a decimal and safe for days with zero requests.

Easy · Aggregations and Filtering

Sample Answer

Reason through it: filter nothing upfront; you need both completed and non-completed requests in the denominator. Aggregate by city_id and trip_date, computing total_requests as COUNT(*) and completed_trips as a SUM over status. Unique riders is COUNT(DISTINCT rider_id) across all requests for that day. Completion rate is completed_trips divided by total_requests; guard against division by zero with NULLIF so you do not throw or lie.

SQL
/* Daily city-level completion funnel metrics */
SELECT
  city_id,
  trip_date,
  /* All requests, regardless of status */
  COUNT(*) AS total_requests,
  /* Completed requests only */
  SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed_trips,
  /* Unique riders who made a request that day */
  COUNT(DISTINCT rider_id) AS unique_riders,
  /* Safe decimal rate */
  (SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) * 1.0)
    / NULLIF(COUNT(*), 0) AS completion_rate
FROM fact_trip
GROUP BY
  city_id,
  trip_date
ORDER BY
  trip_date,
  city_id;
Practice more SQL Querying & Optimization questions

Dimensional Modeling & Warehousing

Rather than raw SQL skill, you’re judged on how you structure facts, dimensions, and metrics so downstream analytics stays stable. Watch for prompts around SCD types, grain definition, and metric consistency across Sales/Analytics consumers.

Uber Eats wants a star schema for Sales analytics with metrics like gross_bookings, net_revenue, promo_spend, and completed_orders. Define the fact table grain and name 5 dimensions you would include, and explain one metric that must not be stored as a fact column.

Easy · Star Schema and Grain

Sample Answer

This question is checking whether you can lock the grain before you model anything, and avoid mixing additive facts with derived ratios. Your fact grain should be something like order line or order, not "day"; otherwise you cannot safely slice by store, eater, courier, or promo. Dimensions usually include date, eater, merchant, city, product SKU or menu item, and promo or campaign. A metric like take_rate is derived (net_revenue divided by gross_bookings); it should be computed in the semantic layer to avoid aggregation bugs.
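A quick worked example of why the ratio can't live in the fact table: averaging stored row-level rates gives a different (wrong) answer than deriving the rate from rolled-up additive facts. The numbers are made up.

```python
# Per-merchant daily facts: (gross_bookings, net_revenue), both additive.
facts = [(100.0, 30.0), (900.0, 90.0)]

# Wrong: store take_rate on each row, then average the ratios at rollup time.
row_rates = [net / gross for gross, net in facts]
avg_of_rates = sum(row_rates) / len(row_rates)    # (0.30 + 0.10) / 2 = 0.20

# Right: roll up the additive facts, derive the ratio at query time.
total_gross = sum(gross for gross, _ in facts)
total_net = sum(net for _, net in facts)
true_take_rate = total_net / total_gross          # 120 / 1000 = 0.12
```

The small merchant's 30% rate drags the naive average far from the real blended 12%, which is exactly the aggregation bug a semantic-layer metric definition prevents.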

Practice more Dimensional Modeling & Warehousing questions

Cloud Infrastructure & Data Stores

In practice, you’ll need to articulate why you’d pick Spark/Hive vs an MPP warehouse vs Cassandra for a specific workload. Interviewers look for pragmatic tradeoffs: throughput vs latency, partitioning/sharding choices, and operational constraints.

You need a daily Sales analytics table for trips and promos that powers dashboards and ad hoc SQL, and you also need a low latency lookup for the current promo eligibility by rider at request time. Which parts go to Spark plus Hive, an MPP warehouse (Redshift or Vertica), and Cassandra, and what partition or key design do you pick for each?

Medium · Store Selection and Partitioning

Sample Answer

The standard move is Spark plus Hive for raw data and heavy ETL, publishing curated facts and dims into an MPP warehouse for interactive analytics, with Cassandra serving lookups keyed by a single entity. But here, promo eligibility has sharp latency and availability constraints, so you denormalize into Cassandra keyed by rider_id (and maybe city_id) even if it duplicates warehouse data. Partition Hive by date and city for scan pruning, and model the warehouse with a partition or distribution strategy that aligns with your dominant joins (often date and city) to avoid expensive data movement.
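To illustrate the scan-pruning half of that answer, here's a small sketch over invented partition metadata. Real Hive pruning happens in the metastore and query planner, but the effect is the same: filters on partition columns decide which directories are ever read.

```python
# Hypothetical partition metadata for a Hive table partitioned by date and city.
partitions = [
    {"date": "2026-02-01", "city": "sf",  "path": "/warehouse/trips/date=2026-02-01/city=sf"},
    {"date": "2026-02-01", "city": "nyc", "path": "/warehouse/trips/date=2026-02-01/city=nyc"},
    {"date": "2026-02-02", "city": "sf",  "path": "/warehouse/trips/date=2026-02-02/city=sf"},
]


def prune(partitions, date=None, city=None):
    """Keep only partitions matching the filters; pruned directories are never scanned."""
    kept = []
    for p in partitions:
        if date is not None and p["date"] != date:
            continue
        if city is not None and p["city"] != city:
            continue
        kept.append(p)
    return kept


hit = prune(partitions, date="2026-02-01", city="sf")  # 1 of 3 directories read
```

The same logic explains why partitioning by a high-cardinality column like trip_id would be a mistake: millions of tiny directories, and no query ever filters on it.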

Practice more Cloud Infrastructure & Data Stores questions

Uber's loop is structured so that no single skill carries you through. A candidate who aces the SQL round but freezes when asked to sketch a real-time fraud detection pipeline for Eats merchants (where schema evolution, consumer lag, and exactly-once delivery all collide in one prompt) won't clear the bar. The most common prep mistake, from what candidates report, is treating system design as a soft round you can improvise through, when in practice it's where pipeline knowledge, infrastructure tradeoffs, and Uber-specific context (Kafka into Hive, Presto query patterns over trip/rider schemas, SLA monitoring for marketplace signals) get stress-tested simultaneously.

Practice Uber-specific questions across all six areas at datainterview.com/questions.

How to Prepare for Uber Data Engineer Interviews

Know the Business

Updated Q1 2026

Official mission

"To ignite opportunity by setting the world in motion."

What it actually means

Uber's real mission is to be the global technology platform that powers and optimizes the movement of people and goods, creating economic opportunities and convenience across various sectors. The company also commits to sustainability and adapting its services to local needs.

San Francisco, California · Hybrid, 3 days/week in office (Tue–Thu)

Key Business Metrics

Revenue: $52B (+20% YoY)

Market Cap: $153B (-14% YoY)

Employees: 34K (+9% YoY)

Users: 137M

Current Strategic Priorities

  • Bring a state-of-the-art robotaxi to market later in 2026
  • Build a unique new option for affordable and scalable autonomous rides in the San Francisco Bay Area and beyond
  • Introduce more riders to autonomous mobility
  • Deploy at least 1,200 Robotaxis across the Middle East by 2027
  • Help families navigate everyday transportation with greater ease, visibility, and confidence

Competitive Moat

Global market leadership · Extensive global presence · Diversified service offerings · Network effects

Uber's biggest bet right now is autonomous mobility layered on top of its existing marketplace. The company plans to deploy at least 1,200 WeRide robotaxis across the Middle East by 2027 and bring a Lucid/Nuro robotaxi to market later in 2026. Meanwhile, the core business hit $52 billion in revenue in 2025, up roughly 20% year over year.

For a data engineer, that dual focus creates a concrete challenge: ingesting third-party AV telemetry from WeRide and Nuro alongside Uber's own Kafka-based trip event streams, all feeding into the same Spark/Hive/Presto foundation that powers surge pricing and ETA models. When you're asked "why Uber," skip the brand story and talk about that tension. Reference a specific post from their tech stack blog series, or describe how merging heterogeneous AV sensor data with first-party marketplace signals creates schema evolution and data quality problems you'd be excited to solve.

Try a Real Interview Question

Sessionize Event Stream With Watermark


Given a list of events (user_id, ts), where ts is an integer Unix timestamp in seconds and the list is not guaranteed to be sorted, compute per-user sessions using a gap threshold of g seconds. Two consecutive events for the same user belong to the same session if the time difference is at most g, and you must ignore late events with ts < watermark. Return a list of sessions as tuples (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.

Python
from typing import Iterable, List, Tuple


def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    """Sessionize per-user events with a gap threshold and a watermark.

    Args:
        events: Iterable of (user_id, ts) events. Not guaranteed to be sorted.
        gap_seconds: Gap threshold in seconds. Same session if next_ts - prev_ts <= gap_seconds.
        watermark: Ignore late events where ts < watermark.

    Returns:
        List of (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.
    """
    pass
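One possible implementation of the stub, offered as a sketch rather than the official solution: group events per user, drop anything below the watermark, sort each user's timestamps, and cut a new session whenever the gap exceeds the threshold.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    """Sessionize per-user events; O(n log n) from the per-user sorts."""
    per_user = defaultdict(list)
    for user_id, ts in events:
        if ts < watermark:
            continue  # drop late events below the watermark
        per_user[user_id].append(ts)

    sessions: List[Tuple[str, int, int, int]] = []
    for user_id in sorted(per_user):
        stamps = sorted(per_user[user_id])
        start = prev = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - prev <= gap_seconds:
                prev = ts       # extend the current session
                count += 1
            else:
                sessions.append((user_id, start, prev, count))
                start = prev = ts  # open a new session
                count = 1
        sessions.append((user_id, start, prev, count))
    return sessions
```

Output order falls out for free: users are iterated in sorted order and each user's sessions are emitted in start-time order, so no final sort is needed.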

700+ ML coding problems with a live Python executor.

Practice in the Engine

Uber's DE coding rounds sit closer to applied data manipulation than abstract puzzle-solving, from what candidates report. Sharpen that muscle at datainterview.com/coding, focusing on problems that involve transforming structured event data rather than pure algorithmic brain teasers.

Test Your Readiness

How Ready Are You for Uber Data Engineer?

Question 1 of 10 · Data Pipeline Engineering

Can you design an incremental ingestion pipeline (batch or streaming) that provides exactly-once semantics or effective deduplication using event_time, idempotent writes, and replay handling?

Gaps in your answers point you to exactly where to focus. Drill those weak spots at datainterview.com/questions before your loop.

Frequently Asked Questions

How long does the Uber Data Engineer interview process take?

From first recruiter call to offer, expect roughly 4 to 6 weeks. You'll start with a recruiter screen, then move to a technical phone screen focused on SQL and coding. After that comes the onsite (or virtual onsite), which typically includes 4 to 5 rounds in a single day. Scheduling can stretch things out, especially if the team is busy, so don't be surprised if it takes closer to 7 weeks in some cases.

What technical skills are tested in the Uber Data Engineer interview?

Uber tests you hard on data pipeline design, dimensional data modeling, and data warehousing. You should be comfortable building production-quality ETL pipelines and working with distributed data systems, including logging, storage, data quality, and monitoring. Real-time data processing and scalability engineering come up frequently. On the coding side, SQL is non-negotiable (advanced level, including window functions), and you'll also need solid Python skills. Java and Scala knowledge is a plus, especially for pipeline work.

How should I tailor my resume for an Uber Data Engineer role?

Lead with your data pipeline and ETL experience. Uber cares about scale, so quantify everything: how many records your pipelines processed, latency improvements you achieved, how many downstream consumers relied on your data. Call out specific technologies for distributed systems, real-time processing, and data warehousing. If you've done dimensional modeling or built monitoring/data quality frameworks, put that front and center. Keep it to one page if you have under 10 years of experience, and mirror the language from Uber's job description.

What is the total compensation for Uber Data Engineer roles?

Uber pays competitively for data engineers in San Francisco. For a mid-level Data Engineer (L4), total compensation typically falls in the $200K to $280K range including base, bonus, and RSUs. Senior Data Engineers (L5) can expect $280K to $380K total comp. Staff level (L5b/L6) pushes well above $400K. RSUs vest over four years and make up a significant chunk, so pay attention to the stock component when evaluating your offer.

How do I prepare for the Uber Data Engineer behavioral interview?

Uber's culture emphasizes integrity, customer obsession, and doing the right thing. Prepare stories that show you making tough tradeoffs, pushing back on bad ideas respectfully, and thinking about the end user. They want to see that you can operate with a global mindset while solving local problems. Have 5 to 6 strong stories ready that cover conflict resolution, technical leadership, and times you improved something without being asked. I've seen candidates get rejected despite strong technical rounds because they couldn't articulate how they collaborate across teams.

How hard are the SQL questions in the Uber Data Engineer interview?

They're genuinely hard. Expect advanced SQL with window functions, CTEs, self-joins, and multi-step aggregations. You won't get away with just knowing SELECT and GROUP BY. Uber's SQL questions often involve real-world scenarios like calculating rider metrics, driver utilization, or trip-level analytics. Practice writing complex queries from scratch without an IDE helping you. You can find similar difficulty questions at datainterview.com/questions to get a feel for the level they expect.
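For a feel of the level, here is a hypothetical driver-utilization query of the kind described above, combining a CTE with a window function. The schema and data are invented for illustration; SQLite (3.25+) is used only so the query runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INTEGER, driver_id TEXT, fare REAL);
INSERT INTO trips VALUES
  (1, 'd1', 12.0), (2, 'd1', 8.0), (3, 'd2', 20.0),
  (4, 'd2', 15.0), (5, 'd2', 5.0), (6, 'd3', 30.0);
""")

# Aggregate per driver in a CTE, then rank drivers by total fares --
# the kind of two-step query SELECT/GROUP BY alone can't express.
query = """
WITH driver_totals AS (
  SELECT driver_id, SUM(fare) AS total_fare, COUNT(*) AS trips
  FROM trips
  GROUP BY driver_id
)
SELECT driver_id, total_fare, trips,
       RANK() OVER (ORDER BY total_fare DESC) AS fare_rank
FROM driver_totals
ORDER BY fare_rank;
"""
rows = conn.execute(query).fetchall()
# rows -> [('d2', 40.0, 3, 1), ('d3', 30.0, 1, 2), ('d1', 20.0, 2, 3)]
```

Practice writing queries like this from a blank editor, then extend them: partition the rank by city, or add a LAG() to compare each driver's week-over-week totals.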

What happens during the Uber Data Engineer onsite interview?

The onsite typically has 4 to 5 rounds spread across one day. You'll face a SQL deep-dive round, a coding round (usually Python), a system design round focused on data pipeline architecture, and at least one behavioral round. The system design round is where many candidates struggle. You'll be asked to design end-to-end data systems covering ingestion, storage, transformation, and serving layers. Some loops also include a data modeling round where you design a dimensional schema from scratch.

What metrics and business concepts should I know for the Uber Data Engineer interview?

Understand Uber's two-sided marketplace. Know metrics like trip completion rate, surge pricing mechanics, driver utilization, rider retention, and ETA accuracy. Think about how these metrics flow through data pipelines and what data quality issues could arise. Uber generates $52B in revenue, so the data volumes are massive. Being able to talk about how you'd model ride data, payment events, or driver earnings at that scale shows you understand the business, not just the tech.

Are ML or statistics concepts tested in the Uber Data Engineer interview?

Data Engineer roles at Uber are more engineering-focused than ML-focused. You probably won't be asked to derive a gradient descent algorithm. But you should understand how your pipelines feed ML models and analytics. Concepts like A/B testing data pipelines, feature engineering at scale, and basic statistical awareness (distributions, sampling, aggregation bias) can come up in conversation. If you're interviewing for a more senior role, expect questions about how you'd build data infrastructure that supports ML workflows.
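Aggregation bias is the one statistical trap worth rehearsing with concrete numbers. A toy example, with made-up completion counts: averaging per-driver rates weights a one-trip driver the same as a hundred-trip driver, so it can diverge badly from the pooled rate.

```python
# (completed, total) trips per driver -- very different volumes
trips = {"d1": (1, 1), "d2": (50, 100)}

# Naive: average the per-driver rates (every driver weighted equally)
naive = sum(c / t for c, t in trips.values()) / len(trips)  # 0.75

# Pooled: total completed over total trips
pooled = sum(c for c, _ in trips.values()) / sum(t for _, t in trips.values())
# 51 / 101, roughly 0.505
```

Being able to say which metric a dashboard should report, and why, is exactly the "basic statistical awareness" these conversations probe.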

What format should I use to answer Uber behavioral interview questions?

Use the STAR format (Situation, Task, Action, Result) but keep it tight. Uber interviewers don't want a 10-minute monologue. Spend about 20% on setup, 60% on what you actually did, and the rest on the result. Always end with a measurable result. For example, don't say 'the pipeline was faster.' Say 'latency dropped from 45 minutes to 8 minutes, which unblocked the pricing team's daily refresh.' Tie your answers back to Uber's values when it feels natural, especially customer obsession and integrity.

What are common mistakes candidates make in the Uber Data Engineer interview?

The biggest one I see is underestimating the system design round. Candidates prep SQL and coding but walk into the design round without a framework for discussing data pipelines end to end. Another common mistake is being too theoretical. Uber wants people who've actually built things at scale, so vague answers about 'best practices' won't cut it. Also, don't skip behavioral prep. Uber takes culture fit seriously, and a weak behavioral round can sink an otherwise strong performance.

How should I practice coding for the Uber Data Engineer interview?

Focus on Python and SQL, in that order. For Python, practice data manipulation, writing clean functions, and working with common libraries. For SQL, drill window functions, recursive CTEs, and complex joins until they're second nature. Write everything by hand or in a plain text editor to simulate interview conditions. I recommend practicing with the problems at datainterview.com/coding, which are calibrated to the difficulty level you'll actually face. Aim for at least 3 to 4 weeks of consistent daily practice before your onsite.

Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn