Uber Data Engineer Interview Guide

Dan Lee, Data & AI Lead
Last updated: February 24, 2026

Uber Data Engineer at a Glance

Interview Rounds

7 rounds

Difficulty

SQL (advanced, including window functions) · Python · Java · Scala · Sales · Data Warehousing · ETL · Big Data · Data Governance · Analytics

From mock interviews we've run for Uber DE candidates, one pattern keeps showing up: strong coders who ace the PySpark round then can't sketch a real-time ingestion architecture for billions of Kafka events. Uber's data engineer role is platform engineering with a data flavor, and if you prep like it's just ETL work, you'll leave points on the table.

Uber Data Engineer Role

Primary Focus

Sales · Data Warehousing · ETL · Big Data · Data Governance · Analytics

Skill Profile

Math & Stats · Software Eng · Data & SQL · Machine Learning · Applied AI · Infra & Cloud · Business · Viz & Comms

Math & Stats

Medium

A Bachelor's or Master's degree in Computer Science or a related field is required, implying a foundational understanding of mathematical and statistical concepts relevant to data analysis and engineering. While not explicitly focused on advanced statistical modeling, a solid grasp of data distributions and analytical principles is expected for structuring data for business insights.

Software Eng

Expert

This role demands extensive software engineering prowess, including technical leadership in architecting, implementing, testing, releasing, and monitoring data systems. Emphasis is placed on engineering best practices, producing high-quality code, documentation, and developing scripts and tools. The expectation for a 'Staff Engineer level or above' indicates a need for deep expertise in sustainable engineering and system design.

Data & SQL

Expert

This is a core competency, requiring extensive experience in designing and managing data pipelines, dimensional data models, and data warehouses. The role involves building and maintaining pipelines that process billions of events daily, ensuring scalability, reliability, and efficiency for real-time data processing and decision-making. Expertise in ETL, data quality, and monitoring for distributed data systems is paramount.

Machine Learning

Medium

While not primarily an ML model development role, Data Engineers at Uber are crucial architects of the data ecosystem that enables ML-driven solutions like fraud detection, dynamic pricing, and driver-rider matching. They need to understand the data requirements for machine learning models and build pipelines that serve these needs effectively, implying a strong understanding of ML data workflows.

Applied AI

Low

There is no explicit mention of modern AI or GenAI as a direct skill requirement in the provided sources for a Data Engineer role. While Uber likely leverages these technologies, the Data Engineer's primary focus, based on the sources, is on foundational data infrastructure. This is a conservative estimate, as the field is evolving rapidly by 2026.

Infra & Cloud

High

The role requires significant experience with distributed data systems for logging, storage, ETL, and monitoring. Familiarity with MPP databases (e.g., AWS Redshift, Teradata) and NoSQL databases like Cassandra is essential. Data Engineers are expected to handle petabytes of data, design for scalability, and understand trade-offs between consistency, availability, and latency in a global, real-time platform.

Business

High

A strong emphasis is placed on identifying and solving engineering and business problems with little guidance, seeing the 'big picture,' and driving alignment on strategically important improvements. The role requires building strong relationships, collaborating meaningfully with various stakeholders, and demonstrating excellent judgment and responsibility, indicative of high business acumen and leadership.

Viz & Comms

Medium

Excellent written and verbal communication skills are explicitly required, including the ability to write detailed technical documents and collaborate with cross-functional teams. The role involves structuring data for 'intuitive analytics and business insights,' suggesting an understanding of how data is consumed and presented, though direct data visualization might be handled by other roles.

What You Need

  • Designing and managing data pipelines
  • Dimensional data modeling
  • Data warehousing
  • Building and deploying production-quality ETL pipelines
  • Working with end-to-end distributed data systems (logging, storage, data quality, monitoring)
  • Real-time data processing
  • Scalability engineering
  • Technical leadership
  • Problem-solving (engineering and business)
  • Excellent written and verbal communication
  • Understanding of consistency, availability, and latency trade-offs

Languages

SQL (advanced, including window functions) · Python · Java · Scala

Tools & Technologies

Hadoop · Hive · Vertica · MPP databases (e.g., AWS Redshift, Teradata) · Cassandra · Apache Spark


You're joining a team that builds and operates the data infrastructure behind every ride, every Eats order, and every freight shipment. Success after year one means you own a domain's pipelines end-to-end (say, the trips fact table or driver earnings) and at least one downstream team like pricing or safety considers your tables their source of truth. You're building the distributed systems that power Uber's sub-second dispatch and surge pricing decisions, not writing SQL for dashboards.

A Typical Week

A Week in the Life of an Uber Data Engineer

Typical L5 workweek · Uber

Weekly time split

Coding 30% · Infrastructure 25% · Meetings 15% · Writing 10% · Break 10% · Analysis 5% · Research 5%

Culture notes

  • Uber operates at high velocity with massive data scale — expect to own pipelines that hundreds of teams depend on, and the pager can be unforgiving during your on-call rotation.
  • Uber requires three days per week in the San Francisco or Sunnyvale office (Tuesday, Wednesday, Thursday), with Monday and Friday as flexible remote days.

The split that catches most candidates off guard is how much infrastructure work you do relative to pure analysis. Your mornings aren't spent exploring data; they're spent triaging SLA breaches on Hive tables that feed surge pricing, writing data quality monitors, and reclaiming storage on shared Hadoop clusters. If you're coming from an analytics-heavy DE role, recalibrate: this is closer to backend platform engineering where your PySpark dedup job for Kafka trip events is the product, not a means to a dashboard.

Projects & Impact Areas

Surge pricing pipelines and Eats restaurant ranking both depend on tables your team builds, so a schema change you propose on Wednesday can affect how millions of riders get priced by Friday. Uber's open-source projects like Cadence (workflow orchestration) and AresDB (real-time analytics) aren't just resume decoration; DEs actively contribute to these tools, which means you'll spend real cycles on platform improvements that shape how the entire company processes data. Data quality is a first-class ownership area too: you write the anomaly detection monitors, you own the pages when thresholds breach, and no separate QA team exists to catch your misses.

Skills & What's Expected

Software engineering and data architecture are both rated expert-level for this role, and that pairing tells you everything. ML knowledge sits at medium, which doesn't mean irrelevant; Uber explicitly expects you to understand how your pipelines feed ML-driven systems like fraud detection and driver-rider matching. What's underrated in most candidates' prep is business acumen, scored high but often neglected. Uber wants you to articulate why a 15-minute freshness SLA on the trips table matters to surge pricing accuracy, not just how to hit it.

Levels & Career Growth

The widget shows the level bands, but here's what it can't tell you: the jump where most people stall requires visible cross-team impact, not just flawless execution within your own pipelines. Did you define a data contract that three other teams adopted? Did you drive a platform migration that changed how the org thinks about table formats? Scope is the differentiator at every transition, and promotion committees at Uber look for evidence that you shaped decisions beyond your immediate domain.

Work Culture

Uber requires three days per week in-office (Tuesday, Wednesday, Thursday) at SF or Sunnyvale, with Monday and Friday flexible. The work schedule is demanding, with expectations of flexibility beyond standard hours, especially during on-call rotations when hundreds of downstream teams depend on your tables. The upside is genuine: you get direct product impact, real autonomy over your domain, and the kind of scale problems (petabytes of data, billions of daily events) that most companies can only describe hypothetically.

Uber Data Engineer Compensation

Uber's comp structure for Data Engineers breaks into three pieces: base salary, annual performance bonus, and RSUs that vest over four years (from what candidates report, often at 25% per year). Both base salary and RSU grants are considered the most flexible levers in negotiation, so don't assume either is locked. A sign-on bonus is also on the table, especially when you're holding a competing offer.

Competing offers are your strongest card here. Uber's recruiters expect candidates to shop around, and bringing a credible counter-offer gives you real room to push on equity or sign-on. If you're negotiating without one, focus your energy on clearly articulating the specific pipeline and platform experience you'd bring to Uber's Kafka/Spark/Flink stack, since that's the kind of scarcity the hiring team can use to justify a bump internally. Practice these conversations with real scenarios at datainterview.com/questions.

Uber Data Engineer Interview Process

7 rounds · ~5 weeks end to end

Initial Screen

1 round

Recruiter Screen

30 min · Phone

This initial 30-minute phone call will cover your background, career aspirations, and why you're interested in Uber. You'll also discuss the specific Data Engineer role, team alignment, and compensation expectations.

behavioral · general

Tips for this round

  • Research Uber's mission and recent projects to show genuine interest.
  • Prepare a concise summary of your relevant experience and career goals.
  • Clearly articulate why you are a good fit for a Data Engineer role at Uber.
  • Be ready to discuss your salary expectations and current compensation.
  • Highlight any experience with real-time data processing or large-scale systems.

Technical Assessment

1 round

Coding & Algorithms

60 min · Video Call

Expect a 60-minute live coding session focusing on data structures and algorithms. You'll be asked to solve one or two problems in the style of datainterview.com/coding, demonstrating your problem-solving abilities and coding proficiency.

algorithms · data_structures · engineering

Tips for this round

  • Practice medium and hard problems at datainterview.com/coding, especially those involving arrays, strings, trees, and graphs.
  • Choose a programming language you are most proficient in and can write runnable code quickly.
  • Think out loud, explaining your approach, thought process, and any trade-offs considered.
  • Write clean, well-structured code and test it with various edge cases.
  • Be prepared to discuss time and space complexity of your solution.

Onsite

5 rounds

System Design

60 min · Live

You'll be challenged to design a scalable data system for a real-world Uber scenario, such as processing millions of concurrent events. This 60-minute session will assess your ability to architect robust, high-throughput data pipelines and infrastructure.

system_design · data_engineering · data_pipeline · cloud_infrastructure

Tips for this round

  • Focus on key data engineering principles like scalability, reliability, fault tolerance, and real-time processing.
  • Discuss relevant technologies like Kafka, Spark, Flink, Hadoop, and various database types (NoSQL, OLAP).
  • Clearly define requirements and constraints before diving into the design details.
  • Explain trade-offs for different architectural choices and justify your decisions.
  • Consider data modeling, storage solutions, and monitoring aspects of your design.

Tips to Stand Out

  • Master coding fundamentals. Uber emphasizes runnable code in its technical rounds, so extensive practice with medium-to-hard problems at datainterview.com/coding is crucial. Focus on understanding the underlying data structures and algorithms.
  • Prioritize System Design. Data Engineers at Uber build systems for massive scale and real-time processing. Be prepared to design robust, scalable data pipelines, discussing trade-offs and relevant technologies like Kafka, Spark, and distributed databases.
  • Showcase 'Hustle' and Business Impact. Uber values candidates who are proactive and can demonstrate how their work drives business results. Frame your experiences to highlight initiative, problem-solving, and the tangible impact of your projects.
  • Deep Dive into SQL and Data Modeling. As a Data Engineer, your ability to write complex SQL queries, design efficient database schemas, and understand data warehousing concepts will be thoroughly tested. Practice advanced SQL and schema design.
  • Prepare Behavioral Stories. Use the STAR method to prepare detailed stories about your past experiences, focusing on collaboration, leadership, overcoming challenges, and learning from failures. Align these stories with Uber's culture.
  • Leverage Referrals. A strong referral can significantly boost your chances, potentially even allowing you to bypass the technical phone screen. Network and seek out current Uber employees.
  • Understand Uber's Scale. Throughout your interviews, demonstrate an awareness of the challenges and considerations involved in handling petabytes of data and billions of events daily, as this is central to Uber's data ecosystem.

Common Reasons Candidates Don't Pass

  • Inability to write runnable code. Candidates often fail by providing pseudocode or incomplete solutions that don't execute correctly, indicating a lack of practical coding proficiency.
  • Weak system design for scale. Many struggle to design data systems that can handle Uber's immense scale (real-time processing, petabytes of data), failing to consider critical aspects like fault tolerance, latency, and throughput.
  • Lack of business impact or 'hustle'. Candidates who only focus on technical details without connecting their work to business outcomes or demonstrating a proactive, results-oriented mindset may not align with Uber's cultural expectations.
  • Insufficient SQL and data modeling skills. For a Data Engineer role, a shallow understanding of advanced SQL, database design principles, and data warehousing concepts is a common reason for rejection.
  • Poor communication during technical rounds. Failing to articulate thought processes, ask clarifying questions, or explain design choices clearly can lead interviewers to believe the candidate lacks problem-solving clarity.
  • Inadequate behavioral responses. Generic or unprepared answers to behavioral questions that don't highlight specific achievements, collaboration skills, or alignment with Uber's values can be a red flag.

Offer & Negotiation

Uber's compensation packages for Data Engineers typically include a competitive base salary, an annual performance bonus, and Restricted Stock Units (RSUs) that vest over a four-year period, often with a 25% annual vesting schedule. When negotiating, focus on increasing the base salary or the RSU grant, as these are often the most flexible components. A sign-on bonus can also be a negotiable lever, especially if you have competing offers. Be prepared to articulate your value and leverage any other offers you may have to secure a more favorable package.

From what candidates report, the most common reason people wash out isn't a single weak round. It's failing to write code that actually runs. Uber's two coding rounds and the SQL session all demand executable solutions, not pseudocode or hand-wavy logic. If your Spark job or Python script wouldn't pass a basic test harness, that's a rejection, even if your system design was solid.

The Bar Raiser round is where confident candidates get blindsided. It's run by an engineer outside your prospective team, and the round blends behavioral depth with technical pressure testing on past projects you've shipped. Treating it as a relaxed culture chat (instead of preparing STAR stories around ownership, cross-team influence, and pushing back on bad technical decisions) is the mistake that sinks people who cleared every other round cleanly.

Uber Data Engineer Interview Questions

Data Pipeline & Platform Engineering

Expect questions that force you to design reliable batch and streaming pipelines end-to-end (ingestion, transformation, backfills, SLAs). Candidates often stumble on operational details like idempotency, late data, schema evolution, and data quality gates.

Your Spark job builds a daily Sales fact table for Uber Eats from Kafka order events, and retries sometimes double-count revenue. How do you make the pipeline idempotent across replays and backfills while keeping a 2-hour SLA?

Medium · Idempotency and Exactly-Once Semantics

Sample Answer

Most candidates default to just running a daily overwrite or using at-least-once writes, but that fails here because retries and late events create duplicates and silent metric inflation. You need a deterministic primary key (for example, order_id plus event_type plus event_version) and a merge-based sink that upserts on that key. Add a watermark and a bounded late-data window, then run periodic reconciliation for stragglers outside the window. For backfills, reprocess by partition range and keep the same upsert key so replays converge.
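The merge-on-key idea can be sketched in plain Python, standing in for a MERGE/upsert sink such as Delta or Iceberg; the field names here are illustrative, not Uber's actual schema:

```python
from typing import Dict, List, Tuple

# Deterministic key: replays of the same logical event map to the same key,
# so re-running a batch (or a backfill) converges instead of double-counting.
Key = Tuple[str, str, int]  # (order_id, event_type, event_version)


def merge_events(table: Dict[Key, dict], batch: List[dict]) -> Dict[Key, dict]:
    """Upsert each event by its deterministic key; idempotent under replay."""
    for e in batch:
        key = (e["order_id"], e["event_type"], e["event_version"])
        table[key] = e  # merge semantics: last write wins, duplicates collapse
    return table


table: Dict[Key, dict] = {}
batch = [
    {"order_id": "o1", "event_type": "charge", "event_version": 1, "amount": 25.0},
    {"order_id": "o1", "event_type": "charge", "event_version": 1, "amount": 25.0},  # retry
]
merge_events(table, batch)
merge_events(table, batch)  # replaying the whole batch (a backfill) is a no-op
revenue = sum(e["amount"] for e in table.values())  # stays 25.0 across replays
```

An at-least-once append sink would report 100.0 here; the keyed upsert keeps it at 25.0 no matter how many times the batch replays.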

Practice more Data Pipeline & Platform Engineering questions

System Design for Distributed Data Systems

Most candidates underestimate how much your design must balance latency, consistency, and cost at Uber scale. You’ll be evaluated on clear component boundaries, failure modes, and how you’d monitor and evolve the system over time.

Design an end-to-end pipeline that produces an hourly Sales Ops dashboard for Uber Eats showing gross bookings, net revenue, refunds, and promo spend by city and merchant, with updates within 5 minutes of the hour. Specify ingestion, storage, compute, and the dimensional model, and explain how you guarantee idempotency and backfills when late events arrive.

Easy · Streaming ETL and Warehouse Modeling

Sample Answer

Use a Lambda-style design: a streaming path for low-latency aggregates plus a batch path that recomputes authoritative hourly facts and reconciles late data. Stream orders, refunds, and promos into a durable log, write curated tables with stable business keys, then serve the dashboard from an hourly fact table joined to city, merchant, and time dimensions. Idempotency comes from deterministic event IDs and merge semantics; late events trigger reprocessing by hour partition; and you monitor freshness, duplicate rate, and reconciliation deltas between stream and batch outputs.
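The reprocess-by-hour-partition idea can be shown with a toy sketch in plain Python (the event fields and the in-memory "fact table" are invented for illustration):

```python
from typing import Dict, List

HOUR = 3600


def recompute_hour(log: List[dict], hour_start: int) -> dict:
    """Rebuild the authoritative fact row for one hour partition from the log.

    Recomputing the whole partition means a late event triggers an overwrite
    of that hour's row, rather than an increment that could double-count.
    """
    rows = [e for e in log if hour_start <= e["ts"] < hour_start + HOUR]
    return {
        "hour": hour_start,
        "gross_bookings": sum(e["amount"] for e in rows if e["type"] == "order"),
        "refunds": sum(e["amount"] for e in rows if e["type"] == "refund"),
    }


log = [
    {"ts": 10, "type": "order", "amount": 30.0},
    {"ts": 200, "type": "refund", "amount": 5.0},
]
facts: Dict[int, dict] = {0: recompute_hour(log, 0)}

log.append({"ts": 3000, "type": "order", "amount": 20.0})  # late event for hour 0
facts[0] = recompute_hour(log, 0)  # reprocess just that partition
```

Because the hour partition is rebuilt from the durable log, the rebuilt row is correct regardless of how many late events arrive or how many times the rebuild runs.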

Practice more System Design for Distributed Data Systems questions

Coding & Algorithms (DE-leaning)

The bar here isn't whether you know obscure tricks, it's whether you can write correct, efficient code under interview constraints. Expect data-engineering flavored problems (parsing, aggregation, streaming-like logic) with solid complexity reasoning and clean tests.

You ingest Uber Eats order events as (order_id, ts, status) where status is one of CREATED, ACCEPTED, PICKED_UP, DELIVERED, CANCELED; return the final status per order_id and the final timestamp. If two events for the same order_id share the same ts, the later one in the input list wins.

Easy · Event Aggregation

Sample Answer

You could sort all events by (order_id, ts) and take the last one per order, or do a single-pass hash aggregation that keeps the best-so-far event per order. Sorting is simpler to reason about, but it is O(n log n) and costs memory for rearrangement. The single pass wins here because you can compare timestamps in O(1) per event and handle the tie break by input position, so total time is O(n) with O(k) memory for k orders.

from __future__ import annotations

from typing import Dict, Iterable, Tuple


def final_status_per_order(events: Iterable[Tuple[str, int, str]]) -> Dict[str, Tuple[int, str]]:
    """Return {order_id: (final_ts, final_status)}.

    Tie break: if ts is equal, later event in the input wins.
    """
    best: Dict[str, Tuple[int, int, str]] = {}
    # Store (ts, index, status) so that (ts, index) defines total order.
    for idx, (order_id, ts, status) in enumerate(events):
        if order_id not in best:
            best[order_id] = (ts, idx, status)
            continue

        best_ts, best_idx, _ = best[order_id]
        # Later timestamp wins; if tied, later index wins.
        if ts > best_ts or (ts == best_ts and idx > best_idx):
            best[order_id] = (ts, idx, status)

    return {oid: (ts, status) for oid, (ts, _idx, status) in best.items()}


if __name__ == "__main__":
    sample = [
        ("o1", 10, "CREATED"),
        ("o1", 12, "ACCEPTED"),
        ("o2", 8, "CREATED"),
        ("o1", 12, "CANCELED"),  # same ts as ACCEPTED, later in input so wins
        ("o2", 9, "DELIVERED"),
    ]
    out = final_status_per_order(sample)
    assert out["o1"] == (12, "CANCELED")
    assert out["o2"] == (9, "DELIVERED")
    print(out)
Practice more Coding & Algorithms (DE-leaning) questions

SQL Querying & Optimization

Your ability to express complex analytics with joins, windows, and careful filtering is a primary signal in the DE loop. Strong answers anticipate edge cases (duplicates, slowly changing entities) and show awareness of performance implications in MPP warehouses.

You have table fact_trip(trip_id, rider_id, city_id, request_ts, trip_date, status, fare_usd). For each city and trip_date, return completed trips, unique riders, and completion_rate (completed requests divided by all requests), with completion_rate as a decimal and safe for days with zero requests.

Easy · Aggregations and Filtering

Sample Answer

Reason through it: filter nothing upfront, since you need both completed and non-completed requests in the denominator. Aggregate by city_id and trip_date; compute total_requests as COUNT(*) and completed_trips as a SUM over status. Unique riders is COUNT(DISTINCT rider_id) across all requests for that day. Completion rate is completed_trips divided by total_requests; guard against division by zero with NULLIF so you neither throw an error nor report a misleading value.

/* Daily city-level completion funnel metrics */
SELECT
  city_id,
  trip_date,
  /* All requests, regardless of status */
  COUNT(*) AS total_requests,
  /* Completed requests only */
  SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed_trips,
  /* Unique riders who made a request that day */
  COUNT(DISTINCT rider_id) AS unique_riders,
  /* Safe decimal rate */
  (SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) * 1.0)
    / NULLIF(COUNT(*), 0) AS completion_rate
FROM fact_trip
GROUP BY
  city_id,
  trip_date
ORDER BY
  trip_date,
  city_id;
Practice more SQL Querying & Optimization questions

Dimensional Modeling & Warehousing

Rather than raw SQL skill, you’re judged on how you structure facts, dimensions, and metrics so downstream analytics stays stable. Watch for prompts around SCD types, grain definition, and metric consistency across Sales/Analytics consumers.

Uber Eats wants a star schema for Sales analytics with metrics like gross_bookings, net_revenue, promo_spend, and completed_orders. Define the fact table grain and name 5 dimensions you would include, and explain one metric that must not be stored as a fact column.

Easy · Star Schema and Grain

Sample Answer

This question checks whether you can lock the grain before you model anything, and avoid mixing additive facts with derived ratios. Your fact grain should be something like order line or order, not "day"; otherwise you cannot safely slice by store, eater, courier, or promo. Dimensions usually include date, eater, merchant, city, product SKU or menu item, and promo or campaign. A metric like take_rate is derived (net_revenue divided by gross_bookings); it should be computed in the semantic layer to avoid aggregation bugs.
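A quick numeric illustration, with made-up numbers, of why a ratio like take_rate must not be stored and re-aggregated as a fact column:

```python
orders = [
    {"gross": 100.0, "net": 30.0},    # per-order take_rate 0.30
    {"gross": 1000.0, "net": 100.0},  # per-order take_rate 0.10
]

# Wrong: averaging a pre-stored ratio column weighs every order equally,
# so the small order distorts the blended rate upward.
avg_of_ratios = sum(o["net"] / o["gross"] for o in orders) / len(orders)  # 0.20

# Right: aggregate the additive facts first, then derive the ratio once.
ratio_of_sums = sum(o["net"] for o in orders) / sum(o["gross"] for o in orders)
# 130 / 1100 ≈ 0.118, the true blended take rate
```

The two answers diverge whenever order sizes vary, which is exactly why the ratio belongs in the semantic layer, computed from SUM(net_revenue) and SUM(gross_bookings).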

Practice more Dimensional Modeling & Warehousing questions

Cloud Infrastructure & Data Stores

In practice, you’ll need to articulate why you’d pick Spark/Hive vs an MPP warehouse vs Cassandra for a specific workload. Interviewers look for pragmatic tradeoffs: throughput vs latency, partitioning/sharding choices, and operational constraints.

You need a daily Sales analytics table for trips and promos that powers dashboards and ad hoc SQL, and you also need a low latency lookup for the current promo eligibility by rider at request time. Which parts go to Spark plus Hive, an MPP warehouse (Redshift or Vertica), and Cassandra, and what partition or key design do you pick for each?

Medium · Store Selection and Partitioning

Sample Answer

The standard move is Spark plus Hive for raw data and heavy ETL, then publish curated facts and dims into an MPP warehouse for interactive analytics, and use Cassandra for serving lookups keyed by a single entity. But here, promo eligibility has sharp latency and availability constraints, so you denormalize into Cassandra by rider_id (and maybe city_id) even if it duplicates warehouse data. Partition Hive by date and city for scan pruning, and model the warehouse with a partition or distribution strategy that aligns with your dominant joins (often date and city) to avoid expensive data movement.

Practice more Cloud Infrastructure & Data Stores questions

Uber's loop punishes candidates who prep like it's a generic algorithms screen. The sample questions reference Uber Eats order idempotency, city-level trip metrics on fact_trip, and star schemas built around order/promo fact tables, so your answers need to reflect how Uber's marketplace actually moves data, not abstract whiteboard patterns. The single costliest mistake is grinding coding problems while ignoring pipeline design and system design, which together dominate the evaluation and frequently overlap in the same question (designing a near-real-time Sales anomaly system, for instance, tests both simultaneously).

Practice Uber-style questions across all six areas at datainterview.com/questions.

How to Prepare for Uber Data Engineer Interviews

Know the Business

Updated Q1 2026

Official mission

To ignite opportunity by setting the world in motion.

What it actually means

Uber's real mission is to be the global technology platform that powers and optimizes the movement of people and goods, creating economic opportunities and convenience across various sectors. The company also commits to sustainability and adapting its services to local needs.

San Francisco, California · Hybrid (3 days/week in office)

Key Business Metrics

Revenue

$52B

+20% YoY

Market Cap

$153B

-14% YoY

Employees

34K

+9% YoY

Users

137M

Current Strategic Priorities

  • Bring a state-of-the-art robotaxi to market later in 2026
  • Build a unique new option for affordable and scalable autonomous rides in the San Francisco Bay Area and beyond
  • Introduce more riders to autonomous mobility
  • Deploy at least 1,200 Robotaxis across the Middle East by 2027
  • Help families navigate everyday transportation with greater ease, visibility, and confidence

Competitive Moat

Global market leadership · Extensive global presence · Diversified service offerings · Network effects

Uber is making its biggest platform bet since Eats: autonomous mobility. The Lucid/Nuro partnership targets a robotaxi launch in 2026, while a separate WeRide deal aims to deploy at least 1,200 robotaxis across the Middle East by 2027. For data engineers, that translates to entirely new pipeline domains: sensor telemetry, vehicle state streams, and safety-critical SLAs that don't exist in human-driver trip data.

Meanwhile, the core business posted $52 billion in revenue, up roughly 20% year over year. Existing pipelines feeding pricing, dispatch, and Eats ranking still need to scale with that growth. You'd be building new autonomous data infrastructure while keeping a massive, revenue-critical foundation healthy.

When interviewers ask "why Uber," don't talk about robotaxis in the abstract. Talk about the specific data engineering problem they create. Autonomous trip events need different dimensional models than human-driver trips (sensor fusion dimensions, safety-incident fact tables, sub-second latency budgets for vehicle routing). Framing your answer around that gap, and why your background prepares you to close it, shows you've studied Uber's actual roadmap rather than skimming a press release.

Try a Real Interview Question

Sessionize Event Stream With Watermark

python

Given a list of events (user_id, ts) where ts is an integer Unix timestamp in seconds and the list is not guaranteed to be sorted, compute per-user sessions using a gap threshold of g seconds. Two consecutive events for the same user belong to the same session if the time difference is <= g, and you must ignore late events with ts < watermark; return a list of sessions as tuples (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.

from typing import Iterable, List, Tuple


def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    """Sessionize per-user events with a gap threshold and a watermark.

    Args:
        events: Iterable of (user_id, ts) events. Not guaranteed to be sorted.
        gap_seconds: Gap threshold in seconds. Same session if next_ts - prev_ts <= gap_seconds.
        watermark: Ignore late events where ts < watermark.

    Returns:
        List of (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.
    """
    pass
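
One possible reference solution, sketched under the stated contract (drop events below the watermark, sort per user, split on gaps greater than g):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    """Sessionize per-user events with a gap threshold and a watermark."""
    by_user: Dict[str, List[int]] = defaultdict(list)
    for user_id, ts in events:
        if ts >= watermark:  # ignore late events below the watermark
            by_user[user_id].append(ts)

    sessions: List[Tuple[str, int, int, int]] = []
    for user_id in sorted(by_user):  # output ordered by user_id
        times = sorted(by_user[user_id])  # then by start_ts within a user
        start = prev = times[0]
        count = 1
        for ts in times[1:]:
            if ts - prev <= gap_seconds:  # same session: extend it
                prev, count = ts, count + 1
            else:  # gap exceeded: close the session, start a new one
                sessions.append((user_id, start, prev, count))
                start = prev = ts
                count = 1
        sessions.append((user_id, start, prev, count))
    return sessions


# gap 30s, watermark 100: the ts=50 event is dropped as late
out = sessionize_events([("u1", 50), ("u1", 120), ("u1", 140), ("u1", 300)], 30, 100)
# [("u1", 120, 140, 2), ("u1", 300, 300, 1)]
```

Try writing your own version against the stub first; the interview version of this problem also rewards discussing what happens to watermark handling when events arrive across multiple micro-batches.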

700+ ML coding problems with a live Python executor.

Practice in the Engine

Uber's job postings for data engineers call out strong Java/Python and distributed systems experience, so expect coding rounds that reward clean, well-structured code over brute-force solutions that merely pass. Practice with problems at datainterview.com/coding that let you build that muscle in a timed setting.

Test Your Readiness

How Ready Are You for Uber Data Engineer?

Question 1 of 10 · Data Pipeline Engineering

Can you design an incremental ingestion pipeline (batch or streaming) that provides exactly-once semantics or effective deduplication using event_time, idempotent writes, and replay handling?

Gaps in SQL optimization or pipeline design are the fastest to close with targeted reps. Work through Uber-relevant scenarios at datainterview.com/questions.

Frequently Asked Questions

How long does the Uber Data Engineer interview process take?

From first recruiter call to offer, expect roughly 4 to 6 weeks. You'll start with a recruiter screen, then move to a technical phone screen focused on SQL and coding. After that comes the onsite (or virtual onsite), which typically includes 4 to 5 rounds in a single day. Scheduling can stretch things out, especially if the team is busy, so don't be surprised if it takes closer to 7 weeks in some cases.

What technical skills are tested in the Uber Data Engineer interview?

Uber tests you hard on data pipeline design, dimensional data modeling, and data warehousing. You should be comfortable building production-quality ETL pipelines and working with distributed data systems, including logging, storage, data quality, and monitoring. Real-time data processing and scalability engineering come up frequently. On the coding side, SQL is non-negotiable (advanced level, including window functions), and you'll also need solid Python skills. Java and Scala knowledge is a plus, especially for pipeline work.

How should I tailor my resume for an Uber Data Engineer role?

Lead with your data pipeline and ETL experience. Uber cares about scale, so quantify everything: how many records your pipelines processed, latency improvements you achieved, how many downstream consumers relied on your data. Call out specific technologies for distributed systems, real-time processing, and data warehousing. If you've done dimensional modeling or built monitoring/data quality frameworks, put that front and center. Keep it to one page if you have under 10 years of experience, and mirror the language from Uber's job description.

What is the total compensation for Uber Data Engineer roles?

Uber pays competitively for data engineers in San Francisco. For a mid-level Data Engineer (L4), total compensation typically falls in the $200K to $280K range including base, bonus, and RSUs. Senior Data Engineers (L5) can expect $280K to $380K total comp. Staff level (L5b/L6) pushes well above $400K. RSUs vest over four years and make up a significant chunk, so pay attention to the stock component when evaluating your offer.

How do I prepare for the Uber Data Engineer behavioral interview?

Uber's culture emphasizes integrity, customer obsession, and doing the right thing. Prepare stories that show you making tough tradeoffs, pushing back on bad ideas respectfully, and thinking about the end user. They want to see that you can operate with a global mindset while solving local problems. Have 5 to 6 strong stories ready that cover conflict resolution, technical leadership, and times you improved something without being asked. I've seen candidates get rejected despite strong technical rounds because they couldn't articulate how they collaborate across teams.

How hard are the SQL questions in the Uber Data Engineer interview?

They're genuinely hard. Expect advanced SQL with window functions, CTEs, self-joins, and multi-step aggregations. You won't get away with just knowing SELECT and GROUP BY. Uber's SQL questions often involve real-world scenarios like calculating rider metrics, driver utilization, or trip-level analytics. Practice writing complex queries from scratch without an IDE helping you. You can find questions of similar difficulty at datainterview.com/questions to get a feel for the level they expect.
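To calibrate, here is the flavor of window-function query those scenarios expect: a running trip count and fare total per rider. The `trips` table and its columns are invented for illustration; the query is run through Python's standard-library `sqlite3`, which has supported window functions since SQLite 3.25:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (rider_id TEXT, trip_date TEXT, fare REAL);
INSERT INTO trips VALUES
  ('r1', '2024-01-01', 12.0),
  ('r1', '2024-01-02', 8.5),
  ('r1', '2024-01-03', 15.0),
  ('r2', '2024-01-01', 20.0);
""")

# Running (cumulative) metrics per rider, ordered by trip date
rows = conn.execute("""
    SELECT rider_id,
           trip_date,
           COUNT(*)  OVER (PARTITION BY rider_id ORDER BY trip_date) AS trips_to_date,
           SUM(fare) OVER (PARTITION BY rider_id ORDER BY trip_date) AS fares_to_date
    FROM trips
    ORDER BY rider_id, trip_date
""").fetchall()
```

The interview version will layer more on top (filtering the window frame, LAG/LEAD for session gaps, a CTE to pre-aggregate), but if the PARTITION BY / ORDER BY mechanics above aren't automatic for you yet, start there.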

What happens during the Uber Data Engineer onsite interview?

The onsite typically has 4 to 5 rounds spread across one day. You'll face a SQL deep-dive round, a coding round (usually Python), a system design round focused on data pipeline architecture, and at least one behavioral round. The system design round is where many candidates struggle. You'll be asked to design end-to-end data systems covering ingestion, storage, transformation, and serving layers. Some loops also include a data modeling round where you design a dimensional schema from scratch.

What metrics and business concepts should I know for the Uber Data Engineer interview?

Understand Uber's two-sided marketplace. Know metrics like trip completion rate, surge pricing mechanics, driver utilization, rider retention, and ETA accuracy. Think about how these metrics flow through data pipelines and what data quality issues could arise. Uber generates $52B in revenue, so the data volumes are massive. Being able to talk about how you'd model ride data, payment events, or driver earnings at that scale shows you understand the business, not just the tech.
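As a toy illustration of two of those metrics, here is how trip completion rate and driver utilization might be computed over a window of events. The formulas and numbers are assumptions for the sake of the example, not Uber's internal definitions:

```python
# Assumed definitions (illustrative only):
#   completion rate   = completed trips / requested trips
#   driver utilization = minutes on trip / minutes online
trips = [
    {"status": "completed", "on_trip_min": 18},
    {"status": "completed", "on_trip_min": 25},
    {"status": "cancelled", "on_trip_min": 0},
    {"status": "completed", "on_trip_min": 12},
]
online_minutes = 120  # the driver's total online time in the window

completion_rate = sum(t["status"] == "completed" for t in trips) / len(trips)
utilization = sum(t["on_trip_min"] for t in trips) / online_minutes

print(f"completion rate: {completion_rate:.0%}")   # 75%
print(f"driver utilization: {utilization:.0%}")    # 46%
```

The interview value isn't the arithmetic; it's being able to say where each input comes from (trip lifecycle events, driver state-change events) and what breaks the metric when those streams arrive late or duplicated.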

Are ML or statistics concepts tested in the Uber Data Engineer interview?

Data Engineer roles at Uber are more engineering-focused than ML-focused. You probably won't be asked to derive a gradient descent algorithm. But you should understand how your pipelines feed ML models and analytics. Concepts like A/B testing data pipelines, feature engineering at scale, and basic statistical awareness (distributions, sampling, aggregation bias) can come up in conversation. If you're interviewing for a more senior role, expect questions about how you'd build data infrastructure that supports ML workflows.
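Aggregation bias in particular is worth being able to demo on a whiteboard. A hypothetical example of Simpson's paradox in trip completion rates, with made-up numbers:

```python
# Hypothetical (completed, requested) trips per city, two consecutive weeks
week1 = {"city_a": (90, 100), "city_b": (5, 10)}
week2 = {"city_a": (19, 20),  "city_b": (55, 100)}

def rate(completed, requested):
    return completed / requested

def pooled_rate(week):
    done = sum(c for c, _ in week.values())
    asked = sum(r for _, r in week.values())
    return done / asked

# Each city improves week-over-week...
assert rate(*week2["city_a"]) > rate(*week1["city_a"])  # 0.95 > 0.90
assert rate(*week2["city_b"]) > rate(*week1["city_b"])  # 0.55 > 0.50
# ...yet the pooled rate falls, because week 2's trip volume shifted
# toward the lower-rate city. Aggregate after slicing, not before.
assert pooled_rate(week2) < pooled_rate(week1)
```

Mentioning this kind of pitfall when discussing rollup tables or experiment dashboards signals the statistical awareness interviewers are probing for.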

What format should I use to answer Uber behavioral interview questions?

Use the STAR format (Situation, Task, Action, Result) but keep it tight. Uber interviewers don't want a 10-minute monologue. Spend roughly 20% of your answer on setup, 60% on what you actually did, and the rest on the result. Always end with a measurable result. For example, don't say 'the pipeline was faster.' Say 'latency dropped from 45 minutes to 8 minutes, which unblocked the pricing team's daily refresh.' Tie your answers back to Uber's values when it feels natural, especially customer obsession and integrity.

What are common mistakes candidates make in the Uber Data Engineer interview?

The biggest one I see is underestimating the system design round. Candidates prep SQL and coding but walk into the design round without a framework for discussing data pipelines end to end. Another common mistake is being too theoretical. Uber wants people who've actually built things at scale, so vague answers about 'best practices' won't cut it. Also, don't skip behavioral prep. Uber takes culture fit seriously, and a weak behavioral round can sink an otherwise strong performance.

How should I practice coding for the Uber Data Engineer interview?

Focus on Python and SQL, in that order of priority. For Python, practice data manipulation, writing clean functions, and working with common libraries. For SQL, drill window functions, recursive CTEs, and complex joins until they're second nature. Write everything by hand or in a plain text editor to simulate interview conditions. I recommend practicing with the problems at datainterview.com/coding, which are calibrated to the difficulty level you'll actually face. Aim for at least 3 to 4 weeks of consistent daily practice before your onsite.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn