Uber Data Engineer at a Glance
Interview Rounds
7 rounds
Difficulty
From mock interviews we've run for Uber DE candidates, one pattern keeps showing up: strong coders who ace the PySpark round but then can't sketch a real-time ingestion architecture for billions of Kafka events. Uber's data engineer role is platform engineering with a data flavor, and if you prep like it's just ETL work, you'll leave points on the table.
Uber Data Engineer Role
Primary Focus
Skill Profile
Math & Stats
Medium · A Bachelor's or Master's degree in Computer Science or a related field is required, implying a foundational understanding of mathematical and statistical concepts relevant to data analysis and engineering. While not explicitly focused on advanced statistical modeling, a solid grasp of data distributions and analytical principles is expected for structuring data for business insights.
Software Eng
Expert · This role demands extensive software engineering prowess, including technical leadership in architecting, implementing, testing, releasing, and monitoring data systems. Emphasis is placed on engineering best practices, producing high-quality code, documentation, and developing scripts and tools. The expectation for a 'Staff Engineer level or above' indicates a need for deep expertise in sustainable engineering and system design.
Data & SQL
Expert · This is a core competency, requiring extensive experience in designing and managing data pipelines, dimensional data models, and data warehouses. The role involves building and maintaining pipelines that process billions of events daily, ensuring scalability, reliability, and efficiency for real-time data processing and decision-making. Expertise in ETL, data quality, and monitoring for distributed data systems is paramount.
Machine Learning
Medium · While not primarily an ML model development role, Data Engineers at Uber are crucial architects of the data ecosystem that enables ML-driven solutions like fraud detection, dynamic pricing, and driver-rider matching. They need to understand the data requirements for machine learning models and build pipelines that serve these needs effectively, implying a strong understanding of ML data workflows.
Applied AI
Low · There is no explicit mention of modern AI or GenAI as a direct skill requirement in the provided sources for a Data Engineer role. While Uber likely leverages these technologies, the Data Engineer's primary focus, based on the sources, is on foundational data infrastructure. This is a conservative estimate, as the field is evolving rapidly heading into 2026.
Infra & Cloud
High · The role requires significant experience with distributed data systems for logging, storage, ETL, and monitoring. Familiarity with MPP databases (e.g., AWS Redshift, Teradata) and NoSQL databases like Cassandra is essential. Data Engineers are expected to handle petabytes of data, design for scalability, and understand trade-offs between consistency, availability, and latency in a global, real-time platform.
Business
High · A strong emphasis is placed on identifying and solving engineering and business problems with little guidance, seeing the 'big picture,' and driving alignment on strategically important improvements. The role requires building strong relationships, collaborating meaningfully with various stakeholders, and demonstrating excellent judgment and responsibility, indicative of high business acumen and leadership.
Viz & Comms
Medium · Excellent written and verbal communication skills are explicitly required, including the ability to write detailed technical documents and collaborate with cross-functional teams. The role involves structuring data for 'intuitive analytics and business insights,' suggesting an understanding of how data is consumed and presented, though direct data visualization might be handled by other roles.
What You Need
- Designing and managing data pipelines
- Dimensional data modeling
- Data warehousing
- Building and deploying production-quality ETL pipelines
- Working with end-to-end distributed data systems (logging, storage, data quality, monitoring)
- Real-time data processing
- Scalability engineering
- Technical leadership
- Problem-solving (engineering and business)
- Excellent written and verbal communication
- Understanding of consistency, availability, and latency trade-offs
Languages
Tools & Technologies
Want to ace the interview?
Practice with real questions.
You're joining a team that builds and operates the data infrastructure behind every ride, every Eats order, and every freight shipment. Success after year one means you own a domain's pipelines end-to-end (say, the trips fact table or driver earnings) and at least one downstream team like pricing or safety considers your tables their source of truth. You're building the distributed systems that power Uber's sub-second dispatch and surge pricing decisions, not writing SQL for dashboards.
A Typical Week
A Week in the Life of an Uber Data Engineer
Typical L5 workweek · Uber
Weekly time split
Culture notes
- Uber operates at high velocity with massive data scale — expect to own pipelines that hundreds of teams depend on, and the pager can be unforgiving during your on-call rotation.
- Uber requires three days per week in the San Francisco or Sunnyvale office (Tuesday, Wednesday, Thursday), with Monday and Friday as flexible remote days.
The split that catches most candidates off guard is how much infrastructure work you do relative to pure analysis. Your mornings aren't spent exploring data; they're spent triaging SLA breaches on Hive tables that feed surge pricing, writing data quality monitors, and reclaiming storage on shared Hadoop clusters. If you're coming from an analytics-heavy DE role, recalibrate: this is closer to backend platform engineering where your PySpark dedup job for Kafka trip events is the product, not a means to a dashboard.
Projects & Impact Areas
Surge pricing pipelines and Eats restaurant ranking both depend on tables your team builds, so a schema change you propose on Wednesday can affect how millions of riders get priced by Friday. Uber's open-source projects like Cadence (workflow orchestration) and AresDB (real-time analytics) aren't just resume decoration; DEs actively contribute to these tools, which means you'll spend real cycles on platform improvements that shape how the entire company processes data. Data quality is a first-class ownership area too: you write the anomaly detection monitors, you own the pages when thresholds breach, and no separate QA team exists to catch your misses.
Skills & What's Expected
Software engineering and data architecture are both rated expert-level for this role, and that pairing tells you everything. ML knowledge sits at medium, which doesn't mean irrelevant; Uber explicitly expects you to understand how your pipelines feed ML-driven systems like fraud detection and driver-rider matching. What's underrated in most candidates' prep is business acumen, scored high but often neglected. Uber wants you to articulate why a 15-minute freshness SLA on the trips table matters to surge pricing accuracy, not just how to hit it.
Levels & Career Growth
The widget shows the level bands, but here's what it can't tell you: the jump where most people stall requires visible cross-team impact, not just flawless execution within your own pipelines. Did you define a data contract that three other teams adopted? Did you drive a platform migration that changed how the org thinks about table formats? Scope is the differentiator at every transition, and promotion committees at Uber look for evidence that you shaped decisions beyond your immediate domain.
Work Culture
Uber requires three days per week in-office (Tuesday, Wednesday, Thursday) at SF or Sunnyvale, with Monday and Friday flexible. The work schedule is demanding, with expectations of flexibility beyond standard hours, especially during on-call rotations when hundreds of downstream teams depend on your tables. The upside is genuine: you get direct product impact, real autonomy over your domain, and the kind of scale problems (petabytes of data, billions of daily events) that most companies can only describe hypothetically.
Uber Data Engineer Compensation
Uber's comp structure for Data Engineers breaks into three pieces: base salary, annual performance bonus, and RSUs that vest over four years (from what candidates report, often at 25% per year). Both base salary and RSU grants are considered the most flexible levers in negotiation, so don't assume either is locked. A sign-on bonus is also on the table, especially when you're holding a competing offer.
Competing offers are your strongest card here. Uber's recruiters expect candidates to shop around, and bringing a credible counter-offer gives you real room to push on equity or sign-on. If you're negotiating without one, focus your energy on clearly articulating the specific pipeline and platform experience you'd bring to Uber's Kafka/Spark/Flink stack, since that's the kind of scarcity the hiring team can use to justify a bump internally. Practice these conversations with real scenarios at datainterview.com/questions.
Uber Data Engineer Interview Process
7 rounds · ~5 weeks end to end
Initial Screen
1 round · Recruiter Screen
This initial 30-minute phone call will cover your background, career aspirations, and why you're interested in Uber. You'll also discuss the specific Data Engineer role, team alignment, and compensation expectations.
Tips for this round
- Research Uber's mission and recent projects to show genuine interest.
- Prepare a concise summary of your relevant experience and career goals.
- Clearly articulate why you are a good fit for a Data Engineer role at Uber.
- Be ready to discuss your salary expectations and current compensation.
- Highlight any experience with real-time data processing or large-scale systems.
Technical Assessment
1 round · Coding & Algorithms
Expect a 60-minute live coding session focusing on data structures and algorithms. You'll be asked to solve one or two datainterview.com/coding-style problems, demonstrating your problem-solving abilities and coding proficiency.
Tips for this round
- Practice datainterview.com/coding medium and hard problems, especially those involving arrays, strings, trees, and graphs.
- Choose a programming language you are most proficient in and can write runnable code quickly.
- Think out loud, explaining your approach, thought process, and any trade-offs considered.
- Write clean, well-structured code and test it with various edge cases.
- Be prepared to discuss time and space complexity of your solution.
Onsite
5 rounds · System Design
You'll be challenged to design a scalable data system for a real-world Uber scenario, such as processing millions of concurrent events. This 60-minute session will assess your ability to architect robust, high-throughput data pipelines and infrastructure.
Tips for this round
- Focus on key data engineering principles like scalability, reliability, fault tolerance, and real-time processing.
- Discuss relevant technologies like Kafka, Spark, Flink, Hadoop, and various database types (NoSQL, OLAP).
- Clearly define requirements and constraints before diving into the design details.
- Explain trade-offs for different architectural choices and justify your decisions.
- Consider data modeling, storage solutions, and monitoring aspects of your design.
Coding & Algorithms
This 60-minute live coding interview will present more complex algorithmic challenges than the phone screen. You'll need to write efficient, bug-free code and demonstrate strong problem-solving skills, often involving data structures relevant to large-scale data.
SQL & Data Modeling
The interviewer will probe your expertise in SQL and data modeling during this 60-minute session. You'll likely be asked to write complex SQL queries, design database schemas, and discuss data warehousing concepts relevant to Uber's massive datasets.
Behavioral
This 60-minute conversation with a hiring manager or senior engineer will delve into your past projects, collaboration experiences, and leadership potential. You'll be expected to articulate your contributions, challenges faced, and lessons learned, aligning with Uber's 'hustle' culture.
Bar Raiser
This is Uber's version of a final culture and technical depth check, typically lasting 60 minutes. An interviewer from a different team will assess your overall fit, technical rigor, and potential to raise the bar for the organization, often through deep dives into your experience and challenging scenarios.
Tips to Stand Out
- Master datainterview.com/coding. Uber emphasizes runnable code in its technical rounds, so extensive practice with datainterview.com/coding-style problems, especially medium to hard difficulty, is crucial. Focus on understanding underlying data structures and algorithms.
- Prioritize System Design. Data Engineers at Uber build systems for massive scale and real-time processing. Be prepared to design robust, scalable data pipelines, discussing trade-offs and relevant technologies like Kafka, Spark, and distributed databases.
- Showcase 'Hustle' and Business Impact. Uber values candidates who are proactive and can demonstrate how their work drives business results. Frame your experiences to highlight initiative, problem-solving, and the tangible impact of your projects.
- Deep Dive into SQL and Data Modeling. As a Data Engineer, your ability to write complex SQL queries, design efficient database schemas, and understand data warehousing concepts will be thoroughly tested. Practice advanced SQL and schema design.
- Prepare Behavioral Stories. Use the STAR method to prepare detailed stories about your past experiences, focusing on collaboration, leadership, overcoming challenges, and learning from failures. Align these stories with Uber's culture.
- Leverage Referrals. A strong referral can significantly boost your chances, potentially even allowing you to bypass the technical phone screen. Network and seek out current Uber employees.
- Understand Uber's Scale. Throughout your interviews, demonstrate an awareness of the challenges and considerations involved in handling petabytes of data and billions of events daily, as this is central to Uber's data ecosystem.
Common Reasons Candidates Don't Pass
- ✗ Inability to write runnable code. Candidates often fail by providing pseudocode or incomplete solutions that don't execute correctly, indicating a lack of practical coding proficiency.
- ✗ Weak system design for scale. Many struggle to design data systems that can handle Uber's immense scale (real-time processing, petabytes of data), failing to consider critical aspects like fault tolerance, latency, and throughput.
- ✗ Lack of business impact or 'hustle'. Candidates who only focus on technical details without connecting their work to business outcomes or demonstrating a proactive, results-oriented mindset may not align with Uber's cultural expectations.
- ✗ Insufficient SQL and data modeling skills. For a Data Engineer role, a shallow understanding of advanced SQL, database design principles, and data warehousing concepts is a common reason for rejection.
- ✗ Poor communication during technical rounds. Failing to articulate thought processes, ask clarifying questions, or explain design choices clearly can lead interviewers to believe the candidate lacks problem-solving clarity.
- ✗ Inadequate behavioral responses. Generic or unprepared answers to behavioral questions that don't highlight specific achievements, collaboration skills, or alignment with Uber's values can be a red flag.
Offer & Negotiation
Uber's compensation packages for Data Engineers typically include a competitive base salary, an annual performance bonus, and Restricted Stock Units (RSUs) that vest over a four-year period, often with a 25% annual vesting schedule. When negotiating, focus on increasing the base salary or the RSU grant, as these are often the most flexible components. A sign-on bonus can also be a negotiable lever, especially if you have competing offers. Be prepared to articulate your value and leverage any other offers you may have to secure a more favorable package.
From what candidates report, the most common reason people wash out isn't a single weak round. It's failing to write code that actually runs. Uber's two coding rounds and the SQL session all demand executable solutions, not pseudocode or hand-wavy logic. If your Spark job or Python script wouldn't pass a basic test harness, that's a rejection, even if your system design was solid.
The Bar Raiser round is where confident candidates get blindsided. It's run by an engineer outside your prospective team, and the round blends behavioral depth with technical pressure testing on past projects you've shipped. Treating it as a relaxed culture chat (instead of preparing STAR stories around ownership, cross-team influence, and pushing back on bad technical decisions) is the mistake that sinks people who cleared every other round cleanly.
Uber Data Engineer Interview Questions
Data Pipeline & Platform Engineering
Expect questions that force you to design reliable batch and streaming pipelines end-to-end (ingestion, transformation, backfills, SLAs). Candidates often stumble on operational details like idempotency, late data, schema evolution, and data quality gates.
Your Spark job builds a daily Sales fact table for Uber Eats from Kafka order events, and retries sometimes double-count revenue. How do you make the pipeline idempotent across replays and backfills while keeping a 2 hour SLA?
Sample Answer
Most candidates default to just running a daily overwrite or using at-least-once writes, but that fails here because retries and late events create duplicates and silent metric inflation. You need a deterministic primary key (for example, order_id plus event_type plus event_version) and a merge-based sink that upserts on that key. Add a watermark and a bounded late-data window, then run periodic reconciliation for stragglers outside the window. For backfills, reprocess by partition range and keep the same upsert key so replays converge.
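The merge logic above can be sketched in plain Python. This is a minimal illustration, not the actual Spark job: the event shape, field names, and the in-memory dict standing in for a merge-capable sink (Delta/Hudi-style MERGE) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (order_id, event_type, event_version, revenue_usd)
Event = Tuple[str, str, int, float]


def upsert_events(sink: Dict[Tuple[str, str], Event], events: List[Event]) -> None:
    """Idempotent upsert: replaying the same batch leaves the sink unchanged.

    Key = (order_id, event_type); a higher event_version wins, an equal or
    lower version is a no-op, so at-least-once delivery cannot double-count.
    """
    for ev in events:
        order_id, event_type, version, _revenue = ev
        key = (order_id, event_type)
        current = sink.get(key)
        if current is None or version > current[2]:
            sink[key] = ev


sink: Dict[Tuple[str, str], Event] = {}
batch = [("o1", "completed", 1, 25.0), ("o2", "completed", 1, 12.5)]
upsert_events(sink, batch)
upsert_events(sink, batch)  # retry / replay: converges, no double counting
print(sum(ev[3] for ev in sink.values()))  # 37.5
```

The same property is what makes backfills safe: reprocessing a partition range with the same upsert key converges to the same table state no matter how many times it runs.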
A new field promo_funding_source is added to the order event schema, and downstream Hive tables and Redshift aggregates for Sales reporting start failing intermittently. What schema evolution strategy and validation gates do you put in place so producers can ship safely without breaking consumers?
You need a near real-time metric: gross bookings by city for Uber rides, updated within 5 minutes, but events can arrive up to 2 hours late and cancellations can happen after trip completion. How do you design the aggregation so dashboards stay stable and you can still correct history?
System Design for Distributed Data Systems
Most candidates underestimate how much your design must balance latency, consistency, and cost at Uber scale. You’ll be evaluated on clear component boundaries, failure modes, and how you’d monitor and evolve the system over time.
Design an end-to-end pipeline that produces an hourly Sales Ops dashboard for Uber Eats showing gross bookings, net revenue, refunds, and promo spend by city and merchant, with updates within 5 minutes of the hour. Specify ingestion, storage, compute, the dimensional model, and how you guarantee idempotency and backfills when late events arrive.
Sample Answer
Use a Lambda-style design: a streaming path for low-latency aggregates plus a batch path that recomputes authoritative hourly facts and reconciles late data. Stream orders, refunds, and promos into a durable log, write curated tables with stable business keys, then serve the dashboard from an hourly fact table joined to city, merchant, and time dimensions. Idempotency comes from deterministic event IDs and merge semantics; late events trigger reprocessing by hour partitions; and you monitor freshness, duplicate rate, and reconciliation deltas between stream and batch outputs.
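The reconciliation check mentioned above is simple to sketch. A hedged example, with made-up key names and a tolerance you would tune per metric:

```python
from typing import Dict


def reconciliation_deltas(
    stream_aggs: Dict[str, float],
    batch_aggs: Dict[str, float],
    tolerance: float = 0.01,
) -> Dict[str, float]:
    """Return per-key deltas where stream and batch disagree beyond tolerance.

    In a Lambda-style pipeline the batch path is authoritative; any key whose
    delta exceeds tolerance should page the on-call or trigger a repair job.
    """
    deltas: Dict[str, float] = {}
    for key in set(stream_aggs) | set(batch_aggs):
        diff = stream_aggs.get(key, 0.0) - batch_aggs.get(key, 0.0)
        if abs(diff) > tolerance:
            deltas[key] = diff
    return deltas


# Hypothetical city-hour keys; the NYC batch number includes late refunds.
stream = {"sf_2024010112": 10500.0, "nyc_2024010112": 8200.0}
batch = {"sf_2024010112": 10500.0, "nyc_2024010112": 8350.0}
print(reconciliation_deltas(stream, batch))  # {'nyc_2024010112': -150.0}
```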
Uber wants near real-time fraud and Sales anomaly monitoring on trips and Eats orders: you need to compute rolling 15-minute metrics per merchant and city and alert within 60 seconds, while also keeping a governed warehouse table for analytics. Design the distributed data system, call out consistency and latency trade-offs, failure modes, and how you prevent double counting during retries and partial outages.
Coding & Algorithms (DE-leaning)
The bar here isn't whether you know obscure tricks, it's whether you can write correct, efficient code under interview constraints. Expect data-engineering flavored problems (parsing, aggregation, streaming-like logic) with solid complexity reasoning and clean tests.
You ingest Uber Eats order events as (order_id, ts, status) where status is one of CREATED, ACCEPTED, PICKED_UP, DELIVERED, CANCELED; return the final status per order_id and the final timestamp. If two events for the same order_id share the same ts, the later one in the input list wins.
Sample Answer
You could sort all events by (order_id, ts) and take the last one per order, or do a single pass hash aggregation that keeps the best-so-far event per order. Sorting is simpler to reason about, but it is $O(n \log n)$ and costs memory for rearrangement. The single pass wins here because you can compare timestamps in $O(1)$ per event and handle tie break by input position, so total time is $O(n)$ with $O(k)$ memory for $k$ orders.
from __future__ import annotations
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    order_id: str
    ts: int
    status: str


def final_status_per_order(events: Iterable[Tuple[str, int, str]]) -> Dict[str, Tuple[int, str]]:
    """Return {order_id: (final_ts, final_status)}.

    Tie break: if ts is equal, later event in the input wins.
    """
    # Store (ts, index, status) so that (ts, index) defines a total order.
    best: Dict[str, Tuple[int, int, str]] = {}
    for idx, (order_id, ts, status) in enumerate(events):
        if order_id not in best:
            best[order_id] = (ts, idx, status)
            continue
        best_ts, best_idx, _ = best[order_id]
        # Later timestamp wins; if tied, later index wins.
        if ts > best_ts or (ts == best_ts and idx > best_idx):
            best[order_id] = (ts, idx, status)
    return {oid: (ts, status) for oid, (ts, _idx, status) in best.items()}


if __name__ == "__main__":
    sample = [
        ("o1", 10, "CREATED"),
        ("o1", 12, "ACCEPTED"),
        ("o2", 8, "CREATED"),
        ("o1", 12, "CANCELED"),  # same ts as ACCEPTED, later in input so wins
        ("o2", 9, "DELIVERED"),
    ]
    out = final_status_per_order(sample)
    assert out["o1"] == (12, "CANCELED")
    assert out["o2"] == (9, "DELIVERED")
    print(out)
Given a stream of Trip events (driver_id, ts, event_type) where event_type is START or END, compute for each driver the maximum number of concurrent active trips at any moment; assume events can arrive out of order, and if START and END share the same ts, END happens first. Return a dict mapping driver_id to max_concurrency.
You are building a sales analytics rollup and receive a list of updates (merchant_id, day, delta_sales) that can include duplicates; return the top $k$ merchants by total sales over a given day range [start_day, end_day], breaking ties by smaller merchant_id. Do it in better than $O(m \log m)$ where $m$ is number of distinct merchants, assuming $k \ll m$.
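For the trip-concurrency question above, one hedged sketch (not the only valid approach): group events per driver, sort with a key that puts END before START at equal timestamps, then sweep a running counter.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def max_concurrent_trips(events: List[Tuple[str, int, str]]) -> Dict[str, int]:
    """Max concurrent active trips per driver.

    Events may arrive out of order. At equal ts, END is processed before
    START: the sort key uses 0 for END and 1 for START, so (ts, order)
    defines the required processing order.
    """
    by_driver: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for driver_id, ts, event_type in events:
        if event_type == "START":
            by_driver[driver_id].append((ts, 1, +1))
        else:  # END
            by_driver[driver_id].append((ts, 0, -1))

    result: Dict[str, int] = {}
    for driver_id, evs in by_driver.items():
        evs.sort()
        active = peak = 0
        for _ts, _order, delta in evs:
            active += delta
            peak = max(peak, active)
        result[driver_id] = peak
    return result


sample = [("d1", 1, "START"), ("d1", 2, "START"), ("d1", 3, "END"), ("d1", 4, "END")]
print(max_concurrent_trips(sample))  # {'d1': 2}
```

Sorting per driver is O(n log n); if the interviewer pushes further, mention that a streaming variant needs a buffer bounded by the out-of-order window, which mirrors watermarking in real pipelines.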
SQL Querying & Optimization
Your ability to express complex analytics with joins, windows, and careful filtering is a primary signal in the DE loop. Strong answers anticipate edge cases (duplicates, slowly changing entities) and show awareness of performance implications in MPP warehouses.
You have table fact_trip(trip_id, rider_id, city_id, request_ts, trip_date, status, fare_usd). For each city and trip_date, return completed trips, unique riders, and completion_rate (completed requests divided by all requests), with completion_rate as a decimal and safe for days with zero requests.
Sample Answer
Reason through it: Filter nothing upfront, you need both completed and non-completed requests in the denominator. Aggregate by city_id and trip_date, compute total_requests as COUNT(*), completed_trips as SUM over status. Unique riders is COUNT(DISTINCT rider_id) across all requests for that day. Completion rate is completed_trips divided by total_requests, guard division by zero with NULLIF so you do not throw or lie.
/* Daily city-level completion funnel metrics */
SELECT
    city_id,
    trip_date,
    /* All requests, regardless of status */
    COUNT(*) AS total_requests,
    /* Completed requests only */
    SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed_trips,
    /* Unique riders who made a request that day */
    COUNT(DISTINCT rider_id) AS unique_riders,
    /* Safe decimal rate */
    (SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) * 1.0)
        / NULLIF(COUNT(*), 0) AS completion_rate
FROM fact_trip
GROUP BY
    city_id,
    trip_date
ORDER BY
    trip_date,
    city_id;

You have raw_event_trip_status(trip_id, event_ts, status, ingestion_ts) where the same trip_id can have duplicate statuses and late arrivals. Produce a snapshot table with exactly one latest status row per trip_id as of a given cutoff timestamp, and explain one change you would make to reduce the scan cost in an MPP warehouse.
You have fact_trip(trip_id, driver_id, city_id, request_ts, fare_usd, is_airport_pickup) and dim_driver(driver_id, effective_from_ts, effective_to_ts, status). Find the top 3 drivers per city for the last 7 days by airport revenue, but only counting trips where the driver status was 'active' at request_ts, and return driver_id, city_id, airport_revenue, and rank.
Dimensional Modeling & Warehousing
Rather than raw SQL skill, you’re judged on how you structure facts, dimensions, and metrics so downstream analytics stays stable. Watch for prompts around SCD types, grain definition, and metric consistency across Sales/Analytics consumers.
Uber Eats wants a star schema for Sales analytics with metrics like gross_bookings, net_revenue, promo_spend, and completed_orders. Define the fact table grain and name 5 dimensions you would include, and explain one metric that must not be stored as a fact column.
Sample Answer
This question is checking whether you can lock the grain before you model anything, and avoid mixing additive facts with derived ratios. Your fact grain should be something like order line or order, not "day", otherwise you cannot safely slice by store, eater, courier, or promo. Dimensions usually include date, eater, merchant, city, product SKU or menu item, and promo or campaign. A metric like take_rate is derived (net_revenue divided by gross_bookings), it should be computed in the semantic layer to avoid aggregation bugs.
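A toy illustration (with made-up numbers) of why a derived ratio like take_rate must not be stored as a fact column: averaging stored per-row ratios disagrees with the ratio of the additive sums, which is the number Finance expects.

```python
# Two hypothetical Eats orders with additive facts.
orders = [
    {"gross_bookings": 100.0, "net_revenue": 30.0},    # row-level take_rate 0.30
    {"gross_bookings": 1000.0, "net_revenue": 100.0},  # row-level take_rate 0.10
]

# Wrong: average the stored per-row take_rate column.
avg_of_ratios = sum(o["net_revenue"] / o["gross_bookings"] for o in orders) / len(orders)

# Right: keep only additive facts and derive the ratio at aggregation time.
ratio_of_sums = sum(o["net_revenue"] for o in orders) / sum(o["gross_bookings"] for o in orders)

print(round(avg_of_ratios, 4))  # 0.2
print(round(ratio_of_sums, 4))  # 0.1182
```

The big order dominates real revenue, so the true blended take rate is ~11.8%, not 20%; the semantic layer computing ratio-of-sums is what keeps every rollup consistent.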
In a Sales warehouse, the merchant dimension has attributes like merchant_name, category, chain_id, and onboarding_status, and downstream teams need both "current state" and "as-of order time" reporting. Which SCD type(s) do you use, and what surrogate key strategy keeps the fact table stable?
You are modeling trip and Eats order sales in one warehouse, and Finance insists that "gross bookings" must reconcile across products and cities while analysts want flexibility to drill into adjustments like refunds and chargebacks. Do you model a single fact table, multiple fact tables, or a fact plus an adjustments fact, and how do you enforce metric consistency across them?
Cloud Infrastructure & Data Stores
In practice, you’ll need to articulate why you’d pick Spark/Hive vs an MPP warehouse vs Cassandra for a specific workload. Interviewers look for pragmatic tradeoffs: throughput vs latency, partitioning/sharding choices, and operational constraints.
You need a daily Sales analytics table for trips and promos that powers dashboards and ad hoc SQL, and you also need a low latency lookup for the current promo eligibility by rider at request time. Which parts go to Spark plus Hive, an MPP warehouse (Redshift or Vertica), and Cassandra, and what partition or key design do you pick for each?
Sample Answer
The standard move is Spark plus Hive for raw data and heavy ETL, then publish curated facts and dims into an MPP warehouse for interactive analytics, and use Cassandra for serving lookups keyed by a single entity. But here, promo eligibility has sharp latency and availability constraints, so you denormalize into Cassandra keyed by rider_id (and maybe city_id) even if it duplicates warehouse data. Partition Hive by date and city for scan pruning, and model the warehouse with a partition or distribution strategy that aligns with your dominant joins (often date and city) to avoid expensive data movement.
You are building a near real-time Sales metrics pipeline for completed trips (gross bookings, net revenue) with a 5 minute SLA, consuming events that can arrive late or out of order by up to 2 hours. How do you choose between exactly-once semantics, idempotent writes, and upserts in your data store, and what consistency and compaction choices do you make if the serving layer is Cassandra?
Uber's loop punishes candidates who prep like it's a generic algorithms screen. The sample questions reference Uber Eats order idempotency, trip-level fraud detection on rolling windows, and star schemas built around ride/promo fact tables, so your answers need to reflect how Uber's marketplace actually moves data, not abstract whiteboard patterns. The single costliest mistake is grinding coding problems while ignoring pipeline design and system design, which together dominate the evaluation and frequently overlap in the same question (designing a near-real-time Sales anomaly system, for instance, tests both simultaneously).
Practice Uber-style questions across all six areas at datainterview.com/questions.
How to Prepare for Uber Data Engineer Interviews
Know the Business
Official mission
“to ignite opportunity by setting the world in motion.”
What it actually means
Uber's real mission is to be the global technology platform that powers and optimizes the movement of people and goods, creating economic opportunities and convenience across various sectors. The company also commits to sustainability and adapting its services to local needs.
Key Business Metrics
- $52B (+20% YoY)
- $153B (-14% YoY)
- 34K (+9% YoY)
- 137.0M
Current Strategic Priorities
- Bring a state-of-the-art robotaxi to market later in 2026
- Build a unique new option for affordable and scalable autonomous rides in the San Francisco Bay Area and beyond
- Introduce more riders to autonomous mobility
- Deploy at least 1,200 Robotaxis across the Middle East by 2027
- Help families navigate everyday transportation with greater ease, visibility, and confidence
Competitive Moat
Uber is making its biggest platform bet since Eats: autonomous mobility. The Lucid/Nuro partnership targets a robotaxi launch in 2026, while a separate WeRide deal aims to deploy at least 1,200 robotaxis across the Middle East by 2027. For data engineers, that translates to entirely new pipeline domains: sensor telemetry, vehicle state streams, and safety-critical SLAs that don't exist in human-driver trip data.
Meanwhile, the core business posted $52 billion in revenue, up roughly 20% year over year. Existing pipelines feeding pricing, dispatch, and Eats ranking still need to scale with that growth. You'd be building new autonomous data infrastructure while keeping a massive, revenue-critical foundation healthy.
When interviewers ask "why Uber," don't talk about robotaxis in the abstract. Talk about the specific data engineering problem they create. Autonomous trip events need different dimensional models than human-driver trips (sensor fusion dimensions, safety-incident fact tables, sub-second latency budgets for vehicle routing). Framing your answer around that gap, and why your background prepares you to close it, shows you've studied Uber's actual roadmap rather than skimming a press release.
Try a Real Interview Question
Sessionize Event Stream With Watermark
Python

Given a list of events (user_id, ts) where ts is an integer Unix timestamp in seconds and the list is not guaranteed to be sorted, compute per-user sessions using a gap threshold of g seconds. Two consecutive events for the same user belong to the same session if the time difference is at most g, and you must ignore late events with ts < watermark; return a list of sessions as tuples (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.
from typing import Iterable, List, Tuple

def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    """Sessionize per-user events with a gap threshold and a watermark.

    Args:
        events: Iterable of (user_id, ts) events. Not guaranteed to be sorted.
        gap_seconds: Gap threshold in seconds. Same session if next_ts - prev_ts <= gap_seconds.
        watermark: Ignore late events where ts < watermark.

    Returns:
        List of (user_id, start_ts, end_ts, event_count) sorted by user_id then start_ts.
    """
    pass
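If you want to check your work, here is one possible reference solution, a straightforward sketch rather than the only valid approach: filter late events against the watermark, bucket timestamps per user, sort, then walk each user's timeline and cut a new session whenever the gap exceeds the threshold.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

def sessionize_events(
    events: Iterable[Tuple[str, int]],
    gap_seconds: int,
    watermark: int,
) -> List[Tuple[str, int, int, int]]:
    # Bucket timestamps per user, dropping late events up front.
    by_user = defaultdict(list)
    for user_id, ts in events:
        if ts >= watermark:
            by_user[user_id].append(ts)

    sessions = []
    for user_id in sorted(by_user):
        timestamps = sorted(by_user[user_id])
        start = prev = timestamps[0]
        count = 1
        for ts in timestamps[1:]:
            if ts - prev <= gap_seconds:
                count += 1          # still inside the current session
            else:
                sessions.append((user_id, start, prev, count))
                start, count = ts, 1  # gap exceeded: open a new session
            prev = ts
        sessions.append((user_id, start, prev, count))  # flush last session
    return sessions
```

Sorting users and timestamps before the linear scan keeps the output ordering requirement trivial; the overall cost is dominated by the per-user sorts.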
Uber's job postings for data engineers call out strong Java/Python and distributed systems experience, so expect coding rounds that reward clean, well-structured code over brute-force solutions that merely pass. Practice with problems at datainterview.com/coding that let you build that muscle in a timed setting.
Test Your Readiness
How Ready Are You for Uber Data Engineer?
Question 1 of 10: Can you design an incremental ingestion pipeline (batch or streaming) that provides exactly-once semantics or effective deduplication using event_time, idempotent writes, and replay handling?
Gaps in SQL optimization or pipeline design are the fastest to close with targeted reps. Work through Uber-relevant scenarios at datainterview.com/questions.
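As a toy illustration of the deduplication half of that question (the function and field names here are illustrative, not a quiz answer key): if each record carries a stable event key, keeping only the latest version per key makes replayed batches idempotent, since reprocessing the same records leaves the output unchanged.

```python
def deduplicate(records):
    """Keep the latest record per event_id.

    records: iterable of dicts with 'event_id' and 'event_time' keys
    (hypothetical schema for illustration).
    """
    latest = {}
    for rec in records:
        key = rec["event_id"]
        # Last-writer-wins on event_time: replays and retries converge
        # to the same result no matter how many times they run.
        if key not in latest or rec["event_time"] > latest[key]["event_time"]:
            latest[key] = rec
    return list(latest.values())
```

In a real pipeline the same idea shows up as a MERGE/upsert keyed on the event ID, or a ROW_NUMBER() window over event_time in SQL, but the invariant being tested is the one above.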
Frequently Asked Questions
How long does the Uber Data Engineer interview process take?
From first recruiter call to offer, expect roughly 4 to 6 weeks. You'll start with a recruiter screen, then move to a technical phone screen focused on SQL and coding. After that comes the onsite (or virtual onsite), which typically includes 4 to 5 rounds in a single day. Scheduling can stretch things out, especially if the team is busy, so don't be surprised if it takes closer to 7 weeks in some cases.
What technical skills are tested in the Uber Data Engineer interview?
Uber tests you hard on data pipeline design, dimensional data modeling, and data warehousing. You should be comfortable building production-quality ETL pipelines and working with distributed data systems, including logging, storage, data quality, and monitoring. Real-time data processing and scalability engineering come up frequently. On the coding side, SQL is non-negotiable (advanced level, including window functions), and you'll also need solid Python skills. Java and Scala knowledge is a plus, especially for pipeline work.
How should I tailor my resume for an Uber Data Engineer role?
Lead with your data pipeline and ETL experience. Uber cares about scale, so quantify everything: how many records your pipelines processed, latency improvements you achieved, how many downstream consumers relied on your data. Call out specific technologies for distributed systems, real-time processing, and data warehousing. If you've done dimensional modeling or built monitoring/data quality frameworks, put that front and center. Keep it to one page if you have under 10 years of experience, and mirror the language from Uber's job description.
What is the total compensation for Uber Data Engineer roles?
Uber pays competitively for data engineers in San Francisco. For a mid-level Data Engineer (L4), total compensation typically falls in the $200K to $280K range including base, bonus, and RSUs. Senior Data Engineers (L5) can expect $280K to $380K total comp. Staff level (L5b/L6) pushes well above $400K. RSUs vest over four years and make up a significant chunk, so pay attention to the stock component when evaluating your offer.
How do I prepare for the Uber Data Engineer behavioral interview?
Uber's culture emphasizes integrity, customer obsession, and doing the right thing. Prepare stories that show you making tough tradeoffs, pushing back on bad ideas respectfully, and thinking about the end user. They want to see that you can operate with a global mindset while solving local problems. Have 5 to 6 strong stories ready that cover conflict resolution, technical leadership, and times you improved something without being asked. I've seen candidates get rejected despite strong technical rounds because they couldn't articulate how they collaborate across teams.
How hard are the SQL questions in the Uber Data Engineer interview?
They're genuinely hard. Expect advanced SQL with window functions, CTEs, self-joins, and multi-step aggregations. You won't get away with just knowing SELECT and GROUP BY. Uber's SQL questions often involve real-world scenarios like calculating rider metrics, driver utilization, or trip-level analytics. Practice writing complex queries from scratch without an IDE helping you. You can find similar difficulty questions at datainterview.com/questions to get a feel for the level they expect.
What happens during the Uber Data Engineer onsite interview?
The onsite typically has 4 to 5 rounds spread across one day. You'll face a SQL deep-dive round, a coding round (usually Python), a system design round focused on data pipeline architecture, and at least one behavioral round. The system design round is where many candidates struggle. You'll be asked to design end-to-end data systems covering ingestion, storage, transformation, and serving layers. Some loops also include a data modeling round where you design a dimensional schema from scratch.
What metrics and business concepts should I know for the Uber Data Engineer interview?
Understand Uber's two-sided marketplace. Know metrics like trip completion rate, surge pricing mechanics, driver utilization, rider retention, and ETA accuracy. Think about how these metrics flow through data pipelines and what data quality issues could arise. Uber generates $52B in revenue, so the data volumes are massive. Being able to talk about how you'd model ride data, payment events, or driver earnings at that scale shows you understand the business, not just the tech.
Are ML or statistics concepts tested in the Uber Data Engineer interview?
Data Engineer roles at Uber are more engineering-focused than ML-focused. You probably won't be asked to derive a gradient descent algorithm. But you should understand how your pipelines feed ML models and analytics. Concepts like A/B testing data pipelines, feature engineering at scale, and basic statistical awareness (distributions, sampling, aggregation bias) can come up in conversation. If you're interviewing for a more senior role, expect questions about how you'd build data infrastructure that supports ML workflows.
What format should I use to answer Uber behavioral interview questions?
Use the STAR format (Situation, Task, Action, Result) but keep it tight. Uber interviewers don't want a 10-minute monologue. Spend about 20% on setup and 60% on what you actually did. Always end with a measurable result. For example, don't say 'the pipeline was faster.' Say 'latency dropped from 45 minutes to 8 minutes, which unblocked the pricing team's daily refresh.' Tie your answers back to Uber's values when it feels natural, especially customer obsession and integrity.
What are common mistakes candidates make in the Uber Data Engineer interview?
The biggest one I see is underestimating the system design round. Candidates prep SQL and coding but walk into the design round without a framework for discussing data pipelines end to end. Another common mistake is being too theoretical. Uber wants people who've actually built things at scale, so vague answers about 'best practices' won't cut it. Also, don't skip behavioral prep. Uber takes culture fit seriously, and a weak behavioral round can sink an otherwise strong performance.
How should I practice coding for the Uber Data Engineer interview?
Focus on Python and SQL, in that order of coding priority. For Python, practice data manipulation, writing clean functions, and working with common libraries. For SQL, drill window functions, recursive CTEs, and complex joins until they're second nature. Write everything by hand or in a plain text editor to simulate interview conditions. I recommend practicing with the problems at datainterview.com/coding, which are calibrated to the difficulty level you'll actually face. Aim for at least 3 to 4 weeks of consistent daily practice before your onsite.