Understanding the Problem
What is a Ride-Sharing Service?
Product definition: A platform that matches riders requesting trips with nearby drivers in real time, handles pricing, and tracks the ride from pickup to dropoff.
You've used Uber or Lyft. You open the app, drop a pin, tap "Request," and within seconds a driver appears on your map heading toward you. Behind that deceptively simple experience is one of the hardest real-time coordination problems in distributed systems: finding the best driver among millions of moving objects, all within a few seconds.
The interviewer isn't asking you to build a CRUD app with a map layer. They want to see you wrestle with the geospatial matching problem, the firehose of location data, and the state machine that keeps a ride consistent from request to payment. If you walk in thinking this is mostly about storing ride records in Postgres, you'll lose the room fast.
Functional Requirements
Core Requirements:
- Ride requesting: A rider submits a trip with pickup and dropoff locations, receives a fare estimate, and enters a matching queue.
- Driver matching: The system finds nearby available drivers, ranks them, and offers the ride. Drivers can accept or decline. If declined, the offer cascades to the next candidate.
- Real-time tracking: Both rider and driver see each other's live position on a map throughout the ride, from en-route to dropoff.
- Fare calculation and payment: When the trip ends, the system computes the final fare based on actual distance and time, then charges the rider and credits the driver.
- Trip history: Both riders and drivers can view past trips, including route, fare breakdown, and ratings.
Below the line (out of scope):
- Carpooling / shared rides (UberPool-style route merging)
- Driver onboarding, background checks, and document verification
- In-app messaging or calling between rider and driver
Note: "Below the line" features are acknowledged but won't be designed in this lesson. Mentioning them shows the interviewer you understand the full product surface without getting pulled into scope creep.
Non-Functional Requirements
The numbers here matter. Interviewers want to see that your architecture choices are driven by actual scale constraints, not vibes.
- Low-latency matching: A rider should be matched with a driver in under 10 seconds. This is the number that shapes everything about the matching pipeline.
- High availability for ride requests: The ride request path must be available at 99.99%. If a rider can't request a ride, the business loses money immediately. This is the revenue-critical path.
- Strong consistency for payments and ride state: You can tolerate a stale driver position by a few seconds. You absolutely cannot tolerate charging a rider twice or assigning two drivers to the same ride. Payment and ride state transitions need strong consistency guarantees.
- Scale: 20M daily active riders, 5M active drivers across multiple cities. Drivers send GPS pings every 4 seconds. That location update volume is the single biggest infrastructure challenge in this system.
Back-of-Envelope Estimation
Tip: Always clarify requirements before jumping into design. This shows maturity. In an actual interview, you'd ask: "How many daily active users are we targeting? What's the expected ride volume?" Then use the answers to drive these calculations.
| Metric | Calculation | Result |
|---|---|---|
| Ride requests (avg) | 20M rides/day ÷ 86,400 sec | ~230 rides/sec |
| Ride requests (peak) | 230 × 5 (rush hour multiplier) | ~1,150 rides/sec |
| Location updates | 5M drivers × 1 ping per 4 sec | ~1.25M updates/sec |
| Location update size | driver_id (16B) + lat/lng (16B) + heading (2B) + speed (4B) + timestamp (8B) + overhead | ~100 bytes |
| Location bandwidth | 1.25M updates/sec × 100 bytes | ~125 MB/sec |
| Location storage (raw, 1 day) | 125 MB/sec × 86,400 sec | ~10.8 TB/day |
| Ride records (1 day) | 20M rides × ~2 KB per ride | ~40 GB/day |
Two numbers should jump out at you. First, 1.25M location writes per second is an enormous write throughput that immediately rules out a traditional relational database for the hot path. Second, 10.8 TB/day of raw location data means you need a retention and compaction strategy; you're not keeping every ping forever.
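The table's arithmetic is worth being able to reproduce on the spot. A quick sanity check of the same numbers (assuming the ~100-byte ping and 5× peak multiplier from above):

```python
SECONDS_PER_DAY = 86_400

rides_per_sec = 20_000_000 / SECONDS_PER_DAY            # ~231 rides/sec average
peak_rides_per_sec = rides_per_sec * 5                   # ~1,157 rides/sec at rush hour
location_updates_per_sec = 5_000_000 / 4                 # 1.25M pings/sec
bandwidth_bytes_per_sec = location_updates_per_sec * 100 # ~125 MB/sec inbound
raw_location_per_day_tb = bandwidth_bytes_per_sec * SECONDS_PER_DAY / 1e12  # ~10.8 TB/day
```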
The core tension the interviewer wants you to grapple with: this is a real-time geospatial coordination problem. You have millions of drivers moving continuously, and when a rider requests a ride, you have roughly 10 seconds to search that moving dataset, rank candidates, offer the ride, and get an acceptance. The matching decision has a brutally tight time budget, and the location data powering it is changing 1.25 million times every second. Every architectural choice you make should flow from that reality.
The Set Up
Before you draw a single box on the whiteboard, you need to nail down what the system actually tracks. For Uber, the data model is deceptively simple on the surface: riders, drivers, rides. But the moment you start thinking about how a ride moves through its lifecycle, and how driver locations flow through the system at 1.25M updates per second, the modeling decisions get interesting fast.
Core Entities
Five primary entities carry this system, plus a couple of supporting tables that round out the model.
Rider is the person requesting a trip. They have a profile, a default payment method, and a rating that drivers see before accepting. Nothing exotic here.
Driver is more complex. Beyond their profile, a driver has a vehicle, a status that changes constantly (offline, available, or busy on a ride), and a current location. That location field is going to cause us some architectural headaches later, so flag it mentally now.
Ride is the central entity, and it's the one interviewers care about most during the setup phase. A ride links a rider to a driver and carries a status that follows a strict state machine:
REQUESTED → MATCHED → EN_ROUTE → IN_PROGRESS → COMPLETED
A ride also carries a ride_type (standard, premium, XL) because that field drives both matching logic and pricing. A ride can be CANCELLED from several of those states. Every state transition is a meaningful system event that triggers downstream work (notifications, location tracking, fare calculation). If you sketch this state machine on the whiteboard early, your interviewer will know you understand the problem.
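One way to make that state machine concrete is a transition table. A minimal sketch — exactly which states allow CANCELLED is a product decision; this version permits it everywhere except IN_PROGRESS and the terminal states:

```python
# Legal ride state transitions; anything else is rejected at the service layer.
TRANSITIONS = {
    "REQUESTED":   {"MATCHED", "CANCELLED"},
    "MATCHED":     {"EN_ROUTE", "CANCELLED"},
    "EN_ROUTE":    {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS": {"COMPLETED"},
    "COMPLETED":   set(),   # terminal
    "CANCELLED":   set(),   # terminal
}

def can_transition(current, target):
    return target in TRANSITIONS.get(current, set())
```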
Fare is a separate entity from the ride, not just a column. It captures the pricing breakdown: base fare, distance component, time component, surge multiplier, and the total. Keeping it separate lets you version pricing logic independently and makes payment reconciliation cleaner.
LocationLog is the odd one out. It's not a traditional relational entity. It's a time-series stream of GPS coordinates emitted by every active driver every 4 seconds. This data is ephemeral for matching purposes (you only care about the latest position) but permanent for fare calculation (you need the full route to compute distance traveled). That dual nature is why it doesn't belong in the same store as your ride records.
Tip: When you introduce the LocationLog entity, explicitly tell your interviewer: "This is high-frequency, append-only data that needs a different storage strategy than the rest of the schema." That one sentence signals you understand the system's core tension.
Two supporting entities fill in the gaps. Vehicle stores make, model, license plate, color, and capacity. It's referenced by the driver and also determines which ride types a driver is eligible for (an XL driver needs a vehicle with capacity >= 6). PaymentMethod stores tokenized card or wallet references for a rider. No raw card numbers; just a token from your payment processor (Stripe, Braintree) and a billing address. These tables aren't where the interesting design decisions live, but having them in your schema shows completeness.
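The capacity-to-ride-type rule can be captured in a couple of lines. An illustrative sketch only — real premium eligibility would also consider make, model, and year, which is omitted here:

```python
def eligible_ride_types(capacity):
    """Ride types a vehicle can serve, keyed off seating capacity."""
    types = {"standard"}
    if capacity >= 6:       # XL requires a 6+ seat vehicle
        types.add("xl")
    return types
```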
Here are the schemas:
CREATE TABLE payment_methods (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
rider_id UUID NOT NULL, -- FK to riders(id); added via ALTER TABLE since riders is created below (circular reference)
type VARCHAR(20) NOT NULL, -- 'card', 'wallet', 'paypal'
token VARCHAR(255) NOT NULL, -- tokenized by payment processor
last_four CHAR(4), -- for display only
is_default BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_payment_rider ON payment_methods(rider_id);
CREATE TABLE vehicles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
make VARCHAR(100) NOT NULL,
model VARCHAR(100) NOT NULL,
year SMALLINT NOT NULL,
color VARCHAR(50) NOT NULL,
license_plate VARCHAR(20) NOT NULL UNIQUE,
capacity SMALLINT NOT NULL DEFAULT 4, -- determines eligible ride types
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE TABLE riders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE,
phone VARCHAR(20) NOT NULL UNIQUE,
default_payment_id UUID REFERENCES payment_methods(id),
rating DECIMAL(3,2) DEFAULT 5.00, -- rolling average from ride_ratings
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE TABLE drivers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
phone VARCHAR(20) NOT NULL UNIQUE,
vehicle_id UUID NOT NULL REFERENCES vehicles(id),
status VARCHAR(20) NOT NULL DEFAULT 'OFFLINE', -- OFFLINE | AVAILABLE | BUSY
rating DECIMAL(3,2) DEFAULT 5.00,
current_lat DOUBLE PRECISION, -- denormalized from location service for convenience
current_lng DOUBLE PRECISION,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_drivers_status ON drivers(status);
CREATE TABLE rides (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
rider_id UUID NOT NULL REFERENCES riders(id),
driver_id UUID REFERENCES drivers(id), -- NULL until MATCHED
status VARCHAR(20) NOT NULL DEFAULT 'REQUESTED',
ride_type VARCHAR(20) NOT NULL DEFAULT 'standard', -- standard | premium | xl
pickup_lat DOUBLE PRECISION NOT NULL,
pickup_lng DOUBLE PRECISION NOT NULL,
dropoff_lat DOUBLE PRECISION NOT NULL,
dropoff_lng DOUBLE PRECISION NOT NULL,
version INT NOT NULL DEFAULT 1, -- for optimistic locking on state transitions
requested_at TIMESTAMP NOT NULL DEFAULT now(),
matched_at TIMESTAMP,
started_at TIMESTAMP, -- rider picked up
completed_at TIMESTAMP,
cancelled_at TIMESTAMP
);
CREATE INDEX idx_rides_rider ON rides(rider_id, requested_at DESC);
CREATE INDEX idx_rides_driver ON rides(driver_id, requested_at DESC);
CREATE INDEX idx_rides_status ON rides(status);
CREATE TABLE fares (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ride_id UUID NOT NULL UNIQUE REFERENCES rides(id),
base_fare DECIMAL(10,2) NOT NULL,
distance_fare DECIMAL(10,2) NOT NULL, -- cents per mile * miles
time_fare DECIMAL(10,2) NOT NULL, -- cents per minute * minutes
surge_multiplier DECIMAL(4,2) NOT NULL DEFAULT 1.00,
total DECIMAL(10,2) NOT NULL,
payment_status VARCHAR(20) NOT NULL DEFAULT 'PENDING', -- PENDING | CHARGED | FAILED | REFUNDED
charged_at TIMESTAMP
);
CREATE TABLE ride_ratings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ride_id UUID NOT NULL REFERENCES rides(id),
rater_id UUID NOT NULL, -- rider or driver who gave the rating
ratee_id UUID NOT NULL, -- rider or driver who received it
score SMALLINT NOT NULL CHECK (score BETWEEN 1 AND 5),
created_at TIMESTAMP NOT NULL DEFAULT now(),
UNIQUE(ride_id, rater_id) -- one rating per person per ride
);
CREATE INDEX idx_ratings_ratee ON ride_ratings(ratee_id, created_at DESC);
-- This lives in a time-series store (TimescaleDB), not the main Postgres
CREATE TABLE location_logs (
driver_id UUID NOT NULL,
lat DOUBLE PRECISION NOT NULL,
lng DOUBLE PRECISION NOT NULL,
heading SMALLINT, -- degrees 0-359
speed DECIMAL(5,1), -- km/h
recorded_at TIMESTAMP NOT NULL DEFAULT now()
);
-- TimescaleDB hypertable, partitioned by time
-- No traditional PK; queried by driver_id + time range
CREATE INDEX idx_location_driver_time ON location_logs(driver_id, recorded_at DESC);
Notice the version column on the rides table. That's doing real work. Every state transition does a compare-and-swap: "update this ride to MATCHED only if the version is still 3." This prevents two drivers from accepting the same ride simultaneously. You'll want to mention this when you get to consistency discussions.
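Here's that compare-and-swap sketched against an in-memory record. In Postgres it's a single `UPDATE ... WHERE id = ? AND version = ?`, checking the affected row count:

```python
def try_transition(ride, new_status, expected_version, **fields):
    """Compare-and-swap on the version column. Returns False if another
    writer got there first; the caller should re-read and decide whether to retry."""
    if ride["version"] != expected_version:
        return False
    ride.update(fields)           # e.g. driver_id, matched_at
    ride["status"] = new_status
    ride["version"] += 1
    return True
```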
The ride_ratings table deserves a quick explanation. After each completed ride, both the rider and driver submit a score. The rating column on the riders and drivers tables is a denormalized rolling average. You don't recompute it from scratch on every new rating. Instead, an async worker picks up new ride_ratings rows and updates the average incrementally: new_avg = old_avg + (new_score - old_avg) / total_ratings. This keeps the hot path (reading a driver's rating during matching) as a single column lookup, while the write path (recording a new rating) is eventually consistent. If an interviewer asks "what if the worker falls behind?", the answer is that a slightly stale rating (off by one trip) has zero business impact.
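The incremental update is one line of arithmetic (the same formula as above, where total_ratings already counts the new score):

```python
def updated_rating(old_avg, new_score, total_ratings):
    """Incremental rolling average: no need to rescan all past ratings."""
    return old_avg + (new_score - old_avg) / total_ratings
```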
Common mistake: Candidates put current_lat and current_lng on the drivers table and think that's sufficient for matching. It's fine as a denormalized convenience field, but the real current position lives in an in-memory spatial index (Redis). The database is too slow for 1.25M writes/sec of location data.

API Design
Each endpoint maps to one step in the ride lifecycle. Here's the full surface:
// Rider requests a new ride with pickup and dropoff locations
POST /rides
{
"rider_id": "uuid",
"pickup": { "lat": 37.7749, "lng": -122.4194 },
"dropoff": { "lat": 37.7849, "lng": -122.4094 },
"ride_type": "standard"
}
-> {
"ride_id": "uuid",
"status": "REQUESTED",
"fare_estimate": { "min": 12.50, "max": 16.00, "surge": 1.2 },
"estimated_pickup_time": "3 min"
}
// Driver accepts a ride offer pushed to them via WebSocket
POST /rides/{ride_id}/accept
{
"driver_id": "uuid"
}
-> {
"ride_id": "uuid",
"status": "MATCHED",
"pickup": { "lat": 37.7749, "lng": -122.4194 },
"rider": { "name": "Alex", "rating": 4.85 }
}
// Transition ride state (driver marks pickup, completion, or cancellation)
PATCH /rides/{ride_id}/status
{
"status": "IN_PROGRESS",
"version": 4
}
-> {
"ride_id": "uuid",
"status": "IN_PROGRESS",
"version": 5
}
// Driver pushes GPS coordinates (high frequency, every 4 seconds)
POST /drivers/location
{
"driver_id": "uuid",
"lat": 37.7752,
"lng": -122.4190,
"heading": 45,
"speed": 32.5,
"timestamp": "2024-01-15T10:30:00Z"
}
-> { "ack": true }
// Rider polls or subscribes to live driver location during a ride
GET /rides/{ride_id}/track
-> {
"driver_location": { "lat": 37.7755, "lng": -122.4185 },
"eta_minutes": 2,
"updated_at": "2024-01-15T10:30:04Z"
}
// Get fare estimate before requesting, or final fare after completion
GET /rides/{ride_id}/fare
-> {
"base_fare": 2.50,
"distance_fare": 8.40,
"time_fare": 3.20,
"surge_multiplier": 1.2,
"total": 16.92,
"payment_status": "CHARGED"
}
A few verb choices worth calling out. The accept endpoint is POST, not PATCH, because the driver is creating an acceptance action, not updating a field. The status transition is PATCH because you're modifying an existing resource's state. The location ping is POST because each ping is a new data point being created, even though it looks like an update to the driver's position.
Tip: In practice, the location ping and tracking endpoints would run over a persistent WebSocket or gRPC stream, not REST. But framing them as REST endpoints first is the right move in an interview. It shows you understand the logical contract. Then you can say "in production, I'd upgrade this to a persistent connection for efficiency" and your interviewer will nod.
The GET /rides/{ride_id}/track endpoint is interesting because it's the one most likely to shift from pull to push. During an active ride, the rider's app doesn't want to poll every 4 seconds. It wants a WebSocket subscription where the server pushes location updates as they arrive. Mention both options and let the interviewer steer you toward whichever they want to explore.
One thing you should not do: design a single monolithic /rides endpoint that handles everything through request body flags. Each action in the ride lifecycle is a distinct operation with different authorization rules (riders request, drivers accept), different consistency requirements (fare calculation needs strong consistency, tracking can tolerate slight staleness), and different scaling profiles. Separate endpoints make those differences explicit.
High-Level Design
We'll build this system one user story at a time. Each functional requirement introduces new components, and by the end you'll see how they all connect.
1) Rider Requests a Ride
The rider opens the app, drops a pin (or types an address), and taps "Request Ride." That single tap kicks off a surprisingly deep chain of events.
Components involved: Rider App, API Gateway, Ride Service, Pricing Service, Matching Service, PostgreSQL (ride records).
The data flow:
- The Rider App sends POST /rides to the API Gateway with pickup and dropoff coordinates.
- The API Gateway authenticates the request and routes it to the Ride Service.
- The Ride Service calls the Pricing Service with the coordinates to get a fare estimate (base fare + distance + time + any active surge multiplier).
- The Ride Service creates a ride record in REQUESTED state, storing the fare estimate and the surge multiplier locked at this moment.
- The Ride Service publishes a match request to the Matching Service (synchronously or via an internal queue, depending on your latency budget).
- The Ride Service returns the ride ID and fare estimate to the rider.
Here's the API contract:
POST /rides
{
"rider_id": "uuid",
"pickup": { "lat": 37.7749, "lng": -122.4194 },
"dropoff": { "lat": 37.7849, "lng": -122.4094 },
"ride_type": "standard"
}
Response 201:
{
"ride_id": "uuid",
"status": "REQUESTED",
"fare_estimate": {
"min": 12.50,
"max": 16.00,
"surge_multiplier": 1.3,
"currency": "USD"
},
"estimated_pickup_minutes": 4
}
And the ride record that gets persisted:
CREATE TABLE rides (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
rider_id UUID NOT NULL REFERENCES riders(id),
driver_id UUID, -- NULL until matched
status VARCHAR(20) NOT NULL DEFAULT 'REQUESTED',
pickup_lat DOUBLE PRECISION NOT NULL,
pickup_lng DOUBLE PRECISION NOT NULL,
dropoff_lat DOUBLE PRECISION NOT NULL,
dropoff_lng DOUBLE PRECISION NOT NULL,
surge_multiplier DECIMAL(3,2) NOT NULL DEFAULT 1.00,
fare_estimate_min DECIMAL(10,2),
fare_estimate_max DECIMAL(10,2),
version INTEGER NOT NULL DEFAULT 1, -- for optimistic locking
requested_at TIMESTAMP NOT NULL DEFAULT now(),
matched_at TIMESTAMP,
started_at TIMESTAMP,
completed_at TIMESTAMP
);
CREATE INDEX idx_rides_rider ON rides(rider_id, requested_at DESC);
CREATE INDEX idx_rides_status ON rides(status) WHERE status IN ('REQUESTED', 'MATCHED', 'EN_ROUTE', 'IN_PROGRESS');
Tip: Notice the surge_multiplier is stored on the ride record at request time. Interviewers love asking "what if surge changes during the ride?" Your answer: it doesn't matter, because we locked the price when the rider confirmed. This is a product decision with real engineering implications, and calling it out shows maturity.

One design decision worth discussing: should the Ride Service call the Matching Service synchronously or asynchronously? Synchronous means the rider waits slightly longer for the initial response, but you can return an ETA immediately. Asynchronous (via a message queue) decouples the services and handles spikes better, but the rider gets a "searching for drivers..." state and you push the match result via WebSocket later. Most real systems go async here.

2) System Matches Rider with Nearby Driver
This is the heart of the entire system. Everything else is plumbing; this is where the magic happens.
Components involved: Matching Service, Location Service (backed by a spatial index like Redis Geo), ETA Service, Notification Service (WebSocket connections to driver apps), Ride Service.
The data flow:
- The Matching Service receives the match request containing the pickup coordinates and ride ID.
- It queries the Location Service: "Give me available drivers within 3 km of this point."
- The Location Service runs a spatial query against its in-memory index (Redis GEOSEARCH) and returns a list of candidate drivers with their current positions.
- The Matching Service filters out drivers who are already on an active ride or have a status other than AVAILABLE.
- For the top ~5 candidates, the Matching Service calls the ETA Service to compute road-network travel time to the pickup point (not just straight-line distance).
- It ranks candidates by ETA and sends a ride offer to the best candidate via WebSocket through the Notification Service.
- The driver has 10 seconds to accept. If they accept, the Matching Service tells the Ride Service to transition the ride to MATCHED and assigns the driver.
- If the driver declines or the timer expires, the offer cascades to the next candidate.
- If all candidates in the initial batch decline, the Matching Service widens the search radius and tries again.
// Internal: Matching Service → Location Service
GET /drivers/nearby?lat=37.7749&lng=-122.4194&radius_km=3&status=AVAILABLE
Response:
{
"drivers": [
{ "driver_id": "uuid-1", "lat": 37.776, "lng": -122.418, "distance_m": 320 },
{ "driver_id": "uuid-2", "lat": 37.773, "lng": -122.421, "distance_m": 580 },
{ "driver_id": "uuid-3", "lat": 37.779, "lng": -122.415, "distance_m": 710 }
]
}
// WebSocket push to driver app
{
"type": "RIDE_OFFER",
"ride_id": "uuid",
"pickup": { "lat": 37.7749, "lng": -122.4194, "address": "123 Market St" },
"dropoff": { "lat": 37.7849, "lng": -122.4094, "address": "456 Mission St" },
"estimated_fare": 14.25,
"offer_expires_at": "2024-01-15T10:30:15Z"
}
Common mistake: Candidates often say "find the nearest driver and assign them." That's a greedy algorithm that ignores the real world. The nearest driver might be across a highway with no exit for 2 miles. They might have a 15% acceptance rate. They might be heading away from the pickup at 60 mph. Distance alone is a terrible proxy for "best match." Mention ETA, heading, and acceptance probability, even if you don't build all three.
A subtle but important point: the Matching Service needs to "lock" a driver while their offer is pending. Otherwise, two concurrent ride requests could both offer to the same driver. You can do this with a short-lived reservation in Redis (a simple key with a 15-second TTL). If the driver declines or times out, the reservation expires automatically.
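Here's that reservation sketched with an in-memory map standing in for Redis. In production this would be a single atomic `SET key value NX EX 15`:

```python
import time

RESERVATION_TTL_S = 15

class DriverReservations:
    """Short-lived driver lock so two concurrent requests can't offer to the same driver."""
    def __init__(self):
        self._expiry = {}  # driver_id -> reservation expiry (epoch seconds)

    def try_reserve(self, driver_id, now=None):
        now = time.time() if now is None else now
        if self._expiry.get(driver_id, 0) > now:
            return False  # another ride's offer is still pending for this driver
        self._expiry[driver_id] = now + RESERVATION_TTL_S
        return True
```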
The offer cascade is sequential by default (offer to one driver at a time). Some systems use a broadcast model where multiple drivers see the ride and the fastest to accept wins. Sequential gives you more control over who gets the ride. Broadcast fills rides faster but creates a worse driver experience. Know the tradeoff; the interviewer might ask.
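The sequential cascade reduces to a short loop. A sketch, where send_offer is assumed to block until the driver accepts, declines, or the 10-second offer expires:

```python
def run_offer_cascade(ride_id, ranked_drivers, send_offer):
    """Offer the ride to one driver at a time, best ETA first.
    Returns the accepting driver's id, or None if everyone declined
    (at which point the caller widens the search radius and retries)."""
    for driver_id in ranked_drivers:
        if send_offer(driver_id, ride_id):
            return driver_id
    return None
```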

3) Real-Time Tracking During the Ride
Once a driver is matched and heading to pickup, both the rider and the driver need to see each other's position in near-real-time. After pickup, the rider watches the car move toward the destination. This is the feature that makes ride-sharing feel like ride-sharing.
Components involved: Driver App, Location Service, Spatial Index (Redis), Notification Service (WebSocket connections to rider apps), Location Log (Kafka + time-series store).
The data flow:
- The Driver App sends a GPS ping to the Location Service every 4 seconds via a persistent connection (WebSocket or gRPC stream).
- The Location Service updates the driver's current position in the spatial index (Redis GEOADD). This is the "hot path" used by the Matching Service for future queries.
- Simultaneously, the Location Service publishes the location event to Kafka. This is the "cold path" for historical storage.
- For drivers currently on an active ride, the Location Service also pushes the update to the Notification Service.
- The Notification Service looks up which rider is associated with this driver's active ride and forwards the position update over the rider's WebSocket connection.
- The Rider App renders the updated driver position on the map.
The GPS ping payload is intentionally tiny:
{
"driver_id": "uuid",
"lat": 37.7752,
"lng": -122.4183,
"heading": 270,
"speed": 12.5,
"timestamp": 1705312200
}
That's roughly 100 bytes. At 1.25M pings per second across all drivers, you're looking at ~125 MB/sec of inbound location data. Not trivial, but very manageable for a well-designed ingestion layer.
Key insight: The spatial index (Redis) only stores the current position of each driver. It's ephemeral, constantly overwritten. The location history flows through Kafka into a time-series database for trip reconstruction and fare calculation later. This separation of hot and cold paths is one of the most important architectural decisions in the entire system. If you only remember one thing from this section, make it this.
How does the Notification Service know which rider to push updates to? It maintains a mapping of ride_id → rider WebSocket connection. When a ride transitions to MATCHED, the Notification Service subscribes to location updates for that driver. When the ride completes, it unsubscribes. Simple pub/sub semantics.
One thing interviewers sometimes probe: what happens if the rider's WebSocket disconnects temporarily (subway, elevator)? The app reconnects and requests the latest position. You don't need to buffer missed updates because only the current position matters for the map view.
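The per-ping fan-out can be sketched with the stores injected — a dict standing in for the Redis index, a list for the Kafka topic, and a callback for the rider's WebSocket:

```python
def handle_ping(ping, geo_index, event_log, active_ride_riders, push):
    """Fan out one GPS ping: hot path for matching, cold path for history,
    plus a live push if the driver is on an active ride."""
    geo_index[ping["driver_id"]] = (ping["lat"], ping["lng"])  # Redis GEOADD stand-in
    event_log.append(ping)                                     # Kafka publish stand-in
    rider_id = active_ride_riders.get(ping["driver_id"])       # ride_id -> rider mapping
    if rider_id is not None:
        push(rider_id, ping)                                   # rider's WebSocket
```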
4) Fare Calculation and Trip Completion
The driver taps "Drop off" and the ride ends. Now you need to figure out what to charge.
Components involved: Driver App, Ride Service, Pricing Service, Location Log (TimescaleDB), Message Queue (Kafka or SQS), Payment Service.
The data flow:
- The Driver App sends PATCH /rides/{id}/status with status: COMPLETED to the Ride Service.
- The Ride Service validates the state transition (must be IN_PROGRESS → COMPLETED) using the version column for optimistic locking.
- The Ride Service calls the Pricing Service to compute the final fare.
- The Pricing Service queries the location log (TimescaleDB) for all GPS points recorded during this ride, reconstructs the actual route, and calculates total distance traveled and total trip time.
- It applies the fare formula: (base_fare + distance_fare + time_fare) × surge_multiplier. The surge multiplier comes from the ride record (locked at request time, remember).
- The Ride Service writes the fare record and updates the ride's completed_at timestamp.
- The Ride Service publishes a RIDE_COMPLETED event to the message queue.
- The Payment Service consumes this event and charges the rider's payment method asynchronously.
- Both rider and driver receive a trip receipt via push notification.
CREATE TABLE fares (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ride_id UUID NOT NULL UNIQUE REFERENCES rides(id),
base_fare DECIMAL(10,2) NOT NULL,
distance_fare DECIMAL(10,2) NOT NULL, -- rate × km traveled
time_fare DECIMAL(10,2) NOT NULL, -- rate × minutes elapsed
surge_multiplier DECIMAL(3,2) NOT NULL,
total DECIMAL(10,2) NOT NULL,
payment_status VARCHAR(20) NOT NULL DEFAULT 'PENDING', -- PENDING, CHARGED, FAILED, REFUNDED
idempotency_key UUID NOT NULL UNIQUE, -- prevents double charges on retry
charged_at TIMESTAMP,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
Why is payment asynchronous? Because you never want a payment gateway timeout to block the ride completion flow. The rider and driver should see "Ride Complete" instantly. If the charge fails, a retry worker picks it up. If it fails repeatedly, it enters a manual review queue. The idempotency_key ensures that retries don't double-charge.
Tip: When you mention the idempotency key, you're signaling to the interviewer that you've thought about failure modes in distributed payment systems. This is a small detail that punches well above its weight in terms of the signal it sends.
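A sketch of how the key protects retries, with the payment gateway stubbed out (real processors like Stripe accept an idempotency key directly and dedupe on their side):

```python
def charge_fare(fare, charged_keys, gateway_charge):
    """Charge exactly once per fare: a retried event with an already-seen
    idempotency key is a no-op instead of a double charge."""
    key = fare["idempotency_key"]
    if key in charged_keys:
        return "CHARGED"                      # retry of a successful charge
    if gateway_charge(fare["total"], key):
        charged_keys.add(key)
        return "CHARGED"
    return "FAILED"  # retry worker or manual review picks this up
```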
The fare formula itself is straightforward, but the distance calculation is worth a sentence. You don't use the straight-line distance between pickup and dropoff. You sum the distances between consecutive GPS points from the location log. This gives you the actual road distance traveled, including detours, wrong turns, and traffic reroutes.
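Summing consecutive segments is a fold over the GPS trail. A sketch using the standard haversine great-circle formula:

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

def route_distance_km(points):
    """Actual road distance: sum of consecutive GPS-point segments."""
    return sum(haversine_km(a[0], a[1], b[0], b[1]) for a, b in zip(points, points[1:]))
```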

Putting It All Together
Here's how all the pieces connect:
The API Gateway sits at the front, handling authentication and routing requests to the appropriate service. Behind it, five core services each own a distinct responsibility:
- Ride Service owns the ride lifecycle and state machine. It's the orchestrator that coordinates with other services during state transitions. Backed by PostgreSQL.
- Matching Service finds and assigns drivers to rides. It queries the Location Service for candidates, computes ETAs, and manages the offer cascade. Stateless except for short-lived driver reservations in Redis.
- Location Service ingests 1.25M GPS pings per second and maintains the real-time spatial index. It writes to Redis (hot path) and Kafka (cold path) simultaneously.
- Pricing Service handles both fare estimates (at request time) and final fare calculation (at completion). It reads from the location log for trip reconstruction and from a zone-based surge pricing model.
- Notification Service manages persistent WebSocket connections to both rider and driver apps. It's the real-time push layer for ride offers, location updates, status changes, and receipts.
A message queue (Kafka) connects the async workflows: payment processing, trip receipts, driver earnings calculations, and analytics. This keeps the synchronous request path fast and the system resilient to downstream slowdowns.
The two data stores you should highlight on your diagram: PostgreSQL for ride records, fares, and user data (strong consistency, transactional); Redis for the live spatial index and driver reservations (speed, ephemeral data). The time-series store (TimescaleDB or similar) holds the location history and sits behind Kafka consumers.
Warning: A common mistake at this stage is drawing 15 boxes on the whiteboard and losing the interviewer. Five services, two databases, one queue, one real-time push layer. That's the whole system. You can always add complexity in the deep dives, but the high-level architecture should fit in someone's head in under 30 seconds.
Deep Dives
This is where the interview gets interesting. The high-level design shows you can draw boxes and arrows. The deep dives show you actually understand what makes this system hard to build.
"How do we efficiently find nearby drivers?"
Your matching service just received a ride request at coordinates (40.7484, -73.9857). There are 5 million drivers across the platform. How do you find the 10 closest available ones?
Bad Solution: Brute-Force Distance Scan
The instinct is straightforward: query all drivers, compute the Haversine distance from each one to the pickup point, sort, take the top 10.
# O(n) scan over every driver in the system
def find_nearby_drivers(pickup_lat, pickup_lng, all_drivers):
    distances = []
    for driver in all_drivers:
        dist = haversine(pickup_lat, pickup_lng, driver.lat, driver.lng)
        if dist < MAX_RADIUS_KM and driver.status == 'AVAILABLE':
            distances.append((driver, dist))
    distances.sort(key=lambda x: x[1])
    return distances[:10]
At 5M drivers, this is a full table scan every time someone requests a ride. At 230 rides/sec average (and 1,000+ during peaks), you're doing billions of distance calculations per second. The math alone kills you before you even factor in the database I/O.
Warning: Candidates who jump straight to "just query the database with a WHERE clause on lat/lng ranges" are essentially proposing this. A bounding-box filter on a B-tree index is better than raw scanning, but it still degrades badly at scale because lat/lng range queries don't compose well with the spatial distribution of drivers.
Good Solution: Geohashing
Geohashing converts a (lat, lng) pair into a string like dr5ru7. Longer strings mean smaller cells. Drivers in the same geographic area share a common prefix, so you can bucket them by geohash and look up only the relevant buckets.
When a ride request comes in, you compute the geohash of the pickup location, then query that cell plus its 8 neighboring cells (to handle the boundary problem where a driver 50 meters away might be in an adjacent cell).
def find_nearby_drivers_geohash(pickup_lat, pickup_lng, precision=6):
    center_hash = geohash.encode(pickup_lat, pickup_lng, precision)
    neighbor_hashes = geohash.neighbors(center_hash)  # 8 surrounding cells
    candidates = []
    for gh in [center_hash] + neighbor_hashes:
        drivers = redis.smembers(f"drivers:geo:{gh}")  # one small set read per cell
        candidates.extend(drivers)
    # Now compute exact distance only for this small candidate set
    return rank_by_distance(candidates, pickup_lat, pickup_lng)
At precision 6, each geohash cell is roughly 1.2km × 0.6km. You're scanning maybe a few hundred drivers instead of millions. That's a massive improvement.
The tradeoff: geohash cells are fixed-resolution rectangles (whose physical dimensions also stretch with latitude). In Manhattan, a single cell might contain 200 available drivers. In rural Montana, you might need to expand your search radius across dozens of cells to find even one. The fixed resolution doesn't adapt to driver density.
Great Solution: Adaptive Spatial Indexing with Redis GEOSEARCH
The best approach combines two ideas: an adaptive spatial data structure for resolution flexibility, and Redis's native geospatial commands for the actual hot-path queries.
Google's S2 Geometry library divides the Earth's surface into hierarchical cells at 31 levels of resolution (0 through 30). Unlike geohash rectangles, S2 cells are roughly equal-area (they're projections of the faces of a cube onto the sphere). You can cover a search area with a small number of cells at varying resolutions, which means dense urban areas get fine-grained cells while sparse areas get coarser ones.
For the production hot path, though, you don't need to implement S2 from scratch. Redis's GEOSEARCH command does the heavy lifting:
def find_nearby_drivers(pickup_lat, pickup_lng, radius_km=3):
    shard = get_city_shard(pickup_lat, pickup_lng)  # route to regional Redis
    candidates = redis.geosearch(
        f"shard:{shard}:drivers:available",
        longitude=pickup_lng,
        latitude=pickup_lat,
        radius=radius_km,
        unit="km",
        sort="ASC",  # nearest first
        count=20     # limit candidates
    )
    return candidates
Redis implements this with a sorted set where scores are 52-bit geohash integers. GEOSEARCH performs a range scan on the sorted set, which is O(log(N) + M) where N is total drivers in the set and M is results returned. For 50,000 drivers in a city shard, this returns in sub-millisecond time.
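Why does a sorted-set range scan approximate a radius query at all? Because the score is a Morton (Z-order) interleaving of the two quantized coordinates, so nearby points get numerically close scores. Here is a toy sketch of that encoding; the bit widths mirror the 52-bit scheme, but the range handling is simplified and illustrative, not Redis's exact internals:

```python
def interleave_bits(x, y, bits=26):
    """Morton/Z-order encoding: x's bits land on even positions, y's on odd."""
    result = 0
    for i in range(bits):
        result |= ((x >> i) & 1) << (2 * i)
        result |= ((y >> i) & 1) << (2 * i + 1)
    return result

def geo_score(lat, lng, bits=26):
    """Quantize lat/lng to 26 bits each, then interleave into a 52-bit score.
    Illustrative only -- Redis's real encoding differs in its range handling."""
    lat_q = int((lat + 90.0) / 180.0 * (1 << bits))
    lng_q = int((lng + 180.0) / 360.0 * (1 << bits))
    return interleave_bits(lat_q, lng_q, bits)

# Nearby points share high-order bits, so their scores are numerically close;
# that's why a sorted-set range scan can serve as the first pass of a radius query.
a = geo_score(40.7484, -73.9857)   # Midtown Manhattan
b = geo_score(40.7505, -73.9934)   # ~700 m away
c = geo_score(51.5074, -0.1278)    # London
assert abs(a - b) < abs(a - c)
```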
The sharding strategy matters. You partition the spatial index by city or metro region. Each shard's Redis instance holds only drivers in that region, keeping the working set small and memory-efficient. A shard router maps the pickup coordinates to the correct shard:
# Simple region-to-shard mapping
CITY_SHARDS = {
    "nyc": "redis-nyc-01",
    "sf": "redis-sf-01",
    "london": "redis-london-01",
    # ...
}

def get_city_shard(lat, lng):
    # Use a polygon-based lookup or coarse geohash prefix mapping
    region = reverse_geocode_to_region(lat, lng)
    return CITY_SHARDS[region]
Drivers near shard boundaries (e.g., between NYC and Newark) are a real edge case. The simplest fix: dual-register drivers within 5km of a boundary into both shards. The matching service deduplicates.
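A sketch of that dual-registration idea, with plain in-memory sets standing in for the per-shard Redis GEO keys. The `shards_within` boundary test, the 85 km/degree constant, and the Hudson River longitude are all invented for illustration:

```python
# In-memory stand-ins for the per-shard Redis GEO sets (illustrative only)
shard_members = {"nyc": set(), "newark": set()}

def shards_within(lat, lng, radius_km):
    """Return every shard whose region lies within radius_km of the point.
    Hypothetical: a real implementation would test against shard polygons;
    here the boundary is a made-up longitude near the Hudson River."""
    boundary_lng = -74.05
    km_per_deg_lng = 85.0  # rough value at New York's latitude
    if abs(lng - boundary_lng) * km_per_deg_lng < radius_km:
        return ["nyc", "newark"]
    return ["nyc"] if lng > boundary_lng else ["newark"]

def register_driver(driver_id, lat, lng):
    # Dual-register: write the driver into every shard within 5 km of a boundary
    for shard in shards_within(lat, lng, radius_km=5):
        shard_members[shard].add(driver_id)

def dedupe(candidates_per_shard):
    # The matching service queries multiple shards and deduplicates
    seen, merged = set(), []
    for shard_results in candidates_per_shard:
        for driver_id in shard_results:
            if driver_id not in seen:
                seen.add(driver_id)
                merged.append(driver_id)
    return merged

register_driver("d-1", 40.72, -74.05)   # right on the boundary
assert "d-1" in shard_members["nyc"] and "d-1" in shard_members["newark"]
```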
Tip: When you explain this in an interview, emphasize why you shard by city rather than by global geohash prefix. The answer: ride matching is inherently local. A rider in Brooklyn will never be matched with a driver in San Francisco. City-level sharding aligns the data partitioning with the access pattern, which is the whole point of sharding.

"How do we handle 1.25 million location updates per second?"
Five million active drivers, each pinging their GPS coordinates every 4 seconds. That's ~1.25M writes per second of ephemeral, rapidly-changing geospatial data. This is the hardest infrastructure problem in the entire system.
Bad Solution: Write to PostgreSQL
Every GPS ping becomes an INSERT or UPDATE on a driver_locations table in your relational database.
-- This will destroy your database
UPDATE drivers SET current_lat = 40.7128, current_lng = -74.0060, updated_at = NOW()
WHERE id = 'driver-abc-123';
At 1.25M updates/sec, you're asking a single Postgres instance (or even a sharded cluster) to handle write throughput that relational databases simply aren't designed for. Each UPDATE requires acquiring a row lock, writing a WAL entry, updating the index, and eventually vacuuming dead tuples. You'll hit I/O saturation, lock contention, and replication lag simultaneously. Your matching queries will compete with writes for the same resources.
Warning: Some candidates try to fix this by saying "just add more Postgres replicas." Replicas help with read scaling, not write scaling. The write bottleneck is on the primary, and every replica still has to replay every write.
Good Solution: Redis as the Live Spatial Index
Treat the current driver position as ephemeral, in-memory state. Redis with GEOADD gives you O(log(N)) writes into a sorted set with geospatial indexing built in.
def handle_location_ping(driver_id, lat, lng):
    shard = get_city_shard(lat, lng)
    redis.geoadd(f"shard:{shard}:drivers:available", (lng, lat, driver_id))
    # Write (not merely refresh) the heartbeat key with a TTL; if the
    # driver stops pinging, the key expires and they read as stale
    redis.set(f"driver:heartbeat:{driver_id}", "1", ex=30)
A single Redis instance can handle ~100K+ GEOADD operations per second. With 12-15 city shards, you distribute the load comfortably. The data is ephemeral by design: if Redis restarts, drivers simply re-register on their next ping (within 4 seconds).
This solves the write throughput problem for the matching use case. But you've lost something: the historical location trail. You need that trail to calculate the actual distance traveled for fare computation, to reconstruct trips for dispute resolution, and for analytics.
Great Solution: Dual-Write Pipeline (Hot Path + Cold Path)
Separate the two concerns completely. The hot path serves real-time matching. The cold path preserves history.
async def handle_location_ping(driver_id, lat, lng, heading, speed, timestamp):
    shard = get_city_shard(lat, lng)
    # HOT PATH: update current position for matching (sub-ms)
    await redis.geoadd(f"shard:{shard}:drivers:available", (lng, lat, driver_id))
    # COLD PATH: publish to Kafka for historical storage (async, fire-and-forget)
    await kafka.produce("driver-locations", {
        "driver_id": driver_id,
        "lat": lat,
        "lng": lng,
        "heading": heading,
        "speed": speed,
        "recorded_at": timestamp
    })
On the cold path, a Kafka consumer batch-writes location events into TimescaleDB (a time-series extension for Postgres):
CREATE TABLE location_logs (
    driver_id   UUID NOT NULL,
    lat         DOUBLE PRECISION NOT NULL,
    lng         DOUBLE PRECISION NOT NULL,
    heading     SMALLINT,
    speed       DECIMAL(5,2),
    recorded_at TIMESTAMP NOT NULL
);

-- TimescaleDB hypertable for automatic time-based partitioning
SELECT create_hypertable('location_logs', 'recorded_at');

-- Index for reconstructing a specific trip's route
CREATE INDEX idx_location_driver_time
    ON location_logs (driver_id, recorded_at DESC);
Kafka acts as the buffer. If TimescaleDB falls behind, events queue up in Kafka and get written when the consumer catches up. The hot path (Redis) is completely unaffected by cold path latency. You can also fan out the Kafka topic to other consumers: the Notification Service reads it to push real-time updates to riders, and an analytics pipeline consumes it for demand forecasting.
The numbers work out cleanly. Each location event is ~100 bytes. At 1.25M events/sec, that's ~125 MB/sec into Kafka, which is well within a modest Kafka cluster's throughput. TimescaleDB with batch inserts of 5,000-10,000 rows can absorb this with a few consumer instances.
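The consumer's batching logic can be sketched independently of Kafka and TimescaleDB. This hypothetical `BatchWriter` flushes on either a row-count or an age threshold; the flush callback stands in for a multi-row INSERT (or COPY) into `location_logs`:

```python
import time

class BatchWriter:
    """Illustrative sketch: buffers location events and flushes them in bulk.
    The flush callback stands in for a multi-row INSERT into the hypertable."""

    def __init__(self, flush_fn, max_rows=5000, max_age_sec=1.0):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_sec = max_age_sec
        self.buffer = []
        self.oldest = None

    def add(self, event):
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(event)
        full = len(self.buffer) >= self.max_rows
        stale = time.monotonic() - self.oldest >= self.max_age_sec
        if full or stale:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)  # one bulk write instead of N singles
            self.buffer = []
            self.oldest = None

# Usage: the Kafka consumer loop calls writer.add(event) per message
batches = []
writer = BatchWriter(batches.append, max_rows=3)
for i in range(7):
    writer.add({"driver_id": f"d{i}"})
writer.flush()  # drain the tail on shutdown or partition rebalance
assert [len(b) for b in batches] == [3, 3, 1]
```

Flushing on age as well as size keeps write latency bounded when traffic is light, at the cost of occasional small batches.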
Tip: The dual-write pattern is the single most important architectural insight in this problem. If you explain nothing else in the deep dive, explain this. It shows you understand that the same data serves two fundamentally different access patterns, and you designed the storage layer accordingly.

"How does the matching algorithm work under load, and how do we handle failures?"
A ride request just came in. You've found 15 nearby available drivers from the spatial index. Now what?
Bad Solution: Nearest-Driver Assignment
Pick the driver with the shortest straight-line distance. Send them the offer. Done.
This fails in three ways. First, straight-line distance is meaningless in cities. A driver 0.5km away across a river might be a 15-minute drive, while one 2km away on the same road is 3 minutes out. Second, you're ignoring the driver's heading. A driver moving away from the pickup at 60 km/h is a worse match than one moving toward it. Third, you're not accounting for acceptance probability. Some drivers consistently decline certain ride types (long trips, short trips, airport runs). Assigning to them wastes 10-15 seconds of timeout before you can try the next candidate.
Warning: "Just pick the closest driver" is the most common weak answer in Uber system design interviews. It signals you haven't thought about the real-world complexity of the matching problem.
Good Solution: ETA-Ranked Offer Cascade
Instead of straight-line distance, compute actual ETA using road-network routing. Rank candidates by ETA, then offer the ride sequentially with timeouts.
async def match_ride(ride_request, candidate_drivers):
    # Batch-compute ETAs using a routing engine (OSRM, Valhalla, etc.)
    etas = await eta_service.batch_compute(
        origin_list=[(d.lat, d.lng) for d in candidate_drivers],
        destination=(ride_request.pickup_lat, ride_request.pickup_lng)
    )
    # Rank by ETA, break ties by driver rating
    ranked = sorted(
        zip(candidate_drivers, etas),
        key=lambda pair: (pair[1].minutes, -pair[0].rating)
    )
    # Offer cascade: try each driver with a timeout
    for driver, eta in ranked[:5]:  # limit cascade depth
        offer_id = await send_offer(driver.id, ride_request, eta)
        response = await wait_for_response(offer_id, timeout_seconds=10)
        if response == "ACCEPTED":
            await ride_service.transition(ride_request.id, "MATCHED", driver.id)
            return driver
        # DECLINED or TIMEOUT: try next candidate
    # All candidates exhausted
    await ride_service.transition(ride_request.id, "NO_DRIVERS_AVAILABLE")
    return None
The offer goes to the driver's app via WebSocket. They see the pickup location, estimated trip details, and have 10 seconds to accept or decline. If they don't respond, the system treats it as a decline and moves to the next candidate.
This works well for moderate load. The problem emerges during peak hours. Imagine 50 ride requests arrive within 3 seconds in the same neighborhood. With greedy sequential matching, the first request grabs the best driver, the second request grabs the second-best, and so on. But maybe a globally better assignment exists where swapping two driver assignments reduces total wait time for everyone.
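A toy example makes the claim concrete. With two requests and two drivers, greedy sequential matching can be badly beaten by solving the assignment jointly. The ETA numbers are invented, and brute force over permutations stands in for a real assignment solver:

```python
from itertools import permutations

# Toy ETA matrix (minutes), invented for illustration: eta[request][driver]
eta = [
    [1, 2],     # request A: driver 0 is 1 min out, driver 1 is 2 min out
    [2, 100],   # request B: driver 0 is 2 min, driver 1 is 100 min (wrong side of the river)
]

def greedy(eta):
    """Greedy sequential matching: each request grabs its best remaining driver."""
    taken, total = set(), 0
    for req_costs in eta:
        best = min((d for d in range(len(req_costs)) if d not in taken),
                   key=lambda d: req_costs[d])
        taken.add(best)
        total += req_costs[best]
    return total

def optimal(eta):
    """Global optimum by brute force over assignments (stands in for Hungarian)."""
    n = len(eta)
    return min(sum(eta[i][p[i]] for i in range(n)) for p in permutations(range(n)))

assert greedy(eta) == 101   # request A takes driver 0, stranding request B at 100 min
assert optimal(eta) == 4    # swap the assignments: 2 + 2 minutes total
```

Greedy is locally optimal for request A but catastrophic for request B; the joint solution trades one extra minute for A against 98 saved minutes for B.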
Great Solution: Batch Matching with Optimization
Accumulate ride requests over a short window (2-3 seconds) and solve them together as a batch assignment problem.
async def batch_match(pending_requests, available_drivers):
    # Build a cost matrix: rows = requests, columns = drivers
    # cost_matrix[i][j] = ETA for driver j to reach request i's pickup point
    cost_matrix = await eta_service.compute_matrix(
        origins=[(d.lat, d.lng) for d in available_drivers],
        destinations=[(r.pickup_lat, r.pickup_lng) for r in pending_requests]
    )
    cost_matrix = cost_matrix.T  # engine returns [driver][request]; flip to [request][driver]
    # Solve the assignment problem (Hungarian algorithm or auction-based);
    # minimizes total ETA across all assignments
    request_idxs, driver_idxs = scipy.optimize.linear_sum_assignment(cost_matrix)
    for request_idx, driver_idx in zip(request_idxs, driver_idxs):
        ride = pending_requests[request_idx]
        driver = available_drivers[driver_idx]
        await send_offer(driver.id, ride, cost_matrix[request_idx][driver_idx])
The Hungarian algorithm solves this in O(n³), which is fine for batches of 50-100 requests. For larger batches, you use an auction-based approximation that runs in near-linear time.
Failure handling is where the real complexity lives. You need circuit breakers for drivers who repeatedly time out:
class DriverCircuitBreaker:
    def __init__(self, redis):
        self.redis = redis

    async def record_timeout(self, driver_id):
        key = f"driver:timeouts:{driver_id}"
        count = await self.redis.incr(key)
        if count == 1:
            # Start the 5-minute window only on the first timeout, so later
            # increments don't keep sliding the window forward
            await self.redis.expire(key, 300)
        if count >= 3:
            # Temporarily remove from the available pool
            await self.redis.setex(f"driver:cooldown:{driver_id}", 120, "1")

    async def is_available(self, driver_id):
        return not await self.redis.exists(f"driver:cooldown:{driver_id}")
When an offer is declined or times out, the ride doesn't just go to "the next driver." It re-enters the match queue with priority boosted (since the rider has already been waiting). The matching engine picks it up in the next batch cycle.
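The re-queue can be modeled with a priority heap, where a boosted ride sorts ahead of fresh requests while FIFO order is preserved within a priority level. This `MatchQueue` is an illustrative sketch, not a production queue:

```python
import heapq
import itertools

class MatchQueue:
    """Illustrative sketch: min-heap keyed by (priority, arrival order).
    Lower priority values match first; boosted rides use HIGH (0)."""

    NORMAL, HIGH = 1, 0

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # stable FIFO order within a priority

    def enqueue(self, ride_id, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._counter), ride_id))

    def drain_batch(self, max_size):
        # The matching engine pulls the next batch each cycle
        batch = []
        while self._heap and len(batch) < max_size:
            _, _, ride_id = heapq.heappop(self._heap)
            batch.append(ride_id)
        return batch

q = MatchQueue()
q.enqueue("ride-1")
q.enqueue("ride-2")
q.enqueue("ride-0", priority=MatchQueue.HIGH)  # declined earlier, boosted
assert q.drain_batch(3) == ["ride-0", "ride-1", "ride-2"]
```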
Tip: In the interview, acknowledge that batch matching introduces a 2-3 second delay before the first offer goes out. Explain why this tradeoff is worth it: slightly longer initial wait, but significantly better match quality and lower average pickup times across all riders. This is the kind of tradeoff reasoning that separates senior candidates from mid-level ones.

"How do we implement surge pricing?"
The interviewer might frame this as: "It's New Year's Eve. Demand just spiked 10x in Times Square. How does the system respond?"
Surge pricing is a supply-and-demand balancing mechanism. You divide each city into geographic zones (hexagonal cells work well because they tile evenly and have consistent neighbor distances). For each zone, you continuously track two signals: how many ride requests are coming in (demand) and how many available drivers are present (supply).
async def compute_surge(zone_id):
    demand = await demand_tracker.get_request_count(zone_id, window_minutes=5)
    supply = await supply_tracker.get_available_drivers(zone_id)
    if supply == 0:
        return MAX_SURGE_MULTIPLIER  # cap at, say, 5.0x
    raw_ratio = demand / supply
    # Map ratio to multiplier using a predefined curve:
    # ratio 1.0 = no surge, ratio 2.0 = 1.5x, ratio 4.0 = 2.5x, etc.
    multiplier = surge_curve(raw_ratio)
    # Smooth against the previous value to prevent oscillation
    # (assumes a client with decode_responses=True, so get() returns str)
    previous = await redis.get(f"surge:{zone_id}:current") or 1.0
    smoothed = 0.7 * float(previous) + 0.3 * multiplier
    await redis.setex(f"surge:{zone_id}:current", 60, round(smoothed, 1))
    return smoothed
That smoothing step is important. Without it, surge prices oscillate wildly. Demand spikes, price jumps to 3x, riders stop requesting, demand drops, price falls to 1x, riders flood back in, price jumps again. The exponential moving average dampens this feedback loop.
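You can see the damping with a few lines of arithmetic. Feeding an oscillating raw multiplier through the same 0.7/0.3 blend keeps the published price in a much tighter band:

```python
# Oscillating raw multiplier: demand spikes, riders balk, demand collapses, repeat
raw = [3.0, 1.0, 3.0, 1.0, 3.0, 1.0]

smoothed, prev = [], 1.0
for m in raw:
    prev = 0.7 * prev + 0.3 * m   # same blend as compute_surge's smoothing step
    smoothed.append(round(prev, 2))

# Raw swings across a 2.0x range every cycle; the smoothed series stays
# inside a band narrower than 1.0x.
assert max(raw) - min(raw) == 2.0
assert max(smoothed) - min(smoothed) < 1.0
```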
One detail that candidates often miss: the surge multiplier is locked at ride request time. When a rider sees "2.5x surge" and confirms the ride, that 2.5x is stored on the ride record. Even if surge drops to 1.0x by the time the ride completes 20 minutes later, the rider pays the 2.5x rate they agreed to. This gives riders price certainty and prevents disputes.
-- Surge is captured at request time, not completion time
ALTER TABLE rides ADD COLUMN surge_multiplier DECIMAL(3,1) NOT NULL DEFAULT 1.0;
The Pricing Service recomputes surge for all zones every 30-60 seconds. This runs as a background job, not in the request path. The Ride Service simply reads the current surge value for the pickup zone when creating a new ride.

"How do we ensure ride state consistency when things go wrong?"
The ride state machine (REQUESTED → MATCHED → EN_ROUTE → IN_PROGRESS → COMPLETED) looks clean on a whiteboard. In production, everything conspires to break it.
What happens when two drivers both try to accept the same ride within milliseconds? What if a driver's phone dies mid-ride? What if the payment charge fails after the ride is marked complete?
The foundation is optimistic locking on the ride record:
CREATE TABLE rides (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    rider_id         UUID NOT NULL REFERENCES riders(id),
    driver_id        UUID REFERENCES drivers(id),
    status           VARCHAR(20) NOT NULL DEFAULT 'REQUESTED',
    version          INTEGER NOT NULL DEFAULT 0,  -- optimistic lock
    surge_multiplier DECIMAL(3,1) NOT NULL DEFAULT 1.0,
    pickup_lat       DOUBLE PRECISION NOT NULL,
    pickup_lng       DOUBLE PRECISION NOT NULL,
    dropoff_lat      DOUBLE PRECISION NOT NULL,
    dropoff_lng      DOUBLE PRECISION NOT NULL,
    requested_at     TIMESTAMP NOT NULL DEFAULT now(),
    matched_at       TIMESTAMP,
    started_at       TIMESTAMP,
    completed_at     TIMESTAMP,
    cancelled_at     TIMESTAMP
);
Every state transition uses compare-and-swap on both the status and version:
async def accept_ride(ride_id, driver_id, expected_version):
    result = await db.execute("""
        UPDATE rides
        SET status = 'MATCHED',
            driver_id = $1,
            matched_at = NOW(),
            version = version + 1
        WHERE id = $2
          AND status = 'REQUESTED'
          AND version = $3
    """, driver_id, ride_id, expected_version)
    if result.rowcount == 0:
        # Someone else already accepted, or ride was cancelled
        raise RideAlreadyMatchedError()
    await event_log.append(ride_id, "MATCHED", driver_id=driver_id)
If two drivers accept simultaneously, exactly one UPDATE will match the WHERE clause. The other gets rowcount == 0 and receives a "ride no longer available" response. No distributed locks needed.
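The race is easy to model in memory. Both drivers read the ride at version 0 and attempt the guarded update; the database executes the check-and-set atomically, which this sketch only simulates:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Ride:
    status: str = "REQUESTED"
    driver_id: Optional[str] = None
    version: int = 0

def accept_ride(ride, driver_id, expected_version):
    """In-memory model of the guarded UPDATE. The database makes this
    check-and-set atomic; here we only simulate the logic."""
    if ride.status == "REQUESTED" and ride.version == expected_version:
        ride.status = "MATCHED"
        ride.driver_id = driver_id
        ride.version += 1
        return True     # rowcount == 1
    return False        # rowcount == 0: someone else won the race

ride = Ride()
# Both drivers read version 0, then race to accept
assert accept_ride(ride, "driver-A", expected_version=0) is True
assert accept_ride(ride, "driver-B", expected_version=0) is False
assert (ride.driver_id, ride.version) == ("driver-A", 1)
```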
For driver disconnection, the system runs a heartbeat monitor. Drivers send a heartbeat every 10 seconds through their WebSocket connection. If three consecutive heartbeats are missed (30 seconds), the monitor fires a timeout alert:
async def handle_driver_timeout(driver_id):
    active_ride = await db.fetch_one(
        """SELECT id, status, version FROM rides
           WHERE driver_id = $1
             AND status IN ('MATCHED', 'EN_ROUTE', 'IN_PROGRESS')""",
        driver_id
    )
    if not active_ride:
        return
    if active_ride.status in ('MATCHED', 'EN_ROUTE'):
        # Driver hasn't picked up rider yet; reassign
        await transition_ride(active_ride.id, 'REQUESTED', active_ride.version)
        await matching_service.re_queue(active_ride.id, priority="HIGH")
    elif active_ride.status == 'IN_PROGRESS':
        # Rider is in the car; don't cancel, just flag for manual review
        await flag_for_support(active_ride.id, reason="DRIVER_DISCONNECTED")
Notice the different handling based on ride state. If the driver disconnects before pickup, you can safely reassign. If the rider is already in the car, you can't just cancel the ride. You flag it and let the driver reconnect (maybe they went through a tunnel) or escalate to support.
Payment idempotency is the last piece. When the ride completes, the Ride Service publishes a payment event to the message queue. If the Payment Service crashes mid-charge and the event gets redelivered, you need to guarantee the rider isn't charged twice:
async def process_payment(event):
    idempotency_key = f"payment:{event.ride_id}"
    # Atomic claim via SET NX: a separate EXISTS-then-SET check leaves a
    # race window where two consumers could both pass the check
    claimed = await redis.set(idempotency_key, "pending", nx=True, ex=86400)
    if not claimed:
        return  # Already processed (or currently in flight)
    charge = await payment_gateway.charge(
        rider_id=event.rider_id,
        amount=event.total_fare,
        idempotency_key=idempotency_key  # Gateway also deduplicates
    )
    await redis.setex(idempotency_key, 86400, charge.id)  # record charge id, 24hr TTL
    await db.execute(
        "UPDATE fares SET payment_status = 'CHARGED', charged_at = NOW() WHERE ride_id = $1",
        event.ride_id
    )
Double protection: your own idempotency check in Redis, plus the payment gateway's built-in idempotency key support (Stripe, Braintree, and others all support this). Belt and suspenders.
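The consumer-side half of that protection can be modeled with a plain dict standing in for Redis, using an atomic claim in the spirit of SET NX (all names here are illustrative):

```python
claims = {}    # stands in for the Redis idempotency keys
charges = []   # stands in for charges actually made at the gateway

def claim(key):
    """Model of an atomic SET-if-absent: only the first caller succeeds."""
    if key in claims:
        return False
    claims[key] = "pending"
    return True

def process_payment(ride_id):
    key = f"payment:{ride_id}"
    if not claim(key):
        return  # duplicate delivery: drop it
    charges.append(ride_id)
    claims[key] = f"charge-for-{ride_id}"  # record the gateway charge id

# The queue redelivers the same completion event three times
for _ in range(3):
    process_payment("ride-42")
assert charges == ["ride-42"]  # exactly one charge
```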
Tip: When discussing consistency in the interview, don't just mention "optimistic locking" as a buzzword. Walk through a specific failure scenario (two drivers accepting simultaneously, or a payment retry) and show exactly how your design handles it. Concrete scenarios are far more convincing than abstract guarantees.

What is Expected at Each Level
Not every candidate is expected to cover every deep dive. Interviewers calibrate based on level, and knowing where the bar sits for your level lets you allocate your 35-45 minutes wisely.
Mid-Level (L3-L4)
- Nail the ride lifecycle. You should be able to draw the state machine (REQUESTED → MATCHED → EN_ROUTE → IN_PROGRESS → COMPLETED/CANCELLED) without hesitation and explain what triggers each transition. If you fumble the core entity relationships or forget that a Ride links a Rider to a Driver, that's a red flag at any level.
- Produce a coherent high-level architecture. Three services minimum: Ride Service, Location Service, Matching Service. You don't need to know the internals of each, but you need to show that you understand they're separate concerns. Lumping everything into one monolith will cost you.
- Identify that geospatial indexing matters. You don't need to explain geohashing in detail or debate quadtrees vs. S2 cells. But you do need to say something like "we can't scan all 5 million drivers on every request; we need a spatial index." Recognizing the problem is more important than having the perfect solution at this level.
- Handle the API design cleanly. POST to create a ride, a way for drivers to accept, a way to update status. Include pickup/dropoff coordinates in the request. Keep it RESTful and sensible. Interviewers aren't looking for creativity here; they're checking that you can think through the contract between client and server.
Senior (L5)
- Separate the hot path from the cold path for location data. This is the single biggest differentiator between mid-level and senior answers. You should articulate why current driver positions belong in an in-memory spatial index (Redis, for example) while historical location logs flow through a stream processor into a time-series store. If you try to write 1.25M updates/sec to Postgres, the interviewer will push back hard.
- Design the matching offer cascade with failure handling. Rank drivers by ETA, not just straight-line distance. Explain the 10-second offer timeout, the fallback to the next candidate, and what happens when all candidates in the initial radius decline. Senior candidates think about the unhappy path, not just the sunny-day scenario.
- Propose a sharding strategy for the spatial index. City-level or region-level sharding is the natural answer. You should be able to explain why a single global Redis instance won't work at scale and how you'd route queries to the right shard based on pickup coordinates.
- Discuss surge pricing tradeoffs at a conceptual level. You don't need to derive the algorithm, but you should know that the multiplier is locked at request time (not completion), and you should be able to reason about why: price certainty for the rider. Mentioning the tension between real-time reactivity and price stability shows mature product thinking.
Staff+ (L6+)
- Reason about batch matching vs. greedy assignment. Greedy "nearest driver" matching is locally optimal but globally wasteful. Staff candidates should discuss accumulating requests over a short window (2-3 seconds) and solving a bipartite optimization to improve overall system efficiency, then explain when this complexity is justified (dense urban markets) versus when greedy is fine (suburban areas with sparse demand).
- Address failure modes systematically, not as an afterthought. Driver phone dies mid-ride: heartbeat timeout triggers reassignment. Two drivers accept the same ride: compare-and-swap on the ride record with a version column. Payment gateway times out: idempotency keys ensure no double-charge on retry. The interviewer wants to see that you treat these as first-class design concerns, not edge cases you'll "handle later."
- Reason about capacity at each component and identify scaling bottlenecks. The Location Service ingesting 1.25M writes/sec is a fundamentally different scaling challenge than the Ride Service handling 230 rides/sec. Staff candidates should pinpoint which services need horizontal scaling (Location, Notification) and which can stay relatively simple (Pricing, Payment). Throwing "just add more instances" at everything signals shallow thinking.
- Think about regional topology and shard boundary problems. What happens when a driver is near the border of two city shards? How do you handle rides that cross regions? Staff candidates propose overlapping geohash cells at boundaries, or a routing layer that queries adjacent shards when the pickup point is within a threshold distance of a shard edge. These are the details that separate architects from senior engineers.
Key takeaway: The strongest candidates recognize that the Location Service, not the Ride Service, is the hardest engineering problem in this design. 1.25M ephemeral geospatial writes per second is a fundamentally different challenge from storing ride records in a database. Spend your interview time accordingly.
