Design a Ticketing System (Ticketmaster)

Dan Lee, Data & AI Lead
Last update: March 6, 2026

Understanding the Problem

What is a Ticketing System?

Product definition: A platform where event organizers list events with venue seating, and users can search, browse, select seats, and purchase tickets with a guarantee of no double-booking.

Think about the last time you tried to buy concert tickets the moment they went on sale. You and 200,000 other people all clicked "Find Tickets" at 10:00:00 AM, watched a loading spinner, and maybe got lucky. Behind that experience is a system managing a fixed pool of physical seats, where every single one can only be sold once, while a swarm of concurrent users fight over them in real time.

That's what makes this problem fascinating for an interview. It's not just a CRUD app with a checkout flow. The core challenge is a concurrency and fairness problem disguised as an e-commerce system. A venue might have 20,000 seats, but 500,000 people show up in the first 60 seconds. Your system needs to be both fast and correct, and when those two goals conflict, correctness wins. You cannot oversell a seat.

Functional Requirements

Core Requirements:

  • Event and venue management. Organizers create events tied to a venue with a hierarchical seating structure (venue, section, row, seat). Each seat has a price and availability status.
  • Seat map browsing. Users search for events, select one, and view an interactive seat map showing which seats are available, held, or sold.
  • Seat selection with temporary holds. When a user picks seats, the system places a time-limited hold (5-10 minutes) so nobody else can grab them while the user checks out.
  • Checkout and payment processing. The user completes payment within the hold window. On success, the seats transition to booked. On failure or timeout, they release back to available.
  • Ticket delivery and order history. After purchase, the user receives an e-ticket (QR code) and can view past orders.

Below the line (out of scope):

  • Resale / secondary marketplace (transferring tickets between users)
  • Dynamic pricing algorithms that adjust seat prices based on demand
  • Event discovery recommendations and personalization
Note: "Below the line" features are acknowledged but won't be designed in this lesson. Mentioning them shows the interviewer you see the full product surface, but you're choosing to focus on the hard parts.

Non-Functional Requirements

This is where you earn points in the interview. The non-functional requirements reveal that you understand why this system is hard, not just what it does.

  • Strong consistency for seat inventory. Zero tolerance for overselling. If two users try to book the same seat at the same millisecond, exactly one succeeds. This is non-negotiable.
  • High availability during on-sale events. The system must stay up during peak traffic. A Taylor Swift on-sale generates more concentrated traffic than most systems see in a month. Target 99.99% availability during on-sale windows.
  • Low latency for seat map rendering. The seat map is the most-loaded page. Users expect it to render in under 300ms (p99). Stale data by a few seconds is acceptable here; the hold mechanism catches conflicts at write time.
  • Extreme traffic spike tolerance. A single popular event can attract 100K-500K concurrent users. Read QPS can spike to 500K+, and booking attempts can hit 50K+ per second, all concentrated on one event's inventory. The system must absorb this without falling over.
Tip: Always clarify requirements before jumping into design. This shows maturity. In a real interview, ask: "Should we optimize for assigned seating or general admission?" and "How important is fairness in the queue?" These questions signal that you've built systems where those distinctions matter.

Back-of-Envelope Estimation

Let's ground the design in real numbers. These don't need to be exact; the interviewer wants to see that you can reason about scale and identify which numbers actually matter.

Assumptions:

  • ~500M tickets sold per year
  • ~50K events active at any given time
  • Average event has ~10,000 seats
  • Peak on-sale event: 20,000 seats, 300,000 users competing

| Metric | Calculation | Result |
| --- | --- | --- |
| Average daily ticket sales | 500M / 365 | ~1.4M tickets/day |
| Average booking QPS | 1.4M / 86,400 | ~16 QPS (boring, not the real problem) |
| Peak read QPS (single hot event) | 300K users refreshing seat map every 2s | 150K QPS |
| Peak booking QPS (single hot event) | 300K users, 20% get through queue in first minute | ~1,000 booking attempts/sec |
| Seat map payload size | 20K seats × 50 bytes each | ~1 MB per event |
| Peak bandwidth (seat map reads) | 150K QPS × 1 MB | 150 GB/s (must be cached at CDN) |
| Storage (seat inventory) | 500M tickets/year × 500 bytes/record × 3 years | ~750 GB |
| Storage (events + venues) | 50K events × 10KB metadata | ~500 MB (trivial) |

The numbers tell a clear story. Average load is completely manageable. The entire design challenge lives in the peak: a single event's on-sale window where hundreds of thousands of users converge on a tiny inventory of seats, all within 60 seconds.

That 150 GB/s bandwidth number for seat map reads should jump out at you. There is no world where your database handles that directly. CDN caching with short TTLs isn't optional; it's survival.

And notice how the average booking QPS (16) is laughably low, but the peak for one event (~1,000/sec) is orders of magnitude higher. This asymmetry is the architectural signature of the entire system. You're designing for the spike, not the average.
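The arithmetic above is simple enough to sanity-check in a few lines of Python, using the same assumptions as the table:

```python
# Sanity-checking the estimation table's arithmetic under the same assumptions.
TICKETS_PER_YEAR = 500_000_000
SECONDS_PER_DAY = 86_400

daily_sales = TICKETS_PER_YEAR / 365                  # ~1.4M tickets/day
avg_booking_qps = daily_sales / SECONDS_PER_DAY       # ~16 QPS

hot_event_users = 300_000
peak_read_qps = hot_event_users / 2                   # one seat-map refresh per user every 2s

seat_map_bytes = 20_000 * 50                          # ~1 MB per event
peak_bandwidth_gbs = peak_read_qps * seat_map_bytes / 1e9

storage_gb = TICKETS_PER_YEAR * 500 * 3 / 1e9         # 3 years of 500-byte ticket records

print(round(avg_booking_qps), int(peak_read_qps), peak_bandwidth_gbs, storage_gb)
# -> 16 150000 150.0 750.0
```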

The Set Up

Before we draw any architecture boxes, we need to nail down what we're actually storing and how clients interact with it. In an interview, spending 3-4 minutes here saves you from redesigning your schema mid-conversation when you realize you forgot a critical field.

Core Entities

Five entities carry the weight of this system: Event, Venue, Seat, Booking, and Payment. Users matter too, but their schema is straightforward and not where the interesting design decisions live.

Venue represents a physical location. It has a hierarchical structure: a venue contains sections, sections contain rows, and rows contain individual seats. You could model this as separate tables (Venue → Section → Row → Seat), but in practice, flattening section, row, and seat number directly onto the Seat entity keeps queries simpler and avoids expensive joins during the flash-sale hot path.

Event is a scheduled performance at a venue. The on_sale_date field is quietly important: it's the timestamp that triggers the thundering herd. Your system needs to know when to activate the waiting room and scale up resources.

Seat is the single most important entity in the entire design. Each seat carries a status enum (AVAILABLE, HELD, BOOKED) and a hold_expires_at timestamp. That timestamp is your distributed lock. It's the mechanism that prevents double-booking while giving users time to enter payment details. Every concurrency discussion in the deep dive will revolve around this one row.

Key insight: The seat is modeled per event, not per venue. A venue with 20,000 physical seats generates 20,000 seat rows for every event hosted there. This means seat inventory is event-scoped, which simplifies locking and avoids cross-event contention.

Booking ties a user to one or more seats. Its status lifecycle is PENDING → CONFIRMED → CANCELLED. A booking starts as PENDING when seats are held, transitions to CONFIRMED when payment succeeds, and moves to CANCELLED if the hold expires or the user abandons checkout. The total_cents field snapshots the price at purchase time, so later price changes to the event don't retroactively affect completed orders.

Payment records the transaction with an external payment provider. It's deliberately a separate entity from Booking because payment processing is asynchronous. A booking can exist in PENDING state before any payment record exists, and the payment webhook might arrive seconds or even minutes after the initial hold.
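The seat and booking lifecycles described above can be made explicit as a small transition table. This is an illustrative sketch, not part of the schema; the names are ours:

```python
# Illustrative sketch: legal status transitions for seats and bookings.
SEAT_TRANSITIONS = {
    "AVAILABLE": {"HELD"},            # user places a temporary hold
    "HELD": {"BOOKED", "AVAILABLE"},  # payment succeeds, or hold expires/cancels
    "BOOKED": set(),                  # terminal (resale is out of scope)
}
BOOKING_TRANSITIONS = {
    "PENDING": {"CONFIRMED", "CANCELLED"},
    "CONFIRMED": set(),
    "CANCELLED": set(),
}

def can_transition(table, current, target):
    """True if moving current -> target is a legal state change."""
    return target in table.get(current, set())

print(can_transition(SEAT_TRANSITIONS, "HELD", "BOOKED"))  # True
```

Making the transitions explicit like this is what lets the database enforce them later: every UPDATE in the booking flow includes the expected current status in its WHERE clause.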

CREATE TABLE venues (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    address         TEXT NOT NULL,
    total_capacity  INT NOT NULL           -- sum of all seats across sections
);

CREATE TABLE events (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    venue_id        UUID NOT NULL REFERENCES venues(id),
    name            VARCHAR(255) NOT NULL,
    event_date      TIMESTAMP NOT NULL,
    on_sale_date    TIMESTAMP NOT NULL,    -- when tickets become purchasable
    status          VARCHAR(20) NOT NULL DEFAULT 'DRAFT',  -- DRAFT, ON_SALE, SOLD_OUT, COMPLETED, CANCELLED
    created_at      TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_events_on_sale ON events(on_sale_date, status);

The composite index on (on_sale_date, status) without a partial filter is intentional. A partial index filtered to status = 'DRAFT' would only help an internal cron job that transitions events from DRAFT to ON_SALE. But the thundering herd scenario involves users querying events that are already ON_SALE. The unfiltered composite index serves both use cases: internal scheduling jobs and user-facing queries that need to find active, purchasable events quickly.

CREATE TABLE seats (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id        UUID NOT NULL REFERENCES events(id),
    section         VARCHAR(50) NOT NULL,   -- e.g. 'Floor A', 'Balcony 2'
    "row"           VARCHAR(10) NOT NULL,   -- e.g. 'AA', '14' (quoted: ROW is reserved in Postgres)
    number          INT NOT NULL,
    price_cents     INT NOT NULL,
    status          VARCHAR(10) NOT NULL DEFAULT 'AVAILABLE',  -- AVAILABLE, HELD, BOOKED
    held_by         UUID,                   -- user holding the seat; NULL unless status = HELD
    hold_expires_at TIMESTAMP,              -- NULL unless status = HELD
    version         INT NOT NULL DEFAULT 0, -- for optimistic concurrency control
    UNIQUE (event_id, section, "row", number)
);
CREATE INDEX idx_seats_event_status ON seats(event_id, status);
CREATE INDEX idx_seats_held_expiry ON seats(status, hold_expires_at)
    WHERE status = 'HELD';

That version column on the seats table will become your best friend during the concurrency deep dive. Don't forget it.

CREATE TABLE bookings (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES users(id),  -- FK to users table
    event_id        UUID NOT NULL REFERENCES events(id),
    status          VARCHAR(15) NOT NULL DEFAULT 'PENDING',  -- PENDING, CONFIRMED, CANCELLED
    total_cents     INT NOT NULL,           -- price snapshot at time of hold
    created_at      TIMESTAMP NOT NULL DEFAULT now(),
    confirmed_at    TIMESTAMP               -- NULL until payment succeeds
);
CREATE INDEX idx_bookings_user ON bookings(user_id, created_at DESC);

CREATE TABLE booking_seats (
    booking_id      UUID NOT NULL REFERENCES bookings(id),
    seat_id         UUID NOT NULL REFERENCES seats(id),
    PRIMARY KEY (booking_id, seat_id)
);
CREATE UNIQUE INDEX idx_booking_seats_seat ON booking_seats(seat_id);
-- The unique index on seat_id is a safety net: a seat can only belong to one active booking

CREATE TABLE payments (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    booking_id      UUID NOT NULL REFERENCES bookings(id),
    provider        VARCHAR(30) NOT NULL,    -- 'stripe', 'paypal', etc.
    provider_txn_id VARCHAR(255),            -- NULL until provider confirms
    amount_cents    INT NOT NULL,
    status          VARCHAR(15) NOT NULL DEFAULT 'PENDING',  -- PENDING, COMPLETED, FAILED, REFUNDED
    created_at      TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_payments_booking ON payments(booking_id);

Notice the booking_seats junction table. A single booking can reserve multiple seats (a group buying 4 tickets together), and the unique index on seat_id acts as a database-level guarantee that no seat can appear in two bookings simultaneously. Even if your application logic has a bug, this constraint catches it.

Core Entities and Relationships
Tip: When you draw this in an interview, emphasize the Seat entity and its status/TTL fields. Interviewers want to see that you understand where the concurrency battle happens. The other entities are supporting cast.

API Design

Four endpoints cover the core functionality. Keep the surface area small; you can always add endpoints later, but an interviewer will question why you're designing twelve routes when four will do.

// Browse the seat map for an event. Returns seat availability grouped by section.
GET /events/{eventId}/seats?section={sectionId}
-> {
     "event_id": "uuid",
     "sections": [
       {
         "section": "Floor A",
         "available": 142,
         "seats": [
           { "id": "uuid", "row": "AA", "number": 5, "status": "AVAILABLE", "price_cents": 15000 },
           { "id": "uuid", "row": "AA", "number": 6, "status": "HELD", "price_cents": 15000 }
         ]
       }
     ]
   }

This is a GET because it's a pure read with no side effects. The optional section query parameter lets the client load one section at a time, which matters when a venue has 50+ sections and you don't want to ship the entire seat map in one payload.

// Initiate a booking by placing a temporary hold on selected seats.
POST /bookings
{
  "event_id": "uuid",
  "seat_ids": ["uuid", "uuid", "uuid"]
}
-> {
     "booking_id": "uuid",
     "status": "PENDING",
     "hold_expires_at": "2025-01-15T20:05:00Z",
     "total_cents": 45000
   }

This is the hottest endpoint in the system. Every concurrent user trying to grab seats hits this path, and it must be atomic. The response includes hold_expires_at so the client can show a countdown timer ("You have 4:32 to complete your purchase").

Warning: A common interview mistake is making seat selection and payment a single synchronous call. That forces users to enter credit card details before knowing if their seats are available, which is terrible UX and creates unnecessarily long lock windows. Always separate the hold from the payment.

// Confirm a booking by completing payment.
POST /bookings/{bookingId}/confirm
{
  "payment_method_token": "tok_visa_4242"
}
-> {
     "booking_id": "uuid",
     "status": "CONFIRMED",
     "payment_id": "uuid",
     "tickets": [
       { "seat": "Floor A, Row AA, Seat 5", "qr_code_url": "https://..." }
     ]
   }

Why POST and not PUT? Because this endpoint triggers a payment side effect. It's not idempotent by nature (though you should make it idempotent via the booking ID to handle retries). PUT implies replacing a resource's full state; POST better communicates "perform this action."

// Cancel a booking and release held seats.
DELETE /bookings/{bookingId}
-> { "status": "CANCELLED" }

DELETE is the natural fit here. The user is abandoning their hold, and the system releases the seats back to AVAILABLE. This also gets called internally by the expiry worker when a hold times out.
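The expiry worker can be sketched as a periodic sweep. Here's a simplified in-memory version, purely illustrative; the real job would run the equivalent UPDATE against the seats table, using the partial index on (status, hold_expires_at) to find expired holds cheaply:

```python
import time

# In-memory stand-in for the seats table (illustrative only).
seats = {
    "seat-1": {"status": "HELD", "hold_expires_at": time.time() - 10},   # expired
    "seat-2": {"status": "HELD", "hold_expires_at": time.time() + 300},  # still live
    "seat-3": {"status": "AVAILABLE", "hold_expires_at": None},
}

def sweep_expired_holds(seats, now=None):
    """Release seats whose hold window has lapsed. Returns released seat ids."""
    now = now if now is not None else time.time()
    released = []
    for seat_id, seat in seats.items():
        if seat["status"] == "HELD" and seat["hold_expires_at"] <= now:
            seat["status"] = "AVAILABLE"
            seat["hold_expires_at"] = None
            released.append(seat_id)
    return released

print(sweep_expired_holds(seats))  # ['seat-1']
```

In production this runs every few seconds; the write path should also treat a seat with an expired hold_expires_at as available, so correctness never depends on the sweeper's timing.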

One thing to flag for your interviewer before moving to the high-level design: the read path (GET /events/{id}/seats) and the write path (POST /bookings) have fundamentally different scaling profiles. The seat map serves hundreds of thousands of reads per second during a flash sale, and a few seconds of staleness is perfectly fine. The booking path handles tens of thousands of write attempts per second against the same rows, and staleness means overselling. These two paths need completely different strategies, and that asymmetry will drive most of our architectural decisions.

High-Level Design

We have two fundamentally different paths through this system, and they need fundamentally different architectures. The read path (browsing events, loading seat maps) is high-volume and tolerant of slight staleness. The write path (holding and booking seats) demands strong consistency and can't tolerate even a single duplicate. Let's walk through each.

1) Seat Map Browsing and Event Discovery

When a user opens the app for a Taylor Swift concert, they want to see the venue map with available seats colored green and taken seats grayed out. During a major on-sale, millions of users will be loading this same seat map simultaneously. If every one of those requests hits your database, you're dead before the first ticket sells.

Components: Client (web/mobile), CDN/Cache layer, API Gateway, Event Service, PostgreSQL database.

Data flow:

  1. Client requests the seat map for event E123 via GET /events/E123/seats.
  2. The request hits the CDN edge node first. If a cached version exists and is less than 2-3 seconds old, it's returned immediately. Done.
  3. On a cache miss, the request passes through the API Gateway (which enforces rate limits) to the Event Service.
  4. The Event Service queries the database for seat availability. Rather than returning every individual seat's status, it returns pre-aggregated section-level counts (e.g., "Section 101: 47 available") for the overview, and individual seat data only when the user zooms into a specific section.
  5. The response is written back to the CDN cache with a short TTL (2-3 seconds) before being returned to the client.
Tip: Interviewers will push back on the staleness. Your answer: "A user might see a seat as available that was just held 1 second ago. That's fine. When they try to book it, the write path will reject the attempt atomically. The cache is optimistic; the booking service is the source of truth."

The key insight here is that you don't need real-time accuracy on the read path. You need real-time accuracy on the write path. Separating these concerns is what makes the system scalable.

For truly hot events, you can go further: push incremental updates via Server-Sent Events or WebSockets so clients don't need to poll at all. The seat map gets a stream of "seat 42 in section 101 is now HELD" deltas. This cuts your read QPS dramatically during the critical first minutes of a sale.
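To make the delta idea concrete, here's a hypothetical client-side reducer that applies seat-status events to a locally cached section of the seat map. The message shape is an assumption for illustration, not a real payload:

```python
# Hypothetical delta messages pushed over SSE/WebSocket for one section.
deltas = [
    {"seat_id": "s-101-42", "status": "HELD"},
    {"seat_id": "s-101-43", "status": "BOOKED"},
    {"seat_id": "s-101-42", "status": "AVAILABLE"},  # the earlier hold expired
]

# Client's cached view of the currently loaded section.
seat_map = {"s-101-42": "AVAILABLE", "s-101-43": "AVAILABLE"}

def apply_delta(seat_map, delta):
    # Ignore deltas for seats outside the section the user has loaded.
    if delta["seat_id"] in seat_map:
        seat_map[delta["seat_id"]] = delta["status"]

for d in deltas:
    apply_delta(seat_map, d)

print(seat_map)  # {'s-101-42': 'AVAILABLE', 's-101-43': 'BOOKED'}
```

The deltas are advisory, like the cached seat map itself: the booking service's atomic hold remains the source of truth.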

Read Path: Seat Map and Event Browsing

2) Seat Selection, Temporary Holds, and Booking

This is where the interview gets interesting. A user picks two seats, clicks "Buy," and expects those seats to be reserved while they fumble for their credit card. Meanwhile, 500 other people are trying to grab the same seats.

Components: Client, Virtual Waiting Room (for high-demand events), Booking Service, PostgreSQL database (primary), Payment Gateway.

Data flow:

  1. User selects seats and clicks "Book." If this is a high-demand event (more on that shortly), the user is routed into a virtual waiting room instead of hitting the booking service directly.
  2. Once admitted, the client sends POST /bookings with the selected seat IDs.
  3. The Booking Service attempts an atomic status transition on each requested seat: AVAILABLE → HELD. This happens inside a database transaction with row-level locking. If any seat is not AVAILABLE, the entire transaction rolls back and the user gets an error.
  4. On success, the Booking Service creates a Booking record in PENDING status and sets hold_expires_at to now() + 7 minutes on each seat row.
  5. The client is redirected to the payment flow. The user has 7 minutes.
  6. The client submits payment info. The Booking Service sends a charge request to the Payment Gateway.
  7. The Payment Gateway processes the charge and sends a webhook callback to the Booking Service.
  8. On successful payment, the Booking Service transitions the seats from HELD → BOOKED and the booking from PENDING → CONFIRMED. The user gets their e-ticket.
  9. If payment fails or the 7-minute hold expires, the seats revert to AVAILABLE and the booking moves to CANCELLED.

The atomic hold is the single most important mechanism in the entire system. Here's what it looks like at the database level:

BEGIN;

UPDATE seats
SET status = 'HELD',
    hold_expires_at = now() + INTERVAL '7 minutes',
    held_by = :user_id
WHERE id IN (:seat_ids)
  AND event_id = :event_id
  AND status = 'AVAILABLE';

-- Check that we actually updated the expected number of rows
-- If affected_rows != len(seat_ids), ROLLBACK

COMMIT;

That AND status = 'AVAILABLE' clause is your safety net. Two concurrent transactions trying to hold the same seat will serialize at the row lock. The second one will find status = 'HELD' and update zero rows. No double-booking. Ever.

Common mistake: Candidates often design a two-step "check then set" flow: first query the seat status, then update it in a separate call. This has a textbook race condition. The check and the set must happen in the same atomic operation.

3) The Virtual Waiting Room

Imagine 300,000 people all clicking "Buy" within 2 seconds of an on-sale. Without protection, your booking service receives 300K write transactions simultaneously, all competing for row locks on the same 20,000 seats. The database melts.

The virtual waiting room is a traffic shaping layer that sits between the client and the booking service. It's only activated for events flagged as high-demand (the organizer or an automated heuristic can set this).

Here's how it works:

  1. When the on-sale time hits, users clicking "Buy" are enqueued into a distributed queue (Redis sorted set works well here, with the score being their arrival timestamp plus a small random jitter for fairness).
  2. Each user gets a position number and a WebSocket connection that pushes position updates every few seconds. "You are #4,521 in line. Estimated wait: 3 minutes."
  3. An Admission Controller runs a loop: it knows the booking service can handle roughly 2,000 booking attempts per second without degrading. It dequeues users in batches of that size and issues them short-lived admission tokens (JWTs with a 60-second expiry).
  4. The user's client automatically submits the booking request with the admission token. The Booking Service rejects any request without a valid token.

The admission rate is the critical tuning knob. Set it too low and you waste capacity while users get frustrated. Set it too high and you're back to database contention. In practice, you calibrate it based on load testing: what's the maximum write throughput your database can sustain for this event's seat inventory before lock contention causes latency to spike?
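The admission token itself can be sketched in a few lines. This uses stdlib HMAC in place of a real JWT library, and the token format, secret handling, and function names are our assumptions (a production system would use signed JWTs with key rotation):

```python
import base64
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # assumption: in production this is a managed, rotated key

def mint_admission_token(user_id, event_id, ttl_seconds=60, now=None):
    """Issue a short-lived signed token admitting one user to the booking service."""
    now = now if now is not None else time.time()
    expires = int(now + ttl_seconds)
    payload = f"{user_id}:{event_id}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def verify_admission_token(token, now=None):
    """Check signature and expiry; the booking service rejects anything invalid."""
    now = now if now is not None else time.time()
    payload_b64, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    _, _, expires = payload.decode().rsplit(":", 2)
    return int(expires) > now

token = mint_admission_token("user-1", "evt-1")
print(verify_admission_token(token))  # True
```

The 60-second expiry matters: an admitted user who wanders off can't hoard their slot, and stolen tokens go stale almost immediately.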

Tip: When you introduce the waiting room, frame it as a business decision, not just a technical one. "This protects the user experience. Without it, users get random 500 errors and timeouts. With it, they see a clear queue position and feel the process is fair. Ticketmaster's actual waiting room exists for exactly this reason."

The random jitter added to queue positions is worth mentioning. Pure FIFO ordering rewards users with faster internet connections and encourages bot-like behavior (auto-refreshing at the exact on-sale second). A small random offset (anywhere from a fraction of a second to a few seconds, tuned per event) levels the playing field slightly.

4) Asynchronous Payment Confirmation

Payment processing is inherently slow and unreliable. Credit card charges can take 2-10 seconds. Payment providers have outages. You absolutely cannot hold a database row lock open while waiting for Stripe to respond.

The flow is intentionally decoupled:

  1. Booking Service places the hold and creates a PENDING booking. This takes milliseconds.
  2. Booking Service sends an async charge request to the Payment Gateway (Stripe, Adyen, etc.).
  3. The user sees a "Processing payment..." screen.
  4. The Payment Gateway processes the charge and calls back via webhook: POST /webhooks/payment with the booking ID and result.
  5. The Booking Service receives the webhook, verifies its signature, and either confirms or cancels the booking.

// Webhook payload from payment provider
{
  "event": "payment.completed",
  "data": {
    "booking_id": "b-9f3a2c",
    "provider_txn_id": "ch_1N2x3Y4z",
    "amount_cents": 35000,
    "status": "succeeded"
  }
}

What if the webhook never arrives? You need a fallback. A background job polls the payment provider for any bookings that have been in PENDING status for more than 2 minutes. This covers network failures, dropped webhooks, and provider-side delays.
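A sketch of that fallback job, with an in-memory stand-in for the bookings table and a stubbed provider lookup (the real job would page through Postgres and call the provider's charge-lookup API):

```python
import time

# In-memory stand-in for the bookings table (illustrative only).
bookings = {
    "b-1": {"status": "PENDING", "created_at": time.time() - 300},  # stuck for 5 minutes
    "b-2": {"status": "PENDING", "created_at": time.time() - 30},   # still within normal latency
}

def provider_lookup(booking_id):
    """Stub for the payment provider's 'look up charge by reference' API."""
    return {"b-1": "succeeded"}.get(booking_id, "unknown")

def reconcile_stuck_bookings(bookings, stuck_after_seconds=120, now=None):
    """Poll the provider for any booking stuck in PENDING past the threshold."""
    now = now if now is not None else time.time()
    reconciled = []
    for bid, b in bookings.items():
        if b["status"] == "PENDING" and now - b["created_at"] > stuck_after_seconds:
            result = provider_lookup(bid)
            if result == "succeeded":
                b["status"] = "CONFIRMED"   # webhook was lost; trust the provider's record
            elif result == "failed":
                b["status"] = "CANCELLED"
            reconciled.append((bid, b["status"]))
    return reconciled

print(reconcile_stuck_bookings(bookings))  # [('b-1', 'CONFIRMED')]
```

Note that "unknown" results are left alone for the next sweep rather than cancelled; the provider may simply not have processed the charge yet.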

And there's a nasty edge case: the payment webhook arrives after the hold has already expired and the seats have been released back to AVAILABLE (or worse, re-sold to someone else). The deep dive section covers this in detail, but the short answer is: you refund the payment and apologize. The hold expiry is the authoritative clock. If payment confirmation doesn't arrive within the hold window, the booking is dead.

Warning: Don't hand-wave the payment flow. Interviewers at senior+ levels will probe for exactly these edge cases. Having a clear answer for "what happens if the webhook is late" separates strong candidates from average ones.

Write Path: Booking Flow with Waiting Room

Putting It All Together

The full architecture splits cleanly along the read/write boundary:

Read path: Clients hit a CDN-cached seat map with 2-3 second TTLs. Cache misses flow through the API Gateway to the Event Service, which queries read replicas. For hot events, WebSocket/SSE connections push incremental seat status changes to reduce polling load. This path scales horizontally and handles millions of concurrent viewers.

Write path: Clients enter a virtual waiting room for high-demand events. The Admission Controller meters traffic into the Booking Service at a rate the database can handle. The Booking Service performs atomic seat holds using row-level locking on the primary database. Payment is processed asynchronously via webhooks. A background sweeper reclaims expired holds.

The two paths share the same underlying seat inventory, but they read from different database instances (replicas vs. primary) and have completely different caching and scaling strategies. The read path optimizes for throughput and can tolerate brief staleness. The write path optimizes for correctness and serializes all mutations through the primary database.

One final note on the Booking Service: it's deliberately a single logical service (though it can have multiple instances) because it owns the seat inventory state machine. Splitting seat-hold logic across multiple services would create distributed transaction problems that are far harder to solve than the concurrency problem itself.


Deep Dives

"How do we prevent double-booking under extreme concurrency?"

This is the question that separates candidates who've built real systems from those who haven't. The interviewer isn't just asking about correctness; they want to see you reason about what happens when 5,000 people click "Book Now" on the same seat within the same second.

Bad Solution: Application-Level Check-Then-Set

The instinct most people have is to read the seat status in application code, check if it's AVAILABLE, and then write the update:

def hold_seat(seat_id, user_id):
    seat = db.query("SELECT status FROM seats WHERE id = %s", seat_id)
    if seat.status == 'AVAILABLE':
        db.execute(
            "UPDATE seats SET status = 'HELD', held_by = %s, hold_expires_at = %s WHERE id = %s",
            user_id, now() + timedelta(minutes=7), seat_id
        )
        return Success
    return SeatUnavailable

The gap between the SELECT and the UPDATE is where everything breaks. Two threads read AVAILABLE simultaneously, both pass the check, both write HELD. One user's hold silently overwrites the other's. You've just double-sold a seat.

Warning: If you propose this in an interview and don't immediately identify the race condition yourself, it signals a lack of experience with concurrent systems. Always call out the TOCTOU (time-of-check, time-of-use) problem before the interviewer has to.

Good Solution: Pessimistic Locking with SELECT FOR UPDATE

Wrap the read and write in a single transaction and grab an exclusive row lock:

BEGIN;

SELECT id, status FROM seats
WHERE id = 'seat-42' AND event_id = 'evt-abc'
FOR UPDATE;

-- Application checks: is status = 'AVAILABLE'?

UPDATE seats
SET status = 'HELD',
    held_by = 'user-xyz',
    hold_expires_at = NOW() + INTERVAL '7 minutes'
WHERE id = 'seat-42' AND status = 'AVAILABLE';

COMMIT;

FOR UPDATE acquires a row-level exclusive lock. The second concurrent transaction trying to lock the same row blocks until the first commits or rolls back. No race condition, no double-booking.

This works well and is easy to reason about. The tradeoff: under extreme contention (thousands of users targeting the same few seats), you get lock queuing. Transactions pile up waiting for their turn, connection pools fill, and tail latencies spike. For most events this is fine. For a Taylor Swift on-sale with 50K booking attempts per second, the contention on popular seats becomes a bottleneck.

One practical tip: keep the transaction as short as possible. Don't do payment processing inside the lock. Grab the hold, commit, then handle payment separately.

Great Solution: Optimistic Concurrency Control + Database Constraints

Instead of locking the row before reading, you read freely and then attempt a conditional write that only succeeds if nobody else modified the row since you read it. The version column we defined on seats exists for exactly this; if your schema lacked one, you'd add it:

ALTER TABLE seats ADD COLUMN version INT NOT NULL DEFAULT 0;

The booking flow becomes:

def hold_seat(seat_id, user_id):
    seat = db.query(
        "SELECT id, status, version FROM seats WHERE id = %s", seat_id
    )
    if seat.status != 'AVAILABLE':
        return SeatUnavailable

    rows_updated = db.execute("""
        UPDATE seats
        SET status = 'HELD',
            held_by = %s,
            hold_expires_at = %s,
            version = version + 1
        WHERE id = %s AND version = %s AND status = 'AVAILABLE'
    """, user_id, now() + timedelta(minutes=7), seat_id, seat.version)

    if rows_updated == 0:
        return ConflictRetryOrFail
    return Success

When two users race on the same seat, both read version=3. User A's UPDATE matches WHERE version=3, bumps it to 4, and returns rows_updated=1. User B's UPDATE finds version=4 (not 3), matches zero rows, and fails cleanly. No blocking, no lock queuing.

The AND status = 'AVAILABLE' in the WHERE clause is your second safety net. Even if version tracking somehow went wrong, the status check prevents transitioning a HELD seat to HELD again. Belt and suspenders.

For the ultimate backstop, lean on the database-level constraint already in the schema: the unique index on booking_seats(seat_id).

-- From the schema: a seat can appear in at most one booking
CREATE UNIQUE INDEX idx_booking_seats_seat ON booking_seats(seat_id);

Provided cancelled bookings delete their booking_seats rows (releasing the seat for re-sale), this means the database itself will reject any attempt to double-book, regardless of what your application code does. If your OCC logic has a bug, if someone deploys bad code at 2am, the constraint catches it.

Tip: Mentioning the database constraint as a safety net beyond your application-level concurrency control shows the interviewer you think defensively. Staff engineers always ask "what if my own code is wrong?"

The tradeoff compared to pessimistic locking: under high contention on the same seat, you'll see more failed attempts that need retrying or redirecting to other seats. But contention on a single specific seat is actually rare in practice. Users are spread across thousands of seats. OCC shines because it avoids holding locks during the read phase, which means your database connections aren't tied up waiting.
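Here's an illustrative in-memory version of that conflict handling: a compare-and-swap that mimics the versioned UPDATE, plus a fallback that redirects a losing user to alternate seats (function names and the seat data are ours):

```python
# In-memory stand-in for the seats table, mimicking the versioned UPDATE.
seats = {
    "s-1": {"status": "HELD", "version": 4},       # someone else already won this seat
    "s-2": {"status": "AVAILABLE", "version": 0},
}

def try_hold(seat_id, expected_version):
    """Compare-and-swap: succeed only if status and version both still match."""
    seat = seats[seat_id]
    if seat["status"] == "AVAILABLE" and seat["version"] == expected_version:
        seat["status"] = "HELD"
        seat["version"] += 1
        return True
    return False

def hold_with_fallback(preferred, alternates):
    """Try the preferred seat; on an OCC conflict, redirect to alternates."""
    for seat_id in [preferred] + alternates:
        if try_hold(seat_id, seats[seat_id]["version"]):
            return seat_id
    return None  # everything contested -> tell the user to pick again

print(hold_with_fallback("s-1", ["s-2"]))  # s-2
```

This "redirect to a comparable seat" behavior is exactly what users see on real ticketing sites when their first choice slips away mid-checkout.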

Preventing Double-Booking: Optimistic Concurrency Control

"How do we handle the thundering herd at on-sale time?"

A major concert goes on sale at 10:00 AM. At 9:59:58, 300,000 users are staring at a countdown timer. At 10:00:00, they all click simultaneously. Your booking service can handle maybe 2,000 concurrent booking transactions. What happens to the other 298,000 requests?

Bad Solution: Let Everyone Through

No queue, no throttling. All 300K requests hit the API gateway, fan out to the booking service, and slam the database. Connection pools exhaust in under a second. The database starts rejecting connections. The booking service returns 500s. Users retry aggressively, making it worse. The entire system falls over, and nobody gets tickets.

Warning: Candidates sometimes skip the thundering herd problem entirely and jump to "we'll just scale horizontally." But you can't horizontally scale a single event's seat inventory across multiple databases without partitioning the seats, and even then, the hottest sections create hotspots. The interviewer wants to hear you acknowledge that the write path has a throughput ceiling and you need to protect it.

Good Solution: Rate Limiting at the API Gateway

Configure the API gateway to cap the booking endpoint at, say, 5,000 requests per second per event. Excess requests get a 429 (Too Many Requests) with a Retry-After header.

This protects the backend. But the user experience is terrible. Whether you get through depends on network latency, how fast your browser fires the request, and pure luck. Users on faster connections or closer to the edge have an advantage. Bots with sub-millisecond response times dominate. And users who get 429'd will just hammer retry, creating sustained load.

Rate limiting is a necessary component, but it's not a user-facing solution. It's a circuit breaker, not a queue.
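
As a sketch, the per-event cap can be implemented as a fixed-window counter. This in-memory version illustrates the idea; in production the counter would live in Redis (INCR plus EXPIRE) so every gateway node shares it. The class name and the limit are illustrative:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Caps requests per event per one-second window."""
    def __init__(self, limit_per_sec):
        self.limit = limit_per_sec
        self.counts = defaultdict(int)

    def allow(self, event_id, now=None):
        window = int(time.time() if now is None else now)
        self.counts[(event_id, window)] += 1
        # Over the cap -> caller should return 429 with Retry-After
        return self.counts[(event_id, window)] <= self.limit

limiter = FixedWindowLimiter(limit_per_sec=2)
print([limiter.allow("evt-abc", now=100) for _ in range(3)])  # [True, True, False]
```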

Great Solution: Distributed Virtual Waiting Room

When demand is expected to exceed capacity (the organizer or your system flags the event as high-demand), you activate a waiting room that sits between the user and the booking service.

Here's how it works:

1. Enqueue on arrival. When a user hits the event page around on-sale time, they're assigned a position in a Redis sorted set. The score is the timestamp of arrival plus a small random jitter (to prevent ties and make bot-farming harder):

import time, random

# `redis` here is a connected redis-py client (e.g. redis.Redis())

def enqueue_user(event_id, user_id):
    score = time.time() + random.uniform(0, 0.5)  # arrival time + jitter
    redis.zadd(f"queue:{event_id}", {user_id: score})
    # zrank returns the user's 0-based position in ascending score order
    position = redis.zrank(f"queue:{event_id}", user_id)
    return position

The random jitter is subtle but important. Without it, bots that fire requests at exactly 10:00:00.000 all get score ties, and Redis breaks ties lexicographically by member name. Jitter turns a deterministic race into a lottery, which is actually fairer for humans.

2. Push position updates. Each user holds a WebSocket connection. Every few seconds, the server pushes their current position and estimated wait time. This keeps users engaged instead of rage-refreshing.

3. Controlled admission. An admission controller runs a loop: it checks the booking service's current throughput capacity (via a simple metric like "active holds in the last 10 seconds"), then dequeues a batch of users from the front of the sorted set and issues them short-lived admission tokens (JWT with a 2-minute TTL).

def admit_batch(event_id, batch_size=200):
    # zpopmin atomically removes and returns the lowest-scored
    # (i.e., earliest-arriving) members as (user_id, score) pairs
    users = redis.zpopmin(f"queue:{event_id}", batch_size)
    tokens = []
    for user_id, score in users:
        # generate_jwt / push_via_websocket are application-level helpers
        token = generate_jwt(user_id=user_id, event_id=event_id, ttl=120)
        tokens.append((user_id, token))
        push_via_websocket(user_id, {"status": "admitted", "token": token})
    return tokens

4. Token validation on booking. The booking service rejects any POST /bookings request that doesn't carry a valid admission token for that event. This means even if someone bypasses the waiting room UI, they can't book without a token.
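
The admission token can be any signed, expiring credential. The text above mentions a JWT; the sketch below uses a raw HMAC over a JSON payload to show the same idea without a JWT library. The secret and claim names are illustrative:

```python
import base64, hashlib, hmac, json, time

SECRET = b"shared-between-admission-and-booking"  # illustrative secret

def make_admission_token(user_id, event_id, ttl=120, now=None):
    now = time.time() if now is None else now
    payload = json.dumps({"uid": user_id, "eid": event_id, "exp": now + ttl}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def validate_admission_token(token, event_id, now=None):
    now = time.time() if now is None else now
    body, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(body.encode())
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False  # forged or tampered token
    claims = json.loads(payload)
    return claims["eid"] == event_id and claims["exp"] > now
```

The booking service runs this check before touching any seat rows, so a request that skipped the queue fails fast and cheaply.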

For bot prevention, you can layer in a CAPTCHA challenge at enqueue time, device fingerprinting, and rate limits per IP/account. No single measure stops all bots, but the combination raises the cost of cheating significantly.

Tip: When you propose the waiting room, proactively mention the admission rate tuning. The interviewer will want to know: how do you decide how many users to let through per second? The answer is you calibrate it to your booking service's measured throughput, with some headroom. If your service handles 2,000 holds/second, admit 1,500 users/second. This is the kind of operational detail that signals real-world experience.

[Diagram: Virtual Waiting Room Architecture]

"How do we handle hold expiration and seat recovery reliably?"

A user selects two seats, gets a 7-minute hold, then abandons their browser. Those seats are stuck in HELD status. If you don't reclaim them, they're lost inventory. If you reclaim them too aggressively, you might yank seats from someone who's mid-payment. And then there's the really fun edge case: the payment webhook arrives after you've already expired the hold.

Bad Solution: Periodic Cron Job

Run a cron job every 60 seconds that scans for expired holds:

UPDATE seats SET status = 'AVAILABLE', held_by = NULL, hold_expires_at = NULL
WHERE status = 'HELD' AND hold_expires_at < NOW();

This "works" but has problems. During a flash sale, you might have 50,000 seats in HELD status. Scanning all of them every minute creates a heavy query that competes with booking transactions for database resources. The granularity is also poor: a hold that expired 1 second ago and one that expired 59 seconds ago get treated the same. That's up to a minute of wasted inventory during the highest-demand period.

Warning: Cron-based cleanup is the first thing most candidates propose. It's not wrong, but if you stop there, you're leaving performance and reliability on the table.

Good Solution: Delayed Message Queue

When the booking service creates a hold, it simultaneously publishes a delayed message to a queue (SQS with delay, RabbitMQ with TTL, or a Redis-based delay queue) set to fire at exactly hold_expires_at:

from datetime import timedelta

def create_hold(seat_ids, user_id, booking_id):
    hold_ttl = timedelta(minutes=7)
    expires_at = now() + hold_ttl

    # ... atomic seat status update ...

    # Queue delays are whole seconds (e.g. SQS), so cast the TTL
    delay_queue.publish(
        message={"booking_id": booking_id, "seat_ids": seat_ids},
        delay_seconds=int(hold_ttl.total_seconds())
    )

When the message fires, the expiry worker checks if the booking is still in PENDING status. If so, it releases the seats. If the booking was already confirmed, it does nothing.

def handle_expiry(message):
    with db.transaction():
        # Lock the booking row so this can't race with confirm_booking
        booking = db.query(
            "SELECT status FROM bookings WHERE id = %s FOR UPDATE",
            message.booking_id
        )
        if booking.status == 'PENDING':
            db.execute("""
                UPDATE seats SET status = 'AVAILABLE', held_by = NULL
                WHERE id = ANY(%s) AND status = 'HELD'
            """, message.seat_ids)
            db.execute("UPDATE bookings SET status = 'EXPIRED' WHERE id = %s",
                       message.booking_id)

This is precise (fires at the right time), lightweight (no scanning), and decoupled from the booking path. The tradeoff: you're now dependent on the message queue's reliability. If the queue loses a message, that seat stays held forever.

Great Solution: Lazy Expiration + Background Sweeper + Database Safety Net

Combine three mechanisms, each covering the others' failure modes:

Lazy expiration on read. Whenever any code path reads a seat's status, it checks hold_expires_at. If the hold is expired, it treats the seat as AVAILABLE regardless of the stored status. This means the seat map always shows correct availability, even if no cleanup has run yet.

def effective_status(seat):
    if seat.status == 'HELD' and seat.hold_expires_at < now():
        return 'AVAILABLE'
    return seat.status

Delayed queue for proactive cleanup. Same as the Good solution. This ensures the database gets updated promptly, so you're not relying on lazy checks forever.

Background sweeper as a catch-all. A lightweight cron that runs every 5 minutes, catching anything the delayed queue missed (queue failures, message loss, edge cases). Because the delayed queue handles 99% of cases, the sweeper only touches a handful of rows.

Now, the hard edge case: payment webhook arrives after hold expiry.

Here's the scenario. User holds seats at 10:00. Hold expires at 10:07. Payment gateway is slow and sends the success webhook at 10:08. By then, the expiry worker has already released the seats. Maybe another user has even re-held them.

You handle this with a strict state machine check at confirmation time:

def confirm_booking(booking_id, payment_confirmation):
    with db.transaction():
        booking = db.query(
            "SELECT * FROM bookings WHERE id = %s FOR UPDATE", booking_id
        )
        if booking.status != 'PENDING':
            # Hold already expired and was cleaned up
            refund(payment_confirmation.transaction_id)
            return BookingExpired

        seats = db.query(
            "SELECT * FROM seats WHERE booking_id = %s FOR UPDATE", booking_id
        )
        if any(s.status != 'HELD' or s.hold_expires_at < now() for s in seats):
            refund(payment_confirmation.transaction_id)
            return BookingExpired

        # Everything checks out: confirm
        db.execute("UPDATE seats SET status = 'BOOKED' WHERE booking_id = %s", booking_id)
        db.execute("UPDATE bookings SET status = 'CONFIRMED' WHERE id = %s", booking_id)
        return BookingConfirmed

The refund path is unavoidable here. You can't honor a payment for seats that are no longer held. The user gets their money back, and you show them a clear message: "Sorry, your hold expired. Your payment has been refunded."

Tip: Bringing up the late-webhook edge case before the interviewer asks about it is a strong signal. It shows you've thought about the full lifecycle, not just the happy path. Mention that you'd track refund rates as an operational metric; if they spike, your hold TTL might be too short or your payment provider too slow.

[Diagram: Hold Expiration and Seat Recovery]

"How do we scale seat map reads when millions of users are refreshing during a flash sale?"

The read path and write path have completely different scaling profiles. Writes are bottlenecked by seat-level contention and database locks. Reads are bottlenecked by sheer volume: millions of users refreshing the seat map every few seconds, each expecting to see which seats are still available.

You don't need perfect real-time accuracy on the read path. A seat map that's 2-3 seconds stale is fine because the hold mechanism catches conflicts at write time. This insight unlocks aggressive caching.

Layer 1: CDN caching with short TTLs. Serve the seat map from CDN edge nodes with a 2-3 second TTL. During a flash sale for a 20,000-seat venue, this means the origin only gets hit once every 2-3 seconds per edge location, not once per user. If you have 50 edge locations, that's ~20 requests/second to origin instead of millions.
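
The same short-TTL idea applies at any layer, not just the CDN. A minimal in-process sketch (class and parameter names are illustrative) shows how a 2-second TTL bounds origin load regardless of request volume:

```python
import time

class ShortTTLCache:
    def __init__(self, ttl_seconds=2.0):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (value, cached_at)
        self.origin_hits = 0

    def get(self, key, compute, now=None):
        now = time.time() if now is None else now
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # served from cache, origin untouched
        self.origin_hits += 1
        value = compute()
        self.entries[key] = (value, now)
        return value

cache = ShortTTLCache(ttl_seconds=2.0)
for i in range(1000):  # one second of traffic
    cache.get("seatmap:evt-abc", lambda: "seat map payload", now=0.001 * i)
print(cache.origin_hits)  # 1: a thousand requests, one origin fetch
```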

Layer 2: Pre-computed section-level aggregates. Most users don't need individual seat status for the entire venue. They first pick a section, then see the detailed seat map for that section. Pre-compute availability counts per section:

-- Materialized view or cache entry, refreshed every 2 seconds
SELECT section, COUNT(*) as available_seats, MIN(price_cents) as min_price
FROM seats
WHERE event_id = 'evt-abc'
  AND (status = 'AVAILABLE' OR (status = 'HELD' AND hold_expires_at < NOW()))
GROUP BY section;

This query runs against a read replica, not the primary. The result is tiny (maybe 30 rows for 30 sections) and gets pushed to the CDN cache. Users see "Section 102: 47 seats available from $85" without your database serving 20,000 individual seat rows per request.

Layer 3: WebSocket/SSE for incremental updates. Instead of every user polling every 2 seconds, establish a WebSocket connection and push seat status changes as they happen. When a seat transitions from AVAILABLE to HELD, the aggregation service publishes an event. The WebSocket server fans it out to connected clients.

This dramatically reduces read load. Instead of N users × 1 request every 2 seconds, you have N persistent connections receiving small delta updates. The tradeoff is that WebSocket connections consume server memory (each connection holds state), so you need to plan for connection capacity. For a 300K-user flash sale, that's 300K concurrent WebSocket connections, which is achievable with a few dozen servers using something like Socket.IO or a managed service.
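
A sketch of the fan-out layer, with plain Python lists standing in for WebSocket connections (a real server would call something like ws.send instead of append):

```python
from collections import defaultdict

class SeatUpdateBroadcaster:
    def __init__(self):
        self.connections = defaultdict(list)  # event_id -> connected clients

    def subscribe(self, event_id, conn):
        self.connections[event_id].append(conn)

    def publish(self, event_id, delta):
        # One status change fans out to every subscriber as a small delta,
        # replacing N polling requests with N pushed messages
        for conn in self.connections[event_id]:
            conn.append(delta)

b = SeatUpdateBroadcaster()
alice, bob = [], []
b.subscribe("evt-abc", alice)
b.subscribe("evt-abc", bob)
b.publish("evt-abc", {"seat": "102-F-7", "status": "HELD"})
print(alice == bob == [{"seat": "102-F-7", "status": "HELD"}])  # True
```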

Layer 4: Graceful degradation. If the WebSocket layer gets overwhelmed, fall back to polling with the CDN cache. If the CDN cache gets stale, the seat map is a few seconds behind but the system still works. If even the read replicas lag, show section-level counts only and disable the detailed seat-level view temporarily. Each degradation step reduces load while keeping the core booking flow operational.
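
The degradation ladder can be expressed as an ordered list of read sources, tried freshest-first. A hedged sketch, where the source functions are stand-ins for the real layers:

```python
def read_with_fallback(sources, event_id):
    """Try each read path in order; degrade instead of failing outright."""
    for name, source in sources:
        try:
            return name, source(event_id)
        except Exception:
            continue  # this layer is down or overloaded; try the next
    raise RuntimeError("all read paths exhausted")

def websocket_source(event_id):
    raise ConnectionError("fan-out layer overloaded")  # simulated outage

def cdn_source(event_id):
    return {"sections": {"102": 47}}  # possibly a few seconds stale

sources = [("websocket", websocket_source), ("cdn", cdn_source)]
layer, data = read_with_fallback(sources, "evt-abc")
print(layer)  # cdn
```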

Tip: When discussing the read path, explicitly tell the interviewer: "Staleness on reads is acceptable because correctness is enforced at write time by the hold mechanism." This one sentence shows you understand the relationship between the two paths and aren't trying to make everything strongly consistent unnecessarily.

[Diagram: Scaling Seat Map Reads During Flash Sales]

What is Expected at Each Level

Interviewers calibrate against the level you're targeting. A solid mid-level answer that nails the fundamentals will pass at that band, but the same answer from a senior candidate will feel incomplete. Here's what the bar looks like for each.

Mid-Level

  • Identify the core entities and map out the booking flow clearly. You should walk through User, Event, Venue, Seat, Booking, and Payment without being prompted, and articulate the search → hold → pay → confirm lifecycle. If you skip the hold step and jump straight from seat selection to payment, that's a red flag.
  • Recognize that double-booking is THE problem and propose a database-level solution. You don't need to debate pessimistic vs. optimistic locking in depth, but you do need to say something like "we need row-level locking or a constraint to prevent two users from booking the same seat." Candidates who hand-wave this with "we'll just check availability in the application layer" will not pass.
  • Produce a clean API design and a reasonable schema. The interviewer wants to see that you can translate requirements into concrete endpoints and tables. Your schema doesn't need to be perfect, but the seat status field and its transitions should be front and center.
  • It's okay to not fully solve the flash-sale problem. Acknowledging that 100K concurrent users hitting a single event is a different beast than normal traffic is enough. Bonus points if you mention a queue or rate limiting, but a detailed waiting room design isn't expected at this level.

Senior

  • Design the temporary hold mechanism end-to-end. This means specifying the TTL, explaining what happens when it expires, and describing the recovery path for releasing seats back to inventory. You should be able to answer "what if the user's payment takes 12 minutes?" without hesitating.
  • Propose the virtual waiting room and explain why it's necessary. The interviewer will likely ask "what happens when Taylor Swift tickets go on sale?" You need a concrete answer: queue users, meter admission, match throughput to what the booking service can handle. Vague references to "scaling up" won't cut it.
  • Discuss the tradeoffs between pessimistic and optimistic locking with specifics. Which one gives you higher throughput under low contention? Which one is safer under extreme contention? When would you pick one over the other? The interviewer is testing whether you've actually thought about this or just memorized a definition.
  • Address the read/write asymmetry. Millions of users refreshing the seat map vs. thousands actually booking seats. You should propose caching with short TTLs for the read path and explain why eventual consistency is acceptable there but not on the write path.

Staff+

  • Drive the conversation toward edge cases before the interviewer does. What happens when a payment webhook arrives 30 seconds after the hold expired and the seat was re-sold? What about partial failures in a multi-seat booking where 3 of 4 seats lock successfully? Raising these proactively signals that you've built systems like this and know where they break.
  • Address bot detection and fairness as first-class concerns. A waiting room is useless if scalper bots can grab 500 queue positions. You should discuss CAPTCHA challenges at queue entry, device fingerprinting, purchase limits per account, and how randomized queue positioning can reduce the advantage of scripted requests.
  • Discuss data partitioning and hotspot avoidance. A single hot event means all writes target the same set of rows. How do you shard seat inventory without creating a single-partition bottleneck? Can you partition by event_id and still handle the case where one event dominates all traffic? This is where you show you understand the database layer deeply.
  • Bring up operational maturity unprompted. Graceful degradation when the payment provider is slow (extend hold TTLs dynamically). Monitoring for inventory leaks where seats get stuck in HELD state and never release. Alerting on booking success rates dropping below a threshold during an on-sale. The interviewer wants to see that you think about running the system in production, not just designing it on a whiteboard.
Key takeaway: This system lives and dies on one thing: the atomic transition of a seat from AVAILABLE to HELD. Every architectural decision, from database locking strategy to the virtual waiting room to hold expiration recovery, exists to protect the integrity of that single status change under extreme concurrency. If your design guarantees that transition is safe, everything else is optimization. If it doesn't, nothing else matters.

Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.
