Design a URL Shortener

Dan Lee, Data & AI Lead
Last update: March 5, 2026

Requirements & Estimation

Let's start by clarifying what we're building. A URL shortener takes long URLs and converts them into short, manageable links that redirect to the original URL.

Functional Requirements

  • URL shortening: Given a long URL, generate a unique short URL (e.g., http://short.ly/abc123)
  • URL redirection: When users visit the short URL, redirect them to the original long URL
  • User authentication: Support both anonymous and registered users for URL creation
  • Custom aliases: Allow authenticated users to specify custom short codes (e.g., http://short.ly/my-product)
  • Link expiration: Support automatic expiration of URLs after a specified time period
  • Analytics: Track basic metrics like click count and timestamp for each short URL
  • URL management: Allow authenticated users to view, update, and delete their shortened URLs
  • Bulk operations: Support shortening multiple URLs in a single API call for authenticated users

Non-Functional Requirements

  • Scale: Handle 100 million URLs shortened per day
  • Performance: URL redirection latency < 100ms at p99
  • Availability: 99.9% uptime (allows ~8.7 hours downtime per year)
  • Data durability: No data loss for stored URL mappings
  • Security: Detect and block malicious URLs (phishing, malware) before shortening
  • Rate limiting: Prevent abuse through API rate limits per user/IP
  • Global reach: Serve users worldwide with low latency

Back-of-Envelope Estimation

Let's calculate the system requirements based on 100M new URLs per day.

Key assumptions:

  • Read/write ratio: 100:1 (each shortened URL is accessed 100 times on average)
  • Storage per URL record: 500 bytes
      - Short code: 7 bytes
      - Original URL: 200 bytes (average)
      - User ID: 8 bytes
      - Timestamps (created, expires): 16 bytes
      - Click count: 8 bytes
      - Metadata/padding: ~261 bytes

| Metric | Calculation | Result |
| --- | --- | --- |
| Write QPS | 100M / (24 × 3600) | ~1,160 writes/sec |
| Read QPS | 1,160 × 100 (read/write ratio) | ~116,000 reads/sec |
| Peak QPS | 116,000 × 2 (peak factor) | ~232,000 reads/sec |
| Storage (5 years) | 100M × 365 × 5 × 500 bytes | ~91 TB |
| Bandwidth (read) | 116,000 × 500 bytes | ~58 MB/sec |
| Bandwidth (write) | 1,160 × 500 bytes | ~0.58 MB/sec |
| Cache memory | 20M URLs × 500 bytes | ~10 GB |

Cache sizing: We'll cache the 20% most frequently accessed URLs from our daily active set. With ~100M daily active URLs, that's 20M entries requiring ~10GB of memory.

URL length calculation: Using base62 encoding (a-z, A-Z, 0-9):

  • 7 characters: 62^7 ≈ 3.5 trillion possible combinations
  • At 100M URLs/day, that gives ~35,000 days (95+ years) before exhaustion
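The capacity math above is easy to verify with a few lines of arithmetic (a standalone sanity check, not production code):

```python
# Sanity-check the base62 capacity figures quoted above.
BASE62_ALPHABET_SIZE = 62
CODE_LENGTH = 7
URLS_PER_DAY = 100_000_000

combinations = BASE62_ALPHABET_SIZE ** CODE_LENGTH      # 62^7 ≈ 3.5 trillion
days_until_exhaustion = combinations // URLS_PER_DAY    # ≈ 35,216 days
years_until_exhaustion = days_until_exhaustion / 365    # ≈ 96 years

print(f"{combinations:,} combinations")
print(f"~{days_until_exhaustion:,} days (~{years_until_exhaustion:.0f} years)")
```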

Interview tip: Always validate your calculations. 232K QPS is high but achievable with proper caching and load distribution. The 91TB storage over 5 years is manageable with modern databases.

These numbers tell us we need a read-optimized system with strong caching. The 100:1 read/write ratio suggests that investing in a distributed cache will significantly reduce database load.
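As a quick sanity check, the headline estimation numbers reduce to a few lines of arithmetic:

```python
# Back-of-envelope figures from the table above, recomputed.
SECONDS_PER_DAY = 24 * 3600
urls_per_day = 100_000_000

write_qps = urls_per_day / SECONDS_PER_DAY      # ~1,160 writes/sec
read_qps = write_qps * 100                      # ~116,000 reads/sec
peak_read_qps = read_qps * 2                    # ~232,000 reads/sec

storage_bytes = urls_per_day * 365 * 5 * 500    # ~91 TB over 5 years
cache_bytes = 20_000_000 * 500                  # ~10 GB for the hot 20%
```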

High-Level Design

Let's start with a simple architecture that can handle our requirements of 100M new URLs per day and ~232K read QPS at peak.

As shown in the architecture diagram below, our system consists of several key components working together:

[Diagram: High-Level Architecture]

The core components include:

  • Load Balancer: Distributes incoming requests across multiple web servers
  • Web Servers: Stateless servers handling URL shortening and redirection logic
  • Cache Layer (Redis): Stores frequently accessed URL mappings for fast retrieval
  • Primary Database (MySQL): Persists URL mappings with strong consistency guarantees
  • Read Replicas: Distribute read load across multiple database instances
  • CDN: Caches and serves redirect responses globally
  • Analytics Database: Stores click events and usage data separately

Write Flow: Creating Short URLs

When a user wants to shorten a URL, the request flows through our system as follows:

  1. Client sends a POST request with the long URL to our API
  2. Load balancer routes the request to an available web server
  3. Web server generates a unique short code (we'll detail this algorithm later)
  4. Server writes the mapping to both the database and cache layer
  5. Server returns the shortened URL to the client

We chose MySQL for our primary database because URL shortening requires strong consistency. When a user creates a short URL, they expect it to work immediately across all regions.

Read Flow: URL Redirection

The read path is optimized for speed since we expect 100x more reads than writes:

  1. User clicks on a short URL (e.g., short.ly/abc123)
  2. CDN checks if it has the redirect cached; if yes, returns 302 immediately
  3. On cache miss, request reaches our load balancer
  4. Web server first checks Redis cache (sub-millisecond lookup)
  5. On cache miss, server queries the read replica database
  6. Server returns HTTP 302 redirect with the original URL in the Location header
  7. Server asynchronously logs the click event to the analytics database
Interview tip: We use HTTP 302 (Found) instead of 301 (Moved Permanently) because 302 responses aren't cached by browsers. This allows us to track each click and update destination URLs if needed.

Handling Read-After-Write Consistency

Since read replicas may have replication lag (typically 10-100ms), we need to ensure newly created URLs work immediately. We implement this through a simple strategy:

For the first 30 seconds after creation, reads for a specific short URL are directed to the primary database if not found in cache. This guarantees that users can immediately use their newly created short URLs while giving replicas time to sync.
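The routing rule above can be sketched in a few lines. This is a minimal illustration; `created_at_index` is a hypothetical in-memory map of short code to creation time (in practice it could live in Redis alongside the URL mapping), and the cache/DB objects are assumed to expose a dict-like `get`:

```python
import time

RECENT_WINDOW_SECONDS = 30  # route to the primary for this long after creation

# Hypothetical index of short_code -> creation timestamp.
created_at_index = {}

def resolve(short_code, cache, primary_db, replica_db):
    """Sketch of the read-after-write routing described above."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url

    created_at = created_at_index.get(short_code)
    recently_created = (
        created_at is not None
        and time.time() - created_at < RECENT_WINDOW_SECONDS
    )

    # Fall back to the primary for freshly created codes so replication
    # lag never makes a brand-new short URL return 404.
    db = primary_db if recently_created else replica_db
    return db.get(short_code)
```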

Geographic Distribution with CDN

To serve our global user base with low latency, we integrate a CDN at the edge. The CDN caches successful redirect responses with appropriate TTL values based on URL popularity.

For example, a viral link might be cached for 24 hours, while regular URLs use a 1-hour TTL. This reduces load on our origin servers and provides < 50ms response times for users worldwide.

The separation of analytics into its own database allows us to optimize each system independently. While URL mappings need ACID properties, analytics can tolerate eventual consistency and benefit from a write-optimized store like Cassandra.

API Design

Now that we've established the high-level architecture, let's define the API endpoints that clients will use to interact with our URL shortener service.

Core API Endpoints

| Method | Endpoint | Description | Request Body | Response |
| --- | --- | --- | --- | --- |
| POST | /shorten | Create a short URL | {long_url, custom_alias?, expiration_time?} | {short_url, short_code, long_url, created_at, expires_at, deletion_token?} |
| GET | /{short_code} | Redirect to original URL | None | 302 Redirect with Location header |
| DELETE | /{short_code} | Delete a short URL | None | 204 No Content |
| GET | /analytics/{short_code} | Get URL analytics | None | {clicks, referrers[], countries[], daily_stats[]} |
| GET | /api/urls | List user's URLs | None | {urls[], pagination} |

Request and Response Examples

Let's look at detailed examples for each endpoint.

POST /shorten

// Request
{
  "long_url": "https://example.com/very/long/path/to/article?utm_source=newsletter",
  "custom_alias": "my-article",              // optional
  "expiration_time": "2024-12-31T23:59:59Z"  // optional
}

// Response (201 Created)
{
  "short_url": "https://short.ly/my-article",
  "short_code": "my-article",
  "long_url": "https://example.com/very/long/path/to/article?utm_source=newsletter",
  "created_at": "2024-01-15T10:30:00Z",
  "expires_at": "2024-12-31T23:59:59Z"
}

// Response for anonymous user (includes deletion token)
{
  "short_url": "https://short.ly/abc123",
  "short_code": "abc123",
  "long_url": "https://example.com/very/long/path/to/article?utm_source=newsletter",
  "created_at": "2024-01-15T10:30:00Z",
  "expires_at": "2024-12-31T23:59:59Z",
  "deletion_token": "del_xKj9Lm2nP4qR7sT"
}

// Error Response (409 Conflict - alias taken)
{
  "error": "ALIAS_ALREADY_EXISTS",
  "message": "The custom alias 'my-article' is already in use"
}

GET /{short_code}

GET /abc123 HTTP/1.1
Host: short.ly


HTTP/1.1 302 Found
Location: https://example.com/very/long/path/to/article?utm_source=newsletter
Cache-Control: public, max-age=3600

GET /analytics/{short_code}

// Response
{
  "short_code": "abc123",
  "total_clicks": 15234,
  "unique_visitors": 8921,
  "referrers": [
    {"source": "twitter.com", "count": 5234},
    {"source": "facebook.com", "count": 3421},
    {"source": "direct", "count": 6579}
  ],
  "countries": [
    {"code": "US", "name": "United States", "count": 8234},
    {"code": "UK", "name": "United Kingdom", "count": 3421}
  ],
  "daily_stats": [
    {"date": "2024-01-15", "clicks": 234},
    {"date": "2024-01-14", "clicks": 189}
  ]
}

Authentication and Authorization

We'll use API keys for programmatic access and JWT tokens for web users.

// API Key Authentication (Header)
X-API-Key: sk_live_abc123xyz789

// JWT Authentication (Header)
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...

For the DELETE endpoint, we verify ownership by checking if the authenticated user created the URL. Anonymous URLs can only be deleted using the deletion token returned during creation:

DELETE /abc123 HTTP/1.1
Host: short.ly
X-Deletion-Token: del_xKj9Lm2nP4qR7sT

The deletion token is generated when an anonymous user creates a short URL and returned in the POST /shorten response. This token must be stored by the client as it's the only way to delete anonymous URLs.
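Token generation and verification can be sketched with the standard library. This is an illustrative sketch; the `del_` prefix simply mirrors the example token in the response above, and the helper names are hypothetical:

```python
import hmac
import secrets

def generate_deletion_token():
    """Generate an unguessable deletion token for an anonymous URL.

    secrets.token_urlsafe gives cryptographically strong randomness;
    the 'del_' prefix matches the example response format above.
    """
    return "del_" + secrets.token_urlsafe(12)

def verify_deletion_token(stored_token, presented_token):
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(stored_token, presented_token)
```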

Rate Limiting

We implement tiered rate limiting based on authentication status:

| User Type | URL Creation | URL Resolution | Analytics |
| --- | --- | --- | --- |
| Anonymous | 10/hour | 1,000/hour | N/A |
| Free User | 100/hour | 10,000/hour | 100/hour |
| Pro User | 1,000/hour | 100,000/hour | 1,000/hour |

Rate limit headers are included in all responses:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1705321200
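A minimal fixed-window limiter illustrates how the tiers above could be enforced. This is a sketch with in-process state; a production deployment would typically use Redis `INCR` with an expiring key so limits are shared across servers:

```python
import time

# Tier limits for URL creation, per hour, from the table above.
CREATION_LIMITS = {"anonymous": 10, "free": 100, "pro": 1000}

class FixedWindowLimiter:
    """Counts requests per key within fixed hourly windows."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.counters = {}  # (key, window_index) -> request count

    def allow(self, key, limit, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        bucket = (key, window)
        count = self.counters.get(bucket, 0)
        if count >= limit:
            return False  # over the limit for this window
        self.counters[bucket] = count + 1
        return True

# Usage: limiter.allow("user:42", CREATION_LIMITS["free"])
```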

Pagination

For endpoints returning lists (like /api/urls), we use cursor-based pagination:

// Request
GET /api/urls?cursor=eyJpZCI6MTIzfQ&limit=20

// Response
{
  "urls": [...],
  "pagination": {
    "next_cursor": "eyJpZCI6MTQzfQ",
    "has_more": true,
    "total_count": 156
  }
}

Error Handling

We follow a consistent error response format across all endpoints:

{
  "error": "VALIDATION_ERROR",
  "message": "Invalid URL format",
  "details": {
    "field": "long_url",
    "reason": "URL must start with http:// or https://"
  },
  "request_id": "req_abc123"
}

Common error codes include:

  • RATE_LIMIT_EXCEEDED (429)
  • URL_NOT_FOUND (404)
  • INVALID_REQUEST (400)
  • UNAUTHORIZED (401)
  • INTERNAL_ERROR (500)

Interview tip: When discussing API design, always mention idempotency. Our POST /shorten endpoint is idempotent when using custom aliases — submitting the same alias and URL combination returns the existing short URL rather than creating a duplicate.

Data Model

Now that we've defined our API endpoints, let's design the data model to support our URL shortener. We need to store URL mappings, track analytics, and ensure fast lookups for our ~116K reads/sec requirement.

Core Tables

Let's start with the URL mappings table in MySQL:

CREATE TABLE url_mappings (
    short_code VARCHAR(7) PRIMARY KEY,
    long_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NULL,
    is_custom BOOLEAN DEFAULT FALSE,
    INDEX idx_user_id (user_id),
    INDEX idx_expires_at (expires_at),
    INDEX idx_created_at (created_at)
);

For analytics, we'll use Cassandra to handle high write throughput:

CREATE TABLE url_analytics (
    short_code VARCHAR(7),
    clicked_at TIMESTAMP,
    ip_address INET,
    country_code TEXT,
    referrer TEXT,
    user_agent TEXT,
    device_type TEXT,
    PRIMARY KEY ((short_code), clicked_at)
) WITH CLUSTERING ORDER BY (clicked_at DESC);

We also need a user table for authenticated features:

CREATE TABLE users (
    user_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    api_key VARCHAR(64) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    request_quota INT DEFAULT 1000,
    INDEX idx_api_key (api_key)
);

Database Selection Rationale

We chose MySQL for URL mappings because we need strong consistency: when a user creates a short URL, it must be immediately available. MySQL provides ACID guarantees and easily sustains our ~1,160 writes/sec with proper indexing.

For analytics, we selected Cassandra because it:

  • Handles ~116K writes/sec for click events (one event per redirect)
  • Fits time-series data naturally with its partition/clustering model
  • Tolerates eventual consistency, which is acceptable for analytics
  • Scales horizontally through simple node addition

Notice we don't store click counts in MySQL. This avoids write hotspots on popular URLs — instead, we derive metrics from the analytics data.
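Deriving a count from the analytics store instead of mutating a counter column might look like the following sketch, where the async `session` interface and query shape are assumptions for illustration:

```python
# Hypothetical helper: derive the click count from the analytics table
# rather than a mutable counter column, avoiding hot-row contention on
# viral URLs. `session` is an assumed async Cassandra-style session.
async def click_count(short_code, session):
    rows = await session.execute(
        "SELECT COUNT(*) FROM url_analytics WHERE short_code = ?",
        (short_code,)
    )
    return rows[0][0]
```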

Indexing Strategy

Our primary access pattern is looking up long URLs by short_code, which uses the primary key index. Secondary indexes support:

  • idx_user_id: Retrieve all URLs created by a user
  • idx_expires_at: Background job to clean expired URLs
  • idx_created_at: Generate usage reports and trending URLs
Interview tip: Always explain why you chose specific indexes. Each index speeds up reads but slows down writes, so justify the trade-off.

Partitioning Strategy

As we scale beyond a single MySQL instance, we'll partition the url_mappings table by short_code ranges:

| Partition | Short Code Range | Server |
| --- | --- | --- |
| P1 | 0000000 - 9zzzzzz | mysql-1 |
| P2 | A000000 - Jzzzzzz | mysql-2 |
| P3 | K000000 - Tzzzzzz | mysql-3 |
| P4 | U000000 - zzzzzzz | mysql-4 |

With base62 encoding (0-9, A-Z, a-z), we have 62^7 ≈ 3.5 trillion possible URLs. Each partition handles approximately 875 billion URLs.
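Routing a code to its partition is then a lookup on the leading character. A minimal sketch, with the range boundaries taken from the table above and the server names purely illustrative:

```python
# Leading-character ranges matching the partition table above.
# Note: 'U'..'z' spans U-Z then a-z for valid base62 characters.
PARTITION_RANGES = [
    ("0", "9", "mysql-1"),  # P1
    ("A", "J", "mysql-2"),  # P2
    ("K", "T", "mysql-3"),  # P3
    ("U", "z", "mysql-4"),  # P4
]

def partition_for(short_code):
    """Return the server owning this short code's range."""
    first = short_code[0]
    for low, high, server in PARTITION_RANGES:
        if low <= first <= high:
            return server
    raise ValueError(f"{short_code!r} is not a valid base62 code")
```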

For Cassandra, the partition key is short_code, which distributes click events across nodes. The clustering key (clicked_at) keeps events sorted by time within each partition.

Cache Schema

We'll use Redis with the following structure:

{
  "key": "url:abc123",
  "value": {
    "long_url": "https://example.com/very/long/url",
    "expires_at": 1735689600
  },
  "ttl": 86400
}

Cache entries have a 24-hour TTL by default, but we'll adjust based on access patterns:

  • Hot URLs (>1000 clicks/day): 7-day TTL
  • Medium URLs (100-1000 clicks/day): 1-day TTL
  • Cold URLs (<100 clicks/day): 1-hour TTL
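The tiering above reduces to a tiny policy function; a sketch, with the thresholds taken directly from the list:

```python
def cache_ttl_seconds(clicks_per_day):
    """TTL tiers from above: hot 7 days, medium 1 day, cold 1 hour."""
    if clicks_per_day > 1000:
        return 7 * 86400   # hot
    if clicks_per_day >= 100:
        return 86400       # medium
    return 3600            # cold
```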

Click counts are calculated from analytics data when needed, not cached to avoid consistency issues.

[Diagram: Data Model]

Data Relationships

The data model maintains these relationships:

  1. One-to-Many: User to URL mappings (a user can create multiple short URLs)
  2. One-to-Many: URL mapping to analytics events (one short URL generates many click events)
  3. Loose Coupling: Analytics data references short_code but doesn't enforce foreign key constraints for performance

This separation allows us to scale reads and writes independently. URL mappings require consistency and moderate write volume, while analytics needs high write throughput with relaxed consistency.

Detailed Design

Now let's examine the core components that make our URL shortener performant and reliable. As shown in the detailed architecture diagram below, we've separated concerns into specialized services.

[Diagram: Detailed System Design]

Short Code Generation with Range Allocation

The most critical component is generating unique short codes at scale. We use a counter-based approach with range allocation to avoid collisions while supporting multiple servers.

We implement range allocation using a dedicated database table with transactional updates:

CREATE TABLE id_ranges (
    range_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    start_id BIGINT NOT NULL,
    end_id BIGINT NOT NULL,
    server_id VARCHAR(50),
    allocated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Each application server requests a range of IDs atomically:

import random

class URLShortener:
    def __init__(self, server_id):
        self.server_id = server_id
        self.current_id = None
        self.max_id = None
        self.db = DatabaseConnection()

    def get_next_id(self):
        if self.current_id is None or self.current_id >= self.max_id:
            try:
                # Atomically allocate a new range
                self.current_id, self.max_id = self._allocate_range()
            except Exception:
                # Fall back to random generation if range allocation fails
                return self._generate_random_id()

        next_id = self.current_id
        self.current_id += 1
        return next_id

    def _allocate_range(self, range_size=1000):
        with self.db.transaction() as tx:
            # Lock and read the last allocated range
            result = tx.execute("""
                SELECT MAX(end_id) AS last_id
                FROM id_ranges
                FOR UPDATE
            """)

            last_id = result[0]['last_id'] or 0
            start_id = last_id + 1
            end_id = last_id + range_size

            # Record the allocation
            tx.execute("""
                INSERT INTO id_ranges (start_id, end_id, server_id)
                VALUES (?, ?, ?)
            """, (start_id, end_id, self.server_id))

            return start_id, end_id

    def _generate_random_id(self):
        # Fallback: generate a random ID in the upper range
        return random.randint(10**15, 10**16)

    def id_to_short_code(self, numeric_id):
        # Convert a numeric ID to a base62 string
        chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
        result = []
        while numeric_id > 0:
            result.append(chars[numeric_id % 62])
            numeric_id //= 62
        return ''.join(reversed(result))

Each server gets a range of 1,000 IDs at a time. If a server crashes, we lose at most 1,000 IDs — acceptable given our 3.5 trillion possible combinations.

Interview tip: Always discuss failure scenarios. Here, losing 1,000 IDs per crash is negligible compared to the complexity of perfectly sequential IDs.

Custom URL and Collision Handling

Users often want memorable short codes like "fb2024" instead of "x7Kg9p2". We handle this through optimistic locking with retry logic:

async def create_custom_url(long_url, custom_code, max_retries=3):
    # Check availability
    if await is_reserved_keyword(custom_code):
        raise ValueError("This keyword is reserved")

    if await redis.exists(f"url:{custom_code}"):
        raise ValueError("Custom URL already taken")

    # Attempt creation with retries
    for attempt in range(max_retries):
        lock = await acquire_lock(f"lock:{custom_code}", timeout=5)
        try:
            # Double-check after acquiring the lock
            if await redis.exists(f"url:{custom_code}"):
                raise ValueError("Custom URL already taken")

            # Write to the database first (source of truth)
            try:
                await db.insert_url_mapping(custom_code, long_url)
            except UniqueConstraintError:
                raise ValueError("Custom URL already taken")

            # Then update the cache (best effort)
            try:
                # Custom URLs don't expire - set without TTL
                await redis.set(f"url:{custom_code}", long_url)
            except RedisError:
                # Log but don't fail - the DB is the source of truth
                logger.error(f"Failed to cache custom URL: {custom_code}")

            return custom_code

        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(0.1 * (2 ** attempt))  # Exponential backoff
        finally:
            await release_lock(lock)

We maintain a blocklist of reserved keywords loaded into memory on startup:

RESERVED_KEYWORDS = {
    'admin', 'api', 'www', 'mail', 'ftp',
    'blog', 'shop', 'help', 'support', 'terms'
}

Intelligent Cache Warming

Our cache hit ratio directly impacts latency. We run a scheduled job every hour to warm the cache based on access patterns:

class CacheWarmer:
    def __init__(self, redis_cluster, db):
        self.redis = redis_cluster
        self.db = db

    async def warm_cache(self):
        # Load the top 10K URLs from the last 24 hours
        top_urls = await self.db.query("""
            SELECT u.short_code, u.long_url, COUNT(*) AS clicks
            FROM url_mappings u
            JOIN analytics a ON u.short_code = a.short_code
            WHERE a.clicked_at > NOW() - INTERVAL 24 HOUR
            GROUP BY u.short_code, u.long_url
            ORDER BY clicks DESC
            LIMIT 10000
        """)

        # Batch-load into Redis; TTL grows with popularity
        pipeline = self.redis.pipeline()
        for url in top_urls:
            # Popular URLs get longer TTLs, capped at 7 days
            ttl = min(86400 * 7, 3600 * math.log(url.clicks + 1))
            # Note: redis-py setex takes (key, ttl, value)
            pipeline.setex(f"url:{url.short_code}", int(ttl), url.long_url)

        await pipeline.execute()

# Schedule to run hourly
scheduler.add_job(cache_warmer.warm_cache, 'interval', hours=1)

URLs with more clicks get longer TTLs, up to 7 days. This keeps the regional Redis cache focused on truly popular content.

URL Validation Pipeline

We can't let users create short links to malicious sites. Our validation runs in parallel for speed:

async def validate_url(long_url):
    # Run all checks in parallel
    results = await asyncio.gather(
        check_url_format(long_url),
        check_against_blocklist(long_url),
        check_ssl_certificate(long_url),
        check_phishing_database(long_url),
        return_exceptions=True
    )

    for result in results:
        if isinstance(result, Exception):
            raise ValidationError(f"URL validation failed: {result}")

    return True

async def check_url_format(url):
    # Use urllib to parse and validate the URL structure
    try:
        parsed = urlparse(url)
    except ValueError as e:
        raise ValidationError(f"Invalid URL: {e}")
    if parsed.scheme not in ['http', 'https']:
        raise ValidationError("Only HTTP(S) URLs allowed")
    if not parsed.netloc:
        raise ValidationError("Invalid URL format")
    return True

async def check_ssl_certificate(url):
    # Only check HTTPS URLs
    if not url.startswith('https://'):
        return True

    # Use aiohttp to verify the SSL certificate
    async with aiohttp.ClientSession() as session:
        try:
            async with session.head(url, timeout=aiohttp.ClientTimeout(total=5), ssl=True):
                return True
        except aiohttp.ClientSSLError:
            raise ValidationError("Invalid SSL certificate")

async def check_phishing_database(url):
    # Query the Google Safe Browsing API
    domain = extract_domain(url)

    # Check the local cache first
    if await redis.get(f"safe:{domain}"):
        return True

    # Query the external API
    result = await safe_browsing_client.check_url(url)
    if result.is_malicious:
        raise ValidationError("URL flagged as malicious")

    # Cache safe domains for 24 hours (setex takes key, ttl, value)
    await redis.setex(f"safe:{domain}", 86400, "1")
    return True

async def check_against_blocklist(url):
    domain = extract_domain(url)

    # Check the bloom filter first (fast negative check)
    if not bloom_filter.might_contain(domain):
        return True

    # Confirm against the actual blocklist
    is_blocked = await redis.sismember("blocked_domains", domain)
    if is_blocked:
        raise ValidationError(f"Domain {domain} is blocked")
    return True

The bloom filter is rebuilt daily from our blocked domains database:

async def rebuild_bloom_filter():
    # Load all blocked domains
    blocked_domains = await db.query("SELECT domain FROM blocked_domains")

    # Create a new bloom filter with a 0.01% false-positive rate
    new_filter = BloomFilter(capacity=len(blocked_domains) * 2, error_rate=0.0001)

    for domain in blocked_domains:
        new_filter.add(domain['domain'])

    # Atomic swap
    global bloom_filter
    bloom_filter = new_filter

# Schedule the daily rebuild
scheduler.add_job(rebuild_bloom_filter, 'cron', hour=3)

Real-time Analytics Pipeline

We process billions of click events daily without impacting redirect latency. Events flow through Kafka to our analytics service:

# In redirect service - fire and forget
async def handle_redirect(short_code):
# Get URL from cache/DB (existing logic)
long_url = await get_long_url(short_code)

# Send analytics event asynchronously
event = {
"short_code": short_code,
"timestamp": time.time(),
"ip": request.remote_addr,
"user_agent": request.headers.get('User-Agent'),
"referrer": request.headers.get('Referer')
}

# Non-blocking write to Kafka
await kafka_producer.send('url_clicks', event)

return redirect(long_url, code=301)

The analytics service batches writes to Cassandra with time-based flushing:

-- Cassandra schema optimized for time-series queries
CREATE TABLE analytics (
    short_code TEXT,
    date DATE,
    clicked_at TIMESTAMP,
    ip TEXT,
    user_agent TEXT,
    PRIMARY KEY ((short_code, date), clicked_at)
) WITH CLUSTERING ORDER BY (clicked_at DESC);

class AnalyticsProcessor:
    def __init__(self):
        self.batch = []
        self.batch_size = 1000
        self.flush_interval = 5  # seconds
        self.last_flush = time.time()

    async def start(self):
        # Run the consumer and the flush timer in parallel
        await asyncio.gather(
            self.process_events(),
            self.periodic_flush()
        )

    async def process_events(self):
        async for event in kafka_consumer:
            self.batch.append(event)

            if len(self.batch) >= self.batch_size:
                await self.flush_batch()

    async def periodic_flush(self):
        while True:
            await asyncio.sleep(self.flush_interval)
            if self.batch:  # Only flush if there are events
                await self.flush_batch()

    async def flush_batch(self):
        if not self.batch:
            return

        # Batch insert into Cassandra
        insert_query = """
            INSERT INTO analytics (short_code, date, clicked_at, ip, user_agent)
            VALUES (?, ?, ?, ?, ?)
        """

        statements = []
        for event in self.batch:
            date = datetime.fromtimestamp(event['timestamp']).date()
            statements.append((
                event['short_code'],
                date,
                event['timestamp'],
                event['ip'],
                event['user_agent']
            ))

        await cassandra.execute_batch(insert_query, statements)
        self.batch.clear()
        self.last_flush = time.time()

This design keeps our redirect latency under 50ms while capturing comprehensive analytics. The async pipeline handles traffic spikes gracefully — if Kafka backs up, redirects continue working.

Scaling & Optimization

With our initial design handling ~116K reads/sec and ~1,160 writes/sec, let's identify and address potential bottlenecks as we scale to billions of users.

Identifying Bottlenecks

The primary bottleneck in our system is the database. With a 100:1 read/write ratio, even with caching, popular URLs can overwhelm our read replicas. A viral link might receive millions of clicks within minutes.

The second bottleneck is geographic latency. Users in Asia accessing URLs stored in US data centers experience 200-300ms additional latency. This violates our <100ms latency requirement.

Database Sharding Strategy

We'll shard our URL mappings table using consistent hashing on the short_code. This approach provides even distribution and allows us to add shards without massive data migration.

import hashlib

def get_shard(short_code):
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so it would route the same code to different shards between runs.
    hash_value = int(hashlib.md5(short_code.encode()).hexdigest(), 16)
    return consistent_hash_ring.get_node(hash_value)

To determine the number of shards, let's calculate based on our requirements:

  • Each shard handles ~2K writes/sec at peak
  • Our system needs ~1,160 writes/sec today, growing toward ~4K writes/sec
  • With 3x headroom for growth: 4K × 3 = 12K writes/sec
  • Number of shards needed: 12K / 2K = 6 shards minimum

We'll start with 8 shards for better distribution and easier scaling (powers of 2 work well with consistent hashing). Each shard runs on a primary-replica setup with 3-5 read replicas to handle our read load.

Multi-Tier Caching Architecture

We implement three cache layers to handle different access patterns:

L1 Cache - Application Server (Local)

  • Size: 1GB per server
  • TTL: 5 minutes
  • Stores: Top 10K most accessed URLs

L2 Cache - Redis Cluster (Regional)

  • Size: 100GB per node, 10 nodes per region
  • TTL: 1 hour for popular URLs, 10 minutes for others
  • Stores: All accessed URLs in the region

L3 Cache - CDN Edge (Global)

  • Stores: Static redirect pages for the top 1M URLs
  • TTL: 24 hours
  • Serves: 302 redirects directly from edge locations
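Reads then flow through the tiers in order, promoting on the way back. A minimal read-through sketch (the CDN sits upstream and is not modeled here; the cache interfaces with `get`/`set` are assumptions):

```python
def lookup(short_code, l1, l2, db):
    """Read-through across the L1/L2 tiers described above."""
    url = l1.get(short_code)
    if url is not None:
        return url  # L1 hit

    url = l2.get(short_code)
    if url is not None:
        l1.set(short_code, url, ttl=300)    # promote: 5-minute local TTL
        return url

    url = db.get(short_code)
    if url is not None:
        l2.set(short_code, url, ttl=3600)   # 1-hour regional TTL
        l1.set(short_code, url, ttl=300)
    return url
```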

Cache invalidation happens through a pub/sub system when URLs are deleted or expire:

def invalidate_url(short_code):
    # Publish the invalidation so every subscriber can act on it
    redis_client.publish('url_invalidation', short_code)

# Each cache layer subscribes to the channel and removes its entry
def on_invalidation(short_code):
    l1_cache.delete(short_code)
    l2_cache.delete(short_code)
    cdn_api.purge(f"/{short_code}")

Handling Hot URLs

When a URL goes viral, we need special handling to prevent cache stampedes. We implement:

1. Bloom Filters - Before checking the database for non-existent URLs:

if not bloom_filter.might_contain(short_code):
    return 404  # Definitely doesn't exist

# Only check cache/DB if the bloom filter says "maybe"

2. Request Coalescing - Multiple concurrent requests for the same URL wait for a single DB query:

pending_requests = {}

async def get_url_coalesced(short_code):
    # If another request is already fetching this code, await its result
    future = pending_requests.get(short_code)
    if future is not None:
        return await future

    # First request does the actual work; concurrent ones piggyback
    future = asyncio.get_running_loop().create_future()
    pending_requests[short_code] = future
    try:
        result = await fetch_from_db(short_code)
        future.set_result(result)
        return result
    except Exception as exc:
        future.set_exception(exc)  # wake waiters on failure too
        raise
    finally:
        del pending_requests[short_code]

3. Probabilistic Cache Refresh - Refresh cache entries before they expire to prevent thundering herd:

import math
import random

def should_refresh(ttl_remaining, delta, beta=1.0):
    # XFetch: delta is the time it takes to recompute the entry.
    # -log(random()) is positive, so hot entries refresh slightly early.
    xfetch = delta * beta * -math.log(random.random())
    return ttl_remaining < xfetch

Global Distribution Strategy

We deploy in 5 regions: US-East, US-West, EU, Asia-Pacific, and South America. Each region has:

  • Complete application stack
  • Regional Redis cluster
  • Read replicas of all shards
  • Analytics data aggregation nodes

GeoDNS routes users to the nearest region. For URL creation, we use a global counter service with pre-allocated ID ranges per region:

US-East: 1,000,000,000 - 1,999,999,999
US-West: 2,000,000,000 - 2,999,999,999
EU: 3,000,000,000 - 3,999,999,999
Asia-Pac: 4,000,000,000 - 4,999,999,999
South-Am: 5,000,000,000 - 5,999,999,999

These numeric IDs are then encoded to short_codes using base62 encoding. Since we use consistent hashing on the final short_code (not the numeric ID), URLs from all regions distribute evenly across shards. This prevents regional hot-spotting.
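Combining a region's base offset with its local counter and encoding the result is a few lines. A sketch, where the region keys and offsets mirror the ranges listed above (the alphabet matches `id_to_short_code` earlier):

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical region bases matching the pre-allocated ranges above.
REGION_BASE = {
    "us-east": 1_000_000_000,
    "us-west": 2_000_000_000,
    "eu":      3_000_000_000,
    "apac":    4_000_000_000,
    "sa":      5_000_000_000,
}

def encode_base62(n):
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return BASE62[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(BASE62[rem])
    return "".join(reversed(out))

def short_code_for(region, local_counter):
    """Offset the local counter into the region's ID range, then encode."""
    return encode_base62(REGION_BASE[region] + local_counter)
```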

Popular URLs get replicated across regions using a background job that monitors access patterns.

Monitoring & Alerting

We track these key metrics with DataDog and PagerDuty:

Performance Metrics:

  • P50, P95, P99 latency per endpoint (targets: <50ms, <80ms, <100ms)
  • Cache hit ratio by layer (targets: L1 >60%, L2 >90%, L3 >95%)
  • Database connection pool utilization (alert at >80%)
  • Queue depth for analytics events (alert at >100K)

Business Metrics:

  • URLs created per minute by region
  • Redirect success rate (target: >99.9%)
  • Invalid URL attempts (potential attacks)
  • Storage growth rate per shard

System Health:

  • CPU and memory per service
  • Network bandwidth utilization
  • Disk I/O on database servers
  • Redis memory usage and eviction rate

Alerts are configured with escalation policies:

alerts:
  - name: high_p99_latency
    condition: p99_latency > 100ms for 5 minutes
    severity: warning
    notify: on-call-engineer

  - name: cache_hit_ratio_low
    condition: l2_cache_hit_ratio < 80% for 10 minutes
    severity: critical
    notify: [on-call-engineer, team-lead]

Interview tip: Always mention specific monitoring metrics and thresholds. It shows you've operated production systems and understand the importance of observability.

Summary

We've designed a URL shortening service capable of handling ~116K reads/sec and ~1,160 writes/sec with global distribution. Let's recap the key architectural decisions that make this system work at scale.

Key Design Decisions

Counter-based ID Generation: We chose a counter-based approach with database-backed range allocation over random generation or hashing. Each server atomically leases a block of IDs (1,000 at a time) from the id_ranges table, eliminating collision checks and per-URL database lookups during creation.

Write-through Caching Strategy: Every new URL mapping writes to both MySQL and Redis simultaneously. This ensures cache consistency and eliminates cold start problems. With a 100:1 read/write ratio, this investment in write complexity pays off in read performance.

Async Analytics Pipeline: Click events flow through Kafka to Cassandra, decoupling the critical path (URL redirection) from analytics processing. This allows us to maintain <100ms redirect latency even while collecting detailed metrics.

MySQL + Cassandra Hybrid: We use MySQL for URL mappings (strong consistency, moderate write volume) and Cassandra for analytics (eventual consistency, high write throughput). This "right tool for the job" approach optimizes both cost and performance.

Common Follow-up Questions

Interview tip: Be prepared to discuss these areas in depth:
  1. "How would you handle custom URLs that collide with generated ones?" — Reserve ranges, use different tables, or prefix custom URLs
  2. "What if a URL becomes viral and gets millions of requests?" — Multi-tier caching, CDN integration, read replicas in hot regions
  3. "How do you prevent abuse and spam?" — Rate limiting, CAPTCHA for high-volume users, URL blacklisting, machine learning for pattern detection
  4. "How would you implement URL preview features?" — Async crawler service, OpenGraph tag extraction, screenshot service
  5. "What about GDPR and data deletion?" — Soft deletes with TTL, audit logs, user data export APIs

Trade-offs Made

Eventual Consistency for Analytics: We sacrifice real-time accuracy for performance. Click counts might lag by seconds, but redirects stay fast. For most use cases, this delay is acceptable.

Storage Over Computation: We store full URLs rather than compressing them. At 500 bytes per URL, 5 years of data needs ~91TB. Storage is cheap; CPU cycles during redirect are expensive.

Regional Data Duplication: Popular URLs get replicated across regions. This wastes storage but dramatically improves global latency. A viral link in Asia won't slow down US users.

Possible Extensions

  • QR Code Generation: Add an image service that converts short URLs to QR codes
  • Bulk API: Allow enterprise users to shorten 1000s of URLs in one request
  • Advanced Analytics: Click heatmaps, conversion tracking, A/B testing support
  • API Gateway Features: OAuth integration, webhook notifications, SDKs for major languages

Key Takeaway

The mark of a mature system design is knowing when to break consistency. Our URL shortener maintains strong consistency for the core mapping (you always get the right redirect) but embraces eventual consistency for analytics, caching, and global replication. This selective relaxation of constraints is what enables true web-scale performance — not every part of your system needs the same guarantees.

Written by Dan Lee, Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.