Understanding the Problem
What is a News Feed System?
Product definition: A news feed is a personalized, continuously updating stream that aggregates posts from people and pages a user follows, ranked by a mix of relevance and recency.
Think of your Facebook home feed or Twitter's "For You" timeline. You open the app, and there's a curated list of content from hundreds of accounts you follow, ordered so the most interesting stuff floats to the top. Behind that simple-looking scroll is one of the hardest problems in distributed systems: assembling a unique feed for each of hundreds of millions of users, from millions of new posts per minute, and serving it in under 200 milliseconds.
The interviewer picks this problem because it forces you to reason about read-heavy workloads, fan-out tradeoffs, caching strategies, and ranking. If you can design a news feed well, you can design most content-distribution systems.
Functional Requirements
Core Requirements:
- Publish posts: Users can create posts containing text, images, or links.
- Follow/unfollow: Users can follow other users and pages to subscribe to their content.
- View home feed: Users can retrieve a personalized feed of posts from accounts they follow, ranked and paginated.
- Interact with posts: Users can like, comment on, and share posts in their feed.
Below the line (out of scope):
- Push notifications for new posts or interactions
- Group or page feeds (separate from the home feed)
- Direct messaging between users
Note: "Below the line" features are acknowledged but won't be designed in this lesson. Calling these out explicitly in an interview shows you can scope a problem without losing sight of the bigger picture.
Non-Functional Requirements
- High availability: The feed must be available 99.99% of the time. A user opening the app to a blank feed is unacceptable.
- Low latency: Feed fetches should return within 200ms at p99. Users expect the feed to feel instant on scroll.
- Eventual consistency: A few seconds of delay before a new post appears in followers' feeds is perfectly fine. We don't need strong consistency here, and relaxing this constraint opens up major architectural wins.
- Massive scale: Support 500M daily active users (DAU). The system must handle extreme read-to-write ratios and gracefully survive traffic spikes (think major sporting events or breaking news).
One edge case to flag early: celebrity accounts. A user with 10 million followers publishing a single post creates a fundamentally different load profile than a regular user with 200 followers. Mention this to your interviewer during requirements gathering. It signals that you're already thinking about hot keys and write amplification before you've drawn a single box.
Inactive users are another thing to call out. If 30% of your user base hasn't opened the app in a month, pre-computing their feeds wastes storage and compute. The interviewer will appreciate you noting this, even if you defer the solution.
Tip: Always clarify requirements before jumping into design. This shows maturity. Spend 3-5 minutes asking questions, confirming scale, and explicitly stating what's in and out of scope. Interviewers consistently rank candidates higher when they see disciplined scoping.
Back-of-Envelope Estimation
Assume these baseline numbers (state them out loud in your interview so the interviewer can course-correct):
- 500M DAU
- Average user follows 200 accounts
- Average user publishes 1 post/day
- Average user checks their feed 5 times/day
- Average post size: ~1 KB text + metadata (media stored separately on a CDN)
| Metric | Calculation | Result |
|---|---|---|
| Post writes/day | 500M users × 1 post/user | 500M posts/day |
| Post write QPS | 500M / 86,400 sec | ~6,000 QPS (avg) |
| Peak write QPS | ~3× average | ~18,000 QPS |
| Feed read QPS | 500M users × 5 fetches / 86,400 sec | ~29,000 QPS (avg) |
| Peak read QPS | ~3× average | ~87,000 QPS |
| Post storage/day | 500M posts × 1 KB | ~500 GB/day |
| Post storage/year | 500 GB × 365 | ~180 TB/year |
| Feed cache per user | 500 post IDs × 16 bytes (UUID) | ~8 KB/user |
| Total feed cache | 500M users × 8 KB | ~4 TB (fits in a Redis cluster) |
The read-to-write ratio is roughly 5:1 at the request level, but each feed fetch hydrates ~20 posts, so the system performs on the order of 100 post reads for every post written. This system is heavily read-optimized, and that ratio should steer your architectural choices toward pre-computation and caching rather than on-the-fly assembly.
Notice that the feed cache (4 TB) is entirely feasible to hold in memory across a Redis cluster. That single number justifies the fan-out-on-write approach you'll propose in the high-level design. If it came out to 400 TB, you'd need a very different strategy. Always let the math guide your architecture, not the other way around.
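The arithmetic above is simple enough to sanity-check in a few lines. A quick script like this (using the assumptions stated earlier) also lets you re-run the estimate instantly if the interviewer changes an input:

```python
# Back-of-envelope estimation, using the baseline assumptions above.
DAU = 500_000_000
POSTS_PER_USER_PER_DAY = 1
FEED_CHECKS_PER_USER_PER_DAY = 5
POST_SIZE_BYTES = 1_000        # ~1 KB text + metadata
FEED_IDS_PER_USER = 500        # post IDs cached per feed
ID_SIZE_BYTES = 16             # UUID
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3

write_qps = DAU * POSTS_PER_USER_PER_DAY / SECONDS_PER_DAY
read_qps = DAU * FEED_CHECKS_PER_USER_PER_DAY / SECONDS_PER_DAY
storage_per_day_gb = DAU * POST_SIZE_BYTES / 1e9
feed_cache_tb = DAU * FEED_IDS_PER_USER * ID_SIZE_BYTES / 1e12

print(f"write QPS ~{write_qps:,.0f} avg, ~{write_qps * PEAK_FACTOR:,.0f} peak")
print(f"read QPS  ~{read_qps:,.0f} avg, ~{read_qps * PEAK_FACTOR:,.0f} peak")
print(f"storage ~{storage_per_day_gb:,.0f} GB/day, feed cache ~{feed_cache_tb:.0f} TB")
```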
The Set Up
Four entities power this entire system. Get these right on the whiteboard and the rest of the design flows naturally.
Core Entities
User is your account record. Straightforward, but don't skip it. You'll need the id as a foreign key everywhere else.
Post is the content a user publishes. It needs to handle text, images, and links without separate tables for each. A content_type discriminator column plus an optional media_url keeps things simple. You'll also want denormalized engagement counters here; we'll talk about why in a moment.
Follow is the social graph edge. This is the single most important table in the system because it determines who sees whose content. It's a many-to-many relationship between users, and it needs indexes in both directions: "who do I follow?" and "who follows me?" Those two queries serve completely different parts of the architecture.
FeedItem is the materialized entry in a user's feed. Think of it as a pre-computed cache row that says "user X should see post Y with ranking score Z." This entity is what makes the fan-out-on-write strategy possible. If you're only doing fan-out-on-read, this table doesn't exist, but you'll want it for the hybrid approach we'll build in the high-level design.
Tip: When you draw these on the whiteboard, call out FeedItem explicitly. Many candidates forget to model the feed itself as a first-class entity and then struggle to explain where pre-computed feeds live.
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username VARCHAR(64) NOT NULL UNIQUE,
display_name VARCHAR(128) NOT NULL,
avatar_url TEXT, -- nullable, default avatar in app layer
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE TABLE posts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
author_id UUID NOT NULL REFERENCES users(id),
content TEXT, -- nullable if post is image-only
media_url TEXT, -- S3/CDN URL for images, videos, link previews
content_type VARCHAR(20) NOT NULL DEFAULT 'text', -- 'text', 'image', 'link', 'video'
like_count INT NOT NULL DEFAULT 0, -- denormalized for fast read; updated async
comment_count INT NOT NULL DEFAULT 0,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_posts_author ON posts(author_id, created_at DESC);
Why denormalize like_count and comment_count directly on the post? Because every single feed render needs these numbers. Joining to a likes table or running a COUNT query for every post in a feed page would be brutal at 30K+ QPS. Accept the slight inconsistency from async counter updates; it's a worthwhile tradeoff.
CREATE TABLE follows (
follower_id UUID NOT NULL REFERENCES users(id),
followee_id UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMP NOT NULL DEFAULT now(),
PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_follows_followee ON follows(followee_id, created_at DESC);
-- PK already covers lookups by (follower_id, followee_id) and scans by follower_id
The composite primary key on (follower_id, followee_id) gives you an implicit index for "who do I follow?" queries. The separate index on followee_id handles the reverse: "who follows me?" That reverse lookup is what the fan-out service hammers when distributing a new post.
CREATE TABLE feed_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id), -- the feed owner
post_id UUID NOT NULL REFERENCES posts(id),
score FLOAT NOT NULL DEFAULT 0.0, -- ranking score, higher = more relevant
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_feed_user_score ON feed_items(user_id, score DESC);
That index on (user_id, score DESC) is doing the heavy lifting. It lets you grab the top N items for a user's feed in a single index scan. In practice, this table mirrors what lives in Redis sorted sets, but having it in a persistent store gives you a fallback when the cache is cold.
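That fallback is a cache-aside rebuild: serve from Redis when the sorted set is warm, and on a miss repopulate it from `feed_items`. A minimal sketch, assuming a redis-py-style client and a `feed_db.top_items` helper standing in for `SELECT post_id, score FROM feed_items WHERE user_id = ? ORDER BY score DESC LIMIT 500`:

```python
FEED_LIMIT = 500  # matches the trim limit on the Redis sorted set

def read_feed_with_rebuild(redis, feed_db, user_id, page_size=20):
    """Cache-aside read: serve from the Redis sorted set when warm;
    on a miss, rebuild the set from the persistent feed_items table."""
    key = f"feed:{user_id}"
    items = redis.zrevrange(key, 0, page_size - 1, withscores=True)
    if items:
        return items
    # Cold cache: pull the top-N rows from feed_items and repopulate Redis.
    rows = feed_db.top_items(user_id, limit=FEED_LIMIT)  # [(post_id, score), ...]
    if rows:
        redis.zadd(key, {post_id: score for post_id, score in rows})
    return rows[:page_size]
```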

Common mistake: Candidates sometimes model FeedItem with the full post content copied in. Don't do that. Store only the post_id and score. Hydrate the actual post content at read time. Duplicating content across potentially billions of feed_items rows would be a storage nightmare, and updates (like editing a post) would require touching every copy.
API Design
Five endpoints cover the functional requirements we gathered. Each maps to one user action.
// Publish a new post
POST /v1/posts
{
"content": "Just landed in Tokyo!",
"media_url": "https://cdn.example.com/img/abc123.jpg",
"content_type": "image"
}
-> 201 Created
{
"id": "post-uuid-here",
"author_id": "user-uuid",
"content": "Just landed in Tokyo!",
"media_url": "https://cdn.example.com/img/abc123.jpg",
"content_type": "image",
"created_at": "2025-01-15T09:30:00Z"
}
POST is the right verb here because you're creating a resource. The response returns the created post so the client can immediately render it without a second round trip.
// Follow another user
POST /v1/users/{followee_id}/follow
-> 204 No Content
// Unfollow a user
DELETE /v1/users/{followee_id}/follow
-> 204 No Content
Follow and unfollow are POST and DELETE respectively. Some candidates use PUT for follow, which technically works, but POST better communicates "create this relationship" while DELETE clearly says "remove it." The followee_id sits in the path; the follower_id comes from the authenticated session, so there's no request body.
// Fetch the home feed (paginated)
GET /v1/feed?cursor={score_cursor}&limit=20
-> 200 OK
{
"items": [
{
"post_id": "post-uuid",
"author": { "id": "user-uuid", "username": "tokyofoodie", "avatar_url": "..." },
"content": "Best ramen spot in Shibuya",
"media_url": "https://cdn.example.com/img/xyz.jpg",
"like_count": 342,
"comment_count": 28,
"score": 0.9821,
"created_at": "2025-01-15T08:00:00Z"
}
],
"next_cursor": "0.9650"
}
Cursor-based pagination using the ranking score as the cursor. Not offset-based. With a feed that's constantly being written to, offset pagination leads to duplicate or skipped items as the underlying data shifts. The cursor approach is stable: "give me the next 20 items scored below 0.9821."
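The failure mode of offset pagination is easy to demonstrate with toy data (IDs and scores made up): insert one new high-scoring post between two page fetches and watch the offsets shift.

```python
def page_by_offset(items, offset, limit):
    ranked = sorted(items, key=lambda p: -p["score"])
    return ranked[offset:offset + limit]

def page_by_cursor(items, cursor, limit):
    # "Give me the next `limit` items scored strictly below the cursor."
    ranked = sorted(items, key=lambda p: -p["score"])
    return [p for p in ranked if p["score"] < cursor][:limit]

feed = [{"id": i, "score": s} for i, s in enumerate([0.9, 0.8, 0.7, 0.6])]
page1 = page_by_offset(feed, 0, 2)           # ids 0 and 1

feed.append({"id": 99, "score": 0.95})       # a new post arrives mid-scroll

# Offset page 2 re-serves id 1 (a duplicate) because everything shifted down.
offset_page2 = page_by_offset(feed, 2, 2)
# Cursor page 2 is stable: scores below the last one the client saw (0.8).
cursor_page2 = page_by_cursor(feed, page1[-1]["score"], 2)
```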
Tip: If the interviewer asks why not use created_at as the cursor, that's your opening to explain that the feed isn't purely chronological. The score incorporates recency but also engagement and affinity signals, so it's the natural pagination key.
// Interact with a post (like, comment, share)
POST /v1/posts/{post_id}/interactions
{
"type": "like"
}
-> 201 Created
{
"interaction_id": "interaction-uuid",
"type": "like",
"created_at": "2025-01-15T09:35:00Z"
}
A single interactions endpoint with a type field keeps the API surface small. You could split likes, comments, and shares into separate endpoints, and that's a valid choice too. But in an interview, consolidating them lets you move faster and spend your time on the feed generation logic, which is where the real complexity lives.
High-Level Design
Two flows dominate this system: publishing a post and reading a feed. They look simple on the surface, but the decisions you make here (especially around fan-out strategy) will determine whether your design scales to hundreds of millions of users or collapses under its own weight. Let's walk through each flow, then tie them together.
1) Publishing a Post
When a user hits "Post," the system needs to persist the content and then notify the downstream machinery that a new post exists. These are two separate concerns, and decoupling them is the whole game.
Components involved:
- Client (mobile/web app)
- API Gateway (authentication, rate limiting, request routing)
- Post Service (validates and persists the post)
- Posts DB (durable storage for post content)
- Message Queue (decouples write from fan-out)
- CDN (serves media assets like images and videos)
Data flow:
- The client uploads any media (images, video) directly to object storage (S3) via a pre-signed URL. The CDN sits in front of this storage for fast delivery later.
- The client sends the post payload to the API Gateway, which authenticates the user and enforces rate limits.
- The API Gateway routes the request to the Post Service.
- The Post Service validates the content, writes the post to the Posts DB, and returns a success response to the client immediately. The user doesn't wait for fan-out.
- After the DB write succeeds, the Post Service emits a PostCreated event onto a message queue (Kafka works well here given the volume).
- The Fan-Out Service consumes this event asynchronously and begins distributing the post to followers' feeds.
Here's what the publish API looks like:
POST /v1/posts
Authorization: Bearer <token>
{
"content": "Just shipped the new feature!",
"media_url": "https://cdn.example.com/img/abc123.jpg",
"content_type": "image"
}
Response 201:
{
"id": "post_8f3a...",
"author_id": "user_1b2c...",
"content": "Just shipped the new feature!",
"media_url": "https://cdn.example.com/img/abc123.jpg",
"created_at": "2025-01-15T10:30:00Z"
}
Tip: When you draw this flow on the whiteboard, make the message queue prominent. Interviewers want to see that you understand why the publish path and the fan-out path must be decoupled. If the fan-out service is slow or crashes, the user's post should still succeed.
The PostCreated event on the queue is minimal. It carries just the post ID and the author ID. The Fan-Out Service will look up everything else it needs (follower list, author metadata) on its own.
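That minimal event might look like the sketch below. The dataclass fields, topic name, and producer interface are illustrative assumptions; the `producer` stands in for whatever queue client you use, with the event keyed by author so one author's posts stay ordered on a single partition:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class PostCreatedEvent:
    """Minimal fan-out trigger: identifiers only. The Fan-Out Service looks
    up the follower list and author metadata itself, keeping events tiny."""
    post_id: str
    author_id: str
    created_at: float  # epoch seconds; feeds the initial ranking score

def publish_post_created(producer, post_id, author_id):
    event = PostCreatedEvent(post_id, author_id, time.time())
    # Keying by author_id keeps one author's events in order on a partition.
    producer.send("post-created", key=author_id,
                  value=json.dumps(asdict(event)).encode())
```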

2) Distributing Posts to Followers (Fan-Out)
This is where candidates either shine or stumble. The Fan-Out Service consumes PostCreated events and writes entries into each follower's pre-computed feed. Think of it as a mail carrier delivering a letter to every mailbox on the route.
Components involved:
- Fan-Out Service (reads follower lists, writes to feed caches)
- Social Graph DB (stores follow relationships)
- Feed Cache (Redis) (sorted sets, one per user)
Data flow:
- The Fan-Out Service picks up a PostCreated event from the queue.
- It queries the Social Graph DB: "Who follows this author?"
- For each follower, it writes a new entry into that follower's Redis sorted set. The entry is the post ID, scored by a combination of timestamp and initial ranking signals.
- If the sorted set exceeds a configured limit (say, 1000 items), the oldest entries get trimmed via ZREMRANGEBYRANK.
For a normal user with 200 followers, this means 200 Redis writes per post. Totally manageable. But what about a celebrity with 10 million followers? That single post triggers 10 million writes. At our scale of 6K posts/sec, even a small percentage of celebrity posts would overwhelm the system.
This is the celebrity problem, and it's why we need the hybrid approach. For now, just know that the Fan-Out Service checks the author's follower count before deciding how to distribute. We'll cover the threshold logic in the deep dives.
Common mistake: Candidates often design pure fan-out-on-write without acknowledging the celebrity problem, then get caught off guard when the interviewer asks "What happens when Taylor Swift posts?" Name the problem yourself before they ask. It shows maturity.
3) Reading the Feed
The read path is where your user's patience lives. They open the app, pull to refresh, and expect content in under 200ms. Every millisecond of design overhead here matters.
Components involved:
- Client (mobile/web app)
- API Gateway (auth, routing)
- Feed Service (orchestrates feed assembly)
- Feed Cache (Redis) (pre-computed sorted sets of post IDs per user)
- Celebrity Cache (recent posts from high-follower accounts, merged at read time)
- Posts DB (full post content for hydration)
Data flow:
- The client sends a feed request with a cursor for pagination: GET /v1/feed?cursor=<last_score>&limit=20.
- The API Gateway authenticates and routes to the Feed Service.
- The Feed Service fetches the next page of post IDs from the user's Redis sorted set using ZREVRANGEBYSCORE (highest score first, cursor-based).
- Simultaneously, it fetches recent posts from the Celebrity Cache for any celebrities this user follows. These get merged into the candidate set.
- The merged candidate set gets passed through a lightweight ranking step (more on this in deep dives) and trimmed to the requested page size.
- The Feed Service hydrates the winning post IDs by batch-fetching full post content and author profiles from the Posts DB (or a read-through cache in front of it).
- The fully assembled feed page is returned to the client.
GET /v1/feed?cursor=1705312200.0&limit=20
Authorization: Bearer <token>
Response 200:
{
"items": [
{
"post_id": "post_8f3a...",
"author": { "id": "user_1b2c...", "display_name": "Alice", "avatar_url": "..." },
"content": "Just shipped the new feature!",
"media_url": "https://cdn.example.com/img/abc123.jpg",
"like_count": 42,
"created_at": "2025-01-15T10:30:00Z"
}
],
"next_cursor": "1705311800.0"
}
Notice the cursor is a float (the ranking score). This avoids offset-based pagination, which breaks when new items get inserted between page fetches.
Interview tip: Mention the hydration step explicitly. It's easy to wave your hands and say "we read from the cache," but interviewers want to hear that the cache stores lightweight references (post IDs and scores), not full post objects. This keeps the cache small and fast.
Step 4 is what makes the hybrid approach work on the read side. For most posts, the work was already done at write time (they're sitting in the user's sorted set). Only celebrity posts require extra work at read time, and that's a small, bounded merge operation since we're only pulling recent posts from a handful of celebrity accounts.
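The hydration step described above can be sketched as a batch read-through: try the post cache first, then fill the misses with a single DB query. The `post_cache` and `posts_db` interfaces here are illustrative assumptions:

```python
def hydrate_posts(post_cache, posts_db, post_ids):
    """Turn a ranked page of post IDs into full post objects: batch-read
    from a cache first, then fill misses with one batched DB query."""
    cached = post_cache.mget(post_ids)           # list aligned with post_ids
    posts = dict(zip(post_ids, cached))
    missing = [pid for pid, post in posts.items() if post is None]
    if missing:
        for post in posts_db.get_many(missing):  # single batched SELECT
            posts[post["id"]] = post
            post_cache.set(post["id"], post)     # warm the cache for next time
    # Preserve the ranked order of the incoming IDs; drop deleted posts.
    return [posts[pid] for pid in post_ids if posts[pid] is not None]
```

Note the last line quietly handles posts deleted between fan-out and read: they simply fall out of the page.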

4) The Hybrid Fan-Out Strategy
Pure fan-out-on-write gives you blazing read performance but punishes you on writes for popular accounts. Pure fan-out-on-read keeps writes cheap but makes every feed request expensive. The right answer is both.
Here's the rule: if an author has fewer than ~100K followers, fan out on write. If they have more, skip the per-follower writes and instead push the post to a Celebrity Cache that gets merged at read time.
| Aspect | Fan-Out on Write (normal users) | Fan-Out on Read (celebrities) |
|---|---|---|
| Write cost per post | O(follower_count) Redis writes | O(1) write to Celebrity Cache |
| Read cost per feed request | O(1) sorted set read | O(celebrity_follows) merge |
| Latency for new post to appear | Seconds (async fan-out) | Instant (merged at read time) |
| Best for | Users with < 100K followers | Users with millions of followers |
The threshold (100K) isn't magic. You'd tune it based on your infrastructure's write throughput and acceptable fan-out lag. In an interview, pick a number and justify it. "100K because beyond that, a single post generates enough writes to saturate a fan-out worker partition for several seconds" is a perfectly reasonable answer.
The Fan-Out Service makes this decision per event:
def handle_post_created(event):
author_id = event.author_id
post_id = event.post_id
follower_count = social_graph.get_follower_count(author_id)
if follower_count > CELEBRITY_THRESHOLD:
# Fan-out on read: just cache the post for read-time merge
celebrity_cache.add(author_id, post_id, score=compute_initial_score(event))
else:
# Fan-out on write: push to every follower's feed
followers = social_graph.get_followers(author_id)
for batch in chunk(followers, size=500):
feed_cache.bulk_zadd(
keys=[f"feed:{uid}" for uid in batch],
member=post_id,
score=compute_initial_score(event)
)
Key insight: The hybrid approach means your read path always does a small merge, even for users who only follow normal accounts (the celebrity merge just returns empty). This keeps the read code path uniform, which is much easier to operate and debug than having two completely different read flows.
Putting It All Together
The full architecture has two main highways: the publish highway and the read highway, connected by the Fan-Out Service and the feed cache.
On the publish side: Client → API Gateway → Post Service → Posts DB + Message Queue → Fan-Out Service → Feed Cache (for normal users) or Celebrity Cache (for high-follower users).
On the read side: Client → API Gateway → Feed Service → Feed Cache + Celebrity Cache (merge) → Posts DB (hydrate) → Client.
Supporting infrastructure wraps around both paths:
- A CDN in front of object storage handles all media delivery, keeping media completely off the critical path for both writes and reads.
- The API Gateway enforces authentication and rate limiting uniformly. This is where you'd throttle abusive clients before they hit any backend service.
- Kafka (or a similar durable message queue) absorbs write spikes. If a celebrity posts and triggers a flood of fan-out work, the queue buffers it. The Fan-Out Service processes at its own pace without back-pressuring the Post Service.
The system is eventually consistent by design. A post published right now might take 2-3 seconds to appear in all followers' feeds (the time for the fan-out workers to process the queue). For celebrity posts merged at read time, the delay is essentially zero since the next feed refresh picks them up immediately.


Deep Dives
"How should we distribute new posts to followers' feeds?"
This is the question that defines your entire architecture. The interviewer wants to see you reason through the fundamental tension: do you do the work at write time or read time? Get this wrong and everything downstream suffers.
Bad Solution: Pure Fan-Out on Read
When a user opens their feed, you query the social graph for everyone they follow, pull recent posts from each of those authors, merge and sort them, then return the top N. No pre-computation at all.
def get_feed(user_id, limit=20):
followees = social_graph.get_followees(user_id) # could be 200+
posts = []
for followee_id in followees:
posts.extend(posts_db.get_recent(followee_id, limit=50))
posts.sort(key=lambda p: p.created_at, reverse=True)
return posts[:limit]
This looks clean, but the math is brutal. If a user follows 200 accounts, you're issuing 200+ database queries (or one massive IN query) on every single feed load. At 30K feed QPS, that's millions of post lookups per second. Your p99 latency will blow past 200ms before you even think about ranking.
Warning: Candidates who propose this approach often say "we can just cache the results." But that sidesteps the core problem: you still need to compute the feed from scratch whenever the cache is cold or stale. The interviewer will push back hard on this.
Good Solution: Pure Fan-Out on Write
Flip the model. When a user publishes a post, you immediately write a FeedItem entry into every follower's pre-computed feed. Reads become trivial: just fetch the top N items from a single user's feed cache.
def on_post_created(post):
followers = social_graph.get_followers(post.author_id)
for follower_id in followers:
feed_cache.zadd(
f"feed:{follower_id}",
{post.id: post.created_at.timestamp()}
)
Reads drop to a single Redis ZREVRANGE call per request. That's sub-millisecond. Your feed latency problem is solved.
The tradeoff? Write amplification. A user with 500 followers means 500 cache writes per post. That's fine. A celebrity with 10 million followers means 10 million writes per post. At 6K posts/sec globally, even a handful of celebrity posts per second can saturate your write pipeline. You've traded a read problem for a write problem, and the write problem has a long tail that's hard to control.
Great Solution: Hybrid Fan-Out with Follower Threshold
Split users into two tiers based on follower count. Normal users (say, under 100K followers) get fan-out on write. Celebrities get fan-out on read.
The Fan-Out Service checks the author's follower count when processing a PostCreated event. For normal users, it writes to each follower's feed cache as before. For celebrities, it writes only to a separate Celebrity Cache, a per-author sorted set of their recent posts.
At read time, the Feed Service pulls the user's pre-computed feed from Redis, then checks which celebrities the user follows and merges in their recent posts from the Celebrity Cache. Since a user typically follows only a handful of celebrities, this merge is cheap: maybe 5-10 extra cache reads instead of millions of extra writes.
CELEBRITY_THRESHOLD = 100_000
def on_post_created(post):
follower_count = social_graph.get_follower_count(post.author_id)
if follower_count >= CELEBRITY_THRESHOLD:
celebrity_cache.zadd(
f"celeb:{post.author_id}",
{post.id: post.created_at.timestamp()}
)
else:
followers = social_graph.get_followers(post.author_id)
for batch in chunk(followers, size=1000):
fanout_queue.enqueue(batch, post.id, post.created_at)
def get_feed(user_id, limit=20, cursor=None):
    max_score = cursor or "+inf"
    # 1. Pre-computed feed items: (post_id, score) pairs
    feed_items = feed_cache.zrevrangebyscore(
        f"feed:{user_id}", max_score, "-inf",
        start=0, num=limit, withscores=True
    )
    # 2. Merge celebrity posts from each followed celebrity's sorted set
    for celeb_id in social_graph.get_celebrity_followees(user_id):
        feed_items.extend(celebrity_cache.zrevrangebyscore(
            f"celeb:{celeb_id}", max_score, "-inf",
            start=0, num=limit, withscores=True
        ))
    # 3. Sort merged candidates by score, take top N
    feed_items.sort(key=lambda item: item[1], reverse=True)
    return feed_items[:limit]
Tip: Mention the threshold number explicitly. Saying "we'd set this around 100K followers based on our write capacity" shows you're thinking in concrete terms, not hand-waving. Interviewers love when you tie architectural decisions to capacity numbers.

"How do we rank the feed instead of just showing posts chronologically?"
Every interviewer who asks about news feed wants to hear you go beyond ORDER BY created_at DESC. Ranking is what separates a timeline from a feed.
Bad Solution: Pure Chronological Sort
Sort by timestamp. Done.
This is what Twitter did for years, and it works for a real-time firehose product. But for a Facebook-style feed where users follow hundreds of accounts and check in a few times a day, chronological ordering means they'll miss the best content from 6 hours ago because it's buried under 200 newer posts from accounts they barely care about.
Warning: Don't dismiss chronological entirely. If the interviewer specifies a Twitter-like product, it might be the right call. But you should still acknowledge the tradeoff and offer ranking as an option. Saying "chronological is fine" without discussing alternatives signals you haven't thought deeply about the problem.
Good Solution: Simple Scoring Function
Combine recency with engagement signals into a single score:
import time
def score_post(post, user_id):
age_hours = (time.time() - post.created_at.timestamp()) / 3600
recency = 1.0 / (1.0 + age_hours) # decays over time
engagement = (
post.like_count * 0.3 +
post.comment_count * 0.5 +
post.share_count * 1.0
)
normalized_engagement = min(engagement / 1000, 1.0)
return 0.6 * recency + 0.4 * normalized_engagement
This gets you 80% of the way there. Popular posts surface higher, but fresh content still has an edge. You can tune the weights manually. The problem is that it treats all authors equally. A post from your best friend and a post from a brand you followed two years ago get the same treatment.
Great Solution: Lightweight ML Ranking with Affinity and Diversity
Build a three-stage pipeline: candidate generation, scoring, and diversity filtering.
Candidate Generation pulls the raw feed items (from the hybrid fan-out cache). You over-fetch, maybe 200-500 candidates for a page of 20.
Scoring runs a lightweight model (logistic regression or a small gradient-boosted tree, not a deep neural net) that considers multiple feature families:
def compute_features(user_id, post, author):
return {
# Affinity: how much does this user engage with this author?
"interaction_count_7d": get_interactions(user_id, author.id, days=7),
"interaction_count_30d": get_interactions(user_id, author.id, days=30),
"is_close_friend": is_close_friend(user_id, author.id),
# Post signals
"post_age_hours": hours_since(post.created_at),
"like_count": post.like_count,
"comment_count": post.comment_count,
"content_type": post.content_type, # image posts tend to engage more
"has_media": post.media_url is not None,
# User context
"time_of_day": current_hour(),
"posts_seen_this_session": get_session_post_count(user_id),
}
Diversity Filtering is the piece most candidates forget. After scoring, you enforce rules: no more than 2 consecutive posts from the same author, mix content types (don't show 5 images in a row), and inject at least one post from an author the user hasn't seen recently. This prevents the feed from feeling repetitive even when the model is confident about a few authors.
def apply_diversity(ranked_posts, max_consecutive_same_author=2):
    result = []
    deferred = []
    streak_author, streak_len = None, 0
    for post in ranked_posts:
        if post.author_id == streak_author and streak_len >= max_consecutive_same_author:
            deferred.append(post)  # defer; re-inserted after the streak breaks
            continue
        result.append(post)
        if post.author_id == streak_author:
            streak_len += 1
        else:
            streak_author, streak_len = post.author_id, 1
    # Deferred posts drop to the end of the page; a production version
    # would interleave them so they don't cluster there.
    return result + deferred
Tip: You don't need to describe a production ML pipeline in detail. What the interviewer wants to hear is that you understand affinity (personalization), recency decay (freshness), engagement signals (quality), and diversity (user experience). Name those four concepts and you'll stand out.

"How do we handle the celebrity hot-key problem at scale?"
The interviewer might circle back to this after you've presented the hybrid approach. They want to see you think through the operational reality of fan-out for accounts with millions of followers.
Imagine a celebrity with 10 million followers publishes a post. Even with batched writes of 1,000 followers per batch, that's 10,000 batches. If each batch takes 5ms to process, a single worker needs 50 seconds. During a major event (a Super Bowl halftime tweet, a presidential announcement), dozens of celebrities might post within seconds of each other.
The answer is a combination of strategies. First, the fan-out workers process batches asynchronously from a partitioned message queue. You can scale workers horizontally. Second, you throttle fan-out for high-follower accounts, accepting that their posts reach followers over 30-60 seconds instead of instantly. Third, for accounts above the celebrity threshold, you skip fan-out entirely and use the read-time merge from the Celebrity Cache.
The Celebrity Cache itself is simple: one sorted set per celebrity, holding their last ~100 posts. When a user's feed is assembled, the Feed Service checks which celebrities they follow (typically a small list), reads from each celebrity's sorted set, and merges those posts into the pre-computed feed before ranking.
One subtlety worth mentioning: the threshold doesn't have to be binary. You can have a gradient. Accounts with 100K-1M followers get fan-out with throttled batching. Accounts above 1M skip fan-out entirely. This gives you a smoother operational profile instead of a hard cutoff that creates cliff effects.
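That gradient can be captured as a small policy function that the fan-out worker consults per event. The tier boundaries and throttle parameters below are illustrative, not prescriptive:

```python
def fanout_policy(follower_count):
    """Gradient instead of a binary cutoff (thresholds are illustrative):
    full-speed fan-out for small accounts, throttled batches in the
    middle tier, and read-time merge only at the very top."""
    if follower_count < 100_000:
        return ("fanout_on_write", {"batch_size": 1000, "delay_ms": 0})
    if follower_count < 1_000_000:
        # Throttled: accept 30-60s of fan-out lag to smooth the write burst.
        return ("fanout_on_write", {"batch_size": 1000, "delay_ms": 50})
    return ("celebrity_cache", {})
```

The operational win is that an account crossing 100K followers sees gradually slower delivery rather than an abrupt behavior change.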
"How do we design the feed cache for fast reads and efficient eviction?"
This is where you show you understand the data structure choices behind the architecture, not just the boxes on the diagram.
Bad Solution: Simple Key-Value Cache
Store each user's feed as a JSON blob in Redis or Memcached. On every new post, deserialize the blob, insert the new item, re-serialize, and write it back.
This creates race conditions when multiple fan-out workers write to the same user's feed simultaneously. It also means you're reading and writing the entire feed (potentially hundreds of items) just to add one entry. Pagination requires deserializing the whole blob and slicing in application code.
Good Solution: Redis Sorted Sets
Use one sorted set per user, keyed as feed:{user_id}. Each member is a post ID, and the score is the post's timestamp (or ranking score if you've pre-computed it).
-- Conceptual schema for what Redis stores:
-- ZSET key: "feed:{user_id}"
-- Members: post_id (UUID as string)
-- Scores: ranking_score (float, e.g. timestamp or ML score)
Writes are atomic ZADD operations, no read-modify-write cycle. Pagination uses ZREVRANGEBYSCORE with a cursor (the score of the last item on the previous page). Trimming old items is a single ZREMRANGEBYRANK call that keeps only the top N entries.
# `redis` below is a configured client instance (e.g. redis.Redis()),
# not the redis module itself.
def write_to_feed(user_id, post_id, score):
    pipe = redis.pipeline()
    pipe.zadd(f"feed:{user_id}", {post_id: score})
    pipe.zremrangebyrank(f"feed:{user_id}", 0, -501)  # keep top 500
    pipe.execute()
def read_feed(user_id, cursor_score="+inf", page_size=20):
    # Prefix the cursor with "(" so the bound is exclusive; otherwise
    # the last item of the previous page would be returned again.
    max_score = cursor_score if cursor_score == "+inf" else f"({cursor_score}"
    return redis.zrevrangebyscore(
        f"feed:{user_id}",
        max=max_score,
        min="-inf",
        start=0,
        num=page_size,
        withscores=True,
    )
This handles concurrent writes gracefully and gives you O(log N) inserts and range queries. For 500M users with 500 items each, you're looking at roughly 500M * 500 * ~50 bytes per entry = ~12.5 TB of Redis memory. That's a large cluster, but it's within the range of what companies like Facebook and Twitter actually run.
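The sizing math is worth being able to reproduce on demand. A trivial helper, assuming ~50 bytes per entry (a UUID member plus a float score plus sorted-set overhead, an approximation rather than a measured figure):

```python
def feed_cache_bytes(users, items_per_user, bytes_per_entry=50):
    """Rough memory footprint of the whole feed cache."""
    return users * items_per_user * bytes_per_entry

# 500M users x 500 items x ~50 bytes ~= 12.5 TB
```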
Great Solution: Sorted Sets with TTL Eviction and Cache-Aside Rebuild
Build on the sorted set approach with two additions that handle the long tail of inactive users and cache misses.
TTL-based eviction: Most users don't check their feed every day. Set a TTL (say, 7 days) on each feed key. If a user hasn't read their feed in a week, the key expires and you reclaim memory. This can cut your cache size by 50-70% since a large portion of registered users are inactive.
Cache-aside rebuild: When an active user returns and their feed key has expired (cache miss), the Feed Service falls back to on-the-fly generation. It queries the social graph for the user's followees, pulls their recent posts from the Posts DB, scores and ranks them, writes the result back into Redis, and returns the feed. This is the fan-out-on-read path, but it only triggers for cache misses, not for every read.
def get_feed_with_fallback(user_id, cursor=None, page_size=20):
    # Try cache first
    items = read_feed(user_id, cursor_score=cursor or "+inf", page_size=page_size)
    if items:
        redis.expire(f"feed:{user_id}", 7 * 86400)  # refresh TTL on access
        return hydrate_posts(items)

    # Cache miss: rebuild from source (the fan-out-on-read path)
    followees = social_graph.get_followees(user_id)
    posts = posts_db.get_recent_by_authors(followees, limit=500)
    scored = [(p.id, score_post(p, user_id)) for p in posts]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # rank before serving

    # Backfill cache so subsequent reads are cheap
    if scored:
        redis.zadd(f"feed:{user_id}", dict(scored))
        redis.zremrangebyrank(f"feed:{user_id}", 0, -501)
        redis.expire(f"feed:{user_id}", 7 * 86400)
    return hydrate_posts(scored[:page_size])
The beauty of this design is graceful degradation. If Redis goes down entirely, every request hits the fallback path. Latency spikes, but the system stays available. You can also monitor cache hit rates as a health signal: if hit rate drops below 90%, something is wrong with your fan-out pipeline.
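The hit-rate health signal can be as simple as a counter pair. A minimal sketch; a real deployment would export this through a metrics library (e.g. Prometheus) rather than tracking it in-process:

```python
class CacheHitMonitor:
    """Track feed-cache hit rate; alert when it drops below threshold."""

    def __init__(self, alert_below=0.90):
        self.hits = 0
        self.misses = 0
        self.alert_below = alert_below

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 1.0

    def unhealthy(self):
        return self.hit_rate() < self.alert_below
```

A sustained drop below the threshold usually means fan-out workers are lagging or failing, since fresh feeds stop landing in the cache before users ask for them.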
Tip: Mention the TTL refresh on read. It's a small detail that shows you understand access patterns. Active users keep their cache warm automatically. Inactive users get evicted without any background cleanup job. Interviewers notice when you think about operational simplicity.

What is Expected at Each Level
Interviewers calibrate against a mental rubric whether they admit it or not. Here's what each level looks like for this specific problem, so you know where to aim.
Mid-Level
- You clearly define the product ("a personalized feed of posts from people I follow, ranked and paginated") and gather requirements before jumping into boxes and arrows. Skipping this step is an instant yellow flag.
- You identify User, Post, Follow, and FeedItem as core entities, sketch reasonable schemas, and design basic APIs for publishing a post and fetching a feed. The API doesn't need to be perfect, but it should include pagination parameters.
- You draw a publish flow (client to post service to database) and a read flow (client to feed service to cache/database) that are logically sound, even if they aren't fully optimized. The two flows should be clearly separated.
- You can explain the difference between fan-out-on-write and fan-out-on-read when asked. You might not nail the hybrid solution, but you should at least flag that a user with 10 million followers is going to be a problem for the write-heavy approach.
Senior
- You propose the hybrid fan-out strategy without the interviewer needing to lead you there. You set a concrete threshold (something like 100K followers) and explain why celebrity posts get deferred to read time while normal posts get pushed eagerly.
- Your cache design is specific. You name Redis sorted sets, explain that the score encodes ranking, describe cursor-based pagination with ZREVRANGEBYSCORE, and discuss what happens on a cache miss (fall back to rebuilding from the social graph and posts DB).
- Ranking goes beyond "sort by timestamp." You bring up at least two or three signals: recency decay, engagement counts, author affinity. You don't need to present a full ML pipeline, but you should convince the interviewer that chronological order alone produces a bad user experience.
- When the interviewer pushes on consistency or cache invalidation (and they will), you handle it confidently. You explain that eventual consistency within a few seconds is acceptable for feeds, that deleted posts can be filtered at read time with a tombstone check, and that cache TTLs handle staleness for inactive users.
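The tombstone check mentioned above is a read-time filter. A sketch, assuming the set of deleted post IDs is available as `tombstones` (in practice a Redis set or Bloom filter):

```python
def filter_tombstoned(feed_items, tombstones):
    """Drop deleted posts at read time. `feed_items` is a list of
    (post_id, score) pairs from the cache; `tombstones` is a set of
    deleted post IDs. Filtering on read avoids scrubbing a deleted
    post out of millions of cached follower feeds."""
    return [(pid, score) for pid, score in feed_items
            if pid not in tombstones]
```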
Staff+
- You drive the conversation. The interviewer barely needs to prompt you. You set the agenda, call out which areas deserve deep dives, and timebox yourself. This alone signals seniority more than any single technical insight.
- You quantify tradeoffs with real math. "A celebrity with 5M followers posting once triggers 5M Redis ZADD operations. At 50 microseconds each, that's 250 seconds of single-threaded work. We need to shard the fan-out across N workers and batch writes to keep propagation under 10 seconds." Numbers like these separate hand-waving from engineering.
- You propose the full ranking pipeline: candidate generation from the feed cache, a lightweight scoring model considering affinity, post type, and recency, then a diversity filter that caps posts from any single author per page. You also mention how you'd A/B test ranking changes by running shadow models alongside production.
- You think about what happens next quarter, not just today. Ads injection as a ranked slot in the feed. Real-time updates via WebSockets or SSE for users who are actively scrolling. Multi-region replication of the feed cache so reads stay under 200ms globally. Operational dashboards tracking fan-out lag, cache hit rates, and p99 feed latency. Graceful degradation that falls back to chronological feeds if the ranking service is down.
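The diversity filter from the ranking pipeline above is a simple per-author cap. An illustrative sketch; a production ranker would likely demote overflow posts to later pages rather than drop them outright:

```python
from collections import defaultdict

def diversity_filter(ranked_posts, max_per_author=2):
    """Cap how many posts from a single author appear in one page.
    `ranked_posts` is a list of (post_id, author_id) pairs in rank
    order; posts beyond the per-author cap are dropped."""
    seen = defaultdict(int)
    page = []
    for post_id, author_id in ranked_posts:
        if seen[author_id] < max_per_author:
            seen[author_id] += 1
            page.append(post_id)
    return page
```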
Key takeaway: The news feed problem is really a write-amplification problem disguised as a read-latency problem. Every design decision, from hybrid fan-out to celebrity caching to sorted-set eviction, traces back to one question: how do you get fresh, relevant content in front of 500M users without drowning in writes? Show the interviewer you understand that tension, and the rest of the design falls into place.
