Understanding the Problem
What is a News Feed Ranking System?
Product definition: A news feed ranking system personalizes and orders a stream of posts for each user by scoring candidate content against that user's interests, relationships, and real-time engagement signals.
When someone opens Facebook, LinkedIn, or Twitter, they don't see posts in chronological order. They see a ranked list, curated specifically for them. The system's job is to take thousands of candidate posts from people they follow and surface the ones most likely to be relevant, engaging, and worth their time.
The first thing to clarify with your interviewer is what kind of feed this is. A social feed ranks content from people you follow. A content feed ranks posts by topic or interest, even from strangers. Most production systems are hybrids, and that distinction matters because it changes how you generate candidates. For this lesson, assume a hybrid social feed: personalized per user, driven primarily by follow relationships, but augmented by interest signals.
Functional Requirements
Core Requirements:
- Retrieve a set of candidate posts from a user's followees and interest graph
- Score and rank those candidates per user using a personalized ranking model
- Incorporate real-time engagement signals (likes, shares, comments from the last hour) into ranking scores
- Serve a ranked feed to the client within a strict latency budget
- Balance ranking objectives across engagement, recency, and content diversity
Below the line (out of scope):
- Ads ranking and sponsored content insertion
- Stories, Reels, or short-form video carousels (separate ranking surface)
- Notification delivery and push ranking
Note: "Below the line" features are acknowledged but won't be designed in this lesson.
One thing interviewers specifically listen for here: did you mention ranking goals explicitly? Don't just say "rank posts by relevance." Say you're optimizing for a combination of engagement signals (clicks, dwell time, shares), recency, content diversity, and creator health. That framing tells the interviewer you understand this is a multi-objective problem, not a single-metric optimization.
Non-Functional Requirements
- Scale: 500M DAU, with peak feed load around 500K QPS
- Latency: p99 feed response under 200ms end-to-end, which means ranking must complete in well under 100ms on a cache miss
- Freshness: New posts from followees should appear in eligible feeds within 5 minutes of publishing
- Availability: Feed serving should target 99.99% uptime; a stale feed is far preferable to a failed request
Tip: Always clarify requirements before jumping into design. This shows maturity.
Back-of-Envelope Estimation
Start with DAU and work outward. If 500M users each load their feed an average of 5 times per day, that's 2.5 billion feed loads per day, or roughly 29,000 requests per second on average. Peak traffic (morning commute, lunch, evening scroll) runs about 17x the average, landing near 500K QPS at peak.
| Metric | Calculation | Result |
|---|---|---|
| Daily feed loads | 500M DAU × 5 loads/day | 2.5B loads/day |
| Average feed QPS | 2.5B / 86,400s | ~29K QPS |
| Peak feed QPS | 29K × 17x peak multiplier | ~500K QPS |
| Candidates per feed | 2,000 followees × 2.5 posts/day avg | ~5,000 candidates |
| Feed cache entry size | 200 post IDs × 8 bytes + scores | ~3KB per user |
| Total feed cache size | 500M users × 3KB | ~1.5TB |
| Interaction events | 500M users × 20 interactions/day | 10B events/day (~115K events/sec) |
| Interaction log storage | 10B events × 200 bytes | ~2TB/day |
The 1.5TB feed cache number is important. That's too large for a single Redis cluster but very manageable across a sharded fleet. The 2TB/day interaction log tells you that your training data pipeline needs serious throughput, and that you'll want to archive to columnar storage (Parquet on S3, for example) rather than keeping raw logs in a transactional database.
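The table's arithmetic is worth being able to reproduce on the spot. A quick sanity check, using the same assumptions (500M DAU, 5 feed loads/day, 17x peak multiplier, 20 interactions/user/day):

```python
# Back-of-envelope sanity check for the estimates above.
DAU = 500_000_000
LOADS_PER_DAY = 5
PEAK_MULTIPLIER = 17

daily_loads = DAU * LOADS_PER_DAY              # 2.5B feed loads/day
avg_qps = daily_loads / 86_400                 # ~29K QPS average
peak_qps = avg_qps * PEAK_MULTIPLIER           # ~500K QPS at peak

cache_entry_bytes = 200 * (8 + 8)              # 200 post IDs + 8-byte scores, ~3KB
total_cache_bytes = DAU * cache_entry_bytes    # ~1.6e12 bytes, i.e. ~1.5TB

events_per_day = DAU * 20                      # 10B interaction events/day
events_per_sec = events_per_day / 86_400       # ~115K events/sec
```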
The Set Up
Core Entities
Five entities do most of the work here. Getting their relationships right early will make the rest of the design feel inevitable rather than improvised.
User is straightforward: anyone who reads or creates content. The attributes that matter for ranking are country and language, since they gate which posts are even eligible candidates.
Post is the content unit. You need content_type because ranking a video differently from a text post is table stakes at this scale. author_id links back to User and is the join key for candidate generation. Text posts store their content in text_content; image, video, and link posts leave that column NULL and use media_url instead.
Edge is the follow/friend graph. This is the engine of candidate generation. Every time a user loads their feed, you traverse their outgoing edges to find followees, then pull recent posts from those followees. A user following 500 accounts, each posting 10 times in the last 24 hours, produces a raw candidate pool of 5,000 posts before a single ranking signal is applied. The weight field lets you encode relationship strength (close friend vs. casual follow) as a feature.
Interaction is your raw signal log. Every click, share, comment, and dwell event lands here. This table feeds two separate consumers: the Flink stream processor computing real-time features (post CTR in the last 30 minutes), and the offline training pipeline that turns historical interactions into labeled examples. Think of it as the append-only ground truth for everything the model learns.
FeedScore is the materialized output of the ranking pipeline. It's not a source-of-truth table; it's a cache artifact. When a user opens the app, the feed API reads from here (or from Redis backed by this) rather than invoking the ranking model inline. The scored_at timestamp is what lets you detect staleness and decide whether to serve the cached ranking or trigger a re-rank.
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    country VARCHAR(10) NOT NULL,   -- ISO 3166-1 alpha-2
    language VARCHAR(10) NOT NULL,  -- BCP 47 language tag
    created_at TIMESTAMP NOT NULL DEFAULT now()
);

CREATE TABLE posts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    author_id UUID NOT NULL REFERENCES users(id),
    content_type VARCHAR(50) NOT NULL,  -- 'text', 'image', 'video', 'link'
    text_content TEXT,                  -- NULL for non-text posts
    media_url TEXT,                     -- NULL for text-only posts
    created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_posts_author_created ON posts(author_id, created_at DESC);

CREATE TABLE edges (
    follower_id UUID NOT NULL REFERENCES users(id),
    followee_id UUID NOT NULL REFERENCES users(id),
    weight FLOAT NOT NULL DEFAULT 1.0,  -- relationship strength; higher = closer
    created_at TIMESTAMP NOT NULL DEFAULT now(),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX idx_edges_followee ON edges(followee_id);  -- supports fan-out on write: find all followers of a new post's author

CREATE TABLE interactions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    post_id UUID NOT NULL REFERENCES posts(id),
    type VARCHAR(50) NOT NULL,  -- 'click', 'like', 'share', 'comment', 'dwell'
    dwell_ms INT,               -- NULL for non-dwell event types
    created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX idx_interactions_post_created ON interactions(post_id, created_at DESC);
CREATE INDEX idx_interactions_user_created ON interactions(user_id, created_at DESC);

CREATE TABLE feed_scores (
    user_id UUID NOT NULL REFERENCES users(id),
    post_id UUID NOT NULL REFERENCES posts(id),
    score FLOAT NOT NULL,  -- model output; higher = ranked earlier
    rank INT NOT NULL,     -- position in the user's feed
    scored_at TIMESTAMP NOT NULL DEFAULT now(),
    PRIMARY KEY (user_id, post_id)
);
CREATE INDEX idx_feed_scores_user_rank ON feed_scores(user_id, rank ASC);
A quick note on the two indexes for edges. The primary key on (follower_id, followee_id) is what the pull model uses: given a user, look up everyone they follow, then fetch recent posts from those accounts. The secondary index on followee_id flips the direction and supports fan-out on write: when a new post is published, find all followers of the author and push the post into their pre-computed feeds. Both patterns appear in production systems; the index comment calls out which direction each one serves.
Notice the index on feed_scores(user_id, rank ASC). Feed reads are always "give me the top N posts for user X in rank order," so this index is the hot path for every feed load.
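To make that read pattern concrete, here's a minimal sketch using SQLite in place of Postgres (TEXT ids instead of UUIDs; illustrative only). The query shape is exactly what idx_feed_scores_user_rank serves: filter by user, order by rank, limit N.

```python
import sqlite3

# Simplified feed_scores table (SQLite stand-in for the Postgres schema above).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE feed_scores (
    user_id TEXT NOT NULL,
    post_id TEXT NOT NULL,
    score REAL NOT NULL,
    rank INTEGER NOT NULL,
    PRIMARY KEY (user_id, post_id))""")
db.execute("CREATE INDEX idx_feed_scores_user_rank ON feed_scores(user_id, rank ASC)")

# 300 precomputed ranked posts for one user.
rows = [("u1", f"p{i}", 1.0 - i * 0.01, i) for i in range(1, 301)]
db.executemany("INSERT INTO feed_scores VALUES (?, ?, ?, ?)", rows)

def top_n(user_id: str, n: int) -> list[str]:
    # The hot-path query: top N posts for a user in rank order.
    cur = db.execute(
        "SELECT post_id FROM feed_scores WHERE user_id = ? ORDER BY rank ASC LIMIT ?",
        (user_id, n))
    return [r[0] for r in cur.fetchall()]

print(top_n("u1", 3))  # -> ['p1', 'p2', 'p3']
```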

Key insight: FeedScore is the bridge between your expensive offline ranking pipeline and your cheap online serving path. The whole system is designed to make writes to FeedScore fast and reads from it trivially fast.
API Design
Three endpoints cover the functional requirements: loading a feed, publishing a post, and recording an interaction.
// Return the ranked feed for the authenticated user
GET /v1/feed?limit=20&cursor=<opaque_cursor>
-> {
  "posts": [
    { "post_id": "uuid", "author_id": "uuid", "score": 0.94, "rank": 1, "content_type": "video" },
    ...
  ],
  "next_cursor": "<opaque_cursor>",
  "scored_at": "2024-01-15T10:30:00Z"
}
// Publish a new post; triggers candidate generation fan-out asynchronously
POST /v1/posts
{ "content_type": "image", "media_url": "https://cdn.example.com/img/abc.jpg" }
-> { "post_id": "uuid", "created_at": "2024-01-15T10:31:00Z" }
// Record a user interaction event for a post
POST /v1/interactions
{ "post_id": "uuid", "type": "dwell", "dwell_ms": 4200 }
-> { "interaction_id": "uuid" }
GET for the feed is the obvious choice: it's a read, it's cacheable at the CDN layer for anonymous users, and the cursor-based pagination avoids the offset instability you'd get with a ranked list that's constantly being updated. POST for publishing and interactions makes sense because both are writes with side effects. Publishing triggers an async fan-out job. Recording an interaction enqueues an event onto Kafka; the HTTP response doesn't wait for the feature store to update.
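One way to implement the opaque cursor (an assumption about the encoding, not a prescribed format) is to base64-encode the last-served rank together with the scored_at snapshot it was ranked against, so pagination stays stable even if a re-rank lands mid-scroll:

```python
import base64
import json

def encode_cursor(last_rank: int, scored_at: str) -> str:
    # Opaque to the client: carries the position plus the ranking snapshot id.
    payload = json.dumps({"r": last_rank, "s": scored_at})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))

c = encode_cursor(20, "2024-01-15T10:30:00Z")
assert decode_cursor(c) == {"r": 20, "s": "2024-01-15T10:30:00Z"}
```

Because the cursor pins the scored_at snapshot, the server can detect that a re-rank happened since the client's last page and choose to continue serving the old snapshot rather than shuffle posts under the user's thumb.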
Common mistake: Candidates sometimes design the feed endpoint to accept a user_id in the request body. Don't. The authenticated user's identity comes from the session token. Accepting an arbitrary user_id is a privilege escalation bug waiting to happen.
The scored_at field in the feed response is worth calling out explicitly. It tells the client (and your monitoring systems) how stale the ranking is. If scored_at is 20 minutes old and a viral post just dropped, you want to know that before your users do.
High-Level Design
A news feed ranking system has two distinct modes of operation: the hot path (a user opens the app and needs a ranked feed in under 200ms) and the cold path (the background machinery that keeps rankings fresh and models improving). Get the hot path wrong and users see a spinner. Get the cold path wrong and users see stale, irrelevant content.
Let's build each one.
1) Serving a Ranked Feed on App Open
The core requirement: when a user opens the app, return a personalized ranked list of posts within your latency budget. At 500K feed loads per second, you cannot run model inference inline for every request.
Components involved:
- Mobile/web client
- Feed API (stateless, horizontally scaled)
- Feed Cache (Redis, keyed by user_id)
- Ranking Service (triggered on cache miss)
- Candidate Generator
- Online Feature Store
Data flow:
- User opens the app. The client calls GET /v1/feed?limit=50 (the user's identity comes from the session token, per the API design).
- Feed API checks Redis for a precomputed ranked list at key feed:{user_id}.
- Cache hit: Feed API returns the stored post IDs directly. Done. This should cover the vast majority of requests.
- Cache miss: Feed API calls the Ranking Service asynchronously, returns a degraded response (recent posts, unranked) to the client immediately, then writes the ranked result to cache once scoring completes.
- Ranking Service calls Candidate Generator to fetch 1,000-5,000 recent post IDs from the user's followee graph.
- Ranking Service bulk-fetches features for all candidates from the Online Feature Store in a single batched call.
- Ranking Service runs model inference over the candidate set and produces a scored, ordered list.
- Ranked list is written back to Redis with a TTL (typically 5-15 minutes, depending on user activity level).

The most important decision here is precomputation. You're not ranking at request time for the typical case. You're serving a pre-ranked snapshot and refreshing it in the background. This is what makes sub-200ms p99 feasible at this scale.
Common mistake: Candidates often propose running the ranking model synchronously on every feed request. At 500K QPS with 2,000-candidate sets, that's roughly a billion model forward passes per second. Precompute the feed; serve from cache.
The TTL is a real tradeoff. A 5-minute TTL means a post that goes viral right now won't appear in most users' feeds for up to 5 minutes. A 1-minute TTL cuts that window but multiplies your background recompute load by 5x. You'll want to tune this per user segment: highly active users get shorter TTLs because they open the app frequently anyway; inactive users can tolerate longer staleness.
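A per-segment TTL policy might look like the following sketch; the thresholds and segment boundaries are illustrative, not production values:

```python
def feed_cache_ttl_seconds(sessions_last_7d: int) -> int:
    """Shorter TTLs for active users (they open the app often, so recompute
    cost is amortized over more reads); longer TTLs for inactive users."""
    if sessions_last_7d >= 20:   # highly active
        return 60
    if sessions_last_7d >= 5:    # moderately active
        return 300
    return 900                   # inactive: staleness is tolerable

assert feed_cache_ttl_seconds(30) == 60
```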
// Feed Cache entry structure
{
  "user_id": "abc-123",
  "ranked_posts": [
    { "post_id": "p1", "score": 0.94, "rank": 1 },
    { "post_id": "p2", "score": 0.87, "rank": 2 }
  ],
  "scored_at": "2024-01-15T10:30:00Z",
  "ttl_seconds": 300
}
2) Generating and Ranking Candidates Offline
The Feed Cache has to come from somewhere. The offline ranking pipeline is what populates it, and it runs continuously in the background for every active user.
Components involved:
- Candidate Generator (reads from follow graph + recent posts index)
- Online Feature Store (Feast)
- Ranking Model (deployed scoring service)
- Feed Cache (Redis)
Data flow:
- A background worker selects users whose feed cache is expiring or has been marked stale.
- Candidate Generator queries the follow graph to get the user's followee list, then fetches recent posts from each followee (bounded to the last 48 hours, capped at 5,000 candidates total).
- For each candidate post, the Feature Store returns a feature vector: post age, author affinity score, post CTR in the last hour, user's historical engagement with this content type, and so on.
- The Ranking Service runs the feature vectors through the deployed model (typically a gradient-boosted tree or a two-tower neural net) and produces a relevance score per candidate.
- Candidates are sorted by score. The top 200-500 are written to the Feed Cache as a Redis sorted set, which the Feed API reads from on the hot path.
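The five steps above can be condensed into a single worker function. Everything here (candidate_gen, feature_store, model, feed_cache) is a hypothetical stand-in for the real services, not an actual client library:

```python
def rerank_user(user_id, candidate_gen, feature_store, model, feed_cache,
                max_candidates=5000, keep_top=500, ttl=300):
    """One background re-rank cycle for a single user."""
    # Step 2: bounded candidate pull from the followee graph.
    candidates = candidate_gen.recent_posts(user_id, hours=48)[:max_candidates]
    # Step 3: one batched feature fetch, never per-candidate round trips.
    features = feature_store.bulk_fetch(candidates, user_id)
    # Step 4: score the whole candidate set in one model call.
    scores = model.predict(features)
    # Step 5: sort by score, keep the head, write back with a TTL.
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])[:keep_top]
    feed_cache.write(user_id, ranked, ttl_seconds=ttl)
    return ranked
```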

The feature hydration step deserves attention. You have up to 5,000 candidates per user. If you fetch features one post at a time, you're making 5,000 round trips to the feature store. You need bulk/batch reads here, and your feature store needs to support them efficiently. Feast with Redis as the online backend handles this well.
Key insight: The Ranking Service doesn't need to be a GPU cluster for this workload. Gradient-boosted models (LightGBM, XGBoost) score thousands of candidates in single-digit milliseconds on CPU. Save the GPU budget for embedding generation and two-tower retrieval if you go that route.
The Feed Cache stores ranked candidates as a Redis sorted set, using the model score as the sort key:
ZADD feed:{user_id} 0.94 post_id_1
ZADD feed:{user_id} 0.87 post_id_2
EXPIRE feed:{user_id} 300
This lets the Feed API retrieve the top N posts with a single ZREVRANGE feed:{user_id} 0 49 call, which is O(log N + M) and fast enough to be invisible in your latency budget.
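As a pure-Python stand-in for that sorted-set read (no Redis needed), the access pattern is just a descending sort by score plus a slice:

```python
def feed_page(scored_posts: dict[str, float], offset: int, limit: int) -> list[str]:
    """Mimics ZREVRANGE over a {post_id: score} mapping: highest score first,
    then page with offset/limit."""
    ranked = sorted(scored_posts, key=scored_posts.get, reverse=True)
    return ranked[offset:offset + limit]

cache = {"p1": 0.94, "p2": 0.87, "p3": 0.91}
print(feed_page(cache, 0, 2))  # -> ['p1', 'p3']
```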
3) Keeping Features Fresh with Real-Time Signals
A post published 10 minutes ago with 500 shares is more relevant than a post published 2 hours ago with 2 shares. Your ranking model needs to know about that velocity signal in near-real-time, not after the next batch job runs at midnight.
Components involved:
- Interaction Event Stream (Kafka)
- Stream Processor (Flink)
- Online Feature Store (Feast)
Data flow:
- Every user interaction (click, like, share, scroll-past, dwell) is published to a Kafka topic as an event within milliseconds of occurring.
- Flink consumes the event stream and maintains rolling window aggregates: CTR over the last 5 minutes, share count over the last 30 minutes, dwell-time p50 over the last hour.
- Flink writes updated aggregate values back to the Online Feature Store, keyed by post_id.
- The next time the Ranking Service fetches features for that post (during the next background re-rank cycle), it gets the fresh signal.
The Kafka topic structure matters for throughput. Partition by post_id so that all events for a given post land on the same Flink task, making windowed aggregation stateful and correct without cross-partition joins.
# Flink windowed aggregation (pseudocode)
stream = kafka_source("interaction-events")
post_ctr = (
    stream
    .filter(lambda e: e["type"] in ["click", "impression"])
    .key_by("post_id")
    .window(SlidingEventTimeWindows.of(minutes(30), minutes(1)))
    .aggregate(CTRAggregator())
)
post_ctr.add_sink(feature_store_sink("post_ctr_30m"))
Interview tip: When you mention Flink, expect the interviewer to ask "why not Spark Streaming?" The answer: Flink is a true streaming engine with per-event processing and low-latency state management. Spark Streaming uses micro-batches, which adds latency. For features that need to reflect activity from the last 5 minutes, Flink wins.
One thing to be explicit about: Flink updates the feature store, but it doesn't directly trigger a re-rank. The re-rank happens on the next background cycle (or on cache miss). There's an inherent lag between a post going viral and that signal propagating into ranked feeds. That lag is acceptable; sub-minute feature freshness with a 5-minute cache TTL means most users see viral content within 6 minutes of it taking off.
4) Retraining the Ranking Model
The model you deployed last week was trained on last week's interaction patterns. User behavior shifts. New content formats emerge. Without retraining, your ranking quality degrades quietly.
Components involved:
- Data Warehouse (interaction logs)
- Training Pipeline (Ray or Kubeflow)
- Model Registry (MLflow)
- Ranking Service (consumes new model artifact)
Data flow:
- All Kafka interaction events are also written to a data warehouse (Snowflake, BigQuery, or similar) for long-term storage. This is your training data source.
- A training job runs on a schedule (daily is common; some teams do continuous training on a rolling window). It reads the last N days of interaction logs, joins them with the feature snapshots that were served at the time of each impression (more on this in the deep dives), and produces labeled training examples.
- The training job fits a new model, evaluates it against a held-out validation set, and if metrics pass threshold, pushes the artifact to a model registry (MLflow).
- The Ranking Service polls the model registry (or receives a push notification) and hot-swaps to the new model artifact without restarting.
The hot-swap is worth calling out. You don't want a model deployment to require a service restart that causes a cache-miss storm. Most serving frameworks (TFServing, Triton) support versioned model loading with zero-downtime swaps.
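A minimal sketch of the zero-downtime swap, assuming nothing about TFServing/Triton internals: the serving path reads the model through a lock-protected reference that a registry poller can replace atomically.

```python
import threading

class ModelHolder:
    """Readers always see a complete model; a background poller can swap
    in a new artifact without restarting the service."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def swap(self, new_model):
        # Called by the registry poller when a new version passes validation.
        with self._lock:
            self._model = new_model

    def predict(self, features):
        # Grab a consistent reference, then score outside the lock.
        with self._lock:
            model = self._model
        return model(features)

holder = ModelHolder(lambda f: "v1")
holder.swap(lambda f: "v2")
assert holder.predict(None) == "v2"
```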
Common mistake: Candidates often describe retraining as "we retrain the model on new data." That's incomplete. The interviewer will push: what labels do you use? The label is a delayed signal. A user clicking a post 30 seconds after it was shown is a positive example. A user scrolling past without dwelling is a negative. You need to define the label window (typically 1-24 hours after impression) and handle the fact that labels arrive late.
Putting It All Together
The full system has two loops running in parallel. The hot path serves precomputed ranked feeds from the Feed Cache in single-digit milliseconds for the typical case, falling back to online ranking only on cache miss. The cold path continuously refreshes those cached rankings: Flink keeps real-time features current, the offline ranking pipeline re-scores candidates every few minutes per user, and the training pipeline retrains the model daily to keep it calibrated to current behavior.
The Feed API is the only component users ever touch directly. Everything else is infrastructure that makes the answer it returns good.


A few numbers to anchor the architecture. At 500K feed loads per second with a 95% cache hit rate, the Ranking Service only handles 25K re-rank requests per second. Each re-rank scores ~2,000 candidates in roughly 5-10ms on CPU. That's a manageable fleet. The feature store needs to handle bulk reads at roughly the same rate; a Redis cluster with read replicas handles this comfortably. Kafka at the interaction volume from the estimation (20 interactions per user per day across 500M DAU) sees roughly 10B events per day, about 400M per hour, which is well within Kafka's operating range at a few hundred partitions.
Interview tip: When you finish the high-level design, offer to go deeper on whichever component the interviewer finds most interesting. The natural candidates are candidate generation at scale, feature freshness tradeoffs, and training-serving skew. Having a clear mental model of all four components means you can pivot to any of them confidently.
Deep Dives
The interviewer will almost certainly pick two or three of these to probe. Know all five, but be ready to go deep on any one of them for ten minutes straight.
"How do we efficiently retrieve candidate posts for a user with 2,000 followees?"
A user with 2,000 followees might have 50,000 posts published in the last 48 hours across those accounts. You need 1,000-5,000 good candidates in under 20ms. How you get there matters enormously.
Bad Solution: Query the posts table at request time
The naive approach: when a user opens their feed, scan the posts table filtering by author_id IN (followee_ids) ordered by created_at DESC. Simple to implement, obvious to reason about.
At 500K feed loads per second, this melts your database. Even with indexes, a query joining against a 2,000-element followee list and sorting across millions of recent posts is expensive. You're also doing this work synchronously in the request path, which blows your latency budget before the ranking model even runs.
Warning: Candidates who propose "just query with an IN clause" usually haven't thought about what happens at scale. The interviewer will push you on this immediately. Have your answer ready.
Good Solution: Fan-out on write
When a user publishes a post, a background worker looks up all their followers and writes the post ID into each follower's feed inbox. The inbox is a sorted set in Redis, keyed by user_id, scored by timestamp.
def fan_out_post(post_id: str, author_id: str, published_at: float):
    followers = follow_graph.get_followers(author_id)  # from Cassandra
    pipeline = redis.pipeline()
    for follower_id in followers:
        inbox_key = f"feed_inbox:{follower_id}"
        pipeline.zadd(inbox_key, {post_id: published_at})
        pipeline.zremrangebyrank(inbox_key, 0, -5001)  # keep top 5000
    pipeline.execute()
Feed load becomes a single ZREVRANGE call per user. Fast, predictable, cache-friendly. The tradeoff is write amplification: a post from someone with 500K followers triggers 500K Redis writes. That's manageable for most accounts but starts to hurt for large ones.
Great Solution: Hybrid fan-out with author tiering
The production answer is to fan-out on write for normal users and fan-out on read for celebrities. You classify authors by follower count at post-publish time.
For accounts with fewer than, say, 50K followers, you eagerly write to follower inboxes. For accounts above that threshold (your "celebrities"), you skip the inbox write entirely and maintain a separate per-author recent posts index. At feed load time, a Candidate Merger reads from the user's inbox AND fetches recent posts from any celebrity accounts they follow, then deduplicates and caps the result at N candidates.
CELEBRITY_THRESHOLD = 50_000

def handle_new_post(post_id: str, author_id: str, published_at: float):
    follower_count = follow_graph.get_follower_count(author_id)
    if follower_count < CELEBRITY_THRESHOLD:
        fan_out_to_inboxes(post_id, author_id, published_at)
    # Always write to the per-author index (used for on-read fetch)
    author_index_key = f"author_posts:{author_id}"
    redis.zadd(author_index_key, {post_id: published_at})
    redis.zremrangebyrank(author_index_key, 0, -201)  # keep 200 most recent
At feed load time, the Candidate Merger fetches the inbox and then does a small number of ZREVRANGE calls against the celebrity author indexes. The total read fan-out is bounded by the number of celebrity accounts a user follows, which is typically small.
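A sketch of that merge-and-dedupe step, with inboxes and author indexes represented as (post_id, published_at) lists; the names and shapes are assumptions for illustration:

```python
def merge_candidates(inbox, celebrity_feeds, cap=5000):
    """Merge the precomputed inbox with on-read celebrity fetches:
    newest first, dedupe by post_id, cap the total candidate count."""
    combined = inbox + [p for feed in celebrity_feeds for p in feed]
    seen, merged = set(), []
    for post_id, published_at in sorted(combined, key=lambda x: -x[1]):
        if post_id not in seen:
            seen.add(post_id)
            merged.append(post_id)
        if len(merged) >= cap:
            break
    return merged

inbox = [("a", 100), ("b", 90)]
celeb = [[("c", 95), ("a", 100)]]  # "a" appears in both; deduped below
print(merge_candidates(inbox, celeb))  # -> ['a', 'c', 'b']
```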
Tip: Naming the hybrid approach and explaining the threshold logic is what separates senior candidates from mid-level ones. Bonus points if you mention that the threshold itself should be tunable and that some systems use a three-tier model (normal, mid-tier, celebrity) to smooth the transition.

"How do we keep ranking features fresh without blowing our latency budget?"
Your ranking model needs to know that a post is going viral right now, not just that it had a good CTR last week. But if you have 2,000 candidates and each feature lookup takes 5ms, you've already spent 10 seconds before scoring a single post. Something has to give.
Bad Solution: Per-candidate feature store lookups at scoring time
The straightforward approach: for each candidate post, call the feature store to get its current CTR, share velocity, dwell time, etc. Then assemble the feature vector and run inference.
This is fine for 10 candidates. At 2,000 candidates per ranking request, you're making 2,000 sequential (or even parallel) network calls. Even at 2ms per call with aggressive parallelism, you're looking at 20-50ms just for feature hydration. That's your entire latency budget gone before the model runs.
Warning: A lot of candidates describe this pattern without doing the math. The interviewer will ask "what's the latency?" and the answer will be embarrassing. Always sanity-check your design against the numbers.
Good Solution: Bulk feature fetch with precomputed aggregates
Instead of one call per candidate, batch all candidate post IDs into a single multi-get against the feature store. Redis pipelines and Feast's batch retrieval API both support this.
def hydrate_features(candidate_post_ids: list[str], user_id: str) -> dict:
    # Single batched call instead of N individual lookups
    post_features = feature_store.get_online_features(
        features=[
            "post_stats:ctr_1h",
            "post_stats:share_velocity_30m",
            "post_stats:dwell_p50_1h",
            "post_stats:comment_rate_1h",
        ],
        entity_rows=[{"post_id": pid} for pid in candidate_post_ids],
    ).to_dict()
    user_features = feature_store.get_online_features(
        features=["user_profile:avg_session_length", "user_profile:topic_affinities"],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    return merge_features(post_features, user_features)
One network round-trip for all post features, one for user features. Latency drops from 50ms to 2-5ms. The catch: you're now dependent on the feature store being up-to-date. If Flink is lagging, your "real-time" CTR might be 10 minutes stale.
Great Solution: Tiered features with freshness-aware serving
Not all features need the same freshness. Split them into tiers and serve each tier from the appropriate source.
Real-time features (post CTR in the last 5 minutes, share velocity) are written by Flink from the Kafka interaction stream and served from the online feature store. These get refreshed every 30-60 seconds per post. Stable features (author follower count, user's historical topic preferences, post content embeddings) are computed nightly by a batch pipeline and also live in the feature store, but they're updated far less frequently.
At scoring time, the Ranking Service does one bulk fetch that returns both tiers together. The feature store handles the merge internally. Critically, the Ranking Service also logs the exact feature values it used, keyed by request ID, for training purposes (more on that in the next deep dive).
# Flink job: compute rolling aggregates and push to feature store
def compute_post_aggregates(events: DataStream) -> None:
    events \
        .key_by(lambda e: e.post_id) \
        .window(SlidingEventTimeWindows.of(
            Time.minutes(60), Time.minutes(1)
        )) \
        .aggregate(CTRAggregator()) \
        .add_sink(FeatureStoreSink(
            feature_view="post_stats",
            ttl_seconds=7200
        ))
The freshness/latency tradeoff becomes explicit and tunable: you can decide that share velocity needs 1-minute freshness while author-level features can be 24 hours stale, and configure each accordingly.
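Making those SLAs explicit configuration is one way to operationalize the tiers; the feature names and numbers below are illustrative assumptions, not a real schema:

```python
# Illustrative per-feature freshness SLAs (seconds).
FEATURE_FRESHNESS_SLA_SECONDS = {
    "post_stats:share_velocity_30m": 60,      # real-time tier: Flink-updated
    "post_stats:ctr_1h": 60,
    "user_profile:topic_affinities": 86_400,  # stable tier: nightly batch
    "author:follower_count": 86_400,
}

def is_stale(feature_name: str, age_seconds: float) -> bool:
    """True when a feature value is older than its tier's SLA allows."""
    return age_seconds > FEATURE_FRESHNESS_SLA_SECONDS[feature_name]

assert is_stale("post_stats:ctr_1h", 120)
assert not is_stale("user_profile:topic_affinities", 3_600)
```

A staleness check like this can feed both alerting (Flink lag pushed a real-time feature past its SLA) and serving decisions (fall back to a batch value rather than serve a known-stale velocity signal).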
Tip: Interviewers love when you articulate the freshness SLA per feature type. It shows you understand that "real-time" is not binary and that different features have different staleness tolerances.

"How do we avoid training-serving skew in the ranking model?"
Training-serving skew is the most common silent killer in production ML systems. The model trains on one version of the features and serves on another. Offline metrics look great; online metrics disappoint. The root cause is almost always how features are computed or joined.
Bad Solution: Recompute features from raw logs at training time
The intuitive approach: at training time, replay the interaction logs and recompute features using the same logic as the serving pipeline. If the serving code computes "CTR in the last hour," just run the same computation over the historical log.
The problem is that the historical log doesn't perfectly reconstruct the state of the world at serving time. The serving pipeline might have had a Flink lag of 3 minutes when the feature was read. The batch pipeline might have used a slightly different time window. Timezone handling might differ. These small discrepancies compound across millions of training examples and produce a model that's calibrated on features it will never actually see in production.
Warning: This is the answer most candidates give. It sounds reasonable. It is wrong. If you say this in an interview without flagging the skew risk, a senior interviewer will immediately challenge you on it.
Good Solution: Log features at serving time and join on labels
Instead of recomputing features at training time, log the exact feature vector the Ranking Service used when it made a prediction. Store it alongside a request ID. Later, when interaction labels arrive (user clicked, user scrolled past), join them to the logged features by request ID.
# In the Ranking Service, at inference time
def score_candidates(user_id: str, candidates: list[str]) -> list[ScoredPost]:
    features = feature_store.bulk_fetch(candidates, user_id)
    scores = model.predict(features)
    # Log the exact features used, not a recomputed version
    request_id = generate_request_id()
    feature_log.append({
        "request_id": request_id,
        "user_id": user_id,
        "candidates": candidates,
        "features": features,  # snapshot at this exact moment
        "scores": scores,
        "served_at": time.time(),
    })
    return rank_and_return(candidates, scores, request_id)
The Label Joiner then does a time-delayed join: interaction events that arrive within a 24-hour window get joined to the feature snapshot from the corresponding request. The training dataset contains exactly the features the model saw, with exactly the labels that resulted.
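A minimal sketch of that delayed join, assuming a 24-hour label window and a simple in-memory feature log keyed by request ID (all names here are hypothetical):

```python
LABEL_WINDOW_SECONDS = 24 * 3600
POSITIVE_TYPES = {"click", "like", "share", "comment"}

def join_labels(feature_log, interactions):
    """feature_log: request_id -> {"served_at": t, "features": {post_id: vec}}.
    interactions: events with request_id, post_id, type, and timestamp `at`.
    Returns (feature_vector, label) pairs; posts with no positive event
    inside the window become negatives."""
    positives = set()
    for ev in interactions:
        snap = feature_log.get(ev["request_id"])
        if (snap and ev["type"] in POSITIVE_TYPES
                and 0 <= ev["at"] - snap["served_at"] <= LABEL_WINDOW_SECONDS):
            positives.add((ev["request_id"], ev["post_id"]))
    examples = []
    for rid, snap in feature_log.items():
        for post_id, vec in snap["features"].items():
            examples.append((vec, 1 if (rid, post_id) in positives else 0))
    return examples
```

Note that a click arriving after the window closes is deliberately counted as a negative: the label definition has to be fixed and enforced, or the training distribution drifts with pipeline timing.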
Great Solution: Immutable feature logging with distribution validation
The logging approach is correct, but it only works if you also validate that the feature distributions at training time match those at serving time. Add a monitoring step to the training pipeline that compares feature statistics from the training dataset against a sample of recent serving logs.
def validate_feature_distributions(
    training_features: pd.DataFrame,
    serving_sample: pd.DataFrame,
    threshold: float = 0.1,
) -> None:
    for feature in training_features.columns:
        train_mean = training_features[feature].mean()
        serve_mean = serving_sample[feature].mean()
        drift = abs(train_mean - serve_mean) / (abs(serve_mean) + 1e-9)
        if drift > threshold:
            raise FeatureSkewError(
                f"Feature '{feature}' has {drift:.1%} mean drift "
                f"between training ({train_mean:.4f}) and serving ({serve_mean:.4f}). "
                f"Block model promotion."
            )
If drift exceeds a threshold, block the model from being promoted to production. This catches cases where a feature pipeline change altered the distribution of a feature without anyone noticing. The feature log store should be append-only and immutable: no backfilling, no retroactive corrections. If a feature was wrong at serving time, that wrongness should be in the training data too, because that's the reality the model will face.
Tip: Mentioning feature distribution validation as a promotion gate is a Staff-level signal. Most candidates describe the logging loop but don't close it with validation. The interviewer will notice.

"How do we prevent the feed from becoming a filter bubble?"
A ranking model trained purely on clicks will learn to show users more of what they already engage with. That sounds good until you realize it means the feed converges to a narrow slice of content, popular creators crowd out new ones, and users eventually churn because the feed feels repetitive.
Bad Solution: Optimize a single engagement metric end-to-end
Train a model to maximize pCTR (predicted click-through rate). Rank posts by score descending. Ship it.
This works in the short term. Engagement metrics go up. Then creator diversity drops. New creators can't break through because they have no interaction history, so the model assigns them low scores. Users start seeing the same five creators repeatedly. Long-term retention suffers, but by the time you notice, the damage is done.
Warning: If you describe a ranking system that optimizes a single metric without mentioning the downstream effects, the interviewer will ask "what happens to creator health?" Be ready for that question.
Good Solution: Multi-objective scoring with weighted utility
Instead of a single score, compute a weighted utility that combines multiple objectives.
def compute_utility_score(
    p_ctr: float,
    p_like: float,
    p_share: float,
    recency_score: float,
    creator_health_score: float,
    weights: dict,
) -> float:
    return (
        weights["ctr"] * p_ctr
        + weights["like"] * p_like
        + weights["share"] * p_share
        + weights["recency"] * recency_score
        + weights["creator_health"] * creator_health_score
    )
Creator health score might be a function of how long it's been since a creator's last post appeared in feeds, or a penalty applied when a single creator occupies more than X% of a user's recent feed. Recency score decays exponentially with post age. The weights are tunable and can be A/B tested independently.
The tradeoff: you now have a weight-tuning problem. Getting the weights right requires careful experimentation, and the right weights may differ by user segment.
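The recency term mentioned above can be sketched as an exponential half-life decay. The six-hour half-life here is an illustrative choice, not a recommendation; the right value is itself something to A/B test:

```python
def recency_score(post_age_seconds: float, half_life_seconds: float = 6 * 3600) -> float:
    """Exponential decay: a fresh post scores 1.0, and the score halves
    every half_life_seconds."""
    return 0.5 ** (post_age_seconds / half_life_seconds)
```

A half-life parameterization is easier to reason about in experiments than a raw decay rate: "posts lose half their recency value every six hours" is a sentence a PM can argue with.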
Great Solution: Multi-objective scoring plus diversity re-ranking with position bias correction
Multi-objective scoring improves how individual posts are scored, but it doesn't prevent clustering. You can still end up with the top 20 slots all occupied by posts from the same three authors, each with a high utility score. Diversity re-ranking fixes that.
Apply Maximal Marginal Relevance (MMR) after scoring: iteratively select the next post that maximizes a combination of its utility score and its dissimilarity from already-selected posts.
def mmr_rerank(
    scored_posts: list[tuple[str, float]],  # (post_id, utility_score)
    post_metadata: dict,                    # author_id, topic_tags per post
    lambda_diversity: float = 0.3,
    top_k: int = 50,
) -> list[str]:
    selected = []
    remaining = list(scored_posts)
    while remaining and len(selected) < top_k:
        best_post, best_score = None, float("-inf")
        for post_id, utility in remaining:
            # similarity() is an assumed helper comparing author/topic overlap
            diversity_penalty = (
                max(
                    similarity(post_metadata[post_id], post_metadata[s])
                    for s in selected
                )
                if selected
                else 0.0
            )
            mmr_score = (1 - lambda_diversity) * utility - lambda_diversity * diversity_penalty
            if mmr_score > best_score:
                best_score = mmr_score
                best_post = post_id
        selected.append(best_post)
        remaining = [(p, u) for p, u in remaining if p != best_post]
    return selected
Position bias correction is the other piece. Users are far more likely to click on posts shown in slot 1 than slot 10, regardless of content quality. If you train on raw clicks without correcting for this, the model learns that "posts shown at the top are good" rather than "good posts are shown at the top." Inverse propensity scoring (IPS) reweights training examples by the inverse probability of being shown at that position, breaking the feedback loop.
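A minimal sketch of the IPS reweighting, assuming position propensities have already been estimated (typically from a small slice of randomized exploration traffic); the numbers and names here are illustrative:

```python
def ips_weight(position: int, propensities: dict[int, float], clip: float = 10.0) -> float:
    """Inverse propensity weight for a training example shown at `position`.

    propensities[p] estimates how likely an interaction is to be observed at
    slot p for content of equal quality. Clicks earned in low-propensity slots
    get upweighted; clicks handed out by slot 1 do not dominate training.
    """
    weight = 1.0 / max(propensities.get(position, 1.0), 1e-6)
    return min(weight, clip)  # clipping bounds variance from rarely-shown slots
```

Each training example's loss is then multiplied by its weight, so a click at slot 10 counts for more than a click at slot 1. The clip value trades bias for variance and is worth tuning empirically.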
Tip: Describing position bias and IPS without being prompted is a strong senior signal. It shows you understand that the training data is not a neutral sample of user preferences; it's a biased sample shaped by your own ranking decisions.

"How do we handle cache invalidation when a new post is published?"
Every time a user publishes a post, some set of followers' feed caches become stale. Deciding which caches to invalidate, and how eagerly, is where the system either scales gracefully or falls over.
Bad Solution: Invalidate all follower caches synchronously on publish
When a post is published, look up all followers and delete their feed cache entries. Simple, correct, immediate.
For a user with 500 followers, this is fine. For a user with 10 million followers, you've just queued 10 million cache deletes synchronously in the publish path. The publish request times out. The queue backs up. Downstream services start dropping work. This is the celebrity problem, and it's a real production failure mode.
Warning: Candidates who say "invalidate all followers' caches on publish" without qualifying it with follower count have not thought through the celebrity case. The interviewer will ask "what if the author has 100 million followers?"
Good Solution: Asynchronous eager invalidation for normal accounts
Move invalidation off the publish path entirely. The Post Publish Event goes to Kafka. A pool of invalidation workers consumes from the topic, looks up followers, and deletes or rewrites their cache entries asynchronously.
This decouples publish latency from invalidation throughput. Workers can be scaled independently. If a burst of posts arrives, the queue absorbs it and workers drain it at their own pace. The tradeoff: followers might see a stale feed for a few seconds to a few minutes after a post is published, depending on queue depth.
For most use cases, that's acceptable. The freshness SLA of "new posts appear within 5 minutes" is easily met with a healthy worker pool.
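A sketch of the worker side, with the queue and follow-graph clients stubbed as assumptions (in production these would be Kafka consumer-group members reading the Post Publish Event topic):

```python
def process_publish_event(event: dict, follow_graph, feed_cache) -> int:
    """Handle one Post Publish Event: drop every follower's cached feed.

    Deleting rather than rewriting keeps the worker cheap; the next feed
    request repopulates the cache through the normal ranking path.
    Returns the number of entries invalidated, useful for metrics.
    """
    followers = follow_graph.get_followers(event["author_id"])
    for follower_id in followers:
        feed_cache.delete(follower_id)
    return len(followers)
```

Because each event is processed independently, the worker pool scales horizontally: add consumers to the same group and the topic's partitions are redistributed automatically.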
Great Solution: Tiered invalidation based on author follower count
The production answer combines eager invalidation for normal accounts with lazy invalidation for celebrities, using a tiered author classifier at publish time.
For authors below the celebrity threshold (say, 100K followers), eager invalidation works fine. Workers rewrite the feed cache for each follower proactively. For celebrity authors, you skip the per-follower cache rewrite entirely. Instead, you record the author's last post time in a lightweight flag store (for example, a Redis hash keyed by author_id). When a follower next requests their feed, the Feed API compares the post times of the celebrities they follow against the timestamp of their cached feed. If any celebrity has posted since the cache was built, it triggers an inline re-rank before serving; the fresh cache entry's newer timestamp then acts as the cleared flag.
def handle_post_published(post_id: str, author_id: str):
    follower_count = author_stats.get_follower_count(author_id)
    if follower_count < CELEBRITY_THRESHOLD:
        # Eager: rewrite caches for all followers via async workers
        invalidation_queue.publish({
            "type": "eager_invalidate",
            "author_id": author_id,
            "post_id": post_id,
        })
    else:
        # Lazy: record the post time; don't enumerate 100M followers.
        # Followers pull the update on their next feed request.
        stale_authors.set(author_id, time.time())  # Redis hash: author_id -> last_post_ts

# Feed API compares celebrity post times against the cache timestamp
def serve_feed(user_id: str) -> list[Post]:
    cache_built_at = feed_cache.built_at(user_id)  # None if no cache entry
    followed_celebrities = get_followed_celebrities(user_id)
    has_stale_celebrity = cache_built_at is not None and any(
        stale_authors.get(c, 0) > cache_built_at
        for c in followed_celebrities
    )
    if cache_built_at is None or has_stale_celebrity:
        ranked_feed = ranking_service.rank(user_id)
        feed_cache.set(user_id, ranked_feed, ttl=300)  # fresh entry resets built_at
        return ranked_feed
    return feed_cache.get(user_id)
The staleness flag approach means celebrity posts propagate to followers on their next feed load rather than proactively. A user who opens the app every 30 minutes will see the celebrity's post within 30 minutes. That's within most freshness SLAs and costs essentially nothing at publish time.
You can also add a TTL-based backstop: even without explicit invalidation, feed caches expire after 5 minutes, so the worst-case staleness is bounded regardless of the invalidation path.
Tip: The three-tier model (normal, mid-tier, celebrity) with different invalidation strategies per tier is the answer that distinguishes senior from mid-level candidates. Mid-level candidates describe eager invalidation. Senior candidates describe the hybrid. Staff candidates also ask "how do we tune the threshold, and what metrics tell us it's set correctly?"

What is Expected at Each Level
Interviewers calibrate their feedback against a mental model of what "good" looks like at each level. Here's what that model looks like for a news feed ranking system.
Mid-Level
- Correctly describe the two-stage pipeline: candidate generation narrows the pool, ranking scores it. Candidates who collapse these into one step ("just query the database and sort by score") signal they haven't thought about scale.
- Identify the fan-out tradeoff without being asked. You don't need to solve the celebrity problem perfectly, but you should name it and explain why fan-out on write breaks down at high follower counts.
- Know why a feature store exists. "We need low-latency access to precomputed features at serving time" is enough. Bonus points for distinguishing online from offline feature stores.
- Do the capacity math unprompted. At 500K feed loads per second with a 200ms p99 budget, how large does your Redis cluster need to be? What TTL makes sense? Walking through these numbers shows you think in systems, not just components.
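One way to sketch that capacity math in the interview; every figure below is an illustrative assumption, not a prescription:

```python
# Back-of-envelope Redis sizing for the feed cache (all inputs assumed)
daily_active_users = 100_000_000     # users who may hold a cache entry
posts_per_cached_feed = 50           # ranked post IDs kept per user
bytes_per_entry_item = 100           # post ID + score + small metadata

entry_bytes = posts_per_cached_feed * bytes_per_entry_item   # ~5 KB per user
total_bytes = daily_active_users * entry_bytes               # ~500 GB overall

usable_bytes_per_node = 25 * 2**30   # ~25 GB usable memory per Redis node
nodes_before_replication = total_bytes / usable_bytes_per_node  # roughly 19
```

Then sanity-check throughput: 500K reads per second spread across ~19 nodes is roughly 26K ops/sec per node, well within what a single Redis instance handles, so memory rather than throughput drives cluster size here. A 5-minute TTL also bounds worst-case staleness, which ties back to the invalidation discussion.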
Senior
- Proactively raise training-serving skew before the interviewer asks. This is the single most common silent killer in production ranking systems, and senior candidates bring it up on their own. Explain that features must be logged at serving time and joined with delayed labels, not recomputed from scratch during training.
- Quantify the latency budget across stages. "Candidate generation gets 20ms, feature hydration gets 30ms, model inference gets 40ms" is the kind of concrete allocation that separates senior engineers from mid-level ones who just say "it needs to be fast."
- Drive the diversity re-ranking discussion independently. Propose a concrete mechanism, whether that's MMR, slot-based author caps, or a multi-objective utility score, and explain what failure mode it prevents (filter bubbles, creator starvation, engagement cannibalization).
- Propose how you'd handle cache invalidation for celebrity accounts without melting your fan-out infrastructure. Lazy invalidation with a staleness flag is the right direction; explain the freshness tradeoff you're accepting.
Staff+
Staff candidates reframe the problem before solving it. The first question isn't "how do we rank posts" but "what are we actually optimizing for, and are those objectives in tension?"
- Frame ranking around business objectives first. Raw CTR is easy to optimize and easy to game. A staff-level answer discusses creator health (are smaller creators getting any distribution?), engagement quality (shares and saves vs. rage-clicks), and long-term retention, then proposes a weighted utility function that balances them.
- Address experimentation infrastructure as a first-class concern. A/B testing a new ranking model sounds simple until you realize the treatment and control groups share a feature store. If the treatment group's interactions update post CTR features that the control group also reads, your experiment is already contaminated. Staff candidates identify this and propose solutions: holdout feature namespacing, shadow scoring, or delayed feature writes.
- Identify second-order failure modes. A model trained on engagement data will amplify whatever content was already getting clicks, which shifts the training distribution, which makes the next model even more biased toward that content. This feedback loop is real, it happened at scale at multiple companies, and naming it with a mitigation (position bias correction, exploration traffic, counterfactual logging) is what distinguishes staff-level thinking.
- Discuss operational maturity. How do you detect model degradation in production? What's your rollback story if a new model tanks creator diversity metrics? What monitoring tells you the feature store is serving stale data? These questions don't have clean answers, but raising them shows you've shipped systems like this before.
Key takeaway: A news feed ranking system is not a search problem with a relevance score. It's a multi-objective optimization problem running under tight latency constraints, trained on biased data, and deployed into a feedback loop. The candidates who pass at senior and above are the ones who see all three of those dimensions at once.
