Why This Matters
Picture this: it's 2 AM and your primary database just died. Every customer order, every user profile, every row of data your company cares about lives on that one machine. If you don't have copies of that data sitting on other nodes, ready to take over, you're staring at a full-blown outage. Replication is the practice of keeping copies of the same data on multiple machines so that when (not if) one of them fails, your system keeps running. It's how Netflix serves over 200 million users across the globe without routing every single read request back to one database in a single data center. Instead, copies of the data live on nodes spread across regions, absorbing massive read traffic and surviving hardware failures without anyone noticing.
In system design interviews, replication almost never shows up as the main question. Instead, it surfaces as a decision you're expected to make confidently: "How will your database handle a node failure?" or "How do you scale reads when your service hits 50,000 requests per second?" The interviewer is testing three things at once. Can you keep the system alive when nodes die (fault tolerance)? Can you spread read load across multiple copies instead of hammering one machine (read scalability)? Can you put data closer to users in different geographies so they're not waiting on cross-continent round trips (latency reduction)? Those are the three reasons replication exists, and you should be able to name them without hesitating.
Common mistake: Saying "we'll just replicate the data" and moving on. Interviewers hear this constantly, and it's a red flag. Replication always comes with tradeoffs between availability and consistency. Every time you propose it, you need to follow up with how you're replicating and what consistency guarantees you're accepting. By the end of this lesson, you'll know exactly which replication strategy to reach for, when to bring it up, and how to explain the tradeoffs clearly enough that your interviewer nods instead of probing.
How It Works
Strip away all the variations and patterns, and replication boils down to one thing: you write data to one node, and that change gets copied to other nodes. That's it. Every replication strategy you'll ever discuss in an interview is just a different answer to three questions: who accepts the writes, how changes get from one node to another, and when the replicas are considered caught up.
Think of it like a professor writing lecture notes. They write the original, then copies get distributed to students. The interesting questions are all about the distribution: does the professor wait until every student has their copy before continuing? Or do they keep writing and let students pick up copies whenever they can? And what happens if multiple professors are writing notes at the same time?
Three Models, One Core Idea
Before diving into mechanics, you should know that replication comes in three flavors, and your interviewer may ask you to compare them.
Leader-follower (also called primary-replica or master-slave) is the most common. One node accepts all writes, and the rest copy from it. PostgreSQL, MySQL, MongoDB by default, and Redis all use this model. It's simple to reason about because there's a single source of truth for writes.
Multi-leader allows writes at more than one node, each of which replicates to the others. You'll see this in multi-datacenter setups where you need local write latency in each region. CouchDB and MySQL's group replication (in multi-primary mode) support this. The catch is write conflicts: two leaders might accept conflicting updates to the same row at the same time, and you need a strategy to resolve that.
Leaderless (sometimes called Dynamo-style) skips the concept of a leader entirely. Clients write to multiple nodes simultaneously and read from multiple nodes, using quorum rules to determine what counts as "successful." Cassandra and Amazon DynamoDB follow this approach.
Each model makes fundamentally different tradeoffs around consistency, availability, and complexity. We'll explore multi-leader and leaderless replication in depth in later sections. For now, let's trace the mechanics through leader-follower replication, because it's the model interviewers expect you to explain first and the one that makes the other two easier to understand by contrast.
The Write Path, Step by Step
Here's what happens when a client writes a row to a leader-follower replicated database:
- The client sends a write to the leader node (the one designated to accept writes).
- The leader persists the change locally and appends it to a replication log, an ordered sequence of every mutation.
- That log gets streamed to each follower node.
- Each follower reads the log entries in order and applies them to its own copy of the data.
- Clients reading from followers now (eventually) see the updated data.

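The same loop can be sketched in a few lines of Python. This is illustrative only; the names and structure are invented for the sketch, not any real database's internals:

```python
# Minimal in-memory sketch of leader-follower log shipping.
# Real databases persist the log durably and stream it over the network.

class Node:
    def __init__(self):
        self.data = {}        # this node's copy of the dataset
        self.applied = 0      # how many log entries this node has applied

class Leader(Node):
    def __init__(self):
        super().__init__()
        self.log = []         # ordered, append-only replication log

    def write(self, key, value):
        self.data[key] = value            # 1. persist locally
        self.log.append((key, value))     # 2. append to the replication log
        self.applied = len(self.log)

def catch_up(follower, leader):
    """3-4. Stream unapplied log entries to a follower, in order."""
    for key, value in leader.log[follower.applied:]:
        follower.data[key] = value
        follower.applied += 1

leader = Leader()
follower = Node()
leader.write("user:1", "Ada")
# Before catch-up, the follower is stale -- this gap is replication lag.
stale = follower.data.get("user:1")       # None
catch_up(follower, leader)
fresh = follower.data.get("user:1")       # "Ada"
```

Notice that the follower applies entries strictly in log order; that's the ordering guarantee the next paragraph depends on.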
The replication log is the backbone of this whole process. In PostgreSQL, it's called the Write-Ahead Log (WAL). MySQL calls it the binlog. The name changes, but the idea is identical: an append-only, ordered record of every data change. Replicas consume this log sequentially, which guarantees they apply changes in the same order the leader did. Without that ordering guarantee, your replicas would diverge from the leader and from each other.
In multi-leader and leaderless systems, there's no single authoritative log. Instead, nodes exchange changes through gossip protocols, anti-entropy mechanisms, or read-repair. The ordering problem gets much harder, which is exactly why leader-follower remains the default choice when you can tolerate a single write endpoint.
Synchronous vs. Asynchronous: The First Fork
This is where your interviewer starts paying close attention. When the leader sends log entries to followers, does it wait for them to confirm before telling the client "your write succeeded"?
Synchronous replication means yes, it waits. The client's write request hangs until at least one (or all) replicas have acknowledged the change. You get a strong guarantee: if the leader dies right after confirming the write, the data exists on at least one other node. The cost is latency. Every write now includes a network round-trip to the replica, and if that replica is slow or unreachable, your writes stall.
Asynchronous replication means the leader confirms the write immediately after persisting it locally, then ships log entries to followers in the background. Writes are fast. But there's a window where the leader has data that no replica has yet. If the leader crashes during that window, those writes are gone.
Most production systems land somewhere in between. A common setup (and a great default answer in interviews) is semi-synchronous: one follower replicates synchronously to guarantee at least one backup copy, while the rest replicate asynchronously for speed. PostgreSQL supports this directly through its synchronous standby settings, and it's a common production default.
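Stripped down, the difference between these modes is just the acknowledgment rule. The helper below is a hypothetical sketch, not a real database API:

```python
# Sketch of the write-acknowledgment rule across replication modes.
# Hypothetical helper for illustration only.

def acknowledge(leader_persisted, follower_acks, sync_count=1):
    """Return True once the write is safe to confirm to the client.

    leader_persisted: has the leader durably written the change?
    follower_acks: one boolean per follower that has confirmed so far.
    sync_count: followers that must confirm first (0 = fully async,
                len(follower_acks) = fully synchronous).
    """
    return leader_persisted and sum(follower_acks) >= sync_count

# Fully async: confirm immediately -- fast, but risks loss if the leader dies.
fully_async = acknowledge(True, [False, False], sync_count=0)
# Semi-sync: the client's write hangs until one follower confirms...
semi_sync_waiting = acknowledge(True, [False, False], sync_count=1)
# ...and succeeds as soon as any one follower has the data.
semi_sync_done = acknowledge(True, [True, False], sync_count=1)
```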
Your 30-second explanation: "Replication works by copying data changes from one or more nodes to other nodes. In leader-follower setups, the leader appends every write to an ordered log and streams it to followers that apply changes in sequence. Multi-leader and leaderless models distribute writes differently but solve the same core problem. The big tradeoff is synchronous versus asynchronous propagation: synchronous gives you durability across nodes before confirming the write, but adds latency. Asynchronous is faster but risks losing recent writes if a node crashes before others catch up."
Replication Lag: The Source of Every Weird Bug
There's always a delay between a write landing on one node and that same write appearing on another. That gap has a name: replication lag.
Under normal conditions, it's tiny. Milliseconds, maybe low single-digit seconds. But under heavy write load, during network hiccups, or when a replica is recovering from a restart, lag can spike. And when it does, strange things happen. A user updates their profile, refreshes the page, and sees the old version because their read hit a lagging follower. Two users looking at the same dashboard see different numbers. An event that happened second appears to have happened first.
This isn't unique to leader-follower setups. Leaderless systems face their own version of the problem: a client might read from a node that hasn't yet received a recent write from another node. The quorum mechanism (requiring reads from R nodes and writes to W nodes where R + W > N) helps, but it doesn't eliminate staleness entirely, especially during network partitions.
Interview tip: When you tell your interviewer "reads go to replicas," expect the follow-up: "What happens if a replica is behind?" Having a concrete answer ready (route recent reads to the leader, use monotonic read guarantees, or accept staleness for this use case) separates you from candidates who treat replication as a magic checkbox.
Your interviewer cares about replication lag because it reveals whether you understand that "replicated" does not mean "instantly consistent everywhere." Every system you design that involves read replicas needs you to articulate what staleness is acceptable and what you'll do when lag exceeds that threshold. If you can name the lag, reason about its impact, and propose a mitigation, you're demonstrating exactly the kind of tradeoff thinking that scores well.
Patterns You Need to Know
In an interview, you'll usually need to pick a specific approach. Here are the ones worth knowing.
Leader-Follower (Single Leader)
One node is the leader. It's the only node that accepts writes. Every other node is a follower that receives a copy of those writes and serves read traffic. When a client wants to update a row, that request goes to the leader. The leader writes it locally, appends it to its replication log, and the followers consume that log to stay in sync. PostgreSQL, MySQL, and MongoDB all default to this model.
The interesting part, and the part interviewers will push on, is what happens when the leader dies. A follower needs to be promoted, and that process is messier than most candidates admit. First, the system has to detect that the leader is actually down (not just slow). Then it picks the most up-to-date follower and promotes it. But if the old leader comes back online thinking it's still in charge, you've got split-brain: two nodes accepting writes, and your data is now diverging. Production systems use fencing tokens or leader leases to prevent this. If you mention failover in your interview, mention the risks too. Saying "we just promote a follower" without acknowledging potential data loss or split-brain will cost you points.
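The promotion choice itself can be sketched simply, assuming each follower reports how far through the replication log it has applied (names are illustrative; real failover also involves failure detection, fencing, and client rerouting):

```python
# Sketch of leader promotion: pick the follower that has applied the
# most of the replication log, minimizing lost writes.

def pick_new_leader(follower_log_positions):
    """follower_log_positions: {follower_id: last applied log position}."""
    return max(follower_log_positions, key=follower_log_positions.get)

candidates = {"replica_a": 1041, "replica_b": 1037}
new_leader = pick_new_leader(candidates)   # "replica_a" -- least data loss
# Any writes the old leader accepted past position 1041 are gone.
```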
Interview tip: When the interviewer asks how you'd make your database highly available, leader-follower with async replication is your safe starting answer. Name it explicitly: "I'd set up a leader with two async followers, knowing that gives us eventual consistency on reads and a small window of potential data loss on failover."
When to reach for this: Most OLTP workloads where your writes come from a single region. This covers the vast majority of interview scenarios.

Multi-Leader Replication
Now imagine your users are split across the US and Europe, and you're running a single-leader setup. Every write from European users has to cross the Atlantic to hit the leader in Virginia, adding 80-120ms of latency per write. That's painful.
Multi-leader replication solves this by placing a leader in each region. Both leaders accept writes independently, then asynchronously sync their changes with each other. Users get local write latency, and the system stays available even if one region goes down. The catch? Conflicts. Two leaders can modify the same row at roughly the same time, and now you need a strategy to reconcile those writes. The simplest approach is last-write-wins (LWW), where you attach a timestamp to each write and the later one wins. It's easy to implement but silently drops data. For something like a collaborative document (think Google Docs), you'd need something smarter: CRDTs (conflict-free replicated data types) or operational transforms that can merge concurrent edits without losing either user's changes.
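A minimal LWW merge looks like this (timestamps and values are invented for illustration), and it makes the data-loss hazard concrete:

```python
# Last-write-wins merge: keep the write with the later timestamp.
# Simple and convergent, but the losing write is silently discarded.

def lww_merge(write_a, write_b):
    """Each write is (timestamp, value); the later timestamp wins.
    Comparing tuples also breaks timestamp ties deterministically by
    value, so every leader converges to the same answer."""
    return max(write_a, write_b)

us_write = (1700000005, "name=Alice")   # applied at the US leader
eu_write = (1700000002, "name=Alicia")  # applied at the EU leader
winner = lww_merge(us_write, eu_write)
# Both regions converge on "name=Alice" -- and the EU user's edit is lost.
```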
Conflict resolution is the thing interviewers want to hear you reason about. Don't just say "we'll use multi-leader." Say what happens when two users in different regions update the same record, and explain which resolution strategy fits your use case.
When to reach for this: Multi-datacenter deployments where write latency matters, or collaborative editing scenarios. If the interviewer mentions "users in multiple continents writing data," this is your cue.

Leaderless Replication (Dynamo-Style)
No leader at all. Any node can accept both reads and writes. The client sends a write to multiple nodes simultaneously, and the write is considered successful once a threshold number of nodes acknowledge it. Reads work the same way: the client reads from multiple nodes and uses conflict resolution mechanisms to determine the correct value.
This threshold is the quorum. With N total replicas, you write to W nodes and read from R nodes. As long as W + R > N, at least one node in your read set will have the latest write. A common configuration is N=3, W=2, R=2. The beauty of this approach is that no single node is special, so there's no failover process, no leader election, no split-brain. If a node goes down, the remaining nodes keep serving traffic. DynamoDB and Cassandra both use this model.
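The overlap condition is worth doing out loud. A hypothetical one-liner makes the failing configurations obvious:

```python
# Quorum arithmetic: a read set of R nodes and a write set of W nodes
# are guaranteed to overlap in at least one node only when W + R > N.

def quorums_overlap(n, w, r):
    return w + r > n

# N=3, W=2, R=2: overlap guaranteed -- the classic configuration.
classic = quorums_overlap(3, 2, 2)          # True
# N=3, W=1, R=1: no overlap -- reads can miss the latest write entirely.
fast_but_stale = quorums_overlap(3, 1, 1)   # False
# N=3, W=1, R=2: W + R = 3 equals N but does not exceed it. Still unsafe.
borderline = quorums_overlap(3, 1, 2)       # False
```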
So how does a client actually figure out which value is "latest" when it reads from multiple nodes and gets back different answers? This is where version vectors come in. Each node tracks a vector of logical clocks, one per replica, that gets incremented on every write. When the client collects responses from R nodes, it compares these version vectors to determine the causal ordering of writes. If one vector strictly dominates another (every component is greater than or equal, with at least one strictly greater), that write happened later. But if neither dominates, the writes are truly concurrent, and the client or application has to merge them. Cassandra, for instance, defaults to last-write-wins for this merge, while the original Dynamo design surfaced both conflicting values to the application and let your code decide.
Reading from multiple nodes also enables read repair. When the client detects that one of the R nodes returned a stale value (its version vector is behind), it writes the latest value back to that stale node right then and there. This is an opportunistic healing mechanism: every read is also a chance to fix inconsistencies. Some systems complement this with an anti-entropy process, a background task that continuously scans replicas and reconciles differences. Read repair catches staleness on hot keys quickly; anti-entropy catches it on cold keys that nobody reads for a while.
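Here's a sketch of both mechanisms together, assuming one logical-clock entry per replica. This is illustrative only, not Cassandra's or Dynamo's actual implementation:

```python
# Version-vector comparison plus read repair, in miniature.
# A version vector is a dict mapping replica id -> logical clock.

def compare(va, vb):
    """Return 'a', 'b', 'equal', or 'concurrent' for two version vectors."""
    nodes = set(va) | set(vb)
    a_ahead = any(va.get(n, 0) > vb.get(n, 0) for n in nodes)
    b_ahead = any(vb.get(n, 0) > va.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"   # neither dominates: the application must merge
    if a_ahead:
        return "a"
    if b_ahead:
        return "b"
    return "equal"

def read_with_repair(replies):
    """replies: {replica_id: (version_vector, value)} collected from R nodes.
    Pick the dominating value and report which replicas need repair."""
    latest_id, (latest_vv, latest_val) = next(iter(replies.items()))
    for rid, (vv, val) in replies.items():
        if compare(vv, latest_vv) == "a":
            latest_id, latest_vv, latest_val = rid, vv, val
    stale = [rid for rid, (vv, _) in replies.items()
             if rid != latest_id and compare(latest_vv, vv) == "a"]
    return latest_val, stale   # caller writes latest_val back to stale nodes

replies = {
    "n1": ({"n1": 2, "n2": 1}, "v2"),   # has the newest write
    "n2": ({"n1": 1, "n2": 1}, "v1"),   # one write behind
}
value, stale_nodes = read_with_repair(replies)   # "v2", repair ["n2"]
```

The `compare` function is also where concurrency shows up: two vectors where each is ahead on a different component come back `"concurrent"`, and no amount of quorum math can order them for you.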
There's another subtlety that trips people up. During a network partition, the system might use a sloppy quorum: writes land on nodes that aren't the "correct" replicas for that key, just to keep the system available. Those writes get forwarded to the right nodes later through a process called hinted handoff. This keeps the system running but temporarily weakens your consistency guarantee, even if your W + R math looks right on paper.
Common mistake: Candidates set W=1 and R=1 with N=3, then claim they have consistent reads. They don't. W + R = 2, which is less than N = 3, meaning reads can miss the latest write entirely. Always do the math out loud so the interviewer sees you understand the tradeoff you're making.
Key insight: Quorum math tells you whether your read set overlaps with your write set. Version vectors tell you which value in that overlap is actually the latest. Read repair tells you how stale nodes catch up. These three mechanisms work together. If an interviewer asks "how does a leaderless system stay consistent?", walk through all three, not just the quorum formula.
When to reach for this: High-availability key-value stores or systems where you need to survive node failures without any downtime for leader election. If the problem calls for "always writable" semantics and can tolerate eventual consistency, leaderless is your answer.

Picking the Right Pattern
| | Leader-Follower | Multi-Leader | Leaderless |
|---|---|---|---|
| Write throughput | Limited by single leader | Higher (parallel writes per region) | Highest (any node accepts writes) |
| Consistency | Strong (if sync) or eventual (if async) | Eventual, conflicts possible | Eventual, quorum-tunable |
| Conflict handling | None needed (single writer) | Requires resolution strategy | Version vectors, read repair |
| Typical use cases | Most OLTP apps, web backends | Multi-region apps, collaborative editing | Key-value stores, shopping carts |
For most interview problems, you'll default to leader-follower. It's the simplest to reason about, the easiest to explain, and it matches what the majority of production systems actually use. Reach for multi-leader when the interviewer introduces multi-region writes or real-time collaboration. Reach for leaderless when the problem prioritizes availability above all else and the data model is simple enough that eventual consistency won't cause user-visible bugs (think session stores, analytics counters, or shopping carts where merging is straightforward).
What Trips People Up
Here's where candidates lose points, and it's almost always one of these.
The Mistake: Forgetting Read-After-Write Consistency
A candidate designs a system with a leader and two read replicas. They route all reads to the followers for scalability. The interviewer asks: "A user updates their profile name and immediately refreshes the page. What do they see?"
The candidate pauses. They hadn't thought about it.
Here's what happens: the user's write goes to the leader, the response comes back confirming success, and then the user's next GET request hits a follower that's 200ms behind. The page still shows the old name. From the user's perspective, the system just ate their update.
Common mistake: Candidates say "reads go to replicas" and move on. The interviewer hears "this person hasn't dealt with replication lag in practice."
The fix isn't complicated, but you need to know it exists. For data the current user just wrote, route their reads back to the leader for a short window (say, 10 seconds after their last write). Alternatively, track the replication log position at write time and only serve the read from a follower that's caught up past that position. This is called a "read-your-writes" (or read-after-write) guarantee.
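A sketch of the log-position approach, with hypothetical names:

```python
# Read-after-write routing: only serve a user's read from a follower
# that has applied the log at least as far as that user's last write.
# Illustrative sketch, not a real driver's API.

def choose_read_node(user_last_write_pos, follower_positions, leader="leader"):
    """follower_positions: {follower_id: last applied log position}.
    Return a follower caught up past the user's write, else the leader."""
    for follower, applied_pos in follower_positions.items():
        if applied_pos >= user_last_write_pos:
            return follower
    return leader   # no follower is caught up: fall back to the leader

followers = {"replica_a": 95, "replica_b": 120}
# This user's last write landed at log position 110.
node = choose_read_node(110, followers)      # replica_b has caught up
lagging = choose_read_node(130, followers)   # nobody has -> read the leader
```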
Interview tip: When you propose read replicas, immediately volunteer this edge case before the interviewer asks. Say: "One thing we need to handle is read-after-write consistency. For the writing user, we'll route reads to the leader for recently modified data. Other users can tolerate a few hundred milliseconds of staleness."
That kind of unprompted awareness signals real experience.
The Mistake: Treating Failover Like a Button Press
"If the leader goes down, we just promote a follower."
Interviewers hear this constantly, and it's the sentence that triggers the follow-up questions that sink candidates. Because failover is where replication gets genuinely hard.
With asynchronous replication, the leader might have accepted writes that hadn't yet propagated to any follower. When you promote a follower, those writes are gone. Not stale. Gone. If that was a financial transaction or an order confirmation, you have a real problem.
It gets worse. Say the old leader comes back online after a network blip. It still thinks it's the leader. Now you have two nodes accepting writes independently. This is split-brain, and it can corrupt your data in ways that take weeks to untangle. Production systems use fencing tokens (a monotonically increasing number that lets other nodes reject requests from a stale leader) to prevent this, but most candidates don't mention it.
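Fencing can be sketched as the storage layer tracking the highest leadership epoch it has seen (illustrative only, not any real system's protocol):

```python
# Sketch of fencing tokens: the store remembers the highest leadership
# epoch seen so far and rejects writes carrying an older one.

class FencedStore:
    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        """Accept the write only if the sender's epoch is current."""
        if epoch < self.highest_epoch:
            return False            # stale leader: request rejected
        self.highest_epoch = epoch  # remember the newest epoch seen
        self.data[key] = value
        return True

store = FencedStore()
store.write(epoch=1, key="order:42", value="paid")      # original leader
# Failover: the promoted follower is issued epoch 2 and writes fine.
store.write(epoch=2, key="order:42", value="shipped")
# The old leader wakes up, still on epoch 1 -- its write bounces.
rejected = store.write(epoch=1, key="order:42", value="paid")   # False
```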
What to say instead: "Failover involves promoting the most up-to-date follower, but with async replication we accept that some recent writes may be lost. We mitigate split-brain by using a fencing mechanism so the old leader can't continue accepting writes if it comes back. For our most critical data, we might make one follower semi-synchronous so at least one replica is always fully caught up before we acknowledge writes."
That answer shows you understand the cost. Saying "we just failover" shows you don't.
The Mistake: Confusing Replication with Sharding
This one seems basic, but it comes up more than you'd expect. A candidate says "we'll replicate the data across ten nodes to handle the write throughput." That's not replication. That's sharding (or partitioning).
Replication copies the same data to multiple nodes. Every replica holds a full copy. It buys you fault tolerance and read scalability, but it does nothing for write throughput because every write still has to reach every copy.
Sharding splits different data across nodes. User IDs 1-1000 on shard A, 1001-2000 on shard B. It buys you write scalability because different writes go to different nodes.
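The contrast fits in a few lines. This is a toy sketch with a deliberately simple hash, not production routing logic:

```python
# Replication fans a write out to every node; sharding routes each
# write to exactly one node chosen by the key.

NODES = ["node_a", "node_b", "node_c"]

def replicate(key):
    """Replication: the same write reaches every node (full copies)."""
    return NODES

def shard(key):
    """Sharding: each key lives on exactly one node.
    (A toy byte-sum hash; real systems use consistent hashing.)"""
    return NODES[sum(key.encode()) % len(NODES)]

replica_targets = replicate("user:1")   # all three nodes do this write
shard_target = shard("user:1")          # exactly one node does this write
```

That's why replication alone never increases write throughput: `replicate` always returns every node.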
Common mistake: Candidates conflate the two and say things like "we'll replicate across shards." The interviewer will ask: "Are you replicating or partitioning?" If you stumble here, it undermines everything else you've said about your data layer.
In practice, most production databases use both. You shard your data across multiple partitions, and then each partition is replicated for fault tolerance. When you're whiteboarding, be precise about which one you're reaching for and why.
The Mistake: Getting Quorum Math Wrong (or Trusting It Too Much)
In leaderless systems like Cassandra or DynamoDB, candidates love to drop the quorum formula: "We write to W nodes, read from R nodes, and as long as W + R > N, we're consistent."
Two problems show up here.
First, some candidates set the numbers wrong. With N=3, they'll say W=1 and R=2. That's W + R = 3, which equals N but doesn't exceed it. Depending on timing, you can still read stale data. Or they'll say W=1, R=1 for "performance" and not realize they've completely abandoned any consistency guarantee.
Second, and this is the subtler trap: even correct quorum math doesn't give you strong consistency in practice. Sloppy quorums (where writes land on nodes outside the designated replica set during a partition) break the overlap guarantee entirely. Concurrent writes to different nodes can arrive in different orders, and without a conflict resolution strategy, you get silent data corruption.
Interview tip: When discussing quorums, don't just recite the formula. Say something like: "With N=3, W=2, R=2, we get overlap on at least one node. But this is still eventual consistency in practice because of edge cases like sloppy quorums and concurrent writes. For this use case, that's acceptable because we're storing session data, not financial records."
The formula is the starting point, not the whole answer. Interviewers want to see that you know where the model breaks down.
How to Talk About This in Your Interview
When to Bring It Up
Replication isn't usually the question. It's a decision you make inside the question, and interviewers notice whether you make it deliberately or just wave your hands.
Here are the cues that should trigger you to talk about replication explicitly:
- "How would you make this highly available?" This is the most direct one. If a database or datastore is a single point of failure, replication is your first answer.
- "We need to handle high read throughput." When the read-to-write ratio is heavily skewed (think product catalog pages, user profiles), read replicas are the natural move.
- "Users are distributed globally." The moment geography enters the conversation, you need to reason about where replicas live and whether writes should be accepted in multiple regions.
- "What happens if this node goes down?" They're testing your failure thinking. If you haven't already introduced replicas, now's the time.
- "How fresh does this data need to be?" This is a consistency question in disguise. Your answer should reference your replication strategy and the lag it introduces.
Every time you bring up replication, finish the sentence. Don't say "we'll add replicas." Say what kind of replication, whether it's sync or async, and what consistency guarantee that gives you. That one habit separates candidates who understand replication from candidates who are just name-dropping it.
Sample Dialogue
Interviewer: "So this is a user-facing e-commerce platform. How are you making the database layer highly available?"
You: "I'd set up leader-follower replication with two read replicas. All writes go to the leader, and we replicate asynchronously to the followers. Reads for things like product listings and order history get distributed across the followers, which takes pressure off the leader. Since we're async, there's a small window where a replica might serve slightly stale data, but for browsing products that's completely fine."
Interviewer: "Okay, but what if the leader dies?"
You: "Right, so failover. One of the followers gets promoted to leader. The risk with fully async replication is that the old leader might have accepted some writes that hadn't propagated yet, so those could be lost. For this system, I'd actually make one of the two followers semi-synchronous. That means at least one follower is always caught up, so when we promote it, we minimize data loss. The other follower stays async to keep write latency low."
Interviewer: "What about split-brain? What if the old leader comes back and thinks it's still the leader?"
You: "Yeah, that's the scary scenario. You need fencing. The new leader gets a fencing token, basically an incrementing epoch number, and any writes from the old leader with a stale token get rejected by the storage layer. Most managed databases handle this through their orchestration tooling, but if we're running our own, we'd need to implement this explicitly. The key point is that failover isn't just 'promote a follower.' You have to make sure the old leader can't corrupt things when it wakes back up."
Interviewer: "And if we expand to serve users in Europe too?"
You: "Then single-leader gets painful, because every European write has to cross the Atlantic to hit the leader. That's 80-150ms of added latency per write. I'd consider multi-leader replication at that point, with a leader in each region. The tradeoff is conflict resolution. If two users in different regions modify the same record, we need a strategy. For most of our data, last-write-wins with timestamps is probably sufficient. For something like inventory counts, we'd need something more careful, maybe a CRDT-based counter or application-level merge logic."
Follow-Up Questions to Expect
"How much replication lag is acceptable?" Have a number ready. For most async setups, you're looking at milliseconds to low single-digit seconds under normal conditions. Say that, then add: "But under heavy write load or network issues, it can spike. We'd monitor p99 replication lag and alert if it exceeds our SLA."
"Why not just make all replication synchronous?" Because every write would block until all replicas acknowledge, which tanks your write latency and means a single slow replica can stall your entire write path. One semi-sync follower is the standard production compromise.
"How is replication different from sharding?" Replication copies the same data to multiple nodes for availability and read scaling. Sharding splits different data across nodes for write scaling and storage capacity. Most production systems use both together.
"What consistency model are you choosing here?" Don't panic. Just name it: "With async replication, reads from followers are eventually consistent. If a user needs to see their own write immediately, we route that specific read to the leader. Everything else can tolerate a few hundred milliseconds of staleness."
What Separates Good from Great
- A mid-level answer says "we'll replicate the database for high availability." A senior answer specifies leader-follower with async replication, names the consistency tradeoff, and explains why it's acceptable for this particular workload. The gap is specificity.
- Mid-level candidates treat failover as automatic and free. Senior candidates acknowledge the cost: potential data loss with async replication, the need for fencing tokens, the operational complexity of leader election. Interviewers love hearing you reason about what goes wrong, not just what goes right.
- When multi-region comes up, a great candidate doesn't just switch to "multi-leader." They explain why single-leader fails (cross-region write latency), name the conflict resolution challenge they're inheriting, and pick a strategy. If you're unsure which replication model fits, default to leader-follower with one semi-synchronous replica. It's the most battle-tested pattern in production, and saying so shows you know what real systems actually run.
Key takeaway: Never mention replication without stating the consistency tradeoff it creates. "We replicate with X strategy, which gives us Y consistency, and that's acceptable here because Z" is the sentence structure that signals real understanding to your interviewer.
