Slack routes each workspace's messages to a specific database shard so that searching your team's history never touches anyone else's data. Discord shards by guild. Instagram sharded by user ID when a single Postgres node could no longer absorb their write volume. The pattern shows up at every company that outgrows a single database, and it shows up in almost every senior engineering interview for the same reason: it's one of the few scaling decisions that's genuinely hard to undo once you've made it.
The core idea is simple enough. Instead of one machine holding all your data, you split it across many machines, each owning a slice. Interviewers use "partitioning" and "sharding" interchangeably, but if you want to be precise: partitioning is the general concept of dividing data into chunks, while sharding specifically means those chunks live on separate physical nodes. Drop that distinction naturally and you signal you've done more than skim a blog post.
A user writes a new order. That request hits your application layer, which needs to figure out one thing: which shard holds (or should hold) this data? The answer comes from the shard key, a field you've chosen from your data model. Maybe it's user_id, maybe it's order_id. Whatever it is, the routing layer takes that key, runs it through some deterministic logic (a hash, a range lookup, a directory check), and gets back a shard identifier. The request then goes directly to that shard and nowhere else.
Think of it like a mail sorting facility. Every letter has a zip code (the shard key), the sorting machine (routing layer) reads it, checks which truck goes to that region (shard map), and drops the letter on the right truck (shard). No letter needs to visit every truck. That's the whole point.
Here's what that flow looks like:
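As a minimal sketch, the routing logic can be expressed in a few lines. The shard map here is a hard-coded table of hash ranges (real systems load it from a metadata service); the byte-sized keyspace and shard names are illustrative assumptions, not a real API.

```python
import hashlib

# Hypothetical shard map: hash-range boundaries -> shard name.
# A real routing layer would fetch this from a config service.
SHARD_MAP = {
    (0, 85): "shard-a",
    (86, 170): "shard-b",
    (171, 255): "shard-c",
}

def route(shard_key: str) -> str:
    """Deterministically map a shard key to the shard that owns it."""
    # Hash the key into a small, stable keyspace (one byte here).
    digest = hashlib.sha256(shard_key.encode()).digest()[0]
    for (low, high), shard in SHARD_MAP.items():
        if low <= digest <= high:
            return shard
    raise RuntimeError("shard map does not cover the keyspace")

# The same key always routes to the same shard -- no broadcast needed.
assert route("user:42") == route("user:42")
```

The important property is determinism: given the key, the routing layer computes the destination without asking any shard.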

If you remember one thing from this section, make it this: the shard key is the single most important decision in your sharding design. Everything downstream depends on it.
A good shard key distributes writes evenly across shards and ensures that the queries your application runs most frequently can be satisfied by hitting a single shard. A bad shard key funnels disproportionate traffic to one node while the others sit idle. When you tell your interviewer "I'd shard by X," they're immediately evaluating whether X creates hot spots, whether it aligns with your query patterns, and whether it forces cross-shard joins.
You don't get to change your shard key easily after the fact. Picking the wrong one is closer to a schema redesign than a config change.
A strong answer sounds like: "I'd shard by user_id because 90% of our queries are scoped to a single user, so each query hits exactly one shard." That one sentence shows the interviewer you're thinking about access patterns, not just picking a field at random.

Between the routing layer and the actual shards sits a metadata service, often called the shard map (or partition map, or config server). This is the lookup that answers: "given this key or hash value, which physical node owns it?"
The shard map might be a simple in-memory table that says "hash values 0-1000 go to Shard A, 1001-2000 go to Shard B." It might be a dedicated coordination service like ZooKeeper or etcd. In MongoDB, it's the config server replica set. The routing layer consults this map on every request, which means it needs to be fast and highly available. If the shard map goes down, your routing layer can't route anything.
In practice, most systems cache the shard map aggressively at the routing layer and only refresh it when shards are added, removed, or rebalanced. This keeps the hot path (normal reads and writes) from depending on a synchronous metadata lookup every single time.
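That caching pattern can be sketched as a TTL cache wrapped around the metadata lookup. The `fetch_fn` callable stands in for a call to the config service, and the TTL value is an arbitrary assumption for illustration:

```python
import time

class CachedShardMap:
    """Sketch of a routing-layer cache over a shard map service.

    `fetch_fn` stands in for a network call to the metadata service
    (e.g. a config server); the name is a hypothetical placeholder.
    """

    def __init__(self, fetch_fn, ttl_seconds=30.0):
        self._fetch = fetch_fn
        self._ttl = ttl_seconds
        self._cached = None
        self._fetched_at = 0.0

    def lookup(self, hash_value: int) -> str:
        # Hot path: serve from the local cache; only hit the metadata
        # service when the cached copy has expired.
        if self._cached is None or time.monotonic() - self._fetched_at > self._ttl:
            self._cached = self._fetch()
            self._fetched_at = time.monotonic()
        for (low, high), shard in self._cached.items():
            if low <= hash_value <= high:
                return shard
        raise KeyError(hash_value)

shard_map = CachedShardMap(lambda: {(0, 1000): "shard-a", (1001, 2000): "shard-b"})
assert shard_map.lookup(500) == "shard-a"
```

Production systems additionally invalidate the cache on rebalance events rather than relying on TTL expiry alone, but the shape is the same: the hot path stays local.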
Each shard is a fully independent database instance. It has its own tables, its own indexes, its own query engine. It handles its own replication to followers for read scaling and failover. It has its own failure domain, meaning if Shard B's disk dies, Shards A and C keep serving traffic without interruption.
This independence is both the power and the pain of sharding. Power, because a problem on one shard is isolated. Pain, because you now have N independent databases to monitor, back up, upgrade, and keep in sync with your schema changes.
Deterministic routing means no broadcast. The whole point is that given a shard key, you can compute exactly which node to talk to without asking all of them. The moment you lose this property (because a query doesn't include the shard key, for example), you fall back to scatter-gather across every shard, which kills the performance benefit. Interviewers will test whether you understand this by asking about queries that don't include the shard key.
The shard map is a potential bottleneck and single point of failure. If the interviewer asks "what could go wrong with this architecture," the shard map is a strong answer. You should be ready to explain how caching, replication, and versioning of the shard map mitigate this.
Shard independence means no free joins. Data on Shard A can't do a local join with data on Shard B. There's no shared memory, no shared disk. If your interviewer hears you casually propose a JOIN across two entities that live on different shards, they'll push back. That awareness, knowing what you lose when data is split, is exactly what separates a surface-level answer from a strong one.
In an interview, you'll usually need to pick a specific approach. Here are the ones worth knowing.
Each of these patterns answers the same question differently: given a shard key, which node gets the data? Your job is to know the tradeoffs well enough to pick the right one for the scenario and explain why you're picking it.
Take the shard key (say, user_id), run it through a hash function, then mod the result by the number of shards. If you have 4 shards and hash(user_id) % 4 = 2, that record lives on shard 2. That's it. The beauty is simplicity: the distribution is nearly uniform regardless of what the underlying keys look like, because the hash function scrambles them.
The problem shows up the moment your cluster size changes. If you go from 4 shards to 5, almost every key's mod result changes. That means nearly all your data needs to move. For a 100GB database, you're looking at a massive, disruptive migration just to add one node. This is why hash-based sharding works best when your cluster size is fixed or changes very rarely. If the interviewer's scenario involves a predictable, stable workload (an internal analytics system, a fixed-size cache tier), this is a perfectly reasonable choice.
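You can demonstrate the churn directly. This sketch counts how many of 10,000 synthetic keys change shards when the cluster grows from 4 to 5 nodes (the key names and shard counts are made up for illustration):

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    # Stable hash; Python's built-in hash() is salted per process,
    # so a cryptographic digest is used for reproducibility.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % n_shards

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
# With mod-N placement, a key stays put only when h % 4 == h % 5,
# which happens for just 4 of every 20 hash values: ~80% of keys move.
print(f"{moved / len(keys):.0%} of keys moved")
```

Roughly four out of five keys relocate just to add one node, which is the migration cost the paragraph above warns about.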
When to reach for this: the interviewer describes a system with a known, stable number of nodes and no expectation of frequent scaling. Or you're early in the discussion and want to start simple before upgrading to consistent hashing.

Instead of hashing, you divide the key space into contiguous ranges and assign each range to a shard. User IDs 1 through 999 go to Shard A, 1000 through 1999 go to Shard B, and so on. The routing layer just needs to know the boundary values to figure out where a request belongs.
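The boundary lookup is a binary search over sorted range starts. A minimal sketch, with made-up boundaries and shard names:

```python
import bisect

# Start of each contiguous range, kept sorted, and the shard owning it.
BOUNDARIES = [1, 1000, 2000]          # user IDs 1-999, 1000-1999, 2000+
SHARDS = ["shard-a", "shard-b", "shard-c"]

def route(user_id: int) -> str:
    # bisect_right finds the rightmost boundary <= user_id,
    # which identifies the range the key falls into.
    idx = bisect.bisect_right(BOUNDARIES, user_id) - 1
    return SHARDS[idx]

assert route(500) == "shard-a"
assert route(1999) == "shard-b"
assert route(5000) == "shard-c"
```

Note that adjacent keys land on the same shard, which is exactly what makes range scans cheap here.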
The killer feature here is range queries. Need all orders from January? If you've partitioned by timestamp, those records sit on one or two shards, and you can scan them efficiently without touching every node in the cluster. Hash-based sharding can't do this; a range scan would scatter across all shards.
The killer flaw is hot spots. If you partition a time-series dataset by month and 90% of your reads and writes target the current month, one shard is on fire while the others sit idle. Same thing happens with alphabetical partitioning if your user base skews heavily toward certain name prefixes. The distribution is only as good as your access patterns, and access patterns are rarely uniform.
When to reach for this: the interviewer's scenario involves time-series data, ordered scans, or any query pattern where "give me everything between X and Y" is common. Think logging systems, analytics pipelines, or event stores.

This is the one interviewers expect you to know cold. Picture a circular number line (a "ring") from 0 to some large number, like 2^32. Each shard gets hashed onto a position on this ring. When a key arrives, you hash it onto the same ring and walk clockwise until you hit the first shard. That shard owns the key.
Why does this matter? When you add a new shard, it lands at one position on the ring and only takes over keys from its immediate clockwise neighbor. Everything else stays put. Removing a shard works the same way in reverse. Instead of rehashing the entire dataset (like hash-based sharding forces you to), you're only moving a fraction of the keys. For a cluster with N shards, adding one node redistributes roughly 1/N of the data.
There's a catch, though. With only a few physical nodes on the ring, the distribution can be wildly uneven. One node might own a huge arc of the ring while another owns a sliver. The fix is virtual nodes: each physical machine gets placed at multiple positions on the ring (say, 150 to 200 virtual positions per node). This smooths out the distribution dramatically. When the interviewer asks "but what about uneven load?", virtual nodes is your answer.
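The ring, virtual nodes, and the "only 1/N moves" claim fit in a short sketch. The vnode count and shard names are illustrative assumptions; real implementations (e.g. in Cassandra) are more elaborate:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (a sketch)."""

    def __init__(self, nodes, vnodes=150):
        self._ring = []  # sorted list of (position, node)
        self._vnodes = vnodes
        for node in nodes:
            self.add(node)

    def _pos(self, key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Each physical node occupies many ring positions,
        # which smooths out the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._pos(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # Walk clockwise: first ring position at or after the key's hash.
        idx = bisect.bisect(self._ring, (self._pos(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"user-{i}" for i in range(10_000)]
before = {k: ring.owner(k) for k in keys}
ring.add("shard-d")
moved = sum(before[k] != ring.owner(k) for k in keys)
# Adding a 4th node moves roughly 1/4 of the keys, not all of them.
print(f"{moved / len(keys):.0%} of keys moved")
```

Compare this with the ~80% churn of hash-mod-N: that difference is the entire argument for consistent hashing.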
When to reach for this: any time the interviewer pushes on elasticity, auto-scaling, or "what happens when a node goes down." This is the default answer for systems that need to grow and shrink gracefully. DynamoDB and Cassandra both use variants of this approach, and mentioning that signals you've seen it in practice.

Forget algorithms entirely. Just maintain an explicit lookup table that says "key X lives on shard 3, key Y lives on shard 1." Every request hits this directory service first, gets told where to go, and then proceeds to the right shard.
This gives you maximum flexibility. Want to move a single high-traffic customer to a dedicated shard? Update one row in the directory. Want to rebalance by migrating a batch of keys? Update the mappings. No hash functions, no ring math, no worrying about whether your key distribution is uniform. You control placement directly.
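In its simplest form the directory is just a mutable mapping; everything else (replication, caching, failover) is operational machinery around it. The tenant names here are hypothetical:

```python
# Hypothetical directory: explicit key -> shard mapping, the kind of
# table a real system would keep in a replicated, cached service.
directory = {
    "tenant-acme": "shard-1",
    "tenant-globex": "shard-1",
    "tenant-bigco": "shard-2",   # large tenant pinned to its own shard
}

def route(tenant_id: str) -> str:
    return directory[tenant_id]

# Moving a hot tenant to dedicated hardware is a one-row update --
# no rehashing, no cluster-wide migration math.
directory["tenant-acme"] = "shard-3"
assert route("tenant-acme") == "shard-3"
```

The simplicity of that update is the whole appeal; the cost, as the next paragraph covers, is that the directory itself becomes critical infrastructure.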
The tradeoff is that the directory itself becomes a critical dependency. Every single read and write passes through it, so it needs to be fast (in-memory or heavily cached) and highly available (replicated, with failover). If the directory goes down, nothing can be routed. You've essentially traded algorithmic complexity for operational complexity.
When to reach for this: the interviewer describes a multi-tenant system where different tenants have wildly different sizes and you need fine-grained control over placement. Or when the access patterns are so irregular that no algorithmic approach can distribute load evenly.

| Pattern | Distribution Evenness | Range Query Support | Rebalancing Cost |
|---|---|---|---|
| Hash-based | High (uniform hashing) | None (data is scattered) | Very high (full rehash) |
| Range-based | Depends on access patterns | Excellent | Medium (split/merge ranges) |
| Consistent hashing | High (with virtual nodes) | None | Low (only 1/N keys move) |
| Directory-based | Fully controllable | Possible (if you design for it) | Low (update mappings), but operationally complex |
For most interview problems, you'll default to consistent hashing. It handles the "what about scaling?" follow-up gracefully, it's well-understood, and it shows the interviewer you know how real distributed databases work. Reach for range-based partitioning when the problem is fundamentally about ordered data (time-series, leaderboards, sequential IDs with range scans). And if the interviewer describes a multi-tenant scenario where one customer is 1000x larger than the rest, directory-based sharding lets you say "we can pin that tenant to dedicated infrastructure" without redesigning the whole system.
Here's where candidates lose points, and it's almost always one of these.
"I'd shard by country." Sounds reasonable until you remember that for most global products, the US or India alone might account for 60% of your traffic. You've just put the majority of your write load on a single node while the other shards sit idle. You haven't scaled anything. You've just made one machine work harder with extra routing overhead.
This isn't limited to geographic keys. Sharding by created_date in a time-series system means today's shard absorbs every single write while yesterday's shard does nothing. Sharding by merchant_id in a marketplace means the top 1% of sellers hammer one shard.
The fix depends on the situation. Compound keys help: instead of country, use country + user_id so that within each country, data fans out across multiple shards. Salting is another option: prepend a random prefix (say, 0-9) to the key so that one logical entity's data spreads across 10 shards. The tradeoff is that reads now need to query all 10 and merge results.
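Salting can be sketched in a few lines. The salt count and shard count are arbitrary illustration values; note how the read path has to fan out across every salt variant:

```python
import hashlib
import random

N_SALTS = 10  # one logical key fans out across 10 shard-key variants

def salted_key(user_id: str) -> str:
    # Write path: pick a random salt so a hot user's writes spread out.
    return f"{random.randrange(N_SALTS)}:{user_id}"

def all_salted_keys(user_id: str):
    # Read path: must query every salt variant and merge the results.
    return [f"{salt}:{user_id}" for salt in range(N_SALTS)]

def shard_for(key: str, n_shards: int = 8) -> int:
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % n_shards

# A single celebrity user now lands on several shards instead of one.
shards = {shard_for(k) for k in all_salted_keys("celebrity-1")}
print(sorted(shards))
```

The write-side win and the read-side cost are two halves of the same tradeoff, which is why salting is usually reserved for keys you know are hot.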
"We'll just query all the shards and combine the results."
That sentence makes interviewers wince. Yes, scatter-gather (fan out to all shards, collect responses, merge) is a real pattern. But candidates toss it out like it's free. It's not. If you have 50 shards, you're now issuing 50 parallel queries, waiting for the slowest one (tail latency kills you), and merging results in the routing layer. Sorting, pagination, aggregation across shards? Each of those is its own headache.
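Even the merge step alone is real work. This sketch shows the routing layer answering "latest 3 orders" by merging per-shard result streams; the shard results and timestamps are fabricated for illustration:

```python
import heapq

# Hypothetical per-shard results: (order_id, timestamp) pairs, each
# shard returning its own orders already sorted newest-first.
shard_results = [
    [("order-7", 300), ("order-2", 120)],   # shard A
    [("order-9", 350), ("order-5", 200)],   # shard B
    [("order-8", 310), ("order-1", 100)],   # shard C
]

def top_n(results, n):
    # The routing layer must merge N sorted streams to answer
    # "latest n orders" -- work a single shard would do locally,
    # and it cannot finish until the slowest shard has responded.
    merged = heapq.merge(*results, key=lambda r: r[1], reverse=True)
    return [order for order, _ in list(merged)[:n]]

print(top_n(shard_results, 3))  # -> ['order-9', 'order-8', 'order-7']
```

And this is the easy case: cross-shard pagination and aggregation require the router to hold or recompute far more state than a simple merge.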
The real problem usually starts earlier: the candidate chose a shard key that doesn't align with the query patterns. If you shard an e-commerce orders table by user_id but the most common dashboard query is "show all orders for this product," every single query becomes a scatter-gather across all shards.
What to say instead: acknowledge that cross-shard queries are expensive and explain how you'd design around them. Denormalization is one approach. Store a copy of the data organized by the secondary access pattern. A global secondary index is another, where a separate service indexes the cross-cutting dimension and points back to the right shards. The key insight is that your shard key should match your dominant query pattern, and you should handle the secondary patterns through purpose-built read paths.
"If we need more capacity, we just add more shards."
Just. That word is doing a terrifying amount of heavy lifting. When a candidate says this, the interviewer is going to push: "Okay, walk me through what happens to the data that's already on the existing shards."
With hash-based sharding (hash mod N), changing N means almost every key maps to a different shard. You're not moving 1/Nth of the data. You're potentially moving nearly all of it. Even with consistent hashing, where only ~1/N of keys need to move when you add a node, you still have to physically copy that data to the new shard, handle requests for keys that are mid-migration (do you read from the old shard or the new one?), and update the shard map atomically so no requests get lost.
There's a big difference between offline resharding (take the system down, redistribute, bring it back up) and live migration (move data in the background while serving traffic). Live migration requires dual-reads during the transition, a way to track which keys have been migrated, and careful coordination so writes don't go to the wrong place. MongoDB's balancer does this with chunk migration. It's complex machinery.
Under pressure, candidates blur sharding and replication together. They'll say something like "we'll shard the database for high availability" or "if a shard goes down, the other shards have the data."
No. The other shards do not have the data. That's the whole point.
Sharding splits data: each shard holds a different subset. If Shard B dies, the data on Shard B is gone (unless you've separately set up replication for it). Replication copies data: each replica holds the same data as its primary. Replication gives you fault tolerance and read scaling. Sharding gives you write scaling and larger total storage.
In practice, you almost always use both. Each shard is itself a replica set (a primary and one or more secondaries). So Shard B has its own replicas that can take over if the primary fails. But the data on Shard B is completely independent from the data on Shard A.
If an interviewer asks "what happens when a shard goes down?", the answer isn't "the other shards pick up the slack." The answer is "the shard's replica promotes to primary, and the routing layer redirects traffic to it." If you don't have replication within each shard, you have a single point of failure, and the interviewer will absolutely call that out.
Not every scaling problem is a sharding problem, and interviewers are testing whether you know that. Listen for these cues:
The cue you should not react to: "reads are slow." That's a caching or read-replica problem first. Jumping straight to sharding when the interviewer says "reads are slow" is one of the fastest ways to lose credibility.
"What shard key would you pick, and why?" Name the key and the access pattern in one breath: "user_id. Most write operations, like placing an order or updating a cart, are scoped to a single user. That means each write hits exactly one shard, no cross-shard coordination needed. I'd avoid something like product_id because a flash sale on one popular product would hammer a single shard while the others sit idle."

"How do you serve queries that don't include the shard key?" Acknowledge the cost first: "If we shard by user_id, a query like that becomes a scatter-gather across every shard, which is expensive. I'd handle that with a separate denormalized view or a secondary index. Maybe a dedicated analytics store that receives events asynchronously. You don't want to compromise your primary shard key choice to optimize for a reporting query that can tolerate a few seconds of staleness."

"How do you handle a hot shard?" Compound shard keys or salting. If celebrity_user_id gets disproportionate traffic, you can append a random suffix (salt) to spread that user's data across multiple shards, at the cost of needing to query all salt variants on reads.
"What happens to transactions that span multiple shards?" Two-phase commit works but kills performance. The better answer is to design your shard key so that related data lives on the same shard. When that's impossible, saga patterns with compensating actions are the standard approach.
"How does the routing layer know which shard to hit?" A shard map (sometimes called a configuration server) stores the key-to-shard mapping. The routing layer caches this locally and refreshes it when shards are added or removed. MongoDB's mongos router and its config servers are a concrete example you can reference.
"Can you name a real system that does this?" DynamoDB uses consistent hashing across partitions. PostgreSQL supports declarative range and list partitioning natively (though on a single machine, not distributed). MongoDB's sharded clusters auto-balance chunks across shard replicas. Pick whichever one you've actually worked with.
Close by naming what you've given up: cross-shard joins and aggregations like GROUP BY now require scatter-gather, and backups become a coordination problem across N independent databases. Ending with tradeoffs signals senior-level thinking.