CDNs: The Caching Layer Between Your Users and Your Origin

Dan Lee, Data & AI Lead
Last updated: March 6, 2026

Why This Matters

Picture this: your origin server lives in us-east-1, and a user in Tokyo is loading your image-heavy product page. Every single request, every thumbnail, every hero banner, every CSS file, crosses the Pacific Ocean and back. That round-trip is roughly 150ms of pure physics. You can optimize your database queries down to microseconds, tune your server to perfection, and it won't matter. Light through fiber optic cable only moves so fast. A CDN solves this by caching your content on servers physically close to your users. Think of it as a geographically distributed caching layer: hundreds of servers around the world, each holding copies of your content so users grab it from down the street instead of across the planet. This is how Netflix serves over 200 million users without routing every stream back to a single data center. Their Open Connect network places content servers inside ISPs themselves, so your Friday night movie literally starts from a box in your internet provider's building.

Most candidates in system design interviews say "we'll put a CDN in front of it" and move on, like it's a magic incantation. Interviewers notice. They'll follow up with "what's your TTL strategy?" or "how do you handle cache invalidation when content changes?" and suddenly the candidate is scrambling. CDNs reduce latency and offload origin traffic, yes. But they also introduce real complexity around stale data, cache key design, and consistency after writes. That's where the actual design thinking lives, and that's what separates a surface-level answer from one that lands.

One more thing candidates miss: CDNs aren't just for images and JavaScript files anymore. Modern CDNs handle API response caching, edge compute (running actual code at the PoP), DDoS absorption, and TLS termination. Cloudflare Workers, Lambda@Edge, these are CDN features. When you show an interviewer that you understand this full range, you signal depth that most candidates simply don't have. By the end of this lesson, you'll know exactly when to bring CDNs into a design, how to explain the mechanics clearly, and how to navigate the tradeoff questions that follow.

How It Works

When a user in Tokyo types your URL and hits enter, the very first thing that happens isn't a request to your server. It's a DNS lookup. And this is the step most candidates blow right past on the whiteboard.

The user's browser asks a DNS resolver "where is cdn.yourapp.com?" and instead of getting back the IP of your origin server in Virginia, it gets the IP of a CDN edge node in Tokyo. This routing happens through one of two mechanisms. Geo-DNS inspects the resolver's location and returns the nearest PoP's IP address. Anycast takes a different approach: multiple PoPs advertise the same IP address, and BGP (Border Gateway Protocol) routing decisions across autonomous systems naturally direct packets toward the PoP with the shortest AS path or lowest latency. Either way, the user never knows this happened. They just got silently redirected to a server that's physically close to them.

There's a DNS detail worth knowing here: the TTL (Time-to-Live) on that DNS record for cdn.yourapp.com controls how long resolvers cache the returned IP. A short TTL (say, 60 seconds) means resolvers re-query frequently, so if a CDN provider needs to reroute traffic away from a failing PoP, the change propagates fast. A long TTL reduces DNS lookup overhead but makes you slower to react. In an interview, if someone asks how quickly you can failover CDN traffic, the answer starts with DNS TTL.
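To make the routing step concrete, here's a minimal sketch of geo-DNS resolution in Python. The PoP table, region labels, and IP addresses are invented for illustration; real providers use much richer latency, capacity, and health data to pick a PoP.

```python
# Hypothetical geo-DNS sketch: map a resolver's region to the nearest PoP.
# Region names and IPs below are made up for illustration.

POP_BY_REGION = {
    "ap-northeast": "203.0.113.10",  # Tokyo edge
    "eu-central":   "203.0.113.20",  # Frankfurt edge
    "us-east":      "203.0.113.30",  # Virginia edge
}
DEFAULT_POP = "203.0.113.30"         # fall back to origin's region
DNS_TTL_SECONDS = 60                 # short TTL: resolvers re-query often,
                                     # so rerouting away from a bad PoP is fast

def resolve_cdn_hostname(resolver_region: str):
    """Return (PoP IP, DNS TTL) for cdn.yourapp.com based on resolver location."""
    ip = POP_BY_REGION.get(resolver_region, DEFAULT_POP)
    return ip, DNS_TTL_SECONDS
```

The TTL returned alongside the IP is the knob in the failover discussion: lower it and resolvers converge on a new PoP faster, at the cost of more DNS traffic.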

Think of it like a library system. You don't drive to the Library of Congress every time you want a book. You go to your local branch. If they have a copy, great. If not, they request one from the central collection, keep a copy on their shelf, and hand it to you.

Once the browser has the edge PoP's IP, it sends the actual HTTP request there. The edge node checks its local cache. If the content is there and hasn't expired, it serves it immediately. That's a cache hit, and it's the happy path. Response time drops from hundreds of milliseconds to single digits because the data traveled 20 miles instead of 6,000.

On a cache miss, the request has to go further. But it doesn't necessarily go all the way to your origin. Most CDN architectures have a middle layer called a shield cache (sometimes called a mid-tier cache). This is a larger, shared cache that sits between the edge PoPs and your origin. Multiple edge nodes in the same region funnel their misses through a single shield. So if the Tokyo edge misses but the Asia-Pacific shield has the content, the origin never gets touched.

Only when the shield also misses does the request finally reach your origin server. The origin generates the response and, critically, attaches caching headers: Cache-Control to say how long the content can be cached, ETag for conditional revalidation, sometimes Vary to indicate which request headers affect the response. The CDN obeys these headers. Your origin is the source of truth, and the CDN is just following its instructions.

Here's what conditional revalidation actually looks like in practice. Say the edge cached a product image with Cache-Control: max-age=3600 and an ETag: "abc123". An hour later, that max-age expires. The edge doesn't just throw the cached copy away and fetch a fresh one blindly. Instead, it sends a request to the origin (or shield) with the header If-None-Match: "abc123". The origin checks whether the content has changed. If it hasn't, the origin responds with a tiny 304 Not Modified, no body, no bandwidth wasted. The edge resets its freshness timer and keeps serving the same cached copy. If the content did change, the origin sends back the full new response with a new ETag. This mechanism is what keeps CDNs both fresh and efficient. Without it, every expiration would mean a full re-download.
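The revalidation flow is easy to sketch. The toy `Origin` and `EdgeEntry` classes below are illustrative stand-ins, but the If-None-Match / 304 logic is the real HTTP mechanism:

```python
import time

class Origin:
    """Toy origin: a response body plus the ETag derived from it."""
    def __init__(self, body, etag):
        self.body, self.etag = body, etag

    def handle(self, if_none_match):
        if if_none_match == self.etag:
            return 304, None, self.etag      # no body: "your copy is still valid"
        return 200, self.body, self.etag     # full response with the new ETag

class EdgeEntry:
    """One cached object at the edge, with a freshness deadline from max-age."""
    def __init__(self, body, etag, max_age):
        self.body, self.etag = body, etag
        self.fresh_until = time.time() + max_age

def revalidate(entry, origin, max_age):
    """Run when max-age has expired: conditional GET with If-None-Match."""
    status, body, etag = origin.handle(if_none_match=entry.etag)
    if status == 304:
        entry.fresh_until = time.time() + max_age   # reset the timer, keep the copy
        return entry
    return EdgeEntry(body, etag, max_age)           # content changed: replace it
```

On the 304 path, the only thing that crosses the wire is headers, which is why expirations are cheap when content hasn't actually changed.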

Interview tip: If an interviewer asks "how does the CDN know when content is stale?", walk them through the Cache-Control max-age plus ETag/If-None-Match flow. It shows you understand that freshness isn't just about timers; it's about conditional validation that avoids unnecessary data transfer.

The response flows back through the shield (which caches it), to the edge (which caches it), to the user. Every subsequent request for that same content from nearby users gets served straight from the edge. No origin involved.

Here's what that flow looks like:

CDN Request Lifecycle: Cache Hit vs. Cache Miss
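The same lifecycle can be sketched as cache tiers stacked in front of an origin fetch. This is a toy model (in-memory dicts, no TTLs), and names like `apac-shield` are invented, but it shows how the shield collapses misses from multiple edges into a single origin request:

```python
class Tier:
    """One cache tier (edge or shield) in front of an upstream fetch function."""
    def __init__(self, name, upstream_fetch):
        self.name, self.cache, self.upstream = name, {}, upstream_fetch
        self.hits = self.misses = 0

    def fetch(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]           # cache hit: served locally
        self.misses += 1
        value = self.upstream(key)           # edge miss -> shield; shield miss -> origin
        self.cache[key] = value              # populate the cache on the way back
        return value

origin_calls = []
def origin(key):
    origin_calls.append(key)                 # track how often origin is touched
    return f"content-for-{key}"

shield = Tier("apac-shield", origin)
tokyo_edge = Tier("tokyo-edge", shield.fetch)
osaka_edge = Tier("osaka-edge", shield.fetch)
```

If Tokyo misses first and Osaka misses next, the Osaka miss stops at the shield: the origin is fetched exactly once for both edges.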

Why Your Interviewer Cares About Each Piece

DNS routing is the invisible load balancer, and the first point of failure. If you draw a CDN on your whiteboard but skip the DNS step, the interviewer will wonder how users actually get routed to the right PoP. Mentioning geo-DNS or anycast (and that anycast relies on BGP path selection) shows you understand the full request path, not just the caching part. It also shows you recognize where things break. A misconfigured DNS record, a stale resolver cache from an overly long TTL, a BGP route leak: these can silently send all your users to the wrong continent. Five seconds of explanation separates you from candidates who treat the CDN as a black box.

The origin controls caching policy, not the CDN. This is a subtle but important point. The CDN doesn't decide what to cache or for how long. Your origin does, through response headers. When an interviewer asks "how do you control what gets cached?", the answer starts at the origin's Cache-Control headers, not at some CDN dashboard. (CDN providers do let you override headers with edge rules, but the default behavior follows what the origin says.)

The shield layer is what protects your origin at scale. Without it, a cache miss at 50 different edge PoPs means 50 simultaneous requests to your origin for the same asset. With a shield, those 50 misses collapse into one origin fetch. If you're designing a system that serves millions of users globally, mentioning the shield layer tells the interviewer you understand how CDNs actually protect backend infrastructure, not just speed up responses.

Your 30-second explanation: "When a user makes a request, DNS routes them to the nearest CDN edge node via geo-DNS or anycast. If the edge has the content cached and it's still fresh, it serves it directly. That's a cache hit. On a miss, or when the cached content's max-age has expired, the edge checks a shared shield cache before going to the origin. If the content might still be valid, the CDN uses conditional revalidation with ETags to avoid re-downloading unchanged data. The origin returns content along with Cache-Control headers that dictate freshness policy. From that point on, all nearby users get the cached version without touching the origin. So the CDN is really a geographically distributed cache where the origin stays in control of what gets cached and for how long."

One last thing worth internalizing before you move on: the CDN never invents content. Every byte it serves originated from your backend. That said, modern CDNs do perform transforms on that content if you configure them to: image optimization (resizing, format conversion to WebP/AVIF), on-the-fly compression (Brotli, gzip), and security functions like WAF rules and DDoS mitigation at the edge. These are powerful capabilities, but they're transformations of origin content, not replacements for it. When you're at the whiteboard, always draw the arrow from origin to CDN, never the reverse. It keeps your mental model clean and your explanation airtight.

Common mistake: Candidates sometimes describe CDNs as purely passive caches. If an interviewer asks "what else can a CDN do besides caching?", be ready to mention edge compute, image optimization, compression, and security features like DDoS absorption. It shows you understand the modern CDN as an edge platform, not just a file mirror.

Patterns You Need to Know

In an interview, you'll usually need to pick a specific approach. Here are the ones worth knowing.

Pull-Based (Lazy) Caching

This is the default model for nearly every CDN, and the one you should reach for first in an interview. The edge PoP doesn't have anything cached until a real user asks for it. On that first request, the edge says "I don't have this," fetches it from the origin (or a shield cache), stores the response locally, and serves it. Every subsequent request for the same content gets the cached copy until the TTL expires.

The origin controls caching behavior through HTTP headers. Cache-Control: max-age=86400 tells the edge "keep this for 24 hours." ETag and Last-Modified let the edge do conditional revalidation when the TTL expires, asking the origin "has this changed?" instead of re-downloading the whole thing. You should be comfortable naming these headers in an interview because they're how you demonstrate that you understand the mechanism, not just the concept.
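If it helps to see the mechanics, here's a small sketch of what the edge has to do with that header: parse the max-age directive out of Cache-Control. Real CDN parsers handle many more directives (s-maxage, no-store, private, and so on); this covers only the one discussed here:

```python
def parse_max_age(cache_control):
    """Extract the max-age value (in seconds) from a Cache-Control header."""
    for directive in cache_control.split(","):
        directive = directive.strip().lower()
        if directive.startswith("max-age="):
            try:
                return int(directive.split("=", 1)[1])
            except ValueError:
                return None                  # malformed value: treat as absent
    return None                              # no max-age directive present
```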

The obvious downside is the cold-start problem. The very first user to request a piece of content from a given PoP always gets the slow path. If you have 200 PoPs worldwide, that's potentially 200 slow first-requests. For most workloads this is fine because the content gets warm quickly. But for a flash sale or a breaking news page where millions of users arrive simultaneously, that cold start can cascade into a thundering herd against your origin.

Interview tip: When you mention pull-based caching, say something like "the first request per PoP pays the latency cost, but after that we're serving from edge cache with single-digit millisecond responses." It shows you understand both the happy path and the limitation.

When to reach for this: any read-heavy system where content is requested organically over time. Product catalogs, blog posts, user avatars, API responses that don't change per-request. This is your default.

Pull-Based (Lazy) Caching: Content Fetched on First Request

Push-Based (Proactive) Caching

Sometimes you know exactly what content users will want and exactly when they'll want it. A new movie trailer dropping at midnight. A product launch page going live at 9 AM. A firmware update that every IoT device will pull within the hour. In these cases, waiting for the first user to trigger a cache fill is wasteful and risky.

With push-based caching, your origin (or a build pipeline) proactively distributes content to CDN edge nodes before any user requests it. You upload the assets, the CDN's control plane fans them out across PoPs, and when users arrive, every single one of them gets an instant cache hit. No cold start, no origin spike.

The tradeoff is storage and coordination cost. You're paying to store content at hundreds of PoPs whether or not users in every region actually request it. You also need tooling to manage what gets pushed, when, and to which regions. For a company with a small content catalog (a streaming service's featured titles, for example), this is manageable. For a long-tail e-commerce site with 50 million product images, pushing everything everywhere is absurd.

When to reach for this: predictable, high-traffic content where you cannot afford a slow first request. Think scheduled launches, live event streams, or large file distributions.
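Conceptually, the push model is just a fan-out before launch. This sketch uses plain dicts as stand-in PoP caches; a real CDN's control plane handles distribution, retries, and regional targeting:

```python
def push_to_pops(asset_id, body, pops):
    """Fan an asset out to every PoP cache ahead of launch (push model)."""
    for cache in pops.values():
        cache[asset_id] = body               # pre-warm: stored before any request

# Hypothetical PoP caches keyed by city
pops = {"tokyo": {}, "frankfurt": {}, "virginia": {}}
push_to_pops("trailer.mp4", b"<video bytes>", pops)
# At launch, every PoP already holds the asset: every first request is a hit.
```

The storage tradeoff is visible here too: the asset now occupies space in all three caches whether or not users in those regions ever request it.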

Push-Based (Proactive) Caching: Pre-Warmed Before Users Arrive
| | Pull-Based (Lazy) | Push-Based (Proactive) |
|---|---|---|
| When content reaches edge | On first user request | Before any user requests |
| Cold-start latency | First request is slow per PoP | None; pre-warmed |
| Storage efficiency | Only caches what's requested | Caches everything you push, even if unused |
| Best for | Organic, long-tail traffic | Scheduled launches, predictable demand |

For most interview problems, you'll default to pull-based caching. Reach for push-based when the interviewer describes a scenario with a known traffic spike at a known time, like "we're launching a new product page and expect 10 million hits in the first minute." That's your cue.

Cache Invalidation Strategies

"How do you handle stale content?" This question will come up. And if you just say "set a short TTL," the interviewer will push harder, because a short TTL means more origin hits, which defeats half the purpose of having a CDN. You need to know three distinct strategies and when each one fits.

TTL-based expiry is the simplest. You set Cache-Control: max-age=3600 and the edge serves the cached version for one hour, then revalidates or re-fetches. This works well for content that changes on a predictable schedule, like a news homepage that updates every hour. The risk is obvious: if you publish a correction to an article, users might see the old version for up to 59 minutes. For some content that's acceptable. For others it's not.

Versioned URLs sidestep the staleness problem entirely. Instead of serving style.css, you serve style.a3f8b2.css where the hash changes whenever the file changes. Since the URL itself is different, the CDN treats it as brand-new content. You can set an extremely long TTL (a year, even) because the old URL will never be requested again once your HTML references the new one. This is the gold standard for static assets in production. Build tools like Webpack and Vite do this automatically.
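The hashing itself is a few lines. This sketch mirrors what build tools do, assuming a flat filename; the 8-character truncation is a common convention, not a requirement:

```python
import hashlib
from pathlib import PurePosixPath

def hashed_filename(filename, content, length=8):
    """style.css + contents -> style.<hash>.css; the hash changes with content."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(filename)
    return f"{p.stem}.{digest}{p.suffix}"
```

Because the hash is a pure function of the bytes, unchanged files keep their URL across deploys (and stay cached), while any edit produces a new URL that the CDN treats as brand-new content.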

Explicit purge APIs give you an escape hatch. Every major CDN exposes an API where you can say "invalidate everything matching /api/product/123*" and the CDN will evict those entries across all PoPs. This is powerful but slow (propagation can take seconds to minutes across a global network) and dangerous at scale. A bug in your purge logic can wipe your entire cache and send all traffic crashing back to origin.

Common mistake: Candidates often propose purge APIs as their primary invalidation strategy. In practice, purging is a safety valve, not a routine mechanism. Versioned URLs for static assets and reasonable TTLs for dynamic content should handle 95% of cases. Mention purge as the fallback for "we just published something wrong and need it gone now."

When to reach for each: TTL-based for content with predictable freshness windows. Versioned URLs for anything your build pipeline produces. Purge APIs for emergency corrections or user-triggered updates (like a profile photo change where you purge that specific path on write).

Cache Invalidation: Three Strategies Compared

Edge Compute

This one catches candidates off guard because it stretches the definition of what a CDN does. Services like Cloudflare Workers, AWS Lambda@Edge, and Fastly Compute let you run actual code at the PoP, not just serve cached files. The request hits the edge, your function executes, and it can modify headers, rewrite URLs, check auth tokens, select an A/B test variant, or assemble a personalized response, all without ever touching your origin.

Consider a geo-based redirect. A user in Germany hits your site, and instead of routing that request all the way to your origin in Virginia just to get a 302 redirect to the .de domain, an edge function reads the request's geo headers and issues the redirect in under 5ms. Or think about A/B testing: the edge function hashes the user's cookie, picks variant A or B, and either serves the corresponding cached page or adds a header that your origin uses to generate the right version.
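Here's the geo-redirect expressed as a Python sketch. Real edge platforms run JavaScript or WASM and expose geolocation through their own APIs; the `x-geo-country` header name and the site URLs below are stand-ins for illustration:

```python
# Hypothetical country -> localized-site mapping
COUNTRY_SITES = {"DE": "https://de.example.com", "JP": "https://jp.example.com"}

def edge_handler(request_headers):
    """Runs at the PoP: redirect to a localized site without touching origin."""
    country = request_headers.get("x-geo-country", "")
    target = COUNTRY_SITES.get(country)
    if target:
        return 302, {"Location": target + "/"}   # decided entirely at the edge
    return 200, {}                               # fall through to normal caching
```

The whole decision uses only data already attached to the request, which is exactly why it can run in single-digit milliseconds at the PoP.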

The tradeoff is operational complexity. Debugging code running across 200+ PoPs is genuinely hard. Logs are distributed. Cold starts exist for some runtimes. You're limited in execution time and memory. And if your edge function has a bug, it's a bug that affects every user globally, instantly. There's no gradual rollout by region unless you build that yourself.

Key insight: Edge compute shines for lightweight, latency-sensitive decisions that don't need your full application stack. Auth token validation, geo-routing, header manipulation, cache key customization. If you find yourself wanting to query a database from the edge, you've probably gone too far.

When to reach for this: the interviewer describes a need for per-request personalization or routing logic, but the underlying content is still mostly cacheable. "We need to show different pricing by region" or "we want to validate JWT tokens before requests hit our API" are classic edge compute signals.

Edge Compute: Running Logic at the PoP

What Trips People Up

Here's where candidates lose points, and it's almost always one of these.

The Mistake: The Magic Box CDN

This one is the most common by far. The candidate is designing a system with global users, they recognize the latency problem, and they say something like: "We'll just throw a CDN in front of it." Then they move on.

That's not a design decision. That's a wish.

The interviewer hears that you know the word "CDN" but haven't thought about what actually happens inside one. They'll follow up with "What gets cached?" or "How do you handle updates?" and suddenly you're scrambling. The vague answer has now created a trap you walked into yourself.

Common mistake: Candidates say "we'll put a CDN in front of our static assets" and treat it as a solved problem. The interviewer hears "I don't actually know how caching works."

What you should do instead: specify the what, the how long, and the what-if. "I'd cache product images and CSS bundles at the CDN edge with a 24-hour TTL. For cache invalidation, we'd use content-hashed filenames so deploys automatically bust the cache without needing explicit purges." That's three sentences and it tells the interviewer you understand the mechanics, not just the concept.

Interview tip: Any time you introduce a CDN into your design, immediately follow it with your TTL strategy and invalidation approach. Don't wait to be asked.

The Mistake: Forgetting That Cache Keys Aren't Just URLs

Most candidates think of CDN caching as "same URL, same response." In many real systems, that's wrong.

Imagine your e-commerce site serves product pages in English, Japanese, and German. The URL is identical: /products/12345. But the response body is completely different depending on the Accept-Language header. If your CDN caches the first response it sees and serves it to everyone, your Japanese users are getting English pages.

This is where Vary headers and custom cache keys come in. The Vary: Accept-Language header tells the CDN to treat each language variant as a separate cache entry for the same URL. You can also configure custom cache keys that factor in cookies, device type, or query parameters.

Candidates who skip this entirely are missing a real-world complexity that interviewers love to probe. If you're designing a system with any kind of personalization or localization, mention it before the interviewer has to drag it out of you. Something like: "Since we're serving multiple locales, I'd configure the CDN cache key to include the language header so we don't cross-contaminate cached responses."

The flip side matters too. Every dimension you add to the cache key fragments your cache and reduces your hit rate. Keying on Authorization headers, for example, effectively gives every user their own cache, which defeats the purpose. Be ready to articulate that tradeoff.
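A cache key is just a tuple of the URL plus whatever dimensions Vary (or your custom key config) names. This sketch makes both the localization fix and the fragmentation risk concrete:

```python
def cache_key(url, headers, vary_on):
    """Build a cache key from the URL plus the request headers named by Vary."""
    return (url,) + tuple(headers.get(h.lower(), "") for h in vary_on)

# Vary: Accept-Language -> one cache entry per locale for the same URL
k_en = cache_key("/products/12345", {"accept-language": "en"}, ["accept-language"])
k_ja = cache_key("/products/12345", {"accept-language": "ja"}, ["accept-language"])
# Keying on Authorization instead would do the same thing per user:
# one entry per token, which fragments the cache down to a hit rate near zero.
```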

The Mistake: Ignoring the Thundering Herd

Your most popular product page has a cached response with a 60-second TTL. At second 61, that cache entry expires. In that same instant, 500 users request the page. Every single one of those requests is a cache miss. All 500 hit your origin server simultaneously.

This is the thundering herd problem, and candidates almost never bring it up.

Common mistake: Candidates describe TTL-based expiry as if cache entries expire gracefully, one request at a time. In reality, popular content expiring can cause a sudden traffic spike that overwhelms the origin.

The fix is request coalescing (sometimes called request collapsing). When the edge PoP gets 500 simultaneous misses for the same resource, it sends exactly one request to the origin and holds the other 499 until the response comes back. Then it serves all of them from the freshly cached copy.

You don't need to explain the implementation details of coalescing in an interview. But naming the problem and the mitigation pattern shows you've thought beyond the happy path. Try something like: "For high-traffic keys, I'd want request coalescing at the edge so a TTL expiry doesn't turn into a thundering herd against our origin." One sentence. Huge signal.
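For the curious, the core of coalescing is a "single-flight" guard: the first requester for a key becomes the leader and fetches from origin; everyone else waits on the same result. This is a simplified in-process sketch, not how any particular CDN implements it:

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent fetches for the same key into one origin call."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> (done_event, result_holder)

    def fetch(self, key, origin_fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True                     # first arrival fetches
            else:
                leader = False                    # later arrivals wait
        event, holder = entry
        if leader:
            holder["value"] = origin_fetch(key)   # exactly one origin request
            with self._lock:
                del self._inflight[key]
            event.set()                           # release the waiters
        else:
            event.wait()
        return holder["value"]
```

Five hundred simultaneous misses become one origin fetch plus 499 cheap waits, which is precisely the thundering-herd mitigation described above.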

Another option worth mentioning: stale-while-revalidate. The edge serves the slightly stale cached version to users while fetching a fresh copy from the origin in the background. Users get fast responses, the origin gets a single revalidation request, and nobody notices the brief staleness. This is controlled via the Cache-Control: stale-while-revalidate directive, and it's one of the most practical tools in the CDN toolbox.
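The decision logic behind stale-while-revalidate fits in a few lines. This sketch models the three windows (fresh, stale-but-servable, expired); the background refresh is represented by a queue rather than a real async fetch:

```python
import time

class SwrEntry:
    """A cached object with both a freshness deadline and an SWR grace window."""
    def __init__(self, body, max_age, swr_window, now=None):
        now = time.time() if now is None else now
        self.body = body
        self.fresh_until = now + max_age
        self.stale_until = self.fresh_until + swr_window

def serve(entry, now, refresh_queue):
    """Fresh: serve. Inside the SWR window: serve stale AND queue a background
    revalidation. Past the window: synchronous miss (return None)."""
    if now <= entry.fresh_until:
        return entry.body
    if now <= entry.stale_until:
        refresh_queue.append("revalidate")   # async origin fetch; user not blocked
        return entry.body
    return None
```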

The Mistake: Thinking CDNs Are Only for Images and CSS

"I'd put our static assets on the CDN. Images, JavaScript, CSS files."

That's fine. It's also 2015-era thinking.

Modern CDNs cache API responses, server-rendered HTML pages, JSON payloads, even GraphQL query results. If your product catalog API returns the same JSON for /api/products/12345 thousands of times per second and the data changes once an hour, that's a perfect CDN candidate. Limiting your CDN discussion to static files tells the interviewer you haven't worked with CDNs in production recently.

The key distinction isn't static vs. dynamic. It's cacheable vs. uncacheable. A product listing page that's the same for every user? Cacheable, even though it's dynamically generated. A user's shopping cart? Uncacheable, because it's unique per session.

Interview tip: Instead of saying "I'd cache static assets on the CDN," try: "I'd cache anything that's read-heavy and shared across users, whether that's static files, rendered HTML, or API responses. For our product catalog endpoint, a 30-second TTL at the edge would absorb the vast majority of reads without meaningful staleness."

That reframing, from "static files" to "read-heavy shared content," shows the interviewer you understand the underlying principle rather than just following a checklist.

How to Talk About This in Your Interview

When to Bring It Up

Don't lead with "we should add a CDN." That's a solution before you've established the problem, and interviewers will mentally dock you for it. Instead, listen for the setup, then connect the dots out loud.

The signals that should trigger your CDN instinct:

  • "Users are globally distributed" or any mention of multiple regions. Latency across oceans is physics. A CDN is the standard answer.
  • "The workload is read-heavy" or "reads outnumber writes 100:1." High read ratios with repetitive content are exactly what caching layers exist for.
  • "We're serving a lot of static assets" like images, videos, CSS, JS bundles. This one's obvious, but still say why before you say what.
  • "We need to handle traffic spikes" like a product launch, a live event, or a viral moment. CDN edge nodes absorb the burst so your origin doesn't fall over.
  • "Latency is a priority" for any user-facing content that doesn't change per-request. Even semi-static API responses qualify.

Here's the framing that works: "Since we've established this is read-heavy with a global user base, I'd place a CDN in front of our static and semi-static content to serve requests from edge PoPs close to users." That one sentence shows you're reasoning from requirements, not reciting a checklist.

Common mistake: Proposing a CDN for an internal dashboard used by 50 people in one office. If the interviewer describes a small, localized user base, a CDN adds complexity for zero benefit. Showing restraint here actually earns you more points than showing off.

Sample Dialogue

Interviewer: "So we need to serve product images to users worldwide. Some of these products get millions of views per day. How would you approach that?"

You: "The images themselves are static once uploaded, and we're looking at a massively read-heavy pattern with global users. I'd put a CDN in front of our image storage. Users in Tokyo hit a PoP in Tokyo, users in Frankfurt hit one in Frankfurt, and the origin in us-east-1 only gets touched on cache misses. For TTLs, since product images rarely change, I'd set a long Cache-Control max-age, something like 24 hours or even longer, and use content-hashed filenames, where a hash of the file's contents is embedded in the filename itself, like product-abc123.jpg. That way each version of an image maps to a unique URL path."

Interviewer: "Okay, but what happens when a seller updates their product photo? How fast does the new one show up?"

You: "Because the hash is derived from the file contents, the new image gets a completely different filename and therefore a different URL. The real mechanism of the update is on the referencing side: the HTML page or API response that points to the image gets updated to include the new URL. Once that happens, every client requests the new path, which is a cache miss that gets fetched fresh from origin. The old URL might still sit in CDN caches until it naturally expires, but nobody's requesting it anymore, so it doesn't matter. That's the beauty of content-hashing: you sidestep cache invalidation entirely. The tradeoff is that the layer serving the image reference, your product page HTML or your catalog API, needs its own caching and invalidation strategy. If that response is cached with a stale reference, users still see the old image."

Interviewer: "And if we weren't using content-hashed filenames?"

You: "Then you'd be caching by a stable URL like /images/product-42.jpg, and on upload you'd need to fire a purge request through the CDN's API. That propagates across PoPs in a few seconds typically, but there's a window where some edges still serve the old version. It works, but it's more operationally fragile. Content-hashing is almost always the better choice for assets."

Interviewer: "What if the origin goes down during a traffic spike?"

You: "That's actually one of the underrated benefits here. As long as the cached content hasn't expired, the CDN keeps serving it regardless of origin health. We could also configure stale-while-revalidate behavior, where the edge serves a slightly stale response while asynchronously checking the origin. For a product image, serving a version that's a few minutes old is perfectly fine. I'd also want to monitor our cache hit ratio. If we're consistently above 95%, origin load stays manageable even during spikes. If it drops, that's an early warning that something's wrong with our caching strategy."

Follow-Up Questions to Expect

"How do you decide what TTL to set?" Match it to how often the content actually changes. Static assets with content-hashed filenames can be cached for a year since the URL changes when the content does. User-facing API responses that change hourly might get a 60-second TTL with stale-while-revalidate.

"What about personalized content? Can you cache that?" You can, but the cache key has to include whatever makes the response unique, like a user segment or locale. Mention the Vary header or custom cache keys, and note that high cardinality (per-user caching) destroys your hit rate. If the interviewer pushes on this, a strong follow-up is to mention edge compute: services like CloudFront Functions, Cloudflare Workers, or Lambda@Edge can run lightweight logic at the PoP to assemble personalized responses from cached fragments. You might cache the base product page at the edge and inject a personalized banner or recommendation block via an edge function, keeping the hit rate high for the expensive base content while still tailoring the experience. That distinction between "cache everything per-user" and "cache the common parts, personalize at the edge" is exactly the kind of nuance that signals senior thinking.

"How do you handle the thundering herd when a popular item's cache expires?" Request coalescing at the edge. When hundreds of requests arrive simultaneously for an expired entry, the PoP sends only one request to the origin and holds the rest until the response comes back.

"Would you use a CDN for API responses, or just static files?" Absolutely for API responses, if they're cacheable. A product catalog endpoint that returns the same JSON for every user is a great candidate. Anything with auth-dependent data needs more careful thought around cache keys.

What Separates Good from Great

A mid-level answer says "I'd add a CDN for static assets." A senior answer specifies the TTL strategy, explains how invalidation works on content updates, and acknowledges the consistency window that caching introduces. The difference is showing you've operated a CDN, not just drawn one on a whiteboard.

Quantifying impact separates you from the pack. Saying "this moves our p50 latency from around 200ms to under 20ms for cached content and offloads roughly 90% of read traffic from origin" makes your answer concrete and memorable. Even rough numbers work. Interviewers remember the candidate who gave them numbers.

Closing with operational awareness is the senior move most people skip. After your CDN discussion, add one sentence: "I'd monitor cache hit ratios per PoP and set an alert if origin traffic spikes unexpectedly, since that usually means a caching rule broke or a new content type isn't being cached." That signals you think beyond architecture into day-two operations.

Key insight: The interview win isn't knowing that CDNs exist. It's explaining what you'd cache, how long you'd cache it, and what happens when the content changes, all tied back to the specific requirements the interviewer gave you.

Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.
