Why This Matters
You're fifteen minutes into a system design interview for a URL shortener. You've sketched out the high-level architecture, talked about hashing strategies, and the interviewer nods along. Then they say: "Okay, let's define the API." You write POST /createURL on the whiteboard, and something shifts. The interviewer's eyebrows go up. Not in a good way. That one endpoint name just told them you think in RPC verbs instead of resources, and now they're wondering what else you've been hand-waving. API design is the part of system design where your engineering instincts are most visible. It's not about memorizing REST conventions. It's about showing that you understand the contracts between services, that you've thought about what happens when things fail, and that you can model a problem in a way that won't collapse under real-world pressure.
Think about how Stripe built a payments empire on top of their API. Their resource model (/v1/charges, /v1/customers, /v1/subscriptions) isn't just clean naming. Every endpoint implies a data ownership boundary, a set of failure modes, and a versioning commitment that thousands of companies depend on. When you get API design right, your database schema, caching layer, and scaling strategy all fall into place naturally. Get it wrong, and every layer of your system inherits the confusion. The interviewer knows this, which is why "design the API" is never a throwaway step. It's the moment they're evaluating whether you build systems from the seams outward or just slap endpoints on top of a database.
By the end of this lesson, you'll have a clear mental model for the API patterns that actually come up in interviews: pagination strategies, idempotency, versioning, rate limiting, and batch operations. More importantly, you'll know how to talk about them under pressure, when to reach for each one, and how to articulate your tradeoffs in a way that makes the interviewer want to work with you.
How It Works
Think of an API as a restaurant menu. The kitchen (your backend) can make all sorts of things, but the menu (your API) only exposes what diners (clients) actually need, described in terms they understand. You don't list recipes and raw ingredients. You list dishes, organized by how people want to order them.
That's the mental model. Now let's build it up piece by piece.
The Building Blocks
Every well-designed API rests on five things, and you should be able to name them if asked:
Resources are the nouns of your API. Users, orders, messages, payments. Each resource gets a URL path like /users, /orders/{id}, /users/{id}/messages. The strong convention is to use plural, lowercase names for these paths. It's not a technical requirement baked into HTTP, but it's so universally expected that deviating from it in an interview (or in production) will raise eyebrows. When you're whiteboarding, the first thing you should do is identify these nouns out loud. The interviewer is watching whether you think in terms of entities and relationships or just start throwing endpoints at the wall.
HTTP verbs give those nouns meaning. GET retrieves, POST creates, PUT replaces entirely, PATCH updates partially, DELETE removes. These aren't arbitrary conventions. GET is defined as safe and cacheable, which means proxies, CDNs, and browsers can optimize around it. POST is not. If you mix these up during an interview (say, POST /getUser), it signals you haven't built real APIs.
Request and response envelopes are the standardized wrappers around your data. A response envelope typically looks like { "data": ..., "meta": ..., "errors": ... }. Consistent structure means every client can parse every response the same way, even error responses. This is one of those details that separates someone who's shipped APIs from someone who's only read about them.
Status codes communicate what happened without the client needing to parse the body. 200 means success. 201 means created. 400 means the client sent garbage. 404 means the resource doesn't exist. 500 means your server broke. You don't need to memorize all of them, but know the major families: 2xx for success, 4xx for client errors, 5xx for server errors.
Error contracts are the part almost everyone forgets. When something goes wrong, what does the client get back? A well-designed API returns a consistent error shape every time: an error code, a human-readable message, and optionally a details field with validation specifics. If you mention this during your interview without being prompted, you'll stand out.
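The envelope and error-contract ideas can be made concrete with a pair of tiny helpers. This is a hedged sketch, not any framework's API; the function names and field names are illustrative, following the { data, meta, errors } shape described above:

```python
# Hypothetical helpers for a consistent response envelope.
# Every endpoint wraps its payload the same way, success or failure.

def success_envelope(data, meta=None):
    """Wrap a successful payload in the standard shape."""
    return {"data": data, "meta": meta or {}, "errors": []}

def error_envelope(code, message, details=None):
    """Every error gets the same shape: code, message, optional details."""
    return {
        "data": None,
        "meta": {},
        "errors": [{"code": code, "message": message, "details": details or []}],
    }
```

Because both shapes share top-level keys, a client can parse any response the same way and branch on whether errors is empty.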
The Request Lifecycle
When a client makes an API call, here's what actually happens, step by step.
The client sends an HTTP request to your API gateway. The gateway is the front door. It handles authentication (is this caller who they say they are?), authorization (are they allowed to do this?), rate limiting (have they exceeded their quota?), and basic request validation (is the JSON well-formed? are required headers present?). If anything fails here, the request never reaches your application code. The gateway returns an error response directly.
If the request passes the gateway, it gets routed to the appropriate API server. This is where your business logic lives. The server parses the request, applies domain rules, reads from or writes to the data store, and then formats a response using your standard envelope.
The response flows back through the gateway to the client, carrying the right status code, the right headers (cache-control, rate-limit-remaining, etc.), and a body the client knows how to parse.
Here's what that flow looks like:
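As a stand-in for a diagram, the same flow can be sketched in a few functions. All names here (VALID_KEYS, gateway, api_server) are invented for illustration; the point is the layering, not the implementation:

```python
# Gateway -> API server -> response, as described above.

RATE_LIMIT = 100           # requests allowed per window, per API key
VALID_KEYS = {"key-123"}   # pretend auth database
request_counts = {}        # api_key -> requests seen this window

def gateway(request):
    """Cross-cutting checks: auth, rate limiting, basic validation.
    If any fail, the request never reaches application code."""
    key = request.get("api_key")
    if key not in VALID_KEYS:
        return {"status": 401, "body": {"error": "unauthenticated"}}
    request_counts[key] = request_counts.get(key, 0) + 1
    if request_counts[key] > RATE_LIMIT:
        return {"status": 429, "body": {"error": "rate_limited"}}
    if "path" not in request:
        return {"status": 400, "body": {"error": "missing path"}}
    return api_server(request)  # passed the front door

def api_server(request):
    """Business logic: translate a storage row into a caller-facing shape."""
    row = {"id": 123, "email": "a@example.com", "internal_shard": 7}
    public = {"id": row["id"], "email": row["email"]}  # never leak internals
    return {"status": 200, "body": {"data": public}}
```

Note that api_server strips internal_shard before responding; the client sees only the fields the API layer chose to expose.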

Three properties of this flow matter in interviews:
The gateway is a chokepoint by design. Every request passes through it, which means it's where you enforce cross-cutting concerns like auth and rate limiting. When your interviewer asks "where would you add rate limiting?" the answer is almost always "at the gateway layer." Don't scatter it across individual services.
The API server owns the response shape, not the database. Your server queries the data store, but it transforms the result before sending it back. The client should never see raw database rows, internal IDs they can't use, or fields that only matter to your backend. This separation is what lets you change your database schema without breaking every client.
Failures can happen at every layer. The gateway can reject the request. The server can hit a bug. The data store can be unreachable. Your API design needs to account for all three, and each should produce a different, clear error response. Interviewers care about this because it reveals whether you think about failure modes or only the happy path.
Your 30-second explanation: "An API request hits the gateway first for auth, rate limiting, and validation. If it passes, it's routed to the API server, which runs business logic, talks to the data store, and returns a structured response with a status code and consistent envelope. The key insight is that the API layer is a translation layer; it exposes what callers need, not what the database stores."
REST vs. RPC: Pick a Side (and Know Why)
There are two fundamental ways to think about API design, and interviewers will notice which one you default to.
Resource-oriented (REST) thinking says: "I have nouns, and I apply standard verbs to them." You get /orders/123 and you use GET to read it, PUT to replace it, DELETE to remove it. The URL identifies the thing. The verb says what to do with it. This maps beautifully to CRUD operations, and it's what most web APIs use.
RPC-style thinking says: "I have actions, and I call them." You get endpoints like /cancelOrder, /transferFunds, /recalculateShipping. The URL is a verb. The body contains the parameters.
In most system design interviews, default to REST. It's what interviewers expect, it's what most real systems use, and it forces you to think clearly about your data model. The moment you say "the resources are Users, Orders, and Products," you've implicitly defined your entities, their relationships, and the operations that make sense on them.
But know when RPC makes more sense. If you're designing a workflow that doesn't map to a single resource (like "transfer $50 from account A to account B"), forcing it into REST creates awkward endpoints. A POST /transfers with a body describing the operation is cleaner than trying to PATCH two account resources in sequence. If the interviewer's problem involves complex multi-step operations or real-time streaming (think chat, live collaboration), mention that you'd reach for RPC-style endpoints or WebSockets for those specific cases while keeping the rest RESTful.
Interview tip: If you're not sure which style to pick, say this: "These operations map cleanly to CRUD on resources, so I'll go with REST. If we hit a workflow that doesn't fit, I'll carve out an RPC-style endpoint for that specific case." That one sentence shows flexibility without over-engineering.
APIs Are Promises
Once you publish an API endpoint and a client starts calling it, you've made a promise. The field user_id will always be an integer. The /orders endpoint will always return a list. The created_at timestamp will always be ISO 8601.
Breaking that promise breaks every client that depends on it. This is why experienced engineers agonize over field names, response shapes, and whether a field should be a string or an integer. It's not pedantry. It's because renaming user_id to userId six months from now means coordinating a migration across every mobile app, every partner integration, and every internal service that calls you.
This "APIs as contracts" framing is what makes versioning, backward compatibility, and deprecation strategies necessary. You'll cover those patterns later, but the mental model to hold right now is simple: every field you add is easy, every field you remove is painful, and every field you rename is a breaking change. When you're whiteboarding, pick names carefully the first time. Say them out loud before writing them down.
Design for the Caller, Not the Database
Here's a mistake that's surprisingly common, even among experienced engineers. You look at your database schema, see a users table and an addresses table joined by a foreign key, and you create /users and /addresses as separate endpoints.
But if the caller always needs the user's address when they fetch a user profile, you've just forced them to make two API calls for one screen. Your API should reflect how clients actually use the data, not how you happened to normalize it in Postgres.
Sometimes that means nesting: GET /users/123 returns the address embedded in the response. Sometimes it means creating a resource that doesn't exist in your database at all, like a /feed endpoint that aggregates posts, recommendations, and ads into a single response. The "resource" in REST doesn't have to be a database table. It's an abstraction that serves the caller.
When you're designing an API in an interview, ask yourself: "If I were the mobile developer consuming this, would I be annoyed?" If the answer is yes, because they need five calls to render one screen, or because they get back 40 fields when they only need 3, redesign it. The interviewer wants to see that you think about the developer experience on the other side of the wire.
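To make "design for the caller" concrete, here's a sketch where the API layer joins two stores into one response. The table shapes are invented; the pattern is simply that the join happens server-side so the mobile client makes one call, not two:

```python
# Storage is normalized into two "tables", but the profile endpoint
# returns them as one resource shaped for the caller.

USERS = {123: {"id": 123, "name": "Ada"}}
ADDRESSES = {123: {"city": "London", "zip": "EC1"}}  # keyed by user id

def get_user_profile(user_id):
    user = USERS.get(user_id)
    if user is None:
        return {"status": 404}
    # Join at the API layer: the caller never sees the foreign key.
    body = dict(user)
    body["address"] = ADDRESSES.get(user_id)
    return {"status": 200, "body": body}
```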
Patterns You Need to Know
In an interview, you'll usually need to pick a specific approach. Here are the ones worth knowing.
Pagination: Offset vs. Cursor
Offset pagination is the one everyone learns first. The client says "give me page 11" and the server translates that to OFFSET 200 LIMIT 20, skipping the first 200 rows and returning the next 20. It's dead simple to implement and lets users jump to any page they want. The problem shows up when data is being written while someone is paginating. If a new item gets inserted at the top of the list between page requests, the user sees a duplicate on the next page, or worse, misses an item entirely. On top of that, large offsets are expensive. The database still has to scan through all the skipped rows before returning your results.
Cursor pagination sidesteps both problems. Instead of saying "skip 200 rows," the client says "give me 20 items after this specific marker." That marker (the cursor) is typically an encoded reference to the last item the client saw, like a timestamp or a unique ID. The server query becomes WHERE id > last_seen_id LIMIT 20, which is index-friendly and stable regardless of concurrent inserts. The tradeoff? You can't jump to page 47. You can only go forward (or backward) from where you are. For feeds, timelines, and any dataset with high write volume, that's perfectly fine.
When to reach for this: if the interviewer's system involves a scrolling feed, search results, or any list that updates frequently, say "cursor pagination" and explain why offset would produce inconsistent results.
Interview tip: Don't just name the pattern. Say something like: "Offset pagination would work for a small, mostly-static catalog, but this feed has high write throughput, so I'd use cursor-based pagination to avoid skipped or duplicated items as new content is inserted."
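The cursor mechanics can be sketched with an in-memory list standing in for an indexed table. In a real system the lookup would be the WHERE id > :cursor ORDER BY id LIMIT :n query described above; here the list scan is just for illustration, and invalid-cursor handling is omitted:

```python
# Cursor pagination over an id-sorted dataset.

ITEMS = [{"id": i, "text": f"post {i}"} for i in range(1, 51)]  # id-sorted

def list_posts(cursor=None, limit=20):
    """Return up to `limit` items strictly after `cursor`, plus the next cursor."""
    start = 0
    if cursor is not None:
        # Skip everything up to and including the cursor id.
        start = next(i for i, item in enumerate(ITEMS) if item["id"] == cursor) + 1
    page = ITEMS[start:start + limit]
    # A full page means there may be more; hand back the last id as the cursor.
    next_cursor = page[-1]["id"] if len(page) == limit else None
    return {"data": page, "next_cursor": next_cursor}
```

Because each page is anchored to the last seen id rather than a row count, inserts at the head of the list can't shift the window and cause duplicates or gaps.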

Idempotency Keys
Picture this: a user taps "Pay Now," the request reaches your server, the charge goes through, but the response times out on the way back. The client retries. Without protection, you've just charged them twice. This is the scenario interviewers love to probe, and idempotency keys are the answer.
The mechanism is straightforward. The client generates a unique identifier (usually a UUID) for each logical operation and sends it as a header, something like Idempotency-Key: 550e8400-e29b.... Before processing the request, the server checks an idempotency store (Redis or a database table) for that key. If the key exists, the server returns the stored response from the first attempt without doing any work. If the key is new, the server processes the request, stores the result keyed by that ID, and then responds. Retries become free. The same key always produces the same outcome.
This pattern matters for any non-idempotent operation. GET is naturally idempotent (reading the same resource twice doesn't change anything), and PUT is idempotent by definition (replacing a resource with the same data yields the same state). But POST creates something new each time, so it needs explicit protection. The same logic applies to any state-changing operation where a network retry could cause real damage: payments, order placements, account creation.
When to reach for this: any time your design involves money, inventory decrements, or writes that would be harmful if duplicated. If the interviewer asks "what happens if the client retries?", this is your answer.
Common mistake: Candidates sometimes say "we'll just deduplicate on the server side" without explaining how. The interviewer wants to hear the specific mechanism: client-generated key, server-side lookup, stored response. Vague hand-waving about dedup doesn't count.
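That specific mechanism, client-generated key, server-side lookup, stored response, fits in a few lines. This sketch uses a plain dict where a real system would use Redis with a TTL, and the "ledger" is just a list:

```python
# Idempotency-key flow: look up the key first, do the work at most once.

idempotency_store = {}  # key -> saved response (Redis with a TTL in practice)
charges = []            # stand-in for the payments ledger

def charge(idempotency_key, amount):
    if idempotency_key in idempotency_store:
        # Retry: return the first attempt's response, do no new work.
        return idempotency_store[idempotency_key]
    charges.append(amount)  # the real side effect happens exactly once
    response = {"status": 201, "charge_id": len(charges), "amount": amount}
    idempotency_store[idempotency_key] = response
    return response

# The client generates one key per logical operation, e.g. str(uuid.uuid4()),
# and reuses that same key on every retry of that operation.
```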

API Versioning
Your API is a contract. The moment an external client starts calling it, you can't just rename fields or restructure responses without breaking someone's integration. Versioning gives you a way to evolve your API while keeping older consumers alive.
There are three common strategies. URL path versioning (/v1/users, /v2/users) is the most popular in practice and the easiest to reason about in an interview. It's visible, explicit, and trivial to route at the gateway level. Header versioning (Accept: application/vnd.myapi.v2+json) keeps URLs clean but hides the version from anyone glancing at a URL in a log or browser. Query parameter versioning (/users?version=2) is the least common and generally the least loved; it muddies the distinction between resource identification and configuration.
Most companies pick URL path versioning because it's the simplest to operate. Your API gateway can route /v1/* to one set of handlers and /v2/* to another. Monitoring and logging become clearer when the version is right there in the path. The real interview conversation, though, isn't about which strategy to pick. It's about backward compatibility. When you introduce v2, what happens to v1? You should be able to say: "We'd maintain v1 for a deprecation window, communicate the timeline to consumers, and eventually sunset it. During the overlap, both versions hit the same underlying data, but the response shapes differ."
When to reach for this: if the interviewer mentions multiple client types (mobile app, web app, third-party integrations) or asks about evolving the API over time.
Key insight: Don't over-engineer versioning if the interviewer hasn't signaled it matters. If you're designing a single internal service with one consumer, mentioning "I'd version the URL path so we can evolve this later" in one sentence is enough. Spending five minutes on deprecation policy for an internal API is a misread of the room.
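The "route /v1/* and /v2/* to different handlers over the same data" idea can be sketched as a small routing table. Field names and the renaming between versions are invented, purely to show how both versions coexist during an overlap:

```python
# Gateway-level URL path versioning: same row, two response shapes.

USER_ROW = {"user_id": 7, "full_name": "Ada Lovelace"}

def get_user_v1(row):
    return {"user_id": row["user_id"], "name": row["full_name"]}

def get_user_v2(row):
    # v2 renamed fields; v1 callers are unaffected during the overlap window.
    return {"id": row["user_id"], "displayName": row["full_name"]}

ROUTES = {"/v1/users": get_user_v1, "/v2/users": get_user_v2}

def route(path):
    handler = ROUTES.get(path)
    if handler is None:
        return {"status": 404}
    return {"status": 200, "body": handler(USER_ROW)}
```

Sunsetting v1 is then just deleting one entry from the routing table once the deprecation window closes.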

Rate Limiting and Throttling
Rate limiting isn't just about stopping abuse. It's about fairness. Without it, one misbehaving client (or one customer running a poorly written script) can starve everyone else of capacity. In an interview, mentioning rate limiting shows you think about production realities, not just happy-path functionality.
The token bucket algorithm is the one to know cold. Imagine each user has a bucket that holds, say, 100 tokens. Every request costs one token. The bucket refills at a steady rate (maybe 10 tokens per second). If the bucket is empty, the request gets rejected with a 429 Too Many Requests. This naturally allows short bursts (the user can spend all 100 tokens at once) while enforcing a sustained rate over time. The sliding window approach is the other common option. It counts requests within a rolling time window (e.g., "no more than 1,000 requests in the last 60 seconds"). Sliding window is simpler to reason about but doesn't handle bursts as gracefully.
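A minimal token bucket matching that description can be written in a dozen lines. The class and parameter names are illustrative, and time is passed in explicitly to keep the sketch deterministic; a production limiter would read a clock and keep the counters in shared state:

```python
# Token bucket: capacity caps the burst, refill_rate caps the sustained rate.

class TokenBucket:
    def __init__(self, capacity=100, refill_rate=10):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)    # start full: bursts allowed up front
        self.last_refill = 0.0

    def allow(self, now):
        """Return True if a request at time `now` (seconds) may proceed."""
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 Too Many Requests
```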
Where you enforce the limit matters too. Per-user limits protect against a single account going rogue. Per-endpoint limits protect expensive operations (like search or report generation) from overwhelming specific backend services. Most production systems layer both. In your interview, the rate limiter typically lives in the API gateway, which checks limits before the request ever reaches your application servers.
When to reach for this: any design with public-facing APIs, multi-tenant systems, or endpoints that hit expensive downstream resources. If the interviewer asks "what if one user sends a million requests?", you need this pattern ready.
Here's a sample exchange that comes up more than you'd expect:
Interviewer: "What happens when a client exceeds the rate limit?"
You: "The gateway returns a 429 with a Retry-After header telling the client how long to wait. We'd also include the limit and remaining count in response headers on every request, like X-RateLimit-Remaining: 12, so well-behaved clients can throttle themselves before hitting the wall."
Interviewer: "What about internal services? Do they get rate limited too?"
You: "Depends on the trust boundary. Between our own microservices, I'd lean toward circuit breakers and backpressure rather than hard rate limits. But if an internal service is calling a shared resource like a search index, a per-service quota makes sense to prevent one team's bug from taking down everyone."

Bulk and Batch Operations
Standard REST works beautifully for single-resource operations. Creating one order, updating one profile, deleting one record. It falls apart when a client needs to import 10,000 products or send notifications to 50,000 users. Making 10,000 individual POST requests is slow, wasteful, and fragile.
The straightforward approach is a batch endpoint: POST /products/batch that accepts an array of items in the request body. For small batches (a few hundred items), this can work synchronously. The server processes all items and returns the results in one response. But for large batches, you need to go async. The server accepts the request, validates the basic structure, enqueues a job, and immediately returns a 202 Accepted with a job ID. The client then polls a status endpoint (GET /jobs/{id}) to check progress. That status response should include how many items have been processed, how many succeeded, how many failed, and ideally the specific errors for failed items.
The hardest part of batch design is partial failure semantics. If you submit 100 items and 3 have validation errors, do you reject the entire batch? Process the 97 good ones and report the 3 failures? The answer depends on the use case. For financial transactions, you might want all-or-nothing atomicity. For a product catalog import, processing the valid items and reporting errors on the rest is usually the right call. Tell your interviewer which approach you're choosing and why.
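For the "process the valid items and report errors on the rest" choice, the response needs per-item results. Here's a sketch of a synchronous batch handler returning the 207 Multi-Status shape; the validation rule and id format are invented:

```python
# Batch create with partial-failure semantics: every item gets its own
# status, and the overall code is 207 when results are mixed.

def create_products_batch(items):
    results = []
    for index, item in enumerate(items):
        if item.get("price", 0) <= 0:
            results.append({"index": index, "status": 400,
                            "error": "price must be positive"})
        else:
            results.append({"index": index, "status": 201,
                            "id": f"prod-{index}"})
    all_ok = all(r["status"] == 201 for r in results)
    return {"status": 201 if all_ok else 207, "results": results}
```

The caller can filter the results array for 400s, fix those items, and resubmit only the failures.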
When to reach for this: any time the interviewer's scenario involves bulk data ingestion, mass notifications, or operations where processing one-at-a-time would be unreasonably slow.
Interview tip: Mentioning 207 Multi-Status as a response code for partial success/failure is a small detail that signals real-world experience. Most candidates only know 200, 201, 400, and 500.
| Pattern | Best For | Key Tradeoff | What to Tell Your Interviewer |
|---|---|---|---|
| Cursor Pagination | High-write feeds, large datasets | No random page access | "Stable results under concurrent writes" |
| Idempotency Keys | Payments, order creation, any risky POST | Storage overhead for key-to-response mapping | "Prevents double-charges on retry" |
| URL Path Versioning | Public APIs, multi-consumer systems | Multiple handler versions to maintain | "Explicit, easy to route and monitor" |
| Token Bucket Rate Limiting | Multi-tenant APIs, public endpoints | Requires shared state for counters | "Allows bursts while enforcing sustained limits" |
| Async Bulk Operations | Large imports, mass updates | Polling complexity, partial failure handling | "Decouples acceptance from processing" |
For most interview problems, you'll default to cursor pagination and URL path versioning because they're the safest, most broadly applicable choices. Reach for idempotency keys the moment your design involves payments or any write where a duplicate would cause real harm. Rate limiting and bulk operations tend to come up as follow-up questions rather than initial design choices, but having them ready shows the interviewer you've operated systems in production, not just designed them on a whiteboard.
What Trips People Up
Here's where candidates lose points, and it's almost always one of these.
The Mistake: Your API Is a Mirror of Your Database
You've just sketched out your schema. Users table, Addresses table with a foreign key back to Users, maybe an Orders table. Then you start writing endpoints and out comes:
GET /users/{id}
GET /addresses/{userId}
GET /orders/{userId}
The interviewer watches you do this and thinks: "They're not designing an API. They're exposing their database."
The problem is that your caller almost never wants an address by itself. When a mobile app renders a user profile, it needs the user and their address in one call. When it shows an order confirmation, it needs the order, the items, and the shipping address together. Three round trips to your server for one screen is a performance tax your callers shouldn't pay.
Common mistake: Candidates say "I'll have a /users endpoint and a separate /addresses endpoint since they're different tables." The interviewer hears "I'm thinking about storage, not about who's calling this API and why."
What to say instead: "The caller rendering a profile always needs the address alongside the user, so I'll nest it. GET /users/{id} returns the address inline. If there's ever a use case for managing addresses independently, I can add a dedicated endpoint later, but I'll start with the common access pattern."
Good APIs are shaped by use cases, not by your ER diagram.
The Mistake: The Happy Path and Nothing Else
This one is subtle because candidates don't realize they're doing it. You design five endpoints, describe the request and response bodies, maybe even sketch some JSON. Everything works perfectly. The user exists. The payment goes through. The data is always valid.
Then the interviewer asks: "What happens if the user sends a negative quantity?"
Silence. Or worse, a vague "I'd return an error."
What error? What status code? What does the body look like? Is it the same shape as your other errors, or does every endpoint invent its own format? Interviewers notice when you've only thought about sunshine and rainbows.
Here's what actually goes wrong in production when you skip error contracts: every frontend developer consuming your API has to guess. One endpoint returns {"error": "not found"}, another returns {"message": "No such resource", "code": 404}, and a third just sends back an empty 200. Your client team will hate you.
Interview tip: After you sketch your endpoints, volunteer this before anyone asks: "For errors, I'd use a consistent envelope. Something like {error: string, message: string, details: []}. 400 for validation failures, 404 for missing resources, 409 for conflicts like duplicate creation, 503 if a downstream dependency is unavailable." That single sentence changes how the interviewer sees you.
The Mistake: POST for Everything
I've heard this in real interviews more than once:
"So the client would call POST /getUser with the user ID in the body..."
Stop. The interviewer just mentally downgraded your seniority level.
POST /getUser is RPC thinking dressed in REST clothing, and it's wrong on multiple levels. GET requests are safe (no side effects), idempotent, and cacheable. The moment you use POST to fetch data, you've thrown away browser caching, CDN caching, and the ability for any intermediate proxy to help you. You've also made it impossible for anyone reading your API to know which calls are safe to retry.
This isn't pedantry. Verb semantics carry real engineering meaning:
- GET reads data. Safe to retry, safe to cache, no side effects.
- POST creates something new. Not idempotent by default (which is why you need idempotency keys).
- PUT replaces a resource entirely. Idempotent: calling it twice produces the same result.
- PATCH partially updates. Send only the fields you want to change.
- DELETE removes. Should be idempotent: deleting something twice shouldn't error on the second call.
If you find yourself writing POST /updateOrder, rename it to PATCH /orders/{id}. If you catch yourself saying POST /deleteUser, just stop and use DELETE /users/{id}.
Common mistake: Candidates say "I'll use POST here to get the list of messages." The interviewer hears "This person hasn't built a production API before."
The Mistake: Ignoring What Happens When Half the Batch Fails
You're designing a system that needs to import contacts, process bulk payments, or update thousands of records. You propose a batch endpoint. Great instinct. Then the interviewer asks:
"What if 97 of the 100 items succeed and 3 fail validation?"
Most candidates freeze here or give one of two bad answers. Either "I'd reject the whole batch" (terrible UX for the 97 valid items) or "I'd just process the ones that work" (and silently drop the failures? The caller never finds out?).
Neither answer is good enough. The interviewer is testing whether you've thought about partial failure semantics, which come up constantly in distributed systems.
What to say: "I'd return a 207 Multi-Status response where each item in the response array has its own status. The caller can see that items 0 through 96 succeeded with 201s, and items 97, 98, 99 failed with 400s and specific error messages. This way the client can fix and retry just the failures."
If the batch is large enough that it can't complete synchronously, shift to an async model. Return 202 Accepted with a job ID, let the client poll GET /jobs/{id}, and include per-item results in the job status response. Mention this progression naturally and you'll signal that you've actually dealt with this problem at scale.
How to Talk About This in Your Interview
Most candidates can design a reasonable API if given enough time. The interview isn't about whether you can design one. It's about whether you can think out loud, name your tradeoffs, and respond to pushback without falling apart. This section is about the talking, not the designing.
When to Bring It Up
The moment an interviewer says "design the API" or "what does the interface look like," resist the urge to start scribbling endpoints. That's the single most common mistake. Instead, pause and verbalize the resources first.
Say something like: "Before I write any endpoints, let me map out the core resources. We have Rides, Drivers, and Riders. A Ride belongs to a Rider and gets assigned to a Driver. That gives me three resource collections to work with." The interviewer will nod. You've just demonstrated that you think in terms of domain modeling, not URL strings.
Other cues that should trigger specific API design patterns:
- "What happens if the network drops mid-request?" This is your cue for idempotency keys. Don't wait for the word "idempotent" to appear.
- "How does the mobile client load the feed?" Pagination. And you should immediately be thinking about whether the data is append-heavy (cursor) or relatively static (offset is fine).
- "We have third-party developers consuming this." Versioning and backward compatibility just became first-class concerns. Mention them proactively.
- "This endpoint is public-facing." Rate limiting. Say it before they ask.
- "Users need to upload thousands of records." Batch operations and async job patterns. Don't try to shove 10,000 items through a synchronous POST.
Interview tip: You don't need to address every pattern in every interview. Read the problem. If it's a social feed, pagination matters more than batch operations. If it's a payment system, idempotency is the star. Pick the patterns that fit and go deep on those.
Sample Dialogue
Here's how a real exchange might go when designing a payment API. Notice how the candidate doesn't just answer questions; they anticipate problems.
Interviewer: "Alright, so we need an API for our checkout flow. A user has items in their cart and wants to place an order. Walk me through the endpoints."
You: "Sure. The core resources are Users, Carts, and Orders. I'd model placing an order as a POST to /orders, with the cart ID in the request body. The server validates the cart, charges the payment method, and creates the order. I'd return a 201 with the order object. One thing I want to call out early: since this involves a payment, I'd require an Idempotency-Key header on this POST."
Interviewer: "Why? What's the scenario you're worried about?"
You: "The client sends the POST, the server processes the payment successfully, but the response gets lost due to a network timeout. The client retries. Without an idempotency key, we'd charge them twice. With the key, the server looks up the previous result and returns it again. Same response, no second charge."
Interviewer: "Okay, but now you need to store those keys somewhere. Doesn't that add complexity? What if that store goes down?"
You: "It does add a dependency. I'd use Redis with a TTL of maybe 24 hours, since retries almost always happen within minutes. If Redis is unavailable, I'd rather fail the request with a 503 than risk a double charge. For a payment endpoint, correctness beats availability. That said, in a real system I'd have Redis in a replicated setup so that failure mode is rare."
Interviewer: "What status codes would you use beyond 201?"
You: "400 for malformed requests, like a missing cart ID. 409 if the order conflicts with current state, say the cart was already checked out. 402 if the payment is declined. And I'd wrap every error in a consistent envelope: an error code, a human-readable message, and a details array for field-level validation issues."
Notice how the candidate didn't recite a textbook definition of idempotency. They told a story: client sends request, response gets lost, client retries, user gets double-charged. That's what makes it stick.
Follow-Up Questions to Expect
"Why REST instead of GraphQL or gRPC here?" Default to: "REST fits well when operations map cleanly to CRUD on well-defined resources. I'd reach for GraphQL if clients needed highly variable queries across nested data, or gRPC for internal service-to-service calls where we want strong typing and streaming."
"How would you handle pagination on this list endpoint?" Name the approach and the reason: "Cursor-based, because this feed is sorted by recency and new items are constantly being added. Offset pagination would skip or duplicate items as the underlying data shifts."
"What's your versioning strategy?" Keep it simple unless the problem demands otherwise: "URL path versioning, like /v1/orders. It's the most visible and debuggable approach. I'd only move to header-based versioning if we had a strong reason to keep URLs stable across versions."
"What happens if a downstream service is slow or down?" This tests whether you've thought about failure modes: "I'd set timeouts on downstream calls, return a 503 with a Retry-After header, and make sure the client knows this is a transient failure. For non-critical downstream calls, I might use a circuit breaker so we degrade gracefully instead of cascading the failure."
What Separates Good from Great
- A mid-level answer lists endpoints correctly: GET /orders, POST /orders, GET /orders/{id}. A senior answer explains why they modeled it that way. "I'm keeping line items nested inside the order resource because no caller ever needs line items independent of their order. If that changes, I'd promote them to their own resource."
- Mid-level candidates pick a pagination strategy. Senior candidates connect it to the data characteristics. "This dataset is append-only and sorted by timestamp, so cursor pagination gives us stable pages even under high write throughput. The tradeoff is that users can't jump to page 47, but for a feed-style UI, that's not a real use case."
- The biggest differentiator: senior candidates close the loop on errors. They don't just design the happy path and move on. They volunteer the error contract, the status codes, and what the client should do when things go wrong. If you say nothing about errors, the interviewer has to prompt you, and that costs you points.
Key takeaway: The interviewer isn't grading your URL structure; they're grading whether you can articulate why you made each choice, what breaks if you chose differently, and what happens when things fail.
