A candidate I coached last year could explain transformer attention from first principles, had shipped models at scale, and still bombed the system design round. The feedback: "Couldn't reason about infrastructure." They knew the model. They had no idea how many GPUs it needed, what it would cost to serve, or whether their latency estimate was even in the right ballpark.
That gap is what this guide closes. Estimation in ML interviews isn't about memorizing formulas. It's about having a repeatable process that lets you walk into any prompt, whether it's "train a 7B LLM" or "serve 100K QPS recommendations," and produce a credible, structured answer in under five minutes.
You'll face two flavors of estimation: training-time (GPU hours, storage, data throughput) and serving-time (QPS capacity, memory footprint, fleet sizing, latency budgets). Every problem reduces to four levers: data volume, model size, hardware specs, and traffic patterns. Your job is to identify which levers are load-bearing for the specific problem, anchor on a handful of numbers you've memorized cold (A100 FLOPS, GPU memory tiers, rough model sizes for BERT and GPT-2), and reason your way to an answer that's within an order of magnitude. Interviewers aren't checking your arithmetic. They're checking whether you think like someone who's actually had to pay a cloud bill.
Five steps. Every ML estimation question you'll face maps onto them. Memorize the sequence and the time allocation, because interviewers notice when you spend 10 minutes on model size and then rush through fleet sizing in 30 seconds.
| Phase | Time | Goal |
|---|---|---|
| 1. Clarify scope | 2-3 min | Lock in training vs. serving, model family, and scale inputs before any math |
| 2. Anchor on model size | 3-4 min | Estimate parameter count and memory footprint with explicit precision assumptions |
| 3. Estimate compute | 4-5 min | Calculate FLOPs for training or per-inference latency |
| 4. Size the fleet | 3-4 min | Convert QPS demand or training throughput into GPU replica count |
| 5. Sanity check | 2-3 min | Validate against latency SLA, cost budget, and a real-world comparable |
This is the one thing to internalize before you walk in. The numbers don't have to be perfect. The structure does.

You need three things before touching a number: are you estimating training or serving (or both), what model family are you working with, and what is the scale of the problem.
What to do:
Ask exactly these three questions, in this order:
1. Are we estimating training cost, serving infrastructure, or both?
2. What model family are we working with (encoder, decoder LLM, two-tower, vision)?
3. What scale: QPS and latency budget for serving, or dataset size and timeline for training?
Don't assume. A prompt like "design a recommendation system" could mean a 10ms real-time ranker or a weekly batch embedding job. Those require completely different calculations.
What to say:
"Before I start estimating, I want to make sure I'm solving the right problem. Are we focused on the cost to train this model, the infrastructure to serve it, or both? And what scale are we targeting? I want to anchor on QPS and latency budget for serving, or dataset size and timeline for training."
How the interviewer is evaluating you:
They're checking whether you distinguish between training and serving constraints. Candidates who jump straight to "okay so we need GPUs" without clarifying this fail immediately. Asking these questions signals you understand that a 7B parameter model looks completely different at training time versus serving time.
Example: "Okay, so we're sizing a serving fleet for a real-time embedding retrieval model at 50K QPS with a 100ms p99 budget. Let me anchor on the model size first before I touch fleet numbers."
Memory is almost always the binding constraint in serving, and it's the first thing that determines whether your design is even feasible. Start here.
What to do:
Compute the footprint as parameters × bytes_per_param. For fp16, that's 2 bytes. For fp32, 4 bytes. For int8, 1 byte.
What to say:
"I'll treat this as a transformer model with roughly 7 billion parameters. At fp16, that's 7B × 2 bytes = 14GB just for the weights. For serving, I'll add a 1.2x overhead for activations and runtime buffers, so call it roughly 17GB per replica. That already tells me a single replica fits comfortably on an A100 40GB, but two replicas (~34GB) would leave almost no headroom on one 40GB card."
How the interviewer is evaluating you:
They want to see you state your precision assumption explicitly and unprompted. Defaulting to fp32 in 2024 is a red flag; production serving is fp16 or int8. They're also watching whether you remember the optimizer state multiplier for training. Most candidates forget it, and it makes their training memory estimate several times too small.
Example: "Good, so I've got 17GB per replica for the model itself. Now let me figure out what compute looks like per request so I can work out how many replicas I actually need."
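The memory arithmetic in this step is simple enough to keep as a mental helper. A minimal sketch, assuming the 2 bytes/param fp16 default and the 1.2x runtime overhead stated above:

```python
def serving_memory_gb(params_billion, bytes_per_param=2, overhead=1.2):
    # Weights (1e9 params × bytes ≈ GB) times a runtime-overhead multiplier
    # for activations and buffers. Both constants are assumptions, not laws.
    return params_billion * bytes_per_param * overhead

# 7B model at fp16: 14 GB weights, ~17 GB per serving replica
print(round(serving_memory_gb(7), 1))  # -> 16.8
```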
For training, one formula covers most transformer problems. For serving, you're dividing work by hardware throughput.
What to do:
For training, use the 6ND rule:
Training FLOPs ≈ 6 × N × D
Where N is parameter count and D is the number of training tokens. The factor of 6 accounts for the forward pass, backward pass, and parameter update. A 7B model trained on 1T tokens: 6 × 7×10⁹ × 10¹² = 4.2×10²² FLOPs.
For serving, FLOPs per forward pass scale with both parameter count and sequence length:
FLOPs per request ≈ 2 × N × L
Where N is parameter count and L is the average input sequence length in tokens. The factor of 2 accounts for the multiply-accumulate operations per parameter per token. Before you plug in numbers, state your L assumption explicitly. A short classification prompt might average 50 tokens; a retrieval-augmented generation request with context could easily be 512 or more. That difference is a 10x swing in compute per request.
Once you have FLOPs per request, divide by hardware throughput adjusted for utilization:
Latency ≈ FLOPs_per_request / (hardware_FLOPS × utilization)
A realistic utilization for inference is 30-50% of peak FLOPS on an A100, not 100%.
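The three formulas above fit in a few lines. A sketch, assuming the A100 fp16 peak and 40% inference utilization used throughout this section:

```python
A100_FP16_PEAK = 312e12  # FLOPS, A100 peak fp16 tensor-core throughput

def training_flops(n_params, n_tokens):
    # 6ND rule: forward pass + backward pass + parameter update
    return 6 * n_params * n_tokens

def inference_flops(n_params, seq_len):
    # 2 FLOPs (one multiply-accumulate) per parameter per input token
    return 2 * n_params * seq_len

def latency_floor_s(flops_per_request, peak_flops=A100_FP16_PEAK, utilization=0.40):
    # Compute-only floor; network, retrieval, and pre/post-processing add more
    return flops_per_request / (peak_flops * utilization)

print(f"{training_flops(7e9, 1e12):.1e}")              # -> 4.2e+22
print(f"{inference_flops(7e9, 100):.1e}")              # -> 1.4e+12
print(f"{latency_floor_s(inference_flops(7e9, 100)) * 1e3:.1f} ms")
```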
What to say:
"For training, I'll use the 6ND rule. 6 times 7 billion parameters times 1 trillion tokens gives me about 4×10²² FLOPs for the full run. An A100 at fp16 does about 312 TFLOPS peak, but realistically I'll get maybe 40% utilization in a distributed training job, so call it 125 TFLOPS effective. That's 4×10²² divided by 1.25×10¹⁴, which is roughly 3×10⁸ seconds of single-GPU compute. Divide by 1000 GPUs and you're looking at about 300,000 seconds, or around 80 hours of wall-clock time."
"For serving, I'll assume an average input length of 100 tokens per request. That gives me 2 × 7B × 100 = 1.4 trillion FLOPs per request, or about 1.4×10¹² FLOPs. I'll state that assumption clearly and adjust if the interviewer tells me the typical prompt is longer."
How the interviewer is evaluating you:
They want to see you apply a formula, not guess. Saying "training a 7B model takes a few weeks" with no derivation gets you nothing. The 6ND rule is well-known enough that interviewers at Google and Meta will recognize it immediately. Using it signals you've done real training work, not just read papers.
On the serving side, explicitly calling out your sequence length assumption is what separates a rigorous estimate from a hand-wave. Interviewers will often probe this directly: "What if the average prompt is 500 tokens?" You want to have already shown you know L is a variable, not a constant.
Example: "So at L=100 tokens, I'm looking at roughly 1.4×10¹² FLOPs per request for a 7B model. Now I can figure out how many requests per second a single GPU can handle, and from there the fleet size."
This is where you convert your compute estimate into something the interviewer can evaluate against real infrastructure.
What to do:
Compute per-GPU throughput: GPU_FLOPS × utilization / FLOPs_per_request = requests_per_second_per_GPU.
Then size the fleet: replicas = ceil(peak_QPS / (throughput_per_GPU × 0.65)).
Always use peak QPS, not average. For consumer products, peak is typically 3-5x average. If the interviewer hasn't given you a peak number, state your assumption.
What to say:
"An A100 at fp16 does 312 TFLOPS peak. At 40% utilization for inference, that's about 125 TFLOPS effective. Each request needs 1.4×10¹² FLOPs at my assumed sequence length of 100 tokens, so one GPU handles roughly 125×10¹² / 1.4×10¹² ≈ 90 requests per second at batch size 1. I'll target 65% utilization for the fleet, so effective capacity per GPU is about 58 QPS. At 50K peak QPS, I need ceil(50,000 / 58) ≈ 863 replicas. That seems high, so I'd immediately look at batching to amortize that per-request cost, and I'd revisit whether L=100 is realistic for this use case."
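The fleet math in that answer can be reproduced mechanically. A sketch using the same assumptions (A100 fp16 peak, 40% inference utilization, 65% fleet target, L=100 on a 7B model):

```python
import math

def replicas_needed(peak_qps, gpu_peak_flops, inference_util,
                    flops_per_request, fleet_util=0.65):
    # Per-GPU QPS at realistic utilization, then headroom for the fleet
    per_gpu_qps = gpu_peak_flops * inference_util / flops_per_request
    return math.ceil(peak_qps / (per_gpu_qps * fleet_util))

# 50K peak QPS, A100 fp16 at 40% inference utilization, 2NL FLOPs/request
print(replicas_needed(50_000, 312e12, 0.40, 2 * 7e9 * 100))  # -> 863
```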
How the interviewer is evaluating you:
They're watching whether you apply a utilization buffer and whether you size for peak. Candidates who size for average load and forget the 3-5x peak multiplier are describing a system that falls over every evening. Also: rounding up and explaining why ("rolling deployments, traffic variance") shows production intuition.
Do this: State your peak-to-average assumption explicitly. "I'm assuming peak is 3x average for a consumer product. If this is a B2B API with flatter traffic, I'd revise that down."
Example: "Alright, let me do a quick sanity check on cost and latency before I commit to that fleet number."
A fleet size without a cost and latency check is an incomplete answer. This step takes two minutes and it's where you demonstrate engineering judgment, not just arithmetic.
What to do:
Run three quick checks: decompose the latency budget and compare it to the SLA, multiply the fleet size by cloud $/hr to get a monthly cost, and compare the result against a real-world system at similar scale.
What to say:
"Quick sanity check on latency: my model inference estimate at L=100 tokens is around 15ms per request at batch size 1. The SLA is 100ms p99, so I have roughly 85ms left for everything else. Feature retrieval from a vector store like Pinecone typically adds 10-30ms, preprocessing maybe 5ms, and network round-trip another 10ms. That puts me comfortably under 100ms. On cost: if the fleet comes out larger than expected, the first thing I'd do is increase batch size. Batching 10 requests together doesn't cost 10x the compute, so throughput per GPU improves significantly and the fleet shrinks. If we needed further cuts, I'd look at int8 quantization next."
How the interviewer is evaluating you:
This is where senior engineers separate from junior ones. Giving a fleet size and stopping is a junior answer. Connecting that fleet size to cost, checking it against the latency budget, and naming optimization levers shows you've actually operated ML systems at scale. Interviewers at Netflix, Uber, and Meta specifically probe this: "Okay, but that's $300K a month. How would you bring it down?"
Two worked examples, then a real dialogue. The goal is a repeatable pattern you can run under pressure.
The interviewer asks: "How long would it take to train a LLaMA-7B scale model on 1 trillion tokens, and how many GPUs do you need?"
Step 1: Anchor on the 6ND rule.
Total training FLOPs = 6 × N × D, where N is parameter count and D is token count.
N = 7 × 10^9 parameters
D = 1 × 10^12 tokens
FLOPs = 6 × 7e9 × 1e12 = 4.2 × 10^22 FLOPs
That's 42 zettaFLOPs. Big number. Now make it concrete.
Step 2: Map to hardware.
An A100 (80GB) delivers roughly 312 TFLOPS peak fp16, but peak is a fiction in multi-node training. Real utilization on large-scale transformer training accounts for communication overhead across nodes, pipeline bubbles, and load imbalance. In practice, MFU (Model FLOP Utilization) lands around 15-20% for multi-node runs. Use 17% as your working number, which is consistent with what Meta and others have reported for LLaMA-scale training.
Effective A100 throughput = 312 x 0.17 = ~53 TFLOPS = 5.3 x 10^13 FLOPs/sec
Do this: State your MFU assumption explicitly and explain why it's lower than the spec sheet. Saying "I'm using 17% MFU because multi-node training loses significant throughput to all-reduce communication and pipeline bubbles" shows you understand the gap between theory and production. Candidates who use 40-50% MFU for cluster-scale training are implicitly assuming single-node efficiency, and interviewers notice.
Step 3: Compute wall-clock time per GPU.
Time (single GPU) = 4.2 × 10^22 / 5.3 × 10^13
                  = 7.9 × 10^8 seconds
                  ≈ 25 years
That's your cue to size a cluster, not panic.
Step 4: Pick a realistic cluster and solve for time.
Meta trained LLaMA-2 on 2,000 A100s over roughly 35-40 days. Use that as your sanity anchor. With 2,000 GPUs:
Wall-clock time = 7.9 x 10^8 / 2,000 = 395,000 seconds ≈ 4.6 days
Wait, that's too fast compared to the reported 35-40 days. This is actually a useful moment in the interview: the math exposes that real training runs are slower than even a conservative MFU estimate suggests. In practice, you lose additional time to checkpoint restarts after hardware failures, evaluation runs, data loading bottlenecks, and gradient synchronization stalls. A rough rule of thumb is to apply a 5-10x "real-world overhead" multiplier on top of your compute estimate for large cluster runs. Applying 8x:
Adjusted wall-clock = 4.6 days x 8 ≈ 37 days
That matches Meta's reported numbers closely. Flagging this overhead explicitly, rather than pretending your formula produces the exact answer, is exactly the kind of intellectual honesty that impresses senior interviewers.
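The whole worked example compresses into one function. Note that the 17% MFU and the rough 8x real-world overhead multiplier are this section's working assumptions, not universal constants:

```python
def training_days(n_params, n_tokens, n_gpus,
                  peak_flops=312e12, mfu=0.17, overhead=8):
    # 6ND compute, effective cluster throughput at the assumed MFU, then the
    # real-world overhead multiplier (restarts, evals, data-loading stalls)
    total_flops = 6 * n_params * n_tokens
    compute_seconds = total_flops / (peak_flops * mfu * n_gpus)
    return compute_seconds * overhead / 86_400  # seconds per day

# LLaMA-7B-scale, 1T tokens, 2,000 A100s
print(round(training_days(7e9, 1e12, 2000), 1))  # -> 36.7, i.e. ~37 days
```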
Step 5: Memory check.
7B parameters at fp16 = 14GB just for weights. Training adds fp32 master weights plus Adam momentum and variance (4 bytes per parameter each, roughly 84GB on top), putting you near 98GB before activations. That exceeds a single A100-80GB, so in practice you shard: tensor parallelism across 4-8 GPUs per replica, or optimizer-state sharding along the lines of ZeRO.
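The per-parameter accounting matches the ~14 bytes/param figure in the cheat sheet at the end of this guide. A sketch that, like the cheat sheet, leaves gradients and activations out of the per-parameter count:

```python
def training_memory_gb(params_billion):
    # fp16 weights (2B) + fp32 master weights (4B)
    # + Adam momentum (4B) + Adam variance (4B) per parameter
    bytes_per_param = 2 + 4 + 4 + 4  # ~14 bytes/param before activations
    return params_billion * bytes_per_param

print(training_memory_gb(7))  # -> 98
```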
The setup: you're building real-time item retrieval for a feed ranking system. Each request encodes a user query into an embedding and retrieves the top-K items from a vector index. The model is a two-tower encoder, roughly BERT-base scale (110M parameters). Target latency is 50ms p99. Peak QPS is 50,000.
Model memory per replica.
110M params at fp16 = 220MB. Add 20% overhead for runtime buffers and you're at ~270MB. That's tiny. A single A100-80GB could theoretically hold hundreds of copies, but compute, not memory, is your bottleneck here.
Throughput estimation.
BERT-base inference at batch size 32 on an A100 runs at roughly 3,000-5,000 sequences/second in fp16 with TensorRT or Triton optimization. Use 4,000 as your working number.
Throughput per GPU = 4,000 sequences/sec
Peak QPS needed = 50,000
Raw GPU count = 50,000 / 4,000 = 12.5 GPUs
Apply a 70% utilization target (run GPUs at 100% and latency spikes follow):
Sized GPU count = ceil(12.5 / 0.70) = ceil(17.9) = 18 GPUs
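The same sizing as a one-liner helper, with the 4,000 seq/sec and 70% utilization figures from above as the assumptions:

```python
import math

def size_gpu_fleet(peak_qps, throughput_per_gpu, utilization_target=0.70):
    # Divide peak demand by per-GPU throughput, then leave utilization headroom
    return math.ceil(peak_qps / throughput_per_gpu / utilization_target)

print(size_gpu_fleet(50_000, 4_000))  # -> 18
```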
Batching strategy.
At 50K QPS with a 50ms budget, you have a real opportunity to batch. If requests arrive at 50K/sec and you batch over a 5ms window, you're collecting 250 requests, more than enough to keep batches of 32-64 full, and batching amortizes the fixed overhead of a GPU kernel launch. Flag this to the interviewer: "I'd use dynamic batching in Triton with a max batch size of 64 and a max wait time of 5ms."
The vector retrieval piece.
Don't forget: the embedding model is only half the latency budget. You still need to run ANN search over your item index. If you're using FAISS with an IVF index over 10M items, expect 5-15ms for retrieval. That's 10-30% of your 50ms budget gone before postprocessing. State this explicitly or the interviewer will ask.
This is what a real estimation conversation sounds like. Notice it's not clean.
Do this: When a clarification flips the problem, recompute from scratch. "Batch with a 10-minute SLA" means you can use larger batch sizes and cheaper on-demand instances, not expensive low-latency serving infrastructure. Always ask before you calculate.
Do this: When challenged on a number, show your reasoning chain. "I derived it from X by applying Y" is infinitely better than restating the number louder. Interviewers aren't trying to trick you; they want to see that you can defend your assumptions.
Memorize these. They're your raw material for every estimation.
| Hardware | Memory | fp16 TFLOPS | Approx $/hr (cloud) | Best used for |
|---|---|---|---|---|
| A100 40GB | 40GB | 312 | ~$2.50 | Serving mid-size models |
| A100 80GB | 80GB | 312 | ~$3.00 | Training, large model serving |
| H100 80GB | 80GB | 989 | ~$8.00 | LLM training, high-throughput serving |
| V100 16GB | 16GB | 125 | ~$1.50 | Smaller models, cost-sensitive batch jobs |
| T4 16GB | 16GB | 65 | ~$0.50 | CPU-replacement inference, embeddings |
| Model | Params | fp16 Memory | Inference throughput (A100, bs=32) |
|---|---|---|---|
| BERT-base | 110M | ~220MB | ~4,000 seq/sec |
| GPT-2 (1.5B) | 1.5B | ~3GB | ~500 seq/sec |
| LLaMA-7B | 7B | ~14GB | ~150 seq/sec |
| LLaMA-70B | 70B | ~140GB | ~15 seq/sec (8xA100) |
| ViT-L/16 | 307M | ~600MB | ~1,200 img/sec |
| CLIP ViT-L | ~900M | ~1.8GB | ~800 img/sec |
Throughput numbers assume fp16, Triton or TensorRT, and reasonable batching. They're order-of-magnitude anchors, not benchmarks.
Do this: When you cite a number in the interview, say where it comes from. "BERT-base is 110M parameters, which I know because it's a standard reference point" lands better than just stating it. It signals you have a mental model, not a memorized list.
When the interviewer says "design YouTube recommendations," they haven't given you a scale. You need to extract it.
Ask: "What's the rough QPS for recommendation requests? And are we optimizing for latency or throughput?" If they say "you tell me," give a reasonable estimate: "YouTube serves roughly 2 billion logged-in users. If 5% are active at any given time and each triggers a recommendation request every 30 seconds, that's about 3 million QPS. I'll use that as my baseline."
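That back-of-envelope is worth being able to reproduce mechanically. A sketch, where the user count, active fraction, and request interval are the stated assumptions, not YouTube's real numbers:

```python
users = 2_000_000_000          # assumed logged-in user base
active_fraction = 0.05         # assumed fraction active at any moment
seconds_between_requests = 30  # assumed request cadence per active user

qps = users * active_fraction / seconds_between_requests
print(f"{qps:,.0f}")  # -> 3,333,333 — "about 3 million QPS"
```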
When the prompt is specific ("100K QPS, 200ms p99 budget, how many GPUs"), skip the scoping questions and go straight to fleet sizing. The interviewer has already done the scope work for you. Jumping into clarifying questions at that point wastes time and signals you didn't hear the constraints.
The tell for a vague prompt: no QPS number, no latency SLA, no model specified. Fill in all three before you touch a formula.
Most candidates can produce some numbers. What separates a hire from a no-hire is whether those numbers reflect how ML systems actually behave in production. These mistakes are the ones that come up again and again, and every single one signals the same thing to an interviewer: this person hasn't shipped a real model.
You say: "LLaMA-7B is 7 billion parameters at 2 bytes each in fp16, so that's 14GB. Fits on one A100."
The interviewer nods, then asks: "What about during training?" You pause. "And what happens to your KV cache at batch size 32?"
Raw parameter memory is just the starting point. During training with Adam, you're carrying the model weights, gradients, and two optimizer moment tensors, which puts you at roughly 4x the base parameter memory, not 1x. During autoregressive inference, your KV cache grows with sequence length and batch size and can easily double your memory footprint at realistic serving loads. Activation memory during batched serving adds more on top.
Don't do this: Quote parameter count times bytes and call it done.
Do this: Apply your multipliers out loud. Say "14GB base, plus KV cache at batch 32 and sequence length 512 adds roughly another 8-10GB, so I'd plan for a full 80GB A100 with limited headroom."
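The KV-cache claim above can be rough-checked with one formula. A sketch assuming a LLaMA-7B-like shape (32 layers, hidden size 4096, fp16, full-width KV heads with no grouped-query attention; all of these are illustrative assumptions):

```python
def kv_cache_gb(n_layers, hidden_dim, seq_len, batch_size, bytes_per_val=2):
    # Two cached tensors per layer (K and V), each of shape
    # [batch, seq_len, hidden_dim], stored in fp16 (2 bytes per value)
    return 2 * n_layers * hidden_dim * seq_len * batch_size * bytes_per_val / 1e9

# LLaMA-7B-like: batch 32, sequence length 512
print(round(kv_cache_gb(32, 4096, 512, 32), 1))  # -> 8.6, in the 8-10GB range
```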
Interviewers penalize this because undersizing memory means your fleet estimate is wrong, your batching strategy is wrong, and your cost estimate is wrong. Everything downstream breaks.
"We need to handle 10K QPS, so I'll size for 10K QPS."
If you say this, you've just designed a system that falls over every evening at 8pm. Consumer products routinely see 3-5x spikes over their daily average. A recommendation system averaging 10K QPS might hit 40-50K during a live event or a viral moment.
Don't do this: Size your fleet to average load without acknowledging peak.
Do this: State your assumption explicitly. "I'll assume a 4x peak-to-average ratio and size for 40K QPS, then target 65% GPU utilization to leave headroom for spikes."
The interviewer isn't expecting you to know the exact ratio. They're checking whether you know the ratio exists and that you need to ask about it or assume it. Skipping this makes your fleet estimate look naive, and it tells them you've never been paged at 2am because a model cluster fell over.
This one is subtle, which is why it catches so many people.
A candidate estimates 20ms for a transformer forward pass and declares "we can hit our 50ms p99 SLA, no problem." But they've accounted for exactly one of the five things that happen on every request. Feature retrieval from a store like Feast or Redis can add 5-15ms. Preprocessing and tokenization adds a few milliseconds. The network round-trip from the client to your serving cluster and back adds more. Postprocessing, re-ranking, or business logic filtering adds the rest.
Don't do this: Quote model inference latency as your total serving latency.
Do this: Decompose the budget. "50ms total: ~5ms network, ~10ms feature retrieval, ~20ms model inference, ~5ms postprocessing, leaving 10ms buffer for tail latency."
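Keeping the decomposition honest is just a running budget. A sketch with the example's assumed component latencies:

```python
budget_ms = 50
components_ms = {            # assumed per-component latencies from the example
    "network": 5,
    "feature_retrieval": 10,
    "model_inference": 20,
    "postprocessing": 5,
}
tail_buffer = budget_ms - sum(components_ms.values())
print(tail_buffer)  # -> 10, the buffer left for tail latency
```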
Interviewers at companies like Meta and Google care deeply about this because latency decomposition is exactly how their oncall engineers debug SLA misses. If you can't decompose it in an interview, they don't trust you to debug it in production.
It's 2024. No one is serving a production model in fp32.
When you use fp32 as your default precision, your memory estimate is 2x too high and your throughput estimate is roughly half of what you'd actually get. That makes your fleet size look 2-4x too large, your cost estimate looks absurd, and you've signaled that you're reasoning about ML systems from a research mindset, not an engineering one.
Don't do this: Say "each parameter takes 4 bytes" without qualifying your precision assumption.
Do this: Default to fp16 for serving and state it. "I'll assume fp16 inference, which is standard for production. If we needed further optimization, I'd consider int8 quantization and note the accuracy tradeoff."
The fix is one sentence. Say fp16. If the interviewer wants to explore fp32 or int8, they'll ask.
"So we need exactly 47 GPUs."
The false precision is worse than being wrong. Interviewers know you can't derive an exact number from the information given in a 45-minute interview. When you present a single number with no uncertainty, you're not demonstrating rigor. You're demonstrating that you don't understand the sources of variance in your own calculation.
Batch size, sequence length, request concurrency, model quantization choice, and hardware generation all move your final number significantly. A candidate who says "I'm estimating 40-80 A100s depending on our batch size strategy and whether we go int8" sounds more credible than one who confidently states 47.
Do this: Bound your answer. "My estimate is 40-80 GPUs. The low end assumes aggressive batching with int8; the high end assumes fp16 with conservative batch sizes for latency reasons. I'd start with 60 and load test."
This also gives you a natural opening to discuss the tradeoffs, which is exactly what the interviewer wants to hear.
You've done the math. You've got a GPU count. You stop there.
The sanity check is where you prove you have intuition, not just arithmetic. If your estimate lands at 5,000 A100s to serve a mid-size recommendation system, something is wrong and you should say so. If it lands at 2 GPUs for a 100K QPS LLM serving system, also wrong. Comparing your answer to a real-world reference point (GPT-3 reportedly ran on thousands of A100s; BERT-scale models at 10K QPS typically need tens of GPUs) takes 20 seconds and shows the interviewer you've internalized what reasonable looks like.
Don't do this: Hand over your final number without a gut-check against reality.
Do this: Close with one sentence. "This feels reasonable given that a comparable BERT-scale system at similar QPS is typically in the 20-40 GPU range."
If your number is off by 10x, the sanity check is your chance to catch it yourself rather than have the interviewer catch it for you.
Everything below is designed to be scanned, not read. Run through it once before you walk in.
| GPU | Memory | fp16 TFLOPS | ~$/hr (cloud) | When to cite it |
|---|---|---|---|---|
| V100 | 16 GB | 125 | $1.50 | Legacy systems, cost-sensitive orgs, anything pre-2022 |
| A100 | 80 GB | 312 | $3.00 | Default for most interview estimates; widely understood |
| H100 | 80 GB | 989 | $8.00 | LLM training at scale, latency-critical serving, 2024+ infra |
Default to A100 unless the interviewer signals otherwise. It's the reference GPU that lands in the right ballpark for almost every scenario.
| Model | Params | fp16 Memory | Notes |
|---|---|---|---|
| BERT-base | 110M | ~220 MB | Good anchor for encoder-only tasks |
| GPT-2 | 1.5B | ~3 GB | Useful for "small generative model" scenarios |
| LLaMA-7B | 7B | ~14 GB | Fits on a single A100 for serving; training needs multiple GPUs |
| LLaMA-70B | 70B | ~140 GB | Requires 2+ A100s; use to illustrate multi-GPU serving |
| CLIP ViT-L | ~900M | ~1.8 GB | Multimodal, embedding retrieval, recommendation systems |
Memory rule of thumb: fp16 costs 2 bytes per parameter. Int8 halves that again.
Training FLOPs (6ND rule):
total_FLOPs = 6 × N × D
# N = number of parameters, D = number of training tokens
Memory footprint:
memory = params × bytes_per_param × overhead_multiplier
# Serving: multiplier ~1.2 over fp16 base size.
#
# Training with mixed precision (fp16 weights + fp32 Adam states):
# - fp16 weights: 2 bytes/param
# - fp32 master weights: 4 bytes/param
# - fp32 Adam momentum: 4 bytes/param
# - fp32 Adam variance: 4 bytes/param
# Total: ~14 bytes/param, or ~7x the fp16 model weight size
#
# Example: LLaMA-7B training needs ~98 GB, not the 14 GB you'd see at serving time.
Fleet sizing:
replicas = ceil(peak_QPS / (throughput_per_GPU × utilization_target))
# Use 0.6–0.7 for utilization_target
Inference latency:
latency = FLOPs_per_request / hardware_FLOPS
# This is a floor. Add 20–40ms for network, preprocessing, and postprocessing.
Start with serving when the interviewer mentions QPS, p99 latency, SLA, or fleet size. That's the harder constraint in most production systems.
Start with training when the question is about cost to build, data pipeline design, or how long a model takes to produce.
When the prompt is vague (think "design YouTube recommendations"), go serving-first. The fleet size and latency budget will surface the interesting tradeoffs faster.
| Phase | What You Do | Time to Spend |
|---|---|---|
| 1. Clarify scope | Training vs. serving, model family, scale | 1–2 min |
| 2. Anchor model size | Param count, memory footprint, precision | 1–2 min |
| 3. Estimate compute | FLOPs for training run or per inference | 2–3 min |
| 4. Size the fleet | QPS per GPU, utilization target, replica count | 2–3 min |
| 5. Sanity check | Latency vs. SLA, cost vs. budget, real-world comp | 1 min |
These are the exact lines that signal structured thinking to an interviewer.