Google Data Scientist at a Glance
Total Compensation
$168k - $661k/yr
Interview Rounds
7 rounds
Difficulty
Levels
L3 - L7
Education
Bachelor's / Master's / PhD
Experience
0–22+ yrs
From hundreds of mock interviews, here's what catches candidates off guard about Google's Data Scientist role: the interview loop includes a dedicated statistics and probability round that most other big tech companies have dropped. Google still runs thousands of simultaneous A/B tests across Search, Ads, and Cloud, and they want DSs who can reason about multiple comparisons, interference effects, and metric sensitivity, not just fit an XGBoost model and call it a day.
Google Data Scientist Role
Primary Focus
Skill Profile
Math & Stats
Expert: Expertise in statistics, mathematics, operations research, and quantitative methods is fundamental. This includes statistical analysis, forecasting, and model-based decision support. Advanced degrees (Master's/PhD) in quantitative fields are highly valued.
Software Eng
Medium: Proficiency in coding for data manipulation, analysis, and scripting is required. While strong coding skills are necessary, the role emphasizes analytical application rather than large-scale software system design or development.
Data & SQL
Medium: Ability to query and work with large-scale databases is essential. An understanding of data infrastructure is important for optimizing decisions related to technical infrastructure, though direct pipeline building may not be a primary focus.
Machine Learning
High: Strong capability in developing and applying models for decision support and forecasting, which inherently includes various machine learning techniques. This is crucial for optimizing large-scale systems and solving complex product/business problems.
Applied AI
Low: Not explicitly emphasized in job descriptions for these Data Scientist roles. The focus is on traditional data science, statistics, and operations research applications.
Infra & Cloud
Medium: A solid understanding of technical infrastructure and data centers is required, particularly for roles focused on optimizing Google's Technical Infrastructure. This involves providing model-based decision support, not necessarily hands-on cloud deployment.
Business
High: Strong ability to translate data insights into actionable business recommendations, solve product and business problems, and influence large dollar spend decisions. This includes understanding market dynamics and strategic perspectives.
Viz & Comms
High: Excellent communication skills are critical, including the ability to present complex analytical insights and recommendations clearly to executive-level stakeholders and to 'weave stories with meaningful insight from data'.
What You Need
- Analytics to solve product or business problems
- Statistical analysis
- Coding for data analysis
- Querying databases
- Model-based decision support
- Quantitative analysis
- Executive-level business communications
- Operations Research
Nice to Have
- Advanced modeling techniques
- Experimentation design (e.g., A/B testing)
- Domain expertise relevant to specific team (e.g., infrastructure optimization, product analytics)
- PhD degree in a quantitative field
Want to ace the interview?
Practice with real questions.
Google's DS org blends deep statistical work with product influence. You'll write BigQuery SQL against petabyte-scale logs tables, design A/B tests for Search ranking changes using Google's internal experimentation platform, and build causal inference models in Colab Enterprise notebooks. But you'll also spend real time in rooms with PMs and engineering leads, defending metric trade-offs and translating analysis into ship/no-ship recommendations. The role demands both technical depth and the ability to make a VP care about your one-pager.
A Typical Week
A Week in the Life of a Google Data Scientist
Typical L5 workweek · Google
Weekly time split
Culture notes
- Google DSs typically work around 42-47 hours per week with genuine flexibility on daily scheduling, though meeting density on Search teams can spike around launch cycles and quarterly OKR reviews.
- Google requires 3 days per week in the Mountain View office (hybrid policy), and most Search DS teams coordinate Tuesday-Thursday as their in-office overlap days.
The thing the widget won't convey is how writing-heavy this job feels in practice. Experiment design docs, findings summaries, executive one-pagers: these aren't afterthoughts, they're primary deliverables that get scrutinized by peers before anything reaches a PM. Monday and Wednesday skew heavily toward meetings and cross-functional readouts (Search Quality PMs pushing back on your metric choices in a room of 12), which means you need to protect your analysis blocks aggressively or they'll evaporate.
Projects & Impact Areas
Search and Ads dominate the DS headcount because they dominate Google's revenue. You might be segmenting query intent types to evaluate a snippet relevance signal on the Search Quality team, while a colleague on Ads models heterogeneous treatment effects for auction bid optimization. Less visible but equally real is the infrastructure and operations research work: data center efficiency modeling, network capacity planning for Google Cloud, and supply chain optimization for hardware products like Pixel. These projects pull on linear programming and optimization under constraints, which is why Google's interview loop tests OR concepts that most other companies skip entirely.
Skills & What's Expected
The skill scores in the data tell a story worth reading carefully. Math and statistics are rated expert, ML is high, but modern AI and GenAI depth isn't emphasized for DS roles (that work lives with research scientists and MLEs). The underrated dimension is data visualization and executive communication, scored just as high as ML. You're expected to own the recommendation layer, translating complex statistical findings into clear narratives for stakeholders who won't read your notebook. Medium-level software engineering is sufficient: clean Python, solid BigQuery data modeling, no distributed systems design.
Levels & Career Growth
Google Data Scientist Levels
Each level has different expectations, compensation, and interview focus.
$131k base + $26k + $11k ≈ $168k total (L3)
What This Level Looks Like
Scope is limited to well-defined tasks and specific sub-problems within a single project or feature area. Work is closely supervised by senior team members and requires significant guidance.
Day-to-Day Focus
- →Execution of assigned analytical tasks.
- →Learning the team's technical stack, data sources, and problem domain.
- →Delivering accurate and well-documented analyses with guidance.
Interview Focus at This Level
Interviews focus on core technical skills: probability, statistics, SQL, coding (Python/R), and foundational machine learning concepts. Emphasis is on problem-solving ability on well-scoped questions rather than system design or product ambiguity.
Promotion Path
Promotion to L4 (Data Scientist III) requires demonstrating the ability to independently own and deliver on medium-sized projects from start to finish, requiring less direct supervision and showing a deeper understanding of the team's product area and business impact.
Find your level
Practice with questions tailored to your target level.
The widget shows the level bands and comp ranges, so here's the context it can't capture. The L5-to-L6 transition is where careers stall. L6 (Staff) requires demonstrated cross-team influence, meaning you're shaping the data science roadmap for a product area, not just executing analyses within one. It's a fundamentally different job: more strategy, less hands-on analysis. Getting your target level right before the recruiter screen matters because it determines your interview difficulty and the behavioral bar you'll be measured against.
Work Culture
Google requires three days per week in-office, with most DS teams coordinating Tuesday through Thursday as overlap days. The culture runs on peer review and data-driven rigor: expect your analyses to be scrutinized by other DSs and engineers before any recommendation moves forward. That raises the quality bar but also gives your work genuine visibility across the org.
Google Data Scientist Compensation
The vesting schedule looks generous up front, but the back half is where it bites. Years 3 and 4 vest significantly less, so your effective annual comp can quietly shrink unless refresh grants make up the difference. Refreshers aren't guaranteed, and from what candidates report, the size and timing vary widely even among strong performers.
Because the source data on Google's DS negotiation process is limited, take this as directional rather than gospel: the comp structure (base, equity, bonus) gives you multiple surfaces to negotiate against, and candidates with competing offers from peer companies tend to have more room to move. If you're sitting on another offer, don't leave it unmentioned. Silence rarely helps.
Google Data Scientist Interview Process
7 rounds · ~6 weeks end to end
Initial Screen
1 round: Recruiter Screen
An initial phone call with a recruiter to discuss your background, experience, and interest in Google, as well as to confirm basic qualifications and fit for the role.
Tips for this round
- Be prepared to articulate your experience and why you are interested in Google and this specific Data Scientist role.
- Have questions ready for the recruiter about the role, team, or interview process.
Technical Assessment
1 round: SQL & Data Modeling
This technical screen focuses on your proficiency in SQL for data manipulation, Python/R for data analysis, and foundational knowledge in statistics and probability. It may also include light machine learning concepts.
Tips for this round
- Practice complex SQL queries involving joins, aggregations, and window functions.
- Review core statistical concepts like hypothesis testing, confidence intervals, and probability distributions.
- Be ready to write and debug Python/R code for data cleaning and analysis.
Onsite
5 rounds: Coding & Algorithms
This round assesses your problem-solving skills through coding challenges, focusing on data structures and algorithms. You'll typically use Python or R to solve data-related problems.
Tips for this round
- Practice problems at datainterview.com/coding, focusing on efficiency and edge cases.
- Clearly explain your thought process, including initial approaches and optimizations.
- Be proficient in common data structures like arrays, lists, dictionaries, and trees.
Statistics & Probability
This interview delves into experimental design, A/B testing, hypothesis testing, and advanced statistical concepts. You'll be expected to apply these to real-world product scenarios.
Product Sense & Metrics
Evaluates your ability to translate open-ended business problems into data questions, define relevant metrics, and use data to drive product decisions. May include guesstimate questions.
Machine Learning & Modeling
This round focuses on your understanding of machine learning fundamentals, including model selection, evaluation metrics, bias-variance tradeoff, and practical application of ML models.
Behavioral
This interview assesses your collaboration skills, leadership potential, problem-solving approach, and how your values align with Google's culture, often referred to as 'Googliness'.
Tips to Stand Out
- Master core statistics, probability, and experiment design concepts.
- Be fluent in SQL and Python/R for data manipulation, analysis, and coding challenges.
- Develop strong product sense to connect data insights to business outcomes and user behavior.
- Practice clear and concise communication of technical concepts, analytical thought processes, and findings to both technical and non-technical stakeholders.
- Demonstrate curiosity, pragmatism, and a willingness to tackle complex, messy datasets.
- Prepare for a rigorous technical process that emphasizes deep quantitative and analytical skills.
- While 'Googliness' is a factor, technical depth and problem-solving ability are often prioritized over traditional culture fit.
Common Reasons Candidates Don't Pass
- ✗Lack of depth in statistical or machine learning fundamentals.
- ✗Inability to translate data insights into actionable product or business recommendations.
- ✗Weak coding skills (SQL or Python/R) for data manipulation and problem-solving.
- ✗Poor communication of technical solutions or analytical thought process during interviews.
- ✗Insufficient experience or understanding of experimental design and A/B testing principles.
Offer & Negotiation
Specific, verified details on Google's Data Scientist offer negotiation process are scarce; treat the compensation guidance above as directional rather than definitive.
From what candidates report, the most common rejection trigger is insufficient depth across the stats-heavy rounds. Google's loop dedicates three separate stages (SQL & Data Modeling, Statistics & Probability, and Product Sense & Metrics) that all probe quantitative reasoning from different angles. Candidates who allocate most of their prep to ML and coding discover too late that those areas account for a smaller share of the overall signal.
Your interviewers don't make the hiring decision. They write up structured feedback, and a separate committee of people who've never met you reviews the full packet. One rough round won't automatically kill your chances, because the committee weighs aggregate signal across all seven stages.
That committee model changes how you should perform in each round. At Google, your answers need to survive being paraphrased in a written summary by someone who may not share your exact framing. Structured, clearly reasoned explanations that translate well onto paper will serve you better than conversational rapport alone.
Google Data Scientist Interview Questions
Statistics, Probability & Experimentation
Expect questions that force you to translate ambiguous real-world variability into testable assumptions and defensible conclusions. Candidates often stumble by knowing formulas but failing to choose the right test, handle power/multiple testing, or explain uncertainty clearly.
You run an A/B test on a Google data center scheduling change; unit is host and outcome is daily energy per job, but jobs move across hosts during the day. What is the correct analysis unit and variance estimator, and why is a naive two-sample $t$-test wrong?
Sample Answer
Most candidates default to a two-sample $t$-test on host-level daily averages, but that fails here because jobs migrate, so treatment spills across hosts and induces correlated outcomes. You need to analyze at the randomization unit that actually receives treatment, often the scheduler region, cluster, or time block, or you need an exposure model that defines treatment by fraction of time under the new scheduler. Use cluster-robust standard errors (clustered at the randomization or interference unit) or a randomization-based test; otherwise your standard errors are too small and your p-values are fantasy. If you cannot define a clean interference boundary, you should redesign the experiment, for example switchback by time with sufficient washout.
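To make the clustering point concrete, here is a minimal sketch on synthetic host-day data, assuming the scheduler region is the randomization unit; the statsmodels cluster-robust fit is a stand-in for whatever internal experimentation tooling the team actually uses, and all names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Hypothetical host-day data: treatment assigned at the scheduler-region level
# (the assumed randomization/interference unit), with outcomes correlated within a region.
rng = np.random.default_rng(0)
regions = pd.DataFrame({"region": range(40), "treatment": rng.integers(0, 2, 40)})
df = regions.loc[regions.index.repeat(25)].reset_index(drop=True)  # 25 host-days per region
region_shock = rng.normal(0.0, 1.0, 40)[df["region"]]              # shared within-region noise
df["energy_per_job"] = 10 - 0.3 * df["treatment"] + region_shock + rng.normal(0.0, 0.5, len(df))

# Naive two-sample t-test on host-day rows ignores the within-region correlation,
# so its standard error is too small and its p-value is overconfident.
treated = df.loc[df["treatment"] == 1, "energy_per_job"]
control = df.loc[df["treatment"] == 0, "energy_per_job"]
print(stats.ttest_ind(treated, control, equal_var=False))

# Same comparison with standard errors clustered at the randomization unit.
fit = smf.ols("energy_per_job ~ treatment", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["region"]}
)
print(fit.params["treatment"], fit.bse["treatment"])  # effect estimate with cluster-robust SE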
A new congestion-control setting for Google Front End is rolled out to 50% of RPCs; the primary metric is tail latency, $p99$, and you have per-RPC samples. How do you compute a confidence interval for the lift in $p99$ and decide significance without assuming normality?
You monitor daily packet loss rate for a fleet and see 20 alerts across regions after a routing change; each alert is a hypothesis test at $\alpha=0.05$. How do you control false positives while still catching real regressions, and what would you report to an exec?
Machine Learning & Forecasting for Decision Support
Most candidates underestimate how much the interview emphasizes model choice tradeoffs for operational decisions (forecasting, capacity, anomaly detection) rather than leaderboard performance. You’ll be pushed to justify features, metrics, and failure modes, and to connect model outputs to actions.
You forecast weekly CPU demand for a Google data center to set next-week capacity buffers, but you only have 18 months of data and promotions and incident weeks create spikes. What model, features, and evaluation metric do you choose if over-forecasting costs money but under-forecasting triggers SLO violations?
Sample Answer
Use a quantile forecast that targets an asymmetric loss (for example, predict the $q$-th quantile and evaluate with pinball loss), then pick $q$ to reflect the under-forecast penalty. Add seasonality and calendar features (week-of-year, day count, planned maintenance windows), plus event flags for promotions and post-incident recovery, and use a robust baseline like ETS or gradient-boosted trees with lag features. Do backtesting with rolling-origin splits, then convert the quantiles into a buffer policy (for example, reserve the predicted $P90$ headroom) and validate the realized SLO breach rate against the target.
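A small sketch of the pinball loss described above, with an assumed quantile level of $q = 0.9$; the scikit-learn call in the closing comment is one off-the-shelf way to fit the matching quantile model, not a claim about Google's production stack.
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
    """Mean pinball loss at quantile q: under-forecasts are penalized by q,
    over-forecasts by (1 - q), so q > 0.5 pushes the forecast upward."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Toy check: at q = 0.9, under-forecasting by 10 costs 9x more than over-forecasting by 10.
actual = np.array([100.0, 100.0])
print(pinball_loss(actual, np.array([90.0, 90.0]), q=0.9))    # under-forecast -> 9.0
print(pinball_loss(actual, np.array([110.0, 110.0]), q=0.9))  # over-forecast  -> 1.0

# One way to fit the matching model (assumed setup, variable names hypothetical):
# from sklearn.ensemble import GradientBoostingRegressor
# model = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X_train, y_train)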
You need to forecast per-region request volume for Google Search to decide load-shedding thresholds and staffing, and traffic has strong daily seasonality plus sudden step-changes from launches. Would you use a global model trained across all regions or separate per-region models, and how do you prevent the forecast from causing harmful automated decisions during regime shifts?
Product Sense & Metrics (Ops/Infrastructure Context)
Your ability to reason about what to measure—and why—matters as much as the math, especially when the “product” is internal infrastructure or systems reliability. You’ll need crisp metric definitions, guardrails, and rollout/measurement plans that anticipate confounding and unintended incentives.
Google rolls out a new autoscaling policy for Borg that aims to reduce latency regressions while cutting compute cost. Define one north star metric and 3 guardrail metrics, including a precise numerator, denominator, and time window for each.
Sample Answer
You could optimize for user-perceived latency stability or for infrastructure efficiency: either p99 service-request latency error-budget burn as the north star, or compute cost per successful request. The latency and SLO framing wins here because it aligns to reliability promises and prevents cost savings that silently violate SLOs. Guardrails like request success rate, throttling rate, and capacity headroom (for example, fraction of time CPU $>80\%$) catch perverse incentives and rollout risk.
A new rack power-capping policy in a Google data center reduces total power draw by 5%, but some teams report higher tail latency. Propose an analysis plan to determine whether the policy caused the latency change, name at least 3 confounders, and specify what slices you would look at.
Google is considering changing SRE oncall paging from threshold-based alerts to anomaly detection for a large fleet, with the goal of reducing toil without increasing incident impact. What metrics would you use to decide whether to ship, and how would you design the rollout to avoid gaming and blind spots?
SQL & Data Modeling (BigQuery-style Analytics)
The bar here isn’t whether you can write a query, it’s whether you can produce correct results under messy schemas, joins, and time-window logic. Interviewers look for clarity on grain, deduping, null handling, and how your query supports a decision or metric.
In BigQuery, compute the weekly P50 and P95 of job queue wait time (from submit to start) for Google data center batch jobs, excluding canceled jobs and de-duping retries to the first attempt per job_id.
Sample Answer
Reason through it: Start by fixing grain, you want one row per job_id, not per attempt. De-dupe retries with QUALIFY and ROW_NUMBER, keep the earliest attempt, and filter out canceled states. Compute wait_seconds as the difference between start_ts and submit_ts, then bucket by ISO week. Finally, use BigQuery quantile functions for P50 and P95, and be explicit about safe handling of missing start_ts.
/* Weekly queue wait time percentiles for batch jobs (BigQuery)
   Assumed table: `infra.batch_job_attempts`
   Columns: job_id, attempt_id, submit_ts, start_ts, state, cluster, region
*/
WITH first_attempt AS (
  SELECT
    job_id,
    attempt_id,
    submit_ts,
    start_ts,
    state
  FROM `infra.batch_job_attempts`
  WHERE submit_ts IS NOT NULL
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY job_id
    ORDER BY submit_ts ASC, attempt_id ASC
  ) = 1
), cleaned AS (
  SELECT
    job_id,
    DATE_TRUNC(DATE(submit_ts), ISOWEEK) AS week_start_date,
    TIMESTAMP_DIFF(start_ts, submit_ts, SECOND) AS wait_seconds
  FROM first_attempt
  WHERE state != 'CANCELED'
    AND start_ts IS NOT NULL
    AND start_ts >= submit_ts
)
SELECT
  week_start_date,
  COUNT(*) AS jobs_started,
  -- APPROX_QUANTILES returns an array of quantiles; for N=100, offsets map to percent.
  APPROX_QUANTILES(wait_seconds, 100)[OFFSET(50)] AS p50_wait_seconds,
  APPROX_QUANTILES(wait_seconds, 100)[OFFSET(95)] AS p95_wait_seconds
FROM cleaned
GROUP BY week_start_date
ORDER BY week_start_date;

You have BigQuery tables for (1) per-minute fleet capacity and (2) per-minute workload demand by cluster; write a query that outputs, per cluster and day, the total minutes of capacity shortfall where $demand > capacity$, treating missing capacity as 0 and missing demand as 0.
Operations Research & Systems Optimization
In research-ops roles, you’re expected to frame infrastructure problems as optimization under constraints (cost, latency, reliability, capacity). Strong answers show clean problem formulation, appropriate relaxation/heuristics, and sensitivity analysis instead of jumping straight to a solver.
You run a fleet of $N$ identical servers, each fails independently with probability $p$ per day; capacity must stay above $K$ servers with probability at least $1-\alpha$. What is the smallest $N$ that satisfies $\mathbb{P}(\text{alive} \ge K) \ge 1-\alpha$, and how would you approximate it for large $N$ without brute force?
Sample Answer
This question is checking whether you can translate an SLO into a chance constraint and pick a sane approximation under scale. Model alive servers as $X \sim \text{Binomial}(N, 1-p)$ and choose the smallest $N$ such that $\mathbb{P}(X \ge K) \ge 1-\alpha$. For large $N$, use a normal approximation with continuity correction or a Chernoff bound to get a conservative $N$, then validate with an exact binomial CDF for the final answer.
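As a rough illustration of that sizing logic, here is a sketch using scipy with hypothetical values of $K$, $p$, and $\alpha$: a normal-approximation starting guess refined by an exact binomial check.
import math
from scipy.stats import binom, norm

def min_servers(K: int, p: float, alpha: float) -> int:
    """Smallest N such that P(X >= K) >= 1 - alpha, where X ~ Binomial(N, 1 - p)."""
    q = 1.0 - p                    # per-server daily survival probability
    z = norm.ppf(1.0 - alpha)      # one-sided normal quantile
    # Normal-approximation guess: solve N*q - z*sqrt(N*q*(1-q)) = K as a quadratic in sqrt(N).
    a = z * math.sqrt(q * (1.0 - q))
    x = (a + math.sqrt(a * a + 4.0 * q * K)) / (2.0 * q)
    n = max(K, math.ceil(x * x))

    def feasible(m: int) -> bool:
        # binom.sf(K - 1, m, q) = P(X >= K); feasibility is monotone in m.
        return binom.sf(K - 1, m, q) >= 1.0 - alpha

    while n > K and feasible(n - 1):   # walk down if the guess was conservative
        n -= 1
    while not feasible(n):             # walk up if the guess was optimistic
        n += 1
    return n

# Hypothetical example: keep at least 950 servers alive with 99.9% confidence, 2% daily failure prob.
print(min_servers(K=950, p=0.02, alpha=0.001))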
You are allocating requests from $m$ regions to $n$ data centers with per-unit costs $c_{ij}$ and capacities $u_j$, and you must satisfy demand $d_i$; formulate this as a min-cost flow or transportation LP. Then add a constraint that at least $\rho$ of each region’s demand must stay within its continent, and explain how this changes feasibility and dual interpretation.
You need to reserve capacity across 3 data centers for next quarter under uncertain demand $D$ with scenarios $s$ and probabilities $\pi_s$; reserved capacity costs $c_r$ per unit, on-demand costs $c_o > c_r$, and unmet demand incurs penalty $c_p$. Formulate a two-stage stochastic program and state when you would use sample average approximation (SAA) versus a robust (minimax) formulation.
Coding & Algorithms (Data-focused)
You’ll encounter prompts where speed and correctness depend on how you structure data transformations, not on obscure CS tricks. The common failure mode is writing code that works on toy inputs but ignores edge cases, complexity, or reproducibility.
In BigQuery you pulled per-minute cluster CPU utilization for a week as (minute_ts, cluster_id, cpu_util) with occasional missing minutes; write Python to compute the longest continuous interval (in minutes) where cpu_util exceeds a threshold for each cluster. Return the top 5 clusters by longest interval, tie-breaking by earliest start time.
Sample Answer
The standard move is sort by (cluster_id, minute_ts) and scan once while tracking the current run length and best run. But here, missing minutes matter because you must break a run when the timestamp gap exceeds 60 seconds even if utilization stays above threshold.
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Dict, Any, Tuple


@dataclass
class RunBest:
    length: int = 0
    start_ts: datetime | None = None
    end_ts: datetime | None = None


def longest_high_util_runs(
    rows: Iterable[Dict[str, Any]],
    threshold: float,
    step_seconds: int = 60,
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    """Compute longest continuous (no missing minutes) interval with cpu_util > threshold per cluster.

    Args:
        rows: Iterable of dicts with keys: minute_ts (datetime), cluster_id (hashable), cpu_util (float).
        threshold: Strictly greater-than threshold for being in a high-util run.
        step_seconds: Expected cadence in seconds, default 60.
        top_k: Number of clusters to return.

    Returns:
        List of dicts: cluster_id, longest_minutes, start_ts, end_ts.
    """
    # Defensive copy into list so we can sort.
    data = list(rows)
    data.sort(key=lambda r: (r["cluster_id"], r["minute_ts"]))

    best: Dict[Any, RunBest] = {}

    cur_cluster = None
    cur_len = 0
    cur_start = None
    cur_end = None
    prev_ts = None

    def flush_current_run(cluster_id: Any):
        nonlocal cur_len, cur_start, cur_end
        if cluster_id is None or cur_len == 0:
            return
        b = best.setdefault(cluster_id, RunBest())
        # Prefer longer runs, then earlier start time.
        if (cur_len > b.length) or (
            cur_len == b.length and b.start_ts is not None and cur_start is not None and cur_start < b.start_ts
        ) or (cur_len == b.length and b.start_ts is None and cur_start is not None):
            b.length = cur_len
            b.start_ts = cur_start
            b.end_ts = cur_end

    for r in data:
        cid = r["cluster_id"]
        ts: datetime = r["minute_ts"]
        util = r["cpu_util"]

        if cid != cur_cluster:
            # New cluster. Flush prior run state.
            flush_current_run(cur_cluster)
            cur_cluster = cid
            cur_len = 0
            cur_start = None
            cur_end = None
            prev_ts = None

        is_high = util > threshold
        is_contiguous = (
            prev_ts is not None and int((ts - prev_ts).total_seconds()) == step_seconds
        )

        if is_high:
            if cur_len == 0:
                # Start a new run.
                cur_start = ts
                cur_end = ts
                cur_len = 1
            else:
                # Continue only if contiguous, otherwise start over.
                if is_contiguous:
                    cur_len += 1
                    cur_end = ts
                else:
                    flush_current_run(cid)
                    cur_start = ts
                    cur_end = ts
                    cur_len = 1
        else:
            # Not high, close any active run.
            flush_current_run(cid)
            cur_len = 0
            cur_start = None
            cur_end = None

        prev_ts = ts

    # Flush the last cluster.
    flush_current_run(cur_cluster)

    # Build sortable summary.
    summary: List[Tuple[int, datetime, Any, datetime, datetime]] = []
    for cid, b in best.items():
        if b.length > 0 and b.start_ts is not None and b.end_ts is not None:
            summary.append((b.length, b.start_ts, cid, b.start_ts, b.end_ts))

    # Sort by length desc, start asc.
    summary.sort(key=lambda x: (-x[0], x[1]))

    out = []
    for length, _, cid, start_ts, end_ts in summary[:top_k]:
        out.append(
            {
                "cluster_id": cid,
                "longest_minutes": length,
                "start_ts": start_ts,
                "end_ts": end_ts,
            }
        )
    return out


if __name__ == "__main__":
    # Minimal sanity check.
    from datetime import timedelta

    base = datetime(2026, 1, 1, 0, 0, 0)
    rows = []
    # cluster A has a break (missing minute) that should split the run.
    for i in [0, 1, 2, 4, 5]:
        rows.append({"minute_ts": base + timedelta(minutes=i), "cluster_id": "A", "cpu_util": 0.9})
    # cluster B has a continuous run of 4.
    for i in [0, 1, 2, 3]:
        rows.append({"minute_ts": base + timedelta(minutes=i), "cluster_id": "B", "cpu_util": 0.95})

    print(longest_high_util_runs(rows, threshold=0.8))

You have a stream of (ts, dc, request_id) from Google Front End logs and you need the earliest timestamp where any data center exceeds $p$ fraction of all requests in the last $W$ seconds; write Python for an online algorithm that updates per event in amortized $O(1)$. Assume events arrive in nondecreasing ts.
You are planning network capacity between Google data centers and have a directed graph of links with capacities; write Python to compute the minimum cut value between two sites $s$ and $t$ and return the cut partition (the set reachable from $s$ in the residual graph). Use Edmonds-Karp or Dinic's algorithm, and make sure it handles up to 2,000 nodes and 20,000 edges.
The heaviest two areas, stats and ML, compound each other in Google's loop because infrastructure experimentation problems (like testing a Borg scheduling change) force you to build a causal model and a forecasting model in the same answer. That overlap means weak stats foundations don't just cost you one round; they undermine your ML answers on capacity planning and anomaly detection too. The most under-practiced area is operations research, where questions about server fleet reliability or cross-region request allocation require constrained optimization thinking that no amount of A/B testing prep will cover.
Practice questions calibrated to Google's infrastructure-heavy DS loop at datainterview.com/questions.
How to Prepare for Google Data Scientist Interviews
Know the Business
Official mission
“Google’s mission is to organize the world's information and make it universally accessible and useful.”
What it actually means
Google's real mission is to empower individuals globally by organizing information and making it universally accessible and useful, while also developing advanced technologies like AI responsibly and fostering opportunity and social impact.
Key Business Metrics
$403B (+18% YoY)
$3.7T (+65% YoY)
191K (+4% YoY)
Business Segments and Where DS Fits
Google Cloud
Cloud platform, 10.77% of Alphabet's revenue in fiscal year 2025.
Google Network
10.19% of Alphabet's revenue in fiscal year 2025.
Google Search & Other
56.98% of Alphabet's revenue in fiscal year 2025.
Google Subscriptions, Platforms, And Devices
11.29% of Alphabet's revenue in fiscal year 2025.
Other Bets
0.5% of Alphabet's revenue in fiscal year 2025.
YouTube Ads
10.26% of Alphabet's revenue in fiscal year 2025.
Current Strategic Priorities
- Pivoting toward Autonomous AI Agents—systems designed to plan, execute, monitor, and adapt complex, multi-step tasks without continuous human input.
- Radical expansion of compute infrastructure.
- Evolution of its foundational models (Gemini and its successors).
- Massive, long-term commitment to infrastructure via strategic partnerships, such as the one recently announced with NextEra Energy, to co-develop multiple gigawatt-scale data center campuses across the United States.
- Maturation of Agentic AI.
- Drive the cost of expertise toward zero, enabling high-paying knowledge work—from legal review to financial planning—to become exponentially more productive.
- Transform Google Search from a retrieval system to a synthesized answer engine.
Competitive Moat
Alphabet's annual revenue topped $400 billion in fiscal year 2025, with Google Search & Other still representing about 57% of the total. The company's stated bets right now: evolving Gemini, building autonomous AI agents, and transforming Search from a retrieval system into a synthesized answer engine. Compute infrastructure is expanding at a staggering pace, including gigawatt-scale data center partnerships with companies like NextEra Energy.
What does that mean if you're interviewing? You should expect interviewers to probe whether you can reason about the tensions these bets create. A good "why Google" answer isn't "I love the scale of Search." It's something like: "AI Overviews risk cannibalizing the ad clicks that fund 57% of revenue, and I want to work on the experimentation frameworks that measure whether synthesized answers actually shift long-term engagement enough to justify that tradeoff." Or point to Google Cloud (about 11% of revenue and growing fast) and articulate a specific measurement problem you'd want to own there. The bar is naming a real analytical tension at Google, not expressing admiration for the company. Google's hiring committee reviews candidate packets without the context of small talk, so the specificity of your product sense answers is what survives into the written feedback.
Try a Real Interview Question
On-call capacity shortfall by cluster and week
SQL · You are given weekly forecasts of incident volume per cluster and weekly on-call capacity per cluster. For each cluster and week, compute the expected shortfall $\max(0, \text{forecasted\_incidents} - \text{capacity\_incidents})$, where $\text{capacity\_incidents} = \left\lfloor \frac{\text{engineer\_hours} \cdot 60}{\text{mean\_minutes\_per\_incident}} \right\rfloor$. Output one row per cluster-week with the shortfall, ordered by week then cluster.
| cluster | week_start | forecasted_incidents |
|---|---|---|
| A | 2026-01-05 | 120 |
| A | 2026-01-12 | 90 |
| B | 2026-01-05 | 70 |
| B | 2026-01-12 | 95 |
| cluster | week_start | engineer_hours | mean_minutes_per_incident |
|---|---|---|---|
| A | 2026-01-05 | 35 | 20 |
| A | 2026-01-12 | 25 | 15 |
| B | 2026-01-05 | 20 | 15 |
| B | 2026-01-12 | 24 | 18 |
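The prompt asks for SQL, but a quick pandas sketch of the same arithmetic on the sample rows above is a handy way to sanity-check what your query should return; the frame and column names simply mirror the tables in the prompt.
import pandas as pd

# The two sample tables from the prompt.
forecasts = pd.DataFrame({
    "cluster": ["A", "A", "B", "B"],
    "week_start": ["2026-01-05", "2026-01-12", "2026-01-05", "2026-01-12"],
    "forecasted_incidents": [120, 90, 70, 95],
})
capacity = pd.DataFrame({
    "cluster": ["A", "A", "B", "B"],
    "week_start": ["2026-01-05", "2026-01-12", "2026-01-05", "2026-01-12"],
    "engineer_hours": [35, 25, 20, 24],
    "mean_minutes_per_incident": [20, 15, 15, 18],
})

df = forecasts.merge(capacity, on=["cluster", "week_start"], how="left")
# capacity_incidents = floor(engineer_hours * 60 / mean_minutes_per_incident)
df["capacity_incidents"] = (df["engineer_hours"] * 60 // df["mean_minutes_per_incident"]).astype(int)
df["shortfall"] = (df["forecasted_incidents"] - df["capacity_incidents"]).clip(lower=0)
print(df.sort_values(["week_start", "cluster"])[["cluster", "week_start", "shortfall"]])
# Expected: A/2026-01-05 -> 15, A/2026-01-12 -> 0, B/2026-01-05 -> 0, B/2026-01-12 -> 15.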
700+ ML coding problems with a live Python executor.
Practice in the Engine
Google's coding problems lean toward BigQuery-flavored SQL (STRUCT types, ARRAY functions, window functions over partitioned tables) and Python that tests clean analytical thinking rather than competitive-programming tricks. The round is a gate, not a differentiator, so you need fluency without needing to be brilliant. Build that fluency with DS-calibrated problems at datainterview.com/coding.
Test Your Readiness
How Ready Are You for Google Data Scientist?
1 / 10 · Can you choose and justify an appropriate statistical test (for example t-test, chi-square, Mann-Whitney) given data type, sample size, distribution shape, and independence assumptions?
The widget above shows where your gaps are. Fill them with targeted practice at datainterview.com/questions, paying extra attention to any category where you scored below confident.
Frequently Asked Questions
How long does the Google Data Scientist interview process take?
Expect roughly 6 to 10 weeks from first recruiter call to offer. The process starts with a recruiter screen, then a technical phone screen (usually coding and stats), followed by 4-5 onsite interviews. After onsites, there's a hiring committee review that can add 2-3 weeks on its own. Google is notoriously slow here. I've seen candidates wait even longer if the committee requests additional signals.
What technical skills are tested in the Google Data Scientist interview?
SQL and Python (or R) are non-negotiable. You'll be tested on statistical analysis, probability, experimental design including A/B testing, and machine learning concepts. Google also cares about product intuition and model-based decision support. At higher levels (L5+), expect questions on operations research and quantitative analysis applied to ambiguous business problems. Practice coding for data analysis, not just algorithms.
How should I tailor my resume for a Google Data Scientist role?
Lead every bullet with measurable impact. Google wants to see that you used analytics to solve real product or business problems, so frame your experience that way. Mention specific tools (Python, R, SQL) and techniques (A/B testing, statistical modeling, ML). If you have executive-level communication experience, call it out explicitly. Keep it to one page for L3-L4, two pages max for L5+. A quantitative degree (Stats, CS, Econ, Math) should be prominent since it's basically required.
What is the total compensation for a Google Data Scientist by level?
At L3 (junior, 0-3 years experience), total comp averages around $168,000 with a range of $117K to $205K and base salary near $131K. L4 (mid-level, 3-8 years) averages $267,505 total comp with base around $181K. At the top end, L7 (principal, 14-22 years) can reach $661K to $950K in total comp with a base near $276K. RSUs vest over 4 years on a front-loaded schedule: 33%, 33%, 22%, 12%. That front-loading matters a lot for your first two years.
How do I prepare for the behavioral interview at Google for Data Scientist?
Google evaluates culture fit through its core values: user-centricity, innovation, openness, and responsibility. Prepare 5-6 stories that show you solving ambiguous problems, collaborating across teams, and putting the user first. Use the STAR format (Situation, Task, Action, Result) but keep it tight. At L5 and above, they want to hear about project leadership and influencing senior stakeholders. Be specific about your individual contribution versus the team's work.
How hard are the SQL and coding questions in Google Data Scientist interviews?
The SQL questions are medium to hard. You'll need window functions, CTEs, complex joins, and sometimes optimization thinking. For Python/R, expect data manipulation and analysis problems, not pure software engineering puzzles. At L3, the questions are well-scoped. By L4-L5, they get more ambiguous and you'll need to define the approach yourself. I'd recommend practicing on datainterview.com/coding to get used to the style and difficulty level.
What machine learning and statistics concepts should I know for Google Data Scientist interviews?
Probability and statistics are the foundation. You need to be sharp on hypothesis testing, confidence intervals, Bayesian reasoning, and distributions. For ML, know regression, classification, clustering, and when to use each. Experimental design is huge at Google, especially A/B testing methodology, power analysis, and handling common pitfalls like novelty effects. At senior levels (L5+), they'll push you on advanced ML concepts and expect you to lead the discussion on tradeoffs.
What happens during the Google Data Scientist onsite interview?
The onsite typically consists of 4-5 back-to-back interviews, each about 45 minutes. You'll face separate rounds for coding (Python/R), SQL, statistics and probability, product/business sense, and behavioral (Googleyness and leadership). At L6 and L7, expect heavier emphasis on strategic thinking and system design for data science. After the onsite, your packet goes to a hiring committee, which is a separate group that reviews all interviewer feedback before making a decision.
What metrics and business concepts should I study for Google Data Scientist interviews?
You need strong product intuition. Practice defining success metrics for Google products like Search, YouTube, or Ads. Know how to break down a vague business question into measurable KPIs. Understand tradeoffs between metrics (engagement vs. revenue, for example). Executive-level business communication is listed as a required skill, so practice explaining analytical findings clearly and concisely. At L4+, they'll test whether you can connect data analysis to real product decisions.
What format should I use to answer Google behavioral interview questions?
STAR works well here: Situation, Task, Action, Result. But Google interviewers want depth on the Action piece specifically. Don't spend two minutes on context and thirty seconds on what you actually did. Quantify your results whenever possible. For senior roles, add a reflection component about what you learned or would do differently. Keep each answer under 3 minutes. If the interviewer wants more detail, they'll ask.
What are common mistakes candidates make in Google Data Scientist interviews?
The biggest one I see is jumping straight into a solution without clarifying the problem. Google interviewers deliberately leave questions ambiguous to test your thinking process. Another common mistake is weak experimental design answers, especially around A/B testing edge cases. Candidates also underestimate the behavioral rounds. Googleyness matters, and I've seen technically strong people get rejected because they couldn't demonstrate collaboration or user-first thinking. Finally, don't ignore the hiring committee stage. Your interviewers don't make the final call.
What degree do I need to become a Data Scientist at Google?
A bachelor's degree in a quantitative field like Statistics, Computer Science, Math, or Economics is required at every level. That said, a Master's or PhD is very common among Google Data Scientists, especially at L4 and above. At L7 (Principal), a PhD or Master's is typical, though a bachelor's with extensive experience is possible. If you don't have a graduate degree, make sure your resume clearly demonstrates equivalent depth through projects and work experience. Check datainterview.com/questions for practice problems that match the technical bar Google expects.




