Goldman Sachs Data Scientist Guide (2026): Job, Salary & Interviews

Goldman Sachs Data Scientist at a Glance

Interview Rounds

5 rounds

Difficulty

Python, Java, C/C++, Scala, R, Matlab, SAS · Investment Management, Risk Management, Trading, Quantitative Finance

Goldman Sachs screens for behavioral fit and stakeholder communication across multiple touchpoints in its data scientist interview loop. If you're the type who preps only for the ML round and wings the "tell me about a time" questions, you're making a mistake that costs a lot of otherwise strong candidates.

Goldman Sachs Data Scientist Role

Primary Focus

Investment Management, Risk Management, Trading, Quantitative Finance

Skill Profile

Math & Stats

Expert

Deep theoretical and practical expertise in advanced statistical modeling, quantitative analysis, time series analysis, information theory, and mathematical foundations, particularly for financial applications and capital planning (e.g., CCAR).

Software Eng

High

Strong programming skills for building, deploying, and integrating data science models into production systems, with an understanding of MLOps practices and scalable architectures. Experience with mainstream programming languages is essential.

Data & SQL

High

Experience with large-scale data processing, Big Data tools (e.g., Hadoop, Spark), and contributing to data architecture design for analytical and machine learning pipelines.

Machine Learning

Expert

Extensive hands-on experience in designing, developing, evaluating, and deploying a wide range of machine learning models and advanced algorithms (e.g., neural networks, SVMs, random forests) for business decision-making, including MLOps practices.

Applied AI

High

Strong understanding and practical experience with modern AI paradigms, including Generative AI, Large Language Models (LLMs), and agentic systems, and their application in business contexts, reflecting Goldman Sachs' strategic focus on cutting-edge AI.

Infra & Cloud

High

Experience with major cloud platforms (AWS, Azure, GCP) for deploying, managing, and scaling AI/ML infrastructure and solutions, including cloud-native architectures and services.

Business

High

Ability to translate complex analytical insights into actionable business strategies, understand financial services domain challenges (e.g., credit risk, fraud, marketing), provide business justification for model attributes (e.g., Fair Lending compliance), and strong project management skills.

Viz & Comms

High

Excellent communication and interpersonal skills to articulate complex technical concepts and model insights effectively to both technical and non-technical stakeholders, along with strong documentation practices for peer reviews and validation.

What You Need

Data analysis and processing
Statistical modeling and analysis
Machine learning model development, evaluation, and monitoring
Predictive modeling and advanced algorithms
Dimensionality reduction and feature engineering
Working with large datasets
Model validation and testing
Project management
Business justification for model attributes (e.g., Fair Lending compliance)
Ability to translate complex technical concepts into business insights

Nice to Have

Credit risk modeling
Marketing response modeling and forecasting
Experience in a startup or new business line environment

Languages

Python, Java, C/C++, Scala, R, Matlab, SAS

Tools & Technologies

Hadoop, Spark, AWS, Azure, GCP, Databricks, MLOps platforms/tools, Modern AI frameworks

You're building models inside Goldman's Asset Management division that feed directly into how portfolio managers make decisions. That means portfolio construction signals, credit risk scoring with feature engineering in PySpark on Databricks, and client segmentation work that shapes rebalancing strategies. Success in year one looks like getting a model through GS's independent Model Risk Management validation and into production where a PM actually acts on the output, not leaving it in a notebook.

A Typical Week

A Week in the Life of a Goldman Sachs Data Scientist

Typical L5 workweek · Goldman Sachs

Weekly time split

Analysis 22%, Coding 20%, Meetings 18%, Writing 15%, Break 10%, Research 8%, Infrastructure 7%

Culture notes

Goldman expects consistent in-office presence five days a week at 200 West Street, and while core hours are roughly 9-to-6, it's common to see Slack activity until 7 or 8 PM during model review cycles or quarter-end pushes.
The culture is polished and high-accountability — you'll spend more time on documentation, model governance, and stakeholder communication than at a typical tech company, and the bar for analytical rigor is set by the Model Risk Management group.

The surprise isn't the modeling time. It's how much of your week disappears into documentation, governance memos, and stakeholder decks. If you're coming from a startup or a tech company where "ship it and iterate" was the ethos, the regulatory overhead at GS will feel like a second job. Debugging an Airflow DAG failure on an upstream data feed one afternoon, then translating Shapley values into PM-friendly language the next morning: that's the actual rhythm.

Projects & Impact Areas

Portfolio rebalancing signal models anchor the Asset Management DS team's work, with your outputs going straight into presentations where you explain lift curves in terms PMs care about. Credit risk scoring runs in parallel, pulling from counterparty exposure tables, and those model outputs feed capital reserve calculations and trading desk decisions. GS is also deploying LLM-based tools for earnings summarization and compliance document analysis, and the firm's skill expectations reflect a real strategic push into GenAI and modern AI, so you'll be building with these tools, not just evaluating them from the sidelines.

Skills & What's Expected

Software engineering is the most underrated skill for this role. Candidates pour prep time into ML theory and statistics, which absolutely matter here, but GS expects you to ship production code, review PRs, and work within deployment pipelines on their modernized tech stack (think GraalVM adoption, the open-source Legend platform). The flip side: financial fluency is where most tech-background candidates fall short. If you can't explain what a basis point move means for a portfolio or why a sparse feature set in credit risk might call for L1 over L2 regularization, your technical depth won't compensate.

Levels & Career Growth

The VP-to-Executive Director promotion is where careers stall. Strong individual contributor work isn't enough at that threshold. You need visible cross-divisional impact and sponsorship from at least one MD. Internal mobility between divisions (Asset Management to Global Markets, for example) requires a formal internal application and your current manager's sign-off, which creates enough friction that many people stay in their lane longer than they'd planned.

Work Culture

Goldman expects consistent in-office presence five days a week at 200 West Street, and CEO David Solomon has been publicly vocal about prioritizing in-person culture. Slack threads run until 7 or 8 PM during quarter-end pushes, and your methodology gets scrutinized in regular readout sessions with MRM validators and senior PMs who won't nod politely if your backtesting looks thin. The feedback loop is fast because of that scrutiny, but the documentation and governance burden that comes with operating under regulatory oversight will feel heavy if you've never worked in financial services before.

Goldman Sachs Data Scientist Compensation

GS comp is a bonus-driven machine. Your annual discretionary bonus is the real variable, tied to both individual and firm performance, and it's not guaranteed money. Senior roles may include RSUs, but for most data scientists the cash bonus is where the action is.

Base salary is negotiable, so don't leave that on the table. The source of real leverage, though, is a competing offer, which can help you push for a stronger sign-on bonus. Spend your negotiation energy on those two levers: base and sign-on.

Goldman Sachs Data Scientist Interview Process

5 rounds·~3 weeks end to end

Initial Screen

1 round

Behavioral

15m · Video Call

This initial step is an AI-conducted, recorded video interview where you'll answer five general personality and behavioral questions. You'll have two minutes to respond to each question, focusing on how you navigate different work situations and solve real-world problems. The goal is to assess your fit with Goldman Sachs's culture and values.

behavioral · general

Tips for this round

Practice answering common behavioral questions using the STAR method (Situation, Task, Action, Result).
Research Goldman Sachs's 14 Business Principles and core values to align your responses.
Familiarize yourself with the HireVue platform by doing a practice run to get comfortable with the timed format.
Prepare examples that showcase your problem-solving skills, teamwork, and resilience.
Ensure you have a quiet, well-lit environment with a stable internet connection for the recording.

Technical Assessment

1 round

Coding & Algorithms

60m · Live

You will face a technical challenge focused heavily on data structures and algorithm questions. This round assesses your foundational computer science knowledge and your ability to write efficient, clean code. Expect to solve problems similar to those found on platforms like datainterview.com/coding.

algorithms · data_structures · engineering

Tips for this round

Practice a wide range of datainterview.com/coding-style problems, focusing on medium to hard difficulty.
Be proficient in at least one programming language (e.g., Python, Java, C++) and be able to articulate your thought process.
Understand time and space complexity analysis (Big O notation) and be able to apply it to your solutions.
Review common data structures like arrays, linked lists, trees, graphs, and hash maps, along with their associated algorithms.
Practice explaining your approach out loud, discussing trade-offs, and handling edge cases.

Onsite

3 rounds

Behavioral

30m · Live

As part of the 'Superday' final round, this interview will delve deeper into your professional experiences, motivations, and cultural fit. You'll discuss your career aspirations, how you handle challenges, and your approach to collaboration. Interviewers, often senior employees or hiring managers, will assess your alignment with the firm's leadership principles.

behavioral · general

Tips for this round

Prepare several detailed STAR stories that highlight your leadership, teamwork, problem-solving, and resilience.
Be ready to discuss your resume in detail, connecting your past experiences to the requirements of a Data Scientist role at Goldman Sachs.
Articulate why you are interested in Goldman Sachs specifically and how your values align with the firm's culture.
Formulate thoughtful questions to ask your interviewer about their role, the team, and the firm's direction.
Demonstrate enthusiasm and a strong work ethic, which are highly valued at Goldman Sachs.

Machine Learning & Modeling

30m · Live

This technical interview, also part of the Superday, will focus on your expertise in machine learning concepts and their practical application. You can expect questions on various ML algorithms, model selection, evaluation metrics, and how to deploy and monitor models. You may also be asked to solve a coding problem related to ML or data manipulation.

machine_learning · statistics · algorithms · ml_coding

Tips for this round

Review core machine learning algorithms (e.g., linear models, tree-based models, clustering, neural networks) and their underlying mathematics.
Understand model evaluation metrics (e.g., precision, recall, F1, AUC, RMSE) and when to use them.
Be prepared to discuss your past data science projects in detail, highlighting your contributions and the impact.
Practice coding solutions for common data science tasks, such as data cleaning, feature engineering, and model implementation.
Familiarize yourself with MLOps concepts and how to build robust, scalable ML systems.

Case Study

30m · Live

In this Superday interview, you'll be presented with a real-world business problem, likely related to finance or market analysis, and asked to develop a data-driven solution. This round assesses your ability to structure complex problems, make data-informed decisions, and communicate your thought process clearly. You'll need to demonstrate both your analytical skills and your business acumen.

product_sense · finance · data_modeling · statistics

Tips for this round

Clarify the problem statement and objectives before diving into solutions, asking clarifying questions about data availability and constraints.
Structure your approach logically, outlining steps from data collection and cleaning to model building and evaluation.
Consider the business impact and potential risks of your proposed solution, demonstrating a holistic perspective.
Practice guesstimate questions and back-of-the-envelope calculations, as these are common in finance-related case studies.
Communicate your reasoning clearly and concisely, explaining assumptions and trade-offs to the interviewer.

Tips to Stand Out

Deep Dive into Goldman Sachs. Thoroughly research the firm's history, recent financial results, business principles, and current market activities. Understand their commitment to corporate citizenship and diversity.
Master Data Structures & Algorithms. Goldman Sachs places a significant emphasis on foundational computer science skills. Dedicate substantial time to practicing coding problems, especially those involving common data structures and algorithms.
Prepare for Behavioral Questions. Craft compelling stories using the STAR method that showcase your leadership, teamwork, problem-solving, and resilience, aligning them with Goldman Sachs's values.
Understand Financial Context. Although this is a Data Scientist role, working at an investment bank means understanding basic financial concepts, market dynamics, and how data science applies to financial services.
Practice Explaining Your Work. Be able to clearly articulate your thought process for technical problems, the rationale behind your machine learning choices, and your approach to business case studies.
Ask Thoughtful Questions. Prepare insightful questions for your interviewers about their roles, the team, company culture, and specific projects. This demonstrates engagement and genuine interest.
Show Enthusiasm and Drive. Goldman Sachs values candidates with a strong work ethic and a proactive attitude. Convey your passion for data science and your desire to contribute to a high-performing environment.

Common Reasons Candidates Don't Pass

✗ Lack of Technical Depth. Candidates often struggle with the rigor of data structure and algorithm questions, failing to provide optimal solutions or clearly explain their approach.
✗ Poor Cultural Fit. Not demonstrating alignment with Goldman Sachs's business principles, values, or high-performance culture can lead to rejection, even with strong technical skills.
✗ Inability to Articulate Problem-Solving. Candidates may solve a problem but fail to clearly communicate their thought process, assumptions, and trade-offs, which is crucial for complex financial problems.
✗ Insufficient Domain Knowledge. For a Data Scientist role at a financial institution, a lack of understanding of basic finance concepts or how data science applies to the industry can be a significant drawback.
✗ Weak Behavioral Responses. Generic or unprepared answers to behavioral questions that don't showcase specific examples of impact, leadership, or resilience often result in a negative impression.

Offer & Negotiation

Goldman Sachs offers a competitive compensation package typically comprising a base salary, a significant annual bonus (often discretionary and tied to individual and firm performance), and sometimes restricted stock units (RSUs) for more senior roles. While base salary is negotiable, the bonus component can be substantial and varies year-to-year. Focus your negotiation on the base salary and a potential sign-on bonus, especially if you have competing offers. Be prepared to articulate your value and market worth based on your skills and experience.

The HireVue screen is AI-conducted and recorded, which means your first impression at GS happens without a human in the room. Candidates who treat it as a throwaway often don't advance to the live coding round, where you'll write production-quality solutions under time pressure on GS's preferred data structures and algorithm problems. Among the most common rejection reasons candidates report: lacking technical depth on algorithms and failing to clearly walk interviewers through their reasoning, and both of these get tested before you ever reach the Superday.

The Superday itself packs a second behavioral round, an ML and modeling deep-dive, and a finance-flavored case study into a single day. That second behavioral isn't redundant. GS's 14 Business Principles (client service, integrity, excellence) aren't just wall art; interviewers in that round probe whether your stakeholder communication and values alignment hold up under scrutiny from a different interviewer than the one who screened you initially. Post-Superday, from what candidates report, the wait for a final answer can stretch beyond the typical timeline if division leadership needs to weigh in on headcount or competing candidates.

Goldman Sachs Data Scientist Interview Questions

Machine Learning & Statistical Modeling (Finance)

Expect questions that force you to choose and defend models for noisy, non-stationary financial data (returns, spreads, vol surfaces), including features, leakage controls, and validation. Candidates often stumble when translating a modeling choice into concrete evaluation logic under regime shifts.

You are forecasting the sign of 1-day-ahead PnL for an equity stat-arb book using factor exposures, recent returns, and borrow cost data, and you see an AUC of 0.62 in random CV but 0.51 live. How do you redesign validation to eliminate leakage and handle regime shifts, and what metric do you report to Risk?

Medium · Time Series Validation and Leakage

Sample Answer

Most candidates default to random $k$-fold CV, but that fails here because it mixes future information into the training folds through time dependence, corporate actions, and slow-moving features like borrow cost. Use walk-forward or purged, embargoed CV with a time gap sized to your maximum feature lookback and position holding period, and validate across distinct regimes (pre, during, post stress). Report a utility-aligned metric like out-of-sample information ratio of the strategy implied by the model, plus calibration (reliability) for drawdown control. If Risk wants classification, give out-of-sample precision at a fixed turnover or risk budget, not raw AUC.
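
The validation scheme described above can be sketched as an expanding-window walk-forward split with a gap between train and test. This is a simplified illustration, not full purged k-fold: the function name `walk_forward_splits` and its parameters are hypothetical, and `gap` stands in for the larger of the feature lookback and the holding period.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    n_folds: int,
    min_train: int,
    gap: int,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for time-ordered samples.

    Training always ends `gap` observations before the test block starts, so
    features or labels spanning up to `gap` steps cannot straddle the boundary.
    """
    if min_train + gap >= n_samples:
        raise ValueError("not enough samples for the requested min_train and gap")
    test_size = (n_samples - min_train - gap) // n_folds
    if test_size < 1:
        raise ValueError("too many folds for the available samples")
    for fold in range(n_folds):
        test_start = min_train + gap + fold * test_size
        test_end = test_start + test_size
        train_end = test_start - gap  # drop the gap right before the test block
        yield list(range(train_end)), list(range(test_start, test_end))
```

Each fold trains only on the past, and the training window grows as the test block moves forward. To cover distinct regimes, the test blocks would be chosen to span stress and non-stress periods rather than being equally sized as they are here.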

You need a model for daily default risk of a corporate bond issuer where defaults are rare and censored, and you want time-varying covariates like spreads and equity vol. Which modeling family fits and how do you validate it without overstating performance?

Easy · Survival Modeling and Calibration

Sample Answer

Use a survival model with time-varying covariates, for example a Cox model (possibly regularized) or a discrete-time hazard model. Censoring breaks naive logistic regression labels because non-default does not mean no default; it often means not yet. Validate with time-based splits and survival metrics like concordance and time-dependent Brier score, and check calibration of predicted hazard or $S(t)$ across buckets. Also stress-test stability by issuer sector and rating band to catch spurious spread-driven signals.
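
To make the concordance metric concrete, here is a minimal sketch of Harrell's C-index that respects censoring. It assumes one static risk score per issuer, which is a simplification of the time-varying setup in the question; the function name and argument layout are illustrative.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when the issuer with the shorter observed time
    actually defaulted (events[i] is True). Censored issuers only ever appear
    as the longer-lived member of a pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored issuer cannot anchor a comparison
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 means the ranking is no better than random, and 1.0 means every earlier defaulter was scored as riskier. The double loop is quadratic in the number of issuers, which is acceptable for an illustration but not for large panels.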

You are building a short-horizon realized volatility forecast for an options market-making desk using intraday returns, order book imbalance, and macro event indicators. How do you choose between a GARCH-type model and a gradient-boosted tree model, and what diagnostic tells you the model is breaking under a new volatility regime?

Hard · Volatility Modeling and Regime Robustness

Financial Markets, Risk, and Quant Reasoning

Most candidates underestimate how much domain intuition you need to sanity-check a model’s output against market mechanics and risk constraints. You’ll be tested on how you reason about PnL drivers, risk metrics, hedging intuition, and what can go wrong in trading or portfolio contexts.

A delta-hedged long call on a non-dividend stock shows a consistent negative daily PnL while implied volatility is flat and your hedging is correct. What is the most likely driver, and how do you validate it with trade and market data?

Easy · Options PnL Attribution

Sample Answer

It is theta bleed, with residual slippage and funding explaining any deviation from the textbook decay. A delta-hedged long option has expected PnL of roughly $\Theta\,dt + \tfrac{1}{2}\Gamma\,(dS)^2$; with flat implied vol and realized moves too small for the gamma term to offset the decay (realized vol running below implied), the negative $\Theta$ dominates. Validate by decomposing daily PnL into Greeks using mid-market Greeks at hedge times, then reconcile the leftover with bid-ask, discrete hedging error, and financing on the stock hedge.
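
The attribution formula above can be checked numerically with a small sketch. The names `delta_hedged_pnl` and `breakeven_move` are hypothetical, theta is assumed to be quoted per day, and higher-order Greeks, financing, and transaction costs are ignored.

```python
import math
from typing import NamedTuple


class GreekPnL(NamedTuple):
    theta_pnl: float
    gamma_pnl: float
    total: float


def delta_hedged_pnl(
    theta_per_day: float, gamma: float, dS: float, days: float = 1.0
) -> GreekPnL:
    """First-order PnL of a delta-hedged option: theta*dt + 0.5*gamma*dS^2."""
    theta_pnl = theta_per_day * days
    gamma_pnl = 0.5 * gamma * dS * dS
    return GreekPnL(theta_pnl, gamma_pnl, theta_pnl + gamma_pnl)


def breakeven_move(theta_per_day: float, gamma: float, days: float = 1.0) -> float:
    """Absolute move |dS| at which gamma gains exactly offset theta decay."""
    if gamma <= 0 or theta_per_day > 0:
        raise ValueError("expects a long-gamma position with non-positive theta")
    return math.sqrt(-2.0 * theta_per_day * days / gamma)
```

With a theta of -50 per day and gamma of 2, a 5-point move earns 25 from gamma and still loses 25 overall; the stock has to move about 7.07 points in a day just to break even. A run of days with smaller moves produces exactly the steady negative PnL described in the question.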

You need 1-day 99% VaR for a $5\times 10^9$ multi-asset portfolio (rates, equities, credit) for a risk committee, with 3 years of daily history and known volatility clustering. Do you ship historical simulation or parametric (EWMA or GARCH) VaR, and what failure modes do you call out?

Medium · VaR Method Selection and Model Risk

Sample Answer

You could do historical simulation VaR or parametric VaR (EWMA or GARCH). Historical simulation wins here because it captures non-normal tails and cross-asset dependence without forcing a covariance model, which is exactly where committees get burned. Its cost is sample size and responsiveness: three years is roughly 750 observations, so the 99% quantile rests on about 7 or 8 tail days, and equal-weighted history reacts slowly to volatility clustering, which is the argument for volatility-scaling the scenarios. Parametric can be cleaner for intraday scaling and scenario expansion, but it fails hard if returns are skewed, fat-tailed, or correlations spike, and the Gaussian assumption understates $\text{VaR}_{0.99}$. Either way you must flag procyclicality, window sensitivity, and backtesting gaps, especially around regime shifts and stale marks.
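
The two approaches compared above can be sketched side by side on a single PnL series. This is an illustration under stated assumptions: portfolio-level daily PnL is already aggregated, the EWMA decay of 0.94 is a common convention rather than anything given in the question, and the parametric version assumes zero mean and Gaussian returns.

```python
import math
from statistics import NormalDist
from typing import Sequence


def historical_var(pnl: Sequence[float], level: float = 0.99) -> float:
    """Empirical VaR: smallest loss L such that at least `level` of days lost <= L."""
    if not pnl:
        raise ValueError("pnl must be non-empty")
    losses = sorted(-x for x in pnl)  # positive numbers are losses
    k = math.ceil(level * len(losses)) - 1  # zero-based index of the quantile
    return losses[max(k, 0)]


def ewma_var(pnl: Sequence[float], level: float = 0.99, lam: float = 0.94) -> float:
    """Gaussian VaR from an EWMA variance estimate (zero-mean assumption)."""
    if not pnl:
        raise ValueError("pnl must be non-empty")
    var = pnl[0] ** 2  # seed the recursion with the first squared PnL
    for x in pnl[1:]:
        var = lam * var + (1.0 - lam) * x * x
    return NormalDist().inv_cdf(level) * math.sqrt(var)
```

The historical estimate depends only on the ordering of past losses, so it moves in jumps as tail days enter or leave the window. The EWMA estimate reacts quickly to recent volatility but inherits the Gaussian tail, which is the trade-off the answer asks you to call out.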

A trader claims their intraday mean-reversion signal is market-neutral because its beta to the S&P 500 futures is near zero, yet the strategy has large losses on high-vol days. How do you diagnose whether the exposure is to volatility, liquidity, or correlation breakdown, and what risk metric would you add beyond beta?

Hard · Market Neutrality and Hidden Risk Exposures

Coding & Algorithms (Round 2)

Your performance here depends on writing correct, efficient code under time pressure, not just describing an approach. You’ll see problems that resemble real data work (arrays/strings/hash maps, streaming-style computations) with clear complexity expectations.

You ingest a live stream of equity trades as (ts, symbol, size) and need to flag the first symbol whose cumulative size exceeds a threshold $T$ (ties broken by earliest timestamp). Implement a function that returns (symbol, ts) or null if no symbol crosses $T$.

Easy · Streaming Hash Map Aggregation

Sample Answer

You could sort by timestamp and then aggregate, or you could aggregate in one pass while tracking the first crossing event. Sorting is $O(n\log n)$ and wastes time when the stream is already in order. The one-pass hash map wins here because it is $O(n)$ time, $O(k)$ space for $k$ symbols, and you can stop early the moment any symbol crosses $T$.

Python

from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Trade:
    ts: int  # epoch millis or any comparable integer timestamp
    symbol: str
    size: float


def first_symbol_to_cross_threshold(
    trades: Iterable[Trade],
    T: float,
) -> Optional[Tuple[str, int]]:
    """Return (symbol, ts) for the first symbol whose cumulative size exceeds T.

    Assumptions:
      - trades arrive in non-decreasing timestamp order (streaming ingestion).
      - crossing means cumulative_size > T (strictly exceeds).
      - If multiple symbols cross at the same timestamp, return the one that crossed
        at that timestamp first in the input order (stable tie-break).

    Time: O(n)
    Space: O(k) where k is number of distinct symbols
    """
    if T < 0:
        # If threshold is negative, any positive cumulative would exceed immediately.
        # But to keep semantics sane, treat it as already exceeded by first trade.
        for tr in trades:
            return (tr.symbol, tr.ts)
        return None

    cum: Dict[str, float] = {}

    for tr in trades:
        prev = cum.get(tr.symbol, 0.0)
        new_total = prev + tr.size
        cum[tr.symbol] = new_total

        # Flag the first crossing event in stream order.
        if prev <= T and new_total > T:
            return (tr.symbol, tr.ts)

    return None

Given daily PnL by strategy for a portfolio as an array of floats, compute the maximum drawdown value over the full period in $O(n)$ time (drawdown is peak-to-trough decline of cumulative PnL). Return the drawdown as a positive number.

Medium · Prefix Sums and Running Extremes

Sample Answer

You turn daily PnL into a cumulative equity curve by adding each day to a running total. You keep the best peak seen so far, then at each day you measure the drop from that peak to the current equity. The maximum of those drops over the scan is the maximum drawdown, with one pass and constant extra space.

Python

from __future__ import annotations

from typing import Iterable


def max_drawdown(daily_pnl: Iterable[float]) -> float:
    """Compute maximum drawdown on cumulative PnL.

    Let equity_t = sum_{i<=t} pnl_i. Drawdown at t is max_{s<=t} equity_s - equity_t.
    Return max_t drawdown_t as a positive float.

    Time: O(n)
    Space: O(1)
    """
    equity = 0.0
    peak = 0.0
    max_dd = 0.0

    for pnl in daily_pnl:
        equity += pnl
        if equity > peak:
            peak = equity
        else:
            dd = peak - equity
            if dd > max_dd:
                max_dd = dd

    return max_dd

You have mid-price snapshots for a single symbol as (ts, mid) sorted by ts, and you need the maximum absolute mid-price change within any rolling window of length $W$ seconds, computed over all pairs of points inside each window. Implement an $O(n)$ solution that returns the maximum value.

Hard · Monotonic Deques, Sliding Window Extremes
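
One possible approach to the problem above, offered as a sketch rather than a reference answer: the largest absolute change over all pairs in a window equals the window maximum minus the window minimum, so two monotonic deques are enough. The function name is hypothetical, and the window is assumed to be inclusive, meaning two points are in the same window when their timestamps differ by at most $W$.

```python
from collections import deque
from typing import Deque, Sequence, Tuple


def max_abs_change_in_window(snaps: Sequence[Tuple[float, float]], W: float) -> float:
    """Largest |mid_a - mid_b| over pairs whose timestamps differ by at most W.

    snaps holds (ts, mid) pairs sorted by ts; W must be non-negative.
    """
    max_dq: Deque[int] = deque()  # indices with decreasing mid; front is window max
    min_dq: Deque[int] = deque()  # indices with increasing mid; front is window min
    best = 0.0
    for i, (ts, mid) in enumerate(snaps):
        # Keep each deque monotonic before adding the new point.
        while max_dq and snaps[max_dq[-1]][1] <= mid:
            max_dq.pop()
        max_dq.append(i)
        while min_dq and snaps[min_dq[-1]][1] >= mid:
            min_dq.pop()
        min_dq.append(i)
        # Evict points older than the window ending at ts. Index i itself is
        # always inside the window, so neither deque can become empty here.
        while snaps[max_dq[0]][0] < ts - W:
            max_dq.popleft()
        while snaps[min_dq[0]][0] < ts - W:
            min_dq.popleft()
        best = max(best, snaps[max_dq[0]][1] - snaps[min_dq[0]][1])
    return best
```

Every index is pushed and popped at most once per deque, which gives amortized $O(n)$ time. Treating each point as the right edge of a window covers every pair within $W$ seconds of each other.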

SQL / Data Retrieval for Analytics

The bar isn’t whether you know SELECT syntax; it’s whether you can express multi-step financial analytics cleanly with joins, window functions, and careful filtering. You’ll be judged on correctness with edge cases like missing days, duplicated fills, and as-of joins.

You have equity order executions with occasional duplicated fill events. Write SQL to compute each account’s daily traded notional ($\sum \text{qty} \cdot \text{price}$) for the last 30 calendar days, deduping fills by the latest ingested record per (trade_id, fill_id) and returning 0 notional on no-trade days.

Medium · Window Functions

Sample Answer

Reason through it: You first isolate the time window and dedupe at the fill grain, because any aggregation before dedupe bakes in overcounting. Next, compute per-fill notional and roll it up to (account_id, trade_date). Then build a calendar of the last 30 days and cross it with the set of accounts you care about, left join the daily aggregates, and coalesce missing rows to 0. This is where most people fail: they skip the calendar join and silently drop zero-activity days.

SQL

/*
Assumptions
- executions(trade_id, fill_id, account_id, symbol, qty, price, execution_ts, ingest_ts)
- Postgres-compatible SQL
- 'Latest ingested' means greatest ingest_ts for a given (trade_id, fill_id)
*/

WITH params AS (
  SELECT
    (CURRENT_DATE - INTERVAL '29 days')::date AS start_date,
    CURRENT_DATE::date AS end_date
),
-- 1) Dedupe fills to the latest ingested record per (trade_id, fill_id)
dedup_fills AS (
  SELECT
    e.trade_id,
    e.fill_id,
    e.account_id,
    e.qty,
    e.price,
    e.execution_ts,
    e.ingest_ts,
    ROW_NUMBER() OVER (
      PARTITION BY e.trade_id, e.fill_id
      ORDER BY e.ingest_ts DESC
    ) AS rn
  FROM executions e
  JOIN params p
    ON e.execution_ts::date BETWEEN p.start_date AND p.end_date
),
latest_fills AS (
  SELECT
    trade_id,
    fill_id,
    account_id,
    qty,
    price,
    execution_ts
  FROM dedup_fills
  WHERE rn = 1
),
-- 2) Aggregate notional by account and trade date
daily_notional AS (
  SELECT
    account_id,
    execution_ts::date AS trade_date,
    SUM(qty * price) AS traded_notional
  FROM latest_fills
  GROUP BY 1, 2
),
-- 3) Calendar for the last 30 calendar days
calendar AS (
  SELECT generate_series(p.start_date, p.end_date, INTERVAL '1 day')::date AS trade_date
  FROM params p
),
-- 4) Accounts with any activity in the window (scope accounts to active ones)
active_accounts AS (
  SELECT DISTINCT account_id
  FROM latest_fills
)
SELECT
  a.account_id,
  c.trade_date,
  COALESCE(d.traded_notional, 0) AS traded_notional
FROM active_accounts a
CROSS JOIN calendar c
LEFT JOIN daily_notional d
  ON d.account_id = a.account_id
 AND d.trade_date = c.trade_date
ORDER BY a.account_id, c.trade_date;

Given end-of-day positions for a portfolio and a separate daily close price table with gaps on holidays, write SQL to compute the portfolio’s daily market value for the last 60 business days, using the most recent available close price on or before each position date (as-of join).

Hard · As-Of Join

Sample Answer

This question checks whether you can implement an as-of join without exploding row counts or accidentally picking up a price dated after the position. You need to join each (symbol, position_date) to the max price_date where $\text{price\_date} \le \text{position\_date}$. Then compute $\sum \text{shares} \cdot \text{price}$ at the portfolio-date grain, and keep the date filter at the position level so you do not miss earlier prices needed for the as-of lookup. If you filter the price table too aggressively, you get nulls and understate risk and PnL.

SQL

/*
Assumptions
- positions_eod(portfolio_id, position_date, symbol, shares)
- prices_close(symbol, price_date, close_price)
- Business days are represented by position_date rows (common in IM systems).
- Postgres-compatible SQL.
*/

WITH params AS (
  SELECT CURRENT_DATE AS asof_end
),
-- 1) Get the last 60 business days from positions (business calendar comes from position rows)
last_60_bd AS (
  SELECT position_date
  FROM (
    SELECT
      DISTINCT p.position_date,
      DENSE_RANK() OVER (ORDER BY p.position_date DESC) AS dr
    FROM positions_eod p
    JOIN params prm
      ON p.position_date <= prm.asof_end
  ) x
  WHERE x.dr <= 60
),
-- Price lookback is anchored to the earliest scoped position date rather than CURRENT_DATE,
-- so stale or lagging position data cannot fall outside the price window.
price_window AS (
  SELECT (MIN(position_date) - INTERVAL '30 days')::date AS lookback_for_prices -- buffer for as-of
  FROM last_60_bd
),
positions_scoped AS (
  SELECT
    p.portfolio_id,
    p.position_date,
    p.symbol,
    p.shares
  FROM positions_eod p
  JOIN last_60_bd d
    ON p.position_date = d.position_date
),
-- 2) As-of join using a lateral subquery to pick the latest close on or before position_date
positions_priced AS (
  SELECT
    ps.portfolio_id,
    ps.position_date,
    ps.symbol,
    ps.shares,
    pc.close_price
  FROM positions_scoped ps
  LEFT JOIN LATERAL (
    SELECT c.close_price
    FROM prices_close c
    JOIN price_window pw ON TRUE
    WHERE c.symbol = ps.symbol
      AND c.price_date <= ps.position_date
      AND c.price_date >= pw.lookback_for_prices
    ORDER BY c.price_date DESC
    LIMIT 1
  ) pc ON TRUE
)
-- 3) Aggregate to the portfolio-date grain (rows with no price contribute NULL and are skipped by SUM)
SELECT
  portfolio_id,
  position_date,
  SUM(shares * close_price) AS market_value
FROM positions_priced
GROUP BY 1, 2
ORDER BY portfolio_id, position_date;
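The as-of lookup itself is easy to sanity-check outside the database. The following is a minimal Python sketch, assuming the prices for one symbol fit in sorted in-memory lists; the function name `asof_price` and the sample closes are hypothetical and only illustrate the "latest price on or before the position date" rule.

```python
from bisect import bisect_right
from datetime import date


def asof_price(price_dates, prices, position_date):
    """Return the latest price dated on or before position_date, else None.

    price_dates must be sorted ascending and aligned index-by-index with prices.
    """
    idx = bisect_right(price_dates, position_date)
    return prices[idx - 1] if idx > 0 else None


# Hypothetical closes with a holiday gap on 2025-01-01.
price_dates = [date(2024, 12, 30), date(2024, 12, 31), date(2025, 1, 2)]
prices = [100.0, 101.0, 103.0]

holiday_value = asof_price(price_dates, prices, date(2025, 1, 1))   # falls back to 2024-12-31
same_day_value = asof_price(price_dates, prices, date(2025, 1, 2))  # exact-date match
no_history = asof_price(price_dates, prices, date(2024, 12, 1))     # before the first price
```

The `None` case is the in-memory analogue of the NULL a too-narrow price filter produces in SQL, which is worth calling out to the interviewer.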

For a rates trading desk, you have intraday risk snapshots per book with timestamped DV01 values. Write SQL to return, for each book and day, the end-of-day DV01 and the max intraday DV01, treating end-of-day as the latest snapshot at or before 16:00 New York time.

Medium · Time Windows and Cutoffs
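Before writing the SQL, it can help to pin down the cutoff semantics on a few rows. The sketch below is one possible reading in Python, not a reference answer: it assumes timestamps are already naive New York local times, and that the intraday max is taken over all of the day's snapshots, including those after 16:00. The function name `eod_and_max_dv01` and the sample rows are hypothetical.

```python
from datetime import date, datetime, time

CUTOFF = time(16, 0)  # 16:00; timestamps are assumed to already be New York local time


def eod_and_max_dv01(snapshots):
    """Map (book, day) -> (eod_dv01, max_intraday_dv01).

    eod_dv01 is the latest snapshot at or before the cutoff, or None if none exists.
    max_intraday_dv01 covers every snapshot that day (an assumption, see lead-in).
    """
    result = {}
    for book, ts, dv01 in sorted(snapshots, key=lambda s: (s[0], s[1])):
        key = (book, ts.date())
        eod, peak = result.get(key, (None, None))
        if ts.time() <= CUTOFF:
            eod = dv01  # rows are time-ordered, so later qualifying rows overwrite earlier ones
        peak = dv01 if peak is None else max(peak, dv01)
        result[key] = (eod, peak)
    return result


# Hypothetical snapshots: one book with a post-cutoff row, one with only post-cutoff rows.
snapshots = [
    ("RATES1", datetime(2025, 1, 2, 9, 30), 1200.0),
    ("RATES1", datetime(2025, 1, 2, 15, 59), 1500.0),
    ("RATES1", datetime(2025, 1, 2, 16, 5), 1800.0),
    ("RATES2", datetime(2025, 1, 2, 16, 30), 700.0),
]
summary = eod_and_max_dv01(snapshots)
```

If snapshots are stored in UTC, the conversion to New York time has to happen before both the day bucketing and the cutoff comparison, otherwise rows near midnight UTC land on the wrong day.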

Case Study: Investment/Risk Analytics & Business Judgment

In the case, you’re expected to turn an ambiguous investment-management or risk problem into a crisp plan with measurable success criteria. What trips people up is failing to define the decision, the metric, the constraints (risk, latency, costs), and the iteration path.

You are asked to decide whether to allocate 5 percent of a multi-asset portfolio to a new stat-arb strategy whose backtest shows a Sharpe of 1.8 but has only 18 months of data and high turnover. What decision framework, metrics (including at least one tail metric), and data checks do you present to the investment committee to approve, size, or reject the allocation?

Easy · Portfolio Allocation and Risk Governance

Sample Answer

This question is checking whether you can turn a shiny backtest into an investment decision with explicit risk, capacity, and implementation constraints. You should anchor on a go or no-go plus sizing, then lay out success metrics like net-of-costs Sharpe, hit rate stability, and a tail metric such as 99 percent 1-day CVaR, plus drawdown and time-to-recover. Data checks should include survivorship and lookahead bias tests, realistic transaction cost and slippage modeling, and sensitivity to rebalancing frequency. Most people fail by quoting Sharpe and ignoring turnover, costs, and regime dependence.
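The metrics named above can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions: inputs are daily returns already net of costs, the Sharpe ratio ignores the risk-free rate, and CVaR is the simple historical average of the worst days. The function names and the five sample returns are hypothetical.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample std dev, annualized; risk-free rate assumed to be zero."""
    mean = statistics.mean(daily_returns)
    vol = statistics.stdev(daily_returns)
    return mean / vol * math.sqrt(periods_per_year)


def historical_cvar(daily_returns, level=0.99):
    """Average loss over the worst (1 - level) share of days, as a positive number."""
    losses = sorted((-r for r in daily_returns), reverse=True)
    # The small tolerance guards against float error in len * (1 - level).
    n_tail = max(1, math.floor(len(losses) * (1 - level) + 1e-9))
    return sum(losses[:n_tail]) / n_tail


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded wealth curve."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


# Hypothetical net-of-cost daily returns.
returns = [0.01, -0.02, 0.015, -0.05, 0.005]
sharpe = annualized_sharpe(returns)
cvar_60 = historical_cvar(returns, level=0.6)  # mean of the two worst days
mdd = max_drawdown(returns)
```

With only 18 months of history, a 99 percent tail holds roughly three or four daily observations, which is itself an argument for stress scenarios alongside the historical estimate.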

A daily equity factor model used for risk and hedging starts underestimating portfolio $99\%$ 10-day VaR during volatility spikes, and the desk wants a fix within 2 weeks without breaking hedging workflows. How do you diagnose whether the issue is model misspecification, stale correlations, or data problems, and what change do you ship that improves tail coverage while controlling false alarms and model risk?

Hard · VaR Backtesting and Tail-Risk Remediation
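One diagnostic that fits this prompt is comparing VaR breach rates in calm versus volatile periods. The sketch below is a simplified illustration, not a full backtest: it assumes each P&L observation is horizon-matched to its VaR forecast and non-overlapping, that VaR is quoted as a positive loss, and that a high-volatility flag is supplied externally. The function name `exception_rate` and the sample numbers are hypothetical.

```python
def exception_rate(pnl, var_forecasts, mask=None):
    """Share of observations where the realized loss exceeded the VaR forecast.

    mask optionally restricts the calculation to a subset (for example high-vol days).
    Returns None when the subset is empty.
    """
    pairs = [
        (p, v)
        for i, (p, v) in enumerate(zip(pnl, var_forecasts))
        if mask is None or mask[i]
    ]
    if not pairs:
        return None
    breaches = sum(1 for p, v in pairs if -p > v)
    return breaches / len(pairs)


# Hypothetical P&L, flat VaR forecast, and a high-volatility regime flag.
pnl = [-5.0, 2.0, -12.0, 1.0, -15.0, -3.0]
var_forecasts = [10.0] * 6
high_vol = [False, False, True, False, True, False]

overall = exception_rate(pnl, var_forecasts)
stressed = exception_rate(pnl, var_forecasts, mask=high_vol)
calm = exception_rate(pnl, var_forecasts, mask=[not flag for flag in high_vol])
```

A breach rate near the nominal 1 percent in calm periods but far above it in stressed periods points toward stale volatility or correlation inputs rather than a data feed problem.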

Behavioral & Stakeholder Management

You’ll need to show you can operate in a high-stakes environment where model risk, scrutiny, and cross-team dependencies are normal. Interviewers look for structured stories about ownership, influencing without authority, handling pushback, and making tradeoffs under deadlines.

A portfolio manager wants to ship a new factor model into the daily risk report, but Model Risk Management is pushing back on explainability and stability under regime shifts. Describe how you align PM, MRM, and Engineering on a go-live decision and what evidence you bring to the approval meeting.

Easy · Influencing Without Authority

Sample Answer

The standard move is to lock a decision framework up front: owner, success metrics, validation checklist, and a date when evidence freezes for review. But here, model governance matters because MRM is optimizing for auditability and tail risk, so you bring pre-agreed artifacts like out-of-sample stability results, stress scenarios, and a clear rollback plan that Engineering can execute.

During a fast market event, your intraday risk alert model spikes false positives, Trading wants thresholds loosened immediately, and Compliance warns about missing true limit breaches. Walk through how you decide, communicate, and document the change in under 60 minutes.

Hard · High Stakes Decision Making Under Time Pressure

What jumps out isn't any single dominant area. It's that GS spreads weight across six areas with no safe one to skip, yet the case study round (which simulates real Asset Management decisions like sizing an allocation to a new stat-arb strategy) demands you pull from nearly every other area simultaneously. The biggest prep mistake is drilling ML and coding in isolation, then freezing when a case prompt requires you to question a backtest's Sharpe ratio, sketch a validation scheme, and justify the business decision in one coherent answer. From what candidates report, that synthesis under pressure is where preparation breaks down.

Drill GS-style case and modeling questions with financial framing at datainterview.com/questions.

How to Prepare for Goldman Sachs Data Scientist Interviews

Know the Business

Updated Q1 2026

Official mission

“Goldman Sachs’ mission is to advance sustainable economic growth and financial opportunity across the globe.”

What it actually means

Goldman Sachs aims to provide comprehensive financial services, including investment banking, asset management, and wealth management, to a diverse global client base. Its core purpose is to foster sustainable economic growth and broaden financial opportunities for individuals and institutions worldwide.

New York, New York

Hybrid - Flexible

Key Business Metrics

Metric	Value	YoY change
Revenue	$59B	+15%
Market Cap	$279B	+35%
Employees	47K	+3%

Business Segments and Where DS Fits

Goldman Sachs Asset Management

The primary investing area within Goldman Sachs, delivering investment and advisory services across public and private markets for the world's leading institutions, financial advisors, and individuals. It is a leading investor across fixed income, liquidity, equity, alternatives, and multi-asset solutions. Goldman Sachs oversees approximately $3.5 trillion in assets under supervision as of September 30, 2025.

DS focus: Utilizing quantitative strategies to navigate market complexities and inefficiencies, employing data-driven approaches for diversified portfolios, and leveraging AI applications for automation, customer engagement, and operational intelligence.

Current Strategic Priorities

Expand offerings in the wealth channel to help more investors reach their long-term goals by combining expertise with T. Rowe Price through co-branded model portfolios.

Competitive Moat

Larger scale
Diversity
Proven ability to invest in technology
Proven ability to invest for future growth in attractive geographies
Higher revenue from investment banking and trading activities
Inherent competitive advantage given their base in the world’s broadest and deepest capital markets
Benefit of a faster recovery after the global financial crisis
More favourable operating environment

Goldman Sachs Asset Management oversees $3.5 trillion in assets under supervision, and its 2026 investment outlook makes clear that alternatives and AI-driven analytics are where the firm is placing its bets. For data scientists, that translates to quantitative strategies for navigating market inefficiencies and building AI applications for automation and customer engagement, not just back-office reporting.

The "why GS" answer that actually works ties your skills to a specific problem the firm has publicly named. GS has forecast that alternative investment demand will outstrip origination supply, a supply-demand imbalance that creates real modeling challenges around allocation and risk. Pair that with the firm's published skepticism about generative AI hype and you can articulate a point of view on where LLMs add value in investment research versus where they don't. That's a far more memorable answer than "I want to apply ML in finance."

Try a Real Interview Question

5-Day Rolling Volatility of Daily Portfolio PnL

sql

Given a table of daily portfolio $pnl$ by $trade\_date$, return one row per date with the 5-trading-day rolling sample volatility of $pnl$ (use $n-1$ in the denominator), and include dates even when the window holds fewer than 5 rows. Output columns: $trade\_date$, $window\_n$, $pnl\_mean$, $pnl\_vol$, where $pnl\_vol=\sqrt{\frac{\sum (x_i-\bar{x})^2}{n-1}}$ when $n\ge2$, else $NULL$.

portfolio_pnl

trade_date	portfolio_id	pnl
2025-01-02	PF1	100
2025-01-03	PF1	-50
2025-01-06	PF1	150
2025-01-07	PF1	0
2025-01-08	PF1	200

SQL

WITH base AS (
  SELECT
    trade_date,
    portfolio_id,
    CAST(pnl AS DOUBLE PRECISION) AS pnl
  FROM portfolio_pnl
  WHERE portfolio_id = 'PF1'
), w AS (
  -- Running count, mean, sum of squares, and sum over the current row plus the 4 prior rows
  SELECT
    trade_date,
    COUNT(*) OVER (
      ORDER BY trade_date
      ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS window_n,
    AVG(pnl) OVER (
      ORDER BY trade_date
      ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS pnl_mean,
    SUM(pnl * pnl) OVER (
      ORDER BY trade_date
      ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS sum_x2,
    SUM(pnl) OVER (
      ORDER BY trade_date
      ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS sum_x
  FROM base
)
SELECT
  trade_date,
  window_n,
  pnl_mean,
  CASE
    WHEN window_n >= 2 THEN
      -- Sample variance via sum of squares; GREATEST clamps tiny negative round-off before SQRT
      SQRT( GREATEST(sum_x2 - (sum_x * sum_x) / window_n, 0) / (window_n - 1) )
    ELSE NULL
  END AS pnl_vol
FROM w
ORDER BY trade_date;
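The sum-of-squares shortcut used in the query can be checked against a textbook standard deviation. Here is a minimal Python sketch that mirrors the same arithmetic on the sample rows; the function name `rolling_sample_vol` is hypothetical, and the clamp at zero is an added guard against floating-point round-off.

```python
import math
import statistics


def rolling_sample_vol(values, window=5):
    """Rolling sample std dev via running sums, using the identity
    sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / n.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        n = len(chunk)
        if n < 2:
            out.append(None)  # sample volatility is undefined for a single row
            continue
        sum_x = sum(chunk)
        sum_x2 = sum(x * x for x in chunk)
        variance = max((sum_x2 - sum_x * sum_x / n) / (n - 1), 0.0)
        out.append(math.sqrt(variance))
    return out


# PnL values from the sample portfolio_pnl table.
pnl = [100.0, -50.0, 150.0, 0.0, 200.0]
vols = rolling_sample_vol(pnl)
reference_full_window = statistics.stdev(pnl)  # textbook n-1 standard deviation
```

The shortcut formula is numerically less stable than a two-pass calculation when values are large relative to their spread, which is a reasonable tradeoff to mention if the interviewer asks about precision.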

GS data scientists ship production code on an internal stack that includes Scala, Java, and contributions to the open-source Legend platform, so coding rounds test whether you write reviewable, deployable code rather than notebook-style scripts. Expect the interviewer to probe edge cases and ask you to reason about complexity tradeoffs out loud. Drill similar problems on datainterview.com/coding, prioritizing clean Python with proper error handling over speed-optimized one-liners.

Test Your Readiness

How Ready Are You for Goldman Sachs Data Scientist?

Machine Learning

Can you design a time series prediction model for daily equity returns while avoiding lookahead bias, including a train, validation, and test scheme appropriate for nonstationary data?
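One common way to answer the split part of this question is a walk-forward scheme, where every training window ends before its test window begins. The sketch below is a minimal illustration assuming observations are already in time order; the function name `walk_forward_splits` and its parameters, including the `embargo` gap, are hypothetical choices rather than a prescribed method.

```python
def walk_forward_splits(n_obs, train_size, test_size, embargo=0):
    """Yield (train_indices, test_indices) with train strictly before test.

    embargo leaves a gap between the end of train and the start of test so that
    labels overlapping in time cannot leak across the boundary.
    """
    start = 0
    while start + train_size + embargo + test_size <= n_obs:
        train_end = start + train_size
        test_start = train_end + embargo
        yield range(start, train_end), range(test_start, test_start + test_size)
        start += test_size  # roll the fixed-size training window forward


# Hypothetical example: 10 observations, 4 to train, 2 to test, 1-row embargo.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2, embargo=1))
```

A rolling window like this lets the model forget old regimes, while an expanding window (keeping `start` at zero for training) trades that adaptivity for more data; which is better depends on how nonstationary the returns are.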

Focus on the financial reasoning gaps (Sharpe ratio intuition, VaR interpretation) that separate GS interviews from generic DS loops, then target those weak spots at datainterview.com/questions.

Frequently Asked Questions

How long does the Goldman Sachs Data Scientist interview process take?

Expect roughly 4 to 8 weeks from application to offer. You'll typically go through an initial recruiter screen, a technical phone interview, and then a Super Day (their version of an onsite) with multiple back-to-back interviews. Goldman moves faster for campus hires and can be slower for experienced roles, especially if there's internal committee review. Don't be surprised if there are delays between rounds. The firm has a structured approval process that can add a week or two.

What technical skills are tested in the Goldman Sachs Data Scientist interview?

SQL, Python, and statistics are the big three. You'll be tested on data analysis and processing, statistical modeling, machine learning model development and evaluation, and predictive modeling. They also care about dimensionality reduction, feature engineering, and model validation. Goldman uses Python heavily, but familiarity with R, Java, Scala, or SAS can help depending on the team. I'd say Python and SQL fluency are non-negotiable; everything else is a bonus.

How should I tailor my resume for a Goldman Sachs Data Scientist role?

Lead with impact, not tools. Goldman wants to see that you can translate technical work into business outcomes, so frame your bullet points around results: revenue generated, risk reduced, efficiency gained. Mention experience with large datasets explicitly since they work at massive scale. If you've done anything related to model validation, compliance (like Fair Lending), or working cross-functionally with business stakeholders, put that front and center. Keep it to one page if you have under 10 years of experience. Financial services experience helps but isn't required.

What is the total compensation for a Goldman Sachs Data Scientist?

Goldman Sachs pays competitively with other top financial institutions. For an entry-level or Associate-level Data Scientist in New York, expect a base salary in the $120K to $150K range, with total comp (including bonus) reaching $150K to $200K. Vice President-level data scientists can see total comp in the $200K to $350K range depending on the division and performance. Bonuses at Goldman are a significant portion of pay and are heavily tied to firm and individual performance. Keep in mind New York cost of living when evaluating these numbers.

How do I prepare for the behavioral interview at Goldman Sachs for a Data Scientist position?

Goldman's core values are partnership, client service, integrity, and excellence. Your behavioral answers need to reflect these. They'll ask about teamwork, handling disagreements, and times you went above and beyond for a stakeholder. Prepare stories that show you can work across teams and communicate technical findings to non-technical people. Goldman has a strong culture of collaboration, so anything that signals you're a lone wolf will hurt you. I've seen candidates get dinged on culture fit even with strong technical performance.

How hard are the SQL and coding questions in the Goldman Sachs Data Scientist interview?

SQL questions are medium difficulty. Think window functions, CTEs, complex joins, and aggregation problems, often framed around financial data. Python coding questions tend to focus on data manipulation (pandas, numpy) and implementing algorithms from scratch rather than pure software engineering puzzles. You might get asked to write a function for a statistical test or build a simple model pipeline. Practice at datainterview.com/coding to get comfortable with the style and pacing. The questions aren't tricky for the sake of being tricky, but they expect clean, efficient code.

What machine learning and statistics concepts should I know for a Goldman Sachs Data Scientist interview?

Regression (linear and logistic), tree-based models, and ensemble methods come up frequently. You should be solid on bias-variance tradeoff, regularization, cross-validation, and evaluation metrics like AUC, precision, and recall. Goldman also tests on dimensionality reduction (PCA especially), feature engineering, and model monitoring over time. Predictive modeling with advanced algorithms is a listed requirement, so be ready to discuss gradient boosting or neural nets at a conceptual level. They care a lot about model validation and testing, so know how to explain why a model works, not just that it works.

What format should I use to answer behavioral questions at Goldman Sachs?

Use the STAR format (Situation, Task, Action, Result) but keep it tight. Goldman interviewers are busy and sharp. They don't want a five-minute monologue. Aim for 90 seconds to two minutes per answer. Spend most of your time on the Action and Result. Quantify results whenever possible. And always connect back to what the business gained from your work. Practice 6 to 8 stories that you can adapt to different question angles, and you'll cover most of what they throw at you.

What happens during the Goldman Sachs Data Scientist onsite or Super Day?

The Super Day typically involves 3 to 5 back-to-back interviews, each about 30 to 45 minutes. You'll face a mix of technical and behavioral rounds. Some interviewers will drill into your past projects, others will give you live coding or case-style problems. At least one round will focus on statistical reasoning or ML concepts. Expect at least one senior person (VP or Managing Director) to assess culture fit and your ability to explain technical concepts in business terms. It's a long day. Bring water, stay sharp, and treat every interviewer like they have equal say in the decision, because they do.

What business metrics and domain concepts should I know for a Goldman Sachs Data Scientist interview?

You don't need to be a finance expert, but you should understand basic financial concepts relevant to data science: risk modeling, portfolio optimization, fraud detection, and compliance. Goldman specifically mentions business justification for model attributes and Fair Lending compliance, so read up on how models are evaluated for fairness and regulatory requirements. Know how to frame a data science problem in terms of business value. If you can explain how your model saves money, reduces risk, or improves a client outcome, you'll stand out. Practice with business-oriented case questions at datainterview.com/questions.

What common mistakes do candidates make in the Goldman Sachs Data Scientist interview?

The biggest one I see is being too academic. Goldman wants practitioners who can ship models and explain them to stakeholders. If you can't articulate why your model matters to the business, that's a red flag. Another common mistake is underestimating the behavioral rounds. Some candidates prep only for technical and then stumble when asked about teamwork or conflict. Finally, don't ignore model validation. Goldman operates in a regulated environment, so they want to hear about how you test, monitor, and justify your models, not just how you build them.

Does Goldman Sachs prefer specific programming languages for Data Scientist roles?

Python is the primary language and the one you'll be tested on most. That said, Goldman lists Java, C/C++, Scala, R, Matlab, and SAS as relevant languages. Different teams use different stacks. Quantitative strategy teams might lean toward C++ or Scala for performance. Risk teams might use SAS or R for legacy systems. For the interview itself, Python is your safest bet. Be fluent in pandas, numpy, and scikit-learn. If you know a second language from their list, mention it, but don't stress about knowing all of them.

Dan Lee
