Goldman Sachs Data Scientist at a Glance
Interview Rounds: 5 rounds
Goldman Sachs screens for behavioral fit and stakeholder communication across multiple touchpoints in its data scientist interview loop. If you're the type who preps only for the ML round and wings the "tell me about a time" questions, you're making a mistake that costs a lot of otherwise strong candidates.
Goldman Sachs Data Scientist Role
Skill Profile
Math & Stats
Expert: Deep theoretical and practical expertise in advanced statistical modeling, quantitative analysis, time series analysis, information theory, and mathematical foundations, particularly for financial applications and capital planning (e.g., CCAR).
Software Eng
High: Strong programming skills for building, deploying, and integrating data science models into production systems, with an understanding of MLOps practices and scalable architectures. Experience with mainstream programming languages is essential.
Data & SQL
High: Experience with large-scale data processing, Big Data tools (e.g., Hadoop, Spark), and contributing to data architecture design for analytical and machine learning pipelines.
Machine Learning
Expert: Extensive hands-on experience in designing, developing, evaluating, and deploying a wide range of machine learning models and advanced algorithms (e.g., neural networks, SVMs, random forests) for business decision-making, including MLOps practices.
Applied AI
High: Strong understanding and practical experience with modern AI paradigms, including Generative AI, Large Language Models (LLMs), and agentic systems, and their application in business contexts, reflecting Goldman Sachs' strategic focus on cutting-edge AI.
Infra & Cloud
High: Experience with major cloud platforms (AWS, Azure, GCP) for deploying, managing, and scaling AI/ML infrastructure and solutions, including cloud-native architectures and services.
Business
High: Ability to translate complex analytical insights into actionable business strategies, understand financial services domain challenges (e.g., credit risk, fraud, marketing), provide business justification for model attributes (e.g., Fair Lending compliance), and strong project management skills.
Viz & Comms
High: Excellent communication and interpersonal skills to articulate complex technical concepts and model insights effectively to both technical and non-technical stakeholders, along with strong documentation practices for peer reviews and validation.
What You Need
- Data analysis and processing
- Statistical modeling and analysis
- Machine learning model development, evaluation, and monitoring
- Predictive modeling and advanced algorithms
- Dimensionality reduction and feature engineering
- Working with large datasets
- Model validation and testing
- Project management
- Business justification for model attributes (e.g., Fair Lending compliance)
- Ability to translate complex technical concepts into business insights
Nice to Have
- Credit risk modeling
- Marketing response modeling and forecasting
- Experience in a startup or new business line environment
You're building models inside Goldman's Asset Management division that feed directly into how portfolio managers make decisions. That means portfolio construction signals, credit risk scoring with feature engineering in PySpark on Databricks, and client segmentation work that shapes rebalancing strategies. Success in year one looks like getting a model through GS's independent Model Risk Management validation and into production where a PM actually acts on the output, not leaving it in a notebook.
A Typical Week
[Chart: weekly time split for a typical L5 workweek at Goldman Sachs]
Culture notes
- Goldman expects consistent in-office presence five days a week at 200 West Street, and while core hours are roughly 9-to-6, it's common to see Slack activity until 7 or 8 PM during model review cycles or quarter-end pushes.
- The culture is polished and high-accountability. You'll spend more time on documentation, model governance, and stakeholder communication than at a typical tech company, and the bar for analytical rigor is set by the Model Risk Management group.
The surprise isn't the modeling time. It's how much of your week disappears into documentation, governance memos, and stakeholder decks. If you're coming from a startup or a tech company where "ship it and iterate" was the ethos, the regulatory overhead at GS will feel like a second job. Debugging an Airflow DAG failure on an upstream data feed one afternoon, then translating Shapley values into PM-friendly language the next morning: that's the actual rhythm.
Projects & Impact Areas
Portfolio rebalancing signal models anchor the Asset Management DS team's work, with your outputs going straight into presentations where you explain lift curves in terms PMs care about. Credit risk scoring runs in parallel, pulling from counterparty exposure tables, and those model outputs feed capital reserve calculations and trading desk decisions. GS is also deploying LLM-based tools for earnings summarization and compliance document analysis, and the firm's skill expectations reflect a real strategic push into GenAI and modern AI, so you'll be building with these tools, not just evaluating them from the sidelines.
Skills & What's Expected
Software engineering is the most underrated skill for this role. Candidates pour prep time into ML theory and statistics, which absolutely matter here, but GS expects you to ship production code, review PRs, and work within deployment pipelines on their modernized tech stack (think GraalVM adoption, the open-source Legend platform). The flip side: financial fluency is where most tech-background candidates fall short. If you can't explain what a basis point move means for a portfolio or why a sparse feature set in credit risk might call for L1 over L2 regularization, your technical depth won't compensate.
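To make the basis point example concrete, here is a minimal sketch of the first-order arithmetic, assuming a fixed-income portfolio summarized by a single modified duration. The function name and the numbers are illustrative assumptions, not anything GS publishes, and convexity is ignored.

```python
def pnl_for_rate_move(market_value: float, modified_duration: float, move_bp: float) -> float:
    """First-order PnL estimate for a parallel yield move (convexity ignored).

    All inputs are hypothetical; 1 basis point = 0.01% = 0.0001 in decimal terms.
    """
    dy = move_bp / 10_000.0
    # Price falls when yields rise, hence the minus sign.
    return -market_value * modified_duration * dy


if __name__ == "__main__":
    # Hypothetical $100M portfolio with modified duration 5, yields up 1 bp:
    # a loss of roughly $50,000.
    print(pnl_for_rate_move(100_000_000, 5.0, 1.0))
```

Being able to do this conversion in your head is usually enough for an interview; the code is only there to pin down the units.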
Levels & Career Growth
The VP-to-Executive Director promotion is where careers stall. Strong individual contributor work isn't enough at that threshold. You need visible cross-divisional impact and sponsorship from at least one MD. Internal mobility between divisions (Asset Management to Global Markets, for example) requires a formal internal application and your current manager's sign-off, which creates enough friction that many people stay in their lane longer than they'd planned.
Work Culture
Goldman expects consistent in-office presence five days a week at 200 West Street, and CEO David Solomon has been publicly vocal about prioritizing in-person culture. Slack threads run until 7 or 8 PM during quarter-end pushes, and your methodology gets scrutinized in regular readout sessions with MRM validators and senior PMs who won't nod politely if your backtesting looks thin. The feedback loop is fast because of that scrutiny, but the documentation and governance burden that comes with operating under regulatory oversight will feel heavy if you've never worked in financial services before.
Goldman Sachs Data Scientist Compensation
GS comp is a bonus-driven machine. Your annual discretionary bonus is the real variable, tied to both individual and firm performance, and it's not guaranteed money. Senior roles may include RSUs, but for most data scientists the cash bonus is where the action is.
Base salary is negotiable, so don't leave that on the table. The source of real leverage, though, is a competing offer, which can help you push for a stronger sign-on bonus. Spend your negotiation energy on those two levers: base and sign-on.
Goldman Sachs Data Scientist Interview Process
5 rounds · ~3 weeks end to end
Initial Screen
1 round · Behavioral
This initial step is an AI-conducted, recorded video interview where you'll answer five general personality and behavioral questions. You'll have two minutes to respond to each question, focusing on how you navigate different work situations and solve real-world problems. The goal is to assess your fit with Goldman Sachs's culture and values.
Tips for this round
- Practice answering common behavioral questions using the STAR method (Situation, Task, Action, Result).
- Research Goldman Sachs's 14 Business Principles and core values to align your responses.
- Familiarize yourself with the HireVue platform by doing a practice run to get comfortable with the timed format.
- Prepare examples that showcase your problem-solving skills, teamwork, and resilience.
- Ensure you have a quiet, well-lit environment with a stable internet connection for the recording.
Technical Assessment
1 round · Coding & Algorithms
You will face a technical challenge focused heavily on data structures and algorithm questions. This round assesses your foundational computer science knowledge and your ability to write efficient, clean code. Expect to solve problems similar to those found on platforms like datainterview.com/coding.
Tips for this round
- Practice a wide range of datainterview.com/coding-style problems, focusing on medium to hard difficulty.
- Be proficient in at least one programming language (e.g., Python, Java, C++) and be able to articulate your thought process.
- Understand time and space complexity analysis (Big O notation) and be able to apply it to your solutions.
- Review common data structures like arrays, linked lists, trees, graphs, and hash maps, along with their associated algorithms.
- Practice explaining your approach out loud, discussing trade-offs, and handling edge cases.
Onsite
3 rounds
Behavioral
As part of the 'Superday' final round, this interview will delve deeper into your professional experiences, motivations, and cultural fit. You'll discuss your career aspirations, how you handle challenges, and your approach to collaboration. Interviewers, often senior employees or hiring managers, will assess your alignment with the firm's Business Principles.
Tips for this round
- Prepare several detailed STAR stories that highlight your leadership, teamwork, problem-solving, and resilience.
- Be ready to discuss your resume in detail, connecting your past experiences to the requirements of a Data Scientist role at Goldman Sachs.
- Articulate why you are interested in Goldman Sachs specifically and how your values align with the firm's culture.
- Formulate thoughtful questions to ask your interviewer about their role, the team, and the firm's direction.
- Demonstrate enthusiasm and a strong work ethic, which are highly valued at Goldman Sachs.
Machine Learning & Modeling
This technical interview, also part of the Superday, will focus on your expertise in machine learning concepts and their practical application. You can expect questions on various ML algorithms, model selection, evaluation metrics, and how to deploy and monitor models. You may also be asked to solve a coding problem related to ML or data manipulation.
Case Study
In this Superday interview, you'll be presented with a real-world business problem, likely related to finance or market analysis, and asked to develop a data-driven solution. This round assesses your ability to structure complex problems, make data-informed decisions, and communicate your thought process clearly. You'll need to demonstrate both your analytical skills and your business acumen.
Tips to Stand Out
- Deep Dive into Goldman Sachs. Thoroughly research the firm's history, recent financial results, business principles, and current market activities. Understand their commitment to corporate citizenship and diversity.
- Master Data Structures & Algorithms. Goldman Sachs places a significant emphasis on foundational computer science skills. Dedicate substantial time to practicing coding problems, especially those involving common data structures and algorithms.
- Prepare for Behavioral Questions. Craft compelling stories using the STAR method that showcase your leadership, teamwork, problem-solving, and resilience, aligning them with Goldman Sachs's values.
- Understand Financial Context. Although this is a Data Scientist role, working at an investment bank means understanding basic financial concepts, market dynamics, and how data science applies to financial services.
- Practice Explaining Your Work. Be able to clearly articulate your thought process for technical problems, the rationale behind your machine learning choices, and your approach to business case studies.
- Ask Thoughtful Questions. Prepare insightful questions for your interviewers about their roles, the team, company culture, and specific projects. This demonstrates engagement and genuine interest.
- Show Enthusiasm and Drive. Goldman Sachs values candidates with a strong work ethic and a proactive attitude. Convey your passion for data science and your desire to contribute to a high-performing environment.
Common Reasons Candidates Don't Pass
- ✗ Lack of Technical Depth. Candidates often struggle with the rigor of data structure and algorithm questions, failing to provide optimal solutions or clearly explain their approach.
- ✗ Poor Cultural Fit. Not demonstrating alignment with Goldman Sachs's business principles, values, or high-performance culture can lead to rejection, even with strong technical skills.
- ✗ Inability to Articulate Problem-Solving. Candidates may solve a problem but fail to clearly communicate their thought process, assumptions, and trade-offs, which is crucial for complex financial problems.
- ✗ Insufficient Domain Knowledge. For a Data Scientist role at a financial institution, a lack of understanding of basic finance concepts or how data science applies to the industry can be a significant drawback.
- ✗ Weak Behavioral Responses. Generic or unprepared answers to behavioral questions that don't showcase specific examples of impact, leadership, or resilience often result in a negative impression.
Offer & Negotiation
Goldman Sachs offers a competitive compensation package typically comprising a base salary, a significant annual bonus (often discretionary and tied to individual and firm performance), and sometimes restricted stock units (RSUs) for more senior roles. While base salary is negotiable, the bonus component can be substantial and varies year-to-year. Focus your negotiation on the base salary and a potential sign-on bonus, especially if you have competing offers. Be prepared to articulate your value and market worth based on your skills and experience.
The HireVue screen is AI-conducted and recorded, which means your first impression at GS happens without a human in the room. Candidates who treat it as a throwaway often don't advance to the coding round, where you'll write production-quality solutions to data structure and algorithm problems under time pressure. Two of the most common rejection reasons candidates report are a lack of technical depth on algorithms and failing to walk interviewers through their reasoning clearly, and both get tested before you ever reach the Superday.
The Superday itself packs a second behavioral round, an ML and modeling deep-dive, and a finance-flavored case study into a single day. That second behavioral isn't redundant. GS's 14 Business Principles (client service, integrity, excellence) aren't just wall art; interviewers in that round probe whether your stakeholder communication and values alignment hold up under scrutiny from a different interviewer than the one who screened you initially. Post-Superday, from what candidates report, the wait for a final answer can stretch beyond the typical timeline if division leadership needs to weigh in on headcount or competing candidates.
Goldman Sachs Data Scientist Interview Questions
Machine Learning & Statistical Modeling (Finance)
Expect questions that force you to choose and defend models for noisy, non-stationary financial data (returns, spreads, vol surfaces), including features, leakage controls, and validation. Candidates often stumble when translating a modeling choice into concrete evaluation logic under regime shifts.
You are forecasting 1-day ahead PnL for an equity stat-arb book using factor exposures, recent returns, and borrow cost data, and you see an AUC of 0.62 in random CV but 0.51 live. How do you redesign validation to eliminate leakage and handle regime shifts, and what metric do you report to Risk?
Sample Answer
Most candidates default to random $k$-fold CV, but that fails here because it mixes future information into the training folds through time dependence, corporate actions, and slow-moving features like borrow cost. Use walk-forward or purged, embargoed CV with a time gap sized to your maximum feature lookback and position holding period, and validate across distinct regimes (pre, during, post stress). Report a utility-aligned metric like out-of-sample information ratio of the strategy implied by the model, plus calibration (reliability) for drawdown control. If Risk wants classification, give out-of-sample precision at a fixed turnover or risk budget, not raw AUC.
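The walk-forward idea in that answer can be sketched in a few lines. This is a minimal illustration, assuming observations are already sorted by time and indexed 0..n-1; the function name `walk_forward_splits` and the `gap` parameter are hypothetical, and this is plain expanding-window walk-forward with a purge gap, not the full purged k-fold scheme.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_obs: int,
    n_folds: int,
    min_train: int,
    gap: int,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    `gap` rows immediately before each test block are dropped from training.
    Size it to at least max(feature lookback, holding period) so overlapping
    windows cannot leak test-period information into training.
    """
    if n_folds < 1 or gap < 0 or min_train <= gap:
        raise ValueError("need n_folds >= 1, gap >= 0 and min_train > gap")
    test_size = (n_obs - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        test_start = min_train + k * test_size
        # The last fold absorbs any remainder so the most recent data is used.
        test_end = n_obs if k == n_folds - 1 else test_start + test_size
        train_end = test_start - gap  # purge the rows just before the test block
        yield list(range(train_end)), list(range(test_start, test_end))


if __name__ == "__main__":
    for train, test in walk_forward_splits(n_obs=100, n_folds=4, min_train=20, gap=5):
        print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Training always ends strictly before the test block starts, so the model never sees the future, and reporting the metric fold by fold shows how it behaves across regimes rather than hiding the weak periods in an average.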
You need a model for daily default risk of a corporate bond issuer where defaults are rare and censored, and you want time-varying covariates like spreads and equity vol. Which modeling family fits and how do you validate it without overstating performance?
You are building a short-horizon realized volatility forecast for an options market-making desk using intraday returns, order book imbalance, and macro event indicators. How do you choose between a GARCH-type model and a gradient-boosted tree model, and what diagnostic tells you the model is breaking under a new volatility regime?
Financial Markets, Risk, and Quant Reasoning
Most candidates underestimate how much domain intuition you need to sanity-check a model’s output against market mechanics and risk constraints. You’ll be tested on how you reason about PnL drivers, risk metrics, hedging intuition, and what can go wrong in trading or portfolio contexts.
A delta-hedged long call on a non-dividend stock shows a consistent negative daily PnL while implied volatility is flat and your hedging is correct. What is the most likely driver, and how do you validate it with trade and market data?
Sample Answer
It is theta bleed, with residual slippage and funding explaining any deviation from the textbook decay. A delta-hedged long option has expected PnL of roughly $\Theta\,dt + \tfrac{1}{2}\Gamma\,(dS)^2$; with flat implied vol and typical realized moves, the negative $\Theta$ dominates. Validate by decomposing daily PnL into Greeks using mid-market Greeks at hedge times, then reconcile the leftover with bid-ask, discrete hedging error, and financing on the stock hedge.
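A quick numerical check of the $\Theta\,dt + \tfrac{1}{2}\Gamma\,(dS)^2$ approximation helps build intuition for when theta wins. The sketch below assumes theta is quoted in currency per day and gamma in currency per squared unit of spot; the function names and the sample Greeks are made up for illustration.

```python
import math


def delta_hedged_pnl(theta_per_day: float, gamma: float, dS: float, days: float = 1.0) -> float:
    """Approximate PnL of a delta-hedged option: theta * dt + 0.5 * gamma * dS^2."""
    return theta_per_day * days + 0.5 * gamma * dS ** 2


def breakeven_move(theta_per_day: float, gamma: float, days: float = 1.0) -> float:
    """Size of |dS| at which gamma gains exactly offset theta decay (long-option case)."""
    if gamma <= 0 or theta_per_day > 0:
        raise ValueError("expects a long-gamma position with non-positive theta")
    return math.sqrt(-2.0 * theta_per_day * days / gamma)


if __name__ == "__main__":
    theta, gamma = -50.0, 4.0  # hypothetical position Greeks
    print(delta_hedged_pnl(theta, gamma, dS=3.0))  # -32.0: the move is too small, theta dominates
    print(breakeven_move(theta, gamma))            # 5.0: daily move needed just to break even
```

If realized daily moves sit consistently below the break-even move, the position bleeds even with perfect hedging, which is exactly the pattern described in the question.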
You need 1-day 99% VaR for a $5\times 10^9$ multi-asset portfolio (rates, equities, credit) for a risk committee, with 3 years of daily history and known volatility clustering. Do you ship historical simulation or parametric (EWMA or GARCH) VaR, and what failure modes do you call out?
A trader claims their intraday mean-reversion signal is market-neutral because its beta to the S&P 500 futures is near zero, yet the strategy has large losses on high-vol days. How do you diagnose whether the exposure is to volatility, liquidity, or correlation breakdown, and what risk metric would you add beyond beta?
Coding & Algorithms (Round 2)
Your performance here depends on writing correct, efficient code under time pressure, not just describing an approach. You’ll see problems that resemble real data work (arrays/strings/hash maps, streaming-style computations) with clear complexity expectations.
You ingest a live stream of equity trades as (ts, symbol, size) and need to flag the first symbol whose cumulative size exceeds a threshold $T$ (ties broken by earliest timestamp). Implement a function that returns (symbol, ts) or null if no symbol crosses $T$.
Sample Answer
You could sort by timestamp and then aggregate, or you could aggregate in one pass while tracking the first crossing event. Sorting is $O(n\log n)$ and wastes time when the stream is already in order. The one-pass hash map wins here because it is $O(n)$ time, $O(k)$ space for $k$ symbols, and you can stop early the moment any symbol crosses $T$.
```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Trade:
    ts: int  # epoch millis or any comparable integer timestamp
    symbol: str
    size: float


def first_symbol_to_cross_threshold(
    trades: Iterable[Trade],
    T: float,
) -> Optional[Tuple[str, int]]:
    """Return (symbol, ts) for the first symbol whose cumulative size exceeds T.

    Assumptions:
    - trades arrive in non-decreasing timestamp order (streaming ingestion).
    - crossing means cumulative_size > T (strictly exceeds).
    - If multiple symbols cross at the same timestamp, return the one that crossed
      at that timestamp first in the input order (stable tie-break).

    Time: O(n)
    Space: O(k) where k is number of distinct symbols
    """
    if T < 0:
        # If threshold is negative, any positive cumulative would exceed immediately.
        # But to keep semantics sane, treat it as already exceeded by first trade.
        for tr in trades:
            return (tr.symbol, tr.ts)
        return None

    cum: Dict[str, float] = {}
    for tr in trades:
        prev = cum.get(tr.symbol, 0.0)
        new_total = prev + tr.size
        cum[tr.symbol] = new_total
        # Flag the first crossing event in stream order.
        if prev <= T and new_total > T:
            return (tr.symbol, tr.ts)
    return None
```
Given daily PnL by strategy for a portfolio as an array of floats, compute the maximum drawdown value over the full period in $O(n)$ time (drawdown is peak-to-trough decline of cumulative PnL). Return the drawdown as a positive number.
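One way to approach the drawdown question is a single pass that tracks the running peak of cumulative PnL. The sketch below assumes a single PnL series and that cumulative PnL starts at 0 before the first day (so an opening loss counts as a drawdown); both are assumptions you should confirm with the interviewer.

```python
from typing import Sequence


def max_drawdown(daily_pnl: Sequence[float]) -> float:
    """Largest peak-to-trough decline of cumulative PnL, as a non-negative number.

    Time: O(n), Space: O(1).
    """
    cum = 0.0    # cumulative PnL so far
    peak = 0.0   # assumption: the series starts from a cumulative PnL of 0
    worst = 0.0  # largest decline from any earlier peak
    for pnl in daily_pnl:
        cum += pnl
        peak = max(peak, cum)
        worst = max(worst, peak - cum)
    return worst


if __name__ == "__main__":
    # Cumulative path: 10, 5, -5, 15, -15, -10 -> peak 15, trough -15
    print(max_drawdown([10, -5, -10, 20, -30, 5]))  # 30.0
```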
You have mid-price snapshots for a single symbol as (ts, mid) sorted by ts, and you need the maximum absolute mid-price change within any rolling window of length $W$ seconds, computed over all pairs of points inside each window. Implement an $O(n)$ solution that returns the maximum value.
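The largest absolute change among all pairs inside a window is just that window's max minus its min, so a sliding window with two monotonic deques gets you to $O(n)$. The sketch below assumes a window is inclusive (two snapshots share a window when their timestamps differ by at most $W$); the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Sequence, Tuple


def max_abs_change_in_window(snaps: Sequence[Tuple[float, float]], W: float) -> float:
    """Max of (window max - window min) over all windows of length W seconds.

    `snaps` is a sequence of (ts, mid) sorted by ts. Each index enters and
    leaves each deque at most once, so the total work is O(n).
    """
    max_dq: Deque[int] = deque()  # indices whose mids are strictly decreasing
    min_dq: Deque[int] = deque()  # indices whose mids are strictly increasing
    left = 0
    best = 0.0
    for right, (ts, mid) in enumerate(snaps):
        while max_dq and snaps[max_dq[-1]][1] <= mid:
            max_dq.pop()
        max_dq.append(right)
        while min_dq and snaps[min_dq[-1]][1] >= mid:
            min_dq.pop()
        min_dq.append(right)
        # Advance the left edge until the window spans at most W seconds.
        while ts - snaps[left][0] > W:
            left += 1
        # Drop deque heads that fell out of the window.
        while max_dq[0] < left:
            max_dq.popleft()
        while min_dq[0] < left:
            min_dq.popleft()
        best = max(best, snaps[max_dq[0]][1] - snaps[min_dq[0]][1])
    return best


if __name__ == "__main__":
    data = [(0, 100.0), (1, 101.0), (3, 99.0), (10, 105.0), (11, 104.0)]
    print(max_abs_change_in_window(data, W=3))   # 2.0 (101 vs 99)
    print(max_abs_change_in_window(data, W=10))  # 6.0 (105 vs 99)
```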
SQL / Data Retrieval for Analytics
The bar isn’t whether you know SELECT syntax; it’s whether you can express multi-step financial analytics cleanly with joins, window functions, and careful filtering. You’ll be judged on correctness with edge cases like missing days, duplicated fills, and as-of joins.
You have equity order executions with occasional duplicated fill events. Write SQL to compute each account’s daily traded notional ($\sum \text{qty} \cdot \text{price}$) for the last 30 calendar days, deduping fills by the latest ingested record per (trade_id, fill_id) and returning 0 notional on no-trade days.
Sample Answer
Reason through it: You first isolate the time window and dedupe at the fill grain, because any aggregation before dedupe bakes in overcounting. Next, compute per-fill notional and roll it up to (account_id, trade_date). Then build a calendar of the last 30 days and cross it with the set of accounts you care about, left join the daily aggregates, and coalesce missing rows to 0. This is where most people fail: they skip the calendar join and silently drop zero-activity days.
```sql
/*
Assumptions
- executions(trade_id, fill_id, account_id, symbol, qty, price, execution_ts, ingest_ts)
- Postgres-compatible SQL
- 'Latest ingested' means greatest ingest_ts for a given (trade_id, fill_id)
*/
WITH params AS (
    SELECT
        (CURRENT_DATE - INTERVAL '29 days')::date AS start_date,
        CURRENT_DATE::date AS end_date
),
-- 1) Dedupe fills to the latest ingested record per (trade_id, fill_id)
dedup_fills AS (
    SELECT
        e.trade_id,
        e.fill_id,
        e.account_id,
        e.qty,
        e.price,
        e.execution_ts,
        e.ingest_ts,
        ROW_NUMBER() OVER (
            PARTITION BY e.trade_id, e.fill_id
            ORDER BY e.ingest_ts DESC
        ) AS rn
    FROM executions e
    JOIN params p
        ON e.execution_ts::date BETWEEN p.start_date AND p.end_date
),
latest_fills AS (
    SELECT
        trade_id,
        fill_id,
        account_id,
        qty,
        price,
        execution_ts
    FROM dedup_fills
    WHERE rn = 1
),
-- 2) Aggregate notional by account and trade date
daily_notional AS (
    SELECT
        account_id,
        execution_ts::date AS trade_date,
        SUM(qty * price) AS traded_notional
    FROM latest_fills
    GROUP BY 1, 2
),
-- 3) Calendar for the last 30 calendar days
calendar AS (
    SELECT generate_series(p.start_date, p.end_date, INTERVAL '1 day')::date AS trade_date
    FROM params p
),
-- 4) Accounts with any activity in the window (scope accounts to active ones)
active_accounts AS (
    SELECT DISTINCT account_id
    FROM latest_fills
)
SELECT
    a.account_id,
    c.trade_date,
    COALESCE(d.traded_notional, 0) AS traded_notional
FROM active_accounts a
CROSS JOIN calendar c
LEFT JOIN daily_notional d
    ON d.account_id = a.account_id
    AND d.trade_date = c.trade_date
ORDER BY a.account_id, c.trade_date;
```
Given end-of-day positions for a portfolio and a separate daily close price table with gaps on holidays, write SQL to compute the portfolio’s daily market value for the last 60 business days, using the most recent available close price on or before each position date (as-of join).
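Before writing the SQL for an as-of join, it can help to pin down the lookup rule itself. The sketch below shows that rule in Python for a single symbol, assuming the close dates are sorted ascending; `asof_price` is a hypothetical helper, not part of any library.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional


def asof_price(price_dates: List[date], prices: List[float], on: date) -> Optional[float]:
    """Return the most recent close on or before `on`, or None if none exists.

    `price_dates` must be sorted ascending and aligned with `prices`.
    """
    i = bisect_right(price_dates, on)  # number of price dates <= on
    return prices[i - 1] if i > 0 else None


if __name__ == "__main__":
    dates = [date(2025, 1, 2), date(2025, 1, 3), date(2025, 1, 6)]
    closes = [10.0, 11.0, 12.0]
    print(asof_price(dates, closes, date(2025, 1, 5)))  # 11.0 (carries Friday's close)
    print(asof_price(dates, closes, date(2025, 1, 1)))  # None (no earlier price)
```

The SQL version has to encode the same two decisions: "on or before" is inclusive, and a position with no earlier price should surface as a missing value rather than silently dropping out of the total.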
For a rates trading desk, you have intraday risk snapshots per book with timestamped DV01 values. Write SQL to return, for each book and day, the end-of-day DV01 and the max intraday DV01, treating end-of-day as the latest snapshot at or before 16:00 New York time.
Case Study: Investment/Risk Analytics & Business Judgment
In the case, you’re expected to turn an ambiguous investment-management or risk problem into a crisp plan with measurable success criteria. What trips people up is failing to define the decision, the metric, the constraints (risk, latency, costs), and the iteration path.
You are asked to decide whether to allocate 5 percent of a multi-asset portfolio to a new stat-arb strategy whose backtest shows a Sharpe of 1.8 but has only 18 months of data and high turnover. What decision framework, metrics (including at least one tail metric), and data checks do you present to the investment committee to approve, size, or reject the allocation?
Sample Answer
This question is checking whether you can turn a shiny backtest into an investment decision with explicit risk, capacity, and implementation constraints. You should anchor on a go or no-go plus sizing, then lay out success metrics like net-of-costs Sharpe, hit rate stability, and a tail metric such as 99 percent 1-day CVaR, plus drawdown and time-to-recover. Data checks should include survivorship and lookahead bias tests, realistic transaction cost and slippage modeling, and sensitivity to rebalancing frequency. Most people fail by quoting Sharpe and ignoring turnover, costs, and regime dependence.
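Two of the metrics named in that answer are simple enough to sketch from a daily return series. This is a rough illustration, assuming a zero risk-free rate, a flat per-day cost drag, and a floor convention for the number of tail observations; the function names and parameters are hypothetical.

```python
import math
import statistics
from typing import Sequence


def net_sharpe(daily_returns: Sequence[float], daily_cost: float = 0.0,
               periods_per_year: int = 252) -> float:
    """Annualized Sharpe of returns net of a flat daily cost (risk-free rate assumed 0)."""
    net = [r - daily_cost for r in daily_returns]
    vol = statistics.stdev(net)  # sample standard deviation
    if vol == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(net) / vol * math.sqrt(periods_per_year)


def historical_cvar(daily_returns: Sequence[float], tail_fraction: float = 0.01) -> float:
    """Average loss over the worst `tail_fraction` of days, as a positive number."""
    if not daily_returns:
        raise ValueError("need at least one return")
    losses = sorted((-r for r in daily_returns), reverse=True)
    k = max(1, int(len(losses) * tail_fraction))  # floor convention, at least one day
    return sum(losses[:k]) / k


if __name__ == "__main__":
    rets = [0.01] * 98 + [-0.05, -0.10]  # made-up series, not real strategy data
    print(net_sharpe(rets), net_sharpe(rets, daily_cost=0.002))
    print(historical_cvar(rets, tail_fraction=0.01))
```

With about 18 months of daily data (roughly 378 trading days), a 1% tail holds only three or four observations, so the CVaR estimate is itself noisy. Saying that out loud to the committee is part of the answer.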
A daily equity factor model used for risk and hedging starts underestimating portfolio $99\%$ 10-day VaR during volatility spikes, and the desk wants a fix within 2 weeks without breaking hedging workflows. How do you diagnose whether the issue is model misspecification, stale correlations, or data problems, and what change do you ship that improves tail coverage while controlling false alarms and model risk?
Behavioral & Stakeholder Management
You’ll need to show you can operate in a high-stakes environment where model risk, scrutiny, and cross-team dependencies are normal. Interviewers look for structured stories about ownership, influencing without authority, handling pushback, and making tradeoffs under deadlines.
A portfolio manager wants to ship a new factor model into the daily risk report, but Model Risk Management is pushing back on explainability and stability under regime shifts. Describe how you align PM, MRM, and Engineering on a go-live decision and what evidence you bring to the approval meeting.
Sample Answer
The standard move is to lock a decision framework up front: owner, success metrics, validation checklist, and a date when evidence freezes for review. Here, model governance drives what evidence counts, because MRM is optimizing for auditability and tail risk. So you bring pre-agreed artifacts like out-of-sample stability results, stress scenarios, and a clear rollback plan that Engineering can execute.
During a fast market event, your intraday risk alert model spikes false positives, Trading wants thresholds loosened immediately, and Compliance warns about missing true limit breaches. Walk through how you decide, communicate, and document the change in under 60 minutes.
What jumps out isn't any single dominant area. It's that GS spreads weight across six areas with no safe one to skip, yet the case study round (which simulates real Asset Management decisions like sizing an allocation to a new stat-arb strategy) demands you pull from nearly every other area simultaneously. The biggest prep mistake is drilling ML and coding in isolation, then freezing when a case prompt requires you to question a backtest's Sharpe ratio, sketch a validation scheme, and justify the business decision in one coherent answer. From what candidates report, that synthesis under pressure is where preparation breaks down.
Drill GS-style case and modeling questions with financial framing at datainterview.com/questions.
How to Prepare for Goldman Sachs Data Scientist Interviews
Know the Business
Official mission
“Goldman Sachs’ mission is to advance sustainable economic growth and financial opportunity across the globe.”
What it actually means
Goldman Sachs aims to provide comprehensive financial services, including investment banking, asset management, and wealth management, to a diverse global client base. Its core purpose is to foster sustainable economic growth and broaden financial opportunities for individuals and institutions worldwide.
Key Business Metrics
$59B (+15% YoY)
$279B (+35% YoY)
47K (+3% YoY)
Business Segments and Where DS Fits
Goldman Sachs Asset Management
The primary investing area within Goldman Sachs, delivering investment and advisory services across public and private markets for the world's leading institutions, financial advisors, and individuals. It is a leading investor across fixed income, liquidity, equity, alternatives, and multi-asset solutions. Goldman Sachs oversees approximately $3.5 trillion in assets under supervision as of September 30, 2025.
DS focus: Utilizing quantitative strategies to navigate market complexities and inefficiencies, employing data-driven approaches for diversified portfolios, and leveraging AI applications for automation, customer engagement, and operational intelligence.
Current Strategic Priorities
- Expand offerings in the wealth channel to help more investors reach their long-term goals by combining expertise with T. Rowe Price through co-branded model portfolios.
Competitive Moat
Goldman Sachs Asset Management oversees $3.5 trillion in assets under supervision, and its 2026 investment outlook makes clear that alternatives and AI-driven analytics are where the firm is placing its bets. For data scientists, that translates to quantitative strategies for navigating market inefficiencies and building AI applications for automation and customer engagement, not just back-office reporting.
The "why GS" answer that actually works ties your skills to a specific problem the firm has publicly named. GS has forecast that alternative investment demand will outstrip origination supply, a supply-demand imbalance that creates real modeling challenges around allocation and risk. Pair that with the firm's published skepticism about generative AI hype and you can articulate a point of view on where LLMs add value in investment research versus where they don't. That's a far more memorable answer than "I want to apply ML in finance."
Try a Real Interview Question
5-Day Rolling Volatility of Daily Portfolio PnL
SQL: Given a table of daily portfolio $pnl$ by $trade\_date$, return one row per date with the 5-trading-day rolling sample volatility of $pnl$ (use $n-1$ in the denominator) and include dates even when fewer than 5 prior rows exist. Output columns: $trade\_date$, $window\_n$, $pnl\_mean$, $pnl\_vol$, where $pnl\_vol=\sqrt{\frac{\sum (x_i-\bar{x})^2}{n-1}}$ when $n\ge2$, else $NULL$.
| trade_date | portfolio_id | pnl |
|------------|--------------|------|
| 2025-01-02 | PF1 | 100 |
| 2025-01-03 | PF1 | -50 |
| 2025-01-06 | PF1 | 150 |
| 2025-01-07 | PF1 | 0 |
| 2025-01-08 | PF1 | 200 |
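One way to sanity-check a SQL answer to this question is a small Python reference implementation. The sketch below is not an official solution. It assumes the rows for a single portfolio arrive sorted by trade_date, and that the window is the current row plus up to four prior rows (in SQL this would typically be a window frame such as `ROWS BETWEEN 4 PRECEDING AND CURRENT ROW`, partitioned by portfolio_id). The function name `rolling_pnl_stats` is hypothetical.

```python
from collections import deque
from math import sqrt


def rolling_pnl_stats(rows, window=5):
    """Return (trade_date, window_n, pnl_mean, pnl_vol) per input row.

    Assumption: `rows` is an iterable of (trade_date, pnl) pairs for one
    portfolio, already sorted by trade_date.
    """
    recent = deque(maxlen=window)  # current row plus up to window-1 prior rows
    out = []
    for trade_date, pnl in rows:
        recent.append(pnl)
        n = len(recent)
        mean = sum(recent) / n
        if n >= 2:
            # Sample volatility: n-1 in the denominator.
            vol = sqrt(sum((x - mean) ** 2 for x in recent) / (n - 1))
        else:
            vol = None  # undefined for a single observation (NULL in SQL)
        out.append((trade_date, n, mean, vol))
    return out


# The five sample rows from the table above.
sample = [
    ("2025-01-02", 100),
    ("2025-01-03", -50),
    ("2025-01-06", 150),
    ("2025-01-07", 0),
    ("2025-01-08", 200),
]
result = rolling_pnl_stats(sample)
for row in result:
    print(row)
```

For the sample rows, window_n runs from 1 to 5 and the first date has no volatility, which is the edge case the question is probing: early dates must still appear in the output.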
GS data scientists ship production code on an internal stack that includes Scala, Java, and contributions to the open-source Legend platform, so coding rounds test whether you write reviewable, deployable code rather than notebook-style scripts. Expect the interviewer to probe edge cases and ask you to reason about complexity tradeoffs out loud. Drill similar problems on datainterview.com/coding, prioritizing clean Python with proper error handling over speed-optimized one-liners.
Test Your Readiness
How Ready Are You for Goldman Sachs Data Scientist?
Can you design a time series prediction model for daily equity returns while avoiding lookahead bias, including a train, validation, and test scheme appropriate for nonstationary data?
Focus on the financial reasoning gaps (Sharpe ratio intuition, VaR interpretation) that separate GS interviews from generic DS loops, then target those weak spots at datainterview.com/questions.
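To make the Sharpe ratio and VaR intuition concrete, here is a minimal sketch under stated assumptions: returns are daily and already net of the risk-free rate, annualization uses 252 trading days, and VaR is the simple historical (empirical quantile) version, which is only one of several conventions. The function names and sample numbers are illustrative, not drawn from any GS material.

```python
from math import ceil, sqrt
from statistics import fmean, stdev


def annualized_sharpe(daily_excess_returns, periods_per_year=252):
    """Mean excess return per unit of sample volatility, scaled to a year.

    Assumption: inputs are daily returns minus the risk-free rate.
    """
    return (
        fmean(daily_excess_returns)
        / stdev(daily_excess_returns)
        * sqrt(periods_per_year)
    )


def historical_var(daily_pnl, confidence=0.95):
    """Historical VaR as a positive loss figure.

    Reads as: on roughly (1 - confidence) of past days, the loss was
    larger than the returned value. Uses a plain order statistic with
    no interpolation (an assumption; other quantile rules exist).
    """
    losses = sorted(-x for x in daily_pnl)  # ascending losses
    # Small epsilon guards against floating-point noise before ceil.
    index = ceil(confidence * len(losses) - 1e-9) - 1
    return losses[index]


# Illustrative data only.
excess_returns = [0.01, -0.005, 0.015, 0.0, 0.02]
pnl = [100, -50, 150, 0, 200, -120, 30, -10, 60, -80]

sharpe = annualized_sharpe(excess_returns)
var_90 = historical_var(pnl, confidence=0.90)
print(round(sharpe, 2), var_90)
```

In an interview, the interpretation matters more than the arithmetic: a VaR figure says nothing about how bad losses are once the threshold is breached, and a Sharpe ratio computed on five days of data is mostly noise.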
Frequently Asked Questions
How long does the Goldman Sachs Data Scientist interview process take?
Expect roughly 4 to 8 weeks from application to offer. You'll typically go through an initial recruiter screen, a technical phone interview, and then a Super Day (their version of an onsite) with multiple back-to-back interviews. Goldman moves faster for campus hires and can be slower for experienced roles, especially if there's internal committee review. Don't be surprised if there are delays between rounds. The firm has a structured approval process that can add a week or two.
What technical skills are tested in the Goldman Sachs Data Scientist interview?
SQL, Python, and statistics are the big three. You'll be tested on data analysis and processing, statistical modeling, machine learning model development and evaluation, and predictive modeling. They also care about dimensionality reduction, feature engineering, and model validation. Goldman uses Python heavily, but familiarity with R, Java, Scala, or SAS can help depending on the team. I'd say Python and SQL fluency are non-negotiable; everything else is a bonus.
How should I tailor my resume for a Goldman Sachs Data Scientist role?
Lead with impact, not tools. Goldman wants to see that you can translate technical work into business outcomes, so frame your bullet points around results: revenue generated, risk reduced, efficiency gained. Mention experience with large datasets explicitly since they work at massive scale. If you've done anything related to model validation, compliance (like Fair Lending), or working cross-functionally with business stakeholders, put that front and center. Keep it to one page if you have under 10 years of experience. Financial services experience helps but isn't required.
What is the total compensation for a Goldman Sachs Data Scientist?
Goldman Sachs pays competitively with other top financial institutions. For an entry-level or Associate-level Data Scientist in New York, expect a base salary in the $120K to $150K range, with total comp (including bonus) reaching $150K to $200K. Vice President-level data scientists can see total comp in the $200K to $350K range depending on the division and performance. Bonuses at Goldman are a significant portion of pay and are heavily tied to firm and individual performance. Keep in mind New York cost of living when evaluating these numbers.
How do I prepare for the behavioral interview at Goldman Sachs for a Data Scientist position?
Goldman's core values are partnership, client service, integrity, and excellence. Your behavioral answers need to reflect these. They'll ask about teamwork, handling disagreements, and times you went above and beyond for a stakeholder. Prepare stories that show you can work across teams and communicate technical findings to non-technical people. Goldman has a strong culture of collaboration, so anything that signals you're a lone wolf will hurt you. I've seen candidates get dinged on culture fit even with strong technical performance.
How hard are the SQL and coding questions in the Goldman Sachs Data Scientist interview?
SQL questions are medium difficulty. Think window functions, CTEs, complex joins, and aggregation problems, often framed around financial data. Python coding questions tend to focus on data manipulation (pandas, numpy) and implementing algorithms from scratch rather than pure software engineering puzzles. You might get asked to write a function for a statistical test or build a simple model pipeline. Practice at datainterview.com/coding to get comfortable with the style and pacing. The questions aren't tricky for the sake of being tricky, but they expect clean, efficient code.
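As an example of the "statistical test from scratch" style, here is a hedged sketch of Welch's two-sample t-test (unequal variances) using only the standard library. The choice of test is an assumption for illustration; the source does not say which tests come up. The function returns the t statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value lookup out.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom.

    Each sample needs at least two observations so that the sample
    variance (n-1 denominator) is defined.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b

    t_stat = (fmean(sample_a) - fmean(sample_b)) / sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation for degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 4), round(df, 4))
```

Raising on undersized samples, rather than returning a NaN silently, is the kind of edge-case handling interviewers tend to ask about.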
What machine learning and statistics concepts should I know for a Goldman Sachs Data Scientist interview?
Regression (linear and logistic), tree-based models, and ensemble methods come up frequently. You should be solid on bias-variance tradeoff, regularization, cross-validation, and evaluation metrics like AUC, precision, and recall. Goldman also tests on dimensionality reduction (PCA especially), feature engineering, and model monitoring over time. Predictive modeling with advanced algorithms is a listed requirement, so be ready to discuss gradient boosting or neural nets at a conceptual level. They care a lot about model validation and testing, so know how to explain why a model works, not just that it works.
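Being able to compute the evaluation metrics by hand helps when you are asked to explain them. The sketch below is a from-scratch illustration, not a replacement for a library implementation: precision and recall at a fixed threshold (0.5 is an arbitrary assumption), and AUC as the fraction of positive-negative pairs that the model ranks correctly, counting ties as half.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall for binary labels at a score threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pairwise_auc(labels, scores):
    """AUC as the probability a random positive outscores a random negative.

    O(P * N) pair comparison: fine for illustration, too slow for large data.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Illustrative labels and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]

precision, recall = precision_recall(labels, scores)
auc = pairwise_auc(labels, scores)
print(precision, recall, auc)
```

The pairwise view also makes it clear why AUC is threshold-free while precision and recall are not, which is a common follow-up question.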
What format should I use to answer behavioral questions at Goldman Sachs?
Use the STAR format (Situation, Task, Action, Result) but keep it tight. Goldman interviewers are busy and sharp. They don't want a five-minute monologue. Aim for 90 seconds to two minutes per answer. Spend most of your time on the Action and Result. Quantify results whenever possible. And always connect back to what the business gained from your work. Practice 6 to 8 stories that you can adapt to different question angles, and you'll cover most of what they throw at you.
What happens during the Goldman Sachs Data Scientist onsite or Super Day?
The Super Day typically involves 3 to 5 back-to-back interviews, each about 30 to 45 minutes. You'll face a mix of technical and behavioral rounds. Some interviewers will drill into your past projects, others will give you live coding or case-style problems. At least one round will focus on statistical reasoning or ML concepts. Expect at least one senior person (VP or Managing Director) to assess culture fit and your ability to explain technical concepts in business terms. It's a long day. Bring water, stay sharp, and treat every interviewer like they have equal say in the decision, because they do.
What business metrics and domain concepts should I know for a Goldman Sachs Data Scientist interview?
You don't need to be a finance expert, but you should understand basic financial concepts relevant to data science: risk modeling, portfolio optimization, fraud detection, and compliance. Goldman specifically mentions business justification for model attributes and Fair Lending compliance, so read up on how models are evaluated for fairness and regulatory requirements. Know how to frame a data science problem in terms of business value. If you can explain how your model saves money, reduces risk, or improves a client outcome, you'll stand out. Practice with business-oriented case questions at datainterview.com/questions.
What common mistakes do candidates make in the Goldman Sachs Data Scientist interview?
The biggest one I see is being too academic. Goldman wants practitioners who can ship models and explain them to stakeholders. If you can't articulate why your model matters to the business, that's a red flag. Another common mistake is underestimating the behavioral rounds. Some candidates prep only for technical and then stumble when asked about teamwork or conflict. Finally, don't ignore model validation. Goldman operates in a regulated environment, so they want to hear about how you test, monitor, and justify your models. Not just how you build them.
Does Goldman Sachs prefer specific programming languages for Data Scientist roles?
Python is the primary language and the one you'll be tested on most. That said, Goldman lists Java, C/C++, Scala, R, Matlab, and SAS as relevant languages. Different teams use different stacks. Quantitative strategy teams might lean toward C++ or Scala for performance. Risk teams might use SAS or R for legacy systems. For the interview itself, Python is your safest bet. Be fluent in pandas, numpy, and scikit-learn. If you know a second language from their list, mention it, but don't stress about knowing all of them.



