JP Morgan Chase Data Scientist at a Glance
Interview Rounds
5 rounds
From hundreds of mock interviews we've run, the single biggest predictor of success at JP Morgan Chase isn't technical depth. It's whether you can reframe a model's precision-recall curve as a dollar figure that a Managing Director in Card Services would repeat in a quarterly business review. Candidates who lead with architecture choices instead of business impact consistently underperform in the later rounds.
JP Morgan Chase Data Scientist Role
Skill Profile
Math & Stats
High: Strong foundation in statistics, data analysis, quantitative research methods, and probability. Experience with empirical research and methods for causal inference is highly valued.
Software Eng
High: Strong programming skills for data manipulation, analysis, and empirical research. Understanding of data structures, algorithms, and principles for developing robust solutions.
Data & SQL
Medium: Experience working with big data sets and extracting insights from large panel data. Familiarity with big data computing on public cloud platforms (e.g., Spark, PySpark) is a plus.
Machine Learning
High: Solid understanding and practical experience with machine learning algorithms and analytical solutions. Ability to design experiments, implement algorithms, and validate results.
Applied AI
Medium: Awareness and foundational understanding of modern AI concepts such as Natural Language Processing (NLP) and Large Language Models (LLMs). Expertise in advanced applied ML areas (e.g., prompt engineering, RAG) is a significant plus, especially for more senior roles. (Uncertainty: For a general Data Scientist role, this is likely a strong preferred skill rather than a core requirement, but JPMC's lead roles show a strong push in this area.)
Infra & Cloud
Medium: Experience with public cloud platforms for data processing and analysis. Familiarity with cloud services (e.g., AWS, EMR, SageMaker) and the ability to productionize scalable solutions is beneficial.
Business
High: Strong interest and ability to apply data science to financial services, economic research, and business problems such as risk analysis, fraud detection, investment strategies, and operational improvements.
Viz & Comms
High: Excellent written and verbal communication skills to convey complex analytical insights effectively to both technical and non-technical audiences. Experience with impactful visual analytics and report drafting/editing.
What You Need
- Experience working with big data sets
- Strong programming skills for empirical research and data analysis
- Excellent written and verbal communication skills
- Intellectual curiosity and commitment to rigorous analysis
- Ability to manage multiple priorities in a fast-paced environment
- Adherence to compliance and data privacy guidelines
- Ability to extract and communicate insights from large datasets
- Foundational understanding of machine learning algorithms and analytical solutions
- Ability to apply data science to financial services and economic research
Nice to Have
- Graduate degree in economics, public policy, statistics, computer science, mathematics, or related field
- Experience with big data computing on public cloud platforms (e.g., Spark, PySpark, AWS)
- Experience working with technical collaboration tools (e.g., Git, Jira)
- Prior experience working with financial services or other administrative data sets
- Experience with quantitative research methods, especially causal inference
- Familiarity with advanced applied ML techniques (e.g., GPU optimization, finetuning, embedding models, inferencing, prompt engineering, RAG)
- Knowledge of specific AI areas (e.g., Large Language Models, Natural Language Processing, Knowledge Graph, Reinforcement Learning, Ranking and Recommendation, Time Series Analysis)
- Experience with machine learning frameworks (e.g., TensorFlow, PyTorch, Keras, scikit-learn)
You're building and maintaining models that directly affect how JPMC catches fraud and scores risk within its Consumer Banking operation, particularly across Chase's card transaction network. The day-in-the-life data paints a clear picture: you'll spend time monitoring a transaction fraud classifier's drift metrics in SageMaker, engineering behavioral features in PySpark on EMR, and then translating all of that into findings decks for non-technical VPs. Success after year one means you can point to a measurable business outcome (a false positive reduction, an improved KS/Gini on a credit scorer, or a shipped champion-challenger test on a Chase Mobile risk threshold) that survived a full model governance review with the Model Risk Management team.
A Typical Week
A Week in the Life of a JP Morgan Chase Data Scientist
Typical L5 workweek · JP Morgan Chase
Weekly time split
Culture notes
- Most data scientists work roughly 8:45 AM to 6:15 PM with occasional late pushes around model validation deadlines or quarterly business reviews — the pace is steady and corporate but intellectually demanding, especially navigating model risk governance.
- JPMC operates on a structured hybrid policy requiring most employees in-office at the Manhattan or Midtown campus at least three days per week, with Tuesdays through Thursdays being the heaviest in-office days across the analytics org.
The ratio of communication work to pure coding will surprise anyone coming from a tech company. Stakeholder readouts aren't a Friday afterthought; they're a recurring commitment where you build dollar-impact estimates and defend methodology to Card Services leadership who care about business recommendations, not your hyperparameter grid. The genuine research time baked into the weekly rhythm (prototyping graph neural networks for transaction fraud, reviewing JPMC's internal AI research digest) is unusually generous for a bank this size and keeps the work from feeling purely operational.
Projects & Impact Areas
Fraud detection on Chase's card transactions is the flagship workstream, where you're engineering features like session velocity, merchant category sequences, and geolocation deviation scores from billions of rows in the firm's internal data lake. That work sits alongside credit risk scoring models tied to regulatory stress tests like CCAR, where model outputs carry direct P&L consequences and face actual auditor scrutiny. A growing GenAI practice is also creating new demand, applying LLMs and RAG pipelines through frameworks like LangChain to document processing and internal tooling.
Skills & What's Expected
What's underrated for this role is your ability to write model documentation that satisfies Model Risk Management reviewers under SR 11-7 supervisory expectations, without needing hand-holding from senior teammates. Python and SQL are table stakes, PySpark on EMR is the real workhorse at JPMC's data scale, and the skill profile rates communication just as high as machine learning because every model needs a stakeholder-facing narrative before it ships. Deep learning frameworks like PyTorch and TensorFlow show up in preferred qualifications (especially for teams working on LLMs and NLP), so don't dismiss them, but interpretability constraints mean gradient boosting and logistic regression still carry heavy production workloads on the fraud and risk side.
Levels & Career Growth
The jump that stalls most people is Senior Associate to VP, and it's almost never about technical ability. What separates those levels is owning a model portfolio end-to-end, including governance, stakeholder relationships, and the business case, not just the code. Lateral moves across business lines (Consumer Banking DS to Investment Banking quant analytics, for instance) are genuinely possible at a firm with this many divisions, which is a real perk if you want breadth.
Work Culture
JPMC's culture notes describe a structured hybrid policy with at least three days per week in-office, Tuesdays through Thursdays being the heaviest, at locations like the Manhattan campus or Jersey City. The pace is steady rather than chaotic, but navigating model governance alongside the Model Risk Management team and compliance reviewers adds an intellectual weight that pure tech roles don't have. Hierarchy is real, face time with senior leaders still counts, and the banking floors run noticeably more formal than the tech org's subculture.
JP Morgan Chase Data Scientist Compensation
RSUs at JPMC generally don't start until the VP level, so if you're comparing against a tech offer, you're mostly weighing base salary plus a discretionary annual cash bonus (paid each January, with the possibility of a full payout even if you started mid-year). That bonus is tied to both firm performance and your individual rating, which means real year-to-year variance that no offer letter can guarantee.
Your strongest negotiation levers, from what candidates report, are base salary and the performance bonus target. JPMC's recruiters will ask about competing opportunities without usually requiring written proof, so surface other offers early and let them close the gap across whichever component has the most room. Don't expect a dollar-for-dollar match against a tech package with equity; JPMC's counter will lean on the cash bonus and, in some cases, a signing bonus to bridge the difference.
JP Morgan Chase Data Scientist Interview Process
5 rounds · ~6 weeks end to end
Initial Screen
1 round · Recruiter Screen
This initial conversation evaluates your baseline fit for the role, communication clarity, and motivation for joining JP Morgan Chase. You'll discuss your background, how your experience aligns with financial services, and your exposure to regulated environments and data governance principles.
Tips for this round
- Clearly articulate why you are interested in JP Morgan Chase specifically, beyond just a 'tech-first' company.
- Be prepared to discuss how your data science experience is relevant to financial services challenges.
- Highlight any prior experience working in regulated or high-stakes environments.
- Demonstrate strong communication skills and enthusiasm for the role and company.
- Have a few thoughtful questions ready for the recruiter about the team or process.
Technical Assessment
2 rounds · Machine Learning & Modeling
This round will test your foundational knowledge across SQL, statistics, and machine learning fundamentals. Expect to solve problems involving advanced SQL queries, demonstrate understanding of statistical concepts, and discuss core ML algorithms and model evaluation metrics.
Tips for this round
- Practice advanced SQL queries, including window functions, joins, and aggregations, to ensure readiness for SQL interview questions.
- Review key statistical concepts such as hypothesis testing, A/B testing, and probability distributions.
- Be prepared to explain various machine learning algorithms (e.g., regression, classification, clustering) and their underlying principles.
- Understand and be able to articulate different model evaluation metrics (e.g., precision, recall, F1-score, AUC) and when to use them.
- Brush up on basic data structures and algorithms, as some coding questions might involve these.
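As a quick self-check on the metrics named in the tips above, here is a throwaway sketch (the labels and predictions are made up) that computes precision, recall, and F1 from raw confusion-matrix counts rather than a library call:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Toy example: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

Being able to derive these from counts, not just call `sklearn.metrics`, makes it much easier to argue the trade-off questions interviewers actually ask (e.g., why recall matters more than precision for a fraud alert queue with cheap review capacity).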
Case Study
You'll be given a business problem, often scenario-driven and related to financial challenges, and asked to apply your analytical skills to solve it. This round assesses your ability to translate complex business needs into analytical solutions and demonstrate end-to-end data science problem-solving.
Onsite
2 rounds · Behavioral
The interviewer will probe your communication skills, ability to collaborate with stakeholders, and ethical judgment within a regulated environment. Expect questions about past experiences, how you handle challenges, and your approach to data governance and compliance.
Tips for this round
- Prepare several STAR method stories that highlight your problem-solving, teamwork, and leadership skills.
- Emphasize your ability to communicate complex technical concepts to non-technical stakeholders clearly.
- Discuss instances where you've demonstrated ethical judgment and accountability in your work.
- Highlight your awareness of data governance principles and working within regulated environments.
- Showcase an ownership mentality and a proactive approach to challenges.
Behavioral
This final round assesses your overall cultural fit, long-term thinking, and alignment with JP Morgan Chase's values, particularly around responsible innovation and accountability. You'll likely speak with a senior leader who will evaluate your strategic thinking and motivation for the role.
Tips to Stand Out
- Master SQL and Statistics. JP Morgan Chase places a strong emphasis on robust analytical fundamentals. Ensure you can handle advanced SQL queries and clearly explain statistical concepts like hypothesis testing and model evaluation metrics.
- Practice Business Problem Solving. Be ready to translate ambiguous business problems into structured data science solutions. Focus on the end-to-end process, from problem definition to impact assessment, especially within a financial context.
- Emphasize Communication and Stakeholder Alignment. Data Scientists at JPMC need to clearly articulate findings and collaborate effectively. Practice explaining complex technical concepts to both technical and non-technical audiences.
- Demonstrate Risk and Compliance Awareness. Given the highly regulated nature of financial services, highlight your understanding of data governance, ethical considerations, and risk management in your data science work.
- Show Ownership Mentality. JPMC values candidates who take accountability for their work and demonstrate a proactive approach to problem-solving and innovation. Share examples where you've taken ownership of projects.
- Research JP Morgan Chase's AI/ML Initiatives. Show genuine interest in their specific efforts in AI and machine learning within the financial sector. This demonstrates motivation and alignment with their strategic direction.
Common Reasons Candidates Don't Pass
- ✗Weak SQL or Statistical Foundations. Candidates often struggle if they cannot demonstrate advanced SQL proficiency or a solid grasp of statistical reasoning and model evaluation metrics, which are core requirements.
- ✗Inability to Translate Business Problems. Failing to connect data science solutions directly to business value or struggling to structure a clear approach to a case study is a common pitfall.
- ✗Lack of Risk/Compliance Awareness. Not demonstrating an understanding of the unique regulatory and ethical considerations in financial data science can be a significant red flag for JPMC.
- ✗Poor Communication Skills. Even with strong technical skills, candidates who cannot clearly articulate their thought process, assumptions, or findings to various audiences often do not progress.
- ✗Generic Answers. Providing unspecific or uninspired answers to behavioral questions, especially regarding motivation for JPMC or experience in regulated environments, can indicate a lack of genuine interest or fit.
Offer & Negotiation
JP Morgan Chase does engage in salary negotiations, but it's important to note that their compensation for technical roles may not be as competitive as top-tier tech companies like FAANG. They are unlikely to match offers from such companies. The initial offer is often not your market value, so negotiation is expected. The compensation package typically includes a base salary, a performance bonus (annual, paid in January, potentially full even if starting mid-year), and sometimes a signing bonus. Equity packages, usually in the form of RSUs, are generally offered starting at the VP level.
The most negotiable components are typically the base salary and performance bonuses. Relocation packages, ranging from $5k-$10k, are also negotiable, especially since fully remote roles are rare and hybrid arrangements require at least three days in the office. While JPMC doesn't usually require competing offers in writing, they will inquire about other opportunities.
The most common reason candidates wash out isn't technical weakness. It's failing to connect their data science skills to JPMC's financial context. Candidates who can't explain why they'd pick a specific evaluation metric for a fraud model, or who freeze when asked to scope data requirements for a credit default problem, get flagged across multiple rounds. Strong SQL and stats are necessary but not sufficient when every round (from the case study to the behaviorals) probes whether you can operate inside a regulated, compliance-heavy environment like JPMC's Consumer Banking or Commercial Banking divisions.
The two behavioral rounds carry real veto power, and they aren't redundant. From what candidates report, one tends to be with your prospective team lead and the other with a more senior leader outside the immediate group, meaning a weak showing on either can sink an otherwise strong technical performance. JPMC's internal scheduling across business lines can also stretch the timeline well beyond the typical window, so don't panic if you hit a multi-week gap between rounds.
JP Morgan Chase Data Scientist Interview Questions
Applied Statistics & Experimentation
Expect questions that force you to choose the right statistical test, define success metrics, and diagnose noisy results under business constraints. Candidates often struggle to translate assumptions (independence, stationarity, normality) into practical decisions and clear next steps.
You launch a fraud model that queues transactions for manual review, and you see chargeback rate drop week over week, but approval rate also drops and volume is rising. Which metric(s) do you treat as the primary success metric, and what statistical test or interval do you use to decide whether the change is real given shifting mix?
Sample Answer
Most candidates default to a two-sample $t$-test on weekly averages, but that fails here because the outcome is a rate with changing denominators and strong confounding from mix and volume. You should frame success as a cost or utility metric, for example expected fraud loss plus ops cost plus revenue impact per $1{,}000$ transactions, then compare with a stratified or regression-adjusted estimator by channel, merchant category, country, and risk bucket. Use a difference in proportions or a GLM (logistic or Poisson with exposure) and report robust confidence intervals, for example cluster-robust by merchant or day to handle correlation. If you cannot adjust sufficiently, you are not testing the model, you are testing the traffic shift.
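One way to make the "stratify first, then pool" idea concrete is a sketch like the following. The strata and counts are invented, and the inverse-variance pooling stands in for the regression-adjusted or GLM estimators described above:

```python
import numpy as np

# Hypothetical per-stratum counts (e.g., channel x risk-bucket cells):
# approvals out of attempted transactions, before vs. after launch.
# Stratifying before pooling removes confounding from mix shift.
strata = [
    # (x_before, n_before, x_after, n_after)
    (900, 1000, 880, 1000),   # low-risk, card-present
    (400,  500, 350,  500),   # high-risk, card-not-present
]

diffs, weights = [], []
for x_b, n_b, x_a, n_a in strata:
    p_b, p_a = x_b / n_b, x_a / n_a
    diffs.append(p_a - p_b)
    # Inverse-variance weight for the pooled difference in proportions.
    var = p_b * (1 - p_b) / n_b + p_a * (1 - p_a) / n_a
    weights.append(1.0 / var)

w, d = np.asarray(weights), np.asarray(diffs)
pooled = float(np.sum(w * d) / np.sum(w))     # mix-adjusted effect
se = float(np.sqrt(1.0 / np.sum(w)))
ci = (pooled - 1.96 * se, pooled + 1.96 * se)
```

The pooled estimate here comes out around -3.7 points, sitting between the two per-stratum drops; a naive unstratified comparison would be pulled around by whichever stratum grew its share of volume.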
A payments A/B test targets merchants with a new Smart Retry rule intended to increase authorization rate, but randomization is at the merchant level and you measure daily transaction-level outcomes. How do you compute the standard error and confidence interval correctly, and why is the naive transaction-level calculation wrong?
After rolling out a new credit line increase policy to some existing card customers, you want to estimate its effect on 90-day delinquency, but take-up is imperfect and exposure timing varies by customer. Would you use intention-to-treat or a treatment-on-the-treated estimate, and what design or estimator handles timing and censoring cleanly?
Machine Learning & Model Evaluation (Applied)
Most candidates underestimate how much emphasis is placed on model choice tradeoffs and evaluation rigor for financial problems (fraud, risk, payments). You’ll be pushed to explain leakage, calibration, thresholding, class imbalance, and how to validate models on time-sliced data.
You built a card-fraud model to score authorizations in real time, and offline AUC is strong, but in shadow mode the alert volume is 3x forecast and confirmed fraud capture is flat. What evaluation checks do you run to detect leakage and miscalibration, and how do you set an operating threshold under class imbalance and asymmetric costs?
Sample Answer
Run leakage checks and calibration diagnostics, then choose the threshold by optimizing expected cost under your class prior and action costs. Leakage shows up when features are not available at decision time (post-authorization chargeback signals, investigator outcomes, future aggregates), so you validate feature timestamps, recompute features as-of $t$, and re-evaluate on a strict time-sliced holdout. Miscalibration shows up when predicted probabilities do not match observed rates, so you plot reliability curves, check calibration-in-the-large, and apply isotonic or Platt scaling on a recent validation window. Then pick a threshold maximizing $\mathbb{E}[\text{benefit}] - \mathbb{E}[\text{cost}]$ (fraud dollars prevented versus declines, ops review, and customer friction), and report PR AUC plus precision and recall at that point, not just ROC AUC.
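The threshold-selection step can be sketched in a few lines. Everything here is a made-up assumption: Beta-distributed scores stand in for calibrated fraud probabilities, and `L` (fraud loss avoided) and `C` (per-alert review/friction cost) are placeholder dollar values:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.beta(0.5, 20.0, size=10_000)   # stand-in calibrated P(fraud | x)
L, C = 250.0, 4.0                      # assumed dollar values

# Grid-search the alert threshold that maximizes expected net benefit.
thresholds = np.linspace(0.01, 0.99, 99)
net = []
for t in thresholds:
    alerted = p >= t
    # Expected fraud dollars prevented minus the cost of every alert.
    net.append(L * p[alerted].sum() - C * alerted.sum())

best_t = float(thresholds[int(np.argmax(net))])
```

Under perfect calibration this collapses to the closed form $t^* = C/L$ (alert exactly when $p \cdot L > C$); the grid search earns its keep once costs vary by segment or calibration is imperfect, which is why the calibration diagnostics come first.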
For a payments risk model (ACH returns or card chargebacks), you need to estimate how the model will perform over the next quarter while the fraud strategy team changes rules weekly and the population shifts. How do you validate and monitor the model so you can trust the estimate, and what does your split strategy look like?
SQL & Database Analytics
Your ability to extract trustworthy insights from messy transaction and customer tables is a core signal in the ML & Modeling round. You’ll need to write joins, window functions, cohort/retention or funnel queries, and spot pitfalls like duplicated rows and incorrect denominators.
Given card transactions `card_txn(txn_id, customer_id, merchant_id, txn_ts, amount, status)` and disputes `dispute(dispute_id, txn_id, opened_ts)`, compute each merchant’s dispute rate over the last 90 days as $\frac{\#\text{distinct disputed settled txns}}{\#\text{distinct settled txns}}$ and return the top 20 merchants by dispute rate with at least 500 settled transactions.
Sample Answer
You could do a naive join between transactions and disputes and then count rows, or you could aggregate to distinct transaction keys before computing the numerator and denominator. The naive join inflates counts whenever a transaction has multiple dispute records or other one-to-many artifacts. The distinct txn-level aggregation wins here because it protects the denominator and keeps the metric auditable.
WITH base AS (
SELECT
t.merchant_id,
t.txn_id
FROM card_txn t
WHERE t.status = 'SETTLED'
AND t.txn_ts >= (CURRENT_DATE - INTERVAL '90' DAY)
), disputed AS (
-- Distinct txn_id to avoid double counting if multiple dispute rows exist per transaction
SELECT DISTINCT
d.txn_id
FROM dispute d
WHERE d.opened_ts >= (CURRENT_DATE - INTERVAL '90' DAY)
), merchant_counts AS (
SELECT
b.merchant_id,
COUNT(DISTINCT b.txn_id) AS settled_txn_cnt,
COUNT(DISTINCT CASE WHEN dis.txn_id IS NOT NULL THEN b.txn_id END) AS disputed_settled_txn_cnt
FROM base b
LEFT JOIN disputed dis
ON dis.txn_id = b.txn_id
GROUP BY b.merchant_id
)
SELECT
merchant_id,
settled_txn_cnt,
disputed_settled_txn_cnt,
(disputed_settled_txn_cnt::DECIMAL(18,6) / NULLIF(settled_txn_cnt, 0)) AS dispute_rate
FROM merchant_counts
WHERE settled_txn_cnt >= 500
ORDER BY dispute_rate DESC, settled_txn_cnt DESC
FETCH FIRST 20 ROWS ONLY;

For each customer in `payments(txn_id, customer_id, txn_ts, amount, channel, status)`, compute a 30 day rolling total of successful payment amount and flag the first timestamp where the rolling total exceeds the customer’s prior 180 day rolling mean plus $3$ rolling standard deviations, then return the flagged rows for the last 14 days.
Causal Inference & Policy Evaluation
The bar here isn’t whether you’ve memorized DID/IV/RDD, it’s whether you can justify identification in a real financial setting with selection bias and confounding. Interviewers look for crisp assumptions, falsification tests, and how you’d communicate limitations to stakeholders.
Chase rolls out a new overdraft grace period to some checking customers, based on an internal risk score threshold, and you need the causal effect on 90-day charge-off rate and customer attrition. What identification strategy do you use, what assumptions must hold, and what falsification tests do you run?
Sample Answer
Reason through it: The policy is assigned by a cutoff, so you start with a sharp or fuzzy RDD around the score threshold and define a tight bandwidth where customers look comparable. You check continuity of baseline covariates and the running variable density at the cutoff (no sorting), then validate that pre-policy outcomes are smooth at the cutoff. You estimate local treatment effects for charge-offs and attrition, and you run placebo cutoffs and placebo outcomes to see if anything else “jumps” where it should not. If take-up is imperfect, you treat assignment as an instrument and estimate LATE with 2SLS.
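A minimal local-linear sharp-RDD sketch on synthetic data makes the estimation step concrete; the cutoff, bandwidth, coefficients, and noise level are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
score = rng.uniform(300, 900, 5_000)          # internal risk score
cutoff, bandwidth = 650.0, 50.0
treated = (score < cutoff).astype(float)      # grace period below cutoff
true_jump = -0.03                             # assumed effect on charge-off
y = (0.10 - 0.0001 * (score - cutoff)
     + true_jump * treated
     + rng.normal(0.0, 0.01, score.size))

# Local-linear fit within the bandwidth: intercept, treatment jump,
# and separate slopes on each side of the cutoff.
m = np.abs(score - cutoff) <= bandwidth
x = score[m] - cutoff
X = np.column_stack([np.ones(m.sum()), treated[m], x, x * treated[m]])
beta, *_ = np.linalg.lstsq(X, y[m], rcond=None)
rdd_effect = float(beta[1])                   # discontinuity at cutoff
```

`beta[1]` is the jump at the cutoff (here it recovers roughly -0.03). In practice you would also vary the bandwidth, use bias-corrected robust inference, and run the density and placebo checks described above before reporting anything.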
A fraud model change increases transaction declines for a subset of debit card authorizations, and leadership wants the causal impact on fraud losses and approved spend. How do you design an evaluation when the change was deployed globally but only activates for certain score bands and MCCs?
A retention offer in the Chase app is triggered when a customer’s predicted churn probability exceeds $p_0$, and you want the causal effect on 30-day churn and net revenue. Would you use RDD, IV, or matching, and how do you explain what effect you are estimating to a non-technical stakeholder?
ML Coding (Python/R) for Analytics
In timed exercises, you’re evaluated on turning a modeling or metric idea into correct, readable code with edge cases handled. You’ll likely implement feature prep, train/validation splits (often time-based), metric computation, and basic model training using common libraries.
You have a pandas DataFrame of card transactions with columns ['account_id','txn_time','amount','is_fraud'] where 'txn_time' is timezone-aware; write code to create a daily table with fraud rate, total volume, and fraud dollar loss per day, then return the top 5 days by fraud dollar loss.
Sample Answer
This question is checking whether you can translate business metrics into correct, vectorized code with time handling and no silent pandas bugs. You need to floor timestamps to day in a consistent timezone, aggregate safely, and avoid divide-by-zero when volume is zero. You are also being tested on naming and returning a clean result that someone could chart or feed into monitoring.
import pandas as pd
import numpy as np
def top_fraud_loss_days(txns: pd.DataFrame, tz: str = "UTC", top_k: int = 5) -> pd.DataFrame:
"""Build daily fraud metrics and return the top days by fraud dollar loss.
Parameters
----------
txns : pd.DataFrame
Columns: account_id, txn_time (tz-aware), amount, is_fraud.
tz : str
Target timezone to normalize days (default UTC).
top_k : int
Number of days to return.
Returns
-------
pd.DataFrame
Columns: day, txn_count, total_volume, fraud_count, fraud_rate,
fraud_dollar_loss.
"""
required = {"account_id", "txn_time", "amount", "is_fraud"}
missing = required - set(txns.columns)
if missing:
raise ValueError(f"Missing columns: {sorted(missing)}")
df = txns.copy()
# Normalize to a single timezone, then derive the day.
    if not isinstance(df["txn_time"].dtype, pd.DatetimeTZDtype):
raise TypeError("txn_time must be timezone-aware (datetime64[ns, tz])")
df["txn_time"] = df["txn_time"].dt.tz_convert(tz)
df["day"] = df["txn_time"].dt.floor("D")
# Ensure numeric and boolean types are consistent.
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df["is_fraud"] = df["is_fraud"].astype(bool)
# Fraud loss is the sum of fraudulent amounts.
df["fraud_amount"] = np.where(df["is_fraud"], df["amount"], 0.0)
daily = (
df.groupby("day", as_index=False)
.agg(
txn_count=("amount", "size"),
total_volume=("amount", "sum"),
fraud_count=("is_fraud", "sum"),
fraud_dollar_loss=("fraud_amount", "sum"),
)
)
# Guard against divide-by-zero.
daily["fraud_rate"] = np.where(
daily["txn_count"] > 0, daily["fraud_count"] / daily["txn_count"], 0.0
)
out = daily.sort_values("fraud_dollar_loss", ascending=False).head(top_k)
return out.reset_index(drop=True)
Given a panel dataset of ACH payments with columns ['customer_id','event_time','amount','returned_within_5d'] sorted arbitrarily, write code to build time-based features (rolling 30-day count and sum per customer) using only past events, then train a logistic regression and report AUC on a chronological 80/20 split.
You are building a probability of default model for a credit card portfolio with a DataFrame containing ['account_id','as_of_date','utilization','fico','default_next_12m']; write code to do walk-forward validation with yearly folds, train a gradient boosting model each fold, compute out-of-time AUC and expected calibration error (ECE), then aggregate fold metrics.
Finance Domain & Risk/Fraud Case Framing
Because many case studies are anchored in payments, risk, or operational losses, you must frame problems using domain-appropriate metrics and constraints (cost of false positives, chargebacks, regulatory sensitivity). Strong answers connect model outputs to decisioning, monitoring, and business impact.
You are building a real-time card payments fraud model that can either approve, decline, or step-up to 3DS, and business asks for a single threshold. How do you pick the operating point using expected loss, given fraud loss $L$, interchange margin $M$, manual review cost $C$, and class-conditional error rates?
Sample Answer
The standard move is to choose the operating point that minimizes expected cost: for each action, compute $\mathbb{E}[\text{cost}]=P(\text{fraud}\mid x)\cdot L+P(\text{legit}\mid x)\cdot(\text{lost margin or friction})+C$ where applicable, then pick the action with the lowest expected cost. But here, step-up and decline carry asymmetric customer harm and long-run attrition, so you treat the friction term as a calibrated business penalty and enforce constraints, such as a maximum false positive rate on high-value customers, because brand and retention costs dominate one-period $M$.
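A toy version of the per-action comparison, where every dollar value and the 3DS `abandon` rate are invented placeholders rather than real figures:

```python
# Assumed inputs: fraud loss L, interchange margin M, step-up
# friction/ops cost C, and the share of legitimate customers who
# abandon at a 3DS challenge.
L, M, C, abandon = 200.0, 2.0, 0.50, 0.10

def expected_costs(p: float) -> dict:
    """Expected cost of each action given calibrated P(fraud | x) = p."""
    return {
        # Approve: eat the full fraud loss if it turns out to be fraud.
        "approve": p * L,
        # Decline: forgo margin on what was legitimate spend.
        "decline": (1 - p) * M,
        # Step-up: pay friction cost; assume 3DS stops the fraud but
        # loses margin on legitimate customers who abandon.
        "step_up": C + (1 - p) * abandon * M,
    }

def best_action(p: float) -> str:
    costs = expected_costs(p)
    return min(costs, key=costs.get)
```

With these numbers, approve wins only at very low scores (while $p \cdot L$ is below the step-up friction), decline takes over at high scores where the legitimate-margin loss is negligible, and step-up occupies the middle band, which is exactly the structure the single-threshold request from the business flattens away.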
A new chip-enabled merchant segment shows a sudden 30% drop in fraud rate week-over-week while chargeback volume is unchanged, and leadership wants you to declare the model improved. What checks do you run to distinguish real risk reduction from label leakage or reporting lag, and what metric would you report to risk management?
What jumps out isn't any single category but how the case study round forces you to chain them together: a fraud detection prompt will start as a domain framing question, pivot into model evaluation when you're asked about threshold selection, then land on causal inference when the interviewer asks how you'd measure the policy's downstream impact on chargebacks. The compounding difficulty between ML evaluation and causal inference catches most candidates off guard because JPMC's interviewers on the payments and risk teams expect you to move fluidly from "how does this model perform?" to "did this model actually cause the outcome we observe?" within the same answer. Candidates who prep each topic in isolation tend to freeze at exactly that handoff.
Practice with finance-contextualized questions at datainterview.com/questions.
How to Prepare for JP Morgan Chase Data Scientist Interviews
Know the Business
Official mission
“We aim to be the most respected financial services firm in the world, serving corporations and individuals.”
What it actually means
To drive global economic growth and create financial opportunities for individuals, businesses, and communities worldwide, while delivering value to shareholders and employees through comprehensive financial services and large-scale impact.
Key Business Metrics
$168B (+3% YoY)
$802B (+19% YoY)
319K (+2% YoY)
Business Segments and Where DS Fits
Consumer Banking
The U.S. consumer and commercial banking business, operating the largest branch network in the U.S. and focused on helping customers maximize their financial goals.
Investment Banking
A leading business segment providing investment banking services globally.
Commercial Banking
Serves mid-sized businesses, municipalities, and nonprofits with lending, treasury services, and tailored financial solutions.
Financial Transaction Processing
Moves money at global scale, processing trillions of dollars in payments daily for corporate and institutional clients.
Asset Management
Manages investments across equities, fixed income, and alternatives for institutions, advisors, and individual investors.
J.P. Morgan Private Bank
Provides personalized, concierge-style service for clients with complex financial needs, including wealth planning, advisory, and trust & estate planning.
Card & Connected Commerce
Manages the firm's co-brand credit card programs, including the upcoming issuance of Apple Card.
Current Strategic Priorities
- Expand access to affordable and convenient financial services nationwide
- Open more than 500 new branches, renovate 1,700 locations, and hire 3,500 employees across the country over three years
- Hire more than 10,500 Consumer Bank team members by year-end
- Aim for 75% of Americans to be within a reasonable drive of a branch and over 50% within each state
- Elevate the Affluent Experience with J.P. Morgan Financial Centers
- Invest in innovative products and services to make banking easier, supporting leadership in deposit market share
- Deepen customer relationships by becoming the new issuer of Apple Card
Competitive Moat
JPMC is betting hard on physical reach and AI simultaneously. The firm plans to open more than 160 new branches in over 30 states in 2026, targeting underserved regions where 75% of Americans would be within a reasonable drive of a location. Every new branch generates fresh customer transaction data that feeds fraud detection, credit risk, and personalization models, so the data science surface area is literally expanding with the real estate footprint.
On the digital side, JPMC's emerging technology trends report signals serious investment in GenAI for document processing and internal tooling. If you're asked "why JPMC?", anchor your answer to one of these specific bets. Something like: "Chase is expanding into regions with thin branch coverage, and I want to build the customer acquisition models that make those new locations profitable from day one." That framing, pulled from their 2025 Investor Day materials, beats vague enthusiasm about "the scale of financial services" every time.
Try a Real Interview Question
Monthly chargeback rate by merchant with volume threshold
Given card payment transactions and chargebacks, compute each merchant's monthly chargeback rate, defined as (# chargebacks in month) / (# settled transactions in month), for months where the merchant has at least 2 settled transactions. Output: month (as YYYY-MM), merchant_id, settled_txn_count, chargeback_count, chargeback_rate; sort by month, then chargeback_rate descending.
transactions

| transaction_id | merchant_id | user_id | txn_ts | amount_usd | status |
|----------------|-------------|---------|------------|------------|---------|
| t1 | m1 | u1 | 2024-01-05 | 120.00 | SETTLED |
| t2 | m1 | u2 | 2024-01-20 | 80.00 | SETTLED |
| t3 | m2 | u3 | 2024-01-25 | 50.00 | DECLINED |
| t4 | m2 | u4 | 2024-02-02 | 200.00 | SETTLED |
| t5 | m2 | u5 | 2024-02-15 | 75.00 | SETTLED |
chargebacks

| chargeback_id | transaction_id | cb_ts | cb_reason |
|--------------|-----------------|------------|-----------|
| c1 | t2 | 2024-02-10 | FRAUD |
| c2 | t4 | 2024-02-20 | DISPUTE |
| c3 | t4 | 2024-02-25 | DUPLICATE |
| c4 | t99 | 2024-02-01 | FRAUD |

700+ ML coding problems with a live Python executor.
Practice in the Engine

From what candidates report, JPMC's coding questions lean applied and contextual, often framed around transaction data or risk scenarios rather than pure algorithmic puzzles. Practicing problems with that flavor builds the right muscle memory. You'll find more finance-contextualized coding problems at datainterview.com/coding.
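For reference, here is one way the chargeback question above might be solved, executed against the sample rows with SQLite. This is a sketch rather than an official answer key, and it assumes a chargeback counts toward the month of the original transaction, not the month it was filed:

```python
import sqlite3

# Load the sample rows from the prompt into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (transaction_id TEXT, merchant_id TEXT,
    user_id TEXT, txn_ts TEXT, amount_usd REAL, status TEXT);
CREATE TABLE chargebacks (chargeback_id TEXT, transaction_id TEXT,
    cb_ts TEXT, cb_reason TEXT);
INSERT INTO transactions VALUES
 ('t1','m1','u1','2024-01-05',120.00,'SETTLED'),
 ('t2','m1','u2','2024-01-20', 80.00,'SETTLED'),
 ('t3','m2','u3','2024-01-25', 50.00,'DECLINED'),
 ('t4','m2','u4','2024-02-02',200.00,'SETTLED'),
 ('t5','m2','u5','2024-02-15', 75.00,'SETTLED');
INSERT INTO chargebacks VALUES
 ('c1','t2','2024-02-10','FRAUD'),
 ('c2','t4','2024-02-20','DISPUTE'),
 ('c3','t4','2024-02-25','DUPLICATE'),
 ('c4','t99','2024-02-01','FRAUD');  -- no matching txn: dropped by the join
""")

query = """
SELECT strftime('%Y-%m', t.txn_ts)                 AS month,
       t.merchant_id,
       COUNT(DISTINCT t.transaction_id)            AS settled_txn_count,
       COUNT(c.chargeback_id)                      AS chargeback_count,
       ROUND(COUNT(c.chargeback_id) * 1.0
             / COUNT(DISTINCT t.transaction_id), 4) AS chargeback_rate
FROM transactions t
LEFT JOIN chargebacks c USING (transaction_id)
WHERE t.status = 'SETTLED'
GROUP BY month, t.merchant_id
HAVING COUNT(DISTINCT t.transaction_id) >= 2
ORDER BY month, chargeback_rate DESC
"""
rows = conn.execute(query).fetchall()
```

The `COUNT(DISTINCT ...)` is the detail interviewers tend to probe: t4 carries two chargebacks, so the join duplicates its row and a plain `COUNT(*)` would overstate settled volume.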
Build Your Finance Translation Layer
You probably already understand precision-recall tradeoffs. The question is whether you can explain why a 2% false positive rate in fraud detection costs Chase millions in blocked legitimate transactions and customer attrition, while a looser threshold invites regulatory scrutiny. That's the kind of reframing JPMC's applied rounds reward: not renaming concepts, but grounding them in real P&L and compliance consequences.
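That translation can be as simple as a back-of-envelope expected-cost function. Every number below is hypothetical and `annual_error_cost` is an invented helper; the point is the habit of pricing both error types in dollars before defending a threshold:

```python
def annual_error_cost(n_txns, fraud_prevalence, recall, fpr,
                      avg_fraud_loss, fp_cost):
    """Expected yearly cost of missed fraud plus falsely blocked
    legitimate transactions, for a given operating point."""
    n_fraud = n_txns * fraud_prevalence
    n_legit = n_txns - n_fraud
    missed_fraud_cost = n_fraud * (1 - recall) * avg_fraud_loss
    false_positive_cost = n_legit * fpr * fp_cost
    return missed_fraud_cost + false_positive_cost

# Hypothetical inputs: 500M transactions/yr, 0.1% fraud prevalence,
# $120 average fraud loss, $15 blended cost per blocked legitimate
# transaction (support contact plus attrition risk).
tight = annual_error_cost(500e6, 0.001, recall=0.92, fpr=0.02,
                          avg_fraud_loss=120, fp_cost=15)
loose = annual_error_cost(500e6, 0.001, recall=0.85, fpr=0.005,
                          avg_fraud_loss=120, fp_cost=15)
```

Under these made-up inputs the looser threshold is dramatically cheaper, because false positives dominate at fraud prevalence this low. Being able to show that arithmetic, then caveat it with the regulatory cost a pure dollar model omits, is exactly the reframing the applied rounds reward.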
JPMC operates under a formal "How We Do Business" framework that shapes how models get built and governed. When you discuss model choices, weave in awareness of explainability requirements and model risk documentation as first-class deliverables, not afterthoughts. Candidates who treat governance as a constraint they'd tolerate rather than a design input they'd embrace tend to land poorly.
Prepare Distinct Behavioral Stories
JPMC weighs culture fit and conduct risk heavily enough to dedicate significant interview time to behavioral assessment. You'll want at least five or six STAR-format stories covering distinct themes, such as: navigating compliance constraints on a technical decision, influencing a skeptical senior stakeholder with data, handling ambiguity when data was incomplete, and deliberately trading model performance for explainability.
If you've never worked in a regulated industry, pull from any experience where external rules shaped your technical choices (GDPR, HIPAA, internal security policies all count). Have a genuine self-critique ready for each story, because "I wouldn't change anything" reads as low self-awareness in a culture that takes conduct risk seriously.
Test Your Readiness
How Ready Are You for JP Morgan Chase Data Scientist?
1 / 10 — Can you diagnose and address issues in A/B tests, including low power, multiple comparisons, peeking, and sample ratio mismatch, and clearly state what you would do in each case?
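For the sample ratio mismatch item specifically, the standard diagnostic is a chi-square goodness-of-fit test on the observed split. A minimal sketch (the 50/50 design and the counts are invented; with one degree of freedom the p-value reduces to `erfc(sqrt(chi2/2))`, so no SciPy is needed):

```python
import math

def srm_check(n_a, n_b, alpha=0.001):
    """Sample ratio mismatch check for an intended 50/50 split:
    chi-square goodness-of-fit test with 1 degree of freedom.
    For df=1 the survival function is erfc(sqrt(x/2))."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 / expected
            + (n_b - expected) ** 2 / expected)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < alpha   # flag SRM only at a strict alpha

# 50,000 vs 48,500 looks close (50.8% / 49.2%) but at this sample
# size it is a real mismatch, so the experiment should be halted
# and the assignment pipeline debugged before reading any metric.
p, is_srm = srm_check(50_000, 48_500)
```

The strict alpha is deliberate: SRM checks run on every experiment, so a loose threshold would halt healthy tests constantly.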
See where your gaps are and close them with realistic practice at datainterview.com/questions.
Frequently Asked Questions
How long does the JP Morgan Chase Data Scientist interview process take?
Most candidates report the process taking 4 to 8 weeks from application to offer. You'll typically go through an initial recruiter screen, a technical phone interview, and then a final round (virtual or onsite). Some teams move faster, especially if they have urgent headcount, but the compliance and background check steps at JP Morgan can add extra time at the end.
What technical skills are tested in the JP Morgan Chase Data Scientist interview?
SQL and Python are non-negotiable. You should also be comfortable with R, and having exposure to Scala, Stata, SAS, or C++ can set you apart depending on the team. Expect questions around working with big data sets, writing production-quality code for data analysis, and applying machine learning to financial problems. I've seen candidates get tripped up when they can only do modeling but can't wrangle messy data at scale.
How should I tailor my resume for a JP Morgan Chase Data Scientist role?
Lead with impact, not tools. JP Morgan cares about your ability to extract and communicate insights from large datasets, so quantify everything. Instead of 'built a model,' say 'built a classification model that reduced false positives by 30% on a 50M-row transaction dataset.' Mention compliance awareness or data privacy experience if you have it. Financial services experience is a big plus, but if you don't have it, frame your work in terms of business outcomes and rigorous analysis.
What is the salary and total compensation for a Data Scientist at JP Morgan Chase?
For an entry-level or Associate Data Scientist at JP Morgan, base salary typically falls in the $90K to $120K range. Vice President level data scientists can expect $130K to $170K base, with total comp (including bonus) reaching $180K to $230K or more. Bonuses at JP Morgan are a meaningful part of compensation, often 15-30% of base depending on performance and level. New York roles tend to be at the higher end of these ranges.
How do I prepare for the behavioral interview at JP Morgan Chase for a Data Scientist position?
JP Morgan's core values are Service, Heart, Curiosity, Courage, and Excellence. Your behavioral answers need to map to these. Prepare stories about going above and beyond for a stakeholder (Service), showing intellectual curiosity in your analysis work, and having the courage to push back on flawed assumptions. They also care a lot about managing multiple priorities in a fast-paced environment, so have a concrete example of juggling competing deadlines ready.
How hard are the SQL and coding questions in the JP Morgan Data Scientist interview?
The SQL questions are medium difficulty. Think multi-table joins, window functions, aggregations with HAVING clauses, and sometimes CTEs for readability. Python questions lean toward data manipulation with pandas, writing clean functions, and occasionally implementing a simple algorithm from scratch. It's not about trick questions. They want to see that you can write correct, readable code for real analytical work. You can practice similar problems at datainterview.com/coding.
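The CTE-plus-window-function pattern mentioned above looks like this in practice, run here via SQLite. The table and rows are invented; the query picks each merchant's largest transaction:

```python
import sqlite3

# Toy table for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txns (txn_id TEXT, merchant_id TEXT, amount REAL);
INSERT INTO txns VALUES
 ('t1','m1',120.0), ('t2','m1',80.0),
 ('t3','m2',200.0), ('t4','m2',75.0), ('t5','m2',310.0);
""")

query = """
WITH ranked AS (                      -- CTE names the ranking step
  SELECT merchant_id, txn_id, amount,
         ROW_NUMBER() OVER (
           PARTITION BY merchant_id   -- restart numbering per merchant
           ORDER BY amount DESC
         ) AS rn
  FROM txns
)
SELECT merchant_id, txn_id, amount
FROM ranked
WHERE rn = 1                          -- largest transaction per merchant
ORDER BY merchant_id
"""
rows = conn.execute(query).fetchall()
```

The same shape (partition, order, filter on rank) covers most "top-N per group" questions you're likely to see in this round.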
What machine learning and statistics concepts should I know for the JP Morgan Chase Data Scientist interview?
You need a foundational understanding of supervised and unsupervised learning, regression, classification, decision trees, and ensemble methods. They'll also test your grasp of statistical inference, hypothesis testing, and probability. Since this is financial services, be ready to discuss how you'd apply these methods to problems like credit risk, fraud detection, or customer segmentation. Know the tradeoffs between model interpretability and performance, because regulators care about explainability.
What format should I use to answer behavioral questions at JP Morgan Chase?
Use the STAR format (Situation, Task, Action, Result) but keep it tight. I've seen too many candidates spend two minutes on setup and thirty seconds on what they actually did. Flip that ratio. Give just enough context, then spend most of your time on the specific actions you took and the measurable result. JP Morgan interviewers are busy people. They appreciate concise, structured answers that demonstrate rigorous thinking and clear communication.
What happens during the onsite or final round interview for a JP Morgan Data Scientist?
The final round typically involves 3 to 5 back-to-back interviews, each 30 to 45 minutes. You'll face a mix of technical deep-dives (SQL, Python, ML concepts), a case study or take-home analysis, and behavioral rounds with hiring managers and cross-functional partners. Some teams include a presentation where you walk through a past project or a provided dataset. Expect at least one interviewer to probe your ability to communicate insights clearly to non-technical stakeholders.
What business metrics and financial concepts should I know for a JP Morgan Chase Data Scientist interview?
You should understand basic financial metrics like revenue, net income, risk-adjusted return, and default rates. Familiarity with concepts like credit scoring, portfolio risk, and customer lifetime value will help you stand out. JP Morgan wants data scientists who can apply their skills to financial services and economic research, not just build models in a vacuum. If they give you a case study, frame your approach around business impact and compliance considerations.
What are common mistakes candidates make in the JP Morgan Data Scientist interview?
The biggest one is ignoring the financial services context. If you talk about models without mentioning interpretability, data privacy, or regulatory constraints, you'll seem naive about the industry. Another common mistake is poor communication. JP Morgan explicitly values excellent written and verbal communication skills, so rambling through technical explanations will hurt you. Finally, don't underestimate the behavioral rounds. They carry real weight here, and candidates who only prep the technical side often get dinged.
Does JP Morgan Chase hire remote Data Scientists or is it mostly in-office?
JP Morgan has been one of the more vocal companies about return-to-office. Most Data Scientist roles are based in New York, though there are positions in other hubs like Wilmington, Chicago, and London. Expect a hybrid arrangement at minimum, with many teams requiring 4-5 days in office. If location flexibility matters to you, ask the recruiter early so there are no surprises later in the process.