Time series questions appear in nearly every data scientist interview at companies like Netflix, Uber, and Meta because real business problems involve temporal data. Whether you're forecasting demand, detecting payment anomalies, or building recommendation systems, you need to handle sequential patterns, seasonality, and the cardinal rule of never learning from the future.
What makes time series interviews brutal is that subtle mistakes create plausible wrong answers. You might confidently explain how to build a 7-day rolling average feature, but if you don't account for late-arriving data or timezone normalization, your model will leak future information and fail in production. Interviewers specifically design questions around these edge cases because they separate candidates who have shipped real systems from those who only know textbook examples.
Here are the top 25 time series questions organized by the skills that matter most: data preparation fundamentals, stationarity concepts, forecast evaluation, anomaly detection, and financial markets basics.
Time Series Interview Questions
Top Time Series interview questions covering the key areas tested at leading tech companies. Practice with real questions and detailed solutions.
Time Series Basics: Indexing, Resampling, Lags, and Splits
Interviewers start with indexing, resampling, and lag features because most candidates fail on data preparation basics. You'll get a messy dataset with mixed timezones, missing values, and late-arriving updates, then be asked to create features for a forecasting model.
The key insight is that temporal data leakage is silent and deadly. Your training metrics look great, but production performance crashes because your rolling averages accidentally included future information or your timestamp alignment was off by one period. Practice building features with strict temporal ordering under realistic data quality constraints.
Start by showing you can handle timestamps correctly: sorting, time zones, missing intervals, lag features, and time-based train/validation splits. Candidates struggle here because small leakage or alignment mistakes silently break model evaluation.
You have event logs with columns user_id, event_ts, and value, where event_ts is a string with mixed time zones (some end in Z, others carry a -0700 offset). You need hourly aggregates per user and a 24-hour lag feature for modeling. What exact timestamp normalization and ordering steps do you apply to avoid silent leakage or misalignment?
Sample Answer
Most candidates default to parsing timestamps into naive datetimes and calling resample, but that fails here because the same wall-clock hour means different instants across time zones. You should parse as timezone-aware, convert everything to a single canonical zone like UTC, then sort by (user_id, event_ts) and de-duplicate if your source can emit repeats. Aggregate on the normalized timestamp, then build the 24-hour lag using a time-aware shift: group by user and shift by 24 rows only if you have a complete hourly index; otherwise use a time-based merge or asof join on $t-24\text{h}$. Finally, verify alignment by checking that the lagged feature timestamp is strictly earlier than the target timestamp for every row.
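A minimal pandas sketch of these steps, using a tiny made-up event log (the column names follow the question; the merge-based lag is one of several valid implementations):

```python
import pandas as pd

# Toy event log with mixed-offset timestamp strings (illustrative values).
df = pd.DataFrame({
    "user_id": [1, 1, 1],
    "event_ts": ["2024-01-01T00:30:00Z",
                 "2024-01-01T01:15:00-0700",
                 "2024-01-02T01:40:00Z"],
    "value": [10.0, 20.0, 30.0],
})

# Parse timezone-aware and normalize everything to one canonical zone (UTC).
df["event_ts"] = pd.to_datetime(df["event_ts"], utc=True)

# Sort and de-duplicate before aggregating.
df = df.sort_values(["user_id", "event_ts"]).drop_duplicates()

# Hourly aggregates per user on the normalized timestamps.
hourly = (df.set_index("event_ts")
            .groupby("user_id")["value"]
            .resample("1h").sum()
            .reset_index())

# 24-hour lag via a time-based self-merge, which stays aligned even with gaps.
lagged = hourly.copy()
lagged["event_ts"] = lagged["event_ts"] + pd.Timedelta(hours=24)
hourly = hourly.merge(
    lagged.rename(columns={"value": "value_lag_24h"}),
    on=["user_id", "event_ts"], how="left",
)
```

Note how the event with the -0700 offset lands in the 08:00 UTC bucket rather than the 01:00 one, which is exactly the misalignment that naive parsing would hide.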
You are given 15-minute sensor readings with occasional missing intervals and want to train a model to predict the next 15-minute value. How do you resample and create lag features so the model never learns from future information when there are gaps?
You are forecasting daily demand and want features like a 7-day rolling mean and a 7-day lag. The data arrives as transactions with multiple rows per day and late-arriving corrections. How do you aggregate, define the day boundary, and compute rolling features so they are strictly based on information available at prediction time?
You need to evaluate a model on 2 years of minute-level data with strong seasonality and concept drift. Describe a time-based train/validation/test split strategy and how you would tune hyperparameters without leaking information across time.
A teammate builds lag features with groupby(user_id).shift(1) after sorting by event_ts, but the dataset has duplicate timestamps per user and multiple event types. In an interview, how do you diagnose the bug and redesign the indexing and lagging so the feature corresponds to the correct previous event in time?
Stationarity, Decomposition, and Transformations
Stationarity questions test whether you can diagnose time series properties and choose appropriate transformations. Netflix and Uber love asking about revenue or ride demand series that have trends, seasonality, and heteroskedasticity all mixed together.
Most candidates memorize that you difference to remove trends and log-transform to stabilize variance, but they can't decide the right order of operations or validate their choices. The secret is using ACF patterns and residual diagnostics to confirm your transformations actually worked, not just applying cookbook recipes.
In interviews, you are tested on whether you can diagnose trend, seasonality, and changing variance and decide what to difference, detrend, or transform. Many candidates can recite definitions but cannot justify choices using ACF, PACF, and residual checks.
You are forecasting daily rides for a city. The series has an upward trend and a strong weekly pattern; the ACF decays slowly, with big spikes at lags 7, 14, and 21. What transformations would you apply before fitting an ARIMA-type model, and how would you validate that you did enough?
Sample Answer
Use seasonal differencing at lag 7 and then difference once more if trend remains, and only add a variance-stabilizing transform if residual variance grows with level. The slow ACF decay suggests a unit root or trend, and the repeated spikes at multiples of 7 suggest weekly seasonality, so $\nabla_7 x_t = x_t - x_{t-7}$ is a first move, followed by $\nabla x_t$ if needed. You validate by checking that the residual ACF has no significant structure and that the mean and variance look stable over time. If PACF and ACF of the differenced series cut off or decay quickly, and residuals look like white noise, you are done.
You have weekly revenue per user for a subscription product. Variance increases with the mean and the ACF of the raw series shows persistence, but after taking logs the ACF still decays slowly. Would you log-difference, difference then log, or do something else, and what tells you that your choice is correct?
A quant desk gives you daily realized volatility. It is strictly positive, shows volatility clustering, and the ACF of levels is high while the ACF of squared returns is also high. Walk through how you would decide between using $\log(\sigma_t)$, differencing, or leaving it in levels with a mean-reverting model, and what diagnostics you would rely on.
You are modeling hourly request counts for a large service. STL decomposition shows multiple seasonalities, daily and weekly, and residuals still show autocorrelation at lags 24 and 168. How would you modify your decomposition or transformations to address this before forecasting?
You difference a series once to remove trend, but the differenced ACF has a strong negative spike at lag 1 and then quickly dies out, and forecasts look too reactive. What is the most likely issue, and what would you try next to fix it?
Classical Forecasting: ARIMA, SARIMA, and Exponential Smoothing
You will often be asked to pick and tune a classical model under time pressure, then defend your assumptions and backtesting setup. Candidates stumble when translating business context into model orders, seasonal terms, and diagnostics such as the Ljung-Box test and prediction intervals.
Forecast Evaluation, Backtesting, and Production Constraints
Forecast evaluation separates candidates who have deployed models from those who haven't. You'll design backtesting frameworks, choose appropriate metrics, and handle the practical constraints of production forecasting systems.
The trap here is evaluation methodologies that look rigorous but contain subtle biases. Many candidates suggest standard train/test splits or use MAPE without thinking about zero values and outliers. Real systems need rolling-origin backtests that respect information availability and metrics that align with business impact.
Expect questions that probe how you evaluate forecasts without leaking future information, including rolling-origin backtests and metrics like MAPE, sMAPE, and pinball loss. It is easy to name a metric; it is harder to explain when it fails and how you would monitor drift in production.
You are forecasting daily rides for 200 cities and you have 2 years of history. How would you design a rolling origin backtest so every forecast only uses information available at that time, and how would you aggregate results across cities?
Sample Answer
Reason through it: First, you pick an initial training window, for example the first 12 months, and a forecast horizon $h$, for example 7 days. Then you define cutoffs $t_1, t_2, \dots$ and for each cutoff you refit using data up to $t_i$, forecast $\hat{y}_{t_i+1:t_i+h}$, and score against realized $y_{t_i+1:t_i+h}$, never touching data beyond $t_i$ during fitting or feature construction. You roll forward by a fixed step, often 1 day or 1 week, and keep the training policy fixed, expanding window or fixed length, based on how you will train in production. For aggregation, you report both macro averages (each city equal weight) and volume-weighted averages (weight by actual demand) because the former reflects robustness and the latter reflects business impact.
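The procedure above can be sketched as a small helper. The expanding-window loop is the point; the naive last-value baseline and toy series are illustrative stand-ins:

```python
import numpy as np

def rolling_origin_backtest(y, initial, horizon, step, fit_predict):
    """Score a forecaster with expanding-window rolling-origin cutoffs.

    fit_predict(train) must return a forecast for the next `horizon`
    points using only `train`, so no future data can leak in.
    """
    scores = []
    for cutoff in range(initial, len(y) - horizon + 1, step):
        train = y[:cutoff]
        actual = y[cutoff:cutoff + horizon]
        forecast = fit_predict(train)
        scores.append(np.mean(np.abs(forecast - actual)))  # MAE per origin
    return float(np.mean(scores))

# Usage with a "repeat the last value" baseline on a toy series.
def naive(train):
    return np.repeat(train[-1], 7)

y = np.sin(np.arange(200) / 5) + 1.0
mae = rolling_origin_backtest(y, initial=100, horizon=7, step=7, fit_predict=naive)
```

In a multi-city setting you would run this per city and then compute both the macro and volume-weighted aggregates described above.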
Your team reports MAPE for a demand forecast, but your product includes many low volume SKUs with frequent zeros and occasional spikes. What metric would you use instead, and how would you explain MAPE failure modes to a stakeholder?
You trained a probabilistic forecast that outputs $P10, P50, P90$ for each day. How would you evaluate it offline, and what would you look for to detect miscalibration and horizon dependent degradation?
In production, your forecast error suddenly worsens after a pricing change, but only in a subset of regions. What monitoring and alerting would you set up to detect drift early without triggering constant false alarms, and how would you triage root cause?
You are asked to backtest a strategy that trades on next day forecasts of a time series, but the feature set includes a rolling z score and a target encoding built from historical outcomes. Where can leakage sneak in, and what concrete checks would you implement to prove the backtest is clean?
Anomaly Detection and Change Point Analysis
Anomaly detection questions focus on threshold setting, false positive control, and distinguishing real signals from noise. Companies like Citadel and Two Sigma ask these because trading and payment systems need reliable alerting with minimal false alarms.
Candidates typically propose simple statistical thresholds without considering seasonality, autocorrelation, or multiple testing problems. The winning approach involves seasonal decomposition, dynamic thresholds that adapt to volatility, and statistical corrections when monitoring hundreds of series simultaneously.
When you design anomaly detection, you must balance false positives, delayed detection, and seasonality, then explain your thresholds and alerting logic. Interviews reveal gaps when candidates ignore multiple testing, autocorrelation, or how to label and evaluate anomalies realistically.
You own alerting for hourly payment failures, with strong day-of-week seasonality and autocorrelation. How do you set an anomaly threshold that controls false positives while still catching a 5 to 10 percent sustained increase within 2 hours?
Sample Answer
This question is checking whether you can separate signal from predictable structure, then tune detection under latency constraints. You should first model expected failures with seasonality and correlation, for example a regression or state space model with day-of-week effects, then monitor residuals $r_t = y_t - \hat{y}_t$ rather than raw $y_t$. Set thresholds on standardized residuals using an estimate of one-step-ahead uncertainty, and add a persistence rule like 2 consecutive breaches to reduce noise-driven alerts. Calibrate the cutoff to a target false alarm rate using backtests on clean periods, and report the tradeoff curve between detection delay and false positives.
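A deterministic toy sketch of the residual-plus-persistence idea (the seasonal profile and the size of the jump are made up; in practice the expectation would come from a fitted model and the data would be noisy):

```python
import numpy as np

t = np.arange(24 * 14)                             # two weeks of hourly data
expected = 100 + 20 * np.sin(2 * np.pi * t / 24)   # seasonal expectation
y = expected.copy()
y[200:] *= 1.40                                    # sustained jump at hour 200

# Monitor standardized residuals, not raw counts:
# for count data, variance is roughly equal to the mean.
z = (y - expected) / np.sqrt(expected)

threshold = 3.0
breach = z > threshold

# Persistence rule: page only on 2 consecutive breaches.
alerts = breach & np.roll(breach, 1)
alerts[0] = False                                  # roll wraps around; ignore it
first_alert = int(np.argmax(alerts))
```

With this setup the first page fires one hour after the shift begins, which is the detection-delay side of the tradeoff curve mentioned above.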
A streaming service monitors video start failures across 200 countries every minute, and runs the same anomaly test per country. How do you avoid being paged constantly due to multiple testing, while still surfacing real regional incidents quickly?
You detect a change point in conversion rate after a product launch, but traffic composition also shifted toward new users and a different set of geos. How do you decide whether it is a real product impact versus a confounder, and how would you adjust the analysis?
A quant strategy monitors intraday returns for structural breaks, but volatility is heteroskedastic and autocorrelated. Which change point method would you use, what statistic would you monitor, and how would you validate it without lookahead bias?
Amazon logistics tracks daily package delivery times and sees occasional extreme spikes due to weather, plus gradual drifts due to capacity constraints. How would you design an alerting system that distinguishes isolated outliers from sustained regime shifts, and how would you label data for evaluation?
Financial Time Series and Market Microstructure Basics
Financial time series questions appear at quant funds and fintech companies, testing your understanding of returns, volatility, and market microstructure effects. These roles demand precision because small modeling errors become large trading losses.
The classic mistake is building strategies that work in backtests but fail live due to survivorship bias, transaction costs, or lookahead bias in feature construction. Interviewers probe whether you understand bid-ask spreads, market impact, and the difference between research environments and production trading systems.
For quant roles, interviews check whether you understand returns, volatility clustering, and non-IID behavior, plus how backtests can lie through lookahead and survivorship bias. Candidates struggle most with articulating a robust research workflow, from feature design to risk controls and validation.
You are modeling daily returns for a single stock to forecast next-day volatility. Do you use simple returns or log returns, and how does that choice affect aggregation and your interpretation of extreme moves?
Sample Answer
The standard move is to use log returns $r_t = \log(P_t/P_{t-1})$ because they add over time, so a multi-day return is approximately $\sum r_t$. The exception is large moves and low prices: log returns and simple returns diverge when $|R_t|$ is big, and risk metrics like drawdowns live naturally in simple-return space. If you care about exact compounding, remember that simple returns multiply while log returns add, and choose based on what you report and how you aggregate. For microstructure-heavy data, your return definition also interacts with bid-ask bounce, so consistent price conventions matter as much as the formula.
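A quick numeric check of the aggregation point, using toy prices:

```python
import numpy as np

prices = np.array([100.0, 105.0, 94.5, 99.0])

simple = prices[1:] / prices[:-1] - 1        # R_t
log_r = np.log(prices[1:] / prices[:-1])     # r_t = log(P_t / P_{t-1})

# Log returns add across days; simple returns compound multiplicatively.
total_from_log = np.exp(log_r.sum()) - 1
total_from_simple = np.prod(1 + simple) - 1
assert np.isclose(total_from_log, total_from_simple)
assert np.isclose(total_from_simple, prices[-1] / prices[0] - 1)

# For small moves the two are close; for big moves they diverge:
# the 105 -> 94.5 day is -10% simple but about -10.54% in log terms.
print(simple[1], log_r[1])
```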
You backtest an intraday mean-reversion strategy using midprice returns and it looks strong, but live trading loses money. What microstructure effects and costs are you likely missing, and how would you adjust the backtest to be realistic?
Your factor model uses a feature built from 'next day close to close return' but computed in a wide table that includes future dates. The backtest Sharpe is 3.0. Walk through how you would detect lookahead and survivorship bias, then redesign the research workflow to prevent them.
You observe volatility clustering in daily equity returns. Describe how you would test for it and pick a model family, and explain what you would monitor to ensure your volatility forecasts remain calibrated as regimes change.
You compute 1-minute returns using last-trade prices and see strong negative first-lag autocorrelation. What is bid-ask bounce, how would you verify it, and what alternative price series or sampling scheme would you use?
How to Prepare for Time Series Interviews
Build Features With Real Messiness
Practice creating lag and rolling features on datasets with missing timestamps, late-arriving data, and timezone issues. Use pandas with explicit UTC conversion and forward-fill strategies that respect information availability at prediction time.
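One way to practice this in pandas, on a toy daily series with a missing day (the `shift(1)`-before-`rolling` pattern keeps the feature window strictly historical):

```python
import pandas as pd

# Daily demand with a missing day and local-time timestamps (toy data).
raw = pd.DataFrame({
    "ts": ["2024-03-01 09:00-05:00", "2024-03-02 09:00-05:00",
           "2024-03-04 09:00-05:00"],          # 2024-03-03 is missing
    "demand": [10.0, 12.0, 11.0],
})
# Explicit UTC conversion, then snap to the day boundary.
raw["ts"] = pd.to_datetime(raw["ts"], utc=True).dt.floor("D")

# Reindex onto a complete daily grid so gaps are explicit, not silently skipped.
s = raw.set_index("ts")["demand"].asfreq("D")

# shift(1) before rolling: the window ends at t-1, so the feature
# only uses information available at prediction time.
feat = s.shift(1).rolling(window=3, min_periods=1).mean()
```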
Master ACF Pattern Recognition
Train yourself to diagnose stationarity issues from autocorrelation plots. Plot the ACF of raw data, first differences, and log differences until you can instantly recognize trend persistence, seasonal patterns, and white noise signatures.
Design Backtests That Mirror Production
Always propose rolling origin validation with realistic data cutoffs that simulate how your model would perform with only historical information. Account for training time, feature computation delays, and the business constraints of when predictions are actually needed.
Question Every Evaluation Metric
For each forecasting scenario, think through why MAPE, RMSE, or accuracy might give misleading results. Practice explaining metric failure modes to non-technical stakeholders and proposing alternatives that align with business costs.
Simulate Multiple Testing Problems
When designing anomaly detection for multiple time series, always address how you'll control family-wise error rates. Practice calculating Bonferroni corrections and explaining why naive per-series thresholds will flood you with false positives.
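A back-of-the-envelope sketch of why naive per-series thresholds flood you with pages, and what the Bonferroni correction changes (the counts are illustrative):

```python
# Monitoring 200 series, each tested at a 1% per-series false-alarm rate.
n_series = 200
alpha_per_series = 0.01

# Chance of at least one false alarm per check, across all series: ~0.87.
p_any = 1 - (1 - alpha_per_series) ** n_series

# Bonferroni: divide the target family-wise rate by the number of tests.
alpha_family = 0.01
alpha_bonferroni = alpha_family / n_series   # 0.00005 per series

# Family-wise false-alarm chance drops back to roughly the 1% target.
p_any_corrected = 1 - (1 - alpha_bonferroni) ** n_series
print(round(p_any, 2), round(p_any_corrected, 3))
```

The cost of the correction is power: each series now needs a much more extreme statistic to alert, which is why less conservative procedures (for example false-discovery-rate control) are often discussed as alternatives.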
How Ready Are You for Time Series Interviews?
You inherit a dataset of user signups with timestamps in local time across multiple time zones. The model will use daily features and predict next-day signups. What is the best preprocessing approach to avoid leakage and time-alignment bugs?
Frequently Asked Questions
How much time series depth do I need for interviews in Data Scientist or Quantitative Researcher roles?
You should be comfortable with stationarity, autocorrelation, differencing, seasonality, and basic forecasting baselines like ARIMA and exponential smoothing. You also need to explain validation for time dependent data, including walk-forward backtesting and why random splits leak information. For quant roles, expect deeper questions on stochastic processes, state space models, and how assumptions break in real markets.
Which companies tend to ask the most time series interview questions?
You will see the most time series depth at hedge funds, proprietary trading firms, market makers, and quant research teams at banks. Time series also shows up heavily at ad tech, forecasting, and operations-heavy companies like logistics, marketplaces, and subscription businesses. If the team owns forecasting, anomaly detection, or experimentation with sequential data, you should expect time series questions.
Do I need to code for time series interviews, and what kinds of coding tasks appear?
Yes, you often need to code, especially to build features like lags and rolling statistics, run walk-forward evaluation, or implement a simple forecasting baseline. You may also be asked to diagnose leakage or compute metrics over horizons and groups. Practice these patterns with realistic prompts at datainterview.com/coding.
How do time series interviews differ between Data Scientist and Quantitative Researcher roles?
As a Data Scientist, you will usually focus on practical forecasting workflows, feature engineering, monitoring, and model selection under business constraints. As a Quantitative Researcher, you will be pushed harder on statistical assumptions, dependence structure, microstructure effects, and rigorous backtesting that avoids look-ahead bias. You should be able to discuss both predictive performance and whether your inference is valid under autocorrelation.
How can I prepare for time series interviews if I have no real-world time series experience?
You can build a small portfolio project that uses public data, like energy demand, retail sales, or macro indicators, and demonstrate walk-forward backtests with clear baselines. Make sure you show how you handle seasonality, missing timestamps, and exogenous variables, and include a simple error analysis by horizon. Use datainterview.com/questions to practice explaining your approach and tradeoffs clearly.
What are the most common mistakes candidates make in time series interviews?
You often lose points by using random train/test splits, leaking future information through scaling or feature windows, or tuning on the test set instead of a rolling validation. Another common mistake is ignoring seasonality and calendar effects, or evaluating with a single aggregate metric that hides horizon-specific errors. You should also avoid claiming causality from correlated time series without addressing confounding and autocorrelation.
