SQL Interview Questions

Dan Lee, Data & AI Lead
Last updated: March 13, 2026

SQL is the most tested skill in data interviews, appearing in 95% of Data Analyst, Data Scientist, and Data Engineer roles at top-tier companies. Meta, Google, Amazon, Airbnb, Uber, and Stripe all use SQL as a primary filter to assess your ability to work with real data at scale. Unlike coding interviews where algorithms dominate, SQL questions mirror the exact problems you'll solve on the job: measuring user engagement, computing business metrics, and debugging production queries.

What makes SQL interviews particularly challenging is that seemingly simple questions hide multiple layers of complexity. A question about computing monthly revenue sounds straightforward until you realize you need to handle refunds in different months, avoid double-counting from table joins, and deal with NULL customer IDs for guest purchases. The difference between a junior and senior data professional is often their ability to spot these edge cases and write queries that work correctly in production.

Here are the top 33 SQL interview questions organized by the core skills companies actually test, from basic filtering semantics to advanced performance optimization.


Core SELECT, Filtering, and NULL Semantics

Most candidates fail SQL interviews not because they can't write SELECT statements, but because they misunderstand how NULL values behave in filters and comparisons. Companies use these foundational questions to quickly identify candidates who will write buggy queries that return incorrect results in production.

The most common mistake is assuming that NOT IN works the same way when the subquery contains NULLs. If your subquery returns even a single NULL value, NOT IN will return zero rows, not the filtered set you expect.
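A minimal sketch of the trap, using hypothetical `orders` and `banned_users` tables:

```sql
-- If banned_users.user_id contains even one NULL, this returns zero rows:
-- "x NOT IN (1, 2, NULL)" evaluates to UNKNOWN for every x, never TRUE.
SELECT *
FROM orders
WHERE user_id NOT IN (SELECT user_id FROM banned_users);

-- Fix 1: exclude NULLs inside the subquery.
SELECT *
FROM orders
WHERE user_id NOT IN (
    SELECT user_id FROM banned_users WHERE user_id IS NOT NULL
);

-- Fix 2: NOT EXISTS, which is unaffected by NULLs in the subquery.
SELECT o.*
FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM banned_users b WHERE b.user_id = o.user_id
);
```

NOT EXISTS is usually the safer default in interviews because its behavior does not change when the subquery column is nullable.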


Start here: you are tested on whether you can accurately filter and shape rows under time pressure. Candidates often slip on NULL behavior, boolean logic, date handling, and subtle differences between WHERE and HAVING.

At Netflix, you are pulling a list of active subscriptions where country is not 'US'. The table has many NULL countries due to missing data. Write a query that returns only truly non-US rows and excludes NULLs, and explain why a naive filter is wrong.

Netflix · Easy

Sample Answer

Most candidates default to `WHERE country <> 'US'`, but that fails here because comparisons with NULL evaluate to UNKNOWN and get filtered out by `WHERE`. You need to be explicit about NULL handling, for example `WHERE country IS NOT NULL AND country <> 'US'`. If the requirement instead is to treat NULL as non-US, you would use `WHERE COALESCE(country, '') <> 'US'` or `WHERE country IS NULL OR country <> 'US'`. The key is remembering that SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN.
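A sketch of both variants, assuming a `subscriptions` table with a `country` column and a `status` column marking active rows (the exact schema is not given in the prompt):

```sql
-- Requirement as stated: truly non-US rows, NULL countries excluded.
SELECT *
FROM subscriptions
WHERE status = 'active'         -- "active" flag is an assumed column
  AND country IS NOT NULL
  AND country <> 'US';

-- Alternative requirement: treat NULL country as non-US.
SELECT *
FROM subscriptions
WHERE status = 'active'
  AND (country IS NULL OR country <> 'US');
```

Stating which of the two interpretations you are implementing, before writing either filter, is itself part of a strong answer.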

Practice more Core SELECT, Filtering, and NULL Semantics questions

Aggregations, Grouping, and Conditional Metrics

Aggregation questions test whether you understand the difference between row-level and group-level logic, which is critical for building accurate business metrics. Data professionals who confuse WHERE and HAVING clauses or misuse window functions in aggregations create dashboards that show wrong numbers to executives.

The key insight is that conditional aggregation with CASE statements is often cleaner and more performant than multiple subqueries. Instead of writing separate queries for each metric, you can compute DAU, conversion rate, and average order value in a single pass through the data.


In analytics and product SQL interviews, you need to translate a metric definition into correct GROUP BY logic. You will likely struggle if you mix row-level filters with aggregate filters, or if you mishandle distinct counts, cohorts, and conditional aggregation.

At Meta, you are given events(user_id, event_time, event_name). Define DAU for each date as users who had at least one 'app_open' that day, and also compute the share of DAU who had at least one 'purchase' the same day.

Meta · Medium

Sample Answer

Compute per-day DAU with a distinct count of app_open users, and compute purchase share using conditional distinct counting in the same grouped query. Group by date(event_time) and set DAU to COUNT(DISTINCT CASE WHEN event_name = 'app_open' THEN user_id END). Then compute purchasers as COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN user_id END) and divide by DAU, guarding against zero with NULLIF. Do not filter the table to app_open events only, or the purchase branch will never match and you will undercount purchasers.
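Putting the answer together as one grouped query; the `DATE()` call stands in for your dialect's date-truncation function:

```sql
SELECT
    DATE(event_time) AS event_date,
    -- DAU: distinct users with at least one app_open that day
    COUNT(DISTINCT CASE WHEN event_name = 'app_open'
                        THEN user_id END) AS dau,
    -- share of DAU who also purchased; NULLIF avoids division by zero
    COUNT(DISTINCT CASE WHEN event_name = 'purchase'
                        THEN user_id END) * 1.0
      / NULLIF(COUNT(DISTINCT CASE WHEN event_name = 'app_open'
                                   THEN user_id END), 0) AS purchase_share
FROM events
GROUP BY DATE(event_time);
```

Note that the table is not pre-filtered to app_open rows; both CASE branches see every event for the day.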

Practice more Aggregations, Grouping, and Conditional Metrics questions

Joins, Relationships, and Duplicates

Join questions separate candidates who write correct queries from those who accidentally multiply rows and return inflated metrics. This skill matters because real production tables have one-to-many relationships, and a poorly written join can make your revenue numbers 10x too high.

Always ask yourself: am I joining to a table that has multiple rows per key? If yes, you need to either filter to one row per key before joining, or use window functions to deduplicate after joining. The FROM clause structure determines whether your final aggregations will be correct.


Expect join questions to probe whether you can reason about cardinality and avoid accidental row multiplication. You will get tripped up when the interviewer introduces one-to-many relationships, missing keys, or requires anti-joins and de-duplication.

At Spotify, you need daily active users by country from `events(user_id, event_ts)` and `users(user_id, country, updated_at)`, where `users` has multiple rows per user due to profile updates. Write a query that counts distinct active users per day and country without multiplying rows.

Spotify · Medium

Sample Answer

You could join `events` to `users` directly, or you could first collapse `users` to one row per `user_id` and then join. The direct join looks simpler but it multiplies each event by the number of profile versions, inflating counts. The de-dup wins here because it restores the intended 1-to-1 join key, for example by picking the latest profile per user with `ROW_NUMBER() = 1`. Then you group by `date(event_ts)` and `country`, and count distinct `user_id`.
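A sketch of the de-duplicating approach, using the column names from the prompt:

```sql
-- Collapse users to one row per user_id (latest profile version) first,
-- so the join key is one-to-one and events are not multiplied.
WITH latest_profile AS (
    SELECT user_id, country
    FROM (
        SELECT user_id, country,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM users
    ) ranked
    WHERE rn = 1
)
SELECT
    DATE(e.event_ts)           AS activity_date,
    u.country,
    COUNT(DISTINCT e.user_id)  AS dau
FROM events e
JOIN latest_profile u ON u.user_id = e.user_id
GROUP BY DATE(e.event_ts), u.country;
```

COUNT(DISTINCT user_id) is still a useful safety net, but the de-duplication is what keeps each user attributed to exactly one country.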

Practice more Joins, Relationships, and Duplicates questions

Subqueries, CTEs, and Set Operations

Subquery and CTE questions test your ability to break down complex business logic into readable, maintainable SQL. Senior data professionals use CTEs to make their intent clear, while junior candidates write nested subqueries that are impossible to debug.

Window functions inside CTEs are your best tool for ranking and filtering patterns. When you need the top N items per group or consecutive day streaks, build your window function first in a CTE, then filter the results in the outer query.


Rather than writing one giant query, you are evaluated on how you structure logic into readable, correct steps. You can lose points by using correlated subqueries incorrectly, misunderstanding UNION vs UNION ALL, or producing non-deterministic results.

At Spotify, you have a table plays(user_id, track_id, played_at). Return the top 3 tracks per day by unique listeners for the last 7 days, and include ties deterministically so results do not change between runs.

Spotify · Hard

Sample Answer

Reason through it: first, you filter plays to the last 7 days and compute daily unique listeners per track with a GROUP BY on date(played_at), track_id and COUNT(DISTINCT user_id). Next, you rank tracks within each day using a window function like DENSE_RANK() over (PARTITION BY play_date ORDER BY listeners DESC, track_id ASC) so ties are handled and ordering is deterministic. Then you select ranks <= 3 and output play_date, track_id, listeners. If the question forces subqueries or CTEs, you put the aggregation in a CTE, then apply the ranking in an outer SELECT to keep the logic readable and correct.
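The same reasoning as SQL; interval syntax varies by dialect, so treat `CURRENT_DATE - INTERVAL '7' DAY` as a placeholder:

```sql
WITH daily_listeners AS (
    -- Step 1: unique listeners per track per day, last 7 days only
    SELECT
        DATE(played_at)         AS play_date,
        track_id,
        COUNT(DISTINCT user_id) AS listeners
    FROM plays
    WHERE played_at >= CURRENT_DATE - INTERVAL '7' DAY
    GROUP BY DATE(played_at), track_id
),
ranked AS (
    -- Step 2: rank within each day; track_id breaks ties deterministically
    SELECT dl.*,
           DENSE_RANK() OVER (
               PARTITION BY play_date
               ORDER BY listeners DESC, track_id ASC
           ) AS rnk
    FROM daily_listeners dl
)
SELECT play_date, track_id, listeners
FROM ranked
WHERE rnk <= 3
ORDER BY play_date, rnk;
```

DENSE_RANK keeps tied tracks in the result; if the interviewer wants exactly three rows per day, switch to ROW_NUMBER and say why.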

Practice more Subqueries, CTEs, and Set Operations questions

Window Functions and Advanced Analytics SQL

Window function questions are where companies assess your ability to solve advanced analytics problems that simple GROUP BY cannot handle. These questions mirror real product analytics work: cohort analysis, rolling metrics, and ranking systems that power recommendation engines.

The secret to window functions is understanding the frame specification. ROWS BETWEEN tells SQL exactly which rows to include in calculations like rolling sums, and getting this wrong will shift your numbers by days or weeks in time-series analysis.
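Both frames side by side, assuming a hypothetical `daily_revenue` table with exactly one row per country per day (with calendar gaps, `ROWS BETWEEN 6 PRECEDING` would cover more than seven calendar days, and you would need a date spine or a RANGE frame instead):

```sql
SELECT
    country,
    order_date,
    -- Running total from the start of the partition to the current row
    SUM(revenue) OVER (
        PARTITION BY country
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_revenue,
    -- Trailing 7-day sum: the current row plus the 6 preceding rows
    SUM(revenue) OVER (
        PARTITION BY country
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7d_revenue
FROM daily_revenue;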


When the question asks for top-N per group, running totals, retention curves, or sessionization, you are expected to reach for window functions. You may struggle with partitioning, ordering, frames, and the difference between windowed and grouped aggregates.

Meta wants to show the 3 most recent posts per user in the feed. Given a posts table (user_id, post_id, created_at), write SQL to return only the latest 3 posts per user, breaking ties deterministically.

Meta · Easy

Sample Answer

This question checks whether you can use window functions for top-N per group without accidentally collapsing rows. Rank posts per user with ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC, post_id DESC), then filter to row_number <= 3 in an outer query, or with QUALIFY if the engine supports it. The deterministic tie-break comes from the extra ORDER BY column; without it, your results can vary between runs.
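The answer as a query, using the column names from the prompt:

```sql
WITH ranked_posts AS (
    SELECT
        user_id,
        post_id,
        created_at,
        -- post_id DESC breaks ties when two posts share a created_at
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY created_at DESC, post_id DESC
        ) AS rn
    FROM posts
)
SELECT user_id, post_id, created_at
FROM ranked_posts
WHERE rn <= 3;
```

In engines with QUALIFY (Snowflake, BigQuery, DuckDB), the CTE can be dropped and the filter written as `QUALIFY rn <= 3` directly.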

Practice more Window Functions and Advanced Analytics SQL questions

Query Performance, Debugging, and Edge Cases

Performance and debugging questions test whether you can optimize queries for production scale and diagnose problems when things go wrong. These skills matter because a query that works on sample data might time out or return wrong results when run against billions of rows.

Query execution plans reveal the truth about performance: are you scanning full tables when you should hit an index, or creating massive intermediate results from poorly ordered joins? Learning to read EXPLAIN plans turns you from someone who writes queries into someone who optimizes data systems.


To close out, you are judged on whether your SQL would hold up in production style datasets at Meta, Google, Amazon, or Stripe. You can stumble by ignoring explain plans, indexing implications, predicate pushdown, and correctness under messy data edge cases.

At Meta, you need daily active users per country from a 10B row events table. Your query joins events to users and filters to the last 30 days, but it scans the full table. What specific query changes and physical design choices would you make to force predicate pushdown and avoid the scan?

Meta · Hard

Sample Answer

The standard move is to filter as early as possible and join as late as possible, so you put the 30 day filter in a subquery or CTE that only selects needed columns, then aggregate, then join. But here, predicate pushdown can be blocked by wrapping the partition column in a function, for example WHERE DATE(event_ts) >= ..., so you should filter with a sargable predicate like event_ts >= CURRENT_TIMESTAMP - INTERVAL '30' DAY. Partition on event_date or event_ts, cluster or sort by user_id, and only project the columns you need so the engine can prune partitions and use data skipping. Verify with EXPLAIN that the filter is applied at the scan and that the join order did not pull the users table first.
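A before-and-after sketch; the interval syntax and partition setup are dialect-dependent:

```sql
-- Anti-pattern: wrapping the partition column in a function
-- blocks predicate pushdown and forces a full scan.
--   WHERE DATE(event_ts) >= CURRENT_DATE - INTERVAL '30' DAY

-- Sargable version: filter on the raw column, project only needed
-- columns, and shrink events before the join.
WITH recent AS (
    SELECT user_id, DATE(event_ts) AS event_date
    FROM events
    WHERE event_ts >= CURRENT_TIMESTAMP - INTERVAL '30' DAY
)
SELECT
    r.event_date,
    u.country,
    COUNT(DISTINCT r.user_id) AS dau
FROM recent r
JOIN users u ON u.user_id = r.user_id
GROUP BY r.event_date, u.country;
```

Run EXPLAIN on both versions: the sargable filter should appear at the scan node with only the recent partitions read.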

Practice more Query Performance, Debugging, and Edge Cases questions

How to Prepare for SQL Interviews

Practice NULL edge cases systematically

For every filter or join condition you write, ask what happens if the column contains NULL values. Test your assumptions by creating small sample tables with NULLs and running your queries against them.
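A tiny test harness you can run in any SQL engine to see the NULL rules directly:

```sql
CREATE TABLE t (x INT);
INSERT INTO t VALUES (1), (NULL);

SELECT COUNT(*) FROM t;        -- 2: COUNT(*) counts all rows
SELECT COUNT(x) FROM t;        -- 1: COUNT(column) skips NULLs
SELECT * FROM t WHERE x <> 1;  -- 0 rows: NULL <> 1 is UNKNOWN, filtered out

-- To keep the NULL row, test for it explicitly:
SELECT * FROM t WHERE x <> 1 OR x IS NULL;  -- returns only the NULL row
```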

Build queries incrementally from inner to outer

Start with the most complex CTE or subquery, verify it returns the right data, then build the next layer. This approach catches logical errors early and makes debugging much faster during interviews.

Always specify your window frame explicitly

Instead of letting SQL guess, write ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for running totals or ROWS BETWEEN 6 PRECEDING AND CURRENT ROW for 7-day windows. Explicit frames prevent off-by-one errors.

Test join logic with small manual examples

Before writing complex joins, sketch out 3-4 rows from each table on paper and manually trace through what the join should produce. This catches cardinality issues before you run the query.

Use consistent date filtering patterns

Establish habits like DATE(timestamp_column) = '2024-01-01' for exact day matches or timestamp_column >= '2024-01-01' AND timestamp_column < '2024-01-02' for day ranges. Consistent patterns reduce bugs and improve readability.
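Both patterns side by side, with a hypothetical `orders` table:

```sql
-- Exact day match; the function call on the column can block index use:
SELECT *
FROM orders
WHERE DATE(created_at) = '2024-01-01';

-- Half-open range for the same day; index friendly, and correct even
-- when created_at has sub-second precision:
SELECT *
FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-01-02';
```

The half-open form also composes cleanly into multi-day windows without off-by-one errors at the boundaries.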


Frequently Asked Questions

How much SQL depth do I need for a Data Analyst, Data Scientist, or Data Engineer interview?

You should be comfortable writing joins, aggregations, subqueries or CTEs, and window functions, and you should understand how NULLs affect logic and aggregates. For analyst and scientist roles, focus on accurate analytics SQL and edge cases like duplicates and time ranges. For engineering roles, add deeper knowledge of query performance, indexing basics, data modeling, and incremental loads.

Which companies tend to ask the most SQL interview questions?

Data-heavy companies often lean hard on SQL, including Meta, Google, Amazon, Microsoft, Uber, Lyft, Airbnb, Netflix, and Stripe. Many fintech, marketplace, and ad tech firms also use SQL as a primary filter because day-to-day work is query driven. You should expect at least one SQL round for most roles where you will analyze product, growth, or pipeline data.

Do SQL interviews require live coding, or is it mostly discussion?

Many interviews include writing SQL live in a shared editor, or in a take home query task, not just talking through concepts. You may be asked to produce a correct query, then iterate after feedback to handle edge cases and improve readability. Practice writing queries from scratch at datainterview.com/coding and review common prompts at datainterview.com/questions.

How do SQL interviews differ across Data Analyst, Data Scientist, and Data Engineer roles?

For Data Analysts, questions emphasize business metrics, cohorting, funnels, and clean aggregations with correct grouping and filtering. For Data Scientists, expect experiment analysis, feature extraction, and time series style queries, often using window functions and careful joins. For Data Engineers, expect more about correctness at scale, schema design, deduplication, slowly changing dimensions, and performance considerations like partitioning and avoiding expensive cross joins.

How can I prepare for SQL interviews if I have no real world SQL experience?

Start by learning a consistent pattern: define the grain, list the tables, decide join keys, then build the query in layers using CTEs. Work through realistic datasets and prompts, then compare your outputs to expected results to catch logic bugs like double counting. Use datainterview.com/questions to study common scenarios and datainterview.com/coding to practice writing complete solutions under time constraints.

What are common SQL mistakes that cause people to fail interviews?

You often lose points by double counting from many-to-many joins, forgetting DISTINCT when needed, or grouping at the wrong grain. Another common issue is mishandling NULLs, for example using NOT IN with NULLs, or filtering in WHERE instead of ON and changing join behavior. You should also avoid ambiguous columns, unreadable nested queries, and missing edge cases like ties, multiple events per user, or incomplete date ranges.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn