Join ML Engineer Interview MasterClass (April Cohort) led by FAANG Data Scientists | Just 6 seats remaining...
ML Engineer MasterClass (April) | 6 seats left
By the end of this lesson, you will:
Your manager at Netflix, Alex, wants to identify power users—those who have watched at least 3 different series.
Can you find users who have watched at least 3 series?
Your task is to query the watch_history table and filter results based on aggregated values.
| history_id | user_id | series_id | watch_date | watch_duration_minutes | completed |
|---|---|---|---|---|---|
| 1 | 1 | 2 | “2024-02-10T18:30:00” | 60 | false |
| 2 | 2 | 4 | “2024-01-20T21:00:00” | 50 | true |
| 3 | 3 | 1 | “2024-02-01T19:45:00” | 75 | true |
| 4 | 4 | 3 | “2024-02-15T17:10:00” | 30 | false |
| 5 | 5 | 10 | “2024-02-18T23:00:00” | 45 | true |
| 6 | 1 | 5 | “2024-02-12T14:30:00” | 50 | true |
| 7 | 2 | 7 | “2024-02-15T19:00:00” | 40 | false |
| 8 | 3 | 9 | “2024-02-20T20:15:00” | 60 | true |
| 9 | 4 | 6 | “2024-02-22T21:45:00” | 30 | false |
| 10 | 5 | 8 | “2024-02-25T18:00:00” | 40 | true |
| 11 | 1 | 4 | “2024-03-01T20:00:00” | 25 | false |
| 12 | 2 | 2 | “2024-03-03T22:30:00” | 90 | true |
Both WHERE and HAVING allow us to filter data, but they work differently:
| Clause | Used For | Works With |
|---|---|---|
WHERE | Filters individual rows before aggregation | Regular column values |
HAVING | Filters aggregated results after GROUP BY | Aggregate functions |
Filtering Before Aggregation (WHERE)
Find all watch sessions that lasted more than 30 minutes:
SELECT user_id, series_id, watch_duration_minutes
FROM watch_history
WHERE watch_duration_minutes > 30;| user_id | series_id | watch_duration_minutes |
|---|---|---|
| 1 | 2 | 60 |
| 2 | 4 | 50 |
| 3 | 1 | 75 |
| 5 | 10 | 45 |
| 1 | 5 | 50 |
| 2 | 7 | 40 |
| 3 | 9 | 60 |
| 5 | 8 | 40 |
| 2 | 2 | 90 |
Filtering After Aggregation (HAVING)
Find users who have watched at least 3 different series:
1SELECT user_id, COUNT(series_id) AS total_shows_watched
2FROM watch_history
3GROUP BY user_id
4HAVING COUNT(series_id) >= 3;| user_id | total_shows_watched |
|---|---|
| 1 | 3 |
| 2 | 3 |
Since HAVING works after aggregation, we can use it to filter users based on how many shows they watched.
1SELECT user_id, COUNT(series_id) AS total_shows_watched
2FROM watch_history
3GROUP BY user_id
4HAVING COUNT(series_id) >= 3;| user_id | total_shows_watched |
|---|---|
| 1 | 3 |
| 2 | 3 |
What’s Happening?
COUNT(series_id) calculates how many different shows each user has watched.HAVING COUNT(series_id) >= 3 filters the result after aggregation, keeping only users who watched at least 3 series.You can use WHERE and HAVING together in a query.
For example, if we want to:
WHERE watch_duration_minutes > 30)HAVING COUNT(series_id) >= 2)We can write:
1SELECT user_id, COUNT(series_id) AS total_shows_watched
2FROM watch_history
3WHERE watch_duration_minutes > 30
4GROUP BY user_id
5HAVING COUNT(series_id) >= 2;| user_id | total_shows_watched |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 5 | 2 |
What’s Happening?
WHERE watch_duration_minutes > 30 filters out watch sessions before aggregation.GROUP BY user_id groups data by user.HAVING COUNT(series_id) >= 2 filters users based on the aggregated count.Alex wants a report of users who have watched at least 3 different series.
Filter condition:
total_shows_watched in descending order.| user_id | total_shows_watched |
|---|---|
| 1 | 3 |
| 2 | 3 |
Write an SQL query to return the requested data.
Alex also wants to identify users who watched at least 2 series but only counting watch sessions longer than 30 minutes.
Filter condition:
| user_id | total_shows_watched |
|---|---|
| 2 | 3 |
| 1 | 2 |
| 3 | 2 |
| 5 | 2 |
Write an SQL query to return the requested data.