Join ML Engineer Interview MasterClass (April Cohort) led by FAANG Data Scientists | Just 6 seats remaining...
ML Engineer MasterClass (April) | 6 seats left
By the end of this lesson, you will:
Your manager at Netflix, Alex, wants insights into how users engage with the platform. They need a summary of:
Alex asks:
Can you group user watch data and provide aggregated statistics for each user?
Your task is to query the watch_history table and provide the requested insights.
| history_id | user_id | series_id | watch_date | watch_duration_minutes | completed |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 2024-02-10T18:30:00 | 60 | false |
| 2 | 2 | 4 | 2024-01-20T21:00:00 | 50 | true |
| 3 | 3 | 1 | 2024-02-01T19:45:00 | 75 | true |
| 4 | 4 | 3 | 2024-02-15T17:10:00 | 30 | false |
| 5 | 5 | 10 | 2024-02-18T23:00:00 | 45 | true |
The GROUP BY clause allows us to group data by a specific column and apply aggregate functions to summarize the results.
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;GROUP BY column1 groups rows that share the same value in column1.To find the total number of shows watched per user, we can use COUNT() to count the number of series_id values for each user.
SELECT user_id, COUNT(series_id) AS total_shows_watched
FROM watch_history
GROUP BY user_id;| user_id | total_shows_watched |
|---|---|
| 1 | 3 |
| 2 | 3 |
| 3 | 2 |
| 4 | 2 |
| 5 | 2 |
What’s Happening?
COUNT(series_id) counts how many different shows each user has watched.GROUP BY user_id ensures each row in the output represents a unique user.To find the average time users spend watching, we use AVG() on the watch_duration_minutes column.
SELECT user_id, AVG(watch_duration_minutes) AS avg_watch_time
FROM watch_history
GROUP BY user_id;| user_id | avg_watch_time |
|---|---|
| 1 | 45.0 |
| 2 | 60.0 |
| 3 | 67.5 |
| 4 | 30.0 |
| 5 | 42.5 |
What’s Happening?
AVG(watch_duration_minutes) calculates the average watch time per user.GROUP BY user_id ensures each row represents a single user.We can use MAX() and MIN() to determine the longest and shortest viewing session for each user.
1SELECT user_id,
2 MAX(watch_duration_minutes) AS max_watch_time,
3 MIN(watch_duration_minutes) AS min_watch_time
4FROM watch_history
5GROUP BY user_id;| user_id | max_watch_time | min_watch_time |
|---|---|---|
| 1 | 60 | 25 |
| 2 | 90 | 40 |
| 3 | 75 | 60 |
| 4 | 50 | 30 |
| 5 | 45 | 40 |
What’s Happening?
MAX(watch_duration_minutes) returns the longest watch session for each user.MIN(watch_duration_minutes) returns the shortest watch session per user.Alex wants to know how many shows each user has watched.
Filter condition:
series_id each user has watched.total_shows_watched in descending order.| user_id | total_shows_watched |
|---|---|
| 1 | 3 |
| 2 | 3 |
| 3 | 2 |
| 4 | 2 |
| 5 | 2 |
Write an SQL query to return the requested data.
Alex also wants to see how much time each user spends watching on average.
Filter condition:
avg_watch_time in descending order.| user_id | avg_watch_time |
|---|---|
| 3 | 67.5 |
| 2 | 60.0 |
| 1 | 45.0 |
| 5 | 42.5 |
| 4 | 30.0 |
Write an SQL query to return the requested data.