PRO SQL Modules
SQL Basics1h 20m • 4 lessons
Multiple Tables1h 40m • 5 lessons
Query Restructuring40m • 2 lessons
Data Transformation1h 40m • 5 lessons
Analytical SQL2h • 6 lessons
GROUP BY

Dan Lee
Lesson Objectives
By the end of this lesson, you will:
- Learn how to summarize data using GROUP BY
- Understand aggregate functions like SUM(), COUNT(), AVG(), MIN(), MAX()
- Write SQL queries to analyze user engagement with Netflix’s watch history
🎬 Scenario: Analyzing User Watch Patterns
Your manager at Netflix, Alex, wants insights into how users engage with the platform. They need a summary of:
- The total number of shows watched per user
- The average watch duration per user
- The longest and shortest watch session per user
Alex asks:
Can you group user watch data and provide aggregated statistics for each user?
Your task is to query the watch_history table and provide the requested insights.
history_id | user_id | series_id | watch_date | watch_duration_minutes | completed |
---|---|---|---|---|---|
1 | 1 | 2 | 2024-02-10T18:30:00 | 60 | false |
2 | 2 | 4 | 2024-01-20T21:00:00 | 50 | true |
3 | 3 | 1 | 2024-02-01T19:45:00 | 75 | true |
4 | 4 | 3 | 2024-02-15T17:10:00 | 30 | false |
5 | 5 | 10 | 2024-02-18T23:00:00 | 45 | true |
1. Understanding GROUP BY
The GROUP BY clause allows us to group data by a specific column and apply aggregate functions to summarize the results.
Syntax
1SELECT column1, aggregate_function(column2)
2FROM table_name
3GROUP BY column1;
- GROUP BY column1 groups rows that share the same value in column1.
- The aggregate function performs calculations on the grouped rows.
2. Counting Shows Watched per User
To find the total number of shows watched per user, we can use COUNT() to count the number of series_id values for each user.
Example Query: Counting Shows Watched
1SELECT user_id, COUNT(series_id) AS total_shows_watched
2FROM watch_history
3GROUP BY user_id;
user_id | total_shows_watched |
---|---|
1 | 3 |
2 | 3 |
3 | 2 |
4 | 2 |
5 | 2 |
What’s Happening?
- COUNT(series_id) counts how many different shows each user has watched.
- GROUP BY user_id ensures each row in the output represents a unique user.
3. Calculating Average Watch Duration
To find the average time users spend watching, we use AVG() on the watch_duration_minutes column.
Example Query: Average Watch Duration Per User
1SELECT user_id, AVG(watch_duration_minutes) AS avg_watch_time
2FROM watch_history
3GROUP BY user_id;
user_id | avg_watch_time |
---|---|
1 | 45.0 |
2 | 60.0 |
3 | 67.5 |
4 | 30.0 |
5 | 42.5 |
What’s Happening?
- AVG(watch_duration_minutes) calculates the average watch time per user.
- GROUP BY user_id ensures each row represents a single user.
4. Finding Maximum & Minimum Watch Duration
We can use MAX() and MIN() to determine the longest and shortest viewing session for each user.
Example Query: Longest & Shortest Watch Session
1SELECT user_id,
2 MAX(watch_duration_minutes) AS max_watch_time,
3 MIN(watch_duration_minutes) AS min_watch_time
4FROM watch_history
5GROUP BY user_id;
user_id | max_watch_time | min_watch_time |
---|---|---|
1 | 60 | 25 |
2 | 90 | 40 |
3 | 75 | 60 |
4 | 50 | 30 |
5 | 45 | 40 |
What’s Happening?
- MAX(watch_duration_minutes) returns the longest watch session for each user.
- MIN(watch_duration_minutes) returns the shortest watch session per user.
✍️ SQL Exercises
Exercise 1: Counting the Number of Shows Watched per User
Alex wants to know how many shows each user has watched.
Filter condition:
- Count the number of series_id each user has watched.
- Sort by total_shows_watched in descending order.
user_id | total_shows_watched |
---|---|
1 | 3 |
2 | 3 |
3 | 2 |
4 | 2 |
5 | 2 |
Write an SQL query to return the requested data.
Exercise 2: Finding the Average Watch Duration per User
Alex also wants to see how much time each user spends watching on average.
Filter condition:
- Calculate the average watch duration for each user.
- Sort by avg_watch_time in descending order.
user_id | avg_watch_time |
---|---|
3 | 67.5 |
2 | 60.0 |
1 | 45.0 |
5 | 42.5 |
4 | 30.0 |
Write an SQL query to return the requested data.