Last Chance to Join Data Science Interview MasterClass (this week) 🚀 | Just 2 spots remaining...

PRO SQL Modules

SQL Basics
1h 20m • 4 lessons
  1. SQL SELECT
  2. LIMIT
  3. ORDER BY
  4. DISTINCT
Data Filtering
1h 20m • 4 lessons
  1. WHERE
  2. LIKE
  3. IN
  4. BETWEEN
Aggregations
40m • 2 lessons
  1. GROUP BY
  2. HAVING
Multiple Tables
1h 40m • 5 lessons
  1. INNER JOIN
  2. LEFT & RIGHT JOIN
  3. FULL OUTER JOIN
  4. SELF JOIN
  5. INTERSECT & EXCEPT
Query Restructuring
40m • 2 lessons
  1. Subquery
  2. Common Table Expressions (CTE)
Data Transformation
1h 40m • 5 lessons
  1. CASE
  2. String
  3. Date Manipulation
  4. Math Operations
  5. COALESCE
Analytical SQL
2h • 6 lessons
  1. Ranking - ROW_NUMBER(), RANK(), DENSE_RANK()
  2. Running Totals
  3. LAG and LEAD
  4. FIRST_VALUE, LAST_VALUE, Nth_VALUE
  5. PARTITION BY
  6. SQL BETWEEN and PRECEDING
Unlock Premium
FREE

GROUP BY

Course Author Daniel Lee
Instructor

Dan Lee

Learn SQL for free with interactive exercises designed to help aspiring Data Engineers, Data Analysts, and Data Scientists. Learn the basics and advanced SQL concepts by solving real-world problems from MANGO (Meta, Amazon, Netflix, Google, OpenAI)

Lesson Objectives

By the end of this lesson, you will:

  • Learn how to summarize data using GROUP BY
  • Understand aggregate functions like SUM(), COUNT(), AVG(), MIN(), MAX()
  • Write SQL queries to analyze user engagement with Netflix’s watch history

🎬 Scenario: Analyzing User Watch Patterns

Your manager at Netflix, Alex, wants insights into how users engage with the platform. They need a summary of:

  • The total number of shows watched per user
  • The average watch duration per user
  • The longest and shortest watch session per user

Alex asks:

Can you group user watch data and provide aggregated statistics for each user?

Your task is to query the watch_history table and provide the requested insights.

history_iduser_idseries_idwatch_datewatch_duration_minutescompleted
1122024-02-10T18:30:0060false
2242024-01-20T21:00:0050true
3312024-02-01T19:45:0075true
4432024-02-15T17:10:0030false
55102024-02-18T23:00:0045true

1. Understanding GROUP BY

The GROUP BY clause allows us to group data by a specific column and apply aggregate functions to summarize the results.

Syntax

1SELECT column1, aggregate_function(column2)  
2FROM table_name  
3GROUP BY column1;
  • GROUP BY column1 groups rows that share the same value in column1.
  • The aggregate function performs calculations on the grouped rows.

2. Counting Shows Watched per User

To find the total number of shows watched per user, we can use COUNT() to count the number of series_id values for each user.

Example Query: Counting Shows Watched

1SELECT user_id, COUNT(series_id) AS total_shows_watched  
2FROM watch_history  
3GROUP BY user_id;
user_idtotal_shows_watched
13
23
32
42
52

What’s Happening?

  • COUNT(series_id) counts how many different shows each user has watched.
  • GROUP BY user_id ensures each row in the output represents a unique user.

3. Calculating Average Watch Duration

To find the average time users spend watching, we use AVG() on the watch_duration_minutes column.

Example Query: Average Watch Duration Per User

1SELECT user_id, AVG(watch_duration_minutes) AS avg_watch_time  
2FROM watch_history  
3GROUP BY user_id;
user_idavg_watch_time
145.0
260.0
367.5
430.0
542.5

What’s Happening?

  • AVG(watch_duration_minutes) calculates the average watch time per user.
  • GROUP BY user_id ensures each row represents a single user.

4. Finding Maximum & Minimum Watch Duration

We can use MAX() and MIN() to determine the longest and shortest viewing session for each user.

Example Query: Longest & Shortest Watch Session

1SELECT user_id,  
2       MAX(watch_duration_minutes) AS max_watch_time,  
3       MIN(watch_duration_minutes) AS min_watch_time  
4FROM watch_history  
5GROUP BY user_id;
user_idmax_watch_timemin_watch_time
16025
29040
37560
45030
54540

What’s Happening?

  • MAX(watch_duration_minutes) returns the longest watch session for each user.
  • MIN(watch_duration_minutes) returns the shortest watch session per user.

✍️ SQL Exercises

Exercise 1: Counting the Number of Shows Watched per User

Alex wants to know how many shows each user has watched.

Filter condition:

  • Count the number of series_id each user has watched.
  • Sort by total_shows_watched in descending order.
user_idtotal_shows_watched
13
23
32
42
52

Write an SQL query to return the requested data.

Upgrade to Pro

Exercise 2: Finding the Average Watch Duration per User

Alex also wants to see how much time each user spends watching on average.

Filter condition:

  • Calculate the average watch duration for each user.
  • Sort by avg_watch_time in descending order.
user_idavg_watch_time
367.5
260.0
145.0
542.5
430.0

Write an SQL query to return the requested data.

Upgrade to Pro

👉 Found this lesson helpful?