Last Chance to Join Data Science Interview MasterClass (this week) 🚀 | Just 2 spots remaining...

FREE

PARTITION BY

Course Author Daniel Lee
Instructor

Dan Lee

Learn SQL for free with interactive exercises designed to help aspiring Data Engineers, Data Analysts, and Data Scientists. Learn the basics and advanced SQL concepts by solving real-world problems from MANGO (Meta, Amazon, Netflix, Google, OpenAI)

Lesson Objectives

By the end of this lesson, you will:

  • Learn how to use PARTITION BY to segment data into logical groups
  • Understand the difference between PARTITION BY and GROUP BY
  • Apply window functions with PARTITION BY to analyze engagement trends on Meta posts

🎬 Scenario: Analyzing Engagement per User

Your manager at Meta, Jordan, wants to analyze user engagement patterns, specifically:

How many likes has each user received across all their posts? Can we also find their highest liked post?

Your task is to use PARTITION BY with window functions to segment and analyze user-level engagement. Use the meta_posts table in this lesson:

post_iduser_idpost_contentpost_datelikessharespost_status
11“Had a great day at the beach!”“2024-01-10T14:30:00”12045“Public”
22“Launching our new product!”“2024-01-12T09:20:00”25090“Friends Only”
33“Just finished a 10K run!”“2024-01-15T18:10:00”8030“Public”
44“Hosting a webinar next week.”“2024-01-18T20:45:00”5025“Private”
55“Traveling to Japan soon!”“2024-01-20T10:05:00”18075“Public”
61“Excited for the weekend!”“2024-01-25T16:15:00”9540“Friends Only”
72“Need recommendations for a new laptop.”“2024-01-27T12:40:00”7020“Public”
83“Learning SQL window functions!”“2024-01-29T08:55:00”11050“Public”
94“Working on a new AI model.”“2024-02-01T14:00:00”200100“Friends Only”
105“Book recommendations?”“2024-02-05T11:30:00”6515“Public”

1. Understanding PARTITION BY in SQL

The PARTITION BY clause groups rows into subsets without aggregating them into a single row, like GROUP BY does.

ClauseWhat It DoesKeeps Individual Rows?Example Use Case
GROUP BYGroups rows and collapses data into one row per group❌ NoFind total likes per user
PARTITION BYGroups rows without collapsing them✅ YesCalculate running totals per user

Syntax for PARTITION BY

1SELECT column1, column2,  
2       function() OVER (PARTITION BY column ORDER BY column) AS result  
3FROM table_name;
  • PARTITION BY groups the dataset into logical segments.
  • Unlike GROUP BY, it retains all rows and applies the function within each partition.

2. Using PARTITION BY to Find Total Likes Per User

Jordan wants a report showing the total number of likes each user has received across all their posts, while keeping each post visible.

Example Query: Calculating Total Likes Per User

1SELECT user_id, post_id, post_date, likes,  
2       SUM(likes) OVER (PARTITION BY user_id) AS total_likes_per_user  
3FROM meta_posts;
user_idpost_idpost_datelikestotal_likes_per_user
33“2024-01-15T18:10:00”80190
38“2024-01-29T08:55:00”110190
44“2024-01-18T20:45:00”50250
49“2024-02-01T14:00:00”200250
11“2024-01-10T14:30:00”120215
16“2024-01-25T16:15:00”95215
22“2024-01-12T09:20:00”250320
27“2024-01-27T12:40:00”70320
55“2024-01-20T10:05:00”180245
510“2024-02-05T11:30:00”65245

What’s Happening?

  • SUM(likes) OVER (PARTITION BY user_id) calculates total likes per user.
  • Each post still appears in the result, but all posts for a user show the same total.

3. Using PARTITION BY to Rank Posts Per User

Jordan now wants to rank posts per user based on likes, so they can identify each user’s most popular post.

Example Query: Ranking Posts Per User by Likes

1SELECT user_id, post_id, post_date, likes,  
2       RANK() OVER (PARTITION BY user_id ORDER BY likes DESC) AS rank_per_user  
3FROM meta_posts;
user_idpost_idpost_datelikesrank_per_user
38“2024-01-29T08:55:00”1101
33“2024-01-15T18:10:00”802
49“2024-02-01T14:00:00”2001
44“2024-01-18T20:45:00”502
11“2024-01-10T14:30:00”1201
16“2024-01-25T16:15:00”952
22“2024-01-12T09:20:00”2501
27“2024-01-27T12:40:00”702
55“2024-01-20T10:05:00”1801
510“2024-02-05T11:30:00”652

What’s Happening?

  • RANK() OVER (PARTITION BY user_id ORDER BY likes DESC) assigns rankings separately for each user.
  • The highest liked post for each user is ranked 1.

4. Using PARTITION BY to Track Running Totals Per User

Jordan also wants to track how engagement grows for each user over time.

Example Query: Running Total of Likes Per User

1SELECT user_id, post_id, post_date, likes,  
2       SUM(likes) OVER (PARTITION BY user_id ORDER BY post_date) AS running_total_likes  
3FROM meta_posts;
user_idpost_idpost_datelikesrunning_total_likes
33“2024-01-15T18:10:00”8080
38“2024-01-29T08:55:00”110190
44“2024-01-18T20:45:00”5050
49“2024-02-01T14:00:00”200250
11“2024-01-10T14:30:00”120120
16“2024-01-25T16:15:00”95215
22“2024-01-12T09:20:00”250250
27“2024-01-27T12:40:00”70320
55“2024-01-20T10:05:00”180180
510“2024-02-05T11:30:00”65245

What’s Happening?

  • SUM(likes) OVER (PARTITION BY user_id ORDER BY post_date) tracks engagement growth for each user.
  • Helps understand how engagement accumulates over time.

✍️ SQL Exercises

Exercise 1: Calculating Total Likes Per User

Jordan wants to find the total number of likes each user has received across all posts, while keeping each post visible.

Filter condition:

  • Retrieve user ID, post ID, post date, likes, and total likes per user.
  • Sort by user_id and post_date in ascending order.
user_idpost_idpost_datelikestotal_likes_per_user
33“2024-01-15T18:10:00”80190
38“2024-01-29T08:55:00”110190
44“2024-01-18T20:45:00”50250
49“2024-02-01T14:00:00”200250
11“2024-01-10T14:30:00”120215
16“2024-01-25T16:15:00”95215
22“2024-01-12T09:20:00”250320
27“2024-01-27T12:40:00”70320
55“2024-01-20T10:05:00”180245
510“2024-02-05T11:30:00”65245

Write an SQL query to return the requested data.

Upgrade to Pro

Exercise 2: Ranking Posts Per User by Likes

Jordan also wants to rank each user’s posts based on likes.

Filter condition:

  • Retrieve user ID, post ID, post date, likes, and rank per user.
  • Sort by user_id and rank in ascending order.
user_idpost_idpost_datelikesrank_per_user
38“2024-01-29T08:55:00”1101
33“2024-01-15T18:10:00”802
49“2024-02-01T14:00:00”2001
44“2024-01-18T20:45:00”502
22“2024-01-12T09:20:00”2501
27“2024-01-27T12:40:00”702
11“2024-01-10T14:30:00”1201
16“2024-01-25T16:15:00”952
55“2024-01-20T10:05:00”1801
510“2024-02-05T11:30:00”652

Write an SQL query to return the requested data.

Upgrade to Pro
👉 Found this lesson helpful?