Join ML Engineer Interview MasterClass (April Cohort) led by FAANG Data Scientists | Just 6 seats remaining...
ML Engineer MasterClass (April) | 6 seats left
By the end of this lesson, you will:
Your manager at Meta, Jordan, wants to analyze user engagement patterns, specifically:
How many likes has each user received across all their posts? Can we also find their highest liked post?
Your task is to use PARTITION BY with window functions to segment and analyze user-level engagement. Use the meta_posts table in this lesson:
| post_id | user_id | post_content | post_date | likes | shares | post_status |
|---|---|---|---|---|---|---|
| 1 | 1 | “Had a great day at the beach!” | “2024-01-10T14:30:00” | 120 | 45 | “Public” |
| 2 | 2 | “Launching our new product!” | “2024-01-12T09:20:00” | 250 | 90 | “Friends Only” |
| 3 | 3 | “Just finished a 10K run!” | “2024-01-15T18:10:00” | 80 | 30 | “Public” |
| 4 | 4 | “Hosting a webinar next week.” | “2024-01-18T20:45:00” | 50 | 25 | “Private” |
| 5 | 5 | “Traveling to Japan soon!” | “2024-01-20T10:05:00” | 180 | 75 | “Public” |
| 6 | 1 | “Excited for the weekend!” | “2024-01-25T16:15:00” | 95 | 40 | “Friends Only” |
| 7 | 2 | “Need recommendations for a new laptop.” | “2024-01-27T12:40:00” | 70 | 20 | “Public” |
| 8 | 3 | “Learning SQL window functions!” | “2024-01-29T08:55:00” | 110 | 50 | “Public” |
| 9 | 4 | “Working on a new AI model.” | “2024-02-01T14:00:00” | 200 | 100 | “Friends Only” |
| 10 | 5 | “Book recommendations?” | “2024-02-05T11:30:00” | 65 | 15 | “Public” |
The PARTITION BY clause groups rows into subsets without aggregating them into a single row, like GROUP BY does.
| Clause | What It Does | Keeps Individual Rows? | Example Use Case |
|---|---|---|---|
| GROUP BY | Groups rows and collapses data into one row per group | ❌ No | Find total likes per user |
| PARTITION BY | Groups rows without collapsing them | ✅ Yes | Calculate running totals per user |
SELECT column1, column2,
function() OVER (PARTITION BY column ORDER BY column) AS result
FROM table_name;PARTITION BY groups the dataset into logical segments.GROUP BY, it retains all rows and applies the function within each partition.Jordan wants a report showing the total number of likes each user has received across all their posts, while keeping each post visible.
SELECT user_id, post_id, post_date, likes,
SUM(likes) OVER (PARTITION BY user_id) AS total_likes_per_user
FROM meta_posts;| user_id | post_id | post_date | likes | total_likes_per_user |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 190 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 190 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 250 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 250 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 215 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 215 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 320 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 320 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 245 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 245 |
What’s Happening?
SUM(likes) OVER (PARTITION BY user_id) calculates total likes per user.Jordan now wants to rank posts per user based on likes, so they can identify each user’s most popular post.
SELECT user_id, post_id, post_date, likes,
RANK() OVER (PARTITION BY user_id ORDER BY likes DESC) AS rank_per_user
FROM meta_posts;| user_id | post_id | post_date | likes | rank_per_user |
|---|---|---|---|---|
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 1 |
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 2 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 1 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 2 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 1 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 2 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 1 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 2 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 1 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 2 |
What’s Happening?
RANK() OVER (PARTITION BY user_id ORDER BY likes DESC) assigns rankings separately for each user.Jordan also wants to track how engagement grows for each user over time.
SELECT user_id, post_id, post_date, likes,
SUM(likes) OVER (PARTITION BY user_id ORDER BY post_date) AS running_total_likes
FROM meta_posts;| user_id | post_id | post_date | likes | running_total_likes |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 80 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 190 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 50 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 250 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 120 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 215 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 250 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 320 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 180 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 245 |
What’s Happening?
SUM(likes) OVER (PARTITION BY user_id ORDER BY post_date) tracks engagement growth for each user.Jordan wants to find the total number of likes each user has received across all posts, while keeping each post visible.
Filter condition:
| user_id | post_id | post_date | likes | total_likes_per_user |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 190 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 190 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 250 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 250 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 215 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 215 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 320 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 320 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 245 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 245 |
Write an SQL query to return the requested data.
Jordan also wants to rank each user’s posts based on likes.
Filter condition:
| user_id | post_id | post_date | likes | rank_per_user |
|---|---|---|---|---|
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 1 |
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 2 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 1 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 2 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 1 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 2 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 1 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 2 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 1 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 2 |
Write an SQL query to return the requested data.