Join ML Engineer Interview MasterClass (April Cohort) led by FAANG Data Scientists | Just 6 seats remaining...
ML Engineer MasterClass (April) | 6 seats left
By the end of this lesson, you will:
Your manager at Meta, Jordan, wants insights into the first, last, and most engaging posts within user activity groups.
Can you provide engagement insights by retrieving the most liked post for each user, as well as their first and last post?
Your task is to use FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE() to analyze user engagement patterns. Use the meta_posts table in this lesson:
| post_id | user_id | post_content | post_date | likes | shares | post_status |
|---|---|---|---|---|---|---|
| 1 | 1 | “Had a great day at the beach!” | “2024-01-10T14:30:00” | 120 | 45 | “Public” |
| 2 | 2 | “Launching our new product!” | “2024-01-12T09:20:00” | 250 | 90 | “Friends Only” |
| 3 | 3 | “Just finished a 10K run!” | “2024-01-15T18:10:00” | 80 | 30 | “Public” |
| 4 | 4 | “Hosting a webinar next week.” | “2024-01-18T20:45:00” | 50 | 25 | “Private” |
| 5 | 5 | “Traveling to Japan soon!” | “2024-01-20T10:05:00” | 180 | 75 | “Public” |
| 6 | 1 | “Excited for the weekend!” | “2024-01-25T16:15:00” | 95 | 40 | “Friends Only” |
| 7 | 2 | “Need recommendations for a new laptop.” | “2024-01-27T12:40:00” | 70 | 20 | “Public” |
| 8 | 3 | “Learning SQL window functions!” | “2024-01-29T08:55:00” | 110 | 50 | “Public” |
| 9 | 4 | “Working on a new AI model.” | “2024-02-01T14:00:00” | 200 | 100 | “Friends Only” |
| 10 | 5 | “Book recommendations?” | “2024-02-05T11:30:00” | 65 | 15 | “Public” |
Window functions allow you to retrieve specific row values while keeping the full dataset.
| Function | What It Does | Example Use Case |
|---|---|---|
| FIRST_VALUE() | Returns the first row’s value in a partition | Identify the first post a user made |
| LAST_VALUE() | Returns the last row’s value in a partition | Find the latest post a user made |
| NTH_VALUE(n) | Retrieves the n-th row’s value in a partition | Get the third most liked post |
SELECT column1,
function(column2) OVER (PARTITION BY column ORDER BY column) AS result
FROM table_name;PARTITION BY groups data into subsets (optional).ORDER BY column defines row sequence.Jordan wants to identify the first post each user made based on post date.
SELECT user_id, post_id, post_date, likes,
FIRST_VALUE(post_id) OVER (PARTITION BY user_id ORDER BY post_date) AS first_post
FROM meta_posts;| user_id | post_id | post_date | likes | first_post |
|---|---|---|---|---|
| 1 | 3 | 2024-01-15 10:20:00 | 140 | 3 |
| 1 | 5 | 2024-01-25 12:00:00 | 150 | 3 |
| 2 | 2 | 2024-01-10 18:30:00 | 100 | 2 |
What’s Happening?
FIRST_VALUE(post_id) OVER (PARTITION BY user_id ORDER BY post_date) retrieves the earliest post each user made.Jordan now wants to identify the last post made by each user.
SELECT user_id, post_id, post_date, likes,
LAST_VALUE(post_id) OVER (PARTITION BY user_id ORDER BY post_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_post
FROM meta_posts;| user_id | post_id | post_date | likes | last_post |
|---|---|---|---|---|
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 9 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 9 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 6 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 6 |
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 8 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 8 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 7 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 7 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 10 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 10 |
What’s Happening?
LAST_VALUE(post_id) OVER (...) retrieves the latest post per user.ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) to ensure it looks at all rows in the partition.Jordan wants to retrieve the second most liked post for each user.
SELECT user_id, post_id, post_date, likes,
NTH_VALUE(post_id, 2) OVER (PARTITION BY user_id ORDER BY likes DESC) AS second_most_liked_post
FROM meta_posts;| user_id | post_id | post_date | likes | second_most_liked_post |
|---|---|---|---|---|
| 3 | 8 | “2024-01-29T08:55:00” | 110 | null |
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 3 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | null |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 6 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | null |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 4 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | null |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 7 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | null |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 10 |
What’s Happening?
NTH_VALUE(post_id, 2) OVER (PARTITION BY user_id ORDER BY likes DESC) retrieves the second most liked post per user.Jordan wants to identify the first post made by each user.
Filter condition:
| user_id | post_id | post_date | likes | first_post |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 3 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 3 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 4 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 4 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 1 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 1 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 2 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 2 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 5 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 5 |
Write an SQL query to return the requested data.
Jordan also wants to retrieve the second most liked post per user.
Filter condition:
| user_id | post_id | post_date | likes | second_most_liked_post |
|---|---|---|---|---|
| 3 | 8 | “2024-01-29T08:55:00” | 110 | null |
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 3 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | null |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 4 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | null |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 6 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | null |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 7 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | null |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 10 |
Write an SQL query to return the requested data.