Join ML Engineer Interview MasterClass (April Cohort) led by FAANG Data Scientists | Just 6 seats remaining...
ML Engineer MasterClass (April) | 6 seats left
By the end of this lesson, you will:
Your manager at Meta, Jordan, wants to analyze engagement trends with rolling averages and cumulative sums.
Can you calculate engagement trends using different time windows? We need insights into how post likes evolve over time?
Your task is to use ROWS BETWEEN and PRECEDING to control window function behavior for rolling analytics.
Use the meta_posts table in this lesson:
| post_id | user_id | post_content | post_date | likes | shares | post_status |
|---|---|---|---|---|---|---|
| 1 | 1 | “Had a great day at the beach!” | “2024-01-10T14:30:00” | 120 | 45 | “Public” |
| 2 | 2 | “Launching our new product!” | “2024-01-12T09:20:00” | 250 | 90 | “Friends Only” |
| 3 | 3 | “Just finished a 10K run!” | “2024-01-15T18:10:00” | 80 | 30 | “Public” |
| 4 | 4 | “Hosting a webinar next week.” | “2024-01-18T20:45:00” | 50 | 25 | “Private” |
| 5 | 5 | “Traveling to Japan soon!” | “2024-01-20T10:05:00” | 180 | 75 | “Public” |
| 6 | 1 | “Excited for the weekend!” | “2024-01-25T16:15:00” | 95 | 40 | “Friends Only” |
| 7 | 2 | “Need recommendations for a new laptop.” | “2024-01-27T12:40:00” | 70 | 20 | “Public” |
| 8 | 3 | “Learning SQL window functions!” | “2024-01-29T08:55:00” | 110 | 50 | “Public” |
| 9 | 4 | “Working on a new AI model.” | “2024-02-01T14:00:00” | 200 | 100 | “Friends Only” |
| 10 | 5 | “Book recommendations?” | “2024-02-05T11:30:00” | 65 | 15 | “Public” |
ROWS BETWEEN in SQLThe default window function behavior considers all previous rows up to the current row. However, using ROWS BETWEEN, we can:
ROWS BETWEEN Configurations| Clause | Description | Use Case |
|---|---|---|
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | Default cumulative sum | Running totals |
ROWS BETWEEN X PRECEDING AND CURRENT ROW | Limits to the last X rows | Moving averages |
ROWS BETWEEN X PRECEDING AND X FOLLOWING | Averages over a balanced window | Centered moving averages |
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | Considers all rows | Full window analysis |
ROWS BETWEEN for Running TotalsJordan wants to track how engagement accumulates over time for each user’s posts.
1SELECT user_id, post_id, post_date, likes,
2 SUM(likes) OVER (PARTITION BY user_id ORDER BY post_date
3 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_likes
4FROM meta_posts;| user_id | post_id | post_date | likes | running_total_likes |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 80 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 190 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 50 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 250 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 120 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 215 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 250 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 320 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 180 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 245 |
What’s Happening?
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW adds up likes progressively.ROWS BETWEEN X PRECEDING AND CURRENT ROW for Moving AveragesJordan wants to calculate a 3-post moving average of likes, considering only the last 3 posts.
1SELECT user_id, post_id, post_date, likes,
2 AVG(likes) OVER (PARTITION BY user_id ORDER BY post_date
3 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_likes
4FROM meta_posts;| user_id | post_id | post_date | likes | moving_avg_likes |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 80 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 95 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 50 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 125 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 120 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 107.5 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 250 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 160 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 180 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 122.5 |
What’s Happening?
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW considers only the last 2 posts + the current post.ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING for Centered Moving AveragesJordan wants a balanced moving average that considers the previous, current, and next post.
1SELECT user_id, post_id, post_date, likes,
2 AVG(likes) OVER (PARTITION BY user_id ORDER BY post_date
3 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS centered_avg_likes
4FROM meta_posts;| user_id | post_id | post_date | likes | centered_avg_likes |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 95 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 95 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 125 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 125 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 107.5 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 107.5 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 160 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 160 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 122.5 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 122.5 |
What’s Happening?
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING balances the window by including the next row.Jordan wants to calculate a running total of likes for each user.
Filter condition:
| user_id | post_id | post_date | likes | running_total_likes |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 80 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 190 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 50 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 250 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 120 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 215 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 250 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 320 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 180 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 245 |
Write an SQL query to return the requested data.
Jordan also wants to calculate a 3-post centered moving average of likes per user.
Filter condition:
| user_id | post_id | post_date | likes | centered_avg_likes |
|---|---|---|---|---|
| 3 | 3 | “2024-01-15T18:10:00” | 80 | 95 |
| 3 | 8 | “2024-01-29T08:55:00” | 110 | 95 |
| 4 | 4 | “2024-01-18T20:45:00” | 50 | 125 |
| 4 | 9 | “2024-02-01T14:00:00” | 200 | 125 |
| 1 | 1 | “2024-01-10T14:30:00” | 120 | 107.5 |
| 1 | 6 | “2024-01-25T16:15:00” | 95 | 107.5 |
| 2 | 2 | “2024-01-12T09:20:00” | 250 | 160 |
| 2 | 7 | “2024-01-27T12:40:00” | 70 | 160 |
| 5 | 5 | “2024-01-20T10:05:00” | 180 | 122.5 |
| 5 | 10 | “2024-02-05T11:30:00” | 65 | 122.5 |
Write an SQL query to return the requested data.