Join ML Engineer Interview MasterClass (April Cohort) led by FAANG Data Scientists | Just 6 seats remaining...
ML Engineer MasterClass (April) | 6 seats left
By the end of this lesson, you will:
Your manager at Amazon, Sophia, noticed that some orders are missing delivery dates and some total spending values are NULL.
Can you ensure that missing delivery dates display ‘Not Delivered’ and that NULL spending amounts default to 0?
Your task is to use COALESCE() to replace missing values and NULLIF() to clean up inconsistent data. Use the amazon_orders and amazon_customers tables to address the questions.
amazon_orders table:
| order_id | customer_id | product_id | order_date | quantity | total_price | status | delivery_date |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 2 | “2024-01-10T14:30:00” | 1 | 59.99 | “Delivered” | “2024-01-15T00:00:00” |
| 2 | 2 | 3 | “2024-01-12T09:20:00” | 2 | 39.98 | “Shipped” | “NaT” |
| 3 | 3 | 5 | “2024-01-15T18:10:00” | 1 | 39.99 | “Pending” | “NaT” |
| 4 | 4 | 1 | “2024-01-18T20:45:00” | 3 | 77.97 | “Delivered” | “2024-01-22T00:00:00” |
| 5 | 5 | 4 | “2024-01-20T10:05:00” | 1 | 699.99 | “Canceled” | “NaT” |
| 6 | 1 | 1 | “2024-01-25T16:15:00” | 2 | 51.98 | “Shipped” | “NaT” |
| 7 | 2 | 4 | “2024-01-27T12:40:00” | 1 | 699.99 | “Pending” | “NaT” |
| 8 | 3 | 2 | “2024-01-29T08:55:00” | 2 | 119.98 | “Delivered” | “2024-02-02T00:00:00” |
| 9 | 4 | 5 | “2024-02-01T14:00:00” | 1 | 39.99 | “Shipped” | “NaT” |
| 10 | 5 | 3 | “2024-02-05T11:30:00” | 3 | 59.97 | “Pending” | “NaT” |
amazon_customers table:
| customer_id | name | country | signup_date | membership_status | total_spent | |
|---|---|---|---|---|---|---|
| 1 | “Alice Johnson” | “alice@amazon.com” | “USA” | “2020-06-15T00:00:00” | “Prime” | 5000 |
| 2 | “Bob Smith” | “bob@amazon.com” | “Canada” | “2019-04-10T00:00:00” | “Regular” | 1200.5 |
| 3 | “Charlie Lee” | “charlie@amazon.com” | “UK” | “2021-02-22T00:00:00” | “Business” | 8500.75 |
| 4 | “David Kim” | “david@amazon.com” | “Germany” | “2022-09-30T00:00:00” | “Prime” | 3000 |
| 5 | “Emma Garcia” | “emma@amazon.com” | “Mexico” | “2023-01-12T00:00:00” | “Regular” | 400.2 |
The COALESCE() function returns the first non-NULL value from a list of arguments.
| Why use COALESCE()? |
|---|
| Provide default values for NULLs |
| Prevent NULLs from breaking calculations |
| Ensure cleaner reports |
SELECT COALESCE(column_name, default_value) FROM table_name;column_name is NULL, COALESCE() returns default_value.column_name is not NULL, the original value is returned.1SELECT order_id, customer_id,
2 COALESCE(CAST(delivery_date AS VARCHAR), 'Not Delivered') AS formatted_delivery_date
3FROM amazon_orders
4ORDER BY order_id;| order_id | customer_id | formatted_delivery_date |
|---|---|---|
| 1 | 1 | 2024-01-15 |
| 2 | 2 | Not Delivered |
| 3 | 3 | Not Delivered |
What’s Happening?
COALESCE(delivery_date, 'Not Delivered') replaces NULL delivery dates with "Not Delivered".⚠️ Important Note:
total_spent is a DECIMAL or FLOAT, ensure the replacement value (0) is also of the same type (e.g., 0.00 instead of 0 in some databases).Sophia wants to ensure that all missing total_spent values are displayed as $0 instead of NULL. This is important for financial reports where missing values should be treated as zero rather than unknown.
1SELECT customer_id, name,
2 COALESCE(total_spent, 0) AS adjusted_total_spent
3FROM amazon_customers
4ORDER BY customer_id;| customer_id | name | adjusted_total_spent |
|---|---|---|
| 1 | Alice Johnson | 5000.00 |
| 2 | Bob Smith | 1200.50 |
| 3 | Charlie Lee | 8500.75 |
| 4 | David Kim | 3000.00 |
| 5 | Emma Garcia | 400.20 |
What’s Happening?
COALESCE(total_spent, 0) replaces any NULL values with 0.The NULLIF() function returns NULL if two values are equal, otherwise it returns the first value.
| Why use NULLIF()? |
|---|
| Handle division errors (e.g., dividing by zero) |
| Convert “invalid” data into NULLs |
| Normalize inconsistent data |
SELECT NULLIF(value1, value2) FROM table_name;value1 = value2, NULL is returned.value1 is returned.1SELECT customer_id, name,
2 NULLIF(total_spent, 0) AS valid_total_spent
3FROM amazon_customers
4ORDER BY customer_id;| customer_id | name | valid_total_spent |
|---|---|---|
| 1 | Alice Johnson | 5000.00 |
| 2 | Bob Smith | 1200.50 |
| 3 | Charlie Lee | 8500.75 |
| 4 | David Kim | 3000.00 |
| 5 | Emma Garcia | NULL |
What’s Happening?
NULLIF(total_spent, 0) replaces 0 with NULL to avoid including customers who haven’t spent anything.Sophia needs a report that ensures missing delivery dates display ‘Not Delivered’.
Filter condition:
"Not Delivered".order_id in ascending order.| order_id | customer_id | formatted_delivery_date |
|---|---|---|
| 1 | 1 | 2024-01-15 |
| 2 | 2 | Not Delivered |
| 3 | 3 | Not Delivered |
Write an SQL query to return the requested data.
Sophia also wants to replace zero total spending values with NULL to differentiate customers who haven’t made purchases.
Filter condition:
customer_id in ascending order.| customer_id | name | valid_total_spent |
|---|---|---|
| 1 | “Alice Johnson” | 5000 |
| 2 | “Bob Smith” | 1200.5 |
| 3 | “Charlie Lee” | 8500.75 |
| 4 | “David Kim” | 3000 |
| 5 | “Emma Garcia” | 400.2 |
Write an SQL query to return the requested data.