Data Modeling Interview Questions

Dan Lee, Data & AI Lead
Last updated: March 13, 2026

Data modeling questions appear in 80% of senior data engineer interviews at Meta, Google, Amazon, and other tech giants. Unlike coding challenges that test algorithms, these questions probe your ability to translate messy business requirements into clean, scalable table structures. Engineering managers use modeling scenarios to assess whether you can design systems that won't break when product requirements evolve or data volumes explode.

The trickiest part isn't knowing star schema theory; it's making real-world tradeoffs under constraint. You might need to choose between fast queries and storage costs, or decide whether to denormalize user profiles into a 10TB clickstream table when only 1% of profiles change daily. Interviewers often give you conflicting requirements from different stakeholders, then watch how you prioritize and communicate the tradeoffs.

Here are the top 22 data modeling questions organized by the core concepts that trip up candidates most often.


Modeling Fundamentals, Entities, Keys, and Cardinality

Most candidates stumble on entity modeling because they focus on perfect normalization instead of business logic. Interviewers want to see if you can identify the right grain for fact tables and avoid double-counting when multiple events relate to the same business transaction.

The killer mistake is picking composite primary keys that seem logical but create downstream headaches. When modeling Uber rides with fare adjustments, many candidates make fare_adjustment_id part of the rides table primary key, which breaks when you need ride-level aggregations without the adjustment details.


Start with how you translate a product requirement into entities, relationships, and constraints. You are tested on whether you can reason about keys, cardinality, and grain under time pressure, where small mistakes cascade into wrong joins and counts.

You are modeling an Uber-like rides product. A ride can have multiple fare adjustments (surge, promo, toll), and analytics needs exact net fare per ride and per passenger; what are your entities, primary keys, and relationship cardinalities?

Uber · Medium · Modeling Fundamentals, Entities, Keys, and Cardinality

Sample Answer

Most candidates default to storing a single final_fare on the Ride row, but that fails here because adjustments are multi-valued: you lose auditability and double count when joining to adjustments later. Model Ride (PK ride_id), FareAdjustment (PK adjustment_id, FK ride_id), and, if split fares exist, RidePassenger (composite PK ride_id, passenger_id). Cardinality is Ride 1:M FareAdjustment, and Ride M:N Passenger via RidePassenger. Net fare per ride is base fare plus $\sum\text{adjustment\_amount}$, computed at the ride grain, not at the adjustment grain.
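The entities above can be sketched as DDL. This is an illustrative sketch in SQLite (table and column names like `fare_adjustment` and `base_fare` are assumptions, not any real Uber schema); the key point is aggregating adjustments to the ride grain before combining with the base fare.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ride (
    ride_id   INTEGER PRIMARY KEY,
    base_fare REAL NOT NULL
);
CREATE TABLE fare_adjustment (
    adjustment_id     INTEGER PRIMARY KEY,
    ride_id           INTEGER NOT NULL REFERENCES ride(ride_id),
    adjustment_type   TEXT NOT NULL,      -- surge, promo, toll
    adjustment_amount REAL NOT NULL
);
CREATE TABLE ride_passenger (
    ride_id      INTEGER NOT NULL REFERENCES ride(ride_id),
    passenger_id INTEGER NOT NULL,
    PRIMARY KEY (ride_id, passenger_id)   -- composite key: M:N bridge
);
""")
con.execute("INSERT INTO ride VALUES (1, 10.0)")
con.executemany("INSERT INTO fare_adjustment VALUES (?,?,?,?)",
                [(1, 1, 'surge', 3.0), (2, 1, 'promo', -2.0)])

# Aggregate adjustments to the ride grain BEFORE combining with base fare,
# so the base fare is never repeated once per adjustment row.
net = con.execute("""
    SELECT r.ride_id,
           r.base_fare + COALESCE(SUM(a.adjustment_amount), 0) AS net_fare
    FROM ride r
    LEFT JOIN fare_adjustment a USING (ride_id)
    GROUP BY r.ride_id
""").fetchone()
print(net)  # (1, 11.0)
```

Summing in a GROUP BY at the ride grain is what prevents the double counting the answer warns about; joining base fare directly onto adjustment rows and summing everything would count the base fare once per adjustment.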

Practice more Modeling Fundamentals, Entities, Keys, and Cardinality questions

Normalization vs Denormalization Tradeoffs

Senior data engineers get asked normalization questions because denormalization decisions affect query performance, storage costs, and data consistency for years. The wrong choice can mean rewriting ETL pipelines when business logic changes or query patterns shift.

Smart candidates recognize that denormalization isn't just about speed; it's about operational complexity. Storing user country in a clickstream fact table might speed up dashboards, but it creates a nightmare when countries change and you need to backfill terabytes of historical data.


Interviewers often ask you to justify when you would normalize for integrity versus denormalize for performance. You struggle here if you cannot tie the choice to concrete query patterns, update frequency, and failure modes like duplication and drift.

You own a warehouse table powering a dashboard that shows daily active users by country and device, queried every minute. The raw events stream includes user_id, country, and device, and you also have a user dimension where country can change. Would you store country in the fact table or join to the dimension at query time?

Netflix · Medium · Normalization vs Denormalization Tradeoffs

Sample Answer

Denormalize country into the fact table only if the dashboard needs the country as of event time, otherwise normalize and join to the user dimension. If country is mutable and you join to the latest dimension row, you will rewrite history and your DAU by country will drift. If you denormalize, you pay storage and duplication, but you get stable, point-in-time attribution and faster queries. If you normalize, you must implement slowly changing dimensions or time-valid joins to preserve correctness.
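The drift failure mode described above is easy to demonstrate. A minimal sketch in SQLite (the `events` and `dim_user` tables and their columns are hypothetical): once the user's country changes in the dimension, a latest-row join silently reattributes historical events, while a country captured at event time stays stable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Fact with country captured (denormalized) at event time...
CREATE TABLE events (
    user_id TEXT, event_date TEXT, device TEXT,
    country_at_event TEXT
);
-- ...versus a mutable user dimension holding only the latest country.
CREATE TABLE dim_user (user_id TEXT PRIMARY KEY, country TEXT);
""")
# User u1 was in 'DE' when the January event fired...
con.execute("INSERT INTO events VALUES ('u1', '2024-01-05', 'ios', 'DE')")
con.execute("INSERT INTO dim_user VALUES ('u1', 'DE')")
# ...then moved to 'FR'. Updating the dimension in place rewrites history
# for every query that joins to it.
con.execute("UPDATE dim_user SET country = 'FR' WHERE user_id = 'u1'")

# Latest-dimension join: the January event is now attributed to 'FR'.
drifted = con.execute("""
    SELECT d.country FROM events e JOIN dim_user d USING (user_id)
""").fetchone()[0]
# Denormalized event-time copy: attribution is stable.
stable = con.execute("SELECT country_at_event FROM events").fetchone()[0]
print(drifted, stable)  # FR DE
```

The SCD2 alternative mentioned in the answer keeps the dimension normalized but adds effective-date columns, so the join filters on `event_date BETWEEN valid_from AND valid_to` instead of taking the latest row.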

Practice more Normalization vs Denormalization Tradeoffs questions

Dimensional Modeling, Facts, Dimensions, and Grain

Dimensional modeling separates junior from senior engineers because it requires balancing multiple stakeholder needs in a single design. Product wants user-level metrics, finance needs transaction totals, and executives want high-level trends, all from the same underlying facts.

The most common failure is building separate fact tables for each use case instead of finding the right atomic grain. When Netflix needs both watch time by title and unique viewers by device, inexperienced candidates create two facts instead of one viewing_events table that supports both aggregations through proper dimension design.


In analytics heavy roles, you will be asked to design star schemas that support common business questions. Candidates get tripped up by unclear grain, mismatched fact tables, and dimensions that are not conformed across domains.

You are modeling an Uber-style rides business. Analytics needs daily revenue by city and product, plus average trip duration. Do you build one trips fact table or separate facts for requests and completed trips, and what grain do you pick?

Uber · Medium · Dimensional Modeling, Facts, Dimensions, and Grain

Sample Answer

You could do one unified fact at the trip level, or split into a request fact and a completed trip fact. The unified trip fact wins here because the metrics requested are completion based and naturally attach to a completed trip grain, which avoids mixing request-only rows with completion-only measures. Set the grain to one row per completed trip, then derive daily revenue and average duration by joining conformed dimensions like date, city, and product. If you also need funnel metrics later, add a separate request fact at one row per request and keep dimensions conformed.

Practice more Dimensional Modeling, Facts, Dimensions, and Grain questions

Slowly Changing Dimensions and Temporal Modeling

Enterprise modeling questions test whether you can handle the chaos of real data platforms where multiple source systems disagree on basic definitions. Google and Amazon engineers deal with customer data from dozens of systems, each with different keys, update frequencies, and business rules.

Data Vault modeling becomes critical when you can't control upstream systems and need to preserve full audit history. The framework forces you to separate business keys from descriptive attributes, which prevents the cascade failures that happen when one system's schema change breaks downstream marts.


When requirements include history, you need to pick the right SCD strategy and make it queryable. This section tests whether you can model time correctly, handle late arriving data, and avoid double counting across effective date ranges.

At larger companies, you may be prompted to design for multiple source systems, auditability, and evolving schemas. You are evaluated on whether you can separate business keys from relationships and attributes, and keep ingestion resilient while still serving downstream marts.

You are integrating Customer data from Salesforce, a mobile app, and a legacy billing system, and each has different identifiers and update cadence. Design a Data Vault core that keeps a stable business key and full audit history while allowing late arriving changes.

Amazon · Hard · Enterprise and Multi-Source Modeling, Data Vault and Hubs-Links-Satellites

Sample Answer

This question is checking whether you can separate business keys from relationships and attributes, and preserve lineage and history across sources. You create a Hub for the customer business key, plus a Record Source and Load Date on every record to keep auditability. You model identities and crosswalks as Links, for example a Customer-to-SourceIdentifier link, and put changing attributes like email, status, tier into Satellites with hashdiff for change detection. You handle late arriving data by always loading by load date, never overwriting, then letting downstream marts resolve the latest effective view.
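The hub, link, and satellite split above can be sketched as DDL. This is an illustrative sketch in SQLite (table names like `sat_customer_details` and the `hashdiff` helper are assumptions, not a prescribed Data Vault implementation); it shows the insert-only satellite with hashdiff change detection and a latest-by-load-date resolution downstream.

```python
import hashlib
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hub: one row per customer business key, with lineage columns.
CREATE TABLE hub_customer (
    customer_key  TEXT PRIMARY KEY,
    record_source TEXT NOT NULL,
    load_date     TEXT NOT NULL
);
-- Link: crosswalk from the business key to each source system's id.
CREATE TABLE link_customer_source_id (
    customer_key  TEXT NOT NULL REFERENCES hub_customer(customer_key),
    source_system TEXT NOT NULL,   -- e.g. salesforce, mobile_app, billing
    source_id     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    PRIMARY KEY (customer_key, source_system, source_id)
);
-- Satellite: changing attributes, insert-only, hashdiff for change detection.
CREATE TABLE sat_customer_details (
    customer_key  TEXT NOT NULL REFERENCES hub_customer(customer_key),
    email TEXT, status TEXT, tier TEXT,
    hashdiff      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    PRIMARY KEY (customer_key, hashdiff)
);
""")

def hashdiff(*attrs):
    """Hash the attribute payload so unchanged rows can be skipped on load."""
    return hashlib.sha256("|".join(str(a) for a in attrs).encode()).hexdigest()

con.execute("INSERT INTO hub_customer VALUES ('C1', 'salesforce', '2024-01-01')")
# Two loads; the second changes the tier, so a new satellite row is appended
# (never overwritten), keyed by a new hashdiff with its own load date.
for load_date, email, status, tier in [
    ('2024-01-01', 'a@x.com', 'active', 'silver'),
    ('2024-02-01', 'a@x.com', 'active', 'gold'),
]:
    con.execute("INSERT INTO sat_customer_details VALUES (?,?,?,?,?,?,?)",
                ('C1', email, status, tier, hashdiff(email, status, tier),
                 'salesforce', load_date))

# Downstream marts resolve the latest effective view by load date,
# while the full history stays queryable for audit.
latest = con.execute("""
    SELECT tier FROM sat_customer_details
    WHERE customer_key = 'C1' ORDER BY load_date DESC LIMIT 1
""").fetchone()[0]
history = con.execute("SELECT COUNT(*) FROM sat_customer_details").fetchone()[0]
print(latest, history)  # gold 2
```

Because the satellite is insert-only and keyed on hashdiff, a late-arriving January correction simply lands as another row with its own load date; nothing upstream is overwritten, which is the auditability property the answer calls out.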

Practice more Enterprise and Multi-Source Modeling, Data Vault and Hubs-Links-Satellites questions

How to Prepare for Data Modeling Interviews

Start with business questions, not tables

Before drawing any schema, list the specific analytical questions stakeholders need answered. This forces you to identify the right fact table grain and prevents over-engineering. Practice by taking any product (Spotify, DoorDash) and writing 5 business questions that would drive different modeling decisions.

Call out your tradeoffs explicitly

When you choose denormalization or a specific key structure, state the tradeoff you're making and what could break later. Interviewers want to hear you reason about query performance versus storage costs, or data freshness versus consistency. This shows senior-level thinking about system evolution.

Draw your cardinalities on the whiteboard

Always sketch entity relationships with explicit cardinality markers (1:M, M:N) and sample data. This catches logical errors before you build complex schemas and shows you understand how joins will behave at scale. Practice drawing schemas for common products until the muscle memory sticks.

Know when NOT to use star schema

Be ready to argue against dimensional modeling when requirements call for operational reporting, real-time updates, or highly normalized OLTP patterns. Understanding the boundaries of each approach demonstrates depth beyond just memorizing Kimball principles.


Frequently Asked Questions

How deep do I need to go on data modeling for a Data Engineer interview?

You should be able to model common business domains, explain your grain, and justify keys, relationships, and constraints. Expect to discuss normalization vs denormalization, star schema basics, slowly changing dimensions, and how your model supports specific queries and SLAs. You also need to reason about tradeoffs like storage, join patterns, and change data capture impacts.

Which companies tend to ask the most data modeling questions for Data Engineers?

Companies with large analytics ecosystems and strong data platform practices ask a lot of modeling questions, including big tech, fintech, marketplaces, and B2B SaaS with self-serve BI. You will see it often in roles supporting a data warehouse or lakehouse where many teams share datasets. If the job description mentions dimensional modeling, metrics, or semantic layers, you should expect heavy data modeling coverage.

Is coding required in a data modeling interview?

Often yes, but it is usually SQL focused rather than algorithm heavy. You may be asked to write DDL for tables, define primary and foreign keys, create partitioning or clustering strategies, and write a few queries that validate your model. Practice SQL and modeling exercises together using datainterview.com/coding and datainterview.com/questions.

How do data modeling expectations differ across Data Engineer roles?

For analytics focused Data Engineers, you are expected to design star schemas, handle SCDs, and model metrics for BI performance and consistency. For platform or infrastructure focused Data Engineers, you will be pushed more on event schemas, data contracts, schema evolution, and modeling for ingestion and interoperability. In both cases, you should tie the model to downstream consumers and operational constraints.

How can I prepare for data modeling interviews if I have no real world experience?

Pick a familiar domain like e-commerce, subscriptions, or ride sharing, then design a model from requirements to tables, keys, and example queries. You should practice stating assumptions, choosing a grain, and iterating when requirements change, such as refunds, late arriving events, or new dimensions. Use datainterview.com/questions to drill common prompts and build a small portfolio of two or three complete models you can explain end to end.

What are common mistakes candidates make in data modeling interviews?

You often lose points by not defining the grain, which makes facts and dimensions inconsistent and leads to double counting. Another common issue is ignoring query patterns, leading to models that are correct on paper but slow or expensive in a warehouse. You should also avoid vague keys and weak constraints, and you should explicitly address many to many relationships, history tracking, and schema evolution.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn