Target Data Engineer Interview Guide

Dan Lee · Data & AI Lead
Last updated: February 27, 2026

Target Data Engineer at a Glance

Total Compensation

$115k - $230k/yr

Interview Rounds

5 rounds

Difficulty

Levels

P1–P5

Education

BS in Computer Science, Engineering, Information Systems, or a related field, or equivalent practical experience (internships/co-ops acceptable); an MS is preferred for some teams.

Experience

0–18+ yrs

Python · Java · Scala · SQL · HiveQL

retail · data-pipelines-etl-elt · data-modeling · big-data-spark · sql

Target's interview loop is compact (five rounds, about two weeks), but the System Design round is where most rejections happen. Candidates who can write clean Spark jobs still freeze when asked to design a pipeline with explicit SLA windows, fault tolerance, and cost constraints tied to real retail operations like overnight inventory replenishment or promotional pricing updates.

Target Data Engineer Role

Primary Focus

retail · data-pipelines-etl-elt · data-modeling · big-data-spark · sql

Skill Profile


Math & Stats

Low

Not a primary emphasis for this Target Data Engineer posting; focus is on building/operating pipelines and software components. Some analytical ability for data profiling is implied via SQL/HiveQL usage, but advanced statistics is not indicated.

Software Eng

High

Strong emphasis on building robust/scalable components, code quality, code reviews, design patterns, handling edge cases/errors/security, debugging, and CI/CD basics (sources: BuiltIn Target posting; Indeed DE skills overview).

Data & SQL

High

Core to the role: data engineering/Hadoop (Hive, Spark), distributed programming concepts, metadata understanding across sources/metrics, query languages for profiling, and data migration; aligns with general DE responsibilities of building/optimizing pipelines and architectures (sources: BuiltIn; Indeed).

Machine Learning

Low

No explicit ML modeling or feature engineering requirements; role is positioned as data engineering infrastructure/pipelines rather than data science/ML engineering (sources: BuiltIn; CourseReport role differentiation).

Applied AI

Low

No GenAI/LLM, vector databases, prompt engineering, or model ops requirements mentioned in the provided Target posting; conservative estimate due to lack of explicit evidence (uncertain beyond sources).

Infra & Cloud

Medium

Cloud platform experience is explicitly requested (GCP/AWS/Azure) plus disaster recovery participation, monitoring capacity, and basic CI/CD; however, deep infrastructure/DevOps ownership is not fully specified (source: BuiltIn).

Business

Medium

Some product/service lifecycle and TCO considerations are called out (evaluation of new technologies, lifecycle management, TCO) and domain knowledge building is expected; limited direct business/retail KPI ownership described (source: BuiltIn).

Viz & Comms

Medium

Visualization tool experience (Power BI, Domo) is explicitly listed and communication/collaboration is required; visualization appears supportive rather than central BI developer scope (source: BuiltIn).

What You Need

  • Software development fundamentals (code organization, quality, secure coding, edge cases)
  • Distributed programming concepts
  • Big data/data engineering experience (Hadoop ecosystem: Hive, Spark)
  • SQL and HiveQL for data analysis/profiling
  • Cloud platform experience (Google Cloud, AWS, or Azure)
  • CI/CD basic understanding
  • Software design patterns and principles
  • Debugging/troubleshooting; familiarity with OS/networking/databases
  • Automation/testing participation (integration/regression; automate test scripts)
  • Operational support (monitoring capacity; incident/change management)

Nice to Have

  • BigQuery (explicitly noted as an added advantage; likely Google Cloud focus)
  • Data migration tooling experience (specific tools not named in source)
  • Experience with visualization tools (Power BI, Domo) beyond basics
  • Proof-of-concept/research of new technologies and contributing to architecture/design reviews
  • Disaster recovery planning participation

Languages

Python · Java · Scala · SQL · HiveQL

Tools & Technologies

  • Apache Spark
  • Apache Hive
  • Hadoop ecosystem (general)
  • Google Cloud Platform (GCP)
  • AWS
  • Microsoft Azure
  • BigQuery
  • Power BI
  • Domo
  • CI/CD tools (unspecified in source)
  • Data migration tools (unspecified in source)


You're joining a team that builds and operates data pipelines on a hybrid platform spanning legacy Hadoop clusters and a growing GCP/BigQuery footprint. Your pipelines feed inventory replenishment, pricing analytics, demand forecasting, and Target Circle personalization. Success after year one means you own a production pipeline end-to-end (ingestion through curated serving layer), your downstream analysts trust your datasets, and you've handled at least one on-call rotation during Q4 without a major SLA breach on overnight batch jobs.

A Typical Week

A Week in the Life of a Target Data Engineer

Typical L5 workweek · Target

Weekly time split

Coding 30% · Infrastructure 22% · Meetings 18% · Writing 12% · Analysis 8% · Research 5% · Break 5%

Culture notes

  • Target engineering runs at a steady, sustainable pace — most people are offline by 5:30 PM and weekend work is rare outside of on-call rotations.
  • The Minneapolis HQ teams follow a hybrid policy with roughly three days in-office at Target Plaza, though some data engineering squads flex to two days depending on sprint needs.

The infrastructure slice is where your planned coding day goes to die. A stale Power BI dashboard gets traced back to a broken Hive-to-BigQuery sync, and suddenly your afternoon is a manual backfill instead of the Spark job you'd scoped. Cross-functional syncs with Supply Chain and Merchandising teams aren't filler, either. Those meetings are where you learn which schema changes will break a demand forecasting retrain or why a store-cluster dimension matters for same-day fulfillment.

Projects & Impact Areas

Target is actively migrating workloads from on-prem Hadoop to GCP, so you'll write design docs for things like moving batch Hive ETL jobs to near-real-time Dataflow streaming for fulfillment use cases. That migration work sits alongside building curated datasets and feature tables consumed by analytics and data science teams across merchandising, supply chain, and pricing. The variety is real, but so is the operational weight: Target's platform engineering org expects you to own cost visibility for your pipelines through their showback infrastructure, not just correctness.

Skills & What's Expected

Software engineering quality is the most underrated dimension here. Target expects production-grade Scala/Python/Java with proper testing, CI/CD, and code review participation, not notebook scripts. Data architecture and pipeline design (Spark, Hive, BigQuery) is the core competency, scored high alongside SWE practices. Cloud/infra knowledge matters but carries medium weight, so don't over-rotate on GCP certifications. ML and GenAI are scored low. Skip them in your prep.

Levels & Career Growth

Target Data Engineer Levels

Each level has different expectations, compensation, and interview focus.

P1 · 0–2 yrs

Base: $105k · Stock/yr: $3k · Bonus: $7k

BS in Computer Science, Engineering, Information Systems, or equivalent practical experience (internships/co-ops acceptable).

What This Level Looks Like

Implements well-scoped components of data pipelines and data models for a single product/team domain; impact is primarily within the immediate squad, with emphasis on correctness, maintainability, and meeting defined SLAs under guidance.

Day-to-Day Focus

  • SQL proficiency and data modeling fundamentals
  • Reliable pipeline implementation and operational hygiene (testing, monitoring, incident response basics)
  • Cloud data platform basics (e.g., object storage, compute, orchestration) and cost/performance awareness
  • Code quality (readability, review practices) and learning team architecture

Interview Focus at This Level

Emphasis on SQL and data fundamentals (joins, window functions, aggregations), basic data modeling, scripting/programming for ETL (often Python), debugging and reliability mindset (tests/monitoring), and behavioral signals for collaboration, learning, and ownership on well-defined tasks.

Promotion Path

Promotion to the next level typically requires independently owning an end-to-end pipeline/dataset in a team domain, consistently delivering on commitments with minimal oversight, demonstrating strong data quality and operational reliability (alerts, SLAs, incident handling), contributing improvements to shared frameworks or standards, and showing effective cross-functional communication and code review participation.


The P2-to-P3 promotion hinges on independently owning an end-to-end pipeline domain and demonstrating operational reliability (reducing incidents, hitting SLAs, handling backfills without hand-holding). P3 to P4 is where people stall, because the bottleneck shifts from technical execution to scope of influence: setting standards adopted by multiple teams, leading cross-team initiatives like platform migrations, and mentoring others so outcomes continue without your direct involvement.

Work Culture

Most data engineers work roughly three days a week in-office at Target Plaza in Minneapolis, following Target's hybrid policy. The pace is sustainable on a normal week (most people offline by 5:30 PM), but it ramps around seasonal events like back-to-school and Q4 holidays, when pipeline SLA windows tighten and on-call rotations carry more backfill pressure. The collaboration culture is strong, with blameless postmortems and genuine team ownership of production systems, though "ownership" at Target concretely means you're responsible for the operational health, cost tracking, and incident response for your pipelines, not just the initial build.

Target Data Engineer Compensation

Target communicates RSU grants as a target dollar amount, not a fixed share count. The conversion to actual shares happens at the stock price on or near the grant date, so if TGT dips between your offer letter and the board's routine approval, you end up with more shares (and vice versa). Specific vesting schedules and refresh grant details aren't publicly documented, so ask your recruiter for the exact cliff and annual breakdown before you sign anything.
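To make the mechanics concrete with hypothetical numbers: a $12,000 annual RSU target converts to 100 shares if TGT trades at $120 on the grant date, but to 120 shares if the price has slipped to $100 by the time the board approves the grant.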

When negotiating, anchor your base salary counter to the pipeline scale you'd own (inventory replenishment across 1,900+ stores, near real-time pricing feeds into BigQuery) rather than generic market data. Target's offer process moves fast for a Fortune 50, so explicitly request time and a full written breakdown of base, bonus target, equity, vesting, and relocation before countering. The single lever most candidates miss: pushing for a higher level rather than a higher number within the same band, because at Target the jump from P2 to P3 unlocks a roughly 4x increase in annual equity grant value, which compounds far more than a few thousand dollars on base.

Target Data Engineer Interview Process

5 rounds · ~2 weeks end to end

Initial Screen

2 rounds

Round 1 · Recruiter Screen

30 min · Phone

First, you’ll have a recruiter conversation to confirm role fit, location/onsite expectations, and compensation range alignment. The discussion usually stays high level, focusing on your recent projects (pipelines, cloud, Spark/SQL) and your availability/timeline. Expect quick follow-up cadence, often within days, based on candidate reports of an efficient process.

general · behavioral · data_engineering · cloud_infrastructure

Tips for this round

  • Prepare a 60–90 second walkthrough of your most relevant data engineering project, naming concrete tech (Spark, Kafka, Airflow/ADF, Snowflake/BigQuery, Databricks, AWS/GCP/Azure).
  • Clarify work model early (in-office vs hybrid) and scheduling constraints—some Target loops can be multiple rounds in a single day (per candidate reports).
  • Have a crisp salary ask and justification (market data + scope level); confirm if the level is DE vs Sr DE and what that implies for expectations.
  • Ask what the next step is (often a recorded video interview or technical screens) and whether any SQL/system design is emphasized for the team.
  • Share links or artifacts if allowed (GitHub, portfolio, architecture diagram) that demonstrate production-grade pipeline work and reliability practices.

Technical Assessment

2 rounds

Round 2 · Behavioral

30 min · Video Call

Next comes a one-way online video interview where you respond to pre-set prompts without a live interviewer, a format frequently mentioned by candidates. You’ll typically answer scenario-style questions covering collaboration, ownership, and how you handle ambiguity. The tone is often described as casual and approachable, but time-boxed.

behavioral · general

Tips for this round

  • Use a STAR structure and keep each answer to ~1.5–2 minutes; practice with a timer to avoid getting cut off.
  • Bank 6–8 stories that map to ownership, conflict resolution, stakeholder management, and delivering under deadlines in data projects.
  • Include metrics in stories (latency reduced, cost saved, SLA improved, data quality defects prevented) to sound engineering-focused.
  • Test your setup (camera, mic, lighting) and do a dry run; one-way tools penalize rambling and audio issues.
  • When asked about mistakes, focus on detection (monitoring/data checks), mitigation (rollback/backfill), and prevention (tests, contracts, runbooks).

Onsite

1 round

Round 5 · System Design

60 min · Live

Finally, you’ll typically face a data system design conversation, sometimes as part of a multi-round onsite loop that can occur in one day. You’ll be asked to architect a pipeline (batch or streaming) with requirements around scale, freshness, cost, reliability, and governance. Trade-offs, failure handling, and how you’d operationalize the solution are usually as important as naming tools.

system_design · data_pipeline · cloud_infrastructure · data_engineering

Tips for this round

  • Start by clarifying requirements: volume/velocity, latency SLA, consumers, retention, backfills, and compliance/PII constraints.
  • Draw a simple architecture first (sources → ingestion → storage → processing → serving → monitoring) before adding optimizations.
  • Cover reliability explicitly: idempotency, checkpoints, retries, DLQs, schema evolution, and replay/backfill strategy.
  • Discuss observability: data quality checks, pipeline SLAs, lineage, alerting thresholds, and runbooks/on-call readiness.
  • Compare at least two options (e.g., Kafka vs batch files; lakehouse vs warehouse) and justify with cost/performance/operability trade-offs.

Tips to Stand Out

  • Prepare for one-way video questions. Rehearse concise STAR answers, keep a steady pace, and make sure your setup (audio/lighting) is flawless because you won’t get live prompts to recover.
  • Demonstrate end-to-end ownership. Emphasize how you moved from raw ingestion to curated models to reliable serving, including SLAs, monitoring, and incident/backfill handling.
  • Lead with SQL and modeling fundamentals. Strong window functions, clear grain definition, and practical warehouse performance considerations tend to differentiate candidates in DE loops.
  • Make trade-offs explicit in design rounds. Always compare batch vs streaming, lake vs warehouse, and cost vs latency, then justify with assumptions and measurable constraints.
  • Quantify impact. Bring metrics (runtime, cost, latency, data quality defect rate, adoption) to every project story to show business and operational outcomes.
  • Expect a fast timeline. Candidates frequently report quick communication; keep availability flexible for a condensed multi-round day if requested.

Common Reasons Candidates Don't Pass

  • Shallow pipeline depth. Candidates who can list tools but can’t explain orchestration, incremental processing, backfills, and failure modes often appear unready for production ownership.
  • Weak SQL under pressure. Common issues include incorrect join logic, wrong grain/aggregation, ignoring NULL/duplicates, and inability to validate results with quick sanity checks.
  • Unclear data modeling thinking. Failing to articulate fact/dimension design, key strategy, SCD handling, or how the model serves real query patterns can sink the evaluation.
  • No reliability/operations story. If you can’t discuss monitoring, alerting, data quality tests, and incident response, interviewers may doubt your ability to run pipelines at scale.
  • Behavioral signals of low ownership. Vague answers about conflicts, missed deadlines, or ambiguity—without showing accountability and concrete actions—tend to be scored down.

Offer & Negotiation

For Data Engineer offers at a large retailer like Target, compensation commonly includes base salary plus an annual bonus target, and may include equity/RSUs for some levels or business units; benefits can be a meaningful part of total comp. The most negotiable levers are typically base pay, sign-on bonus, and level/title (which affects band and bonus); equity is sometimes less flexible but can move with leveling. Use market ranges for your metro, anchor with scope (pipeline scale, on-call, leadership), and ask for the full comp breakdown (base/bonus/equity/vesting, relocation, benefits) before countering. If the process is moving quickly, explicitly request time to review and come back with a written counter grounded in comparable DE offers and your proven production impact.

The one-way recorded video at round two is the sneakiest filter in this loop. There's no live interviewer to read, no back-and-forth to recover from a stumble. Candidates report getting cut before ever reaching a technical round because their recorded answers lacked specifics about operational ownership, incident response, or cross-team collaboration. If your stories don't include real numbers (latency improvements, cost reductions, SLA targets you maintained), the recording won't survive scrutiny.

From what candidates report, the rejection reasons skew toward depth, not breadth. Shallow answers about orchestration and incremental processing sink people in the SQL & Data Modeling round, while the System Design round punishes anyone who can't articulate tradeoffs specific to Target's hybrid Hadoop-to-GCP migration or explain how they'd handle batch-plus-streaming pipelines feeding BigQuery consumers. Knowing the tools matters less than showing you've operated them under real production pressure.

Target Data Engineer Interview Questions

Data Pipelines & Distributed ETL (Spark/Hive/Hadoop)

Expect questions that force you to explain how you build and run reliable batch pipelines at scale (ingest, transform, backfill, late data, and reruns) using Spark/Hive patterns. Candidates often struggle when asked to connect partitioning, shuffles, and data skew to real production symptoms like slow jobs or missed SLAs.

A nightly Spark job builds a Hive table for store-day SKU on-hand and on-order used by replenishment, partitioned by dt and store_id; after a change, it started missing its 6 a.m. SLA and shows huge shuffle spill. What specific checks and Spark or Hive changes would you make to isolate skew and reduce shuffles without changing the output schema?

Hard · Spark Performance, Partitioning, Skew

Sample Answer

Most candidates default to cranking up executors or bumping shuffle partitions, but that fails here because skew creates a few giant tasks that still spill and straggle. You check the Spark UI for skewed stages, top keys (store_id, sku_id), skewed join sides, and whether partition pruning is lost (dt pushed down or not). Fixes are targeted: broadcast the smaller dimension, pre-aggregate before the join, add salting for the skewed key, enable AQE skew join handling if allowed, and repartition by the post-join key that matches the final write. On the Hive side, validate dynamic partition settings, file sizing (avoid too many small files), and that the partition columns are not being transformed in filters so pruning works.
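If it helps to make those levers concrete, here is a minimal Spark SQL sketch; the table names (`inv_fact`, `item_dim`) and the `${run_date}` substitution are illustrative assumptions, not the actual job:

SQL

-- Let AQE detect and split skewed shuffle partitions at runtime (Spark 3.x).
SET spark.sql.adaptive.enabled = true;
SET spark.sql.adaptive.skewJoin.enabled = true;

-- Broadcast the small dimension so the skewed join key never shuffles,
-- and keep the dt filter as a plain literal so partition pruning still fires.
SELECT /*+ BROADCAST(d) */
  f.dt,
  f.store_id,
  f.sku_id,
  f.on_hand,
  f.on_order,
  d.dept_id
FROM inv_fact f
JOIN item_dim d
  ON f.sku_id = d.sku_id
WHERE f.dt = '${run_date}';

Salting the hot key is the fallback when neither join side is small enough to broadcast; a sketch of that appears in the readiness-check section below.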

Practice more Data Pipelines & Distributed ETL (Spark/Hive/Hadoop) questions

SQL & Data Profiling (SQL/HiveQL/BigQuery)

Most candidates underestimate how much of the interview hinges on writing correct SQL quickly for retail-style datasets (transactions, items, stores) and explaining edge cases. You’ll be tested on joins, window functions, deduping, incremental logic, and profiling/quality checks using SQL or HiveQL (often with BigQuery-style syntax awareness).

You have a BigQuery table `raw.pos_transactions` with columns (`transaction_id`, `store_id`, `register_id`, `transaction_ts`, `line_id`, `item_id`, `qty`, `net_sales`, `ingest_ts`) and duplicates arrive on replays. Write SQL to keep only the latest version per (`transaction_id`, `line_id`) based on `ingest_ts`, and return deduped rows for the last 7 days of `transaction_ts`.

Easy · Deduplication and Incremental Filtering

Sample Answer

Filter to the last 7 days, then use `QUALIFY` with `ROW_NUMBER()` to keep the single latest ingested row per (`transaction_id`, `line_id`). This removes replay duplicates without needing a self-join. Ties on `ingest_ts` are where most people fail; add a deterministic tiebreaker to avoid non-repeatable results.

SQL
/* BigQuery Standard SQL */
WITH recent AS (
  SELECT
    transaction_id,
    store_id,
    register_id,
    transaction_ts,
    line_id,
    item_id,
    qty,
    net_sales,
    ingest_ts
  FROM `raw.pos_transactions`
  WHERE transaction_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
SELECT
  transaction_id,
  store_id,
  register_id,
  transaction_ts,
  line_id,
  item_id,
  qty,
  net_sales,
  ingest_ts
FROM recent
WHERE TRUE  -- BigQuery requires a WHERE, GROUP BY, or HAVING clause alongside QUALIFY
QUALIFY
  ROW_NUMBER() OVER (
    PARTITION BY transaction_id, line_id
    ORDER BY ingest_ts DESC, transaction_ts DESC, store_id DESC
  ) = 1;
Practice more SQL & Data Profiling (SQL/HiveQL/BigQuery) questions

Data Modeling & Warehousing for Analytics

Your ability to reason about analytic data models is used as a proxy for whether downstream teams can trust and reuse your outputs. Be ready to discuss star/snowflake choices, slowly changing dimensions, grain, surrogate keys, and how you’d model common retail entities like orders, returns, inventory, and promotions.

You need a warehouse model for enterprise analytics on Orders, OrderLines, Payments, and Returns at Target. When do you model this as a star schema versus a more normalized snowflake, and what is the grain of each fact table?

Easy · Dimensional Modeling, Star vs Snowflake, Grain

Sample Answer

You could do a denormalized star or a more normalized snowflake. Star wins here because BI and ad hoc analysts need predictable joins, stable performance, and fewer ways to double count revenue, returns, and units. Snowflake is still reasonable when product, store, or customer hierarchies are large, shared across many marts, and you need tighter governance, but you pay with join complexity and analyst error rate. The fact grains should be explicit, for example order line for sales, payment transaction for tenders, and return line for returns.
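A hedged HiveQL-style sketch of what "explicit grain" looks like in DDL; every table and column name here is illustrative, not Target's actual model:

SQL

-- Grain: one row per order line. Additive measures live only at this grain.
CREATE TABLE fact_sales_order_line (
  order_line_key BIGINT,        -- surrogate key
  order_id       STRING,        -- degenerate dimension from the source system
  line_id        INT,
  date_key       INT,           -- FK to dim_date
  store_key      INT,           -- FK to dim_store
  item_key       INT,           -- FK to dim_item
  qty            INT,
  net_sales      DECIMAL(12,2)
);

-- Grain: one row per return line, kept as a separate fact so refunds
-- never double-count against gross sales.
CREATE TABLE fact_return_line (
  return_line_key BIGINT,
  order_id        STRING,
  line_id         INT,
  return_date_key INT,
  store_key       INT,
  item_key        INT,
  returned_qty    INT,
  refund_amount   DECIMAL(12,2)
);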

Practice more Data Modeling & Warehousing for Analytics questions

Software Engineering Practices (Scala/Python/Java quality)

The bar here isn’t whether you know a language, it’s whether you can ship maintainable, secure, testable pipeline code under real constraints. Interviewers look for clean design, error handling, idempotency, dependency management, unit/integration testing strategy, and how you approach debugging and code reviews.

A Spark/Scala job builds a daily store-SKU inventory snapshot from Kafka events into a Hive table partitioned by dt, and reruns happen after failures. What code and data checks make the write idempotent and prevent duplicate or missing rows for a given (store_id, sku_id, dt)?

Medium · Idempotency and Error Handling

Sample Answer

Walk through the logic step by step. Start with a deterministic business key, for example (store_id, sku_id, dt), then ensure every transformation preserves or re-derives that key. Next, make the sink write atomic: write to a staging path or temp table for that dt, validate row counts and uniqueness, then swap or insert-overwrite the dt partition. Finally, add dedupe logic on ingest (event_id plus max event_time) and fail fast on violations, because silent duplicates are the most expensive bug in retail metrics.
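One way to sketch that in Spark SQL, with illustrative names (`staging_inventory_events`, `inventory_snapshot`) and a `${run_date}` placeholder:

SQL

-- Step 1: collapse replayed events to one row per business key,
-- with a deterministic tiebreak so reruns produce identical output.
CREATE OR REPLACE TEMPORARY VIEW snapshot_deduped AS
SELECT store_id, sku_id, dt, on_hand
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY store_id, sku_id, dt
      ORDER BY event_time DESC, event_id DESC
    ) AS rn
  FROM staging_inventory_events
) t
WHERE rn = 1;

-- Step 2: rewrite only the affected partition, so a rerun
-- replaces the previous output instead of appending duplicates.
INSERT OVERWRITE TABLE inventory_snapshot PARTITION (dt = '${run_date}')
SELECT store_id, sku_id, on_hand
FROM snapshot_deduped
WHERE dt = '${run_date}';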

Practice more Software Engineering Practices (Scala/Python/Java quality) questions

Cloud & Data Platform Fundamentals (GCP/AWS/Azure, BigQuery)

In practice, you’ll need to translate pipeline requirements into cloud-native components and operational guardrails without overengineering. Prepare to cover storage/compute separation, IAM basics, cost/TCO tradeoffs, batch scheduling options, and what changes when moving Hive/Spark workloads onto platforms like GCP (with BigQuery as a common advantage).

A daily BigQuery fact table for Target.com orders is partitioned by order_date and clustered by guest_id. A downstream dashboard is slow and expensive when filtering by store_id and sku_id. What changes do you make in partitioning, clustering, or table layout, and why?

Easy · BigQuery Partitioning and Clustering

Sample Answer

This question is checking whether you can map query patterns to BigQuery physical design and cost controls. You should say partition on the most common time filter to enable partition pruning, then cluster on the highest selectivity non-time predicates used in WHERE and JOIN. If store_id and sku_id drive most filters, consider clustering on (store_id, sku_id) or producing a separate aggregated table for that dashboard to avoid scanning the raw grain.
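A hedged BigQuery DDL sketch of both options; the dataset and column names are assumptions for illustration:

SQL

-- Option 1: keep date partitioning for pruning, re-cluster on the
-- dashboard's actual filter columns (shown as a full rebuild for simplicity).
CREATE TABLE `analytics.orders_fact_reclustered`
PARTITION BY order_date
CLUSTER BY store_id, sku_id
AS SELECT * FROM `analytics.orders_fact`;

-- Option 2: pre-aggregate to the dashboard's grain so it never
-- scans raw order lines at all.
CREATE TABLE `analytics.store_sku_daily_sales`
PARTITION BY order_date
CLUSTER BY store_id, sku_id
AS
SELECT
  order_date,
  store_id,
  sku_id,
  COUNT(DISTINCT order_id) AS orders,
  SUM(net_sales)           AS net_sales
FROM `analytics.orders_fact`
GROUP BY order_date, store_id, sku_id;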

Practice more Cloud & Data Platform Fundamentals (GCP/AWS/Azure, BigQuery) questions

Behavioral & Operational Ownership (incidents, DR, collaboration)

When a feed breaks at 6am or a metric shifts unexpectedly, your response process matters as much as the fix. You’ll be evaluated on incident triage, on-call/hand-off habits, change management, stakeholder communication, and examples of preventing repeats through monitoring, runbooks, and postmortems (plus any DR participation).

At 6 a.m., the daily sales fact table feeding a Target store performance dashboard is missing two hours of transactions, and business users are paging you. Walk through your triage and comms in the first 30 minutes, including what you check in Spark, the scheduler, and downstream tables.

Easy · Incident Triage and Stakeholder Comms

Sample Answer

The standard move is to stabilize the blast radius first, acknowledge the incident, stop bad data from propagating, and open a clear timeline with owners and ETAs. But here, the dashboard is already live, so you also need to decide fast between pausing refresh, backfilling, or serving stale data because the wrong choice creates more distrust than a short outage. You check upstream landing completeness, job retries and late data thresholds, partition watermark logic, and whether the downstream model is doing an inner join that silently drops hours. You communicate in plain terms, what is impacted, what is not, and when the next update is coming.
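For the upstream-completeness check, a quick sanity query of the kind sketched below (BigQuery-style, reusing the `raw.pos_transactions` table from the SQL section) localizes the missing window fast:

SQL

-- Per-hour row counts for today; a two-hour gap, or a sharp drop versus
-- the same weekday last week, pinpoints the missing data before you
-- decide between pausing refresh, backfilling, or serving stale data.
SELECT
  TIMESTAMP_TRUNC(transaction_ts, HOUR) AS hr,
  COUNT(*) AS rows_today
FROM `raw.pos_transactions`
WHERE DATE(transaction_ts) = CURRENT_DATE()
GROUP BY hr
ORDER BY hr;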

Practice more Behavioral & Operational Ownership (incidents, DR, collaboration) questions

The distribution rewards candidates who can move fluidly between writing a Spark job, modeling the Hive table it lands in, and defending the code quality of both. Treating those as separate study tracks is the most common prep mistake, because Target's retail pipelines (think: nightly store-SKU inventory snapshots feeding replenishment across 1,900 locations) demand all three skills in a single design conversation. If you've only practiced cloud architecture or behavioral storytelling, you're studying for the minority of the interview.

Drill retail-flavored pipeline, modeling, and SQL scenarios together at datainterview.com/questions.

How to Prepare for Target Data Engineer Interviews

Know the Business

Updated Q1 2026

Official mission

To help all families discover the joy of everyday life.

What it actually means

Target aims to be a leading multi-channel retailer, providing affordable, convenient, and enjoyable shopping experiences for families. It also focuses on fostering a positive environment for its team members and contributing to the communities it serves.

Minneapolis, Minnesota

Key Business Metrics

Revenue: $107B (-1% YoY)

Market Cap: $52B

Current Strategic Priorities

  • Strengthen leadership as the destination for trend-forward products and everyday wellbeing
  • Make wellness accessible (fun, easy, affordable, personalized)
  • Make trend-driven, expert-backed beauty more accessible
  • Refresh in-store beauty experience and host beauty events

Competitive Moat

  • Upscale discount positioning
  • High-quality, current-trend merchandise at feasible prices
  • Exclusive designer partnerships
  • Diverse merchandise assortments
  • Customer loyalty program

Target pulled in $106.6 billion in revenue last year and has publicly committed to more than $15 billion in additional sales growth by 2030. Its current north-star priorities center on becoming the destination for trend-forward products and making wellness accessible, personalized, and affordable. For data engineers, that translates into pipelines powering product assortment decisions, personalization engines, and the operational infrastructure behind rapid category expansion (like the largest spring beauty assortment the company has ever launched).

What makes Target's engineering org distinct is how it treats infrastructure ownership. The infra showback system means you own cost visibility for your pipelines, not just correctness. And the platform engineering playbook frames data engineering as a product discipline where shared data products serve hundreds of internal consumers.

The "why Target?" answer that actually works ties your experience to a specific engineering challenge described on tech.target.com. Saying "I read your showback post and I've built similar cost-attribution layers for shared compute clusters" is concrete and hard to fake. Contrast that with vague enthusiasm about the shopping experience or the brand, which tells the interviewer nothing about how you think as an engineer. Reference the playbook's emphasis on feature documents and platform thinking, then connect it to a project where you treated a pipeline as a product with defined consumers and SLAs.

Try a Real Interview Question

Daily Order Dedup and Revenue by Store

sql

You ingest order header events that can contain duplicates and late-arriving updates for the same `order_id`. For each `store_id` and `order_date`, compute `unique_orders` and `gross_revenue` after keeping only the latest record per `order_id` by `updated_at`, and excluding cancelled orders where `status` equals 'CANCELLED'. Output rows grouped by `store_id` and `order_date`, sorted by `order_date` then `store_id`.

orders_events
order_id | store_id | order_date | updated_at | status | total_amount
1001 | 101 | 2026-02-20 | 2026-02-20 10:05:00 | PLACED | 55.20
1001 | 101 | 2026-02-20 | 2026-02-20 10:15:00 | SHIPPED | 55.20
1002 | 101 | 2026-02-20 | 2026-02-20 11:00:00 | CANCELLED | 18.99
1003 | 102 | 2026-02-20 | 2026-02-20 09:30:00 | PLACED | 120.00
1004 | 101 | 2026-02-21 | 2026-02-21 08:00:00 | PLACED | 75.00
stores
store_id | store_name
101 | Downtown
102 | Uptown
103 | Suburban
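One hedged take on the exercise in BigQuery-style SQL (the `WHERE TRUE` is there because BigQuery requires a WHERE, GROUP BY, or HAVING clause alongside QUALIFY):

SQL

WITH latest AS (
  -- Keep only the most recent version of each order.
  SELECT *
  FROM orders_events
  WHERE TRUE
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY order_id
    ORDER BY updated_at DESC
  ) = 1
)
SELECT
  store_id,
  order_date,
  COUNT(DISTINCT order_id) AS unique_orders,
  SUM(total_amount)        AS gross_revenue
FROM latest
WHERE status != 'CANCELLED'
GROUP BY store_id, order_date
ORDER BY order_date, store_id;

On the sample data, order 1001 survives as its SHIPPED version and 1002 drops out as CANCELLED, so 2026-02-20 yields (101, 1 order, 55.20) and (102, 1 order, 120.00), and 2026-02-21 yields (101, 1 order, 75.00).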


Target's interview loop, from what candidates report, puts real weight on SQL fluency applied to retail-shaped data. The problems aren't abstract puzzles; they tend to involve the kind of aggregation, windowing, and schema reasoning you'd actually need when working on inventory or sales datasets at scale. Build that muscle consistently at datainterview.com/coding.

Test Your Readiness

How Ready Are You for Target Data Engineer?

Sample question 1 of 10 · Distributed ETL (Spark)

Can you design a Spark job that joins large and skewed datasets, and explain how you would mitigate skew (salting, broadcast joins, AQE) while controlling shuffle and partitioning?
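To check yourself on the salting piece specifically, here is a minimal Spark SQL sketch under assumed names (`sales_fact` as the big, skewed side; `store_dim` as a side too large to broadcast), spreading each hot key across 16 salt buckets:

SQL

-- Big, skewed side: assign each row a random salt bucket (0-15).
WITH salted_sales AS (
  SELECT *, CAST(rand() * 16 AS INT) AS salt
  FROM sales_fact
),
-- Smaller side: replicate every row once per salt bucket so the
-- salted join key still matches exactly one replica.
salted_stores AS (
  SELECT s.*, t.salt
  FROM store_dim s
  LATERAL VIEW explode(sequence(0, 15)) t AS salt
)
SELECT
  b.store_id,
  b.sku_id,
  b.net_sales,
  s.region
FROM salted_sales b
JOIN salted_stores s
  ON b.store_id = s.store_id
 AND b.salt = s.salt;

With AQE's skew-join handling enabled (Spark 3.x), manual salting is often unnecessary, but interviewers still expect you to be able to derive it.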

Drill data modeling and pipeline design scenarios at datainterview.com/questions, focusing on retail patterns like slowly changing dimensions for product catalogs and fact tables for transactional data.

Frequently Asked Questions

How long does the Target Data Engineer interview process take?

Most candidates report a compact process: the loop itself runs about five rounds over roughly two weeks, though end-to-end timelines from recruiter screen to offer can stretch to three to five weeks when scheduling drags. You'll typically start with a recruiter call, move to a technical phone screen focused on SQL and coding, and then do a virtual or onsite loop with 3 to 4 rounds. Target's recruiting team in Minneapolis tends to move at a reasonable pace, but holiday seasons (Q4 especially) can slow things down since that's their busiest retail period.

What technical skills are tested in a Target Data Engineer interview?

SQL is the backbone of every round, no matter the level. Beyond that, expect questions on Python (sometimes Java or Scala), distributed programming concepts, and the Hadoop ecosystem (Hive, Spark specifically). Cloud platform experience with Google Cloud, AWS, or Azure comes up regularly. You should also be comfortable discussing CI/CD basics, software design patterns, debugging, and operational support topics like monitoring and incident management.

How should I tailor my resume for a Target Data Engineer role?

Lead with your data pipeline and ETL experience. Target cares about Hadoop ecosystem tools, so if you've used Hive, Spark, or HiveQL, put those front and center. Quantify your impact with metrics like pipeline throughput, data volume processed, or latency improvements. Mention any cloud platform work (GCP, AWS, Azure) and call out Python or Scala projects explicitly. Keep it to one page for junior roles, two pages max for senior and above.

What is the salary and total compensation for Target Data Engineers?

Compensation varies by level. At P1 (Junior, 0-2 years), total comp averages around $115,000 with a base of $105,000. P2 (Mid, 2-5 years) jumps to about $140,000 TC on a $125,000 base. P3 (Senior, 4-10 years) averages $170,000 TC with a $135,000 base. Staff level (P4) hits roughly $190,000 TC, and Principal (P5) averages $230,000 with a base around $190,000. RSUs are part of the package, communicated as a target dollar amount and converted to shares based on stock price near the grant date.

How do I prepare for the behavioral interview at Target?

Target's core values are Care, Grow, Win, Ethical Business Practices, and Community Responsibility. I'd prepare 5 to 6 stories that map to these themes. Think about times you mentored someone (Grow), made a tough ethical call, or rallied a team around a shared goal (Win). At senior levels and above, they'll dig into cross-team influence and how you've driven alignment across engineering and business stakeholders. Be genuine. Target's culture leans collaborative, not cutthroat.

How hard are the SQL questions in Target Data Engineer interviews?

For junior roles (P1), expect medium-difficulty SQL covering joins, window functions, and aggregations. Nothing exotic, but you need to be fast and accurate. At P2 and above, they layer on performance tuning, HiveQL-specific syntax, and complex data profiling scenarios. Senior and staff candidates should be ready for questions about query optimization in distributed environments. I'd recommend practicing on datainterview.com/questions to get comfortable with retail-flavored data problems.

What ML or statistics concepts should I know for a Target Data Engineer interview?

Data Engineer roles at Target are not ML-heavy. The focus stays on data engineering fundamentals: data modeling (star schema, snowflake schema, slowly changing dimensions), data quality, and pipeline reliability. That said, understanding basic statistical profiling of data (distributions, outliers, null handling) helps during data quality discussions. You won't be asked to build models, but knowing how your pipelines feed downstream analytics and ML teams is a plus at senior levels.

What format should I use to answer behavioral questions at Target?

I recommend the STAR format (Situation, Task, Action, Result) but keep it tight. Two minutes max per answer. Target interviewers want specifics, not vague generalities. Quantify results where you can: "reduced pipeline failures by 40%" hits harder than "improved reliability." For P4 and P5 candidates, they'll probe deeper into the Action step, asking about tradeoffs you considered and how you influenced others without direct authority.

What happens during the Target Data Engineer onsite interview?

The onsite (often virtual) typically has 3 to 4 rounds. Expect at least one SQL and coding round, one system design round (especially for P3 and above), and one or two behavioral rounds. Junior candidates get more weight on SQL fundamentals and basic ETL scripting. Senior and staff candidates face practical system design problems around data pipelines, lakehouse or warehouse modeling, orchestration, and reliability. There's usually a hiring manager conversation mixed in as well.

What business metrics and concepts should I know for a Target Data Engineer interview?

Target is a $106.6 billion revenue retailer, so think in terms of retail metrics: sales per store, inventory turnover, supply chain throughput, customer lifetime value, and conversion rates across channels. Understanding their multi-channel strategy (in-store, online, same-day delivery) helps you frame system design answers. At senior levels, being able to connect your pipeline design decisions to business outcomes like faster inventory insights or better personalization will set you apart.

What coding languages should I practice for a Target Data Engineer interview?

SQL is non-negotiable. Every single round will touch it in some form. Python is the most common scripting language they test, particularly for ETL and data transformation tasks. Java and Scala come up for teams working heavily in the Spark ecosystem. HiveQL is also worth brushing up on since Target uses the Hadoop ecosystem extensively. I'd focus 60% of your prep time on SQL and Python, then allocate the rest based on the specific team. Practice at datainterview.com/coding to build speed.

What are common mistakes candidates make in Target Data Engineer interviews?

The biggest one I've seen is underestimating the system design round. Candidates nail the SQL but freeze when asked to design an end-to-end data pipeline with orchestration, monitoring, and failure handling. Another common mistake is ignoring data quality and observability. Target cares a lot about operational support, incident management, and reliability. Don't just describe the happy path. Talk about what breaks, how you detect it, and how you recover. Finally, skipping behavioral prep is a real risk, especially at P3 and above where culture fit carries significant weight.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn