Meta Machine Learning Engineer Interview Guide

Dan Lee · Data & AI Lead
Last updated: March 16, 2026

Meta Machine Learning Engineer at a Glance

Total Compensation

$187k - $785k/yr

Interview Rounds

7 rounds

Difficulty

Levels

E3 - E7

Education

Bachelor's / Master's / PhD

Experience

0–25+ yrs

Python · C++ · Java · JavaScript · Hack · Perl · PHP · Shell scripts · Machine Learning · MLOps · Scalable Systems · Recommendation Systems · Search · Integrity & Abuse Detection · Personalization · Deep Learning

Most candidates prep for Meta's MLE role like it's a modeling job with some coding on the side. The skill expectations tell a different story. Software engineering and machine learning are both rated at expert level, which means the coding bar is closer to a pure SWE loop than you'd find at most companies' MLE roles. If you're splitting prep time, weight them equally.

Meta Machine Learning Engineer Role

Primary Focus

Machine Learning · MLOps · Scalable Systems · Recommendation Systems · Search · Integrity & Abuse Detection · Personalization · Deep Learning

Skill Profile

Math & Stats · Software Eng · Data & SQL · Machine Learning · Applied AI · Infra & Cloud · Business · Viz & Comms

Math & Stats

High

Strong understanding of statistical modeling, probability, optimization techniques, and model evaluation metrics essential for developing and improving ML models and strategies.

Software Eng

Expert

Deep expertise in designing, developing, testing, deploying, and maintaining robust, scalable, and efficient software systems, including API design, distributed systems, and architectural patterns.

Data & SQL

High

Experience with designing and implementing scalable data architectures, ETL processes, and ML data pipelines for large-scale data ingestion, processing, and feature engineering.

Machine Learning

Expert

Comprehensive expertise in machine learning algorithms, model development, training, evaluation, deployment, and lifecycle management, including experience with recommendation systems, pattern recognition, and data mining.

Applied AI

High

Strong understanding and practical experience with modern AI techniques, including deep learning architectures and Large Language Models (LLMs), for research, development, and application.

Infra & Cloud

High

Experience with designing, deploying, monitoring, and scaling large-scale ML systems and applications within a distributed infrastructure environment, including performance optimization and operational best practices.

Business

Expert

Exceptional ability to understand business problems, identify ML opportunities to drive significant business impact, translate technical insights into actionable recommendations, and balance technical and business trade-offs.

Viz & Comms

High

Strong ability to communicate complex technical concepts and ML model performance/insights effectively to technical and non-technical stakeholders, and translate insights into business recommendations.

What You Need

  • Extensive experience in supporting and evolving a portfolio of ML models that deliver on critical business goals
  • Experience developing machine learning models at scale from inception to business impact
  • Ability to architect efficient and scalable systems that drive complex applications
  • Experience building maintainable and testable code bases, including API design and unit testing techniques
  • Experience in machine learning, recommendation systems, pattern recognition, data mining, or artificial intelligence
  • Proven ability to translate insights into business recommendations

Nice to Have

  • Experience working with ML models in financial risk or similar financial contexts
  • Exposure to architectural patterns of large-scale software applications
  • Experience improving quality through thoughtful code reviews, appropriate testing, proper rollout, monitoring, and proactive changes
  • Experience with research and introduction of deep learning and Large Language Model (LLM) technologies
  • Experience with filesystems, server architectures, and distributed systems

Languages

Python · C++ · Java · JavaScript · Hack · Perl · PHP · Shell scripts

Tools & Technologies

PyTorch · TensorFlow · Hadoop · HBase · Pig · MapReduce · Sawzall · Bigtable

Want to ace the interview?

Practice with real questions.

Start Mock Interview

The widget covers the role's shape. What it won't tell you is how the day-to-day feels: you're writing production Python and C++ that gets code-reviewed like any other SWE commit, while also owning model training in PyTorch and monitoring metrics after deployment. Success here means shipping model changes that move product metrics on surfaces like Feed ranking, Ads prediction, or Integrity classifiers, and doing it through reviewed, tested code rather than handed-off prototypes.

A Typical Week

A Week in the Life of a Meta Machine Learning Engineer

Typical L5 workweek · Meta

Weekly time split

Coding 30% · Meetings 22% · Infrastructure 15% · Writing 10% · Break 10% · Analysis 8% · Research 5%

What stands out in that breakdown is the sheer amount of time spent on code and infrastructure relative to experimentation. Cross-functional syncs with product engineers, data engineers, and PMs are a recurring fixture, especially on teams like Ads Prediction where model changes directly affect revenue. MLEs at Meta tend to own the full pipeline from training through serving, so expect your week to include debugging production issues alongside designing new features.

Projects & Impact Areas

Feed and Reels ranking concentrates the largest share of MLEs, building recommendation systems that determine content ordering for Meta's massive user base. Ads prediction sits right alongside it as the revenue engine, optimizing click-through and conversion models under tight latency constraints. The GenAI surface is expanding fast (Meta AI assistant, the Llama model family), while Reality Labs offers a smaller niche where on-device ML for Quest headsets trades datacenter-scale problems for model size and power budget constraints.

Skills & What's Expected

Business acumen is rated expert-level, same as ML and software engineering, and that's the detail most candidates underweight. You need to articulate how a model improvement maps to engagement, revenue, or integrity coverage, not just explain the architecture. Data pipelines and infrastructure/deployment skills are rated high rather than expert, but don't mistake "high" for optional: building and maintaining feature pipelines at Meta's scale is core to the job.

Levels & Career Growth

Meta Machine Learning Engineer Levels

Each level has different expectations, compensation, and interview focus.

E3 (Entry Level)

Base

$138k

Stock/yr

$33k

Bonus

$17k

0–2 yrs · Bachelor's degree in Computer Science or a related field is typically required. A Master's or PhD is common but not strictly necessary for this entry-level role.

What This Level Looks Like

Works on well-defined tasks and features within a single team, guided by senior engineers. Scope is limited to specific components of a larger system. Focus is on execution and learning the codebase and team processes.

Day-to-Day Focus

  • Learning core machine learning and software engineering skills.
  • Executing on assigned tasks with high quality and timeliness.
  • Ramping up on the team's specific technologies, codebase, and infrastructure.
  • Developing as a productive and collaborative member of the team.

Interview Focus at This Level

Interviews heavily emphasize coding fundamentals (data structures, algorithms) and foundational machine learning concepts (e.g., classification, regression, evaluation metrics). System design questions are typically basic, focusing on thought process rather than deep expertise. Behavioral questions assess learning aptitude and teamwork.

Promotion Path

Promotion to E4 requires demonstrating the ability to work independently on medium-sized, well-scoped projects. This includes showing ownership from design to completion, consistently delivering high-quality work with minimal guidance, and developing a solid understanding of the team's systems and domain.

Find your level

Practice with questions tailored to your target level.

Start Practicing

Most external hires land at E4 or E5. The jump from E5 to E6 is where careers stall, and the data explains why: E6 scope requires leading multi-quarter, multi-team initiatives, not just shipping better models. That shift from individual technical excellence to cross-team influence is the single biggest promotion blocker candidates report.

Work Culture

Meta's pace is genuinely fast. You'll ship experiments quickly and iterate, which is energizing if you like velocity and draining if you prefer deep ownership of one system. The source data describes the work schedule as flexible and hybrid, though from what candidates report, in-office expectations have been tightening and fully remote MLE roles are uncommon.

Meta Machine Learning Engineer Compensation

Meta's RSUs vest quarterly at 6.25% per quarter, totaling 25% each year across a four-year grant. That steady quarterly cadence means you're seeing real money hit your brokerage account every three months from day one, which matters when you're comparing offer structures.
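
To make the cadence concrete with hypothetical numbers (not an actual offer figure): a $400k four-year grant vests $400k × 6.25% = $25k per quarter, or $100k per year before any refresher grants.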

The single biggest lever in your compensation isn't negotiation, it's leveling. The widget above makes the E5-to-E6 gap obvious, but what it can't show is that your interview performance determines which band you land in. A strong behavioral round and system design showing can be the difference between an E4 and E5 offer on teams like Feed Ranking or Ads, where Meta needs senior MLEs who can own model serving pipelines end to end.

Because Meta's MLE ladder sits under the unified Software Engineer title, your level carries over if you ever move from, say, an Integrity classifier team to the Llama serving infrastructure group. That portability is worth factoring into how you evaluate the offer beyond just the year-one number.

Meta Machine Learning Engineer Interview Process

7 rounds · ~4 weeks end to end

Initial Screen

1 round
Round 1: Recruiter Screen

30m · Phone

In this 30-minute call, you’ll walk through your background, the types of ML problems you’ve worked on, and what kind of team/level you’re targeting. Expect a quick calibration on role fit (MLE vs adjacent roles), location/remote constraints, and compensation expectations. You’ll also align on timelines and what the interview loop will cover.

general · behavioral · engineering

Tips for this round

  • Prepare a 90-second narrative tying your most recent role to ML impact (metrics moved, latency/cost reductions, launch outcomes).
  • Be ready to specify your preferred domain (ranking/recsys, ads, integrity, GenAI, infra) and how that maps to MLE work vs Research Scientist.
  • Clarify level signals (scope, autonomy, leadership) using concrete examples rather than years of experience.
  • Share constraints early (visa, start date, location) to avoid later scheduling resets.
  • If asked about comp, provide a range anchored to level and location, and emphasize you’re optimizing for role/team fit first.

Technical Assessment

1 round
Round 2: Coding & Algorithms

60m · Video Call

Next you’ll do a live coding screen focused on data structures and algorithms, typically in a shared editor. The interviewer will evaluate correctness, runtime/space complexity, and how you communicate tradeoffs while coding. You should expect 1–2 problems with follow-ups that push edge cases and optimization.

algorithms · data_structures · engineering · ml_coding

Tips for this round

  • Practice implementing solutions in a Meta-common language (Python/C++/Java) with clean function signatures and minimal bugs.
  • Talk through complexity explicitly (e.g., O(n log n) vs O(n)) and justify data structure choices (heap, deque, hashmap, union-find).
  • Use a consistent approach: clarify requirements, propose brute force, optimize, then code and test with custom cases.
  • Write and run through edge cases aloud (empty input, duplicates, overflow, large constraints) before finalizing.
  • Keep code interview-ready: avoid overengineering, but include helper functions and meaningful variable names.

Onsite

5 rounds
Round 3: Coding & Algorithms

45m · Video Call

Expect a second coding interview that’s similar in style but often probes deeper via follow-ups and alternative approaches. You’ll be judged on how quickly you converge on a working solution and whether you can adapt when constraints change. The pace is faster than the screen, so communication and testing discipline matter.

algorithms · data_structures · engineering · ml_coding

Tips for this round

  • Train for speed: aim to reach a correct baseline within 10–15 minutes, then iterate with optimizations.
  • Get comfortable with common Meta patterns (two pointers, sliding window, BFS/DFS, top-k, intervals, string parsing).
  • When stuck, narrate what you’re trying and propose a smaller subproblem or invariant to regain momentum.
  • After coding, do a structured dry run with at least two non-trivial examples and one edge case.
  • Be ready to discuss alternative implementations and why you chose yours (readability vs performance).

Tips to Stand Out

  • Treat MLE as SWE+ML. Prioritize coding fluency (DSA speed, clean implementation) while also showing you can reason about modeling and real-world deployment constraints.
  • Use consistent interview structure. For every problem: clarify → propose approach → analyze complexity/tradeoffs → implement → test → iterate; interviewers reward disciplined communication.
  • Anchor answers in metrics and decisions. When describing projects, emphasize what you chose (objective, features, model, infra), why you chose it, and the measured outcome and guardrails.
  • Prepare an end-to-end ML system story. Be ready to design data/labeling, training pipelines, online serving, and monitoring/rollback as one coherent system, not disconnected components.
  • Practice product/experiment thinking. Build comfort translating model changes into A/B tests, choosing robust metrics, and diagnosing when offline gains don’t ship to online impact.
  • Calibrate to level. For senior levels, explicitly highlight scope, cross-team influence, and how you set direction; for mid-level, emphasize strong execution and reliability in production.

Common Reasons Candidates Don't Pass

  • Coding execution gaps. Even with the right idea, frequent bugs, missing edge cases, or weak complexity analysis can lead to a ‘no’ in a loop that heavily weights SWE fundamentals.
  • Shallow ML understanding. Answers that rely on ‘try X model’ without discussing objectives, leakage, calibration, evaluation choice, or failure modes often signal weak modeling judgment.
  • Hand-wavy system design. Omitting data/label generation, train-serve skew, monitoring, or latency/scaling tradeoffs makes the design feel academic rather than production-ready.
  • Weak product/metric reasoning. Choosing misaligned metrics, ignoring guardrails, or being unable to interpret noisy A/B results suggests you can’t tie ML work to user/business outcomes.
  • Low signal behavioral stories. Vague narratives without clear ownership, decisions, and quantified impact (or inconsistent details under follow-up) can sink an otherwise strong technical loop.

Offer & Negotiation

Meta MLE compensation is typically a mix of base salary, an annual bonus target, and RSUs, with equity vesting evenly over four years (6.25% per quarter, as noted above). Negotiation levers usually include RSU grant size, sign-on bonus (sometimes split across years), and level/title, which strongly drives pay bands; base salary has less flexibility than equity or sign-on at most levels. Practical approach: confirm level and competing offers first, then negotiate for additional RSUs or sign-on tied to market data and your specific strengths, and ask about refreshers and performance-based equity cadence for long-term upside.

Most candidates expect the loop to take a month or two, but timelines vary wildly depending on recruiter bandwidth, team headcount urgency, and how quickly you schedule rounds. One structural detail worth internalizing early: Meta often interviews MLEs into a general pool rather than a specific team, with team matching happening after the hiring committee signs off. This isn't universal (some reqs are team-specific), but if you're in the pool path, you can't bank on deep Ads ranking knowledge to paper over a weak coding performance. You need to show up strong across every round.

The behavioral round is where leveling decisions quietly get made, especially for candidates targeting E5. Hiring committee members look at your leadership and cross-functional signals to calibrate whether you belong at Senior or one level below. A soft showing here won't always reject you, but it can land you an E4 offer instead of E5, a gap that meaningfully changes your total comp and your starting negotiation position.

Meta Machine Learning Engineer Interview Questions

ML System Design & Serving

Expect questions that force you to design an end-to-end ML product (training, offline/online features, serving, monitoring, and retraining) under real latency and reliability constraints. Candidates often struggle to connect modeling choices to system bottlenecks like feature freshness, tail latency, and safe rollouts.

Design the online serving stack for a Facebook Feed ranking model that uses 200 features, including user, creator, and post embeddings, with a $50\,\text{ms}$ P99 budget and a requirement that engagement regressions over $0.5\%$ trigger an automatic rollback. What do you cache, what do you compute on request, and what monitoring signals and canary plan do you ship with?

Easy · Online Serving, Caching, Rollouts

Sample Answer

Most candidates default to putting every feature behind a single online feature store call, but that fails here because network fanout and embedding fetches blow up tail latency and make failures correlated. Split features by volatility and compute cost, precompute and cache stable features and embeddings keyed by (user, creator, post), and compute only truly request-scoped features (for example, session context) inline. Enforce strict timeouts, fallbacks, and a default score path so missing features do not wedge the request. For safety, canary by traffic slice and by geography, monitor P50 and P99 latency, feature missingness, model score distribution drift, and top-line engagement plus guardrail metrics, then auto-rollback on a sustained $>0.5\%$ drop with a minimum sample size gate.
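
To make the volatility split concrete, here is a minimal Python sketch of the tiered fetch path (all names, budgets, and the scoring function are hypothetical illustrations, not Meta internals): cached precomputed features come first, request-scoped features are computed inline, and a strict budget check falls back to a default score rather than failing the request.

Python
import time
from typing import Dict, Optional

# Hypothetical tiered lookup: precomputed features live in a cache keyed by
# entity; only request-scoped features are computed inline.
CACHE: Dict[str, Dict[str, float]] = {
    "user:42": {"user_affinity": 0.7},
    "post:9": {"post_embedding_norm": 1.3},
}

DEFAULT_SCORE = 0.0   # safe fallback so missing features never wedge a request
BUDGET_S = 0.050      # 50 ms P99 budget from the prompt


def fetch_cached(key: str) -> Optional[Dict[str, float]]:
    return CACHE.get(key)  # stand-in for a memcache/feature-store lookup


def score(features: Dict[str, float]) -> float:
    return sum(features.values())  # stand-in for the real ranking model


def rank_request(user_id: int, post_id: int, session_len: int) -> float:
    start = time.monotonic()
    features: Dict[str, float] = {}
    for key in (f"user:{user_id}", f"post:{post_id}"):
        if time.monotonic() - start > BUDGET_S:
            return DEFAULT_SCORE  # degrade gracefully instead of failing
        features.update(fetch_cached(key) or {})
    features["session_len"] = float(session_len)  # request-scoped, computed inline
    return score(features)


print(rank_request(42, 9, session_len=3))  # 0.7 + 1.3 + 3.0 = 5.0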

Practice more ML System Design & Serving questions

Machine Learning (RecSys, Search, Integrity)

Most candidates underestimate how much you’ll be pushed on choosing objectives and metrics for ranking/personalization and on diagnosing failure modes (bias, feedback loops, abuse/adversaries, cold start). You’ll need to justify tradeoffs among precision/recall, calibration, diversity, and long-term user value.

You ship a new Instagram Reels ranking model and total watch time goes up, but creator complaints spike that distribution feels unfair and repetitive. What two offline metrics would you add to your evaluation suite to catch this before launch, and what failure mode does each metric target?

Easy · RecSys Metrics and Objectives

Sample Answer

Add (1) creator-level exposure inequality such as a Gini coefficient over impressions, and (2) an intra-session diversity metric such as average pairwise distance of topic embeddings (or unique-audio rate). The Gini flags concentration where a small set of creators get most impressions even if watch time rises. The diversity metric catches homogenization from over-optimizing short-term engagement, which drives repetitiveness and long-term churn risk.
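
Both metrics are cheap to prototype. A toy sketch in pure Python (production versions would run over logged impressions at scale, and the embedding distance here is illustrative):

Python
import itertools
from typing import List, Sequence


def gini(impressions: List[int]) -> float:
    """Gini coefficient over per-creator impression counts (0 = equal, ~1 = concentrated)."""
    xs = sorted(impressions)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))  # rank-weighted sum
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def avg_pairwise_distance(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance between topic embeddings within one session."""
    pairs = list(itertools.combinations(embeddings, 2))
    if not pairs:
        return 0.0

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return sum(dist(a, b) for a, b in pairs) / len(pairs)


print(gini([100, 100, 100]))  # 0.0: perfectly equal exposure
print(gini([0, 0, 300]))      # ~0.67: impressions concentrated on one creator
print(avg_pairwise_distance([[1, 0], [0, 1], [1, 1]]))  # higher = more diverse session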

Practice more Machine Learning (RecSys, Search, Integrity) questions

MLOps, Monitoring & Experimentation in Production

Your ability to keep models healthy after launch is a major hiring signal: data drift detection, alerting, on-call mitigations, and rollback strategies come up frequently. Interviewers probe whether you can set up guardrails (shadow/canary, model/versioning, reproducibility) that prevent silent regressions.

A new ranking model for Facebook Feed looks stable on offline AUC, but in production your main alert is a $+3\%$ increase in hides per impression within 30 minutes of a canary rollout. What two monitoring approaches could have caught this earlier, and which one is better here and why?

Easy · Monitoring and Guardrails

Sample Answer

You could do threshold based metric alerts on hides per impression, or distribution shift monitoring on key features and model scores. Threshold alerts win here because the regression is directly in a business safety metric and it moved fast, so you want tight, low latency detection tied to user harm. Drift monitors are still useful, but they often lag, are harder to tune, and can fire on benign shifts.
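
A minimal version of that threshold alert, with the minimum-sample-size gate mentioned above (baseline, threshold, and window size are hypothetical values):

Python
from collections import deque


class GuardrailAlert:
    """Fires when hides-per-impression rises >3% relative to baseline over a full window."""

    def __init__(self, baseline: float, rel_threshold: float = 0.03, window: int = 10_000):
        self.baseline = baseline
        self.rel_threshold = rel_threshold
        self.events = deque(maxlen=window)  # 1 = hide, 0 = no hide

    def record(self, hidden: bool) -> bool:
        """Record one impression; return True if the alert should fire."""
        self.events.append(1 if hidden else 0)
        if len(self.events) < self.events.maxlen:
            return False  # minimum-sample-size gate: don't alert on thin data
        rate = sum(self.events) / len(self.events)
        return (rate - self.baseline) / self.baseline > self.rel_threshold


alert = GuardrailAlert(baseline=0.010)
# A canary slice where 2% of impressions get hidden should trip the alert.
fired = any(alert.record(hidden=(i % 50 == 0)) for i in range(20_000))
print(fired)  # True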

Practice more MLOps, Monitoring & Experimentation in Production questions

Software Engineering (Production Quality)

The bar here isn’t whether you can write code, it’s whether you can build maintainable services and libraries that other engineers can safely extend. You’ll be evaluated on API design, testing strategy, code review instincts, and handling edge cases in large, long-lived codebases.

You are shipping a new ranking model behind a service endpoint used by Feed, Reels, and Search, and you must guarantee stable outputs for identical feature vectors across deploys. What versioning, API contract, and regression testing strategy do you put in place to catch silent feature schema changes and non-determinism before rollout?

Easy · API Design and Testing Strategy

Sample Answer

Start by freezing the request and response schema in an explicit contract: include feature names, types, defaults, and unknown-field behavior, then version it so clients can pin and you can safely evolve. Next, enforce determinism by controlling seeds, model eval mode, and any randomness in feature computation, then add golden tests that replay a fixed corpus of feature vectors and assert identical scores within a tight tolerance $\epsilon$. Add schema drift checks at build time (generated code or schema validation), at runtime (reject or log unknown and missing fields), and in CI (diff feature definitions and run training-serving parity tests). Finally, gate rollout with canary plus automated score-distribution checks and alerting on shifts in key business metrics like CTR and negative feedback rate.
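
The golden-test piece of that answer can be sketched in a few lines (the corpus, scoring function, and tolerance are all illustrative; a real suite would replay logged feature vectors captured from the last good build):

Python
import random

EPS = 1e-6  # tight tolerance for "identical outputs" across deploys


def model_score(features: dict) -> float:
    random.seed(0)  # pin any stochastic components so scoring is deterministic
    return sum(features.values())  # stand-in for the real model


# Golden corpus: fixed feature vectors plus the scores the last good build produced.
GOLDEN = [
    {"id": "case-1", "features": {"a": 0.2, "b": 1.1}, "expected_score": 1.3},
    {"id": "case-2", "features": {"a": 0.0, "b": 0.0}, "expected_score": 0.0},
]


def test_golden_corpus():
    for case in GOLDEN:
        got = model_score(case["features"])
        assert abs(got - case["expected_score"]) <= EPS, f"regression in {case['id']}"


test_golden_corpus()
print("golden corpus passed")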

Practice more Software Engineering (Production Quality) questions

Algorithms & Data Structures (Coding)

In coding rounds, you’re expected to translate an ambiguous problem into correct, efficient code with clear complexity reasoning. Many strong ML candidates stumble by skipping invariants, not testing edge cases, or over-optimizing before getting a working solution.

You log a user’s last $k$ content exposures (post IDs) in a fixed-size ring buffer to compute real-time diversity metrics, and you need to remove duplicates while keeping the most recent occurrence order. Given a list of post IDs ordered from oldest to newest and an integer $k$, return the de-duplicated list of the last $k$ IDs in chronological order.

Easy · Sliding Window, Hash Set

Sample Answer

This question is checking whether you can implement a sliding window with the right invariant and not lose ordering. You need to consider only the last $k$ items, then keep each ID’s last occurrence, and finally output in chronological order. Most people fail by de-duping the full list or by keeping the first occurrence instead of the last. Complexity should be $O(k)$ time and $O(k)$ space.

Python
from collections import OrderedDict
from typing import List


def dedupe_last_k_chronological(post_ids: List[int], k: int) -> List[int]:
    """Return de-duplicated post IDs from the last k exposures, keeping the most recent occurrence.

    Input post_ids is ordered oldest -> newest.
    Output is ordered oldest -> newest within the last k window, after keeping only the last occurrence.

    Example:
      post_ids = [1, 2, 1, 3, 2], k=4 -> last window [2,1,3,2] -> keep last occurrences -> [1,3,2]
    """
    if k <= 0 or not post_ids:
        return []

    window = post_ids[-k:]  # last k exposures, oldest -> newest

    # OrderedDict preserves insertion order. We want order by LAST occurrence.
    # Trick: on each ID, delete it if present, then re-insert so it moves to the end.
    last_order = OrderedDict()
    for pid in window:
        if pid in last_order:
            del last_order[pid]
        last_order[pid] = None

    return list(last_order.keys())


if __name__ == "__main__":
    assert dedupe_last_k_chronological([1, 2, 1, 3, 2], 4) == [1, 3, 2]
    assert dedupe_last_k_chronological([1, 1, 1], 2) == [1]
    assert dedupe_last_k_chronological([], 3) == []
    assert dedupe_last_k_chronological([5, 6, 7], 0) == []
Practice more Algorithms & Data Structures (Coding) questions

Data Pipelines & Feature Engineering at Scale

When pipeline questions appear, they focus on how you prevent training/serving skew and build reliable feature generation with massive logs and distributed compute. You should be ready to discuss backfills, late-arriving data, id joins, and guaranteeing correctness across offline and online paths.

You are building a daily training set for a Reels ranking model from impression, watch-time, and like logs, and the same features must be computed online for inference. How do you design the feature pipeline to prevent training serving skew when events arrive late and user IDs can be rekeyed (for example, app reinstall)?

Medium · Training Serving Skew

Sample Answer

The standard move is a single feature definition that compiles to both offline and online (same transforms, same defaults), plus time-aware joins keyed by entity and event time. But here, late data and ID churn matters because offline backfills can silently change labels and aggregates unless you pin feature values to an as-of timestamp and define explicit rekeying rules (mapping tables with validity windows). You also need parity tests that diff offline feature values against online logs on the same request IDs. That catches skew before it ships.
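
The time-aware join is the part that most often gets hand-waved; pandas' merge_asof illustrates the as-of semantics on a toy scale (column names are hypothetical):

Python
import pandas as pd

# Each impression picks up the latest feature value computed at or before its
# event time, so later backfills cannot leak future information into training.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_time": pd.to_datetime(["2025-01-01", "2025-01-03", "2025-01-02"]),
    "watch_time_7d": [120.0, 150.0, 30.0],
}).sort_values("feature_time")

impressions = pd.DataFrame({
    "user_id": [1, 2],
    "event_time": pd.to_datetime(["2025-01-02", "2025-01-04"]),
}).sort_values("event_time")

training_rows = pd.merge_asof(
    impressions, features,
    left_on="event_time", right_on="feature_time",
    by="user_id", direction="backward",
)
print(training_rows)  # user 1 joins the Jan 1 value, not the Jan 3 backfill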

Practice more Data Pipelines & Feature Engineering at Scale questions

Deep Learning & LLM/GenAI Applications

If the conversation turns to modern AI, you’ll be assessed on pragmatic usage—fine-tuning vs. prompting, retrieval, safety, and evaluation—not just architecture trivia. Candidates tend to miss how to measure quality and risk (hallucinations, abuse, privacy) in a product setting.

You are shipping an LLM-powered "Help me write" composer for Instagram DMs that must not leak phone numbers or emails from the conversation history. What is your deployment-time mitigation plan (prompting, retrieval, filtering, and logging), and what offline and online metrics prove it is working?

Easy · LLM Safety and Evaluation

Sample Answer

Get this wrong in production and you leak PII into generated text, create compliance risk, and train users to trust unsafe outputs. The right call is layered controls: minimize what the model sees (truncate, redact), constrain generation (system policy, output filters), and instrument detection plus human escalation. Offline, measure PII leak rate, false positive block rate, and utility (task success, edit distance), then online track policy violation rate per $10^6$ messages, user friction (abandon, retries), and precision of the safety classifier on audited samples.
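
The "minimize what the model sees" layer can start as a simple pre-filter; a toy regex version follows (patterns are illustrative, and a production system would pair this with an ML-based PII detector plus output-side filters):

Python
import re

# Illustrative patterns only: real PII detection needs far broader coverage
# (international numbers, obfuscated addresses) and an ML classifier as backup.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]


def redact(text: str) -> str:
    """Replace phone numbers and emails with placeholder tokens before prompting."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text


print(redact("Call me at 650-555-0100 or mail a@b.com"))
# -> "Call me at [PHONE] or mail [EMAIL]"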

Practice more Deep Learning & LLM/GenAI Applications questions

The top three areas all require you to reason about Meta's specific product surfaces (Feed ranking, Ads conversion models, Integrity classifiers) under real serving constraints like sub-50ms latency for 3B+ daily actives. That overlap is where compounding difficulty lives: a system design prompt about Reels ranking can quickly demand you diagnose feedback loops in creator distribution, then pivot to how you'd detect drift post-launch, crossing three areas in a single conversation. Most candidates over-index on algorithm practice and arrive underprepared for the production ML reasoning that dominates this loop.

Sharpen your system design and domain skills with questions that mirror Meta's actual interview mix at datainterview.com/questions.

How to Prepare for Meta Machine Learning Engineer Interviews

Know the Business

Updated Q1 2026

Official mission

Build the future of human connection and the technology that makes it possible

What it actually means

Meta aims to build the next evolution of social technology by investing heavily in immersive experiences like the metaverse and AI, while continuing to connect billions through its existing social media platforms. Its core strategy involves enhancing human connection through technological innovation and a robust advertising business model.

Menlo Park, California · Hybrid / Flexible

Key Business Metrics

Revenue

$201B

+24% YoY

Market Cap

$1.7T

-11% YoY

Employees

79K

+6% YoY

Users

4.0B

Business Segments and Where DS Fits

Reality Labs

Focuses on VR, MR, and AR technologies, aiming to build the next computing platform. It involves significant investment in the VR industry and has recently right-sized its investment for sustainability. It manages the Quest VR platform and the Worlds platform.

DS focus: Improving how people are matched with apps and games, dramatically improving analytics on the platform to help developers reach and understand their audience.

Current Strategic Priorities

  • Empower developers and creators to build long-term, sustainable businesses.
  • Separate the Quest VR platform from the Worlds platform so both products can grow.
  • Double down on the VR developer ecosystem and sustain third-party VR investment over the long term.
  • Shift Worlds to an almost exclusively mobile focus to tap into a much larger market.
  • Invest in VR as a critical technology on the path to the next computing platform.
  • Deliver synchronous social games at scale by connecting them with billions of people on the world’s biggest social networks.
  • Streamline the company’s AR and MR roadmap.
  • Focus on AI.

Meta reported $201B in revenue for 2025, up 23.8% year over year, with advertising as the dominant business model. But the company's investment priorities are shifting fast. Zuckerberg's 2026 roadmap puts AI at the top of the stack: the Llama model family, the Meta AI assistant, and a new PyTorch-native agentic framework are all expanding headcount. Meanwhile, Reality Labs is separating its Quest VR platform from Worlds (which is going mobile-first), and the DS focus there is on improving how people get matched with apps and games, plus building better developer analytics.

The "why Meta?" answer that falls flat is the vague one. Instead of gesturing at open-source culture or "building the future of AI," anchor your answer to a specific surface you've researched. You could talk about the constraints of on-device ML for Quest headsets where model size and latency budgets differ sharply from server-side ranking, or about how the mobile-first pivot for Worlds creates new recommendation problems for a platform trying to connect social games with billions of users across Meta's networks.

Try a Real Interview Question

Streaming log-loss and AUC for a binary model

python

Implement a function that takes two equal-length lists $y\in\{0,1\}^n$ and $p\in(0,1)^n$ where $p_i$ is the predicted probability for label $y_i$, and returns a tuple $(\text{logloss}, \text{auc})$. Compute log-loss as $$-\frac{1}{n}\sum_{i=1}^n \left(y_i\log p_i + (1-y_i)\log(1-p_i)\right)$$ and compute AUC as the probability a random positive has higher score than a random negative, counting ties as $0.5$; raise a ValueError if the inputs are invalid.

Python
from typing import List, Tuple


def logloss_and_auc(y: List[int], p: List[float]) -> Tuple[float, float]:
    """Return (logloss, auc) for binary labels y and predicted probabilities p.

    Raises ValueError on invalid inputs (length mismatch, empty, non-binary labels,
    probabilities not strictly in (0,1), or if AUC is undefined due to all labels being the same).
    """
    pass
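
One possible reference solution, offered as a hedged sketch (the practice engine's official solution may differ; the pairwise AUC is $O(n_+ n_-)$, chosen for clarity over the $O(n \log n)$ rank-based version):

Python
import math
from typing import List, Tuple


def logloss_and_auc(y: List[int], p: List[float]) -> Tuple[float, float]:
    if len(y) != len(p) or not y:
        raise ValueError("y and p must be non-empty and the same length")
    if any(label not in (0, 1) for label in y):
        raise ValueError("labels must be 0 or 1")
    if any(not (0.0 < prob < 1.0) for prob in p):
        raise ValueError("probabilities must be strictly in (0, 1)")

    n = len(y)
    logloss = -sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    ) / n

    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC undefined: need at least one positive and one negative")

    # AUC = P(random positive outranks random negative); ties count as 0.5.
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg
    )
    auc = wins / (len(pos) * len(neg))
    return logloss, auc


ll, auc = logloss_and_auc([1, 0], [0.9, 0.1])
assert abs(ll - (-math.log(0.9))) < 1e-12 and auc == 1.0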

700+ ML coding problems with a live Python executor.

Practice in the Engine

Meta's coding rounds reward candidates who can think through edge cases under time pressure, not just arrive at a correct algorithm eventually. Problems like this one test whether you write code the way you'd write it for a real codebase. Build that muscle at datainterview.com/coding, where the problems skew toward the ML-flavored patterns Meta favors.

Test Your Readiness

How Ready Are You for Meta Machine Learning Engineer?

Question 1 of 10 · ML System Design & Serving

Can you design an end to end online inference system for a high traffic ranking model, including request flow, feature fetching, latency budget, fallback behavior, and how you would handle model versioning and safe rollouts?

See how you handle Meta-specific topics like Reality Labs matching systems, ad auction mechanics, and integrity classification at scale, then target your weak areas at datainterview.com/questions.

Frequently Asked Questions

What technical skills are tested in Machine Learning Engineer interviews?

Core skills include Python, Java, SQL, plus ML system design (training pipelines, model serving, feature stores), ML theory (loss functions, optimization, evaluation), and production engineering. Expect both coding rounds and ML design rounds.

How long does the Machine Learning Engineer interview process take?

Most candidates report 4 to 6 weeks. The process typically includes a recruiter screen, hiring manager screen, coding rounds (1-2), ML system design, and behavioral interview. Some companies add an ML theory or paper discussion round.

What is the total compensation for a Machine Learning Engineer?

Total compensation across the industry ranges from $110k to $1.18M depending on level, location, and company. This includes base salary, equity (RSUs or stock options), and annual bonus. Pre-IPO equity is harder to value, so weight cash components more heavily when comparing offers.

What education do I need to become a Machine Learning Engineer?

A Bachelor's in CS or a related field is standard. A Master's is common and helpful for ML-heavy roles, but strong coding skills and production ML experience are what actually get you hired.

How should I prepare for Machine Learning Engineer behavioral interviews?

Use the STAR format (Situation, Task, Action, Result). Prepare 5 stories covering cross-functional collaboration, handling ambiguity, failed projects, technical disagreements, and driving impact without authority. Keep each answer under 90 seconds. Most interview loops include 1-2 dedicated behavioral rounds.

How many years of experience do I need for a Machine Learning Engineer role?

Entry-level positions typically require 0+ years (including internships and academic projects). Senior roles expect 10-20+ years of industry experience. What matters more than raw years is demonstrated impact: shipped models, experiments that changed decisions, or pipelines you built and maintained.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn