Python Interview Questions

Dan Lee, Data & AI Lead
Last update: March 13, 2026

Python interviews at Meta, Google, Amazon, and Netflix go far beyond basic syntax. These companies expect data scientists and ML engineers to demonstrate mastery of pandas performance quirks, NumPy broadcasting edge cases, and memory-efficient data processing patterns. You'll face questions about dictionary internals, generator expressions, and why your multiprocessing code deadlocks in production.

What makes Python interviews particularly challenging is that correct-looking code often hides subtle bugs. A lambda in a loop might capture the wrong variable, a mutable default argument creates shared state across function calls, or a seemingly innocent `if data_value:` check fails when the value is zero or an empty string. Interviewers specifically design questions to expose these gotchas because they reflect real production issues.
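The lambda-in-a-loop gotcha fits in four lines; a minimal sketch of the bug and its standard fix:

```python
# Late binding: each lambda looks up `i` when it is *called*, not when it
# is defined, so every closure sees the loop variable's final value.
broken = [lambda: i for i in range(3)]
print([f() for f in broken])   # -> [2, 2, 2]

# Fix: bind the current value as a default argument at definition time.
fixed = [lambda i=i: i for i in range(3)]
print([f() for f in fixed])    # -> [0, 1, 2]
```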

Here are the top 29 Python interview questions organized by core language fundamentals, algorithms and data structures, pandas and NumPy proficiency, object-oriented design, and production readiness.

Intermediate · 29 questions


Top Python interview questions covering the key areas tested at leading tech companies. Practice with real questions and detailed solutions.

Roles: Data Analyst, Data Scientist, Data Engineer, Quantitative Researcher, AI Engineer, Machine Learning Engineer. Companies: Meta, Google, Amazon, Apple, Netflix, Stripe, Two Sigma, Citadel.

Core Python Syntax, Types, and Idioms

Interviewers test core Python syntax because subtle language behaviors cause production bugs that cost engineering teams weeks to debug. Most candidates know basic Python but fail when asked about mutable defaults, variable scoping in closures, or truthiness edge cases with data types.

The key insight is that Python's 'batteries included' philosophy creates hidden complexity. Empty strings, zero values, and None all behave differently in boolean contexts, which matters when you're filtering missing data in a pandas pipeline.


Start with how you write and reason about everyday Python: truthiness, mutability, scope, iteration, and common built-ins. You often stumble here when you rely on intuition instead of Python's exact semantics under time pressure.

You are cleaning a pandas Series of string IDs, and you write `if id_str:` to skip missing values. What values will incorrectly pass or fail this check in Python, and what exact condition would you use instead?

Meta · Medium · Core Python Syntax, Types, and Idioms

Sample Answer

Most candidates default to `if x:` as a universal missingness check, but that fails here because truthiness is not the same as "missing" in Python. Empty strings `""`, `0`, `0.0`, empty containers, and `False` all evaluate to `False`, so you can accidentally drop legitimate values like `"0"` or `0` depending on your pipeline. For strings, use `if id_str is not None and id_str != "":` when you mean non-empty, or `if id_str is not None:` when empty string is valid. If you are dealing with pandas missingness like `NaN`, you need an explicit check like `pd.notna(id_str)` rather than relying on truthiness.
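A quick sketch of the pitfall and the explicit check (the sample values are illustrative):

```python
import pandas as pd

values = ["A123", "", "0", None, float("nan"), 0]

# Truthiness keeps "0" but also keeps NaN (NaN is truthy!) and drops a
# legitimate 0, so `if v:` is neither a missingness nor an emptiness check.
truthy_kept = [v for v in values if v]

# Explicit: pd.notna handles both None and NaN; test emptiness separately.
kept = [v for v in values if pd.notna(v) and v != ""]
print(kept)   # -> ['A123', '0', 0]
```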

Practice more Core Python Syntax, Types, and Idioms questions

Data Structures and Algorithms in Python

Data structure and algorithm questions in Python interviews focus on choosing the right collections and understanding their performance characteristics under scale. Candidates often write code that works on small datasets but becomes unusably slow with millions of records.

Python's rich standard library means there's usually an optimal data structure for each problem. Using `collections.deque` for sliding windows, `heapq` for streaming data, or `Counter` for frequency analysis can turn an O(n²) solution into O(n log n) or better.
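As an illustration of the deque pattern, here is a sketch of the classic sliding-window maximum, O(n) instead of the naive O(n·k):

```python
from collections import deque

def sliding_window_max(nums, k):
    """Max of each length-k window in O(n) using a monotonic deque of indices."""
    dq, out = deque(), []
    for i, x in enumerate(nums):
        # Drop indices whose values can never be a window max again.
        while dq and nums[dq[-1]] <= x:
            dq.pop()
        dq.append(i)
        # Drop the front index once it slides out of the window.
        if dq[0] <= i - k:
            dq.popleft()
        if i >= k - 1:
            out.append(nums[dq[0]])
    return out

print(sliding_window_max([1, 3, -1, -3, 5, 3, 6, 7], 3))  # -> [3, 3, 5, 5, 6, 7]
```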


Interviewers probe whether you can choose the right structure (list, dict, set, heap, deque) and implement efficient logic with clear complexity tradeoffs. You typically struggle when you code a correct solution that is accidentally quadratic or memory-heavy on large inputs.

You are given a list of event IDs (strings) and an integer k. Return the k most frequent IDs, breaking ties by lexicographic order, and you must handle up to 10 million events.

Amazon · Medium · Data Structures and Algorithms in Python

Sample Answer

Use a frequency dict plus a size-k heap to keep only the top candidates. Count with a dict in $O(n)$ time, then maintain a heap bounded at size $k$, giving $O(m\log k)$ for $m$ unique IDs instead of the $O(m\log m)$ cost of sorting every unique ID. Mind the tie-break: among equal frequencies you want the lexicographically smallest IDs to win, so key the heap on (-frequency, id) rather than pushing raw (freq, id) pairs. Memory is $O(m)$ for counts plus $O(k)$ for the heap.
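A compact way to implement this, using `Counter` plus `heapq.nsmallest` with a key that encodes both the frequency and the tie-break (the event IDs below are made up):

```python
import heapq
from collections import Counter

def top_k_ids(events, k):
    # O(n) counting over n events, O(m log k) selection over m unique ids.
    counts = Counter(events)
    # nsmallest keeps a size-k heap; the key sorts by descending frequency,
    # then ascending (lexicographic) id for ties.
    return heapq.nsmallest(k, counts, key=lambda e: (-counts[e], e))

print(top_k_ids(["b", "a", "a", "c", "b", "d"], 2))  # -> ['a', 'b']
```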

Practice more Data Structures and Algorithms in Python questions

NumPy and Pandas for Data Work

NumPy and pandas questions separate strong Python data practitioners from those who just know basic syntax. These libraries have non-obvious behaviors around indexing, broadcasting, and memory management that interviewers exploit to test deep understanding.

The critical skill is thinking vectorized-first instead of writing Python loops. A pandas groupby with transform can replace hundreds of lines of manual aggregation code, but only if you understand how alignment and indexing work under the hood.
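For instance, `transform` returns a result aligned to the original index, so a per-group aggregate can be broadcast back without a merge (toy data, assumed column names):

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2"],
    "cost": [10.0, 30.0, 5.0, 15.0],
})

# transform("sum") repeats each group's total on every row of that group,
# already aligned to df's index -- no merge, no loop.
df["user_total"] = df.groupby("user_id")["cost"].transform("sum")
df["cost_share"] = df["cost"] / df["user_total"]
print(df["cost_share"].tolist())  # -> [0.25, 0.75, 0.25, 0.75]
```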


In data roles, you are evaluated on vectorization, indexing, joins, groupby, time series handling, and avoiding slow row-wise operations. You can lose points when your solution works on small samples but fails on edge cases, missing values, or performance constraints.

You have a pandas DataFrame of ad impressions with columns `[user_id, ts, campaign_id, cost]`. You need each user's 7-day rolling sum of `cost` aligned to each row, and it must handle missing timestamps and multiple events per day efficiently.

Meta · Hard · NumPy and Pandas for Data Work

Sample Answer

You could do a row-wise apply that filters the prior 7 days per row, or you could sort by user and time and use a time-based rolling window. The rolling approach wins here because it is vectorized and typically $O(n)$ per group rather than $O(n^2)$. Convert `ts` to datetime, sort by `[user_id, ts]`, set `ts` as the index, then use `groupby('user_id')['cost'].rolling('7D').sum().reset_index(level=0, drop=True)`. Be explicit about missing values: fill `cost` with 0 if business logic says missing means zero spend; otherwise leave `NaN` and decide whether to set `min_periods`.
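A minimal sketch of the rolling approach on toy data (column names from the question; positional assignment back to the frame assumes it stays in the sorted order):

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["a", "a", "a", "b"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-20", "2024-01-01"]),
    "cost": [1.0, 2.0, 4.0, 8.0],
})

# Sort so each user's datetime index is monotonic -- required for a
# time-based rolling window.
df = df.sort_values(["user_id", "ts"]).reset_index(drop=True)
rolled = (
    df.set_index("ts")
      .groupby("user_id")["cost"]
      .rolling("7D")          # all of the user's rows within the prior 7 days
      .sum()
)
# rolled carries a (user_id, ts) MultiIndex in the same row order as the
# sorted frame, so the values can be assigned back positionally.
df["cost_7d"] = rolled.to_numpy()
print(df["cost_7d"].tolist())  # -> [1.0, 3.0, 4.0, 8.0]
```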

Practice more NumPy and Pandas for Data Work questions

Object-Oriented Design and Python Internals

Object-oriented design questions reveal whether you understand Python's data model and can build maintainable systems. Candidates frequently write classes that break when used as dictionary keys, create memory leaks through circular references, or violate inheritance contracts.

Python's flexibility becomes a liability without proper design. The difference between `__eq__` and `__hash__`, when to use dataclasses versus regular classes, and how descriptors work for property validation directly impact code reliability in production systems.


Expect questions that check how you design clean interfaces, use dataclasses, properties, inheritance, and understand dunder methods, hashing, and equality. You commonly get tripped up when objects behave unexpectedly in sets, dict keys, or when state mutability leaks across instances.

You define a `@dataclass(frozen=True)` named `FeatureKey` with fields `name: str` and `tags: list[str]`, then use instances as dict keys for a feature store cache. It crashes at runtime. What exactly is happening, and how do you redesign it so keys are stable and hashable?

Stripe · Medium · Object-Oriented Design and Python Internals

Sample Answer

Reason through it: a frozen dataclass auto-generates `__hash__` based on its fields, so Python tries to hash `tags`. A list is mutable and unhashable, so hashing the object raises `TypeError` when you insert it as a dict key. The fix is to make every field that participates in identity hashable, for example `tags: tuple[str, ...]` or `frozenset[str]`. If you still need list semantics internally, store a tuple for identity and expose a property that returns a list copy.
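A sketch of the failure and the redesign (class and field names from the question):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BadFeatureKey:
    name: str
    tags: list          # mutable field -> the generated __hash__ will fail

@dataclass(frozen=True)
class FeatureKey:
    name: str
    tags: tuple         # every identity field is hashable

try:
    {BadFeatureKey("ctr", ["ads"]): 1}   # hashing the list raises
except TypeError as exc:
    print(exc)          # -> unhashable type: 'list'

cache = {FeatureKey("ctr", ("ads", "ranking")): [0.1, 0.2]}
# A separately built but equal key hits the same cache entry.
print(cache[FeatureKey("ctr", ("ads", "ranking"))])  # -> [0.1, 0.2]
```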

Practice more Object-Oriented Design and Python Internals questions

Performance, Concurrency, and Production Readiness

Performance and concurrency questions test whether your Python code can handle production workloads. Many candidates write code that works locally but fails under realistic data volumes or concurrent access patterns.

The biggest mistake is assuming threads will speed up CPU-bound work in CPython due to the Global Interpreter Lock. Understanding when to use multiprocessing, asyncio, or thread pools requires knowing exactly what type of bottleneck you're solving.
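A tiny demonstration of why threads still help for I/O-bound work: CPython releases the GIL while a thread blocks, so the waits overlap (`time.sleep` stands in for a network or disk call):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    # sleep stands in for a blocking I/O call; the GIL is released while
    # a thread blocks, so the four waits run concurrently.
    time.sleep(delay)
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.1] * 4))
elapsed = time.perf_counter() - start
# Wall time is ~0.1 s, not the ~0.4 s a serial loop would take.
```

The same pattern with CPU-bound work would show no speedup, which is exactly the distinction interviewers want you to articulate.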


Finally, you are tested on writing Python that scales: profiling, memory behavior, the GIL, multiprocessing vs threading, async I/O, and practical optimization. You may struggle to justify tradeoffs, or to explain why a solution is slow in CPython despite looking parallel.

You have a Python pipeline that processes 50 million rows and became 3x slower after a refactor. Walk me through how you would profile it end to end and identify whether the bottleneck is Python overhead, pandas/NumPy internals, or I/O.

Meta · Medium · Performance, Concurrency, and Production Readiness

Sample Answer

This question is checking whether you can isolate bottlenecks with evidence, not vibes. You start with coarse timing around stages, then use cProfile or py-spy to find hot functions, and line_profiler only after you have a suspect. You validate whether time is in Python frames versus native code by looking at where the profiler attributes CPU, and you corroborate with I/O metrics like bytes read and wait time. You then rerun with representative data and fixed environment settings so you can attribute the regression to a specific change.
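A sketch of the first two steps, coarse stage timing and then `cProfile` on the suspect (the stage names and workloads below are placeholders):

```python
import cProfile
import io
import pstats
import time

def time_stages(stages):
    # Coarse wall-clock timing per pipeline stage to localize the regression
    # before reaching for a function-level profiler.
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings

def profile_hot_stage(fn):
    # Function-level view of the suspect stage: shows where CPU time is
    # attributed (Python frames vs. native pandas/NumPy internals).
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return buf.getvalue()

timings = time_stages([("load", lambda: sum(range(100_000)))])
report = profile_hot_stage(lambda: sorted(range(10_000)))
```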

Practice more Performance, Concurrency, and Production Readiness questions

How to Prepare for Python Interviews

Practice with realistic data volumes

Work with DataFrames containing at least 1 million rows and time your operations. Many pandas methods that feel fast on small datasets become unusably slow at scale, and you need to experience this firsthand.

Test your assumptions about mutability

Create examples where mutable default arguments, shared references, and in-place operations cause unexpected behavior. Run the code and observe what actually happens rather than guessing.
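The canonical mutable-default example is worth running yourself; a minimal sketch:

```python
def append_bad(item, bucket=[]):
    # The default list is built once, at function definition, and shared
    # by every call that omits `bucket`.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # None sentinel: allocate a fresh list per call; the API is unchanged.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_bad("a"), append_bad("b"))    # -> ['a', 'b'] ['a', 'b']
print(append_good("a"), append_good("b"))  # -> ['a'] ['b']
```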

Profile before optimizing

Use cProfile and line_profiler on actual data processing tasks to see where time is really spent. Your intuition about performance bottlenecks is probably wrong until you measure.

Master pandas without loops

Challenge yourself to solve every data manipulation problem using vectorized operations, groupby transforms, or merge/join patterns. If you're writing a Python for loop over DataFrame rows, there's usually a better way.
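For example, a row-wise loop and its vectorized equivalent (toy column, made-up threshold):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cost": [5.0, 50.0, 500.0]})

# Row-wise: one Python-level iteration per row -- slow at scale.
tier_loop = ["high" if c > 100 else "low" for c in df["cost"]]

# Vectorized: a single NumPy pass over the whole column.
df["tier"] = np.where(df["cost"] > 100, "high", "low")
print(df["tier"].tolist())  # -> ['low', 'low', 'high']
```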

Build classes that work as dictionary keys

Practice implementing `__hash__` and `__eq__` correctly for custom classes. Create objects, put them in sets and dictionaries, and verify the behavior matches your expectations.
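A minimal sketch with a hypothetical `CacheKey`: hash exactly the fields that define equality, normalized to immutable types:

```python
class CacheKey:
    def __init__(self, name, tags):
        self.name = name
        self.tags = tuple(tags)   # normalize to an immutable type up front

    def __eq__(self, other):
        if not isinstance(other, CacheKey):
            return NotImplemented
        return (self.name, self.tags) == (other.name, other.tags)

    def __hash__(self):
        # Hash the same fields __eq__ compares, so equal objects land in
        # the same bucket.
        return hash((self.name, self.tags))

cache = {CacheKey("ctr", ["ads"]): 0.42}
# A separately constructed but equal key retrieves the cached value.
print(cache[CacheKey("ctr", ["ads"])])  # -> 0.42
```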


Frequently Asked Questions

How deep do I need to know Python for a data-focused interview?

You should be comfortable writing clean, correct Python under time pressure and explaining your choices. Expect to know core syntax, data structures, functions, error handling, and common standard library tools like itertools, collections, datetime, and re. For data roles, you also need strong pandas and NumPy basics, plus debugging and performance fundamentals like vectorization and avoiding unnecessary loops.

Which companies tend to ask the most Python interview questions?

Companies with data-heavy products and ML-driven teams tend to emphasize Python the most, including major tech firms, fintechs, and AI-first startups. You will also see lots of Python at cloud and data platform companies because their pipelines and services often use Python for orchestration and tooling. The best signal is the specific team: analytics engineering may focus on pandas and SQL, while ML infrastructure may focus on packaging, testing, and performance.

Is live coding required in Python interviews for these roles?

Often, yes: you should expect at least one live or take-home exercise where you write Python code. Data Analyst and Data Scientist loops frequently include pandas wrangling, data cleaning, and basic scripting, while Data Engineer and ML Engineer loops can include writing robust functions, parsing logs, or building small pipeline components. Practice writing code from scratch and validating it quickly using datainterview.com/coding.

How do Python interview expectations differ across Data Analyst, Data Scientist, Data Engineer, Quant, AI Engineer, and ML Engineer roles?

Data Analysts are usually tested on pandas, data cleaning, joins, time series handling, and clear plotting or summary logic in Python. Data Scientists and Quantitative Researchers often face heavier statistics, simulation, and numerical computing in Python, including NumPy, SciPy-style thinking, and careful handling of randomness and performance. Data Engineers, AI Engineers, and ML Engineers are more likely to be tested on production Python, for example modules, typing, testing, I/O, concurrency basics, and writing code that is reliable and maintainable.

How can I prepare for Python interviews if I have no real world experience?

Build a small portfolio of scripts that solve realistic problems, such as cleaning a messy CSV, generating features, and training and evaluating a simple model, all in a single reproducible Python project. Focus on fundamentals like reading and writing files, working with pandas, writing functions, adding basic tests, and documenting assumptions. Then drill common Python interview patterns using datainterview.com/questions and practice coding fluency on datainterview.com/coding.

What are common Python specific mistakes to avoid in interviews?

Do not rely on vague pandas magic: you should be able to explain what a groupby, merge, or apply is doing and its performance impact. Avoid overly clever one-liners that you cannot debug; prefer readable code with intermediate variables and checks for nulls and edge cases. Also avoid mutability pitfalls, such as mutable default arguments, and make sure you handle types, timezones, and integer division correctly.


Written by

Dan Lee

Data & AI Lead

Dan is a seasoned data scientist and ML coach with 10+ years of experience at Google, PayPal, and startups. He has helped candidates land top-paying roles and offers personalized guidance to accelerate your data career.

Connect on LinkedIn