Got a machine learning interview lined up? Chances are that you are interviewing for an ML engineering and/or data scientist position. Companies with ML interview portions include Google, Meta, Stripe, McKinsey, and many startups. ML questions are peppered throughout the technical screen, take-home, and on-site rounds. So, what does the ML engineering interview entail? There are generally five areas👇
📚 ML Interview Areas
Area 1 - ML Coding
ML coding is similar to LeetCode-style coding, but the focus is on applying machine learning in code. Expect to write ML functions from scratch. In some cases, you will not be allowed to import third-party libraries like scikit-learn, as the questions are designed to assess your conceptual understanding and coding ability.
# Sample Questions
1. [Uber] Write an AUC function from scratch using vanilla Python
2. [Google] Write the K-Means algorithm using NumPy only
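Here is a minimal sketch of one way to answer the Uber question, assuming 0/1 labels and plain Python lists. It uses the rank-sum (Mann-Whitney U) identity: AUC is the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. The function name `roc_auc` is our own choice, not a library API.

```python
def roc_auc(y_true, y_score):
    """ROC AUC from scratch via the rank-sum (Mann-Whitney U) formulation.

    Assumes binary labels encoded as 0/1. Tied scores receive the average
    of their ranks, which counts each tied positive/negative pair as 0.5.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of scores tied with order[i].
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - (smallest possible sum for n_pos items).
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four positive/negative pairs are ordered correctly. Sorting keeps the cost at O(n log n) instead of comparing every positive/negative pair, which is a natural follow-up to discuss with the interviewer.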
Area 2 - ML Theory (“Breadth”)
These assess the candidate’s breadth of knowledge in machine learning. ML theory interviews test conceptual understanding of topics such as the bias-variance trade-off, handling imbalanced labels, and accuracy vs. interpretability.
# Sample Questions
1. [Amazon] Explain how cross-validation works
2. [Etsy] How do you handle imbalanced labels in classification models?
3. [McKinsey] What is the variance-bias trade-off?
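To make the cross-validation answer concrete, here is a minimal k-fold sketch in plain Python: shuffle the indices, split them into k folds, train on k-1 folds, score on the held-out fold, and average the k scores. The helpers `k_fold_indices` and `cross_validate_mean` are hypothetical names, and the `fit`/`score` callables are an assumed interface you would supply.

```python
import random


def k_fold_indices(n_samples, k=5, shuffle=True, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Every sample lands in exactly one validation fold; the first
    n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate_mean(fit, score, X, y, k=5, seed=0):
    """Average validation score over k folds.

    Hypothetical interface: fit(X_train, y_train) returns a model and
    score(model, X_val, y_val) returns a number (e.g. accuracy).
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k=k, seed=seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)
```

Every observation is used for validation exactly once, so the averaged score is a less noisy estimate of generalization than a single train/validation split. For imbalanced labels, a common refinement is stratified k-fold, which keeps the class ratio roughly equal in each fold.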
Area 3 - ML Algorithms (“Depth”)
Don’t confuse ML algorithms (sometimes called “Depth”) with ML “Breadth”. While ML breadth covers a general understanding of machine learning, ML depth assesses an in-depth understanding of a particular algorithm. For instance, you may have a dedicated round focusing solely on random forests. Here’s a sample question set you could be asked in a single round at Amazon.
# Sample Questions
1. [Amazon] What is the pseudocode of the Random Forest model?
2. [Amazon] What is the variance and bias of the Random Forest model?
3. [Amazon] How is the Random Forest different from Gradient Boosted Trees?
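One way to answer the pseudocode question is with a compact, plain-Python sketch of a random forest classifier: draw a bootstrap sample of rows for each tree, consider only a random subset of features (here √d) at each split, and predict by majority vote. All function names and defaults (`n_trees`, `max_depth`) are illustrative assumptions, and labels are assumed to be non-tuple values such as ints or strings.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, max_depth, n_features, rng):
    """Grow a classification tree; each split considers a random feature subset.

    Internal nodes are (feature, threshold, left, right) tuples; leaves are labels.
    """
    if max_depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None  # (weighted impurity, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split among the sampled features
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [l for _, l in left], max_depth - 1, n_features, rng),
        build_tree([r for r, _ in right], [l for _, l in right], max_depth - 1, n_features, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_random_forest(X, y, n_trees=25, max_depth=5, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(math.sqrt(len(X[0]))))  # sqrt(d), a common choice for classification
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```

Bootstrapping and per-split feature sampling decorrelate the trees, so averaging their votes reduces variance while keeping the low bias of the individual trees. Gradient boosted trees take the opposite route: they fit shallow trees sequentially, each one correcting the errors of the ensemble so far.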
Area 4 - Applied ML / Business Case
These ask you to solve ML cases in the context of a business problem. Scalability and productionization are not the main concern, as they are more relevant in the ML system design portions. Business cases can be assessed in various forms: a verbal explanation, or hands-on coding in Jupyter or Colab.
# Sample Questions
1. [Google] Given a dataset that contains purchase history on the Play Store, how would you build a propensity score model?
2. [PayPal] How would you build a fraud model without labels?
3. [Apple] How would you identify meaningful segmentation?
Area 5 - ML System Design
These assess the soundness and scalability of the ML system design. They are often assessed in the ML engineering interview, and you will be required to discuss the functional & non-functional requirements, architecture overview, data preparation, model training, model evaluation, and model productionization.
# Sample Questions
1. [Google] Build a real-time translation system
2. [Uber] Build a real-time ETA model
3. [Amazon] Build a recommender system for product search
📚 ML Questions x Track (e.g. product analyst, data scientist, MLE)
The type of ML questions you will face varies by track. Consider the following questions posed for different roles:
- Product Analyst - Build a model that can predict the lifetime value of a customer
- Data Scientist (Generalist) - Build a fraud detection model using credit card transactions
- ML Engineering - Build a recommender system that can scale to 10 million daily active users
For product analyst roles, the emphasis is on applying ML to product analysis, user segmentation, and feature improvement. Rigor in scalable systems is not required, as most of the analysis is conducted on offline datasets.
For data scientist roles, you will most likely be assessed on ML breadth, ML depth, and business case challenges. Understanding scalable systems is not required unless the role is a “full-stack” data science role.
For ML engineering roles, you will be asked ML coding, ML breadth & depth, and ML system design questions. You will most likely have dedicated rounds on ML coding and ML system design, with ML breadth & depth questions peppered throughout the interview process.
✍️ 7 Algorithms You Should Know
In general, you should have an in-depth understanding of the following algorithms, including their assumptions, applications, trade-offs, and parameter tuning. The most important thing isn’t whether you know 20+ ML algorithms; it’s whether you can leverage 7 algorithms in 20 different situations.
- Linear Regression
- Logistic Regression
- Decision Tree
- Random Forest
- Gradient Boosted Trees
- Dense Neural Networks
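As an example of knowing an algorithm’s mechanics and tuning knobs, here is a minimal logistic regression sketch trained by batch gradient descent on the mean log loss, in plain Python. The learning rate `lr`, the number of `epochs`, and the L2 strength `l2` are the hypothetical tuning parameters of this sketch, not any particular library’s API.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Fit weights w and bias b by batch gradient descent.

    Minimizes the mean log loss plus (l2 / 2) * ||w||^2; the bias is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            # For log loss, the gradient w.r.t. the logit z is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        # Step along the averaged gradient, including the L2 penalty term.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, row):
    """Predicted probability of the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)
```

Each knob maps to an interview talking point: too large an `lr` can diverge while too small a one converges slowly, and a larger `l2` shrinks the weights, trading a little bias for lower variance. The model also assumes the log-odds are linear in the features, which is the assumption to check before reaching for it.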
📝 More Questions
- What is the difference between supervised and unsupervised learning?
- Can you explain the concept of overfitting and underfitting in machine learning models?
- What is cross-validation? Why is it important?
- Describe how a decision tree works. When would you use it over other algorithms?
- What is the difference between bagging and boosting?
- How would you validate a model you created to generate a predictive analysis?
- How does KNN work?
- What is PCA?
- How would you perform feature selection?
- What are the advantages and disadvantages of a neural network?
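For the KNN question, here is a minimal plain-Python sketch: KNN simply stores the training data and, at prediction time, takes a majority vote among the k closest points. Euclidean distance via `math.dist` (Python 3.8+) and the name `knn_predict` are assumptions of this sketch.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Uses Euclidean distance. Vote ties go to the label seen first, i.e. the
    label of the nearest point among the tied classes.
    """
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # No training step: all work happens here, at query time.
    neighbors = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]
```

Because every prediction scans the whole training set, the cost is O(n) distance computations per query, and features on larger scales dominate the distance unless you standardize them. A small k gives a low-bias, high-variance model; a larger k smooths the decision boundary.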
💡 Prep Tips
Tip 1 - Understand How ML Interviews Are Structured
The typical format is a 20- to 40-minute segment embedded in a technical phone screen, or a dedicated ML round within an onsite. You will be assessed by a senior- or staff-level data scientist or ML engineer. You can also get coaching from an ML interviewer at a FAANG company: https://www.datainterview.com/coaching
Tip 2 - Practice Explaining Verbally
Interviewing is not a written exercise; it’s a verbal one. Whether the interviewer asks about ML concepts, a coding question, or ML system design, you will be expected to explain your thinking clearly and in detail. As you practice interview questions, practice answering out loud.
Tip 3 - Join the Ultimate Prep
Get access to ML questions, cases, and machine learning mock interview recordings when you join the interview program: the Data Science Ultimate Prep, created by FAANG engineers and interviewers.