Episode 001: AI Interview
Detecting Missing Data
Welcome to DS/AI Interview Pulse, a series helping you crush ML interviews one real-world topic at a time. Think less textbook, more "how this shows up in a Stripe or Meta loop."
This week: missing values.
Simple? Yes. Ignored? Constantly. Mismanaged in interviews? All the time.
Quick Hit: Why Interviewers Ask This
Because it's everywhere.
And because how you handle missing data tells them if you:
Know how real datasets actually look
Can build pipelines that wonât crash
Avoid subtle leakage that wrecks models
Also: cleaning data is still 60-70% of the job. It's boring to some, but if you skip it, you're out.
The Ask Behind the Ask
If they ask:
"How do you detect missing values in a dataset?"
They're actually testing:
Are you comfortable with .isnull(), .info(), and value counts?
Do you notice weird encodings (like "?", -999, None)?
Do you use visualizations (heatmaps, missingno)?
Do you think about what causes missingness?
It's not just "run .isnull().sum()". It's:
Can you spot broken data pipelines, business rules, or bugs?
In Code: What You Should Say (and Show)
# Step 1: Basic null scan
df.isnull().sum()
df.info()
# Step 2: % missing per column
df.isnull().mean().sort_values(ascending=False)
# Step 3: Visual
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df.isnull(), cbar=False)
plt.show()
Also: mention missingno.matrix(df) for bonus points.
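For practice, the first two steps can be run end to end on a toy frame. This is a minimal sketch: the DataFrame and its column names (age, income, city) are made up for illustration, and the combined summary table is just one convenient way to present the counts and percentages together.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41, np.nan],
    "income": [72000, 58000, np.nan, 91000, 64000],
    "city": ["Austin", "Boston", None, "Denver", "Miami"],
})

# Step 1: count of missing values per column.
null_counts = df.isnull().sum()

# Step 2: share of missing values per column, worst first.
null_share = df.isnull().mean().sort_values(ascending=False)

# One table is easier to talk through than two separate printouts.
summary = pd.DataFrame({
    "n_missing": null_counts,
    "pct_missing": null_share * 100,
}).sort_values("pct_missing", ascending=False)

print(summary)
```

Walking through a table like this out loud ("age is the worst column, so I'd look there first") tends to land better than reciting method names.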
If You're Smart, You'll Say This
"First I check the shape of missingness with .info() and .isnull().mean(). Then I look for weird patterns, like whether high-income users have fewer blanks. That tells me if data is missing completely at random, or if it's tied to a group. Also, I always search for things like "-999" or "missing". Those aren't nulls, but they should be."
This is what interviewers love:
Realism
Risk thinking
Awareness of traps
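The "is missingness tied to a group?" check from the answer above can be sketched in a few lines. The data here is invented, and income_band and phone are hypothetical column names chosen only to mirror the high-income example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: phone is blank mostly for low-income users.
df = pd.DataFrame({
    "income_band": ["high", "high", "high", "low", "low", "low"],
    "phone": ["555-0101", "555-0102", "555-0103", np.nan, np.nan, "555-0104"],
})

# Flag missing phone values, then compare the missing rate across groups.
missing_rate_by_band = df["phone"].isnull().groupby(df["income_band"]).mean()

print(missing_rate_by_band)
```

A large gap between groups suggests the values are not missing completely at random, which is worth raising before any imputation is discussed.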
Common Pitfall (They Hope You Miss It)
Data isn't always missing as NaN. It's often hidden.
Look for:
"?", "N/A", "Unknown", 0, -1, -999
Empty strings ("")
Categorical features with "none" or "blank" as levels
Pro tip: Run .value_counts(dropna=False) for every column you suspect.
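One way to act on that tip is to inspect the raw levels, then convert the placeholders to real nulls. This is a sketch under assumptions: the sentinel lists below are guesses for a made-up dataset, and in practice they should be confirmed with whoever owns the data, since 0 or -1 can be legitimate values.

```python
import pandas as pd

# Hypothetical data with hidden missing values and no real NaN at all.
df = pd.DataFrame({
    "workclass": ["Private", "?", "N/A", "Private", ""],
    "hours": [40, -999, 35, -999, 50],
})

# Inspect raw levels (including any real nulls) before deciding what is missing.
print(df["workclass"].value_counts(dropna=False))

# Assumed sentinel lists; these are illustrative, not universal.
string_sentinels = ["?", "N/A", "Unknown", ""]
numeric_sentinels = [-999]

# mask() replaces values with NaN wherever the condition is True.
cleaned = df.copy()
cleaned["workclass"] = df["workclass"].mask(df["workclass"].isin(string_sentinels))
cleaned["hours"] = df["hours"].mask(df["hours"].isin(numeric_sentinels))

print(cleaned.isnull().sum())
```

Before the conversion, df.isnull().sum() reports zero missing values everywhere, which is exactly the trap.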
Mock Follow-Up Question (That Catches People)
"If a feature has 45% missing, what do you do?"
Wrong answers:
"Drop it."
"Just fill it with the mean."
Better:
"Depends. If it's informative (like Employment History), I might create a "missing" flag and treat missing as a category. If it's random noise or has no business meaning, I might drop it. But I'd check first if missingness correlates with the target."
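The flag-and-check idea in the better answer might look like this. It is a rough sketch on invented data: employment_years and defaulted are hypothetical column names, and comparing target rates by group is only a first look, not a significance test.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a feature with heavy missingness and a binary target.
df = pd.DataFrame({
    "employment_years": [5.0, np.nan, 2.0, np.nan, 8.0, np.nan, 1.0, np.nan],
    "defaulted": [0, 1, 0, 1, 0, 1, 0, 0],
})

# Indicator: 1 where the feature was missing, 0 otherwise.
df["employment_years_missing"] = df["employment_years"].isnull().astype(int)

# Target rate with the feature present (0) versus missing (1).
target_rate = df.groupby("employment_years_missing")["defaulted"].mean()

print(target_rate)
```

If the target rate differs sharply between the two groups, the missingness itself carries signal, and dropping the column would throw that away.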
The Interviewer's Checklist
They're checking whether you:
Mention .isnull() or .info()
Bring up weird encodings
Show awareness of correlation with target
Don't blindly drop columns or rows
Can tie this to real-world ETL problems
Mini Case: The Curveball
You're working with a hiring dataset.
Education_Level is missing in 38% of rows. Turns out, 90% of those are intern applicants.
What do you do?
Best answer shows:
You ask why it's missing
You segment the missingness
You don't panic-drop or blindly impute
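Segmenting the missingness could be sketched as below. The toy data does not reproduce the 38% and 90% figures from the case, and Applicant_Type is an assumed column name; the case only tells us that intern status is known somehow.

```python
import numpy as np
import pandas as pd

# Hypothetical hiring data; Applicant_Type is an assumed column.
df = pd.DataFrame({
    "Applicant_Type": ["intern"] * 4 + ["experienced"] * 6,
    "Education_Level": [np.nan, np.nan, np.nan, "BSc",
                        "MSc", np.nan, "BSc", "PhD", "MSc", "BSc"],
})

is_missing = df["Education_Level"].isnull()

# Missing rate within each applicant type.
rate_by_type = is_missing.groupby(df["Applicant_Type"]).mean()

# Composition of the missing rows: who are they?
share_of_missing = df.loc[is_missing, "Applicant_Type"].value_counts(normalize=True)

print(rate_by_type)
print(share_of_missing)
```

If most blanks come from one segment, a plausible next question is whether the field was ever collected for that segment (for example, a different application form), which points to a category like "not collected" rather than imputation.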
Coming Up
Episode 002: Filling Missing Data - The Safe, the Lazy, and the Risky
Mean imputation is for rookies. Let's talk smarter strategies (and how to explain them like a pro).
Want the next episodes in your inbox? Hit subscribe.

