Welcome to DS/AI Interview Pulse, a series helping you crush ML interviews one real-world topic at a time. Think less textbook, more "how this shows up in a Stripe or Meta loop."
This week: missing values.
Simple? Yes. Ignored? Constantly. Mismanaged in interviews? All the time.
🚨 Quick Hit: Why Interviewers Ask This
Because it's everywhere.
And because how you handle missing data tells them if you:
Know how real datasets actually look
Can build pipelines that won't crash
Avoid subtle leakage that wrecks models
Also: cleaning data is still 60-70% of the job. It's boring to some, but if you skip it, you're out.
🎯 The Ask Behind the Ask
If they ask:
"How do you detect missing values in a dataset?"
They're actually testing:
🧠 Are you comfortable with isnull(), .info(), and value counts?
👀 Do you notice weird encodings (like "?", -999, None)?
📊 Do you use visualizations (heatmaps, missingno)?
⚠️ Do you think about what causes missingness?
It's not just "run .isnull().sum()". It's:
✅ Can you spot broken data pipelines, business rules, or bugs?
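Spotting those hidden problems can be automated with a quick sentinel scan before any modeling. A minimal sketch; the toy frame, column names, and token list here are invented for illustration:

```python
import pandas as pd

# Hypothetical data: missingness hidden behind sentinel tokens
df = pd.DataFrame({
    "age": [34, -999, 29, -999],           # -999 used as a fake "missing" code
    "city": ["NYC", "?", "LA", "Unknown"],  # "?" and "Unknown" are not real NaN
})

SUSPECTS = {"?", "N/A", "Unknown", "missing", "none", "", -1, -999}

# Count suspicious tokens per column -- none of these show up in isnull()
suspect_counts = {col: int(df[col].isin(SUSPECTS).sum()) for col in df.columns}
print(suspect_counts)  # {'age': 2, 'city': 2}
```

Note that df.isnull().sum() on this frame is all zeros, which is exactly the trap.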
🧪 In Code: What You Should Say (and Show)
# Step 1: Basic null scan
df.isnull().sum()
df.info()
# Step 2: % missing per column
df.isnull().mean().sort_values(ascending=False)
# Step 3: Visual
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df.isnull(), cbar=False)
plt.show()
Also: mention missingno.matrix(df) for bonus points.
🧠 If You're Smart, You'll Say This
"First I check the shape of missingness with .info() and .isnull().mean(). Then I look for weird patterns, like whether high-income users have fewer blanks. That tells me whether data is missing at random or tied to a group. Also, I always search for values like '-999' or 'missing'; those aren't nulls, but they should be."
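That "weird patterns" check can be made concrete: compare missing rates across a grouping column. A hedged sketch with invented column names (income_band, phone):

```python
import numpy as np
import pandas as pd

# Hypothetical data: blanks in `phone` concentrated in one income band
df = pd.DataFrame({
    "income_band": ["high", "high", "low", "low", "low", "low"],
    "phone": ["555-1", "555-2", np.nan, np.nan, np.nan, "555-3"],
})

# Missing rate per group -- a large gap suggests the data is NOT
# missing completely at random
rates = df["phone"].isnull().groupby(df["income_band"]).mean()
print(rates)  # high: 0.0, low: 0.75
```

A big spread between groups is a cue to dig into the pipeline or business rule that produced the blanks.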
This is what interviewers love:
Realism
Risk thinking
Awareness of traps
⚠️ Common Pitfall (They Hope You Miss It)
Data isn't always missing as NaN. It's often hidden.
Look for:
"?", "N/A", "Unknown"
0, -1, -999
Empty strings ("")
Categorical features with "none" or "blank" as levels
Pro tip: Run .value_counts(dropna=False) on every column you suspect.
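The pro tip in action: value_counts(dropna=False) surfaces the hidden placeholders, and replace() promotes them to real NaN so .isnull() can finally see them. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series(["red", "?", "blue", "", "N/A", "red"])

# dropna=False would also show true NaN alongside the fake ones
print(s.value_counts(dropna=False))

# Promote the fake missing tokens to real NaN
cleaned = s.replace(["?", "", "N/A"], np.nan)
print(cleaned.isnull().sum())  # 3 -- versus 0 before cleaning
```

Before the replace, s.isnull().sum() is 0, which is exactly why the basic null scan alone isn't enough.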
🗣️ Mock Follow-Up Question (That Catches People)
"If a feature has 45% missing, what do you do?"
Wrong answers:
"Drop it."
"Just fill it with the mean."
Better:
"Depends. If it's informative (like Employment History), I might create a 'missing' flag and treat missing as a category. If it's random noise with no business meaning, I might drop it. But I'd first check whether the missingness correlates with the target."
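That answer can be sketched in a few lines: an indicator flag, an explicit "Missing" level, and a quick look at whether missingness tracks the target. The column names (employment_history, target) and the toy data are invented:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "employment_history": ["5y", np.nan, "2y", np.nan, "10y", np.nan],
    "target": [1, 0, 1, 0, 1, 0],
})

# 1) Keep the signal: flag rows where the value was missing
df["employment_history_missing"] = df["employment_history"].isnull().astype(int)

# 2) Treat missing as its own category instead of dropping 45% of a column
df["employment_history"] = df["employment_history"].fillna("Missing")

# 3) Does missingness correlate with the target? (perfectly, in this toy data)
print(df.groupby("employment_history_missing")["target"].mean())
```

If the two group means differ sharply, the missingness itself is predictive, and dropping the column would throw that signal away.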
📦 The Interviewer's Checklist
They're checking whether you:
Mention .isnull() or .info()
Bring up weird encodings
Show awareness of correlation with target
Don't blindly drop columns or rows
Can tie this to real-world ETL problems
🧩 Mini Case: The Curveball
You're working with a hiring dataset. Education_Level is missing in 38% of rows. Turns out, 90% of those are intern applicants.
What do you do?
Best answer shows:
You ask why it's missing
You segment the missingness
You don't panic-drop or blindly impute
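Segmenting the missingness from the mini case might look like this; the frame and its numbers are fabricated to mirror the scenario:

```python
import numpy as np
import pandas as pd

# Toy hiring data: Education_Level blanks cluster in intern applications
df = pd.DataFrame({
    "Applicant_Type": ["intern"] * 4 + ["full_time"] * 6,
    "Education_Level": [np.nan, np.nan, np.nan, "HS",
                        "BS", "MS", np.nan, "BS", "PhD", "MS"],
})

# Missing rate per segment -- interns dominate the blanks
by_segment = df["Education_Level"].isnull().groupby(df["Applicant_Type"]).mean()
print(by_segment.round(2))
```

Once you see the rate concentrated in one segment, the conversation shifts from "how do I impute?" to "does this field even apply to interns?", which is the answer interviewers want.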
🔜 Coming Up
Episode 002: Filling Missing Data - The Safe, the Lazy, and the Risky
Mean imputation is for rookies. Let's talk smarter strategies (and how to explain them like a pro).
Want the next episodes in your inbox? Hit subscribe.