Evaluating ML Models

Overfitting vs underfitting, accuracy vs precision vs recall, and the metrics vocabulary the exam tests.


Overfitting and underfitting

Problem	What happened	Symptom	Fixes
Overfitting	Model memorized the training data, including its noise	Great on training data, poor on new data	More/varied data, simpler model, regularization, early stopping
Underfitting	Model too simple to capture the pattern	Poor on training AND new data	More complex model, better features, train longer
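The symptoms in the table can be turned into a rough check on a model's training and validation scores. The sketch below is illustrative only: the function name `diagnose_fit` and both thresholds (`good_enough`, `max_gap`) are assumptions made for this example, not standard values.

```python
def diagnose_fit(train_score, val_score, good_enough=0.85, max_gap=0.10):
    """Label a model from its training and validation scores (both in 0..1).

    good_enough and max_gap are hypothetical thresholds chosen for this
    example; real projects pick them per problem.
    """
    if train_score < good_enough:
        # Weak even on the data it trained on: poor on training AND new data.
        return "underfitting"
    if train_score - val_score > max_gap:
        # Strong on training data but much weaker on unseen data.
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(0.99, 0.72))  # large train/validation gap
print(diagnose_fit(0.61, 0.59))  # low on both
print(diagnose_fit(0.91, 0.89))  # high on both, small gap
```

The key signal is the gap between the two scores: a large gap points to overfitting, while two low scores point to underfitting.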

Think of it like this

An overfit model is a student who memorized last year's exam answers — perfect on the practice test, lost on new questions. An underfit model is a student who barely skimmed the book — bad at both.

Classification metrics

Key points

Accuracy — share of all predictions that were correct. Misleading on imbalanced data (a model calling everything "not fraud" is 99.9% accurate and useless).
Precision — of everything the model flagged positive, how much really was? Optimize when false positives are costly (e.g., blocking legitimate customers).
Recall — of all actual positives, how many did the model catch? Optimize when missing a positive is costly (e.g., cancer screening, fraud).
F1 score — harmonic mean of precision and recall; a balanced single number.
AUC-ROC — how well the model separates classes across all thresholds (1.0 = perfect, 0.5 = coin flip).
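To make the definitions above concrete, here is a minimal sketch that computes accuracy, precision, recall, and F1 from confusion-matrix counts. The transaction counts are made up for illustration (10 fraud cases in 10,000 transactions), and returning 0.0 when a ratio has a zero denominator is a convention assumed here, not a universal rule.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of everything flagged positive, how much really was positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all actual positives, how many were caught?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: 10,000 transactions, 10 of them fraud.
# Model A labels everything "not fraud".
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=9990)
# Model B flags 48 transactions and catches 8 of the 10 fraud cases.
flags_some = classification_metrics(tp=8, fp=40, fn=2, tn=9950)

print(always_negative)
print(flags_some)
```

With these numbers, Model A reaches 99.9% accuracy with a recall of 0, while Model B has slightly lower accuracy (99.58%) but a recall of 0.8. That is the imbalanced-data trap in miniature.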

Exam tip

Memorize the trade-off: "can't afford to miss any" → recall; "false alarms are expensive" → precision; "imbalanced dataset" → accuracy is the wrong metric, use F1/AUC.
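The trade-off can be seen by moving the decision threshold on a model's scores. This is a small sketch with made-up scores and labels; the helper name `precision_recall_at` is hypothetical.

```python
# Hypothetical model scores (higher = more likely positive) and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]


def precision_recall_at(threshold, scores, labels):
    """Flag every item scoring at or above the threshold, then measure."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.85, 0.65, 0.35):
    p, r = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, a strict threshold (0.85) gives perfect precision but catches only half the positives, while a loose threshold (0.35) catches every positive at the cost of more false alarms.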

Regression metrics & business metrics

Key points

Regression (numeric predictions) uses error measures such as MAE and MSE/RMSE, where lower is better, and R², the share of the target's variance the model explains, where higher is better (1.0 is a perfect fit).
Technical metrics aren't the finish line: models are judged by business metrics too — revenue lift, cost per prediction, customer satisfaction, ROI.
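The regression error measures listed above can be computed directly from predictions. The following is a minimal sketch using only the standard library; the values in `y_true` and `y_pred` are invented for illustration, and the function name `regression_metrics` is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R² for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n    # mean absolute error
    mse = sum(e * e for e in errors) / n     # mean squared error
    rmse = math.sqrt(mse)                    # same units as the target
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)      # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    # Note: ss_tot is zero if every true value is identical; this sketch
    # does not handle that case.
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical example values.
y_true = [100, 150, 200, 250]
y_pred = [110, 140, 210, 230]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values the MAE is 12.5, the RMSE is about 13.23, and R² is 0.944. RMSE is larger than MAE because squaring penalizes the one larger miss (20) more heavily.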
