Evaluating ML Models
Overfitting vs underfitting, accuracy vs precision vs recall, and the metrics vocabulary the exam tests.
Overfitting and underfitting
| Problem | What happened | Symptom | Fixes |
|---|---|---|---|
| Overfitting | Model memorized the training data, including its noise | Great on training data, poor on new data | More/varied data, simpler model, regularization, early stopping |
| Underfitting | Model too simple to capture the pattern | Poor on training AND new data | More complex model, better features, train longer |
Think of it like this
An overfit model is a student who memorized last year's exam answers — perfect on the practice test, lost on new questions. An underfit model is a student who barely skimmed the book — bad at both.
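The symptoms in the table can be turned into a rough check on a model's training and validation scores. This is a minimal sketch, not a standard procedure: the function name `diagnose_fit` and the thresholds `good_enough` and `max_gap` are assumptions chosen for illustration, and sensible values depend on the task.

```python
def diagnose_fit(train_score, val_score, good_enough=0.85, max_gap=0.10):
    """Classify a model from its training and validation scores (both in [0, 1]).

    good_enough and max_gap are illustrative thresholds, not standard values.
    """
    if train_score < good_enough:
        # Poor even on the data it was trained on: too simple for the pattern.
        return "underfitting"
    if train_score - val_score > max_gap:
        # Strong on training data but much weaker on new data.
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(0.99, 0.72))  # overfitting
print(diagnose_fit(0.61, 0.59))  # underfitting
print(diagnose_fit(0.91, 0.89))  # reasonable fit
```

The key habit is to always compare both numbers: a training score alone cannot distinguish a good model from an overfit one.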
Classification metrics
Key points
- Accuracy — share of all predictions that were correct. Misleading on imbalanced data: if only 0.1% of transactions are fraud, a model that calls everything "not fraud" is 99.9% accurate and useless.
- Precision — of everything the model flagged positive, how much really was? Optimize when false positives are costly (e.g., blocking legitimate customers).
- Recall — of all actual positives, how many did the model catch? Optimize when missing a positive is costly (e.g., cancer screening, fraud).
- F1 score — harmonic mean of precision and recall; a balanced single number.
- AUC-ROC — how well the model separates classes across all thresholds (1.0 = perfect, 0.5 = coin flip).
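To make the definitions above concrete, here is a small sketch that computes them from confusion-matrix counts (true/false positives and negatives). The transaction counts are hypothetical, the function names are illustrative, and returning 0.0 when a ratio is undefined is a convention chosen here. The AUC helper relies on the fact that AUC-ROC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as half.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Return 0.0 when a ratio is undefined (nothing flagged / no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 10,000 transactions, 10 of them fraud (0.1%).
# A model that labels everything "not fraud" never flags a positive.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=9990)
print(always_negative)  # accuracy 0.999, but precision, recall and F1 are all 0.0

# A model that catches 8 of the 10 frauds at the cost of 12 false alarms.
useful = classification_metrics(tp=8, fp=12, fn=2, tn=9978)
print(useful)  # accuracy 0.9986, precision 0.4, recall 0.8

# Three of the four positive/negative pairs are ranked correctly.
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Note that the useless model has the higher accuracy of the two, which is exactly why accuracy misleads on imbalanced data.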
Exam tip
Memorize the trade-off: "can't afford to miss any" → recall; "false alarms are expensive" → precision; "imbalanced dataset" → accuracy is the wrong metric, use F1/AUC.
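The trade-off shows up as soon as you move the decision threshold on a model's scores. The sketch below uses made-up labels and scores (not from any real model) and a hypothetical helper `precision_recall_at`: lowering the threshold flags more cases, which raises recall and tends to lower precision.

```python
def precision_recall_at(threshold, labels, scores):
    """Precision and recall when every score >= threshold is flagged positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative data: 1 = actual positive, 0 = actual negative.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.70, 0.65, 0.40, 0.35, 0.20, 0.10]

for threshold in (0.3, 0.6, 0.8):
    p, r = precision_recall_at(threshold, labels, scores)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.3  precision=0.67  recall=1.00
# threshold=0.6  precision=0.75  recall=0.75
# threshold=0.8  precision=1.00  recall=0.50
```

A "can't afford to miss any" scenario corresponds to the low threshold here (recall 1.00), while a "false alarms are expensive" scenario corresponds to the high one (precision 1.00).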
Regression metrics & business metrics
Key points
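- Regression (numeric predictions) uses error measures: MAE (mean absolute error) and MSE/RMSE (mean squared error and its square root), where lower is better; and R², the share of variance in the target that the model explains, where closer to 1 is better.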
- Technical metrics aren't the finish line: models are judged by business metrics too — revenue lift, cost per prediction, customer satisfaction, ROI.
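The regression measures above can be computed in a few lines. This is a minimal sketch with made-up values; the function name `regression_metrics` is illustrative, and it assumes the true values are not all identical (otherwise R² is undefined).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # average size of the error
    mse = sum(e * e for e in errors) / n    # penalizes large errors more
    rmse = math.sqrt(mse)                   # back in the target's units
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1 - ss_res / ss_tot  # assumes ss_tot > 0
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Illustrative values only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
result = regression_metrics(y_true, y_pred)
print(result)  # mae 1.0, mse 1.5, rmse about 1.22, r2 0.7
```

Because MSE squares each error, the single miss of 2 contributes four times as much as each miss of 1, which is why MSE and RMSE are more sensitive to outliers than MAE.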
Knowledge check
Question 1 of 4
A model performs excellently on its training data but poorly on new, unseen data. What is this called?