Evaluating ML Models

Overfitting vs underfitting, accuracy vs precision vs recall, and the metrics vocabulary the exam tests.


Overfitting and underfitting

| Problem | What happened | Symptom | Fixes |
|---|---|---|---|
| Overfitting | Model memorized the training data, including its noise | Great on training data, poor on new data | More/varied data, simpler model, regularization, early stopping |
| Underfitting | Model too simple to capture the pattern | Poor on training AND new data | More complex model, better features, train longer |

Think of it like this

An overfit model is a student who memorized last year's exam answers — perfect on the practice test, lost on new questions. An underfit model is a student who barely skimmed the book — bad at both.
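The contrast above can be reproduced in a few lines of plain Python. This is only a sketch under stated assumptions: the data, the fixed alternating noise pattern, and the three toy models (`fit_memorizer`, `fit_mean`, `fit_line`) are all made up for illustration and do not come from any library.

```python
# Toy comparison of an overfit, an underfit, and a reasonable model.
# All data and model names here are hypothetical, for illustration only.

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def noise(i):
    """Fixed, reproducible stand-in for random noise: +1 on even i, -1 on odd i."""
    return 1.0 if i % 2 == 0 else -1.0

# True pattern: y = 2x + 1. Training data and new data carry different noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + 1 + noise(i) for i, x in enumerate(train_x)]
test_x = [i + 0.25 for i in range(10)]
test_y = [2 * x + 1 - noise(i) for i, x in enumerate(test_x)]

def fit_memorizer(xs, ys):
    """Overfit: return the stored label of the nearest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[nearest]
    return predict

def fit_mean(xs, ys):
    """Underfit: ignore x and always predict the average training label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Reasonable fit: ordinary least-squares straight line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

results = {}
for name, fit in [("memorizer", fit_memorizer), ("mean", fit_mean), ("line", fit_line)]:
    model = fit(train_x, train_y)
    train_err = mse(train_y, [model(x) for x in train_x])
    test_err = mse(test_y, [model(x) for x in test_x])
    results[name] = (train_err, test_err)
    print(f"{name:10s} train MSE = {train_err:.2f}   new-data MSE = {test_err:.2f}")
```

In this toy setup the memorizer has zero training error but a much larger error on the new points, the mean predictor is poor on both, and the straight line scores about the same on both. Comparing training error with held-out error is the usual way to tell the two failure modes apart.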

Classification metrics

Key points

  • Accuracy — share of all predictions that were correct. Misleading on imbalanced data (a model calling everything "not fraud" is 99.9% accurate and useless).
  • Precision — of everything the model flagged positive, how much really was? Optimize when false positives are costly (e.g., blocking legitimate customers).
  • Recall — of all actual positives, how many did the model catch? Optimize when missing a positive is costly (e.g., cancer screening, fraud).
  • F1 score — harmonic mean of precision and recall; a balanced single number.
  • AUC-ROC — how well the model separates classes across all thresholds (1.0 = perfect, 0.5 = coin flip).
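To see why accuracy misleads on imbalanced data, the sketch below computes the first four metrics from raw counts. It assumes a made-up dataset of 1,000 transactions with exactly one fraud case, and it assumes that an undefined ratio (nothing flagged, or no positives) is reported as 0.0. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Assumption: report 0.0 when a ratio would divide by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical data: 1 fraud case among 1,000 transactions.
y_true = [1] + [0] * 999

# Model A calls everything "not fraud".
lazy = classification_metrics(y_true, [0] * 1000)

# Model B flags three transactions, one of which is the real fraud.
cautious = classification_metrics(y_true, [1, 1, 1] + [0] * 997)

print("lazy    ", lazy)
print("cautious", cautious)
```

Model A scores higher on accuracy (0.999 versus 0.998) yet has zero recall and zero F1, while Model B catches the fraud at the cost of two false alarms.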

Exam tip

Memorize the trade-off: "can't afford to miss any" → recall; "false alarms are expensive" → precision; "imbalanced dataset" → accuracy is the wrong metric, use F1/AUC.
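The trade-off comes from where the decision threshold sits. The sketch below uses ten invented scores and labels to show precision and recall moving in opposite directions as the threshold rises. It also computes AUC-ROC as the share of (positive, negative) pairs the model ranks correctly, which is equivalent to the area under the ROC curve. The helper names `precision_recall_at` and `auc_roc` are hypothetical.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is flagged positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    # Assumption: report 0.0 when a ratio would divide by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc_roc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores (higher = more likely positive) and true labels.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
labels = [0,    0,    0,    1,    0,    0,    1,    0,    1,    1]

for threshold in (0.25, 0.55, 0.75):
    precision, recall = precision_recall_at(threshold, scores, labels)
    print(f"threshold {threshold:.2f}: precision = {precision:.2f}, recall = {recall:.2f}")

print(f"AUC-ROC = {auc_roc(scores, labels):.3f}")
```

On this data a low threshold catches every positive but raises more false alarms, and a high threshold raises no false alarms but misses half the positives. A "can't afford to miss any" problem therefore pushes the threshold down, and a "false alarms are expensive" problem pushes it up.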

Regression metrics & business metrics

Key points

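  • Regression (numeric predictions) uses error measures such as MAE (mean absolute error) and MSE/RMSE (mean squared error and its square root), where lower is better, and R², which measures how much of the variance the model explains (closer to 1 is better).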
  • Technical metrics aren't the finish line: models are judged by business metrics too — revenue lift, cost per prediction, customer satisfaction, ROI.
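The regression measures above can be computed directly from their definitions. The sketch below uses four invented data points, and the function name `regression_metrics` is hypothetical. It assumes the true values are not all identical, since R² is undefined in that case.

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² for numeric predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # average size of the error
    mse = sum(e * e for e in errors) / n    # squaring punishes large errors more
    rmse = math.sqrt(mse)                   # back in the units of the target

    # R² compares the model's squared error with that of always predicting the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                # assumes ss_tot > 0

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical true values and predictions.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 11.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these points the errors are -1, 0, 1 and 2, giving MAE 1.0, MSE 1.5, RMSE about 1.225 and R² 0.7.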

Knowledge check

A model performs excellently on its training data but poorly on new, unseen data. What is this called?