Principles of Responsible AI
The dimensions of trustworthy AI — fairness, explainability, transparency, privacy, robustness — and where bias comes from.
The core dimensions
- Fairness — the system's outcomes don't disadvantage groups (age, gender, ethnicity…).
- Explainability — humans can understand *why* the model made a decision.
- Transparency — being open about how the system works, its data, and its limits.
- Privacy & security — protecting personal data throughout the AI lifecycle.
- Robustness — reliable behavior even on unexpected or adversarial inputs.
- Veracity/safety — outputs are accurate and not harmful.
- Controllability & governance — humans can monitor, guide, and override the system; accountability is assigned.
Where bias comes from
| Source | Example |
|---|---|
| Unrepresentative training data (sampling bias) | A hiring model trained mostly on one demographic's résumés |
| Historical bias baked into data | Loan data reflecting decades of discriminatory lending |
| Measurement/labeling bias | Human labelers applying inconsistent standards |
| Algorithmic amplification | The model exaggerates small imbalances present in data |
Mitigations to recognize: diverse, representative, high-quality training data; bias audits before and after deployment (SageMaker Clarify); human oversight for consequential decisions; continuous monitoring of live behavior. "Garbage in, garbage out" is tested — biased data yields biased models regardless of the algorithm.
The interpretability trade-off
Simple models (linear regression, decision trees) are interpretable — you can read exactly why they decided. Deep neural networks and LLMs are far more capable but relatively opaque ("black boxes"). Regulated, high-stakes decisions (credit, hiring, healthcare) may favor an interpretable model, or require explainability tooling and human review layered on the powerful one.
An interpretable model is a chef who explains every ingredient. A deep network is a chef whose dishes are superb but whose recipe even they can't fully articulate. For a casual dinner that's fine; for allergy-safe catering, you need the ingredient list.
A model systematically scores loan applicants from one region lower because the training data reflected historical discrimination. Which responsible AI dimension is violated?