Customizing Foundation Models
The adaptation ladder — prompting → RAG → fine-tuning → pre-training — and how to pick a model and method.
There's a ladder of ways to make a foundation model fit your needs. Each rung adds capability — and cost, data requirements, and complexity. The exam constantly asks you to pick the right rung for a scenario.
The adaptation ladder (cheapest → most expensive)
| Method | What it changes | Data needed | Typical trigger |
|---|---|---|---|
| Prompt engineering | Nothing in the model — just better instructions | None | First attempt at any task |
| RAG | What the model can *reference* | Your documents (unlabeled) | Company knowledge, current facts, citations |
| Fine-tuning | The model's *weights* — behavior, style, format, domain language | Hundreds–thousands of labeled examples | Consistent specialized behavior prompting can't achieve |
| Continued pre-training | Deep domain knowledge in the weights | Large unlabeled domain corpus | Heavy domain adaptation (medical, legal) |
| Train from scratch | Everything | Internet-scale data + millions in compute | Almost never the right answer on this exam |
Data-type tell: fine-tuning uses labeled prompt/response pairs; continued pre-training uses unlabeled domain text. And in Bedrock, customized models produce a private copy — the shared base model is never altered by your data — and require Provisioned Throughput to serve.
Choosing the base model
Selection criteria
- Capability/quality on your task (benchmarks + your own evaluation).
- Modality — text, image, embeddings, multimodal.
- Context window — how much input it must hold.
- Cost and latency — smaller models are cheaper and faster; prefer the smallest that meets quality.
- Customization options — does it support fine-tuning?
- Licensing — proprietary vs open weights.
Prompting is giving directions. RAG is handing over the map. Fine-tuning is a training course that changes habits. Continued pre-training is a second university degree. Training from scratch is raising the employee from birth — almost never worth it.
A company wants its model to ALWAYS respond in its precise brand voice and proprietary report format — something prompting alone hasn't achieved reliably. It has 5,000 labeled example pairs. Which approach fits?