RAG & Knowledge Bases

Retrieval-augmented generation: how it works, why it beats fine-tuning for company knowledge, and vector databases on AWS.


Retrieval-Augmented Generation (RAG) addresses two LLM weaknesses at once, hallucination and stale knowledge, by fetching relevant, current documents *at question time* and passing them to the model as context. The model's answer is grounded in your data and can cite its sources, which reduces (but does not eliminate) hallucination. No model training is required.

How RAG works

Ingest: your documents are split into chunks; an embedding model converts each chunk to a vector, stored in a vector database.
Retrieve: a user's question is embedded too; the database returns the chunks whose vectors are closest (most semantically similar).
Augment: the retrieved chunks are inserted into the prompt as context.
Generate: the LLM answers using that context, often with citations.
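
The four steps above can be sketched in a few lines of Python. This is a toy illustration, not a production design: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for the vector database, and the function names (`chunk_text`, `build_index`, `retrieve`, `augment`) are made up for this example. The Generate step is left out because it is simply a call to whichever LLM you use with the augmented prompt.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=12):
    """Ingest, part 1: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Similarity between two sparse count vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Ingest, part 2: store (source, chunk, vector) rows in an in-memory list."""
    index = []
    for source, text in documents.items():
        for chunk in chunk_text(text):
            index.append((source, chunk, embed(chunk)))
    return index


def retrieve(index, question, top_k=2):
    """Retrieve: embed the question and return the closest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine_similarity(q, row[2]), reverse=True)
    return ranked[:top_k]


def augment(question, hits):
    """Augment: place the retrieved chunks (tagged with their source) in the prompt."""
    context = "\n".join(f"[{source}] {chunk}" for source, chunk, _ in hits)
    return (
        "Answer using only the context below and cite sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Tagging each chunk with its source file in the prompt is what lets the model cite where an answer came from.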

Think of it like this

RAG is an open-book exam. Instead of hoping the student memorized everything (fine-tuning), you hand them the right pages of the textbook (retrieval) as they answer each question.

RAG on AWS

Amazon Bedrock Knowledge Bases

Fully managed RAG: point it at S3 documents and it handles chunking, embeddings, vector storage, retrieval, and citations.
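
As a rough sketch of what querying a knowledge base can look like from code, the example below uses the boto3 `bedrock-agent-runtime` client and its `retrieve_and_generate` operation. The knowledge base ID and model ARN are placeholders you would supply, the helper names are hypothetical, and the citation parsing assumes the sources are S3 documents.

```python
def build_rag_request(question, knowledge_base_id, model_arn):
    """Build the keyword arguments for a RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,  # placeholder: your knowledge base ID
                "modelArn": model_arn,                 # placeholder: the model that writes the answer
            },
        },
    }


def ask_knowledge_base(question, knowledge_base_id, model_arn, client=None):
    """Return (answer_text, list_of_source_uris) for a question."""
    if client is None:
        import boto3  # imported lazily so the request builder works without AWS access
        client = boto3.client("bedrock-agent-runtime")

    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    answer = response["output"]["text"]
    # Each citation lists the retrieved chunks that support part of the answer.
    sources = [
        ref["location"]["s3Location"]["uri"]
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
        if "s3Location" in ref.get("location", {})
    ]
    return answer, sources
```

Passing in a `client` makes the function easy to exercise with a stub before pointing it at a real AWS account.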

Amazon OpenSearch Service

Popular vector database option for semantic search.
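
For illustration, a semantic search against OpenSearch is typically expressed as a k-NN query over a vector field. The sketch below only builds the query body. The field name `embedding` is an assumption about how the index was set up, and `knn_query` is a hypothetical helper, not part of any library.

```python
def knn_query(field, query_vector, k=3):
    """Build an OpenSearch k-NN query body for the k chunks nearest to query_vector."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {"vector": list(query_vector), "k": k},
            }
        },
    }


# The body would be sent with an OpenSearch client, for example
# client.search(index="chunks", body=body) in opensearch-py.
# The index name "chunks" is a placeholder.
body = knn_query("embedding", [0.12, 0.48, 0.91], k=2)
```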

Amazon Aurora / RDS with pgvector

PostgreSQL as a vector store.
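
With pgvector, similarity search is an ordinary SQL query that orders rows by a distance operator (`<=>` is cosine distance). The sketch below assumes a hypothetical `chunks` table with a `content` text column and an `embedding` vector column. The helper names are illustrative, and the query is written with `%s` placeholders as used by drivers such as psycopg.

```python
def to_vector_literal(values):
    """Format a list of numbers as a pgvector text literal, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Assumed table: chunks(content text, embedding vector(n)). Smaller distance = more similar.
NEAREST_CHUNKS_SQL = """
SELECT content
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def nearest_chunks_params(query_vector, k=3):
    """Parameters to pass alongside NEAREST_CHUNKS_SQL, e.g. cursor.execute(sql, params)."""
    return (to_vector_literal(query_vector), k)
```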

Amazon Neptune / DocumentDB / MemoryDB

Other AWS options with vector search support.

Amazon Kendra

Managed intelligent search service that can also act as the retriever for generative AI applications.

Exam tip

Choose RAG when answers must come from company or current data, must include citations, the data changes frequently, or hallucinations must be reduced, all without training a model. Choose fine-tuning instead when you need new *behavior/style/format*, not new *facts*. Keeping frequently changing knowledge current in a fine-tuned model means repeated, expensive retraining; in RAG it is just a document update.

Knowledge check


A chatbot must answer questions using the company's internal HR policies, which change monthly, and cite the source document. Which approach fits BEST?
