LLMs don't know your internal docs, policies, or last week's data, and they tend to hallucinate around the gaps. Retrieval-Augmented Generation (RAG) addresses this by fetching the relevant content at query time and putting it in the model's context. Spring AI gives you the building blocks.
The two phases
Ingestion (offline): documents → chunks → embeddings → vector store.
Query (online): question → embedding → similarity search → relevant chunks → prompt → answer.
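To make the "similarity search" step of the query phase concrete, here is a toy sketch in plain Java. It assumes hand-made 3-dimensional vectors and a brute-force scan; the class and method names (ToySimilaritySearch, cosine, topK) are hypothetical and not part of Spring AI. A real vector store would use embeddings from a model and an index instead of a linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToySimilaritySearch {

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k stored vectors most similar to the query, best first.
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        List<String> ids = new ArrayList<>(store.keySet());
        // Negate the score so the highest similarity sorts first.
        ids.sort(Comparator.comparingDouble((String id) -> -cosine(store.get(id), query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        // Hand-made "embeddings" (an assumption for illustration only).
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("refund-policy", new double[] {0.9, 0.1, 0.0});
        store.put("shipping-times", new double[] {0.1, 0.9, 0.0});
        store.put("api-limits", new double[] {0.0, 0.1, 0.9});

        double[] question = {0.8, 0.2, 0.0}; // stands in for an embedded user question
        System.out.println(topK(store, question, 2)); // prints [refund-policy, shipping-times]
    }
}
```

The ids returned here play the role of the "relevant chunks" that get placed into the prompt.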
Embeddings & vector store
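Add an embedding model starter and a vector store starter. Spring AI supports pgvector, Redis, Qdrant, Pinecone, Chroma, and more behind a single VectorStore interface. For example, with OpenAI embeddings and pgvector, where the dimensions setting must match the embedding model's output size (1536 for text-embedding-3-small):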
spring.ai.openai.embedding.options.model=text-embedding-3-small
spring.ai.vectorstore.pgvector.dimensions=1536
Ingestion
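var reader = new TikaDocumentReader(resource); // parses PDF, docx, html… via Apache Tika
var splitter = new TokenTextSplitter(); // token-based chunking with default settings
List<Document> chunks = splitter.apply(reader.get()); // read the file, then split it into chunks
vectorStore.add(chunks); // embeds each chunk and stores it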
Attach metadata (source, title, section) to each Document — you'll use it for filtering and provenance.
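To build intuition for what the chunking step does, here is a simplified sketch of fixed-size chunking with overlap in plain Java. It is not how TokenTextSplitter works internally: it counts whitespace-separated words rather than model tokens, and the class name OverlapChunker and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverlapChunker {

    // Splits text into windows of `size` words; consecutive windows share `overlap` words.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return chunks; // nothing to split
        }
        String[] words = text.trim().split("\\s+");
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + size, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "one two three four five six seven eight nine ten";
        // Windows of 4 words with 1 word of overlap.
        for (String c : chunk(text, 4, 1)) {
            System.out.println(c);
        }
        // prints:
        // one two three four
        // four five six seven
        // seven eight nine ten
    }
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.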
Retrieval + generation
The simplest way is the QuestionAnswerAdvisor, which retrieves matching chunks and injects them into the prompt automatically:
String answer = chat.prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user(question)
        .call()
        .content();
Or do it manually when you need more control: call vectorStore.similaritySearch(...), build the prompt from the returned chunks, then call the model.
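The "build the prompt with the chunks" step of the manual route might look roughly like the sketch below. It is plain Java with no Spring AI types: the Chunk class, the ManualRagPrompt class, and the instruction wording are all hypothetical, and the chunks are assumed to have been retrieved already. The resulting string is what you would pass as the user or system text.

```java
import java.util.List;

class ManualRagPrompt {

    // Hypothetical holder for a retrieved chunk and where it came from.
    static final class Chunk {
        final String source;
        final String text;

        Chunk(String source, String text) {
            this.source = source;
            this.text = text;
        }
    }

    // Static instructions go first so the prompt prefix is identical across requests.
    static final String INSTRUCTIONS =
            "Answer using only the context below. "
            + "Cite the source in brackets. If the context is insufficient, say so.";

    // Assembles: instructions, then one line per chunk tagged with its source, then the question.
    static String build(List<Chunk> chunks, String question) {
        StringBuilder sb = new StringBuilder(INSTRUCTIONS).append("\n\nContext:\n");
        for (Chunk chunk : chunks) {
            sb.append("[").append(chunk.source).append("] ").append(chunk.text).append("\n");
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }

    public static void main(String[] args) {
        List<Chunk> retrieved = List.of(
                new Chunk("refunds.md", "Refunds are issued within 14 days."),
                new Chunk("faq.md", "Contact support to start a refund."));
        System.out.println(build(retrieved, "How long do refunds take?"));
    }
}
```

Tagging each chunk with its source is what lets the model cite where an answer came from.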
What actually determines quality
RAG quality is capped by retrieval, and retrieval is capped by chunking:
- Chunk on structure (headings/sections) where possible; ~300–800 tokens with 10–20% overlap as a starting point.
- Filter by metadata before vector search (e.g. by product, recency) to cut noise.
- Tune top-k — too many chunks cause "context rot"; too few miss the answer.
- Measure retrieval recall with a golden set before touching the prompt.
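Measuring retrieval against a golden set can start very small. The sketch below is one possible definition, not a standard API: it counts a question as a hit if at least one of its expected chunk ids shows up in the first k retrieved ids, and reports the fraction of hits. The class name RetrievalRecall, the chunk ids, and the question keys are hypothetical; in practice the retrieved lists would come from your vector store.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class RetrievalRecall {

    // Fraction of golden questions for which at least one expected chunk id
    // appears among the first k retrieved ids.
    static double recallAtK(Map<String, Set<String>> golden,
                            Map<String, List<String>> retrieved, int k) {
        if (golden.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Map.Entry<String, Set<String>> entry : golden.entrySet()) {
            List<String> results = retrieved.getOrDefault(entry.getKey(), List.of());
            List<String> topK = results.subList(0, Math.min(k, results.size()));
            for (String id : topK) {
                if (entry.getValue().contains(id)) {
                    hits++;
                    break; // one expected chunk in the top k is enough for a hit
                }
            }
        }
        return (double) hits / golden.size();
    }

    public static void main(String[] args) {
        // Golden set: question -> chunk ids that contain the answer.
        Map<String, Set<String>> golden = Map.of(
                "q1", Set.of("c1"),
                "q2", Set.of("c7"));
        // What the retriever returned for each question, best first.
        Map<String, List<String>> retrieved = Map.of(
                "q1", List.of("c3", "c1", "c9"),
                "q2", List.of("c2", "c4", "c7"));

        System.out.println(recallAtK(golden, retrieved, 2)); // prints 0.5
        System.out.println(recallAtK(golden, retrieved, 3)); // prints 1.0
    }
}
```

Running this for several values of k shows where the right chunk stops being missed, which is a reasonable starting point for choosing top-k.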
Best practices & anti-patterns
- ✅ Add metadata + provenance so answers are traceable and conflicts surface.
- ✅ Keep static instructions first in the prompt (cache-friendly).
- ❌ Don't dump whole documents into the prompt — retrieve.
- ❌ Don't fine-tune facts in; they go stale. Retrieve fresh data instead.
Next: standardise tool access and build agents in MCP & Agents.