"Which LLM is best?" has no single answer — it depends on the task, cost, latency, and data-residency constraints. Spring AI's value is that you don't have to bet the codebase on one: you integrate behind a common interface and route per task.
How Spring AI integrates "all of them"
Each provider ships as a Spring Boot starter whose auto-configured beans implement the same ChatModel interface (and EmbeddingModel, where the provider offers embeddings):
| Provider | Starter (model) |
|---|---|
| Anthropic Claude | spring-ai-starter-model-anthropic |
| OpenAI | spring-ai-starter-model-openai |
| Azure OpenAI | spring-ai-starter-model-azure-openai |
| Google Vertex (Gemini) | spring-ai-starter-model-vertex-ai-gemini |
| Amazon Bedrock | spring-ai-starter-model-bedrock-converse (chat), spring-ai-starter-model-bedrock (embeddings) |
| Ollama (local) | spring-ai-starter-model-ollama |
Your code calls ChatClient; the starter + config decide the concrete model. Swapping providers is a dependency/config change, not a rewrite.
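As a rough illustration of what "a config change" can look like, here is a sketch of an application.properties file. It assumes Spring AI 1.x property names, so check them against the version you run; the environment variables are placeholders, not values from this article.

```properties
# Sketch only: assumes Spring AI 1.x property names.

# Option A: Anthropic starter on the classpath
spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model=${ANTHROPIC_MODEL}

# Option B: swap the starter dependency to Ollama and use these instead
# spring.ai.ollama.base-url=http://localhost:11434
# spring.ai.ollama.chat.options.model=${OLLAMA_MODEL}
```

Code that only talks to ChatClient should not need to change between the two options.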
Which model for which job
There's no universal "best", but sensible defaults:
- Complex reasoning, agents, architecture, coding → a frontier model (e.g. Claude Sonnet/Opus class). Worth the cost where quality and tool-use reliability matter.
- High-volume classification / extraction / routing → a small, fast model. Cheap and low-latency; pair with structured output.
- Private / regulated data, offline → a local model via Ollama or a VPC-hosted provider for data residency.
- Embeddings (RAG) → a dedicated embedding model; use the same model for ingestion and query, because vectors produced by different models are not comparable.
Decision factors: capability (can it do the task reliably?), cost per token, latency, context window, data residency/compliance, and provider reliability.
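The defaults above can be written down as a small policy function. This is a minimal sketch in plain Java: TierPolicy, TaskKind, and ModelTier are hypothetical names, not Spring AI types, and the rule that regulated data always overrides capability is an assumption you may want to relax.

```java
import java.util.Objects;

/** Hypothetical policy class; nothing here comes from Spring AI. */
final class TierPolicy {

    enum TaskKind { REASONING, CODING, CLASSIFICATION, EXTRACTION, ROUTING }

    /** PRIVATE means a local model or a VPC-hosted provider. */
    enum ModelTier { FRONTIER, SMALL_FAST, PRIVATE }

    private TierPolicy() {}

    /**
     * Assumption: data residency wins over capability, so regulated data
     * goes to the private tier no matter how hard the task is.
     */
    static ModelTier choose(TaskKind kind, boolean regulatedData) {
        Objects.requireNonNull(kind, "kind");
        if (regulatedData) {
            return ModelTier.PRIVATE;
        }
        switch (kind) {
            case REASONING:
            case CODING:
                return ModelTier.FRONTIER;
            default:
                // classification, extraction, routing
                return ModelTier.SMALL_FAST;
        }
    }
}
```

Keeping the mapping in one place makes it easy to revisit when cost, latency, or compliance constraints change.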
Routing: use more than one
A mature system routes by task — a cheap model for the easy 80%, a frontier model for the hard 20%:
ChatClient client = complexity.isHigh(task)
        ? frontierClient   // built from the Anthropic builder
        : fastClient;      // built from a small-model builder
String out = client.prompt().user(task).call().content();
Centralise this behind a service (or an AI gateway) so apps don't each reinvent routing, fallback, and budgets.
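One way to centralise routing and fallback is sketched below in plain Java. RoutingService is a hypothetical class, and each Function<String, String> stands in for a model call such as client.prompt().user(task).call().content(), so the sketch runs without any provider. Budgets are left out to keep it short.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical central router; the functions stand in for ChatClient calls. */
final class RoutingService {
    private final Predicate<String> isHighComplexity; // hypothetical task classifier
    private final Function<String, String> frontier;
    private final Function<String, String> fast;
    private final Function<String, String> fallback;

    RoutingService(Predicate<String> isHighComplexity,
                   Function<String, String> frontier,
                   Function<String, String> fast,
                   Function<String, String> fallback) {
        this.isHighComplexity = Objects.requireNonNull(isHighComplexity);
        this.frontier = Objects.requireNonNull(frontier);
        this.fast = Objects.requireNonNull(fast);
        this.fallback = Objects.requireNonNull(fallback);
    }

    /** Routes by complexity; if the chosen model throws, tries the fallback once. */
    String complete(String task) {
        Function<String, String> primary =
                isHighComplexity.test(task) ? frontier : fast;
        try {
            return primary.apply(task);
        } catch (RuntimeException primaryFailure) {
            try {
                return fallback.apply(task);
            } catch (RuntimeException fallbackFailure) {
                // Keep the original failure visible for diagnosis.
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }
}
```

In a Spring application the three functions would typically wrap separately built ChatClient instances, one per provider, and callers would depend only on this service.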
Best practices & anti-patterns
- ✅ Abstract on ChatClient; keep provider specifics in config so you can switch/route.
- ✅ Benchmark candidate models on your eval set, not leaderboards.
- ✅ Have a fallback provider for outages.
- ❌ Hardcoding one provider's SDK throughout your app (lock-in, no routing).
- ❌ Using a frontier model for trivial classification (cost) — or a tiny model for hard reasoning (quality).
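To make "benchmark on your eval set" concrete, here is a minimal sketch of an accuracy check in plain Java. EvalHarness is a hypothetical name, the model is again a stand-in Function<String, String>, and case-insensitive exact match is an assumed scoring rule that suits classification labels more than free-form text.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical harness: scores a model stand-in against labelled cases. */
final class EvalHarness {
    private EvalHarness() {}

    /**
     * Returns the fraction of cases whose trimmed output equals the expected
     * label, ignoring case. Keys are inputs, values are expected labels.
     */
    static double accuracy(Function<String, String> model,
                           Map<String, String> evalSet) {
        if (evalSet.isEmpty()) {
            throw new IllegalArgumentException("eval set must not be empty");
        }
        int correct = 0;
        for (Map.Entry<String, String> evalCase : evalSet.entrySet()) {
            String output = model.apply(evalCase.getKey());
            if (output != null
                    && output.trim().equalsIgnoreCase(evalCase.getValue())) {
                correct++;
            }
        }
        return (double) correct / evalSet.size();
    }
}
```

Running the same eval set against each candidate gives a like-for-like number to weigh against cost and latency.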
Next: make it production-grade → Production Architecture →