Spring AI — Choosing & Integrating LLMs — Avaneesh Yadav

"Which LLM is best?" has no single answer — it depends on the task, cost, latency, and data-residency constraints. Spring AI's value is that you don't have to bet the codebase on one: you integrate behind a common interface and route per task.

How Spring AI integrates "all of them"

Each provider ships as a starter that auto-configures an implementation of the same ChatModel interface (and EmbeddingModel, where the provider offers embeddings):

Provider	Starter (model)
Anthropic Claude	`spring-ai-starter-model-anthropic`
OpenAI	`spring-ai-starter-model-openai`
Azure OpenAI	`spring-ai-starter-model-azure-openai`
Google Vertex (Gemini)	`spring-ai-starter-model-vertex-ai-gemini`
Amazon Bedrock	`spring-ai-starter-model-bedrock-*`
Ollama (local)	`spring-ai-starter-model-ollama`

Your code calls ChatClient; the starter + config decide the concrete model. Swapping providers is a dependency/config change, not a rewrite.
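
To make the "config change, not a rewrite" point concrete, here is a minimal plain-Java sketch of the pattern. It deliberately does not use Spring AI: `TextModel`, `ProviderRegistry`, `SummaryService`, and the provider keys are hypothetical stand-ins for ChatModel and the starter auto-configuration, and the lambdas return canned strings instead of calling a real API.

```java
import java.util.Map;

// Hypothetical stand-in for Spring AI's ChatModel: prompt in, text out.
interface TextModel {
    String call(String prompt);
}

// Hypothetical stand-in for starter auto-configuration:
// maps a configuration value to a concrete implementation.
final class ProviderRegistry {
    private static final Map<String, TextModel> PROVIDERS = Map.of(
        "anthropic", prompt -> "[claude] " + prompt,  // stub, no network call
        "ollama",    prompt -> "[local] " + prompt);  // stub, no network call

    static TextModel fromConfig(String providerKey) {
        TextModel model = PROVIDERS.get(providerKey);
        if (model == null) {
            throw new IllegalArgumentException("Unknown provider: " + providerKey);
        }
        return model;
    }
}

// Application code depends only on the interface, never on a provider class.
final class SummaryService {
    private final TextModel model;

    SummaryService(TextModel model) {
        this.model = model;
    }

    String summarise(String text) {
        return model.call("Summarise: " + text);
    }
}
```

Changing the key passed to `fromConfig` swaps the provider while `SummaryService` stays untouched, which is the property the common interface is meant to give you.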

flowchart LR
    C@{ icon: "logos:anthropic", form: "square", label: "Claude" }
    O@{ icon: "logos:openai", form: "square", label: "OpenAI / Azure" }
    G@{ icon: "logos:google-cloud", form: "square", label: "Gemini / Vertex" }
    B@{ icon: "logos:aws", form: "square", label: "Bedrock" }
    APP[Your code: ChatClient]:::svc --> ABS[Spring AI abstraction]:::svc
    ABS --> C
    ABS --> O
    ABS --> G
    ABS --> B
    ABS --> L[Ollama local]:::ai
    class C,O,G,B logo
    classDef svc fill:#06303a,stroke:#22d3ee,color:#fff
    classDef ai fill:#241844,stroke:#a855f7,color:#fff
    classDef logo fill:#0b1220,stroke:#475569,color:#e2e8f0

Which model for which job

There's no universal "best", but sensible defaults:

Complex reasoning, agents, architecture, coding → a frontier model (e.g. Claude Sonnet/Opus class). Worth the cost where quality and tool-use reliability matter.
High-volume classification / extraction / routing → a small, fast model. Cheap and low-latency; pair with structured output.
Private / regulated data, offline → a local model via Ollama or a VPC-hosted provider for data residency.
Embeddings (RAG) → a dedicated embedding model; use the same model for ingestion and query, because vectors produced by different models are not comparable.

Decision factors: capability (can it do the task reliably?), cost per token, latency, context window, data residency/compliance, and provider reliability.
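
One way to apply these factors is to treat capability, context window, and data residency as hard constraints, then minimise cost among the models that pass. The sketch below assumes that approach; `ModelProfile`, `TaskNeeds`, `ModelSelector`, and the capability tiers are hypothetical names invented for illustration, and any numbers you feed in should come from your own measurements and your provider's current pricing.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical description of a candidate model.
record ModelProfile(String name, int capabilityTier, double costPer1kTokens,
                    int contextWindowTokens, boolean keepsDataOnPrem) {}

// What a task needs, expressed in the same terms as the decision factors.
record TaskNeeds(int minCapabilityTier, int promptTokens, boolean requiresOnPrem) {}

final class ModelSelector {
    // Keep only models that satisfy the hard constraints, then pick the cheapest.
    static Optional<ModelProfile> cheapestFit(List<ModelProfile> candidates, TaskNeeds needs) {
        return candidates.stream()
            .filter(m -> m.capabilityTier() >= needs.minCapabilityTier())
            .filter(m -> m.contextWindowTokens() >= needs.promptTokens())
            .filter(m -> !needs.requiresOnPrem() || m.keepsDataOnPrem())
            .min(Comparator.comparingDouble(ModelProfile::costPer1kTokens));
    }
}
```

An empty result means no candidate meets the constraints, which is worth surfacing explicitly rather than silently falling back to a model that violates residency rules. Latency and provider reliability are left out here to keep the sketch short; they could be added as further filters.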

Routing: use more than one

A mature system routes by task — a cheap model for the easy 80%, a frontier model for the hard 20%:

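// complexity, frontierClient, and fastClient are defined elsewhere in your app.
ChatClient client = complexity.isHigh(task)
    ? frontierClient    // ChatClient built on the Anthropic ChatModel
    : fastClient;       // ChatClient built on a small, cheap ChatModel
// The fluent call is the same whichever client was chosen.
String out = client.prompt().user(task).call().content();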

Centralise this behind a service (or an AI gateway) so apps don't each reinvent routing, fallback, and budgets.
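
A minimal sketch of such a central service follows, assuming a simple policy: route hard tasks to the frontier model while a call budget lasts, and fall back to the fast model on budget exhaustion or provider failure. `Completion` is a hypothetical stand-in for a configured ChatClient, and `RoutingService`, the complexity predicate, and the call-count budget are illustrative choices, not Spring AI features.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical stand-in for a configured ChatClient: prompt in, text out.
interface Completion {
    String complete(String prompt);
}

// One place that owns routing, fallback, and a simple call budget.
final class RoutingService {
    private final Completion frontier;
    private final Completion fast;
    private final Predicate<String> isHighComplexity;
    private final AtomicInteger frontierCallsLeft;

    RoutingService(Completion frontier, Completion fast,
                   Predicate<String> isHighComplexity, int frontierCallBudget) {
        this.frontier = frontier;
        this.fast = fast;
        this.isHighComplexity = isHighComplexity;
        this.frontierCallsLeft = new AtomicInteger(frontierCallBudget);
    }

    String ask(String task) {
        if (!isHighComplexity.test(task) || !tryConsumeBudget()) {
            return fast.complete(task);
        }
        try {
            return frontier.complete(task);
        } catch (RuntimeException e) {
            // Fallback: degrade to the fast model rather than fail the request.
            return fast.complete(task);
        }
    }

    // Atomically takes one unit of budget; returns false once it is exhausted.
    private boolean tryConsumeBudget() {
        return frontierCallsLeft.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0;
    }
}
```

Callers only ever see `ask`, so changing the complexity rule, the budget, or the fallback target is a change in one class. A production version would likely budget by tokens or spend rather than call count, and would log each fallback.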

Best practices & anti-patterns

✅ Abstract on ChatClient; keep provider specifics in config so you can switch/route.
✅ Benchmark candidate models on your eval set, not leaderboards.
✅ Have a fallback provider for outages.
❌ Hardcoding one provider's SDK throughout your app (lock-in, no routing).
❌ Using a frontier model for trivial classification (cost) — or a tiny model for hard reasoning (quality).
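
Benchmarking on your own eval set can start very small. The sketch below assumes a classification-style task scored by exact match; `EvalCase` and `EvalRunner` are hypothetical names, and the model is passed in as a plain function so any client can be wrapped and compared.

```java
import java.util.List;
import java.util.function.Function;

// One labelled example from your own evaluation set.
record EvalCase(String input, String expected) {}

final class EvalRunner {
    // Fraction of cases where the trimmed model output matches the label, ignoring case.
    static double accuracy(Function<String, String> model, List<EvalCase> cases) {
        if (cases.isEmpty()) {
            throw new IllegalArgumentException("Eval set must not be empty");
        }
        long correct = cases.stream()
            .filter(c -> c.expected().equalsIgnoreCase(model.apply(c.input()).trim()))
            .count();
        return (double) correct / cases.size();
    }
}
```

Running the same cases through each candidate gives a like-for-like number to weigh against cost and latency. Exact match suits labels and extracted fields; free-form answers need a different scoring rule.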

Next: make it production-grade → Production Architecture
