Spring AI brings LLMs into Spring Boot with the same ergonomics you already know: starters, auto-configuration, and beans you inject. This page gets you from zero to your first working calls.
The mental model
Spring AI is built on provider-agnostic abstractions:
- ChatModel / ChatClient: chat completions (the fluent ChatClient is what you'll use day to day).
- EmbeddingModel: turns text into vectors (for RAG).
- VectorStore: stores and queries embeddings.
- ImageModel, AudioModel: multimodal, where supported.
You depend on the abstraction; a starter wires in the concrete provider. Switching from OpenAI to Anthropic is mostly a dependency + config change, not a code rewrite.
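The swap described above can be sketched in plain Java. This is a minimal illustration, not Spring AI's actual types: `TextModel`, `FakeOpenAiModel`, `FakeAnthropicModel`, and `Summarizer` are hypothetical stand-ins, and the stubs return canned strings instead of calling a provider.

```java
// Hypothetical stand-in for a provider-agnostic abstraction such as ChatModel.
interface TextModel {
    String complete(String prompt);
}

// Stub "providers": real ones would call a remote API.
class FakeOpenAiModel implements TextModel {
    @Override
    public String complete(String prompt) {
        return "[openai] " + prompt;
    }
}

class FakeAnthropicModel implements TextModel {
    @Override
    public String complete(String prompt) {
        return "[anthropic] " + prompt;
    }
}

// Application code depends only on the interface, so swapping providers
// changes the wiring, not this class.
class Summarizer {
    private final TextModel model;

    Summarizer(TextModel model) {
        this.model = model;
    }

    String summarize(String text) {
        return model.complete("Summarize: " + text);
    }
}
```

In a Spring Boot app the container plays the role of the constructor call here: the starter on the classpath decides which implementation gets injected.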
1. Add a model starter
Pick a provider starter (you can have more than one). For Anthropic Claude:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>
Then set the API key and default chat options in application.properties:
spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model=claude-sonnet-4-6
spring.ai.anthropic.chat.options.temperature=0.4
Keep the API key in an environment variable — never commit it.
2. Build a ChatClient
ChatClient is the fluent entry point. Configure shared defaults once via the builder:
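import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class AiConfig {

    // The auto-configured builder is injected; defaults set here apply to every call.
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("You are a concise, accurate enterprise assistant.")
                .build();
    }
}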
3. Your first call
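import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class AskController {

    private final ChatClient chat;

    AskController(ChatClient chat) {
        this.chat = chat;
    }

    // GET /ask?q=... sends q as the user message and returns the reply text.
    @GetMapping("/ask")
    String ask(@RequestParam String q) {
        return chat.prompt()
                .user(q)
                .call()
                .content();
    }
}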
.call().content() returns just the generated text. For the full response, including token usage and other metadata, use .call().chatResponse() instead.
4. Streaming
For chat UIs, stream tokens as they arrive:
Flux<String> stream(String q) {
    return chat.prompt().user(q).stream().content();
}
Return the Flux from a WebFlux endpoint (or bridge to SSE) so users see output immediately.
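If you bridge to SSE by hand rather than letting the framework do it, each streamed chunk becomes one event on the wire. The sketch below shows that framing in plain Java, following the `data:` line format from the Server-Sent Events specification. `SseFrames` and its methods are hypothetical names; in a WebFlux app, declaring the endpoint as `text/event-stream` would typically handle this for you.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: formats streamed text chunks as Server-Sent Events.
class SseFrames {

    // One chunk becomes one event. Every line of the chunk gets its own
    // "data: " prefix, and a blank line terminates the event.
    static String frame(String chunk) {
        StringBuilder sb = new StringBuilder();
        for (String line : chunk.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }

    // Frames a whole sequence of chunks in arrival order.
    static String frameAll(List<String> chunks) {
        return chunks.stream().map(SseFrames::frame).collect(Collectors.joining());
    }
}
```

Because the client strips only a single space after `data:`, chunks that begin with a space (common for word-level tokens) survive the round trip intact.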
5. Per-call options
Override options such as the model or temperature for a single request without touching config. This example pins the temperature to 0:
chat.prompt()
        .user(q)
        .options(AnthropicChatOptions.builder().temperature(0.0).build())
        .call()
        .content();
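Conceptually, per-call options are layered over the configured defaults: any field the request sets wins, and everything else falls through. The sketch below models that with a hypothetical `Options` record. It illustrates the idea only and is not Spring AI's merge implementation.

```java
// Hypothetical options holder; null means "not set on this layer".
record Options(String model, Double temperature) {

    // Returns a new Options where fields set on the override replace
    // the defaults and unset (null) fields fall back to them.
    Options overriddenBy(Options override) {
        return new Options(
                override.model() != null ? override.model() : model,
                override.temperature() != null ? override.temperature() : temperature);
    }
}
```

For example, layering `new Options(null, 0.0)` over defaults of `new Options("claude-sonnet-4-6", 0.4)` keeps the model and replaces only the temperature.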
Best practices from day one
- Set a clear default system prompt — role, tone, and the boundaries of what the assistant should do.
- Externalize keys and model names to config/env; don't hardcode.
- Pick temperature by task: ~0 for extraction/classification, higher only when you want variety.
- Don't trust output blindly — you'll add structure and validation next.
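One way to make the temperature guideline hard to forget is to pick the value from the task type instead of scattering literals through the code. A minimal sketch, where the `Task` enum and the specific values are illustrative assumptions rather than recommendations from Spring AI or any provider:

```java
// Hypothetical task categories with an assumed temperature for each.
enum Task {
    EXTRACTION(0.0),      // favor repeatable output when pulling out fields
    CLASSIFICATION(0.0),  // favor repeatable labels
    DRAFTING(0.7);        // variety is acceptable, so allow more randomness

    private final double temperature;

    Task(double temperature) {
        this.temperature = temperature;
    }

    double temperature() {
        return temperature;
    }
}
```

The value can then be fed into the per-call options from section 5, for instance `.temperature(Task.EXTRACTION.temperature())`.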
Next: make outputs reliable and let the model call your code → Prompting, Structured Output & Tool Calling