Spring AI
    June 10, 2026

    Spring AI — Getting Started & Fundamentals

    Set up Spring AI, understand its provider-agnostic model, and make your first ChatClient calls — system prompts, options, and streaming, the Spring Boot way.


    Spring AI brings LLMs into Spring Boot with the same ergonomics you already know: starters, auto-configuration, and beans you inject. This page gets you from zero to your first working calls.

    The mental model

    Spring AI is built on provider-agnostic abstractions:

    • ChatModel / ChatClient — chat completions (the fluent ChatClient is what you'll use day to day).
    • EmbeddingModel — turn text into vectors (for RAG).
    • VectorStore — store/query embeddings.
    • ImageModel, AudioModel — multimodal where supported.

    You depend on the abstraction; a starter wires in the concrete provider. Switching from OpenAI to Anthropic is mostly a dependency + config change, not a code rewrite.
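
To make the "depend on the abstraction" idea concrete, here is a minimal plain-Java sketch. SketchChatModel, FakeOpenAiModel, FakeAnthropicModel and GreetingService are hypothetical stand-ins invented for this illustration, not Spring AI types; the real ChatModel and ChatClient interfaces are richer.

```java
// Hypothetical stand-in for a chat abstraction; NOT Spring AI's real ChatModel.
interface SketchChatModel {
    String call(String userMessage);
}

// Two fake "providers". Real ones would call a remote API.
final class FakeOpenAiModel implements SketchChatModel {
    @Override
    public String call(String userMessage) {
        return "[openai] " + userMessage;
    }
}

final class FakeAnthropicModel implements SketchChatModel {
    @Override
    public String call(String userMessage) {
        return "[anthropic] " + userMessage;
    }
}

// Application code depends only on the interface,
// so it stays the same when the provider changes.
final class GreetingService {
    private final SketchChatModel model;

    GreetingService(SketchChatModel model) {
        this.model = model;
    }

    String greet(String name) {
        return model.call("Say hello to " + name);
    }
}

final class ProviderSwapDemo {
    public static void main(String[] args) {
        // In Spring Boot the starter on the classpath decides which bean is injected;
        // here the choice is made by hand to show the calling code is identical.
        System.out.println(new GreetingService(new FakeOpenAiModel()).greet("Ada"));
        System.out.println(new GreetingService(new FakeAnthropicModel()).greet("Ada"));
    }
}
```

GreetingService never names a provider, which is the property that keeps a provider switch down to dependency and config changes.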

    1. Add a model starter

    Pick a provider starter (you can have more than one). For Anthropic Claude:

    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-starter-model-anthropic</artifactId>
    </dependency>
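
    The version is omitted on the assumption that the Spring AI BOM (spring-ai-bom) is imported under dependencyManagement. Then configure the provider in application.properties:

    spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}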
    spring.ai.anthropic.chat.options.model=claude-sonnet-4-6
    spring.ai.anthropic.chat.options.temperature=0.4

    Keep the API key in an environment variable — never commit it.
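
If you want a missing key to surface immediately rather than on the first model call, a small guard can help. The following is only a sketch in plain Java: ApiKeyCheck and requireKey are hypothetical names, and how you hook such a check into application startup is left open.

```java
import java.util.Map;

final class ApiKeyCheck {
    // Returns the trimmed value, or throws if the variable is missing or blank.
    static String requireKey(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // System.getenv() exposes the process environment as a read-only map.
        String key = requireKey(System.getenv(), "ANTHROPIC_API_KEY");
        // Log only the length, never the key itself.
        System.out.println("API key loaded (" + key.length() + " chars)");
    }
}
```

Passing the environment in as a Map keeps the check easy to unit test without touching real environment variables.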

    2. Build a ChatClient

    ChatClient is the fluent entry point. Configure shared defaults once via the builder:

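    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    class AiConfig {
        // ChatClient.Builder is auto-configured by Spring AI.
        @Bean
        ChatClient chatClient(ChatClient.Builder builder) {
            return builder
                // Applied to every prompt built from this client.
                .defaultSystem("You are a concise, accurate enterprise assistant.")
                .build();
        }
    }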

    3. Your first call

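    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    class AskController {
        private final ChatClient chat;

        // Injects the ChatClient bean defined in AiConfig.
        AskController(ChatClient chat) { this.chat = chat; }

        @GetMapping("/ask")
        String ask(@RequestParam String q) {
            return chat.prompt()
                .user(q)      // the user's question
                .call()       // blocking request to the model
                .content();   // reply text only
        }
    }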

    .call().content() returns the reply text as a String. For the full response, including token usage and other metadata, use .call().chatResponse() instead.

    4. Streaming

    For chat UIs, stream tokens as they arrive:

    Flux<String> stream(String q) {
        return chat.prompt().user(q).stream().content();
    }

    Return the Flux from a WebFlux endpoint (or bridge to SSE) so users see output immediately.
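
For the "bridge to SSE" case, it helps to know what actually goes over the wire. Spring writes this framing for you when an endpoint produces text/event-stream, so the sketch below is purely illustrative: SseFraming and its methods are hypothetical names, and the code only shows the Server-Sent Events message format applied to a few streamed chunks.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SseFraming {
    // Formats one chunk as a Server-Sent Events message:
    // every payload line gets its own "data: " field, and a blank line ends the event.
    static String toEvent(String chunk) {
        StringBuilder sb = new StringBuilder();
        // Limit -1 keeps trailing empty strings, so a trailing newline in the chunk survives.
        for (String line : chunk.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }

    // Concatenates the events for a sequence of chunks, in order.
    static String toEventStream(List<String> chunks) {
        return chunks.stream().map(SseFraming::toEvent).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.print(toEventStream(List.of("Hel", "lo", " world")));
    }
}
```

Note that a chunk starting with a space, such as " world", is written as "data:  world" with two spaces: the receiver strips only the single space after the colon, so the token's own leading space is preserved.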

    5. Per-call options

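    Override the model or temperature for a single request without touching config. The portable ChatOptions.builder() covers common settings such as temperature; the example below uses the provider-specific AnthropicChatOptions: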

    chat.prompt()
        .user(q)
        .options(AnthropicChatOptions.builder().temperature(0.0).build())
        .call()
        .content();
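
The precedence at work here is that values set on the request win, and anything left unset falls back to the configured defaults. The sketch below models only that rule in plain Java; SketchOptions and mergedOver are hypothetical names and this is not Spring AI's internal merge code.

```java
// Hypothetical options holder; null means "not set at this level".
record SketchOptions(String model, Double temperature) {

    // Per-request values win; unset fields fall back to the defaults.
    SketchOptions mergedOver(SketchOptions defaults) {
        return new SketchOptions(
            model != null ? model : defaults.model(),
            temperature != null ? temperature : defaults.temperature());
    }
}

final class OptionsPrecedenceDemo {
    public static void main(String[] args) {
        // Mirrors the values from the properties shown earlier in this article.
        SketchOptions fromConfig = new SketchOptions("claude-sonnet-4-6", 0.4);
        // The request overrides temperature only.
        SketchOptions perCall = new SketchOptions(null, 0.0);
        System.out.println(perCall.mergedOver(fromConfig));
    }
}
```

In this model the request keeps the configured model name and only the temperature changes, which matches the per-call example above.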

    Best practices from day one

    • Set a clear default system prompt — role, tone, and the boundaries of what the assistant should do.
    • Externalize keys and model names to config/env; don't hardcode.
    • Pick temperature by task: ~0 for extraction/classification, higher only when you want variety.
    • Don't trust output blindly — you'll add structure and validation next.
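
One way to make the "temperature by task" habit explicit is to name the task kinds in code instead of scattering numbers around. This is a sketch only: TaskKind is a hypothetical enum, and the specific values are illustrative assumptions to tune for your own workload, not recommendations from Spring AI or any provider.

```java
// Hypothetical mapping from task type to sampling temperature.
// The numbers are illustrative starting points, not official guidance.
enum TaskKind {
    EXTRACTION(0.0),       // deterministic output preferred
    CLASSIFICATION(0.0),   // deterministic output preferred
    SUMMARIZATION(0.3),    // a little flexibility in wording
    BRAINSTORMING(0.9);    // variety is the goal

    private final double temperature;

    TaskKind(double temperature) {
        this.temperature = temperature;
    }

    double temperature() {
        return temperature;
    }
}

final class TemperatureDemo {
    public static void main(String[] args) {
        for (TaskKind kind : TaskKind.values()) {
            System.out.println(kind + " -> " + kind.temperature());
        }
    }
}
```

A value such as TaskKind.EXTRACTION.temperature() can then be passed to the per-call options builder shown in section 5.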

    Next: make outputs reliable and let the model call your code in Prompting, Structured Output & Tool Calling.
