Spring AI
    June 1, 2026

    Spring AI Enterprise Integration Guide

    A practical guide to building enterprise-grade AI applications with Spring AI — from chat clients and RAG pipelines to production observability.

    Java has dominated enterprise backends for two decades. The AI wave doesn't change that, but it does create a problem: most AI tooling is Python-first. Spring AI addresses this by bringing LLM integration, vector stores, and RAG pipelines into the familiar Spring Boot ecosystem. Here's how to use it properly in a production enterprise context.

    What Spring AI Actually Is

    Spring AI is an official project in the Spring portfolio (not a third-party library) that provides:

    • A unified abstraction over LLM providers (OpenAI, Anthropic, Azure OpenAI, Ollama, and more)
    • Built-in support for vector stores (pgvector, Pinecone, Redis, Chroma, Weaviate)
    • Prompt templates with variable interpolation
    • A ChatClient and an EmbeddingModel (called EmbeddingClient in pre-1.0 milestones) with Spring-idiomatic APIs
    • First-class support for Retrieval Augmented Generation (RAG)

    The key insight: you write code against Spring AI's interfaces, not against OpenAI's SDK. Switching from GPT-4o to Claude Sonnet means swapping the starter dependency and a few configuration properties, not rewriting application code.

    Setting Up Spring AI

    Add the Spring AI BOM to your pom.xml:

    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>org.springframework.ai</groupId>
          <artifactId>spring-ai-bom</artifactId>
          <version>1.0.0</version>
          <type>pom</type>
          <scope>import</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>
    
    <dependencies>
      <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
      </dependency>
    </dependencies>

    Configure in application.yml:

    spring:
      ai:
        openai:
          api-key: ${OPENAI_API_KEY}
          chat:
            options:
              model: gpt-4o
              temperature: 0.7
              max-tokens: 2048

    ChatClient: Your Primary Interface

    The ChatClient is the main entry point for LLM interactions. In Spring AI 1.0+, it has a fluent builder API:

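    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.stereotype.Service;
    import reactor.core.publisher.Flux;

    @Service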
    public class AIAssistantService {
    
        private final ChatClient chatClient;
    
        public AIAssistantService(ChatClient.Builder builder) {
            this.chatClient = builder
                .defaultSystem("You are a senior software engineering assistant. " +
                              "Provide concise, production-ready code with explanations.")
                .build();
        }
    
        public String generateResponse(String userQuery) {
            return chatClient.prompt()
                .user(userQuery)
                .call()
                .content();
        }
    
        public Flux<String> generateStreamingResponse(String userQuery) {
            return chatClient.prompt()
                .user(userQuery)
                .stream()
                .content();
        }
    }

    The defaultSystem() call sets a persistent system prompt for every request through this client. Create separate ChatClient instances for different AI roles in your application.

    Prompt Templates for Reusable Prompts

    Hard-coding prompts in Java strings is a maintenance nightmare. Spring AI supports PromptTemplate with variable interpolation and file-based template loading:

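    import java.util.Map;

    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.ai.chat.prompt.Prompt;
    import org.springframework.ai.chat.prompt.PromptTemplate;
    import org.springframework.stereotype.Service;

    @Service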
    public class DocumentAnalysisService {
    
        private final ChatClient chatClient;
        private final PromptTemplate analysisTemplate;
    
        public DocumentAnalysisService(ChatClient.Builder builder) {
            this.chatClient = builder.build();
            this.analysisTemplate = new PromptTemplate("""
                Analyse the following {document_type} and extract:
                1. Key findings (max 5 bullet points)
                2. Risk factors
                3. Recommended actions
                
                Document:
                {document_content}
                
                Respond in JSON format.
                """);
        }
    
        public String analyseDocument(String type, String content) {
            Prompt prompt = analysisTemplate.create(Map.of(
                "document_type", type,
                "document_content", content
            ));
            return chatClient.prompt(prompt).call().content();
        }
    }

    Store templates in src/main/resources/prompts/ as .st files for version control and easy modification without recompilation.
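
    To make the placeholder mechanics concrete, here is a dependency-free sketch of what variable interpolation amounts to. It is not how Spring AI's PromptTemplate is implemented; the class name TemplateSketch and its fail-on-missing-variable behaviour are assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Spring AI class): naive {name} substitution. */
final class TemplateSketch {

    // Matches placeholders such as {document_type}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String render(String template, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = variables.get(name);
            if (value == null) {
                // Fail loudly rather than send a half-filled prompt to the model.
                throw new IllegalArgumentException("No value supplied for {" + name + "}");
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(
            "Analyse the following {document_type}.",
            Map.of("document_type", "contract")));
    }
}
```

    Failing on a missing variable is a deliberate choice here: a prompt that still contains a raw placeholder tends to produce confusing model output that is hard to trace back to its cause.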

    Building a RAG Pipeline

    Retrieval Augmented Generation lets your LLM answer questions based on your own documents. Spring AI makes the pipeline straightforward:

    Step 1: Ingest documents into a vector store

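    import java.util.List;
    import java.util.Map;

    import org.springframework.ai.document.Document;
    import org.springframework.ai.transformer.splitter.TokenTextSplitter;
    import org.springframework.ai.vectorstore.VectorStore;
    import org.springframework.stereotype.Service;

    @Service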
    public class DocumentIngestionService {
    
        private final VectorStore vectorStore;
        private final TokenTextSplitter textSplitter;
    
        public DocumentIngestionService(VectorStore vectorStore) {
            this.vectorStore = vectorStore;
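            // Arguments: chunkSize (tokens), minChunkSizeChars, minChunkLengthToEmbed,
            // maxNumChunks, keepSeparator. TokenTextSplitter has no overlap parameter.
            this.textSplitter = new TokenTextSplitter(512, 350, 5, 10000, true);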
        }
    
        public void ingestDocument(String content, Map<String, Object> metadata) {
            Document document = new Document(content, metadata);
            List<Document> chunks = textSplitter.apply(List.of(document));
            vectorStore.add(chunks);
        }
    }
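
    If you want adjacent chunks to share some context, one option is to pre-split the text with a sliding window before handing it to the vector store. The sketch below is dependency-free and counts whitespace-separated words rather than model tokens; OverlapChunker and its chunk method are hypothetical names, not part of Spring AI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Spring AI class): sliding-window chunking by words. */
final class OverlapChunker {

    /**
     * Splits text into chunks of at most chunkSize words, where each chunk
     * starts with the last `overlap` words of the previous chunk.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", List.of(words).subList(start, end)));
            if (end == words.length) {
                break; // the window has reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        chunk("a b c d e f g", 4, 2).forEach(System.out::println);
    }
}
```

    Each resulting string can be wrapped in its own Document with the shared metadata. Overlap trades extra storage and embedding cost for a lower chance that an answer is cut in half at a chunk boundary.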

    Step 2: Query with context retrieval

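    import java.util.List;
    import java.util.stream.Collectors;

    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.ai.document.Document;
    import org.springframework.ai.vectorstore.SearchRequest;
    import org.springframework.ai.vectorstore.VectorStore;
    import org.springframework.stereotype.Service;

    @Service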
    public class EnterpriseQAService {
    
        private final ChatClient chatClient;
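        private final VectorStore vectorStore;

        // Final fields must be initialised; both collaborators are injected by Spring.
        public EnterpriseQAService(ChatClient.Builder builder, VectorStore vectorStore) {
            this.chatClient = builder.build();
            this.vectorStore = vectorStore;
        }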
    
        public String answerQuestion(String question) {
            // Retrieve relevant chunks
            List<Document> relevantDocs = vectorStore.similaritySearch(
                SearchRequest.builder().query(question).topK(5).build()
            );
    
            // Build context from retrieved chunks
            String context = relevantDocs.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n\n---\n\n"));
    
            // Generate answer grounded in retrieved context
            return chatClient.prompt()
                .system("""
                    You are an enterprise knowledge assistant. Answer questions 
                    ONLY based on the provided context. If the answer is not in 
                    the context, say so clearly — do not make up information.
                    """)
                .user(u -> u.text("""
                    Context:
                    {context}
                    
                    Question: {question}
                    """)
                    .param("context", context)
                    .param("question", question)
                )
                .call()
                .content();
        }
    }

    Step 3: Configure pgvector as your vector store

    spring:
      ai:
        vectorstore:
          pgvector:
            initialize-schema: true
            dimensions: 1536  # OpenAI text-embedding-3-small
            distance-type: COSINE_DISTANCE
      datasource:
        url: jdbc:postgresql://localhost:5432/enterprise_ai

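    pgvector runs as an extension inside your existing PostgreSQL instance, so most enterprise use cases need no separate vector database infrastructure. Add the spring-ai-starter-vector-store-pgvector dependency alongside the model starter to get the VectorStore bean.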
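
    To make the COSINE_DISTANCE setting and top-K retrieval concrete, here is a dependency-free sketch of the ranking a similarity search computes, done in memory over toy 3-dimensional vectors. pgvector performs the equivalent ranking inside PostgreSQL; the class name CosineTopK and the sample vectors are illustrative assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical in-memory illustration; a real vector store does this in the database. */
final class CosineTopK {

    /** Cosine distance = 1 - cosine similarity. Assumes non-zero vectors of equal length. */
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k stored vectors closest to the query, nearest first. */
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> cosineDistance(e.getValue(), query)))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
            "refund-policy", new double[] {0.9, 0.1, 0.0},
            "vpn-setup", new double[] {0.0, 0.2, 0.9},
            "returns-faq", new double[] {0.8, 0.3, 0.1});
        System.out.println(topK(store, new double[] {1.0, 0.2, 0.0}, 2));
    }
}
```

    The dimension check mirrors why the dimensions property must match your embedding model: vectors of different lengths cannot be compared.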

    Structured Output: Extracting Typed Data

    Spring AI can parse LLM responses directly into Java objects:

    public record ExtractedOrder(
        String orderId,
        String customerName,
        List<String> items,
        BigDecimal totalAmount
    ) {}
    
    public ExtractedOrder extractOrderFromEmail(String emailBody) {
        return chatClient.prompt()
            .user(u -> u.text("Extract order details from this email: {email}")
                .param("email", emailBody))
            .call()
            .entity(ExtractedOrder.class);
    }

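    Spring AI generates the JSON schema from the target type and parses the response for you; .entity(Class) uses a BeanOutputConverter under the hood. For generic targets such as List<ExtractedOrder>, pass a ParameterizedTypeReference instead of a class. Parsing only guarantees the shape of the data, so validate the values before acting on them.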
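
    A well-formed response can still contain values that make no sense, so a check after parsing is worth having. The sketch below validates by hand rather than with Bean Validation; OrderValidator and its rules are hypothetical, and the record simply repeats the ExtractedOrder shape so the example compiles on its own.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Same shape as the ExtractedOrder record above, repeated for self-containment.
record ExtractedOrder(String orderId, String customerName, List<String> items, BigDecimal totalAmount) {}

/** Hypothetical helper: collects all problems instead of throwing on the first one. */
final class OrderValidator {

    static List<String> validate(ExtractedOrder order) {
        List<String> problems = new ArrayList<>();
        if (order == null) {
            problems.add("model returned no order");
            return problems;
        }
        if (order.orderId() == null || order.orderId().isBlank()) {
            problems.add("orderId is missing");
        }
        if (order.items() == null || order.items().isEmpty()) {
            problems.add("items is empty");
        }
        if (order.totalAmount() == null || order.totalAmount().signum() < 0) {
            problems.add("totalAmount is missing or negative");
        }
        return problems;
    }

    public static void main(String[] args) {
        ExtractedOrder suspicious = new ExtractedOrder("", "Ada", List.of(), new BigDecimal("-5"));
        System.out.println(validate(suspicious));
    }
}
```

    Returning a list of problems makes it easy to log every issue at once or to feed them back to the model in a retry prompt.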

    Observability in Production

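    Spring AI integrates with Spring Boot Actuator and Micrometer out of the box. With spring-boot-starter-actuator on the classpath, AI metrics are recorded automatically; the configuration below tags them and exposes the endpoints: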

    management:
      metrics:
        tags:
          application: enterprise-ai-service
      endpoints:
        web:
          exposure:
            include: health,metrics,prometheus

    Key metrics to monitor:

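    • spring.ai.chat.client: timer per ChatClient call, giving request counts and latency (p50/p95/p99 once percentiles are enabled in Micrometer)
    • gen_ai.client.operation: timer per underlying model call, tagged with provider and model
    • gen_ai.client.token.usage: token counter tagged by type (input, output, total), for cost tracking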

    Add distributed tracing with Spring Boot's Micrometer Tracing to correlate AI calls with the rest of your service traces.
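
    Token counts only become a cost figure once they are multiplied by your provider's price list. A small sketch of that arithmetic follows; TokenCost is a hypothetical helper and the per-million-token prices in main are placeholders, not real pricing.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper: converts token counts into an estimated spend. */
final class TokenCost {

    private static final BigDecimal ONE_MILLION = BigDecimal.valueOf(1_000_000);

    /** Prices are expressed per one million tokens; the result is rounded to 6 decimals. */
    static BigDecimal estimate(long inputTokens, long outputTokens,
                               BigDecimal inputPricePerMillion,
                               BigDecimal outputPricePerMillion) {
        BigDecimal input = inputPricePerMillion.multiply(BigDecimal.valueOf(inputTokens));
        BigDecimal output = outputPricePerMillion.multiply(BigDecimal.valueOf(outputTokens));
        return input.add(output).divide(ONE_MILLION, 6, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Placeholder prices: substitute the figures from your provider's price list.
        BigDecimal cost = estimate(1_200, 300, new BigDecimal("2.00"), new BigDecimal("8.00"));
        System.out.println("Estimated cost per request: " + cost);
    }
}
```

    BigDecimal is used instead of double so that per-request amounts, which are often fractions of a cent, add up without floating-point drift.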

    Production Checklist

    Before going live with a Spring AI service:

    • API keys stored in a secrets manager (AWS Secrets Manager, HashiCorp Vault) rather than plain environment variables; the ${OPENAI_API_KEY} placeholder can then be resolved from the secrets backend
    • Rate limiting on AI endpoints (token bucket per user/tenant)
    • Request/response logging with PII masking
    • Fallback model configured for when primary provider is unavailable
    • Token budget enforcement — reject requests that would exceed cost thresholds
    • Prompt injection detection (scan user inputs for common injection patterns)
    • Response caching for deterministic queries (Redis with TTL)
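
    As an illustration of the rate-limiting item, here is a minimal per-tenant token bucket in plain Java. It is a sketch under simple assumptions (single JVM, in-memory state, one token per request); TenantRateLimiter is a hypothetical class, and a clustered deployment would need shared state such as Redis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-tenant token bucket; not part of Spring AI. */
final class TenantRateLimiter {

    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long nowNanos) {
            this.tokens = tokens;
            this.lastRefillNanos = nowNanos;
        }
    }

    private final double capacity;        // maximum burst size
    private final double refillPerSecond; // sustained request rate
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    TenantRateLimiter(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
    }

    /** Production entry point: uses the JVM's monotonic clock. */
    boolean tryAcquire(String tenantId) {
        return tryAcquire(tenantId, System.nanoTime());
    }

    /** The clock is passed in so the behaviour can be tested deterministically. */
    boolean tryAcquire(String tenantId, long nowNanos) {
        Bucket bucket = buckets.computeIfAbsent(tenantId, id -> new Bucket(capacity, nowNanos));
        synchronized (bucket) {
            // Refill in proportion to elapsed time, never above capacity.
            double elapsedSeconds = Math.max(0, nowNanos - bucket.lastRefillNanos) / 1_000_000_000.0;
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
            bucket.lastRefillNanos = nowNanos;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0; // spend one token for this request
                return true;
            }
            return false; // caller should reject, e.g. with HTTP 429
        }
    }
}
```

    A capacity of 2 with a refill rate of 1 per second lets a tenant burst two requests and then sustain one per second. Buckets are kept per tenant, so one noisy tenant cannot exhaust another's allowance.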

    Why Spring AI for Enterprise Java?

    The alternative to Spring AI is calling OpenAI's REST API directly or using the Python-based LangChain ecosystem from a separate service. Spring AI keeps AI capabilities inside your existing Java microservices:

    • Same dependency injection, same testing patterns, same deployment pipeline
    • No polyglot complexity or inter-service latency for every AI call
    • Full integration with Spring Security, Spring Data, and Spring Cloud
    • Enterprise support via the Broadcom/VMware Spring support channel

    For Java-first organisations, it's the fastest path from prototype to production AI.
