Spring AI Enterprise Integration Guide

Java has dominated enterprise backends for two decades. The AI wave doesn't change that, but it does create a problem: most AI tooling is Python-first. Spring AI addresses this by bringing LLM integration, vector stores, and RAG pipelines into the familiar Spring Boot ecosystem. Here's how to use it properly in a production enterprise context.

What Spring AI Actually Is

Spring AI is an official project in the Spring portfolio (not a third-party library) that provides:

A unified abstraction over LLM providers (OpenAI, Anthropic, Azure OpenAI, Ollama, and more)
Built-in support for vector stores (pgvector, Pinecone, Redis, Chroma, Weaviate)
Prompt templates with variable interpolation
A ChatClient and EmbeddingModel (called EmbeddingClient before 1.0) with Spring-idiomatic APIs
First-class support for Retrieval Augmented Generation (RAG)

The key insight: you write code against Spring AI's interfaces, not against OpenAI's SDK. Switching from GPT-4o to Claude Sonnet means swapping the starter dependency and updating configuration, not rewriting application code.
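To make the idea concrete, here is a minimal plain-Java sketch of coding against an interface and choosing the implementation from a configuration value. It is an illustration only, not Spring AI's actual classes: `ChatBackend`, `BackendRegistry`, the stub backends, and the provider keys are all hypothetical names, and the stubs return canned strings instead of calling a real API.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical provider-neutral interface; callers depend only on this.
interface ChatBackend {
    String complete(String prompt);
}

// Stub implementations. A real one would call the provider's API.
final class StubOpenAiBackend implements ChatBackend {
    public String complete(String prompt) {
        return "[openai] " + prompt;
    }
}

final class StubAnthropicBackend implements ChatBackend {
    public String complete(String prompt) {
        return "[anthropic] " + prompt;
    }
}

final class BackendRegistry {
    private static final Map<String, Supplier<ChatBackend>> PROVIDERS = Map.of(
        "openai", StubOpenAiBackend::new,
        "anthropic", StubAnthropicBackend::new
    );

    // The provider key would come from configuration, not from code.
    static ChatBackend forProvider(String providerKey) {
        Supplier<ChatBackend> factory = PROVIDERS.get(providerKey);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown provider: " + providerKey);
        }
        return factory.get();
    }
}
```

Calling code asks for `BackendRegistry.forProvider(key)` and never names a concrete provider class, which is the property that keeps a provider switch out of your business logic.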

Setting Up Spring AI

Add the Spring AI BOM to your pom.xml:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>1.0.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
  </dependency>
</dependencies>

Configure in application.yml:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
          max-tokens: 2048

ChatClient: Your Primary Interface

The ChatClient is the main entry point for LLM interactions. In Spring AI 1.0+, it has a fluent builder API:

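import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service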
public class AIAssistantService {

    private final ChatClient chatClient;

    public AIAssistantService(ChatClient.Builder builder) {
        this.chatClient = builder
            .defaultSystem("You are a senior software engineering assistant. " +
                          "Provide concise, production-ready code with explanations.")
            .build();
    }

    public String generateResponse(String userQuery) {
        return chatClient.prompt()
            .user(userQuery)
            .call()
            .content();
    }

    public Flux<String> generateStreamingResponse(String userQuery) {
        return chatClient.prompt()
            .user(userQuery)
            .stream()
            .content();
    }
}

The defaultSystem() call sets a persistent system prompt for every request through this client. Create separate ChatClient instances for different AI roles in your application.

Prompt Templates for Reusable Prompts

Hard-coding prompts in Java strings is a maintenance nightmare. Spring AI supports PromptTemplate with variable interpolation and file-based template loading:

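import java.util.Map;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.stereotype.Service;

@Service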
public class DocumentAnalysisService {

    private final ChatClient chatClient;
    private final PromptTemplate analysisTemplate;

    public DocumentAnalysisService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
        this.analysisTemplate = new PromptTemplate("""
            Analyse the following {document_type} and extract:
            1. Key findings (max 5 bullet points)
            2. Risk factors
            3. Recommended actions
            
            Document:
            {document_content}
            
            Respond in JSON format.
            """);
    }

    public String analyseDocument(String type, String content) {
        Prompt prompt = analysisTemplate.create(Map.of(
            "document_type", type,
            "document_content", content
        ));
        return chatClient.prompt(prompt).call().content();
    }
}

Store templates in src/main/resources/prompts/ as .st files and pass them to PromptTemplate as a Spring Resource. Prompts then live in version control and can change without touching Java code.
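Once templates live outside the code, a placeholder and the map that fills it can drift apart. As a hedged sketch in plain Java, a check like the one below could run in a unit test to confirm that every placeholder has a value before a prompt is sent. `TemplateVariables` and its methods are hypothetical helpers, not part of Spring AI, and the sketch assumes placeholders are simple identifiers in curly braces, as in the template above.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateVariables {
    // Matches {name} where name is a simple identifier.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Returns the placeholder names in order of first appearance.
    static Set<String> placeholders(String template) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = PLACEHOLDER.matcher(template);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    // Returns the placeholders that have no entry in the supplied variables.
    static Set<String> missing(String template, Map<String, ?> variables) {
        Set<String> missing = placeholders(template);
        missing.removeAll(variables.keySet());
        return missing;
    }
}
```

A test can assert that `TemplateVariables.missing(templateText, variables)` is empty for each template and call site, so a renamed placeholder fails the build rather than a production request.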

Building a RAG Pipeline

Retrieval Augmented Generation lets your LLM answer questions based on your own documents. Spring AI makes the pipeline straightforward:

Step 1: Ingest documents into a vector store

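import java.util.List;
import java.util.Map;

import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocumentIngestionService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        // Arguments: chunk size in tokens, min chunk size in chars, min chunk length to embed,
        // max chunks per document, keep separators. TokenTextSplitter has no overlap setting.
        this.textSplitter = new TokenTextSplitter(512, 350, 5, 10000, true);
    }

    public void ingestDocument(String content, Map<String, Object> metadata) {
        Document document = new Document(content, metadata);
        // Split into chunks, then embed and store them
        List<Document> chunks = textSplitter.apply(List.of(document));
        vectorStore.add(chunks);
    }
}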
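Chunk size and overlap are the two settings that most affect retrieval quality, so it helps to see what overlap actually does. The sketch below is a plain-Java, character-based illustration, independent of Spring AI's splitter classes (which count tokens, not characters). `OverlappingChunker` is a hypothetical name introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlappingChunker {
    // Splits text into windows of chunkSize characters.
    // Each window starts (chunkSize - overlap) characters after the previous one,
    // so consecutive windows share `overlap` characters.
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With a chunk size of 4 and an overlap of 2, the text "abcdefghij" becomes "abcd", "cdef", "efgh", "ghij": a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.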

Step 2: Query with context retrieval

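import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class EnterpriseQAService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    // Constructor injection: both beans come from Spring Boot auto-configuration
    public EnterpriseQAService(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder.build();
        this.vectorStore = vectorStore;
    }

    public String answerQuestion(String question) {
        // Retrieve the five chunks most similar to the question
        List<Document> relevantDocs = vectorStore.similaritySearch(
            SearchRequest.builder().query(question).topK(5).build()
        );

        // Build context from retrieved chunks
        String context = relevantDocs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n\n---\n\n"));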

        // Generate answer grounded in retrieved context
        return chatClient.prompt()
            .system("""
                You are an enterprise knowledge assistant. Answer questions 
                ONLY based on the provided context. If the answer is not in 
                the context, say so clearly — do not make up information.
                """)
            .user(u -> u.text("""
                Context:
                {context}
                
                Question: {question}
                """)
                .param("context", context)
                .param("question", question)
            )
            .call()
            .content();
    }
}
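Joining every retrieved chunk works for small documents, but long chunks can push the prompt past the model's context window or your cost target. One option, sketched here in plain Java, is to keep chunks in ranked order until a size budget is used up. `ContextBudget` is a hypothetical helper, and character count is only a rough stand-in for tokens.

```java
import java.util.ArrayList;
import java.util.List;

final class ContextBudget {
    static final String SEPARATOR = "\n\n---\n\n";

    // Keeps chunks in ranked order and stops before the joined text would exceed maxChars.
    static String join(List<String> rankedChunks, int maxChars) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String chunk : rankedChunks) {
            // Every chunk after the first also pays for a separator.
            int cost = chunk.length() + (kept.isEmpty() ? 0 : SEPARATOR.length());
            if (used + cost > maxChars) {
                break; // lower-ranked chunks are dropped
            }
            kept.add(chunk);
            used += cost;
        }
        return String.join(SEPARATOR, kept);
    }
}
```

Because similarity search returns the best matches first, stopping at the first chunk that does not fit drops the least relevant material.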

Step 3: Configure pgvector as your vector store

spring:
  ai:
    vectorstore:
      pgvector:
        initialize-schema: true
        dimensions: 1536  # OpenAI text-embedding-3-small
        distance-type: COSINE_DISTANCE
  datasource:
    url: jdbc:postgresql://localhost:5432/enterprise_ai

pgvector is a PostgreSQL extension, so it runs in your existing PostgreSQL instance once the extension is installed. Most enterprise use cases need no separate vector database infrastructure.
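The COSINE_DISTANCE setting ranks chunks by the angle between embedding vectors and ignores their length. For intuition only (the database computes this itself), here is a plain-Java sketch of the calculation. `CosineDistance` is a hypothetical name for this example.

```java
final class CosineDistance {
    // Returns 1 - cos(angle between a and b):
    // 0 for the same direction, 1 for orthogonal vectors, 2 for opposite directions.
    static double between(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Two vectors that point the same way have distance 0 regardless of magnitude, which is why the configured dimension must match the embedding model exactly while vector length does not matter.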

Structured Output: Extracting Typed Data

Spring AI can parse LLM responses directly into Java objects:

public record ExtractedOrder(
    String orderId,
    String customerName,
    List<String> items,
    BigDecimal totalAmount
) {}

public ExtractedOrder extractOrderFromEmail(String emailBody) {
    return chatClient.prompt()
        .user(u -> u.text("Extract order details from this email: {email}")
            .param("email", emailBody))
        .call()
        .entity(ExtractedOrder.class);
}

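Spring AI handles schema generation and response parsing automatically: entity() derives a JSON schema from the target type, appends format instructions to the prompt, and deserialises the reply using a BeanOutputConverter. For generic types such as List<ExtractedOrder>, pass a ParameterizedTypeReference instead of a Class. Validate the parsed object before trusting it, because the model can return well-formed but wrong values.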
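A parsed record is type-correct, not necessarily business-correct. As a hedged sketch, a plain-Java check like this could run on the extracted order before it reaches downstream systems. `OrderValidator` and its rules are assumptions made for illustration, and the record is repeated here only so the example stands alone.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

record ExtractedOrder(String orderId, String customerName, List<String> items, BigDecimal totalAmount) {}

final class OrderValidator {
    // Returns human-readable problems; an empty list means the order passed every check.
    static List<String> validate(ExtractedOrder order) {
        List<String> problems = new ArrayList<>();
        if (order.orderId() == null || order.orderId().isBlank()) {
            problems.add("orderId is missing");
        }
        if (order.customerName() == null || order.customerName().isBlank()) {
            problems.add("customerName is missing");
        }
        if (order.items() == null || order.items().isEmpty()) {
            problems.add("items is empty");
        }
        if (order.totalAmount() == null || order.totalAmount().signum() < 0) {
            problems.add("totalAmount is missing or negative");
        }
        return problems;
    }
}
```

Returning a list of problems instead of throwing on the first one makes it easier to log everything that went wrong with an extraction, or to feed the problems back to the model in a retry prompt.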

Observability in Production

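Spring AI integrates with Spring Boot Actuator and Micrometer out of the box. With spring-boot-starter-actuator on the classpath (and micrometer-registry-prometheus for the prometheus endpoint), expose the metrics endpoints: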

management:
  metrics:
    tags:
      application: enterprise-ai-service
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus

Key metrics to monitor:

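gen_ai.client.operation: timer for each model call, giving request counts and p50/p95/p99 latency, tagged by model
spring.ai.chat.client: timer for each ChatClient call, including advisor processing
gen_ai.client.token.usage: input, output, and total token counts for cost tracking

Metric names have changed between releases, so confirm them against /actuator/metrics in your version.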

Add distributed tracing with Spring Boot's Micrometer Tracing to correlate AI calls with the rest of your service traces.
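Token counters only become a cost signal once they are multiplied by a price. Here is a minimal plain-Java sketch of that conversion. `TokenCost` is a hypothetical helper, and the prices are parameters because they differ by provider and model; the numbers in the usage note below are placeholders, not real prices.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class TokenCost {
    private static final BigDecimal ONE_MILLION = BigDecimal.valueOf(1_000_000);

    // Prices are per one million tokens, in whatever currency the caller uses.
    static BigDecimal estimate(long inputTokens, long outputTokens,
                               BigDecimal inputPricePerMillion,
                               BigDecimal outputPricePerMillion) {
        BigDecimal input = inputPricePerMillion.multiply(BigDecimal.valueOf(inputTokens));
        BigDecimal output = outputPricePerMillion.multiply(BigDecimal.valueOf(outputTokens));
        // Six decimal places keeps sub-cent precision for small requests.
        return input.add(output).divide(ONE_MILLION, 6, RoundingMode.HALF_UP);
    }
}
```

For example, with placeholder prices of 2.00 and 8.00 per million tokens, `TokenCost.estimate(1500, 500, new BigDecimal("2.00"), new BigDecimal("8.00"))` works out to 0.007. BigDecimal avoids the rounding drift that double arithmetic would accumulate over millions of requests.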

Production Checklist

Before going live with a Spring AI service:

API keys stored in a secrets manager (AWS Secrets Manager, HashiCorp Vault) and injected at runtime, not committed to config files or left in plain environment variables
Rate limiting on AI endpoints (token bucket per user/tenant)
Request/response logging with PII masking
Fallback model configured for when primary provider is unavailable
Token budget enforcement — reject requests that would exceed cost thresholds
Prompt injection detection (scan user inputs for common injection patterns)
Response caching for deterministic queries (Redis with TTL)
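As one example from the checklist, the per-tenant token bucket can be sketched in plain Java. This is a single-node illustration under stated assumptions, not a production limiter: `TenantRateLimiter` is a hypothetical class, buckets live in memory (a clustered service would need shared state), and the clock is injected so the behaviour can be tested without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class TenantRateLimiter {
    private final long capacity;           // maximum burst size per tenant
    private final double refillPerNano;    // tokens added per nanosecond
    private final LongSupplier nanoClock;  // injectable clock, e.g. System::nanoTime
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now) {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    TenantRateLimiter(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
    }

    // Consumes one token and returns true if the tenant has budget left, otherwise returns false.
    boolean tryAcquire(String tenantId) {
        // New tenants start with a full bucket.
        Bucket bucket = buckets.computeIfAbsent(
            tenantId, id -> new Bucket(capacity, nanoClock.getAsLong()));
        synchronized (bucket) {
            long now = nanoClock.getAsLong();
            long elapsed = Math.max(0L, now - bucket.lastRefillNanos);
            // Refill in proportion to elapsed time, capped at capacity.
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * refillPerNano);
            bucket.lastRefillNanos = now;
            if (bucket.tokens < 1.0) {
                return false;
            }
            bucket.tokens -= 1.0;
            return true;
        }
    }
}
```

In a service you might construct it as `new TenantRateLimiter(20, 5.0, System::nanoTime)` and call `tryAcquire(tenantId)` before each LLM request, returning HTTP 429 when it yields false. The capacity and refill rate shown are arbitrary example values.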

Why Spring AI for Enterprise Java?

The usual alternatives to Spring AI are calling a provider's REST API directly or running the Python-based LangChain ecosystem in a separate service. Spring AI keeps AI capabilities inside your existing Java microservices:

Same dependency injection, same testing patterns, same deployment pipeline
No polyglot complexity or inter-service latency for every AI call
Full integration with Spring Security, Spring Data, and Spring Cloud
Enterprise support via the Broadcom/VMware Spring support channel

For Java-first organisations, it's the fastest path from prototype to production AI.
