Java quietly became one of the best platforms for building AI applications — and most people missed it because the headlines went to Python. Between Java 21 and 25 (the current LTS line), the language picked up exactly the features AI workloads need: cheap massive concurrency, clean data modeling, and native interop. Here's what changed and how to use it.
The release cadence (so the version talk makes sense)
Java ships every six months, with an LTS every two years: Java 21 (Sept 2023) and Java 25 (Sept 2025) are the LTS releases you should target. Features arrive as preview/incubator first, then finalize. Below I flag what's stable vs. still cooking.
The headline features
Virtual Threads (finalized in 21) — the big one for AI
Virtual threads are lightweight threads managed by the JVM, not the OS. You can run hundreds of thousands of them. AI apps are overwhelmingly IO-bound — waiting on model APIs, tools, vector stores — so virtual threads let you handle huge concurrency with simple blocking code, no reactive spaghetti.
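import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Each task gets its own virtual thread: cheap, blocking-style code that scales.
// Assumes `prompts` is a List<String> and `chatClient` is a Spring AI ChatClient.
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> answers = prompts.stream()
        .map(p -> executor.submit(() -> chatClient.prompt().user(p).call().content()))
        .toList();
    // get() blocks until the answer is ready and throws checked exceptions the caller must handle.
    for (var f : answers) System.out.println(f.get());
}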
Structured Concurrency (preview — API still evolving)
Treats a group of concurrent subtasks as a single unit: if one fails, the rest cancel; the parent waits for all. Perfect for fan-out AI patterns — parallel retrieval, multi-tool calls, or multi-model voting — with proper error handling and cancellation. (The exact StructuredTaskScope API has changed across JDK previews, so check your version's signature.)
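Because the StructuredTaskScope signature differs between JDK previews, here is only a rough approximation of the same fail-fast behavior built from stable APIs (Java 21+). It is not StructuredTaskScope itself, and the names FanOut and runAllOrCancel are hypothetical, chosen just for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: approximates "all succeed, or cancel the rest" with stable APIs.
final class FanOut {

    static <T> List<T> runAllOrCancel(List<Callable<T>> tasks) {
        // close() at the end of the try block waits until every subtask has terminated.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletionService<T> completed = new ExecutorCompletionService<>(executor);
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(completed.submit(task));
            }
            try {
                // Consume results in completion order so the first failure surfaces promptly.
                for (int i = 0; i < futures.size(); i++) {
                    completed.take().get();
                }
            } catch (ExecutionException e) {
                executor.shutdownNow(); // interrupts the subtasks that are still running
                throw new IllegalStateException("subtask failed", e.getCause());
            } catch (InterruptedException e) {
                executor.shutdownNow();
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            // Every future is complete here; return results in submission order.
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.resultNow());
            }
            return results;
        }
    }
}
```

In an AI app the tasks would be things like a retriever lookup and a couple of tool calls. The real StructuredTaskScope gives you the same guarantees with less ceremony, plus better observability of the parent/child relationship.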
Scoped Values (finalized in 25; the modern ThreadLocal)
Immutable, inheritable per-task context that works cleanly with virtual threads — ideal for carrying a request id, user, or trace context through an AI call chain without the leaks and overhead of ThreadLocal.
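A minimal sketch of carrying a request id through a call chain, assuming JDK 25 where ScopedValue is final (earlier JDKs need --enable-preview). The RequestContext class, the REQUEST_ID key, and the callModel method are hypothetical stand-ins, not part of any framework.

```java
// Assumes JDK 25; on Java 21 to 24 ScopedValue is a preview API.
final class RequestContext {

    // Hypothetical context key; a ScopedValue is bound for a bounded scope and cannot be reassigned.
    static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

    // Somewhere deep in the call chain: reads the binding without taking it as a parameter.
    static String callModel(String prompt) {
        return "[" + REQUEST_ID.get() + "] " + prompt;
    }

    static String handle(String requestId, String prompt) {
        String[] out = new String[1];
        // The binding exists only while run(...) executes, then disappears automatically.
        ScopedValue.where(REQUEST_ID, requestId).run(() -> out[0] = callModel(prompt));
        return out[0];
    }
}
```

Calling RequestContext.handle("req-42", "hi") makes the id visible to callModel for the duration of that call only, which is why there is nothing to clean up afterwards.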
Pattern matching for switch + record patterns (stable)
Destructure and branch on shape — great for handling structured LLM output or event/tool-result types:
sealed interface ToolResult {}
record Found(String data) implements ToolResult {}
record NotFound(String id) implements ToolResult {}
record Failed(String error) implements ToolResult {}

// The sealed hierarchy makes the switch exhaustive, so no default branch is needed.
String handle(ToolResult r) {
    return switch (r) {
        case Found(var data) -> "Use: " + data;
        case NotFound(var id) -> "Tell the model: " + id + " doesn't exist";
        case Failed(var error) -> "Escalate: " + error;
    };
}
Records + sealed types let you model AI responses, tool arguments, and structured outputs precisely — which pairs perfectly with frameworks that map model JSON onto typed objects.
Foreign Function & Memory API (finalized in 22)
Call native libraries without JNI. That means you can drive native inference runtimes (llama.cpp, ONNX Runtime) or other C/C++ ML libs directly from Java — the practical path to local/on-device inference on the JVM.
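To make "without JNI" concrete, here is a minimal sketch that calls the C library's strlen through the FFM API. It assumes JDK 22+ and a 64-bit platform where size_t fits a Java long; the NativeStrlen class is a hypothetical name. Binding to llama.cpp or ONNX Runtime follows the same shape, only with a library lookup and more elaborate function descriptors.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Minimal FFM sketch (JDK 22+): call the C library's strlen without JNI.
final class NativeStrlen {

    static long strlen(String text) {
        Linker linker = Linker.nativeLinker();
        // Look up strlen in the platform's default libraries and describe its signature:
        // size_t strlen(const char*), modeled here as (ADDRESS) -> JAVA_LONG.
        MethodHandle handle = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        // The arena owns the off-heap memory and frees it when the block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(text); // NUL-terminated UTF-8 copy
            return (long) handle.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("native call failed", t);
        }
    }
}
```

Recent JDKs print a warning about restricted native access unless you pass --enable-native-access, so expect that on stderr when trying this out.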
Vector API (incubator)
SIMD-accelerated numeric operations, useful for the math under AI: embedding similarity, distance calculations, and vector ops without dropping to native code. Still incubating, so you have to opt in with --add-modules jdk.incubator.vector and should expect the API to change.
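For reference, this is the kind of computation in question: a plain scalar cosine similarity between two embeddings. It deliberately does not use the Vector API, so it runs without any flags. A Vector API version would process several array elements per step using FloatVector from jdk.incubator.vector. The Similarity class name is hypothetical.

```java
// Scalar baseline for embedding similarity; no incubator module required.
final class Similarity {

    // Returns the cosine of the angle between a and b (NaN if either vector is all zeros).
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The three accumulations inside the loop are what a SIMD version would spread across vector lanes.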
Quality-of-life
Sequenced collections, stream gatherers (custom intermediate ops), a simplified void main() / implicit classes for scripts and onboarding, Markdown in Javadoc, and generational ZGC for low-pause GC under heavy load. (Note: String Templates were previewed then withdrawn for redesign — don't build on them yet.)
Why this matters for AI workloads
- Concurrency for free. An AI request often fans out to a model, a retriever, and several tools. Virtual threads run all of them concurrently with plain blocking code — thousands of in-flight requests on a normal server.
- Reliable fan-out. Structured concurrency makes "run these three calls, cancel the others if one fails, and return only once all of them have finished" a first-class construct.
- Local inference & math. FFM + Vector API bring native runtimes and SIMD to the JVM — no JNI, no Python sidecar required.
The Java AI ecosystem
You don't have to hand-roll HTTP calls; mature libraries handle it:
- Spring AI: provider-agnostic ChatClient, RAG, tool calling, and MCP, with Spring Boot ergonomics. Best fit for Spring shops.
- LangChain4j: AI Services, RAG, tools, and a wide set of model providers; framework-agnostic.
- Official Anthropic Java SDK: direct, typed access to the Claude API when you want to talk to the model without a higher-level framework.
All of them are just IO calls under the hood — which is exactly why virtual threads make them scale so well.
Adoption tips
- Target an LTS: Java 21 or 25. Both give you virtual threads, records, and pattern matching as stable features; the finalized FFM API and scoped values require 25.
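- Preview/incubator features (structured concurrency, Vector API) need --enable-preview (preview APIs) or --add-modules jdk.incubator.vector (Vector API) and can change between releases, so isolate them.
- Mind virtual-thread pinning: on Java 21 through 23, a synchronized block held during a blocking call pins the carrier thread. On hot paths prefer ReentrantLock so the scheduler can unmount the virtual thread. JDK 24 (JEP 491) removed this limitation for synchronized, so it is mainly a Java 21 concern rather than a Java 25 one.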
- Don't pool virtual threads. They're cheap to create, so use newVirtualThreadPerTaskExecutor() and spawn one per task. Pooling them defeats the entire point.
- Preview features like structured concurrency can change their API between JDK releases. Pin your JDK version and isolate preview code behind a thin interface so an upgrade doesn't ripple through your codebase.
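The pinning tip can be sketched as follows: guard the blocking call with a ReentrantLock instead of a synchronized block. GuardedCall is a hypothetical wrapper invented for this example, and the Supplier stands in for whatever blocking client call you need to serialize (a rate-limited model client, say).

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical wrapper that serializes access to a blocking call.
final class GuardedCall<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Supplier<T> blockingCall;

    GuardedCall(Supplier<T> blockingCall) {
        this.blockingCall = blockingCall;
    }

    T invoke() {
        lock.lock(); // a virtual thread waiting here can unmount from its carrier
        try {
            // Blocking inside a ReentrantLock does not pin the carrier thread,
            // unlike a synchronized block on Java 21 to 23.
            return blockingCall.get();
        } finally {
            lock.unlock();
        }
    }
}
```

On JDK 24 and later the synchronized version behaves comparably, so this matters most if you are targeting Java 21.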
Wrap-up
Modern Java turned into a genuinely strong AI platform: virtual threads and structured concurrency for IO-heavy, fan-out workloads; records and pattern matching for clean data modeling; and FFM + Vector API for native inference and math. Target Java 21/25, lean on Spring AI or LangChain4j, and let virtual threads carry the concurrency.