
Spring Boot 3 + Spring AI: Build Production-Ready AI Apps in Java (2025 Complete Guide)

Master Spring AI with this comprehensive 2025 guide. Learn to build production-ready AI applications in Java with OpenAI, Ollama, RAG systems, vector stores, and complete code examples.

Moshiour Rahman

AI is transforming software development, but most tutorials are Python-centric. What if you’re a Java developer working in a Spring Boot ecosystem? Enter Spring AI - the game-changer that brings enterprise-grade AI capabilities to Java applications.

This is the most comprehensive Spring AI guide you’ll find in 2025. We’re going way beyond “Hello World” to build a production-ready RAG (Retrieval Augmented Generation) system with multi-provider support, vector stores, and full observability.

📚 Table of Contents

  1. Why Spring AI? (The Java Developer’s Perspective)
  2. What You’ll Build
  3. Prerequisites
  4. Architecture Overview
  5. Environment Setup
  6. Building the Foundation
  7. Implementing Chat Completions
  8. Building a RAG System
  9. Vector Stores & Embeddings
  10. Production Considerations
  11. Testing & Deployment
  12. Complete Code Repository

Why Spring AI? {#why-spring-ai}

The Problem with Python-First AI

Don’t get me wrong—Python is fantastic for AI/ML. But here’s the reality for Java developers:

  • Your entire stack is Java - microservices, databases, messaging, auth
  • Type safety matters in production systems
  • Spring Boot patterns are battle-tested for enterprise applications
  • Team expertise is in Java, not Python

Spring AI solves this: Build AI-powered features using familiar Spring patterns, without leaving your ecosystem.

Why Not Just Use OpenAI’s REST API?

You could make raw HTTP calls to OpenAI. But Spring AI gives you:

Spring AI Features

Key Benefits:

  1. Abstraction: Switch between OpenAI, Azure, Ollama without code changes (see the sketch after this list)
  2. Spring Integration: Works with Spring Security, Data, Cloud, etc.
  3. Type Safety: Strong typing vs. raw JSON strings
  4. Production-Ready: Built-in metrics, tracing, error handling
  5. Familiar Patterns: Repositories, Services, Controllers you already know
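
The abstraction point deserves a concrete illustration. Below is a minimal sketch (not from the repository) of a service that depends only on the ChatModel interface; whether the injected bean is backed by OpenAI, Azure OpenAI, or Ollama is purely a configuration concern. It assumes a single chat provider is on the classpath (otherwise qualify the bean, as the ChatService later in this guide does).

import org.springframework.ai.chat.model.ChatModel;
import org.springframework.stereotype.Service;

@Service
public class SummaryService {

    // The interface hides the provider; swapping OpenAI for Ollama needs no code change here.
    private final ChatModel chatModel;

    public SummaryService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    public String summarize(String text) {
        // call(String) is a convenience method defined on the ChatModel interface
        return chatModel.call("Summarize this in two sentences: " + text);
    }
}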

What You’ll Build {#what-youll-build}

We’re building a complete AI application with:

Multi-Provider Chat: OpenAI and Ollama (local) support
RAG System: Upload docs, query with context-aware AI
Vector Store: PostgreSQL with pgvector for semantic search
REST APIs: Production-grade endpoints with validation
Observability: Metrics, tracing, health checks
Docker Setup: One-command local development
Kubernetes: Production deployment manifests

Spring AI Architecture

GitHub Repository: All code is available at spring-boot-ai-starter


Prerequisites {#prerequisites}

Required

  • Java 21+ (LTS version)
  • Maven 3.8+ or Gradle 8+
  • Docker & Docker Compose
  • Git

Optional

  • OpenAI API Key (for OpenAI provider)
  • IDE: IntelliJ IDEA, VS Code, or Eclipse
  • Postman or curl for API testing

Knowledge Assumptions

  • Basic Spring Boot understanding
  • REST API concepts
  • Docker basics
  • SQL fundamentals

Architecture Overview {#architecture}

High-Level Architecture

Spring AI Application Layers

Component Breakdown

| Component | Purpose | Technology |
| --- | --- | --- |
| Controllers | REST API endpoints | Spring Web |
| Services | Business logic | Spring AI Client |
| Chat Model | AI completions | OpenAI / Ollama |
| Embedding Model | Text to vectors | OpenAI / Ollama |
| Vector Store | Semantic search | PostgreSQL + pgvector |
| Document Readers | Parse PDFs, text | Spring AI Readers |
| Metrics | Performance monitoring | Micrometer + Prometheus |

Environment Setup {#environment-setup}

Step 1: Create Spring Boot Project

We’ll use Spring Initializr via command line:

curl https://start.spring.io/starter.zip \
  -d dependencies=web,data-jpa,postgresql,actuator,lombok \
  -d type=maven-project \
  -d language=java \
  -d bootVersion=3.3.6 \
  -d groupId=io.techyowls \
  -d artifactId=spring-boot-ai-starter \
  -d name=spring-boot-ai-starter \
  -d packageName=io.techyowls.springai \
  -d javaVersion=21 \
  -o spring-boot-ai-starter.zip

unzip spring-boot-ai-starter.zip
cd spring-boot-ai-starter

Step 2: Add Spring AI Dependencies

Edit pom.xml and add:

<properties>
    <java.version>21</java.version>
    <spring-ai.version>1.0.0-M4</spring-ai.version>
</properties>

<dependencies>
    <!-- Spring AI Dependencies -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
    
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>
    
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>

What we added:

  • spring-ai-openai-spring-boot-starter: OpenAI integration
  • spring-ai-ollama-spring-boot-starter: Local Ollama support
  • spring-ai-pgvector-store-spring-boot-starter: Vector database
  • spring-ai-pdf-document-reader: PDF parsing for RAG

Step 3: Configure Application Properties

Create src/main/resources/application.yml:

spring:
  application:
    name: spring-boot-ai-starter
  
  datasource:
    url: jdbc:postgresql://localhost:5432/vectordb
    username: springai
    password: springai123
    driver-class-name: org.postgresql.Driver
  
  jpa:
    hibernate:
      ddl-auto: update
    show-sql: false
  
  ai:
    openai:
      api-key: ${OPENAI_API_KEY:your-key-here}
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.7
      embedding:
        options:
          model: text-embedding-3-small
    
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.2
          temperature: 0.7
      embedding:
        options:
          model: nomic-embed-text
    
    vectorstore:
      pgvector:
        initialize-schema: true
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536

server:
  port: 8080

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  prometheus:
    metrics:
      export:
        enabled: true

Configuration Breakdown:

  • OpenAI: Configured with gpt-4o-mini (fast, cheap) and text-embedding-3-small
  • Ollama: Local AI with Llama 3.2 and nomic-embed-text
  • Vector Store: pgvector with HNSW indexing for fast similarity search
  • Actuator: Health checks and Prometheus metrics
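
These YAML values are only defaults. You can also override them per request by attaching provider-specific options to a Prompt, as in the hedged snippet below (builder method names follow the 1.0.0-M4 milestone API and may differ in later releases):

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatOptions;

// Per-request override of the YAML defaults (sketch, milestone API):
OpenAiChatOptions options = OpenAiChatOptions.builder()
        .withModel("gpt-4o-mini")   // overrides spring.ai.openai.chat.options.model
        .withMaxTokens(256)         // caps the completion length to control cost
        .build();

Prompt prompt = new Prompt("Explain pgvector in one paragraph", options);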

Step 4: Docker Compose Setup

Create docker-compose.yml:

version: '3.8'

services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: spring-ai-postgres
    environment:
      POSTGRES_USER: springai
      POSTGRES_PASSWORD: springai123
      POSTGRES_DB: vectordb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init-scripts:/docker-entrypoint-initdb.d
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U springai"]
      interval: 10s
      timeout: 5s
      retries: 5

  ollama:
    image: ollama/ollama:latest
    container_name: spring-ai-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    command: serve
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  postgres_data:
  ollama_data:

Create init-scripts/01-init.sql:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create vector store table
CREATE TABLE IF NOT EXISTS vector_store (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536)
);

-- Create index for faster similarity search
CREATE INDEX IF NOT EXISTS vector_store_embedding_idx 
    ON vector_store 
    USING hnsw (embedding vector_cosine_ops);

Step 5: Start Infrastructure

# Start PostgreSQL and Ollama
docker-compose up -d

# Wait for services to be healthy
docker-compose ps

# Pull Ollama models
docker exec -it spring-ai-ollama ollama pull llama3.2
docker exec -it spring-ai-ollama ollama pull nomic-embed-text

Building the Foundation {#building-foundation}

Project Structure

src/main/java/io/techyowls/springai/
├── SpringAiApplication.java
├── config/
├── controller/
│   ├── ChatController.java
│   └── RAGController.java
├── service/
│   ├── ChatService.java
│   └── RAGService.java
├── model/
│   ├── ChatRequest.java
│   ├── ChatResponse.java
│   ├── RAGRequest.java
│   └── RAGResponse.java
└── exception/
    └── GlobalExceptionHandler.java

Main Application Class

package io.techyowls.springai;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringAiApplication {
    public static void main(String[] args) {
        SpringApplication.run(SpringAiApplication.class, args);
    }
}

Implementing Chat Completions {#chat-completions}

Step 1: Create Request/Response Models

ChatRequest.java:

package io.techyowls.springai.model;

import jakarta.validation.constraints.NotBlank;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.Map;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class ChatRequest {
    
    @NotBlank(message = "Message cannot be empty")
    private String message;
    
    private String provider = "openai"; // or "ollama"
    private String model;
    private Double temperature;
    private Integer maxTokens;
    private Map<String, Object> options;
}

ChatResponse.java:

package io.techyowls.springai.model;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.time.Instant;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class ChatResponse {
    
    private String response;
    private String provider;
    private String model;
    private Integer tokensUsed;
    private Double processingTimeMs;
    private Instant timestamp;
}

Step 2: Create Chat Service

ChatService.java:

package io.techyowls.springai.service;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.techyowls.springai.model.ChatRequest;
import io.techyowls.springai.model.ChatResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

import java.time.Instant;

@Slf4j
@Service
public class ChatService {

    private final OpenAiChatModel openAiChatModel;
    private final OllamaChatModel ollamaChatModel;
    private final Counter chatRequestCounter;
    private final Timer chatResponseTimer;

    public ChatService(
            @Qualifier("openAiChatModel") OpenAiChatModel openAiChatModel,
            @Qualifier("ollamaChatModel") OllamaChatModel ollamaChatModel,
            MeterRegistry meterRegistry) {
        this.openAiChatModel = openAiChatModel;
        this.ollamaChatModel = ollamaChatModel;
        
        this.chatRequestCounter = Counter.builder("chat.requests.total")
                .description("Total chat requests")
                .register(meterRegistry);
        this.chatResponseTimer = Timer.builder("chat.response.time")
                .description("Chat response time")
                .register(meterRegistry);
    }

    public ChatResponse chat(ChatRequest request) {
        chatRequestCounter.increment();
        
        return chatResponseTimer.record(() -> {
            long startTime = System.currentTimeMillis();
            
            try {
                log.info("Processing chat with provider: {}", request.getProvider());
                
                ChatModel chatModel = selectChatModel(request.getProvider());
                Prompt prompt = new Prompt(request.getMessage());
                
                String response = chatModel.call(prompt)
                        .getResult()
                        .getOutput()
                        .getContent();
                
                long endTime = System.currentTimeMillis();
                
                return ChatResponse.builder()
                        .response(response)
                        .provider(request.getProvider())
                        .model(request.getModel())
                        .processingTimeMs((double) (endTime - startTime))
                        .timestamp(Instant.now())
                        .build();
                        
            } catch (Exception e) {
                log.error("Chat error: {}", e.getMessage(), e);
                throw new RuntimeException("Failed to process chat: " + e.getMessage(), e);
            }
        });
    }

    private ChatModel selectChatModel(String provider) {
        return switch (provider.toLowerCase()) {
            case "openai" -> openAiChatModel;
            case "ollama" -> ollamaChatModel;
            default -> throw new IllegalArgumentException(
                    "Unknown provider: " + provider
            );
        };
    }
}

Key Features:

  • ✅ Multi-provider support with simple switch
  • ✅ Metrics integration (request counter, response timer)
  • ✅ Comprehensive error handling
  • ✅ Logging for debugging

Step 3: Create REST Controller

ChatController.java:

package io.techyowls.springai.controller;

import io.techyowls.springai.model.ChatRequest;
import io.techyowls.springai.model.ChatResponse;
import io.techyowls.springai.service.ChatService;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@Slf4j
@RestController
@RequestMapping("/api/v1/chat")
@RequiredArgsConstructor
public class ChatController {

    private final ChatService chatService;

    @PostMapping
    public ResponseEntity<ChatResponse> chat(@Valid @RequestBody ChatRequest request) {
        log.info("Received chat request");
        ChatResponse response = chatService.chat(request);
        return ResponseEntity.ok(response);
    }

    @GetMapping("/health")
    public ResponseEntity<String> health() {
        return ResponseEntity.ok("Chat service is running");
    }
}

Step 4: Test the Chat API

Start the application:

./mvnw spring-boot:run

Test with OpenAI:

curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain Spring AI in 3 sentences",
    "provider": "openai"
  }'

Response:

{
  "response": "Spring AI is a framework that provides Spring-friendly abstractions for AI services...",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "processingTimeMs": 1234.56,
  "timestamp": "2024-12-07T10:30:00Z"
}

Test with Ollama (Local):

curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is Java?",
    "provider": "ollama"
  }'

Building a RAG System {#rag-system}

RAG (Retrieval Augmented Generation) is the killer feature for production AI apps. It allows your AI to answer questions based on your own documents.

How RAG Works

RAG System Flow

Process:

  1. Ingest: Documents → Chunks → Embeddings → Vector Store
  2. Query: Question → Find similar docs → Build context
  3. Generate: Context + Question → AI → Answer

Step 1: Create RAG Models

RAGRequest.java:

package io.techyowls.springai.model;

import jakarta.validation.constraints.NotBlank;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class RAGRequest {
    
    @NotBlank(message = "Question cannot be empty")
    private String question;
    
    private Integer topK = 5;
    private Double similarityThreshold = 0.7;
    private String provider = "openai";
}

RAGResponse.java:

package io.techyowls.springai.model;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.List;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class RAGResponse {
    
    private String answer;
    private List<RetrievedDocument> sources;
    private Integer documentsRetrieved;
    private String provider;
    
    @Data
    @Builder
    @NoArgsConstructor
    @AllArgsConstructor
    public static class RetrievedDocument {
        private String content;
        private Double similarityScore;
        private String source;
    }
}

Step 2: Implement RAG Service

RAGService.java:

package io.techyowls.springai.service;

import io.techyowls.springai.model.RAGRequest;
import io.techyowls.springai.model.RAGResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.transformer.splitter.TextSplitter;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Slf4j
@Service
public class RAGService {

    private final VectorStore vectorStore;
    private final OpenAiChatModel openAiChatModel;
    private final OllamaChatModel ollamaChatModel;
    private final TextSplitter textSplitter;

    private static final String RAG_PROMPT_TEMPLATE = """
            You are a helpful assistant. Answer based on the following context.
            If the answer isn't in the context, say so.
            
            Context:
            {context}
            
            Question: {question}
            
            Answer:
            """;

    public RAGService(
            VectorStore vectorStore,
            @Qualifier("openAiChatModel") OpenAiChatModel openAiChatModel,
            @Qualifier("ollamaChatModel") OllamaChatModel ollamaChatModel) {
        this.vectorStore = vectorStore;
        this.openAiChatModel = openAiChatModel;
        this.ollamaChatModel = ollamaChatModel;
        this.textSplitter = new TokenTextSplitter();
    }

    public void ingestDocuments(List<Resource> resources) {
        log.info("Ingesting {} documents", resources.size());
        
        List<Document> allDocuments = resources.stream()
                .flatMap(resource -> {
                    try {
                        if (resource.getFilename() != null && 
                            resource.getFilename().endsWith(".pdf")) {
                            PagePdfDocumentReader pdfReader = 
                                new PagePdfDocumentReader(resource);
                            return pdfReader.get().stream();
                        } else {
                            TextReader textReader = new TextReader(resource);
                            return textReader.get().stream();
                        }
                    } catch (Exception e) {
                        log.error("Error reading: {}", resource.getFilename(), e);
                        return List.<Document>of().stream();
                    }
                })
                .toList();

        // Split into chunks
        List<Document> chunks = textSplitter.apply(allDocuments);
        
        // Store in vector database
        vectorStore.add(chunks);
        
        log.info("Ingested {} chunks", chunks.size());
    }

    public RAGResponse query(RAGRequest request) {
        log.info("RAG query: {}", request.getQuestion());
        
        // Step 1: Retrieve relevant documents
        SearchRequest searchRequest = SearchRequest.query(request.getQuestion())
                .withTopK(request.getTopK())
                .withSimilarityThreshold(request.getSimilarityThreshold());
        
        List<Document> similarDocuments = 
            vectorStore.similaritySearch(searchRequest);
        
        if (similarDocuments.isEmpty()) {
            return RAGResponse.builder()
                    .answer("No relevant information found.")
                    .sources(List.of())
                    .documentsRetrieved(0)
                    .provider(request.getProvider())
                    .build();
        }
        
        // Step 2: Build context
        String context = similarDocuments.stream()
                .map(Document::getContent)
                .collect(Collectors.joining("\n\n"));
        
        // Step 3: Generate answer with context
        PromptTemplate promptTemplate = new PromptTemplate(RAG_PROMPT_TEMPLATE);
        Prompt prompt = promptTemplate.create(Map.of(
                "context", context,
                "question", request.getQuestion()
        ));
        
        ChatModel chatModel = selectChatModel(request.getProvider());
        String answer = chatModel.call(prompt)
                .getResult()
                .getOutput()
                .getContent();
        
        // Step 4: Build response with sources
        List<RAGResponse.RetrievedDocument> sources = similarDocuments.stream()
                .map(doc -> RAGResponse.RetrievedDocument.builder()
                        .content(doc.getContent().substring(
                            0, Math.min(200, doc.getContent().length())) + "...")
                        .source(doc.getMetadata()
                            .getOrDefault("source", "Unknown").toString())
                        .build())
                .toList();
        
        return RAGResponse.builder()
                .answer(answer)
                .sources(sources)
                .documentsRetrieved(similarDocuments.size())
                .provider(request.getProvider())
                .build();
    }

    private ChatModel selectChatModel(String provider) {
        return switch (provider.toLowerCase()) {
            case "openai" -> openAiChatModel;
            case "ollama" -> ollamaChatModel;
            default -> throw new IllegalArgumentException("Unknown provider");
        };
    }
}

Critical RAG Concepts:

  1. Document Chunking: Split large docs into manageable pieces (TokenTextSplitter; see the tuning sketch after this list)
  2. Embeddings: Convert text to vectors (automatic via VectorStore)
  3. Similarity Search: Find most relevant chunks (cosine similarity)
  4. Context Building: Combine retrieved docs as context for AI
  5. Prompt Engineering: Structure prompt with context + question
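
As a concrete example of point 1, the RAGService above uses new TokenTextSplitter() with its defaults. The sketch below shows how chunking could be tuned when the defaults produce chunks that are too large or too small for your documents; the five-argument constructor shown here matches recent Spring AI milestones, so treat the exact signature as an assumption and verify it against your version.

import org.springframework.ai.transformer.splitter.TokenTextSplitter;

// Tuned splitter: smaller chunks usually mean more precise retrieval but more vector rows.
TokenTextSplitter splitter = new TokenTextSplitter(
        512,    // target chunk size in tokens
        200,    // minimum chunk size in characters
        10,     // chunks shorter than this are not embedded
        10000,  // maximum number of chunks per document
        true    // keep separators when splitting
);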

Step 3: Create RAG Controller

RAGController.java:

package io.techyowls.springai.controller;

import io.techyowls.springai.model.RAGRequest;
import io.techyowls.springai.model.RAGResponse;
import io.techyowls.springai.service.RAGService;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.core.io.Resource;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

@Slf4j
@RestController
@RequestMapping("/api/v1/rag")
@RequiredArgsConstructor
public class RAGController {

    private final RAGService ragService;

    @PostMapping("/ingest")
    public ResponseEntity<String> ingestDocuments(
            @RequestParam("files") List<MultipartFile> files) {
        log.info("Received {} files", files.size());
        
        try {
            List<Resource> resources = files.stream()
                    .map(file -> {
                        try {
                            Path tempFile = Files.createTempFile(
                                "upload-", file.getOriginalFilename());
                            file.transferTo(tempFile.toFile());
                            return (Resource) new 
                                org.springframework.core.io.FileSystemResource(tempFile);
                        } catch (IOException e) {
                            log.error("Error: {}", file.getOriginalFilename(), e);
                            return null;
                        }
                    })
                    .filter(resource -> resource != null)
                    .toList();
            
            ragService.ingestDocuments(resources);
            
            return ResponseEntity.ok("Ingested " + resources.size() + " documents");
        } catch (Exception e) {
            return ResponseEntity.internalServerError()
                    .body("Error: " + e.getMessage());
        }
    }

    @PostMapping("/query")
    public ResponseEntity<RAGResponse> query(@Valid @RequestBody RAGRequest request) {
        RAGResponse response = ragService.query(request);
        return ResponseEntity.ok(response);
    }
}

Step 4: Test RAG System

Ingest Documents:

curl -X POST http://localhost:8080/api/v1/rag/ingest \
  -F "files=@spring-ai-docs.pdf" \
  -F "files=@java-guide.txt"

Query with RAG:

curl -X POST http://localhost:8080/api/v1/rag/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is Spring AI?",
    "topK": 5,
    "similarityThreshold": 0.7,
    "provider": "openai"
  }'

Response:

{
  "answer": "Spring AI is a framework that provides Spring-friendly abstractions...",
  "sources": [
    {
      "content": "Spring AI brings AI capabilities to Java applications using familiar Spring patterns...",
      "source": "spring-ai-docs.pdf",
      "similarityScore": 0.92
    }
  ],
  "documentsRetrieved": 5,
  "provider": "openai"
}

Vector Stores & Embeddings {#vector-stores}

Understanding Embeddings

Embeddings convert text into numerical vectors that capture semantic meaning:

"Spring Boot"    → [0.12, 0.45, -0.23, …, 0.67]   (1536 dimensions)
"Java Framework" → [0.14, 0.43, -0.21, …, 0.69]   (similar!)
"Cat"            → [-0.45, 0.12, 0.88, …, -0.23]  (very different)

Key Concept: Similar text = similar vectors.

pgvector Configuration

Our application.yml configures:

spring:
  ai:
    vectorstore:
      pgvector:
        initialize-schema: true  # Auto-create tables
        index-type: HNSW         # Fast approximate search
        distance-type: COSINE_DISTANCE  # Similarity metric
        dimensions: 1536         # OpenAI embedding size

Index Types:

| Type | Speed | Accuracy | Use Case |
| --- | --- | --- | --- |
| HNSW | Fast | ~95% | Production (recommended) |
| IVFFlat | Medium | ~90% | Large datasets |
| Exact | Slow | 100% | Small datasets, testing |

How Vector Search Works

Vector Search Process

Process:

  1. Convert query to embedding
  2. Calculate cosine similarity with all stored vectors (illustrated in the sketch below)
  3. Return top-K most similar documents
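
The similarity metric itself is simple. The helper below is illustration only; pgvector computes this on the database side, so you never write it yourself. It is included to make “similar text = similar vectors” concrete:

// Cosine similarity between two embedding vectors: 1.0 = same direction, ~0 = unrelated.
static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}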

Production Considerations {#production}

Error Handling

GlobalExceptionHandler.java:

package io.techyowls.springai.exception;

import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.validation.FieldError;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

@Slf4j
@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<Map<String, Object>> handleValidationExceptions(
            MethodArgumentNotValidException ex) {
        Map<String, String> errors = new HashMap<>();
        ex.getBindingResult().getAllErrors().forEach(error -> {
            String fieldName = ((FieldError) error).getField();
            String errorMessage = error.getDefaultMessage();
            errors.put(fieldName, errorMessage);
        });
        
        Map<String, Object> response = new HashMap<>();
        response.put("timestamp", Instant.now());
        response.put("status", HttpStatus.BAD_REQUEST.value());
        response.put("error", "Validation Failed");
        response.put("errors", errors);
        
        return ResponseEntity.badRequest().body(response);
    }

    @ExceptionHandler(RuntimeException.class)
    public ResponseEntity<Map<String, Object>> handleRuntimeException(
            RuntimeException ex) {
        log.error("Runtime error: {}", ex.getMessage(), ex);
        
        Map<String, Object> response = new HashMap<>();
        response.put("timestamp", Instant.now());
        response.put("status", HttpStatus.INTERNAL_SERVER_ERROR.value());
        response.put("error", "Internal Server Error");
        response.put("message", ex.getMessage());
        
        return ResponseEntity.internalServerError().body(response);
    }
}

Monitoring & Metrics

Key Metrics to Track:

// In ChatService
Counter.builder("chat.requests.total").register(meterRegistry);
Timer.builder("chat.response.time").register(meterRegistry);

Prometheus Endpoint: http://localhost:8080/actuator/prometheus

Sample Metrics:

# TYPE chat_requests_total counter
chat_requests_total{provider="openai"} 1247.0

# TYPE chat_response_time_seconds summary
chat_response_time_seconds_sum 156.3
chat_response_time_seconds_count 1247

Rate Limiting

For production, add rate limiting:

@RateLimiter(name = "chatApi", fallbackMethod = "chatFallback")
public ChatResponse chat(ChatRequest request) {
    // ... existing code
}
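
That annotation comes from Resilience4j. A matching fallback could look like the sketch below, assuming io.github.resilience4j:resilience4j-spring-boot3 is on the classpath and a chatApi limiter is configured in application.yml; the fallback must mirror the rate-limited method's signature, plus the exception parameter.

import io.github.resilience4j.ratelimiter.RequestNotPermitted;

// Fallback invoked when the "chatApi" limiter rejects a call (sketch).
public ChatResponse chatFallback(ChatRequest request, RequestNotPermitted ex) {
    return ChatResponse.builder()
            .response("The chat service is busy right now. Please retry in a moment.")
            .provider(request.getProvider())
            .timestamp(Instant.now())
            .build();
}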

Cost Optimization

Tips:

  1. Cache responses: Use Spring Cache for repeated queries (sketch after this list)
  2. Use smaller models: gpt-4o-mini vs gpt-4
  3. Limit tokens: Set maxTokens parameter
  4. Batch requests: Process multiple questions at once
  5. Use Ollama: Free local inference for development
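
For tip 1, a minimal caching sketch (assuming spring-boot-starter-cache is added and @EnableCaching is declared on a configuration class) could look like this; identical requests are then answered from the cache instead of a paid API call:

import org.springframework.cache.annotation.Cacheable;

// Cache key combines message and provider; configure TTL/eviction in your cache provider.
@Cacheable(value = "chatResponses", key = "#request.message + ':' + #request.provider")
public ChatResponse chat(ChatRequest request) {
    // ... existing code
}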

Testing & Deployment {#testing-deployment}

Unit Testing

ChatServiceTest.java:

package io.techyowls.springai.service;

import io.techyowls.springai.model.ChatRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import static org.junit.jupiter.api.Assertions.*;

@SpringBootTest
class ChatServiceTest {

    @Autowired
    private ChatService chatService;

    @Test
    void testChatWithOpenAI() {
        ChatRequest request = new ChatRequest();
        request.setMessage("Say 'test successful'");
        request.setProvider("openai");

        assertDoesNotThrow(() -> chatService.chat(request));
    }
}

Note: this test boots the full Spring context and calls the real provider, so it needs a valid OPENAI_API_KEY (and the Docker services from Step 5) to pass. For fast, isolated unit tests, mock the chat models instead.

Docker Deployment

Build image:

./mvnw spring-boot:build-image -Dspring-boot.build-image.imageName=techyowls/spring-ai:latest

Run with Docker:

docker run -p 8080:8080 \
  -e OPENAI_API_KEY=your-key \
  -e SPRING_DATASOURCE_URL=jdbc:postgresql://host.docker.internal:5432/vectordb \
  techyowls/spring-ai:latest

Kubernetes Deployment

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-ai-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-ai
  template:
    metadata:
      labels:
        app: spring-ai
    spec:
      containers:
      - name: spring-ai
        image: techyowls/spring-ai:latest
        ports:
        - containerPort: 8080
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: ai-secret
              key: openai-api-key
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080

Deploy:

kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

Complete Code Repository {#code-repository}

🎁 Get the full source code:

GitHub: https://github.com/Moshiour027/spring-boot-ai-starter

What’s included:

  • ✅ Complete Spring Boot 3 + Spring AI application
  • ✅ Multi-provider support (OpenAI + Ollama)
  • ✅ Production-ready RAG system
  • ✅ Docker Compose setup
  • ✅ Kubernetes manifests
  • ✅ Unit tests
  • ✅ Comprehensive README

Quick start:

git clone https://github.com/Moshiour027/techyowls-branding.git
cd techyowls-branding/spring-boot-ai-starter
docker-compose up -d
./mvnw spring-boot:run

Conclusion

You’ve just built a production-ready AI application in Java! Here’s what we covered:

✅ Spring AI fundamentals and architecture
✅ Multi-provider chat completions (OpenAI + Ollama)
✅ Advanced RAG system with vector stores
✅ PostgreSQL pgvector integration
✅ Production features (metrics, error handling, health checks)
✅ Docker and Kubernetes deployment

Next Steps

  1. Add function calling: Let AI call your Java methods
  2. Implement caching: Reduce costs with Redis
  3. Add streaming responses: Real-time chat UX (see the sketch after this list)
  4. Build a UI: React/Angular frontend
  5. Scale horizontally: Kubernetes auto-scaling
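
For step 3, streaming, a rough sketch might look like the controller below. Class names and the stream() signature follow the milestone API, so verify them against your Spring AI version; Reactor must be on the classpath for the Flux return value to be written as server-sent events.

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class StreamingChatController {

    private final OpenAiChatModel chatModel;

    public StreamingChatController(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping(value = "/api/v1/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String message) {
        // stream() emits partial responses; map each chunk to its text content
        return chatModel.stream(new Prompt(message))
                .mapNotNull(chunk -> chunk.getResult() != null
                        ? chunk.getResult().getOutput().getContent()
                        : null);
    }
}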

Resources


FAQs

Q: Can I use Azure OpenAI instead?
A: Yes! Change dependencies to spring-ai-azure-openai-spring-boot-starter and update configuration.

Q: How much does this cost to run?
A: With gpt-4o-mini: ~$0.15/1M input tokens, $0.60/1M output tokens. Ollama is free.

Q: Can I use this in production?
A: Absolutely! Add rate limiting, caching, and monitoring for enterprise use.

Q: What about image generation?
A: Spring AI supports image generation with DALL-E. Inject an ImageModel alongside the ChatModel; a minimal sketch follows.
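
A rough sketch of what that could look like (API names as in the 1.0.0-M4 milestone; the OpenAI starter auto-configures an OpenAiImageModel backed by DALL-E):

import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.openai.OpenAiImageModel;

// Returns the hosted URL of the first generated image (sketch).
String generateImageUrl(OpenAiImageModel imageModel, String description) {
    return imageModel.call(new ImagePrompt(description))
            .getResult()
            .getOutput()
            .getUrl();
}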

Q: Performance concerns?
A: Vector search is fast (milliseconds). AI call latency depends on the provider: OpenAI typically responds in 1-3 seconds, while Ollama depends on your local hardware and the model size.


Found this helpful? Star the GitHub repo and follow TechyOwls for more Java + AI tutorials!

Questions? Drop a comment below or reach out on Twitter/X!


Published: December 7, 2024
Tags: #SpringAI #Java #SpringBoot #AI #RAG #OpenAI #Ollama #VectorStore
