This version is still in development and is not yet considered stable. For the latest stable version, use Spring AI 1.0.3.

Bedrock Converse API

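The Amazon Bedrock Converse API provides a unified interface for conversational AI models with enhanced capabilities, including function/tool calling, multimodal inputs, and streaming responses.

The Bedrock Converse API has the following high-level features:

Tool/Function Calling: Supports function definitions and tool use during conversations
Multimodal Input: Processes both text and image inputs in conversations
Streaming Support: Streams model responses in real time
System Messages: Supports system-level instructions and context setting

The Bedrock Converse API provides a unified interface across multiple model providers while handling AWS-specific authentication and infrastructure concerns. Currently, the Converse API supported models include Amazon Titan, Amazon Nova, AI21 Labs, Anthropic Claude, Cohere Command, Meta Llama, and Mistral AI.

Following the Bedrock recommendations, Spring AI is transitioning to Amazon Bedrock’s Converse API for all chat conversation implementations in Spring AI. While the existing InvokeModel API supports conversation applications, we strongly recommend adopting the Converse API for all chat conversation models.

The Converse API does not support embedding operations, so these remain in the current API, and the embedding model functionality in the existing InvokeModel API is maintained.

Prerequisites

Refer to Getting started with Amazon Bedrock for setting up API access.

Obtain AWS credentials: If you do not have an AWS account and the AWS CLI configured yet, this video guide can help you configure them: AWS CLI & SDK Setup in Less Than 4 Minutes!. You should be able to obtain your access key and secret key.
Enable the models to use: Go to Amazon Bedrock and, from the Model Access menu on the left, configure access to the models you are going to use.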

Auto-configuration

There has been a significant change in the Spring AI auto-configuration and starter module artifact names. Refer to the upgrade notes for more information.

Add the spring-ai-starter-model-bedrock-converse dependency to your project’s Maven pom.xml or Gradle build.gradle build file.

Maven

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-bedrock-converse</artifactId>
</dependency>

Gradle

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-bedrock-converse'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Chat Properties

The prefix spring.ai.bedrock.aws is the property prefix for configuring the connection to AWS Bedrock.

Property	Description	Default
spring.ai.bedrock.aws.region	AWS region to use	us-east-1
spring.ai.bedrock.aws.timeout	AWS max duration for entire API call	5m
spring.ai.bedrock.aws.connectionTimeout	Max duration to wait while establishing connection	5s
spring.ai.bedrock.aws.connectionAcquisitionTimeout	Max duration to wait for new connection from the pool	30s
spring.ai.bedrock.aws.asyncReadTimeout	Max duration spent reading asynchronous responses	30s
spring.ai.bedrock.aws.access-key	AWS access key	-
spring.ai.bedrock.aws.secret-key	AWS secret key	-
spring.ai.bedrock.aws.session-token	AWS session token for temporary credentials	-

Enabling and disabling of the chat auto-configurations is now configured via top-level properties with the prefix spring.ai.model.chat.

To enable, set spring.ai.model.chat=bedrock-converse (enabled by default).

To disable, set spring.ai.model.chat=none (or any value that does not match bedrock-converse).

This change allows multiple models to be configured.

The prefix spring.ai.bedrock.converse.chat is the property prefix that configures the chat model implementation for the Converse API.

Property	Description	Default
spring.ai.bedrock.converse.chat.enabled (removed and no longer valid)	Enable the Bedrock Converse chat model.	true
spring.ai.model.chat	Enable the Bedrock Converse chat model.	bedrock-converse
spring.ai.bedrock.converse.chat.options.model	The model ID to use. See the supported models and model features.	None. Select your modelId from the AWS Bedrock console.
spring.ai.bedrock.converse.chat.options.temperature	Controls the randomness of the output. Values can range over [0.0,1.0].	0.8
spring.ai.bedrock.converse.chat.options.top-p	The maximum cumulative probability of tokens to consider when sampling.	AWS Bedrock default
spring.ai.bedrock.converse.chat.options.top-k	Number of token choices for generating the next token.	AWS Bedrock default
spring.ai.bedrock.converse.chat.options.max-tokens	Maximum number of tokens in the generated response.	500

Runtime Options

Use the portable ChatOptions or BedrockChatOptions portable builders to create model configurations such as temperature, maxTokens, and topP.

On start-up, default options can be configured with the BedrockConverseProxyChatModel(api, options) constructor or the spring.ai.bedrock.converse.chat.options.* properties.

At run-time, you can override the default options by adding new, request-specific options to the Prompt call:

var options = BedrockChatOptions.builder()
        .model("anthropic.claude-3-5-sonnet-20240620-v1:0")
        .temperature(0.6)
        .maxTokens(300)
        .toolCallbacks(List.of(FunctionToolCallback.builder("getCurrentWeather", new WeatherService())
            .description("Get the weather in location. Return temperature in 36°F or 36°C format. Use multi-turn if needed.")
            .inputType(WeatherService.Request.class)
            .build()))
        .build();

String response = ChatClient.create(this.chatModel)
    .prompt("What is current weather in Amsterdam?")
    .options(options)
    .call()
    .content();

Prompt Caching

AWS Bedrock’s prompt caching feature allows you to cache frequently used prompts to reduce costs and improve response times for repeated interactions. When you cache a prompt, subsequent identical requests can reuse the cached content, significantly reducing the number of input tokens processed.

Supported Models

Prompt caching is supported on Claude 3.x, Claude 4.x, and Amazon Nova models available through AWS Bedrock.

Token Requirements

Different models have different minimum token thresholds for cache effectiveness:
- Claude Sonnet 4 and most models: 1024+ tokens
- Model-specific requirements may vary; consult the AWS Bedrock documentation

Cache Strategies

Spring AI provides strategic cache placement through the BedrockCacheStrategy enum:

NONE: Disables prompt caching completely (default)
SYSTEM_ONLY: Caches only the system message content
TOOLS_ONLY: Caches tool definitions only (Claude models only)
SYSTEM_AND_TOOLS: Caches both system message and tool definitions (Claude models only)
CONVERSATION_HISTORY: Caches entire conversation history in chat memory scenarios

This strategic approach ensures optimal cache breakpoint placement while staying within AWS Bedrock’s 4-breakpoint limit.

Amazon Nova Limitations

Amazon Nova models (Nova Micro, Lite, Pro, Premier) only support caching for system and messages content. They do not support caching for tools.

If you attempt to use TOOLS_ONLY or SYSTEM_AND_TOOLS strategies with Nova models, AWS will return a ValidationException. Use SYSTEM_ONLY strategy for Amazon Nova models.
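
The Nova restriction above can be checked before a request is sent. The following is a minimal sketch, not part of Spring AI: NovaCacheGuard, its local CacheStrategy enum (a stand-in that mirrors the strategy names in this section so the example compiles on its own), and the rule that Nova model IDs contain "amazon.nova" are all assumptions made for illustration.

```java
import java.util.Locale;

/** Hypothetical helper, not a Spring AI class. */
final class NovaCacheGuard {

    // Local stand-in for the strategy names described in this section.
    enum CacheStrategy { NONE, SYSTEM_ONLY, TOOLS_ONLY, SYSTEM_AND_TOOLS, CONVERSATION_HISTORY }

    // Assumption: Nova model IDs and inference profile IDs contain "amazon.nova".
    static boolean isNovaModel(String modelId) {
        return modelId != null && modelId.toLowerCase(Locale.ROOT).contains("amazon.nova");
    }

    /** Returns a strategy that avoids tool caching when the target is a Nova model. */
    static CacheStrategy compatibleStrategy(String modelId, CacheStrategy requested) {
        if (!isNovaModel(modelId)) {
            return requested;
        }
        if (requested == CacheStrategy.SYSTEM_AND_TOOLS) {
            return CacheStrategy.SYSTEM_ONLY; // keep system caching, drop tool caching
        }
        if (requested == CacheStrategy.TOOLS_ONLY) {
            return CacheStrategy.NONE;        // nothing cacheable remains for Nova
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(compatibleStrategy("us.amazon.nova-pro-v1:0", CacheStrategy.SYSTEM_AND_TOOLS));
        System.out.println(compatibleStrategy("us.anthropic.claude-3-7-sonnet-20250219-v1:0", CacheStrategy.TOOLS_ONLY));
    }
}
```

In an application, the returned name would be mapped to the corresponding BedrockCacheStrategy value before building BedrockCacheOptions. Downgrading silently is one possible design; failing fast with an exception is an equally reasonable alternative.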

Enabling Prompt Caching

Enable prompt caching by setting cacheOptions on BedrockChatOptions and choosing a strategy.

System-Only Caching

The most common use case - cache system instructions across multiple requests:

// Cache system message content
ChatResponse response = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage("You are a helpful AI assistant with extensive knowledge..."),
            new UserMessage("What is machine learning?")
        ),
        BedrockChatOptions.builder()
            .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);

Tools-Only Caching

Cache large tool definitions while keeping system prompts dynamic (Claude models only):

// Cache tool definitions only
ChatResponse response = chatModel.call(
    new Prompt(
        "What's the weather in San Francisco?",
        BedrockChatOptions.builder()
            .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.TOOLS_ONLY)
                .build())
            .toolCallbacks(weatherToolCallbacks)  // Large tool definitions
            .maxTokens(500)
            .build()
    )
);

This strategy is only supported on Claude models. Amazon Nova models will return a ValidationException.

System and Tools Caching

Cache both system instructions and tool definitions for maximum reuse (Claude models only):

// Cache system message and tool definitions
ChatResponse response = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage("You are a weather analysis assistant..."),
            new UserMessage("What's the weather like in Tokyo?")
        ),
        BedrockChatOptions.builder()
            .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_AND_TOOLS)
                .build())
            .toolCallbacks(weatherToolCallbacks)
            .maxTokens(500)
            .build()
    )
);

This strategy uses 2 cache breakpoints (one for tools, one for system). Only supported on Claude models.

Conversation History Caching

Cache growing conversation history for multi-turn chatbots and assistants:

// Cache conversation history with ChatClient and memory
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultSystem("You are a personalized career counselor...")
    .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory)
        .conversationId(conversationId)
        .build())
    .build();

String response = chatClient.prompt()
    .user("What career advice would you give me?")
    .options(BedrockChatOptions.builder()
        .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
        .cacheOptions(BedrockCacheOptions.builder()
            .strategy(BedrockCacheStrategy.CONVERSATION_HISTORY)
            .build())
        .maxTokens(500)
        .build())
    .call()
    .content();

Using ChatClient Fluent API

String response = ChatClient.create(chatModel)
    .prompt()
    .system("You are an expert document analyst...")
    .user("Analyze this large document: " + document)
    .options(BedrockChatOptions.builder()
        .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
        .cacheOptions(BedrockCacheOptions.builder()
            .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
            .build())
        .build())
    .call()
    .content();

Usage Example

Here’s a complete example demonstrating prompt caching with cost tracking:

// Create system content that will be reused multiple times
String largeSystemPrompt = "You are an expert software architect specializing in distributed systems...";
// (Ensure this is 1024+ tokens for cache effectiveness)

// First request - creates cache
ChatResponse firstResponse = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(largeSystemPrompt),
            new UserMessage("What is microservices architecture?")
        ),
        BedrockChatOptions.builder()
            .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);

// Access cache-related token usage from metadata
Integer cacheWrite1 = (Integer) firstResponse.getMetadata()
    .getMetadata()
    .get("cacheWriteInputTokens");
Integer cacheRead1 = (Integer) firstResponse.getMetadata()
    .getMetadata()
    .get("cacheReadInputTokens");

System.out.println("Cache creation tokens: " + cacheWrite1);
System.out.println("Cache read tokens: " + cacheRead1);

// Second request with same system prompt - reads from cache
ChatResponse secondResponse = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(largeSystemPrompt),  // Same prompt - cache hit
            new UserMessage("What are the benefits of event sourcing?")
        ),
        BedrockChatOptions.builder()
            .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);

Integer cacheWrite2 = (Integer) secondResponse.getMetadata()
    .getMetadata()
    .get("cacheWriteInputTokens");
Integer cacheRead2 = (Integer) secondResponse.getMetadata()
    .getMetadata()
    .get("cacheReadInputTokens");

System.out.println("Cache creation tokens: " + cacheWrite2); // Should be 0
System.out.println("Cache read tokens: " + cacheRead2);      // Should be > 0

Token Usage Tracking

AWS Bedrock provides cache-specific metrics through the response metadata. Cache metrics are accessible via the metadata Map:

ChatResponse response = chatModel.call(/* ... */);

// Access cache metrics from metadata Map
Integer cacheWrite = (Integer) response.getMetadata()
    .getMetadata()
    .get("cacheWriteInputTokens");
Integer cacheRead = (Integer) response.getMetadata()
    .getMetadata()
    .get("cacheReadInputTokens");

Cache-specific metrics include:

cacheWriteInputTokens: Returns the number of tokens used when creating a cache entry
cacheReadInputTokens: Returns the number of tokens read from an existing cache entry

When you first send a cached prompt:
- cacheWriteInputTokens will be greater than 0
- cacheReadInputTokens will be 0

When you send the same cached prompt again (within the 5-minute TTL):
- cacheWriteInputTokens will be 0
- cacheReadInputTokens will be greater than 0
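
The two cases above can be turned into a small classification step. This is a sketch under stated assumptions: CacheMetricsReader and CacheOutcome are hypothetical names, and the input is assumed to be a plain Map holding the two metric keys described in this section. Missing or non-numeric values are treated as 0.

```java
import java.util.Map;

/** Hypothetical helper that interprets the two cache metrics. */
final class CacheMetricsReader {

    enum CacheOutcome { WRITE, HIT, NONE }

    // Reads a numeric entry, treating a missing or non-numeric value as 0.
    static int tokens(Map<String, Object> metadata, String key) {
        Object value = metadata.get(key);
        return (value instanceof Number) ? ((Number) value).intValue() : 0;
    }

    // Any cache read counts as a HIT, even if the same request also wrote new cache content.
    static CacheOutcome classify(Map<String, Object> metadata) {
        if (tokens(metadata, "cacheReadInputTokens") > 0) {
            return CacheOutcome.HIT;
        }
        if (tokens(metadata, "cacheWriteInputTokens") > 0) {
            return CacheOutcome.WRITE;
        }
        return CacheOutcome.NONE;
    }

    public static void main(String[] args) {
        Map<String, Object> first = Map.of("cacheWriteInputTokens", 2000, "cacheReadInputTokens", 0);
        Map<String, Object> second = Map.of("cacheWriteInputTokens", 0, "cacheReadInputTokens", 2000);
        System.out.println(classify(first));
        System.out.println(classify(second));
    }
}
```

A NONE result usually means that caching was disabled, the cached prefix was below the model's minimum token threshold, or the metrics were absent from the response.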

Real-World Use Cases

Legal Document Analysis

Analyze large legal contracts or compliance documents efficiently by caching document content across multiple questions:

// Load a legal contract (PDF or text)
String legalContract = loadDocument("merger-agreement.pdf"); // ~3000 tokens

// System prompt with legal expertise
String legalSystemPrompt = "You are an expert legal analyst specializing in corporate law. " +
    "Analyze the following contract and provide precise answers about terms, obligations, and risks: " +
    legalContract;

// First analysis - creates cache
ChatResponse riskAnalysis = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(legalSystemPrompt),
            new UserMessage("What are the key termination clauses and associated penalties?")
        ),
        BedrockChatOptions.builder()
            .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(1000)
            .build()
    )
);

// Subsequent questions reuse the cached document; cache reads cost roughly 10% of the base input price
ChatResponse obligationAnalysis = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(legalSystemPrompt), // Same content - cache hit
            new UserMessage("List all financial obligations and payment schedules.")
        ),
        BedrockChatOptions.builder()
            .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(1000)
            .build()
    )
);

Batch Code Review

Process multiple code files with consistent review criteria while caching the review guidelines:

// Define comprehensive code review guidelines
String reviewGuidelines = """
    You are a senior software engineer conducting code reviews. Apply these criteria:
    - Security vulnerabilities and best practices
    - Performance optimizations and memory usage
    - Code maintainability and readability
    - Testing coverage and edge cases
    - Design patterns and architecture compliance
    """;

List<String> codeFiles = Arrays.asList(
    "UserService.java", "PaymentController.java", "SecurityConfig.java"
);

List<String> reviews = new ArrayList<>();

for (String filename : codeFiles) {
    String sourceCode = loadSourceFile(filename);

    ChatResponse review = chatModel.call(
        new Prompt(
            List.of(
                new SystemMessage(reviewGuidelines), // Cached across all reviews
                new UserMessage("Review this " + filename + " code:\n\n" + sourceCode)
            ),
            BedrockChatOptions.builder()
                .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
                .cacheOptions(BedrockCacheOptions.builder()
                    .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                    .build())
                .maxTokens(800)
                .build()
        )
    );

    reviews.add(review.getResult().getOutput().getText());
}

// Guidelines cached after first request, subsequent reviews are faster and cheaper

Customer Support with Knowledge Base

Create a customer support system that caches your product knowledge base for consistent, accurate responses:

// Load comprehensive product knowledge
String knowledgeBase = """
    PRODUCT DOCUMENTATION:
    - API endpoints and authentication methods
    - Common troubleshooting procedures
    - Billing and subscription details
    - Integration guides and examples
    - Known issues and workarounds
    """ + loadProductDocs(); // ~2500 tokens

@Service
public class CustomerSupportService {

    public String handleCustomerQuery(String customerQuery, String customerId) {
        ChatResponse response = chatModel.call(
            new Prompt(
                List.of(
                    new SystemMessage("You are a helpful customer support agent. " +
                        "Use this knowledge base to provide accurate solutions: " + knowledgeBase),
                    new UserMessage("Customer " + customerId + " asks: " + customerQuery)
                ),
                BedrockChatOptions.builder()
                    .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
                    .cacheOptions(BedrockCacheOptions.builder()
                        .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                        .build())
                    .maxTokens(600)
                    .build()
            )
        );

        return response.getResult().getOutput().getText();
    }
}

// Knowledge base is cached across all customer queries
// Multiple support agents can benefit from the same cached content

Multi-Tenant SaaS Application

Cache shared tool definitions across different tenants while customizing system prompts per tenant:

// Shared tool definitions (cached once, used across all tenants)
List<ToolCallback> sharedTools = createLargeToolRegistry(); // ~2000 tokens

// Tenant-specific configuration
@Service
public class MultiTenantAIService {

    public String processRequest(String tenantId, String userQuery) {
        // Load tenant-specific system prompt (changes per tenant)
        String tenantPrompt = loadTenantSystemPrompt(tenantId);

        ChatResponse response = chatModel.call(
            new Prompt(
                List.of(
                    new SystemMessage(tenantPrompt), // Tenant-specific, not cached
                    new UserMessage(userQuery)
                ),
                BedrockChatOptions.builder()
                    .model("us.anthropic.claude-3-7-sonnet-20250219-v1:0")
                    .cacheOptions(BedrockCacheOptions.builder()
                        .strategy(BedrockCacheStrategy.TOOLS_ONLY)
                        .build())
                    .toolCallbacks(sharedTools) // Shared tools - cached
                    .maxTokens(500)
                    .build()
            )
        );

        return response.getResult().getOutput().getText();
    }
}

// Tools cached once, each tenant gets customized system prompt

Best Practices

Choose the Right Strategy:
- Use SYSTEM_ONLY for reusable system prompts and instructions (works with all models)
- Use TOOLS_ONLY when you have large stable tools but dynamic system prompts (Claude only)
- Use SYSTEM_AND_TOOLS when both system and tools are large and stable (Claude only)
- Use CONVERSATION_HISTORY with ChatClient memory for multi-turn conversations
- Use NONE to explicitly disable caching
Meet Token Requirements: Focus on caching content that meets the minimum token requirements (1024+ tokens for most models).
Reuse Identical Content: Caching works best with exact matches of prompt content. Even small changes will require a new cache entry.

Monitor Token Usage: Track cache effectiveness using the metadata metrics:

Integer cacheWrite = (Integer) response.getMetadata().getMetadata().get("cacheWriteInputTokens");
Integer cacheRead = (Integer) response.getMetadata().getMetadata().get("cacheReadInputTokens");
if (cacheRead != null && cacheRead > 0) {
    System.out.println("Cache hit: " + cacheRead + " tokens saved");
}

Strategic Cache Placement: The implementation automatically places cache breakpoints at optimal locations based on your chosen strategy, ensuring compliance with AWS Bedrock’s 4-breakpoint limit.
Cache Lifetime: AWS Bedrock caches have a fixed 5-minute TTL (Time To Live). Each cache access resets the timer.
Model Compatibility: Be aware of model-specific limitations:
- Claude models: Support all caching strategies
- Amazon Nova models: Only support SYSTEM_ONLY and CONVERSATION_HISTORY (tool caching not supported)
Tool Stability: When using TOOLS_ONLY, SYSTEM_AND_TOOLS, or CONVERSATION_HISTORY strategies, ensure tools remain stable. Changing tool definitions will invalidate all downstream cache breakpoints due to cascade invalidation.
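
The Cache Lifetime item describes a sliding expiry: every access within the TTL restarts the 5-minute window. The sketch below is a purely local model of that behavior for reasoning about request spacing. SlidingTtlSketch is a hypothetical class, and treating an access at exactly 5 minutes as expired is an assumption, not documented AWS behavior.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical local model of a cache entry with a sliding 5-minute TTL. */
final class SlidingTtlSketch {

    static final Duration TTL = Duration.ofMinutes(5);

    private Instant lastAccess;

    SlidingTtlSketch(Instant createdAt) {
        this.lastAccess = createdAt;
    }

    /**
     * Returns true if the entry is still alive at 'now' (a cache hit).
     * Either way the timer restarts: a hit resets it, a miss rewrites the entry.
     */
    boolean access(Instant now) {
        boolean alive = Duration.between(lastAccess, now).compareTo(TTL) < 0;
        lastAccess = now;
        return alive;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2025-01-01T00:00:00Z");
        SlidingTtlSketch entry = new SlidingTtlSketch(t0);
        System.out.println(entry.access(t0.plus(Duration.ofMinutes(4))));  // 4 min after creation
        System.out.println(entry.access(t0.plus(Duration.ofMinutes(8))));  // 4 min after last access
        System.out.println(entry.access(t0.plus(Duration.ofMinutes(14)))); // 6 min after last access
    }
}
```

Under this model, requests spaced less than 5 minutes apart keep the cache warm indefinitely, while a single longer gap forces a new cache write.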

Cache Invalidation and Cascade Behavior

AWS Bedrock follows a hierarchical cache model with cascade invalidation:

Cache Hierarchy: Tools → System → Messages

Changes at each level invalidate that level and all subsequent levels:

What Changes	Tools Cache	System Cache	Messages Cache
Tools	❌ Invalid	❌ Invalid	❌ Invalid
System	✅ Valid	❌ Invalid	❌ Invalid
Messages	✅ Valid	✅ Valid	❌ Invalid

Example with the SYSTEM_AND_TOOLS strategy:

// Request 1: Cache both tools and system
ChatResponse r1 = chatModel.call(
    new Prompt(
        List.of(new SystemMessage("System prompt"), new UserMessage("Question")),
        BedrockChatOptions.builder()
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_AND_TOOLS)
                .build())
            .toolCallbacks(tools)
            .build()
    )
);
// Result: Both caches created

// Request 2: Change only system prompt (tools same)
ChatResponse r2 = chatModel.call(
    new Prompt(
        List.of(new SystemMessage("DIFFERENT system prompt"), new UserMessage("Question")),
        BedrockChatOptions.builder()
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_AND_TOOLS)
                .build())
            .toolCallbacks(tools) // SAME tools
            .build()
    )
);
// Result: Tools cache HIT (reused), system cache MISS (recreated)

// Request 3: Change tools (system same as Request 2)
ChatResponse r3 = chatModel.call(
    new Prompt(
        List.of(new SystemMessage("DIFFERENT system prompt"), new UserMessage("Question")),
        BedrockChatOptions.builder()
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_AND_TOOLS)
                .build())
            .toolCallbacks(newTools) // DIFFERENT tools
            .build()
    )
);
// Result: BOTH caches MISS (tools change invalidates everything downstream)
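
The cascade rule (a change at one level invalidates that level and every level after it in the Tools → System → Messages hierarchy) can be modeled in a few lines. This is an illustrative sketch only: CascadeInvalidationSketch and its Level enum are hypothetical names and do not reflect how Spring AI or AWS implement the behavior internally.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the Tools -> System -> Messages cache hierarchy. */
final class CascadeInvalidationSketch {

    // Declared in hierarchy order so that ordinal order matches cascade order.
    enum Level { TOOLS, SYSTEM, MESSAGES }

    /** Returns the cache levels invalidated when the given level changes. */
    static Set<Level> invalidatedBy(Level changed) {
        return EnumSet.range(changed, Level.MESSAGES);
    }

    /** Returns true if the cache at 'level' is still valid after 'changed' changes. */
    static boolean stillValid(Level level, Level changed) {
        return !invalidatedBy(changed).contains(level);
    }

    public static void main(String[] args) {
        for (Level changed : Level.values()) {
            System.out.println(changed + " change invalidates " + invalidatedBy(changed));
        }
    }
}
```

This mirrors the three requests above: changing only the system prompt leaves the tools cache valid, while changing the tools invalidates every level.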

Implementation Details

The prompt caching implementation in Spring AI follows these key design principles:

Strategic Cache Placement: Cache breakpoints are automatically placed at optimal locations based on the chosen strategy, ensuring compliance with AWS Bedrock’s 4-breakpoint limit.
Provider Portability: Cache configuration is done through BedrockChatOptions rather than individual messages, preserving compatibility when switching between different AI providers.
Thread Safety: The cache breakpoint tracking is implemented with thread-safe mechanisms to handle concurrent requests correctly.
UNION Type Pattern: AWS SDK uses UNION types where cache points are added as separate blocks rather than properties. This is different from direct API approaches but ensures type safety and API compliance.
Incremental Caching: The CONVERSATION_HISTORY strategy places cache breakpoints on the last user message, enabling incremental caching where each conversation turn builds on the previous cached prefix.

Cost Considerations

AWS Bedrock pricing for prompt caching (approximate, varies by model):

Cache writes: ~25% more expensive than base input tokens
Cache reads: ~90% cheaper (only 10% of base input token price)
Break-even point: After just 1 cache read, you’ve saved money

Example cost calculation:

// System prompt: 2000 tokens
// User question: 50 tokens

// Without caching (5 requests):
// Cost: 5 × (2000 + 50) = 10,250 tokens at base rate

// With caching (5 requests):
// Request 1: 2000 tokens × 1.25 (cache write) + 50 = 2,550 tokens
// Requests 2-5: 4 × (2000 × 0.10 (cache read) + 50) = 4 × 250 = 1,000 tokens
// Total: 2,550 + 1,000 = 3,550 tokens equivalent

// Savings: (10,250 - 3,550) / 10,250 = 65% cost reduction
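
The same arithmetic can be parameterized to estimate savings for other prompt sizes and request counts. This is a rough sketch: CacheCostEstimate is a hypothetical class, the 1.25 and 0.10 multipliers are the approximate figures quoted above rather than actual prices, and every request after the first is assumed to hit the cache within the TTL.

```java
/** Hypothetical estimator that reproduces the calculation above in base-rate token equivalents. */
final class CacheCostEstimate {

    // Approximate multipliers from this section; actual pricing varies by model.
    static final double CACHE_WRITE_MULTIPLIER = 1.25;
    static final double CACHE_READ_MULTIPLIER = 0.10;

    static double withoutCaching(int systemTokens, int userTokens, int requests) {
        return (double) requests * (systemTokens + userTokens);
    }

    // Assumes the first request writes the cache and all later requests read from it.
    static double withCaching(int systemTokens, int userTokens, int requests) {
        if (requests <= 0) {
            return 0.0;
        }
        double first = systemTokens * CACHE_WRITE_MULTIPLIER + userTokens;
        double later = (requests - 1) * (systemTokens * CACHE_READ_MULTIPLIER + userTokens);
        return first + later;
    }

    static double savingsPercent(int systemTokens, int userTokens, int requests) {
        double base = withoutCaching(systemTokens, userTokens, requests);
        return 100.0 * (base - withCaching(systemTokens, userTokens, requests)) / base;
    }

    public static void main(String[] args) {
        System.out.printf("Without caching: %.0f%n", withoutCaching(2000, 50, 5));
        System.out.printf("With caching:    %.0f%n", withCaching(2000, 50, 5));
        System.out.printf("Savings:         %.1f%%%n", savingsPercent(2000, 50, 5));
    }
}
```

With a single request the estimate is negative, because the cache write costs more than an uncached call; savings appear from the second request onward, which matches the break-even point stated above.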

Tool Calling

The Bedrock Converse API supports tool calling, allowing models to use tools during conversations. Here is an example of how to define and use @Tool-based tools:

public class WeatherService {

    @Tool(description = "Get the weather in location")
    public String weatherByLocation(@ToolParam(description= "City or state name") String location) {
        ...
    }
}

String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .tools(new WeatherService())
        .call()
        .content();

You can also use java.util.function beans as tools:

@Bean
@Description("Get the weather in location. Return temperature in 36°F or 36°C format.")
public Function<Request, Response> weatherFunction() {
    return new MockWeatherService();
}

// The tool is resolved by bean name; its input type comes from the Function bean's signature
String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .toolNames("weatherFunction")
        .call()
        .content();

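Refer to the Tools documentation for more details.

Multimodal

Multimodality refers to a model’s ability to simultaneously understand and process information from various sources, including text, images, video, and document formats such as PDF, DOC, HTML, and MD.

The Bedrock Converse API supports multimodal inputs, including text and image inputs, and can generate a text response based on the combined input.

You need a model that supports multimodal inputs, such as the Anthropic Claude or Amazon Nova models.

Images

For models that support vision multimodality, such as Amazon Nova, Anthropic Claude, and Llama 3.2, the Amazon Bedrock Converse API lets you include multiple images in the payload. These models can analyze the passed images and answer questions, classify an image, or summarize images based on the provided instructions.

Currently, Bedrock Converse supports base64-encoded images of the image/jpeg, image/png, image/gif, and image/webp MIME types.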
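
An application that accepts user uploads may want to reject unsupported image types before calling the model. The following is a minimal sketch: ConverseImageTypes is a hypothetical helper, and its set simply copies the four MIME types listed above, so it needs updating if the supported list changes.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-flight check for the image MIME types listed in this section. */
final class ConverseImageTypes {

    static final Set<String> SUPPORTED =
            Set.of("image/jpeg", "image/png", "image/gif", "image/webp");

    // Comparison is case-insensitive and ignores surrounding whitespace.
    static boolean isSupported(String mimeType) {
        return mimeType != null
                && SUPPORTED.contains(mimeType.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isSupported("image/png"));
        System.out.println(isSupported("image/tiff"));
    }
}
```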

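Spring AI’s Message interface supports multimodal AI models by introducing the Media type. This type contains data and information about media attachments in messages, using Spring’s org.springframework.util.MimeType and a java.lang.Object for the raw media data.

Below is a simple code example demonstrating the combination of user text with an image.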

String response = ChatClient.create(chatModel)
    .prompt()
    .user(u -> u.text("Explain what do you see on this picture?")
        .media(Media.Format.IMAGE_PNG, new ClassPathResource("/test.png")))
    .call()
    .content();

logger.info(response);

It takes as input the test.png image:

Along with the text message "Explain what do you see on this picture?", it generates a response like the following:

The image shows a close-up view of a wire fruit basket containing several pieces of fruit.
...

Video

The Amazon Nova models allow you to include a single video in the payload, provided either in base64 format or through an Amazon S3 URI.

Currently, Bedrock Nova supports videos of the video/x-matroska, video/quicktime, video/mp4, video/webm, video/x-flv, video/mpeg, video/x-ms-wmv, and video/3gpp MIME types.

Below is a simple code example demonstrating the combination of user text with a video.

String response = ChatClient.create(chatModel)
    .prompt()
    .user(u -> u.text("Explain what do you see in this video?")
        .media(Media.Format.VIDEO_MP4, new ClassPathResource("/test.video.mp4")))
    .call()
    .content();

logger.info(response);

It takes as input the test.video.mp4 video:

Along with the text message "Explain what do you see in this video?", it generates a response like the following:

The video shows a group of baby chickens, also known as chicks, huddled together on a surface
...

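Documents

For some models, Bedrock allows you to include documents in the payload through Converse API document support. Documents can be provided as bytes. Document support has two variants, described below:

Text document types (txt, csv, html, md, and so on), where the emphasis is on text understanding. These use cases include answering questions based on the textual elements of the document.
Media document types (pdf, docx, xlsx), where the emphasis is on vision-based understanding to answer questions. These use cases include answering questions based on charts, graphs, and so on.

Currently, the Anthropic PDF support (beta) and Amazon Bedrock Nova models support document multimodality.

Below is a simple code example demonstrating the combination of user text with a media document.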

String response = ChatClient.create(chatModel)
    .prompt()
    .user(u -> u.text(
            "You are a very professional document summarization specialist. Please summarize the given document.")
        .media(Media.Format.DOC_PDF, new ClassPathResource("/spring-ai-reference-overview.pdf")))
    .call()
    .content();

logger.info(response);

It takes as input the spring-ai-reference-overview.pdf document:

Along with the text message "You are a very professional document summarization specialist. Please summarize the given document.", it generates a response like the following:

**Introduction:**
- Spring AI is designed to simplify the development of applications with artificial intelligence (AI) capabilities, aiming to avoid unnecessary complexity.
...

Sample Controller

Create a new Spring Boot project and add spring-ai-starter-model-bedrock-converse to your dependencies.

Add an application.properties file under src/main/resources:

spring.ai.bedrock.aws.region=eu-central-1
spring.ai.bedrock.aws.timeout=10m
spring.ai.bedrock.aws.access-key=${AWS_ACCESS_KEY_ID}
spring.ai.bedrock.aws.secret-key=${AWS_SECRET_ACCESS_KEY}
# session token is only required for temporary credentials
spring.ai.bedrock.aws.session-token=${AWS_SESSION_TOKEN}

spring.ai.bedrock.converse.chat.options.temperature=0.8
spring.ai.bedrock.converse.chat.options.top-k=15

Here is an example controller that uses the chat model:

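import java.util.Map;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Flux;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    @Autowired
    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatClient.prompt(message).call().content());
    }

    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        // chatResponse() streams ChatResponse objects; content() would stream plain strings instead
        return this.chatClient.prompt(message).stream().chatResponse();
    }
}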