このバージョンはまだ開発中であり、まだ安定しているとは考えられていません。最新のスナップショットバージョンについては、Spring AI 1.0.3 を使用してください。

Anthropic チャット

Anthropic Claude (英語) は、さまざまなアプリケーションで使用できる基礎的な AI モデルのファミリーです。開発者や企業は、API アクセスを活用して、Anthropic の AI インフラ (英語) 上に直接構築できます。

Spring AI は、同期およびストリーミングテキスト生成のために Anthropic メッセージング API (英語) をサポートします。

Anthropic の Claude モデルは、Amazon Bedrock Converse を通じても利用できます。Spring AI は専用の Amazon Bedrock コンバース Anthropic クライアント実装も提供します。

前提条件

Anthropic ポータルで API キーを作成する必要があります。

Anthropic API ダッシュボード (英語) でアカウントを作成し、API キーを取得する (英語) ページで API キーを生成します。

Spring AI プロジェクトは、spring.ai.anthropic.api-key という名前の構成プロパティを定義します。このプロパティは、anthropic.com から取得した API Key の値に設定する必要があります。

この構成プロパティは、application.properties ファイルで設定できます。

spring.ai.anthropic.api-key=<your-anthropic-api-key>

API キーなどの機密情報を扱う際のセキュリティを強化するために、Spring 式言語 (SpEL) を使用してカスタム環境変数を参照できます。

# In application.yml
spring:
  ai:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}

# In your environment or .env file
export ANTHROPIC_API_KEY=<your-anthropic-api-key>

この構成は、アプリケーションコード内でプログラム的に取得することもできます。

// Retrieve API key from a secure source or environment variable
String apiKey = System.getenv("ANTHROPIC_API_KEY");

リポジトリと BOM の追加

Spring AI アーティファクトは、Maven Central リポジトリと Spring スナップショットリポジトリに公開されています。これらのリポジトリをビルドシステムに追加するには、アーティファクトリポジトリセクションを参照してください。

依存関係の管理を支援するために、Spring AI は BOM (部品表) を提供し、一貫したバージョンの Spring AI がプロジェクト全体で使用されるようにします。Spring AI BOM をビルドシステムに追加するには、"依存関係管理" セクションを参照してください。

自動構成

Spring AI 自動構成、スターターモジュールのアーティファクト名に大きな変更がありました。詳細については、アップグレードノートを参照してください。

Spring AI は、Anthropic チャットクライアント用の Spring Boot 自動構成を提供します。これを有効にするには、プロジェクトの Maven pom.xml または Gradle build.gradle ファイルに次の依存関係を追加します。

Maven
Gradle

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-anthropic'
}

Spring AI BOM をビルドファイルに追加するには、"依存関係管理" セクションを参照してください。

チャットのプロパティ

再試行プロパティ

プレフィックス spring.ai.retry は、Anthropic チャットモデルの再試行メカニズムを構成できるプロパティプレフィックスとして使用されます。

プロパティ説明デフォルト

プロパティ	説明	デフォルト
spring.ai.retry.max-attempts	再試行の最大回数。	10
spring.ai.retry.backoff.initial-interval	指数関数的バックオフポリシーの初期スリープ期間。	2 秒
spring.ai.retry.backoff.multiplier	バックオフ間隔の乗数。	5
spring.ai.retry.backoff.max-interval	最大バックオフ期間。	3 分
spring.ai.retry.on-client-errors	false の場合、NonTransientAiException をスローし、`4xx` クライアントエラーコードの再試行を試行しません。	false
spring.ai.retry.exclude-on-http-codes	再試行をトリガーしない HTTP ステータスコードのリスト (例: NonTransientAiException をスローする)。	空
spring.ai.retry.on-http-codes	再試行をトリガーする必要がある HTTP ステータスコードのリスト (例: TransientAiException をスローする)。	空

spring.ai.retry.max-attempts

再試行の最大回数。

spring.ai.retry.backoff.initial-interval

指数関数的バックオフポリシーの初期スリープ期間。

2 秒

spring.ai.retry.backoff.multiplier

バックオフ間隔の乗数。

spring.ai.retry.backoff.max-interval

最大バックオフ期間。

3 分

spring.ai.retry.on-client-errors

false の場合、NonTransientAiException をスローし、4xx クライアントエラーコードの再試行を試行しません。

false

spring.ai.retry.exclude-on-http-codes

再試行をトリガーしない HTTP ステータスコードのリスト (例: NonTransientAiException をスローする)。

空

spring.ai.retry.on-http-codes

再試行をトリガーする必要がある HTTP ステータスコードのリスト (例: TransientAiException をスローする)。

空

現在、再試行ポリシーはストリーミング API には適用されません。

接続プロパティ

接頭辞 spring.ai.anthropic は、Anthropic への接続を可能にするプロパティ接頭辞として使用されます。

プロパティ説明デフォルト

プロパティ	説明	デフォルト
spring.ai.anthropic.base-url	接続先の URL	api.anthropic.com (英語)
spring.ai.anthropic.completions-path	ベース URL に追加するパス。	`/v1/chat/completions`
spring.ai.anthropic.version	Anthropic API バージョン	2023-06-01
spring.ai.anthropic.api-key	API キー	-
spring.ai.anthropic.beta-version	新しい / 実験的な機能を有効にします。`max-tokens-3-5-sonnet-2024-07-15` に設定すると、出力トークンの制限が `4096` トークンから `8192` トークンに増加されます (claude-3-5-sonnet のみ)。	`tools-2024-04-04`

spring.ai.anthropic.base-url

接続先の URL

api.anthropic.com (英語)

spring.ai.anthropic.completions-path

ベース URL に追加するパス。

/v1/chat/completions

spring.ai.anthropic.version

Anthropic API バージョン

2023-06-01

spring.ai.anthropic.api-key

API キー

spring.ai.anthropic.beta-version

新しい / 実験的な機能を有効にします。max-tokens-3-5-sonnet-2024-07-15 に設定すると、出力トークンの制限が 4096 トークンから 8192 トークンに増加されます (claude-3-5-sonnet のみ)。

tools-2024-04-04

プロパティの構成

チャットの自動構成の有効化と無効化は、プレフィックス spring.ai.model.chat を持つ最上位プロパティを介して設定されるようになりました。

有効にするには、spring.ai.model.chat=anthropic (デフォルトで有効になっています)

無効にするには、spring.ai.model.chat=none (または人類学的に一致しない値)

この変更は、複数のモデルの構成を可能にするために行われます。

プレフィックス spring.ai.anthropic.chat は、Anthropic のチャットモデル実装を構成できるプロパティプレフィックスです。

プロパティ説明デフォルト

プロパティ	説明	デフォルト
spring.ai.anthropic.chat.enabled (削除され、無効になりました)	Anthropic チャットモデルを有効にします。	true
spring.ai.model.chat	Anthropic チャットモデルを有効にします。	人類的な
spring.ai.anthropic.chat.options.model	使用する Anthropic チャットモデルです。サポート: `claude-opus-4-0`、`claude-sonnet-4-0`、`claude-3-7-sonnet-latest`、`claude-3-5-sonnet-latest`、`claude-3-opus-20240229`、`claude-3-sonnet-20240229`、`claude-3-haiku-20240307`、`claude-3-7-sonnet-latest`、`claude-sonnet-4-20250514`、`claude-opus-4-1-20250805`	`claude-opus-4-20250514`
spring.ai.anthropic.chat.options.temperature	生成される補完の見かけの創造性を制御するために使用するサンプリング温度。値を高くすると出力がよりランダムになり、値を低くすると結果がより集中的で決定的になります。これら 2 つの設定の相互作用を予測するのは難しいため、同じ完了リクエストに対して温度と top_p を変更することはお勧めできません。	0.8
spring.ai.anthropic.chat.options.max-tokens	チャット補完で生成するトークンの最大数。入力トークンと生成されたトークンの合計の長さは、モデルのコンテキストの長さによって制限されます。	500
spring.ai.anthropic.chat.options.stop-sequence	モデルの生成を停止させるカスタムテキストシーケンス。モデルは通常、自然にターンを完了すると停止し、その結果、レスポンス stop_reason は "end_turn" になります。モデルがカスタムテキスト文字列に遭遇したときに生成を停止する場合は、stop_sequences パラメーターを使用できます。モデルがカスタムシーケンスの 1 つに遭遇すると、レスポンス stop_reason 値は "stop_sequence" になり、レスポンス stop_sequence 値には一致する停止シーケンスが含まれます。	-
spring.ai.anthropic.chat.options.top-p	核サンプリングを使用します。核サンプリングでは、後続の各トークンのすべてのオプションの累積分布を確率の降順で計算し、top_p で指定された特定の確率に達するとそれをカットオフします。温度または top_p のいずれかを変更する必要がありますが、両方を変更することはできません。高度な使用例にのみ推奨されます。通常は温度のみを使用する必要があります。	-
spring.ai.anthropic.chat.options.top-k	後続の各トークンについて、上位 K 個のオプションからのみサンプリングします。「ロングテール」の低確率レスポンスを削除するために使用されます。技術的な詳細については、こちらを参照してください。高度な使用例にのみ推奨されます。通常は温度のみを使用する必要があります。	-
spring.ai.anthropic.chat.options.tool-names	単一のプロンプトリクエストでツール呼び出しを有効にする、名前で識別されるツールのリスト。これらの名前のツールは、toolCallbacks レジストリに存在する必要があります。	-
spring.ai.anthropic.chat.options.tool-callbacks	ChatModel に登録するツールコールバック。	-
spring.ai.anthropic.chat.options.toolChoice	Controls which (if any) tool is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a tool. Specifying a particular tool via `{"type: "tool", "name": "my_tool"}` forces the model to call that tool. `none` is the default when no functions are present. `auto` is the default if functions are present.	-
spring.ai.anthropic.chat.options.internal-tool-execution-enabled	False の場合、Spring AI はツール呼び出しを内部で処理せず、クライアントにプロキシします。その後、ツール呼び出しを処理し、適切な関数にディスパッチして、結果を返すのはクライアントの責任です。True (デフォルト) の場合、Spring AI は関数呼び出しを内部で処理します。関数呼び出しをサポートするチャットモデルにのみ適用されます。	true
spring.ai.anthropic.chat.options.http-headers	チャット補完リクエストに追加されるオプションの HTTP ヘッダー。	-

spring.ai.anthropic.chat.enabled (削除され、無効になりました)

Anthropic チャットモデルを有効にします。

true

spring.ai.model.chat

Anthropic チャットモデルを有効にします。

人類的な

spring.ai.anthropic.chat.options.model

使用する Anthropic チャットモデルです。サポート: claude-opus-4-0、claude-sonnet-4-0、claude-3-7-sonnet-latest、claude-3-5-sonnet-latest、claude-3-opus-20240229、claude-3-sonnet-20240229、claude-3-haiku-20240307、claude-3-7-sonnet-latest、claude-sonnet-4-20250514、claude-opus-4-1-20250805

claude-opus-4-20250514

spring.ai.anthropic.chat.options.temperature

生成される補完の見かけの創造性を制御するために使用するサンプリング温度。値を高くすると出力がよりランダムになり、値を低くすると結果がより集中的で決定的になります。これら 2 つの設定の相互作用を予測するのは難しいため、同じ完了リクエストに対して温度と top_p を変更することはお勧めできません。

0.8

spring.ai.anthropic.chat.options.max-tokens

チャット補完で生成するトークンの最大数。入力トークンと生成されたトークンの合計の長さは、モデルのコンテキストの長さによって制限されます。

500

spring.ai.anthropic.chat.options.stop-sequence

モデルの生成を停止させるカスタムテキストシーケンス。モデルは通常、自然にターンを完了すると停止し、その結果、レスポンス stop_reason は "end_turn" になります。モデルがカスタムテキスト文字列に遭遇したときに生成を停止する場合は、stop_sequences パラメーターを使用できます。モデルがカスタムシーケンスの 1 つに遭遇すると、レスポンス stop_reason 値は "stop_sequence" になり、レスポンス stop_sequence 値には一致する停止シーケンスが含まれます。

spring.ai.anthropic.chat.options.top-p

核サンプリングを使用します。核サンプリングでは、後続の各トークンのすべてのオプションの累積分布を確率の降順で計算し、top_p で指定された特定の確率に達するとそれをカットオフします。温度または top_p のいずれかを変更する必要がありますが、両方を変更することはできません。高度な使用例にのみ推奨されます。通常は温度のみを使用する必要があります。

spring.ai.anthropic.chat.options.top-k

後続の各トークンについて、上位 K 個のオプションからのみサンプリングします。「ロングテール」の低確率レスポンスを削除するために使用されます。技術的な詳細については、こちらを参照してください。高度な使用例にのみ推奨されます。通常は温度のみを使用する必要があります。

spring.ai.anthropic.chat.options.tool-names

単一のプロンプトリクエストでツール呼び出しを有効にする、名前で識別されるツールのリスト。これらの名前のツールは、toolCallbacks レジストリに存在する必要があります。

spring.ai.anthropic.chat.options.tool-callbacks

ChatModel に登録するツールコールバック。

spring.ai.anthropic.chat.options.toolChoice

Controls which (if any) tool is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a tool. Specifying a particular tool via {"type: "tool", "name": "my_tool"} forces the model to call that tool. none is the default when no functions are present. auto is the default if functions are present.

spring.ai.anthropic.chat.options.internal-tool-execution-enabled

False の場合、Spring AI はツール呼び出しを内部で処理せず、クライアントにプロキシします。その後、ツール呼び出しを処理し、適切な関数にディスパッチして、結果を返すのはクライアントの責任です。True (デフォルト) の場合、Spring AI は関数呼び出しを内部で処理します。関数呼び出しをサポートするチャットモデルにのみ適用されます。

true

spring.ai.anthropic.chat.options.http-headers

チャット補完リクエストに追加されるオプションの HTTP ヘッダー。

モデルエイリアスの最新リストとその説明については、公式の Anthropic モデルエイリアスのドキュメント (英語) を参照してください。

spring.ai.anthropic.chat.options というプレフィックスが付いたすべてのプロパティは、リクエスト固有のランタイムオプションを Prompt 呼び出しに追加することで実行時にオーバーライドできます。

ランタイムオプション

AnthropicChatOptions.java [GitHub] (英語) は、使用するモデル、温度、最大トークン数などのモデル構成を提供します。

起動時に、AnthropicChatModel(api, options) コンストラクターまたは spring.ai.anthropic.chat.options.* プロパティを使用してデフォルトのオプションを構成できます。

実行時に、新しいリクエスト固有のオプションを Prompt 呼び出しに追加することで、デフォルトのオプションをオーバーライドできます。たとえば、特定のリクエストのデフォルトのモデルと温度をオーバーライドするには、次のようにします。

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        AnthropicChatOptions.builder()
            .model("claude-3-7-sonnet-latest")
            .temperature(0.4)
        .build()
    ));

モデル固有の AnthropicChatOptions [GitHub] (英語) に加えて、ChatOptions#builder() [GitHub] (英語) で作成されたポータブル ChatOptions [GitHub] (英語) インスタンスを使用できます。

プロンプトキャッシュ

Anthropic’s prompt caching feature (英語) allows you to cache frequently used prompts to reduce costs and improve response times for repeated interactions. When you cache a prompt, subsequent identical requests can reuse the cached content, significantly reducing the number of input tokens processed.

対応モデル

プロンプトキャッシュは現在、Claude Opus 4、Claude Sonnet 4、Claude Sonnet 3.7、Claude Sonnet 3.5、Claude Haiku 3.5、Claude Haiku 3、および Claude Opus 3 でサポートされています。

トークン要件

異なるモデルには、キャッシュの有効性に関する最小トークンしきい値が異なります。- Claude Sonnet 4: 1024+ トークン - Claude Haiku モデル: 2048+ トークン - その他のモデル: 1024+ トークン

キャッシュ戦略

Spring AI provides strategic cache placement through the AnthropicCacheStrategy enum. Each strategy automatically places cache breakpoints at optimal locations while staying within Anthropic’s 4-breakpoint limit.

戦略 Breakpoints Used ユースケース

戦略	Breakpoints Used	ユースケース
`NONE`	0	Disables prompt caching completely. Use when requests are one-off or content is too small to benefit from caching.
`SYSTEM_ONLY`	1	Caches system message content. Tools are cached implicitly via Anthropic’s automatic ~20-block lookback mechanism. Use when system prompts are large and stable with fewer than 20 tools.
`TOOLS_ONLY`	1	Caches tool definitions only. System messages remain uncached and are processed fresh on each request. Use when tool definitions are large and stable (5000+ tokens) but system prompts change frequently or vary per tenant/context.
`SYSTEM_AND_TOOLS`	2	Caches both tool definitions (breakpoint 1) and system message (breakpoint 2) explicitly. Use when you have 20+ tools (beyond automatic lookback) or want deterministic caching of both components. System changes don’t invalidate tool cache.
`CONVERSATION_HISTORY`	1-4	Caches entire conversation history up to the current user question. Use for multi-turn conversations with chat memory where conversation history grows over time.

NONE

Disables prompt caching completely. Use when requests are one-off or content is too small to benefit from caching.

SYSTEM_ONLY

Caches system message content. Tools are cached implicitly via Anthropic’s automatic ~20-block lookback mechanism. Use when system prompts are large and stable with fewer than 20 tools.

TOOLS_ONLY

Caches tool definitions only. System messages remain uncached and are processed fresh on each request. Use when tool definitions are large and stable (5000+ tokens) but system prompts change frequently or vary per tenant/context.

SYSTEM_AND_TOOLS

Caches both tool definitions (breakpoint 1) and system message (breakpoint 2) explicitly. Use when you have 20+ tools (beyond automatic lookback) or want deterministic caching of both components. System changes don’t invalidate tool cache.

CONVERSATION_HISTORY

1-4

Caches entire conversation history up to the current user question. Use for multi-turn conversations with chat memory where conversation history grows over time.

Due to Anthropic’s cascade invalidation, changing tool definitions will invalidate ALL downstream cache breakpoints (system, messages). Tool stability is critical when using SYSTEM_AND_TOOLS or CONVERSATION_HISTORY strategies.

プロンプトキャッシュを有効にする

AnthropicChatOptions に cacheOptions を設定し、strategy を選択してプロンプトキャッシュを有効にします。

System-Only Caching

Best for: Stable system prompts with <20 tools (tools cached implicitly via automatic lookback).

// Cache system message content (tools cached implicitly)
ChatResponse response = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage("You are a helpful AI assistant with extensive knowledge..."),
            new UserMessage("What is machine learning?")
        ),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);

Tools-Only Caching

Best for: Large stable tool sets with dynamic system prompts (multi-tenant apps, A/B testing).

// Cache tool definitions, system prompt processed fresh each time
ChatResponse response = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage("You are a " + persona + " assistant..."), // Dynamic per-tenant
            new UserMessage("What's the weather like in San Francisco?")
        ),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.TOOLS_ONLY)
                .build())
            .toolCallbacks(weatherToolCallback) // Large tool set cached
            .maxTokens(500)
            .build()
    )
);

System and Tools Caching

Best for: 20+ tools (beyond automatic lookback) or when both components should be cached independently.

// Cache both tool definitions and system message with independent breakpoints
// Changing system won't invalidate tool cache (but changing tools invalidates both)
ChatResponse response = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage("You are a weather analysis assistant..."),
            new UserMessage("What's the weather like in San Francisco?")
        ),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_AND_TOOLS)
                .build())
            .toolCallbacks(weatherToolCallback) // 20+ tools
            .maxTokens(500)
            .build()
    )
);

Conversation History Caching

// Cache conversation history with ChatClient and memory (cache breakpoint on last user message)
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultSystem("You are a personalized career counselor...")
    .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory)
        .conversationId(conversationId)
        .build())
    .build();

String response = chatClient.prompt()
    .user("What career advice would you give me?")
    .options(AnthropicChatOptions.builder()
        .model("claude-sonnet-4")
        .cacheOptions(AnthropicCacheOptions.builder()
            .strategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
            .build())
        .maxTokens(500)
        .build())
    .call()
    .content();

Using ChatClient Fluent API

String response = ChatClient.create(chatModel)
    .prompt()
    .system("You are an expert document analyst...")
    .user("Analyze this large document: " + document)
    .options(AnthropicChatOptions.builder()
        .model("claude-sonnet-4")
        .cacheOptions(AnthropicCacheOptions.builder()
            .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
            .build())
        .build())
    .call()
    .content();

Advanced Caching Options

Per-Message TTL (5m or 1h)

By default, cached content uses a 5-minute TTL. You can set a 1-hour TTL for specific message types. When 1-hour TTL is used, Spring AI automatically sets the required Anthropic beta header.

ChatResponse response = chatModel.call(
    new Prompt(
        List.of(new SystemMessage(largeSystemPrompt)),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                .messageTypeTtl(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
                .build())
            .maxTokens(500)
            .build()
    )
);

Extended TTL uses Anthropic beta feature extended-cache-ttl-2025-04-11.

Cache Eligibility Filters

Control when cache breakpoints are used by setting minimum content lengths and an optional token-based length function:

AnthropicCacheOptions cache = AnthropicCacheOptions.builder()
    .strategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
    .messageTypeMinContentLength(MessageType.SYSTEM, 1024)
    .messageTypeMinContentLength(MessageType.USER, 1024)
    .messageTypeMinContentLength(MessageType.ASSISTANT, 1024)
    .contentLengthFunction(text -> MyTokenCounter.count(text))
    .build();

ChatResponse response = chatModel.call(
    new Prompt(
        List.of(/* messages */),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(cache)
            .build()
    )
);

Tool Definitions are always considered for caching if SYSTEM_AND_TOOLS strategy is used, regardless of content length.

使用例

Here’s a complete example demonstrating prompt caching with cost tracking:

// Create system content that will be reused multiple times
String largeSystemPrompt = "You are an expert software architect specializing in distributed systems...";

// First request - creates cache
ChatResponse firstResponse = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(largeSystemPrompt),
            new UserMessage("What is microservices architecture?")
        ),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);

// Access cache-related token usage
AnthropicApi.Usage firstUsage = (AnthropicApi.Usage) firstResponse.getMetadata()
    .getUsage().getNativeUsage();

System.out.println("Cache creation tokens: " + firstUsage.cacheCreationInputTokens());
System.out.println("Cache read tokens: " + firstUsage.cacheReadInputTokens());

// Second request with same system prompt - reads from cache
ChatResponse secondResponse = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(largeSystemPrompt),
            new UserMessage("What are the benefits of event sourcing?")
        ),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);

AnthropicApi.Usage secondUsage = (AnthropicApi.Usage) secondResponse.getMetadata()
    .getUsage().getNativeUsage();

System.out.println("Cache creation tokens: " + secondUsage.cacheCreationInputTokens()); // Should be 0
System.out.println("Cache read tokens: " + secondUsage.cacheReadInputTokens()); // Should be > 0

Token Usage Tracking

The Usage record provides detailed information about cache-related token consumption. To access Anthropic-specific cache metrics, use the getNativeUsage() method:

AnthropicApi.Usage usage = (AnthropicApi.Usage) response.getMetadata()
    .getUsage().getNativeUsage();

Cache-specific metrics include:

cacheCreationInputTokens(): Returns the number of tokens used when creating a cache entry
cacheReadInputTokens(): Returns the number of tokens read from an existing cache entry

When you first send a cached prompt: - cacheCreationInputTokens() will be greater than 0 - cacheReadInputTokens() will be 0

When you send the same cached prompt again: - cacheCreationInputTokens() will be 0 - cacheReadInputTokens() will be greater than 0

Real-World Use Cases

Legal Document Analysis

Analyze large legal contracts or compliance documents efficiently by caching document content across multiple questions:

// Load a legal contract (PDF or text)
String legalContract = loadDocument("merger-agreement.pdf"); // ~3000 tokens

// System prompt with legal expertise
String legalSystemPrompt = "You are an expert legal analyst specializing in corporate law. " +
    "Analyze the following contract and provide precise answers about terms, obligations, and risks: " +
    legalContract;

// First analysis - creates cache
ChatResponse riskAnalysis = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(legalSystemPrompt),
            new UserMessage("What are the key termination clauses and associated penalties?")
        ),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(1000)
            .build()
    )
);

// Subsequent questions reuse cached document - 90% cost savings
ChatResponse obligationAnalysis = chatModel.call(
    new Prompt(
        List.of(
            new SystemMessage(legalSystemPrompt), // Same content - cache hit
            new UserMessage("List all financial obligations and payment schedules.")
        ),
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(1000)
            .build()
    )
);

Batch Code Review

Process multiple code files with consistent review criteria while caching the review guidelines:

// Define comprehensive code review guidelines
String reviewGuidelines = """
    You are a senior software engineer conducting code reviews. Apply these criteria:
    - Security vulnerabilities and best practices
    - Performance optimizations and memory usage
    - Code maintainability and readability
    - Testing coverage and edge cases
    - Design patterns and architecture compliance
    """;

List<String> codeFiles = Arrays.asList(
    "UserService.java", "PaymentController.java", "SecurityConfig.java"
);

List<String> reviews = new ArrayList<>();

for (String filename : codeFiles) {
    String sourceCode = loadSourceFile(filename);

    ChatResponse review = chatModel.call(
        new Prompt(
            List.of(
                new SystemMessage(reviewGuidelines), // Cached across all reviews
                new UserMessage("Review this " + filename + " code:\n\n" + sourceCode)
            ),
            AnthropicChatOptions.builder()
                .model("claude-sonnet-4")
                .cacheOptions(AnthropicCacheOptions.builder()
                    .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                    .build())
                .maxTokens(800)
                .build()
        )
    );

    reviews.add(review.getResult().getOutput().getText());
}

// Guidelines cached after first request, subsequent reviews are faster and cheaper

Multi-Tenant SaaS with Shared Tools

Build a multi-tenant application where tools are shared but system prompts are customized per tenant:

// Define large shared tool set (used by all tenants)
List<FunctionCallback> sharedTools = Arrays.asList(
    weatherToolCallback,    // ~500 tokens
    calendarToolCallback,   // ~800 tokens
    emailToolCallback,      // ~700 tokens
    analyticsToolCallback,  // ~600 tokens
    reportingToolCallback,  // ~900 tokens
    // ... 20+ more tools, totaling 5000+ tokens
);

@Service
public class MultiTenantAIService {

    public String handleTenantRequest(String tenantId, String userQuery) {
        // Get tenant-specific configuration
        TenantConfig config = tenantRepository.findById(tenantId);

        // Dynamic system prompt per tenant
        String tenantSystemPrompt = String.format("""
            You are %s's AI assistant. Company values: %s.
            Brand voice: %s. Compliance requirements: %s.
            """, config.companyName(), config.values(),
                 config.brandVoice(), config.compliance());

        ChatResponse response = chatModel.call(
            new Prompt(
                List.of(
                    new SystemMessage(tenantSystemPrompt), // Different per tenant, NOT cached
                    new UserMessage(userQuery)
                ),
                AnthropicChatOptions.builder()
                    .model("claude-sonnet-4")
                    .cacheOptions(AnthropicCacheOptions.builder()
                        .strategy(AnthropicCacheStrategy.TOOLS_ONLY) // Cache tools only
                        .build())
                    .toolCallbacks(sharedTools) // Cached once, shared across all tenants
                    .maxTokens(800)
                    .build()
            )
        );

        return response.getResult().getOutput().getText();
    }
}

// Tools cached once (5000 tokens @ 10% = 500 token cost for cache hits)
// Each tenant's unique system prompt processed fresh (200-500 tokens @ 100%)
// Total per request: ~700-1000 tokens vs 5500+ without TOOLS_ONLY

Customer Support with Knowledge Base

Create a customer support system that caches your product knowledge base for consistent, accurate responses:

// Load comprehensive product knowledge
String knowledgeBase = """
    PRODUCT DOCUMENTATION:
    - API endpoints and authentication methods
    - Common troubleshooting procedures
    - Billing and subscription details
    - Integration guides and examples
    - Known issues and workarounds
    """ + loadProductDocs(); // ~2500 tokens

@Service
public class CustomerSupportService {

    public String handleCustomerQuery(String customerQuery, String customerId) {
        ChatResponse response = chatModel.call(
            new Prompt(
                List.of(
                    new SystemMessage("You are a helpful customer support agent. " +
                        "Use this knowledge base to provide accurate solutions: " + knowledgeBase),
                    new UserMessage("Customer " + customerId + " asks: " + customerQuery)
                ),
                AnthropicChatOptions.builder()
                    .model("claude-sonnet-4")
                    .cacheOptions(AnthropicCacheOptions.builder()
                        .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                        .build())
                    .maxTokens(600)
                    .build()
            )
        );

        return response.getResult().getOutput().getText();
    }
}

// Knowledge base is cached across all customer queries
// Multiple support agents can benefit from the same cached content

ベストプラクティス

Choose the Right Strategy :
- Use SYSTEM_ONLY for stable system prompts with <20 tools (tools cached implicitly via automatic lookback)
- Use TOOLS_ONLY for large stable tool sets (5000+ tokens) with dynamic system prompts (multi-tenant, A/B testing)
- Use SYSTEM_AND_TOOLS when you have 20+ tools (beyond automatic lookback) or want both cached independently
- Use CONVERSATION_HISTORY with ChatClient memory for multi-turn conversations
- Use NONE to explicitly disable caching
Understand Cascade Invalidation : Anthropic’s cache hierarchy (tools → system → messages) means changes flow downward:
- Changing tools invalidates: tools + system + messages (all caches) ❌❌❌
- Changing system invalidates: system + messages (tools cache remains valid) ✅❌❌
- Changing messages invalidates: messages only (tools and system caches remain valid) ✅✅❌
  **Tool stability is critical** when using `SYSTEM_AND_TOOLS` or `CONVERSATION_HISTORY` strategies.
SYSTEM_AND_TOOLS Independence : With SYSTEM_AND_TOOLS, changing the system message does NOT invalidate the tool cache, allowing efficient reuse of cached tools even when system prompts vary.
Meet Token Requirements : Focus on caching content that meets the minimum token requirements (1024+ tokens for Sonnet 4, 2048+ for Haiku models).
Reuse Identical Content : Caching works best with exact matches of prompt content. Even small changes will require a new cache entry.
Monitor Token Usage : Use the cache usage statistics to track cache effectiveness: java AnthropicApi.Usage usage = (AnthropicApi.Usage) response.getMetadata().getUsage().getNativeUsage(); if (usage != null) { System.out.println("Cache creation: " + usage.cacheCreationInputTokens()); System.out.println("Cache read: " + usage.cacheReadInputTokens()); }
Strategic Cache Placement : The implementation automatically places cache breakpoints at optimal locations based on your chosen strategy, ensuring compliance with Anthropic’s 4-breakpoint limit.
Cache Lifetime : Default TTL is 5 minutes; set 1-hour TTL per message type via messageTypeTtl(…). Each cache access resets the timer.
Tool Caching Limitations : Be aware that tool-based interactions may not provide cache usage metadata in the response.

実装の詳細

The prompt caching implementation in Spring AI follows these key design principles:

Strategic Cache Placement : Cache breakpoints are automatically placed at optimal locations based on the chosen strategy, ensuring compliance with Anthropic’s 4-breakpoint limit.
- CONVERSATION_HISTORY places cache breakpoints on: tools (if present), system message, and the last user message
- This enables Anthropic’s prefix matching to incrementally cache the growing conversation history
- Each turn builds on the previous cached prefix, maximizing cache reuse
Provider Portability : Cache configuration is done through AnthropicChatOptions rather than individual messages, preserving compatibility when switching between different AI providers.
スレッドセーフ : The cache breakpoint tracking is implemented with thread-safe mechanisms to handle concurrent requests correctly.
Automatic Content Ordering : The implementation ensures proper on-the-wire ordering of JSON content blocks and cache controls according to Anthropic’s API requirements.
Aggregate Eligibility Checking : For CONVERSATION_HISTORY, the implementation considers all message types (user, assistant, tool) within the last ~20 content blocks when determining if the combined content meets the minimum token threshold for caching.

Future Enhancements

The current cache strategies are designed to handle 90% of common use cases effectively. For applications requiring more granular control, future enhancements may include:

Message-level cache control for fine-grained breakpoint placement
Multi-block content caching within individual messages
Advanced cache boundary selection for complex tool scenarios
Mixed TTL strategies for optimized cache hierarchies

These enhancements will maintain full backward compatibility while unlocking Anthropic’s complete prompt caching capabilities for specialized use cases.

考え

Anthropic Claude モデルは、「思考」機能をサポートしており、最終的な答えを出す前にモデルが推論プロセスを示すことができます。この機能により、特に段階的な推論を必要とする複雑な問題において、より透明性が高く詳細な問題解決が可能になります。

対応モデル

思考機能は、次の Claude モデルでサポートされています。

Claude 4 モデル (claude-opus-4-20250514, claude-sonnet-4-20250514)
Claude 3.7 ソネット (claude-3-7-sonnet-20250219)

モデルの機能:

Claude 3.7 ソネット : 完全な思考出力を返します。動作は一貫していますが、要約された思考やインターリーブされた思考には対応していません。
Claude 4 モデル : 要約思考、インターリーブ思考、強化されたツール統合をサポートします。

API リクエスト構造はサポートされているすべてのモデルで同じですが、出力動作は異なります。

思考構成

サポートされている Claude モデルで思考を有効にするには、リクエストに次の構成を含めます。

必要な構成

thinking オブジェクトを追加する :
- "type": "enabled"
- budget_tokens: 推論のためのトークン制限 (1024 から始めることをお勧めします)
トークン予算ルール :
- budget_tokens は通常 max_tokens より小さくなければなりません
- Claude は割り当てられたトークンよりも少ないトークンを使用する場合があります
- 予算が大きいほど推論の深さは増しますが、レイテンシに影響する可能性があります
- インターリーブ思考でツールの使用を使用する場合 (Claude 4 のみ)、この制約は緩和されますが、Spring AI ではまだサポートされていません。

重要な考慮事項

Claude 3.7 は完全な思考内容をレスポンスで返します
Claude 4 は、モデルの内部推論の要約版を返すことで、レイテンシを削減し、機密コンテンツを保護します。
出力トークンの一部としての思考トークンは課金可能 (たとえすべての反応が目に見えなくても)
インターリーブ思考は Claude 4 モデルでのみ使用可能で、ベータヘッダー interleaved-thinking-2025-05-14 が必要です。

ツールの統合とインターリーブ思考

Claude 4 モデルはツールの使用とインターリーブされた思考をサポートし、モデルがツールの呼び出し間で推論できるようにします。

現在の Spring AI 実装では、基本的な思考とツールの使用が個別にサポートされていますが、ツールの使用とインターリーブされた思考 (複数のツール呼び出しにわたって思考が継続される) はまだサポートされていません。

ツールの使用とインターリーブ思考の詳細については、Anthropic ドキュメント (英語) を参照してください。

非ストリーミングの例

ChatClient API を使用して非ストリーミングリクエストで思考を有効にする方法は次のとおりです。

ChatClient chatClient = ChatClient.create(chatModel);

// For Claude 3.7 Sonnet - explicit thinking configuration required
ChatResponse response = chatClient.prompt()
    .options(AnthropicChatOptions.builder()
        .model("claude-3-7-sonnet-latest")
        .temperature(1.0)  // Temperature should be set to 1 when thinking is enabled
        .maxTokens(8192)
        .thinking(AnthropicApi.ThinkingType.ENABLED, 2048)  // Must be ≥1024 && < max_tokens
        .build())
    .user("Are there an infinite number of prime numbers such that n mod 4 == 3?")
    .call()
    .chatResponse();

// For Claude 4 models - thinking is enabled by default
ChatResponse response4 = chatClient.prompt()
    .options(AnthropicChatOptions.builder()
        .model("claude-opus-4-0")
        .maxTokens(8192)
        // No explicit thinking configuration needed
        .build())
    .user("Are there an infinite number of prime numbers such that n mod 4 == 3?")
    .call()
    .chatResponse();

// Process the response which may contain thinking content
for (Generation generation : response.getResults()) {
    AssistantMessage message = generation.getOutput();
    if (message.getText() != null) {
        // Regular text response
        System.out.println("Text response: " + message.getText());
    }
    else if (message.getMetadata().containsKey("signature")) {
        // Thinking content
        System.out.println("Thinking: " + message.getMetadata().get("thinking"));
        System.out.println("Signature: " + message.getMetadata().get("signature"));
    }
}

ストリーミングの例

ストリーミングレスポンスで思考を使用することもできます。

ChatClient chatClient = ChatClient.create(chatModel);

// For Claude 3.7 Sonnet - explicit thinking configuration
Flux<ChatResponse> responseFlux = chatClient.prompt()
    .options(AnthropicChatOptions.builder()
        .model("claude-3-7-sonnet-latest")
        .temperature(1.0)
        .maxTokens(8192)
        .thinking(AnthropicApi.ThinkingType.ENABLED, 2048)
        .build())
    .user("Are there an infinite number of prime numbers such that n mod 4 == 3?")
    .stream();

// For Claude 4 models - thinking is enabled by default
Flux<ChatResponse> responseFlux4 = chatClient.prompt()
    .options(AnthropicChatOptions.builder()
        .model("claude-opus-4-0")
        .maxTokens(8192)
        // No explicit thinking configuration needed
        .build())
    .user("Are there an infinite number of prime numbers such that n mod 4 == 3?")
    .stream();

// For streaming, you might want to collect just the text responses
String textContent = responseFlux.collectList()
    .block()
    .stream()
    .map(ChatResponse::getResults)
    .flatMap(List::stream)
    .map(Generation::getOutput)
    .map(AssistantMessage::getText)
    .filter(text -> text != null && !text.isBlank())
    .collect(Collectors.joining());

ツール使用の統合

Claude 4 モデルは思考力とツール使用機能を統合します。

Claude 3.7 ソネット : 思考とツールの使用の両方をサポートしますが、それらは別々に動作し、より明示的な設定が必要です。
Claude 4 モデル : 思考とツールの使用をネイティブにインターリーブし、ツールのインタラクション中に深い推論を提供します。

思考を使うことの利点

思考機能にはいくつかの利点があります。

透過性 : モデルの推論プロセスと結論に至った経緯を確認する
デバッグ : モデルが論理エラーを起こしている可能性がある場所を特定する
教育 : 段階的な推論を教材として使う
複雑な問題の解決 : 数学、論理、推論の課題でより良い結果が得られる

思考を有効にするには、思考プロセス自体が割り当てられたトークンを消費するため、より高いトークン予算が必要になることに注意してください。

ツール / 関数の呼び出し

AnthropicChatModel にカスタム Java ツールを登録し、Anthropic Claude モデルで、登録された関数の 1 つまたは複数を呼び出すための引数を含む JSON オブジェクトをインテリジェントに出力するように選択できます。これは、LLM 機能を外部ツールや API に接続するための強力な手法です。ツール呼び出しの詳細については、こちらを参照してください。

Tool Choice

The tool_choice parameter allows you to control how the model uses the provided tools. This feature gives you fine-grained control over tool execution behavior.

For complete API details, see the Anthropic tool_choice documentation (英語) .

Tool Choice Options

Spring AI provides four tool choice strategies through the AnthropicApi.ToolChoice interface:

ToolChoiceAuto (default): The model automatically decides whether to use tools or respond with text
ToolChoiceAny : The model must use at least one of the available tools
ToolChoiceTool : The model must use a specific tool by name
ToolChoiceNone : The model cannot use any tools

Disabling Parallel Tool Use

All tool choice options (except ToolChoiceNone) support a disableParallelToolUse parameter. When set to true, the model will output at most one tool use.

使用例

Auto Mode (Default Behavior)

Let the model decide whether to use tools:

ChatResponse response = chatModel.call(
    new Prompt(
        "What's the weather in San Francisco?",
        AnthropicChatOptions.builder()
            .toolChoice(new AnthropicApi.ToolChoiceAuto())
            .toolCallbacks(weatherToolCallback)
            .build()
    )
);

Force Tool Use (任意)

Require the model to use at least one tool:

ChatResponse response = chatModel.call(
    new Prompt(
        "What's the weather?",
        AnthropicChatOptions.builder()
            .toolChoice(new AnthropicApi.ToolChoiceAny())
            .toolCallbacks(weatherToolCallback, calculatorToolCallback)
            .build()
    )
);

Force Specific Tool

Require the model to use a specific tool by name:

ChatResponse response = chatModel.call(
    new Prompt(
        "What's the weather in San Francisco?",
        AnthropicChatOptions.builder()
            .toolChoice(new AnthropicApi.ToolChoiceTool("get_weather"))
            .toolCallbacks(weatherToolCallback, calculatorToolCallback)
            .build()
    )
);

Disable Tool Use

Prevent the model from using any tools:

ChatResponse response = chatModel.call(
    new Prompt(
        "What's the weather in San Francisco?",
        AnthropicChatOptions.builder()
            .toolChoice(new AnthropicApi.ToolChoiceNone())
            .toolCallbacks(weatherToolCallback)
            .build()
    )
);

Disable Parallel Tool Use

Force the model to use only one tool at a time:

ChatResponse response = chatModel.call(
    new Prompt(
        "What's the weather in San Francisco and what's 2+2?",
        AnthropicChatOptions.builder()
            .toolChoice(new AnthropicApi.ToolChoiceAuto(true)) // disableParallelToolUse = true
            .toolCallbacks(weatherToolCallback, calculatorToolCallback)
            .build()
    )
);

Using ChatClient API

You can also use tool choice with the fluent ChatClient API:

String response = ChatClient.create(chatModel)
    .prompt()
    .user("What's the weather in San Francisco?")
    .options(AnthropicChatOptions.builder()
        .toolChoice(new AnthropicApi.ToolChoiceTool("get_weather"))
        .build())
    .call()
    .content();

ユースケース

検証 : Use ToolChoiceTool to ensure a specific tool is called for critical operations
効率 : Use ToolChoiceAny when you know a tool must be used to avoid unnecessary text generation
コントロール : Use ToolChoiceNone to temporarily disable tool access while keeping tool definitions registered
Sequential Processing : Use disableParallelToolUse to force sequential tool execution for dependent operations

マルチモーダル

マルチモーダル性とは、テキスト、PDF、イメージ、データ形式など、さまざまなソースからの情報を同時に理解して処理するモデルの機能を指します。

イメージ

現在、Anthropic、Claude 3 は、images の base64 ソース型と、image/jpeg、image/png、image/gif、image/webp メディア型をサポートしています。詳細については、ビジョンガイド (英語) を確認してください。Anthropic、Claude 3.5 Sonnet は、application/pdf ファイルの pdf ソース型もサポートしています。

Spring AI の Message インターフェースは、メディア型を導入することでマルチモーダル AI モデルをサポートします。この型には、生のメディアデータに Spring の org.springframework.util.MimeType および java.lang.Object を使用して、メッセージ内のメディア添付ファイルに関するデータと情報が含まれます。

以下は、AnthropicChatModelIT.java [GitHub] (英語) から抽出された簡単なコード例で、ユーザーテキストとイメージの組み合わせを示しています。

var imageData = new ClassPathResource("/multimodal.test.png");

var userMessage = new UserMessage("Explain what do you see on this picture?",
        List.of(new Media(MimeTypeUtils.IMAGE_PNG, this.imageData)));

ChatResponse response = chatModel.call(new Prompt(List.of(this.userMessage)));

logger.info(response.getResult().getOutput().getContent());

入力イメージ multimodal.test.png :

「この写真に何が写っているか説明してください」というテキストメッセージとともに、次のようなレスポンスが生成されます。

The image shows a close-up view of a wire fruit basket containing several pieces of fruit.
...

PDF

Sonnet 以降では 3.5 と PDF サポート (ベータ) (英語) が提供されています。メッセージに PDF ファイルを添付するには、application/pdf メディア型を使用します。

var pdfData = new ClassPathResource("/spring-ai-reference-overview.pdf");

var userMessage = new UserMessage(
        "You are a very professional document summarization specialist. Please summarize the given document.",
        List.of(new Media(new MimeType("application", "pdf"), pdfData)));

var response = this.chatModel.call(new Prompt(List.of(userMessage)));

引用

Anthropic’s Citations API (英語) allows Claude to reference specific parts of provided documents when generating responses. When citation documents are included in a prompt, Claude can cite the source material, and citation metadata (character ranges, page numbers, or content blocks) is returned in the response metadata.

Citations help improve:

Accuracy verification : Users can verify Claude’s responses against source material
透過性 : See exactly which parts of documents informed the response
準拠 : Meet requirements for source attribution in regulated industries
信頼 : Build confidence by showing where information came from

対応モデル

Citations are supported on Claude 3.7 Sonnet and Claude 4 models (Opus and Sonnet).

ドキュメントタイプ

Three types of citation documents are supported:

プレーンテキスト : Text content with character-level citations
PDF : PDF documents with page-level citations
Custom Content : User-defined content blocks with block-level citations

Creating Citation Documents

Use the CitationDocument builder to create documents that can be cited:

Plain Text Documents

CitationDocument document = CitationDocument.builder()
    .plainText("The Eiffel Tower was completed in 1889 in Paris, France. " +
               "It stands 330 meters tall and was designed by Gustave Eiffel.")
    .title("Eiffel Tower Facts")
    .citationsEnabled(true)
    .build();

PDF Documents

// From file path
CitationDocument document = CitationDocument.builder()
    .pdfFile("path/to/document.pdf")
    .title("Technical Specification")
    .citationsEnabled(true)
    .build();

// From byte array
byte[] pdfBytes = loadPdfBytes();
CitationDocument document = CitationDocument.builder()
    .pdf(pdfBytes)
    .title("Product Manual")
    .citationsEnabled(true)
    .build();

Custom Content Blocks

For fine-grained citation control, use custom content blocks:

CitationDocument document = CitationDocument.builder()
    .customContent(
        "The Great Wall of China is approximately 21,196 kilometers long.",
        "It was built over many centuries, starting in the 7th century BC.",
        "The wall was constructed to protect Chinese states from invasions."
    )
    .title("Great Wall Facts")
    .citationsEnabled(true)
    .build();

Using Citations in Requests

Include citation documents in your chat options:

ChatResponse response = chatModel.call(
    new Prompt(
        "When was the Eiffel Tower built and how tall is it?",
        AnthropicChatOptions.builder()
            .model("claude-3-7-sonnet-latest")
            .maxTokens(1024)
            .citationDocuments(document)
            .build()
    )
);

Multiple Documents

You can provide multiple documents for Claude to reference:

CitationDocument parisDoc = CitationDocument.builder()
    .plainText("Paris is the capital city of France with a population of 2.1 million.")
    .title("Paris Information")
    .citationsEnabled(true)
    .build();

CitationDocument eiffelDoc = CitationDocument.builder()
    .plainText("The Eiffel Tower was designed by Gustave Eiffel for the 1889 World's Fair.")
    .title("Eiffel Tower History")
    .citationsEnabled(true)
    .build();

ChatResponse response = chatModel.call(
    new Prompt(
        "What is the capital of France and who designed the Eiffel Tower?",
        AnthropicChatOptions.builder()
            .model("claude-3-7-sonnet-latest")
            .citationDocuments(parisDoc, eiffelDoc)
            .build()
    )
);

Accessing Citations

Citations are returned in the response metadata:

ChatResponse response = chatModel.call(prompt);

// Get citations from metadata
List<Citation> citations = (List<Citation>) response.getMetadata().get("citations");

// Optional: Get citation count directly from metadata
Integer citationCount = (Integer) response.getMetadata().get("citationCount");
System.out.println("Total citations: " + citationCount);

// Process each citation
for (Citation citation : citations) {
    System.out.println("Document: " + citation.getDocumentTitle());
    System.out.println("Location: " + citation.getLocationDescription());
    System.out.println("Cited text: " + citation.getCitedText());
    System.out.println("Document index: " + citation.getDocumentIndex());
    System.out.println();
}

Citation Types

Citations contain different location information depending on the document type:

Character Location (プレーンテキスト)

For plain text documents, citations include character indices:

Citation citation = citations.get(0);
if (citation.getType() == Citation.LocationType.CHAR_LOCATION) {
    int start = citation.getStartCharIndex();
    int end = citation.getEndCharIndex();
    String text = citation.getCitedText();
    System.out.println("Characters " + start + "-" + end + ": " + text);
}

Page Location (PDF)

For PDF documents, citations include page numbers:

Citation citation = citations.get(0);
if (citation.getType() == Citation.LocationType.PAGE_LOCATION) {
    int startPage = citation.getStartPageNumber();
    int endPage = citation.getEndPageNumber();
    System.out.println("Pages " + startPage + "-" + endPage);
}

Content Block Location (Custom Content)

For custom content, citations reference specific content blocks:

Citation citation = citations.get(0);
if (citation.getType() == Citation.LocationType.CONTENT_BLOCK_LOCATION) {
    int startBlock = citation.getStartBlockIndex();
    int endBlock = citation.getEndBlockIndex();
    System.out.println("Content blocks " + startBlock + "-" + endBlock);
}

Complete Example

Here’s a complete example demonstrating citation usage:

// Create a citation document
CitationDocument document = CitationDocument.builder()
    .plainText("Spring AI is an application framework for AI engineering. " +
               "It provides a Spring-friendly API for developing AI applications. " +
               "The framework includes abstractions for chat models, embedding models, " +
               "and vector databases.")
    .title("Spring AI Overview")
    .citationsEnabled(true)
    .build();

// Call the model with the document
ChatResponse response = chatModel.call(
    new Prompt(
        "What is Spring AI?",
        AnthropicChatOptions.builder()
            .model("claude-3-7-sonnet-latest")
            .maxTokens(1024)
            .citationDocuments(document)
            .build()
    )
);

// Display the response
System.out.println("Response: " + response.getResult().getOutput().getText());
System.out.println("\nCitations:");

// Process citations
List<Citation> citations = (List<Citation>) response.getMetadata().get("citations");

if (citations != null && !citations.isEmpty()) {
    for (int i = 0; i < citations.size(); i++) {
        Citation citation = citations.get(i);
        System.out.println("\n[" + (i + 1) + "] " + citation.getDocumentTitle());
        System.out.println("    Location: " + citation.getLocationDescription());
        System.out.println("    Text: " + citation.getCitedText());
    }
} else {
    System.out.println("No citations were provided in the response.");
}

ベストプラクティス

Use descriptive titles : Provide meaningful titles for citation documents to help users identify sources in the citations.
Check for null citations : Not all responses will include citations, so always validate the citations metadata exists before accessing it.
Consider document size : Larger documents provide more context but consume more input tokens and may affect response time.
Leverage multiple documents : When answering questions that span multiple sources, provide all relevant documents in a single request rather than making multiple calls.
Use appropriate document types : Choose plain text for simple content, PDF for existing documents, and custom content blocks when you need fine-grained control over citation granularity.

Real-World Use Cases

Legal Document Analysis

Analyze contracts and legal documents while maintaining source attribution:

CitationDocument contract = CitationDocument.builder()
    .pdfFile("merger-agreement.pdf")
    .title("Merger Agreement 2024")
    .citationsEnabled(true)
    .build();

ChatResponse response = chatModel.call(
    new Prompt(
        "What are the key termination clauses in this contract?",
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .maxTokens(2000)
            .citationDocuments(contract)
            .build()
    )
);

// Citations will reference specific pages in the PDF

Customer Support Knowledge Base

Provide accurate customer support answers with verifiable sources:

CitationDocument kbArticle1 = CitationDocument.builder()
    .plainText(loadKnowledgeBaseArticle("authentication"))
    .title("Authentication Guide")
    .citationsEnabled(true)
    .build();

CitationDocument kbArticle2 = CitationDocument.builder()
    .plainText(loadKnowledgeBaseArticle("billing"))
    .title("Billing FAQ")
    .citationsEnabled(true)
    .build();

ChatResponse response = chatModel.call(
    new Prompt(
        "How do I reset my password and update my billing information?",
        AnthropicChatOptions.builder()
            .model("claude-3-7-sonnet-latest")
            .citationDocuments(kbArticle1, kbArticle2)
            .build()
    )
);

// Citations show which KB articles were referenced

Research and Compliance

Generate reports that require source citations for compliance:

CitationDocument clinicalStudy = CitationDocument.builder()
    .pdfFile("clinical-trial-results.pdf")
    .title("Clinical Trial Phase III Results")
    .citationsEnabled(true)
    .build();

CitationDocument regulatoryGuidance = CitationDocument.builder()
    .plainText(loadRegulatoryDocument())
    .title("FDA Guidance Document")
    .citationsEnabled(true)
    .build();

ChatResponse response = chatModel.call(
    new Prompt(
        "Summarize the efficacy findings and regulatory implications.",
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4")
            .maxTokens(3000)
            .citationDocuments(clinicalStudy, regulatoryGuidance)
            .build()
    )
);

// Citations provide audit trail for compliance

Citation Document Options

Context Field

Optionally provide context about the document that won’t be cited but can guide Claude’s understanding:

CitationDocument document = CitationDocument.builder()
    .plainText("...")
    .title("Legal Contract")
    .context("This is a merger agreement dated January 2024 between Company A and Company B")
    .build();

Controlling Citations

By default, citations are disabled for all documents (opt-in behavior). To enable citations, explicitly set citationsEnabled(true):

CitationDocument document = CitationDocument.builder()
    .plainText("The Eiffel Tower was completed in 1889...")
    .title("Historical Facts")
    .citationsEnabled(true)  // Explicitly enable citations for this document
    .build();

You can also provide documents without citations for background context:

CitationDocument backgroundDoc = CitationDocument.builder()
    .plainText("Background information about the industry...")
    .title("Context Document")
    // citationsEnabled defaults to false - Claude will use this but not cite it
    .build();

Anthropic requires consistent citation settings across all documents in a request. You cannot mix citation-enabled and citation-disabled documents in the same request.

サンプルコントローラー

新しい Spring Boot プロジェクトを作成し、spring-ai-starter-model-anthropic を pom (または gradle) の依存関係に追加します。

src/main/resources ディレクトリに application.properties ファイルを追加して、Anthropic チャットモデルを有効にして構成します。

spring.ai.anthropic.api-key=YOUR_API_KEY
spring.ai.anthropic.chat.options.model=claude-3-5-sonnet-latest
spring.ai.anthropic.chat.options.temperature=0.7
spring.ai.anthropic.chat.options.max-tokens=450

api-key を Anthropic の資格情報に置き換えます。

これにより、クラスに挿入できる AnthropicChatModel 実装が作成されます。以下は、テキスト生成にチャットモデルを使用する単純な @Controller クラスの例です。

@RestController
public class ChatController {

    private final AnthropicChatModel chatModel;

    @Autowired
    public ChatController(AnthropicChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
	public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}

手動構成

AnthropicChatModel [GitHub] (英語) は ChatModel と StreamingChatModel を実装し、低レベル AnthropicApi クライアントを使用して Anthropic サービスに接続します。

spring-ai-anthropic 依存関係をプロジェクトの Maven pom.xml ファイルに追加します。

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic</artifactId>
</dependency>

または、Gradle build.gradle ビルドファイルに保存します。

dependencies {
    implementation 'org.springframework.ai:spring-ai-anthropic'
}

Spring AI BOM をビルドファイルに追加するには、"依存関係管理" セクションを参照してください。

次に、AnthropicChatModel を作成し、テキスト生成に使用します。

var anthropicApi = new AnthropicApi(System.getenv("ANTHROPIC_API_KEY"));
var anthropicChatOptions = AnthropicChatOptions.builder()
            .model("claude-3-7-sonnet-20250219")
            .temperature(0.4)
            .maxTokens(200)
        .build()
var chatModel = AnthropicChatModel.builder().anthropicApi(anthropicApi)
                .defaultOptions(anthropicChatOptions).build();

ChatResponse response = this.chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

// Or with streaming responses
Flux<ChatResponse> response = this.chatModel.stream(
    new Prompt("Generate the names of 5 famous pirates."));

AnthropicChatOptions は、チャットリクエストの構成情報を提供します。AnthropicChatOptions.Builder は流れるようなオプションビルダーです。

低レベル AnthropicApi クライアント

AnthropicApi [GitHub] (英語) が提供するのは、Anthropic メッセージ API (英語) 用の軽量 Java クライアントです。

次のクラス図は、AnthropicApi チャットインターフェースと構成要素を示しています。

API をプログラムで使用する方法の簡単なスニペットを次に示します。

AnthropicApi anthropicApi =
    new AnthropicApi(System.getenv("ANTHROPIC_API_KEY"));

AnthropicMessage chatCompletionMessage = new AnthropicMessage(
        List.of(new ContentBlock("Tell me a Joke?")), Role.USER);

// Sync request
ResponseEntity<ChatCompletionResponse> response = this.anthropicApi
    .chatCompletionEntity(new ChatCompletionRequest(AnthropicApi.ChatModel.CLAUDE_3_OPUS.getValue(),
            List.of(this.chatCompletionMessage), null, 100, 0.8, false));

// Streaming request
Flux<StreamResponse> response = this.anthropicApi
    .chatCompletionStream(new ChatCompletionRequest(AnthropicApi.ChatModel.CLAUDE_3_OPUS.getValue(),
            List.of(this.chatCompletionMessage), null, 100, 0.8, true));

詳細については、AnthropicApi.java [GitHub] (英語) の JavaDoc を参照してください。

低レベル API の例

AnthropicApiIT.java [GitHub] (英語) テストでは、軽量ライブラリの使用方法の一般的な例をいくつか示します。