Google GenAI Chat

The Google GenAI API lets developers build generative AI applications with Google's Gemini models through either the Gemini Developer API or Vertex AI. It accepts multimodal prompts as input and outputs text or code. A multimodal model can process information from multiple modalities, including images, video, and text. For example, you can send the model a photo of a plate of cookies and ask it for a recipe for those cookies.

Gemini is a family of generative AI models developed by Google and DeepMind and designed for multimodal use cases. The Gemini API gives you access to Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, all Gemini Pro models, up to and including the latest Gemini 3 Pro.

This implementation provides two authentication modes:

Gemini Developer API: uses an API key for quick prototyping and development
Vertex AI: uses Google Cloud credentials for production deployments with enterprise features

Gemini API Reference

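Prerequisites

Choose one of the following authentication methods:

Option 1: Gemini Developer API (API Key)

Obtain an API key from Google AI Studio
Set the API key as an environment variable or an application property

Option 2: Vertex AI (Google Cloud)

Install the gcloud CLI appropriate for your OS.
Run the following commands to authenticate. Replace PROJECT_ID with your Google Cloud project ID and ACCOUNT with your Google Cloud username.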

gcloud config set project <PROJECT_ID> &&
gcloud auth application-default login <ACCOUNT>

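Auto-configuration

There have been significant changes in the Spring AI auto-configuration and starter module artifact names. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Google GenAI chat client. To enable it, add the following dependency to your project's Maven pom.xml or Gradle build.gradle build file: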

Maven:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-google-genai</artifactId>
</dependency>

Gradle:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-google-genai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

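Chat Properties

Enabling and disabling of the chat auto-configuration is now configured via top-level properties with the prefix spring.ai.model.chat.

To enable: spring.ai.model.chat=google-genai (enabled by default)

To disable: spring.ai.model.chat=none (or any value that does not match google-genai)

This change was made to allow configuring multiple models.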
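The enable/disable rule above can be summarized in a small, self-contained sketch. ChatModelSelection and its method are hypothetical illustrations of the documented behavior, not Spring AI's actual auto-configuration condition:

```java
import java.util.Map;

// Hypothetical sketch of the documented rule; not Spring AI's real condition class.
final class ChatModelSelection {

    static final String PROPERTY = "spring.ai.model.chat";
    static final String GOOGLE_GENAI = "google-genai";

    // An absent property means the default (google-genai), so the model is enabled.
    // Any other value, including "none", disables the Google GenAI chat model.
    static boolean isGoogleGenAiChatEnabled(Map<String, String> properties) {
        String value = properties.getOrDefault(PROPERTY, GOOGLE_GENAI);
        return GOOGLE_GENAI.equals(value);
    }
}
```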

Connection Properties

The prefix spring.ai.google.genai is the property prefix used to connect to Google GenAI.

プロパティ	説明	デフォルト
spring.ai.model.chat	Enable the chat model client	google-genai
spring.ai.google.genai.api-key	API key for the Gemini Developer API. When provided, the client uses the Gemini Developer API instead of Vertex AI.	-
spring.ai.google.genai.project-id	Google Cloud Platform project ID (required in Vertex AI mode)	-
spring.ai.google.genai.location	Google Cloud region (required in Vertex AI mode)	-
spring.ai.google.genai.credentials-uri	URI of the Google Cloud credentials. When provided, it is used to create a `GoogleCredentials` instance for authentication.	-
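The precedence described in this table (an API key selects the Gemini Developer API; otherwise Vertex AI needs both a project ID and a location) can be sketched as follows. GenAiModeResolver is a hypothetical helper written for illustration, not a Spring AI class, and its error message is an assumption:

```java
// Hypothetical helper illustrating the documented precedence; not a Spring AI class.
final class GenAiModeResolver {

    enum Mode { GEMINI_DEVELOPER_API, VERTEX_AI }

    static Mode resolve(String apiKey, String projectId, String location) {
        // An API key always selects the Gemini Developer API.
        if (!isBlank(apiKey)) {
            return Mode.GEMINI_DEVELOPER_API;
        }
        // Without an API key, Vertex AI mode needs both a project ID and a location.
        if (isBlank(projectId) || isBlank(location)) {
            throw new IllegalStateException(
                "Set spring.ai.google.genai.api-key, or both project-id and location for Vertex AI");
        }
        return Mode.VERTEX_AI;
    }

    private static boolean isBlank(String value) {
        return value == null || value.isBlank();
    }
}
```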

Chat Model Properties

The prefix spring.ai.google.genai.chat is the property prefix that lets you configure the chat model implementation for Google GenAI Chat.

プロパティ	説明	デフォルト
spring.ai.google.genai.chat.options.model	Supported Google GenAI chat models include `gemini-2.0-flash`, `gemini-2.0-flash-lite`, `gemini-pro`, and `gemini-1.5-flash`.	`gemini-2.0-flash`
spring.ai.google.genai.chat.options.response-mime-type	Output response MIME type of the generated candidate text.	`text/plain` (default) for text output, or `application/json` for a JSON response.
spring.ai.google.genai.chat.options.google-search-retrieval	Use the Google Search grounding feature.	`true` or `false`, default `false`.
spring.ai.google.genai.chat.options.temperature	Controls the randomness of the output. Values can range over [0.0,1.0], inclusive. A value closer to 1.0 produces more varied responses, while a value closer to 0.0 typically produces less surprising responses from the model.	0.7
spring.ai.google.genai.chat.options.top-k	The maximum number of tokens to consider when sampling. The model uses combined top-k and nucleus sampling. Top-k sampling considers the set of topK most probable tokens.	-
spring.ai.google.genai.chat.options.top-p	The maximum cumulative probability of tokens to consider when sampling. The model uses combined top-k and nucleus sampling. Nucleus sampling considers the smallest set of tokens whose probability sum is at least topP.	-
spring.ai.google.genai.chat.options.candidate-count	The number of generated response messages to return. This value must be between [1, 8], inclusive. Defaults to 1.	1
spring.ai.google.genai.chat.options.max-output-tokens	The maximum number of tokens to generate.	-
spring.ai.google.genai.chat.options.frequency-penalty	Frequency penalties for reducing repetition.	-
spring.ai.google.genai.chat.options.presence-penalty	Presence penalties for reducing repetition.	-
spring.ai.google.genai.chat.options.thinking-budget	Thinking budget for the thinking process. See Thinking Configuration.	-
spring.ai.google.genai.chat.options.thinking-level	The level of thinking tokens the model should generate. Valid values: `LOW`, `HIGH`, `THINKING_LEVEL_UNSPECIFIED`. See Thinking Configuration.	-
spring.ai.google.genai.chat.options.include-thoughts	Enable thought signatures for function calling. Required for Gemini 3 Pro to avoid validation errors during the internal tool execution loop. See Thought Signatures.	false
spring.ai.google.genai.chat.options.tool-names	List of tools, identified by their names, to enable for function calling in a single prompt request. Tools with those names must exist in the ToolCallback registry.	-
spring.ai.google.genai.chat.options.tool-callbacks	Tool Callbacks to register with the ChatModel.	-
spring.ai.google.genai.chat.options.internal-tool-execution-enabled	If true, tool execution is performed; otherwise, the response from the model is returned to the user. Defaults to null, in which case `ToolCallingChatOptions.DEFAULT_TOOL_EXECUTION_ENABLED` (true) is used.	-
spring.ai.google.genai.chat.options.safety-settings	List of safety settings that control the safety filters, as defined by Google GenAI Safety Settings. Each safety setting can have a method, threshold, and category.	-
spring.ai.google.genai.chat.options.cached-content-name	The name of cached content to use for this request. When set along with `use-cached-content=true`, the cached content will be used as context. See Cached Content.	-
spring.ai.google.genai.chat.options.use-cached-content	Whether to use cached content if available. When true and `cached-content-name` is set, the system will use the cached content.	false
spring.ai.google.genai.chat.options.auto-cache-threshold	Automatically cache prompts that exceed this token threshold. When set, prompts larger than this value will be automatically cached for reuse. Set to null to disable auto-caching.	-
spring.ai.google.genai.chat.options.auto-cache-ttl	Time-to-live (Duration) for auto-cached content in ISO-8601 format (e.g., `PT1H` for 1 hour). Used when auto-caching is enabled.	PT1H
spring.ai.google.genai.chat.enable-cached-content	Enable the `GoogleGenAiCachedContentService` bean for managing cached content.	true
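To make the interplay between topK and topP concrete, here is a minimal sketch of combined top-k and nucleus filtering over a toy probability table. It illustrates the general technique under the assumption that top-k is applied before top-p; it is not Gemini's server-side sampling code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative top-k + nucleus (top-p) filtering; not Gemini's server-side code.
final class SamplingFilter {

    // Returns the tokens that survive top-k and then top-p, renormalized to sum to 1.
    static Map<String, Double> filter(Map<String, Double> probs, int topK, double topP) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(probs.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        // Top-k: keep only the k most probable tokens.
        List<Map.Entry<String, Double>> topKTokens = sorted.subList(0, Math.min(topK, sorted.size()));

        // Top-p: keep the smallest prefix whose cumulative probability reaches topP.
        Map<String, Double> kept = new LinkedHashMap<>();
        double cumulative = 0.0;
        for (Map.Entry<String, Double> entry : topKTokens) {
            kept.put(entry.getKey(), entry.getValue());
            cumulative += entry.getValue();
            if (cumulative >= topP) {
                break;
            }
        }

        // Renormalize the surviving probabilities.
        final double total = cumulative;
        kept.replaceAll((token, p) -> p / total);
        return kept;
    }
}
```

Lowering topK or topP shrinks the candidate set, which is why both settings tend to make output less varied.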

All properties prefixed with spring.ai.google.genai.chat.options can be overridden at runtime by adding request-specific runtime options to the Prompt call.

Runtime options

GoogleGenAiChatOptions.java provides model configurations such as temperature and topK.

On start-up, the default options can be configured with the GoogleGenAiChatModel(client, options) constructor or the spring.ai.google.genai.chat.options.* properties.

At runtime, you can override the default options by adding new, request-specific options to the Prompt call. For example, to override the default temperature for a specific request:

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        GoogleGenAiChatOptions.builder()
            .temperature(0.4)
            .build()
    ));
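Conceptually, a request-specific option replaces the startup default only for the fields it sets. The record below is a hypothetical stand-in for GoogleGenAiChatOptions that sketches this field-by-field merge; Spring AI's real merge logic may differ in detail:

```java
// Hypothetical stand-in for GoogleGenAiChatOptions, used only to sketch the merge.
record ChatOptionsSketch(String model, Double temperature, Integer topK) {

    // Each non-null field of the request options replaces the corresponding default.
    ChatOptionsSketch overriddenBy(ChatOptionsSketch request) {
        return new ChatOptionsSketch(
            request.model() != null ? request.model() : model,
            request.temperature() != null ? request.temperature() : temperature,
            request.topK() != null ? request.topK() : topK);
    }
}
```

With defaults of gemini-2.0-flash at temperature 0.7, a request that sets only temperature 0.4 keeps the default model and uses the new temperature.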

In addition to the model-specific GoogleGenAiChatOptions, you can use a portable ChatOptions instance, created with ChatOptions#builder().

Tool Calling

The Google GenAI model supports tool calling (function calling) capabilities, allowing models to use tools during conversations. Here’s an example of how to define and use @Tool-based tools:

public class WeatherService {

    @Tool(description = "Get the weather in location")
    public String weatherByLocation(@ToolParam(description= "City or state name") String location) {
        ...
    }
}

String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .tools(new WeatherService())
        .call()
        .content();

You can use the java.util.function beans as tools as well:

@Bean
@Description("Get the weather in location. Return temperature in 36°F or 36°C format.")
public Function<Request, Response> weatherFunction() {
    return new MockWeatherService();
}

String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .toolNames("weatherFunction")
        .inputType(Request.class)
        .call()
        .content();

Find more in Tools documentation.

Thinking Configuration

Gemini models support a "thinking" capability that allows the model to perform deeper reasoning before generating responses. This is controlled through the ThinkingConfig which includes three related options: thinkingBudget, thinkingLevel, and includeThoughts.

Thinking Level

The thinkingLevel option controls the depth of reasoning tokens the model generates. This is available for models that support thinking (e.g., Gemini 3 Pro Preview).

Value	Description
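`LOW`	Minimal thinking. Use for simple queries where speed matters more than detailed analysis.
`HIGH`	Extensive thinking. Use for complex problems that require deep analysis and step-by-step reasoning.
`THINKING_LEVEL_UNSPECIFIED`	The model uses its default behavior.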

Configuration via Properties

spring.ai.google.genai.chat.options.model=gemini-3-pro-preview
spring.ai.google.genai.chat.options.thinking-level=HIGH

Programmatic Configuration

import org.springframework.ai.google.genai.common.GoogleGenAiThinkingLevel;

ChatResponse response = chatModel.call(
    new Prompt(
        "Explain the theory of relativity in simple terms.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .thinkingLevel(GoogleGenAiThinkingLevel.HIGH)
            .build()
    ));

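Thinking Budget

The thinkingBudget option sets the token budget for the thinking process:

Positive value: maximum number of tokens for thinking (for example, 8192)
Zero (0): disables thinking entirely
Unset: the model decides automatically based on query complexity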
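As a quick reference, the budget values above, plus -1 (which the model support table lists as dynamic thinking for Gemini 2.5 models), can be interpreted as in this hypothetical helper. ThinkingBudgetRules is illustrative only and is not part of Spring AI:

```java
// Hypothetical helper restating how a thinkingBudget value is interpreted; not part of Spring AI.
final class ThinkingBudgetRules {

    static String describe(Integer thinkingBudget) {
        if (thinkingBudget == null) {
            return "unset: the model decides based on query complexity";
        }
        if (thinkingBudget == -1) {
            return "dynamic thinking"; // per the model support table (Gemini 2.5)
        }
        if (thinkingBudget == 0) {
            return "thinking disabled";
        }
        if (thinkingBudget > 0) {
            return "up to " + thinkingBudget + " thinking tokens";
        }
        throw new IllegalArgumentException("Unsupported thinkingBudget: " + thinkingBudget);
    }
}
```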

ChatResponse response = chatModel.call(
    new Prompt(
        "Solve this complex math problem step by step.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-2.5-pro")
            .thinkingBudget(8192)
            .build()
    ));

Option Compatibility

thinkingLevel and thinkingBudget are mutually exclusive. They cannot be used in the same request; doing so results in an API error.

Use thinkingLevel (LOW, HIGH) for Gemini 3 Pro models.
Use thinkingBudget (token count) for Gemini 2.5 series models.

includeThoughts can be combined with either thinkingLevel or thinkingBudget (but not with both at once).

// For Gemini 3 Pro: use thinkingLevel + includeThoughts
ChatResponse response = chatModel.call(
    new Prompt(
        "Analyze this complex scenario.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .thinkingLevel(GoogleGenAiThinkingLevel.HIGH)
            .includeThoughts(true)
            .build()
    ));

// For Gemini 2.5: use thinkingBudget + includeThoughts
ChatResponse gemini25Response = chatModel.call(
    new Prompt(
        "Analyze this complex scenario.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-2.5-pro")
            .thinkingBudget(8192)
            .includeThoughts(true)
            .build()
    ));
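If you want to catch invalid combinations before a request reaches the API, a client-side guard along these lines could mirror the compatibility rules above. ThinkingOptionsGuard, its nested enum (which only mirrors the documented level names), and the model-name prefix check are assumptions for illustration; Spring AI may not perform such a check itself:

```java
// Hypothetical client-side guard; the API reports its own error for invalid combinations.
final class ThinkingOptionsGuard {

    // Mirrors the documented level names; not the real GoogleGenAiThinkingLevel type.
    enum ThinkingLevel { LOW, HIGH, THINKING_LEVEL_UNSPECIFIED }

    static void validate(String model, ThinkingLevel level, Integer budget) {
        // thinkingLevel and thinkingBudget are mutually exclusive.
        if (level != null && budget != null) {
            throw new IllegalArgumentException(
                "thinkingLevel and thinkingBudget are mutually exclusive; set only one");
        }
        // Gemini 2.5 models take thinkingBudget; thinkingLevel causes an API error there.
        if (level != null && model.startsWith("gemini-2.5")) {
            throw new IllegalArgumentException("Use thinkingBudget with " + model);
        }
    }
}
```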

Model Support

Thinking configuration options vary by model:

Model	thinkingLevel	thinkingBudget	Notes
Gemini 3 Pro (Preview)	✅ Supported	⚠️ Backward compatibility only	Use `thinkingLevel`. Thinking cannot be disabled. Requires the global endpoint.
Gemini 2.5 Pro	❌ Not supported	✅ Supported	Use `thinkingBudget`. Set it to 0 to disable thinking, or to -1 for dynamic thinking.
Gemini 2.5 Flash	❌ Not supported	✅ Supported	Use `thinkingBudget`. Set it to 0 to disable thinking, or to -1 for dynamic thinking.
Gemini 2.5 Flash-Lite	❌ Not supported	✅ Supported	Thinking is disabled by default. Set `thinkingBudget` to enable it.
Gemini 2.0 Flash	❌ Not supported	❌ Not supported	Thinking is not available.

Using thinkingLevel with unsupported models (such as Gemini 2.5 and earlier) results in an API error.
Gemini 3 Pro Preview is available only on the global endpoint. Set spring.ai.google.genai.location=global or GOOGLE_CLOUD_LOCATION=global.
Check the Google GenAI Thinking documentation for the latest model capabilities.

Enabling thinking increases token usage and API costs. Use it judiciously, based on query complexity.

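Thought Signatures

Gemini 3 Pro introduces thought signatures: opaque byte arrays that preserve the model's reasoning context during function calling. When includeThoughts is enabled, the model returns thought signatures, which must be passed back within the same turn during the internal tool execution loop.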

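When Thought Signatures Matter

IMPORTANT: Thought signature validation applies only to the current turn, specifically to the internal tool execution loop in which the model makes function calls (both parallel and sequential). The API does not validate thought signatures from previous turns in the conversation history.

According to Google's documentation:

Validation is enforced only for function calls within the current turn
Signatures from previous turns do not need to be preserved
Missing signatures on the current turn's function calls cause HTTP 400 errors with Gemini 3 Pro
For parallel function calls, only the first functionCall part carries a signature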
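The parallel-call rule can be pictured with a simplified model of one step of a turn. FunctionCallPart and SignatureCheck are hypothetical types, not the com.google.genai classes, and the check covers only the "first part carries the signature" rule listed above:

```java
import java.util.List;

// Simplified, hypothetical model of one model response step; not the com.google.genai types.
final class SignatureCheck {

    record FunctionCallPart(String functionName, byte[] thoughtSignature) {}

    // In a parallel batch only the first functionCall part carries the signature,
    // so a well-formed step has a non-null signature on its first part.
    static boolean firstCallIsSigned(List<FunctionCallPart> calls) {
        return !calls.isEmpty() && calls.get(0).thoughtSignature() != null;
    }
}
```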

For Gemini 2.5 Pro and earlier models, thought signatures are optional and the API is lenient.

Configuration

Enable thought signatures with configuration properties:

spring.ai.google.genai.chat.options.model=gemini-3-pro-preview
spring.ai.google.genai.chat.options.include-thoughts=true

Or programmatically at runtime:

ChatResponse response = chatModel.call(
    new Prompt(
        "Your question here",
        GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .includeThoughts(true)
            .toolCallbacks(callbacks)
            .build()
    ));

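Automatic Handling

Spring AI handles thought signatures automatically during the internal tool execution loop. When internalToolExecutionEnabled is true (the default), Spring AI:

Extracts thought signatures from model responses
Attaches them to the correct functionCall parts when returning function responses
Propagates them correctly across function calls within a single turn (both parallel and sequential)

You do not need to manage thought signatures manually. Spring AI ensures that they are properly attached to functionCall parts, as the API specification requires.

Function Calling Example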

@Bean
@Description("Get the weather in a location")
public Function<WeatherRequest, WeatherResponse> weatherFunction() {
    return new WeatherService();
}

// Enable includeThoughts for Gemini 3 Pro with function calling
String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .options(GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .includeThoughts(true)
            .build())
        .toolNames("weatherFunction")
        .call()
        .content();

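Manual Tool Execution Mode

If you set internalToolExecutionEnabled=false to control the tool execution loop manually, you must handle thought signatures yourself when using Gemini 3 Pro with includeThoughts=true.

Requirements for manual tool execution with thought signatures:

Extract thought signatures from the response metadata: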

AssistantMessage assistantMessage = response.getResult().getOutput();
Map<String, Object> metadata = assistantMessage.getMetadata();
List<byte[]> thoughtSignatures = (List<byte[]>) metadata.get("thoughtSignatures");

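When returning function responses, include the original AssistantMessage, with its metadata intact, in the message history. Spring AI automatically attaches the thought signatures to the correct functionCall parts.
For Gemini 3 Pro, failing to preserve thought signatures during the current turn results in an HTTP 400 error from the API.

Only function calls in the current turn require thought signatures. When you start a new conversation turn after a function calling round completes, you do not need to retain the previous turn's signatures.

Enabling includeThoughts increases token usage because the thought process is included in the response. This raises API costs but improves reasoning transparency.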

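Multimodal

Multimodality refers to a model's ability to simultaneously understand and process information from various (input) sources, including text, PDF, images, audio, and other data formats.

Image, Audio, Video

Google's Gemini AI models support this capability by understanding and integrating text, code, audio, images, and video. For more details, see the blog post Introducing Gemini.

Spring AI's Message interface supports multimodal AI models by introducing the Media type. This type contains data and information about media attachments in messages, using Spring's org.springframework.util.MimeType and a java.lang.Object for the raw media data.

Below is a simple code example extracted from GoogleGenAiChatModelIT.java, demonstrating the combination of user text with an image.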

byte[] data = new ClassPathResource("/vertex-test.png").getContentAsByteArray();

var userMessage = UserMessage.builder()
    .text("Explain what do you see on this picture?")
    .media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, data)))
    .build();

// userMessage is a local variable, so it is referenced directly (not via this.)
ChatResponse response = chatModel.call(new Prompt(List.of(userMessage)));

PDF

Google GenAI supports the PDF input type. Use the application/pdf media type to attach a PDF file to a message:

var pdfData = new ClassPathResource("/spring-ai-reference-overview.pdf");

var userMessage = UserMessage.builder()
			.text("You are a very professional document summarization specialist. Please summarize the given document.")
			.media(List.of(new Media(new MimeType("application", "pdf"), pdfData)))
			.build();

var response = this.chatModel.call(new Prompt(List.of(userMessage)));

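Cached Content

Google GenAI's context caching lets you cache large amounts of content (such as long documents, code repositories, or media) and reuse it across multiple requests. This can significantly reduce API costs and improve response latency for repeated queries over the same content.

Benefits

Cost reduction: cached tokens are billed at a much lower rate than regular input tokens (typically 75-90% cheaper)
Improved performance: reusing cached content reduces processing time for large contexts
Consistency: reusing the same cached context helps keep responses consistent across multiple requests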
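A back-of-the-envelope estimate shows how the discount adds up when the same context is sent many times. The price per million tokens and the discount are placeholder inputs that you supply, not Google's published rates, and the sketch ignores any storage charges for keeping the cache alive:

```java
// Back-of-the-envelope estimate; prices and discount are caller-supplied placeholders.
final class CacheCostEstimate {

    // Input cost of sending the same context with every request, without caching.
    static double uncachedCost(long contextTokens, int requests, double pricePerMillionTokens) {
        return contextTokens * (double) requests * pricePerMillionTokens / 1_000_000;
    }

    // Same workload when the context is billed at a discounted cached-token rate
    // (for example, discount = 0.75 for "75% cheaper"). Storage charges are ignored.
    static double cachedCost(long contextTokens, int requests, double pricePerMillionTokens, double discount) {
        return uncachedCost(contextTokens, requests, pricePerMillionTokens) * (1 - discount);
    }
}
```

For example, a 1,000,000-token context sent 10 times at a placeholder price of 1.0 per million tokens costs 10.0 uncached and 2.5 with a 75% discount.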

Cache Requirements

Minimum cache size: 32,768 tokens (approximately 25,000 words)
Cache lifetime: 1 hour by default (configurable via the TTL)
Cached content must include either system instructions or conversation history
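Before creating a cache, you might check these requirements in your own code. CacheRequirements is a hypothetical helper built on java.time; its constants simply restate the values listed above:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical pre-flight checks restating the cache requirements listed above.
final class CacheRequirements {

    static final long MIN_CACHE_TOKENS = 32_768;
    static final Duration DEFAULT_TTL = Duration.ofHours(1);

    // Content below the minimum size cannot be cached.
    static boolean meetsMinimumSize(long tokenCount) {
        return tokenCount >= MIN_CACHE_TOKENS;
    }

    // Expiry instant for a cache created at createdAt; a null TTL falls back to 1 hour.
    static Instant expiresAt(Instant createdAt, Duration ttl) {
        return createdAt.plus(ttl != null ? ttl : DEFAULT_TTL);
    }
}
```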

Using the Cached Content Service

Spring AI provides GoogleGenAiCachedContentService for programmatic cache management. The service is configured automatically when you use Spring Boot auto-configuration.

Creating Cached Content

@Autowired
private GoogleGenAiCachedContentService cachedContentService;

// Create cached content with a large document
String largeDocument = "... your large context here (>32k tokens) ...";

CachedContentRequest request = CachedContentRequest.builder()
    .model("gemini-2.0-flash")
    .contents(List.of(
        Content.builder()
            .role("user")
            .parts(List.of(Part.fromText(largeDocument)))
            .build()
    ))
    .displayName("My Large Document Cache")
    .ttl(Duration.ofHours(1))
    .build();

GoogleGenAiCachedContent cachedContent = cachedContentService.create(request);
String cacheName = cachedContent.getName(); // Save this for reuse

Using Cached Content in Chat Requests

Once you have created cached content, reference it in your chat requests:

ChatResponse response = chatModel.call(
    new Prompt(
        "Summarize the key points from the document",
        GoogleGenAiChatOptions.builder()
            .useCachedContent(true)
            .cachedContentName(cacheName) // Use the cached content name
            .build()
    ));

Or via configuration properties:

spring.ai.google.genai.chat.options.use-cached-content=true
spring.ai.google.genai.chat.options.cached-content-name=cachedContent/your-cache-name

Managing Cached Content

GoogleGenAiCachedContentService provides comprehensive cache management:

// Retrieve cached content
GoogleGenAiCachedContent content = cachedContentService.get(cacheName);

// Update cache TTL
CachedContentUpdateRequest updateRequest = CachedContentUpdateRequest.builder()
    .ttl(Duration.ofHours(2))
    .build();
GoogleGenAiCachedContent updated = cachedContentService.update(cacheName, updateRequest);

// List all cached content
List<GoogleGenAiCachedContent> allCaches = cachedContentService.listAll();

// Delete cached content
boolean deleted = cachedContentService.delete(cacheName);

// Extend cache TTL
GoogleGenAiCachedContent extended = cachedContentService.extendTtl(cacheName, Duration.ofMinutes(30));

// Cleanup expired caches
int removedCount = cachedContentService.cleanupExpired();

Asynchronous Operations

All operations have asynchronous variants:

CompletableFuture<GoogleGenAiCachedContent> futureCache =
    cachedContentService.createAsync(request);

CompletableFuture<GoogleGenAiCachedContent> futureGet =
    cachedContentService.getAsync(cacheName);

CompletableFuture<Boolean> futureDelete =
    cachedContentService.deleteAsync(cacheName);

Auto-Caching

Spring AI can automatically cache large prompts that exceed a specified token threshold:

# Automatically cache prompts larger than 100,000 tokens
spring.ai.google.genai.chat.options.auto-cache-threshold=100000
# Set auto-cache TTL to 1 hour
spring.ai.google.genai.chat.options.auto-cache-ttl=PT1H

Or programmatically:

ChatResponse response = chatModel.call(
    new Prompt(
        largePrompt,
        GoogleGenAiChatOptions.builder()
            .autoCacheThreshold(100000)
            .autoCacheTtl(Duration.ofHours(1))
            .build()
    ));

Auto-caching is convenient for one-off large contexts. If you reuse the same context repeatedly, creating and referencing cached content manually is more efficient.
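The auto-caching rule can be summarized as a simple decision. AutoCachePolicy is a hypothetical class that assumes "larger than the threshold" means strictly greater, that a null threshold disables auto-caching, and that a missing TTL falls back to the PT1H default:

```java
import java.time.Duration;

// Hypothetical decision rule for the auto-cache options; not Spring AI's implementation.
final class AutoCachePolicy {

    private final Integer threshold; // null disables auto-caching
    private final Duration ttl;

    AutoCachePolicy(Integer threshold, Duration ttl) {
        this.threshold = threshold;
        this.ttl = ttl != null ? ttl : Duration.ofHours(1); // PT1H default
    }

    // Cache only prompts whose token count exceeds the configured threshold.
    boolean shouldCache(int promptTokens) {
        return threshold != null && promptTokens > threshold;
    }

    Duration ttl() {
        return ttl;
    }
}
```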

Monitoring Cache Usage

Cached content includes usage metadata that you can access through the service:

GoogleGenAiCachedContent content = cachedContentService.get(cacheName);

// Check if cache is expired
boolean expired = content.isExpired();

// Get remaining TTL
Duration remaining = content.getRemainingTtl();

// Get usage metadata
CachedContentUsageMetadata metadata = content.getUsageMetadata();
if (metadata != null) {
    System.out.println("Total tokens: " + metadata.totalTokenCount().orElse(0));
}
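To illustrate what isExpired(), getRemainingTtl() and cleanupExpired() conceptually compute, here is a hypothetical in-memory registry keyed by cache name. It is not how GoogleGenAiCachedContentService is implemented; it only shows the time arithmetic:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory registry illustrating expiry, remaining TTL and cleanup.
final class CacheRegistrySketch {

    private final Map<String, Instant> expiryByName = new HashMap<>();

    void register(String name, Instant expiresAt) {
        expiryByName.put(name, expiresAt);
    }

    static boolean isExpired(Instant expiresAt, Instant now) {
        return !expiresAt.isAfter(now);
    }

    // Remaining lifetime, clamped to zero once the cache has expired.
    static Duration remainingTtl(Instant expiresAt, Instant now) {
        Duration remaining = Duration.between(now, expiresAt);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    // Drops every expired entry and returns how many were removed.
    int cleanupExpired(Instant now) {
        int before = expiryByName.size();
        expiryByName.values().removeIf(expiresAt -> isExpired(expiresAt, now));
        return before - expiryByName.size();
    }
}
```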

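Best Practices

Cache duration: choose a TTL that fits your use case, shorter for frequently changing content and longer for static content.
Cache naming: use descriptive display names so that cached content is easy to identify.
Cleanup: periodically clean up expired caches to keep things organized.
Token threshold: only cache content that meets the minimum threshold (32,768 tokens).
Cost optimization: reuse cached content across multiple requests to maximize savings.

Configuration Example

A complete configuration example: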

# Enable cached content service (enabled by default)
spring.ai.google.genai.chat.enable-cached-content=true

# Use a specific cached content
spring.ai.google.genai.chat.options.use-cached-content=true
spring.ai.google.genai.chat.options.cached-content-name=cachedContent/my-cache-123

# Auto-caching configuration
spring.ai.google.genai.chat.options.auto-cache-threshold=50000
spring.ai.google.genai.chat.options.auto-cache-ttl=PT30M

Sample Controller

Create a new Spring Boot project and add spring-ai-starter-model-google-genai to your pom (or gradle) dependencies.

Add an application.properties file under the src/main/resources directory to enable and configure the Google GenAI chat model:

Using the Gemini Developer API (API Key)

spring.ai.google.genai.api-key=YOUR_API_KEY
spring.ai.google.genai.chat.options.model=gemini-2.0-flash
spring.ai.google.genai.chat.options.temperature=0.5

Using Vertex AI

spring.ai.google.genai.project-id=PROJECT_ID
spring.ai.google.genai.location=LOCATION
spring.ai.google.genai.chat.options.model=gemini-2.0-flash
spring.ai.google.genai.chat.options.temperature=0.5

Replace PROJECT_ID with your Google Cloud project ID, and LOCATION with a Google Cloud region such as us-central1 or europe-west1.

Each model has its own set of supported regions; see the model's page for the list of supported regions.
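For reference, the location setting maps to a Vertex AI service host. The helper below is a hypothetical sketch that assumes Google's published host naming ({region}-aiplatform.googleapis.com for regional endpoints and aiplatform.googleapis.com for the global endpoint); the Google GenAI SDK resolves the endpoint for you:

```java
// Hypothetical helper; the SDK performs its own endpoint resolution.
final class VertexEndpoint {

    static String host(String location) {
        if (location == null || location.isBlank()) {
            throw new IllegalArgumentException("location is required in Vertex AI mode");
        }
        // The global endpoint has no region prefix; regional endpoints do.
        return "global".equals(location)
            ? "aiplatform.googleapis.com"
            : location + "-aiplatform.googleapis.com";
    }
}
```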

This creates a GoogleGenAiChatModel implementation that you can inject into your classes. Here is an example of a simple @RestController class that uses the chat model for text generation.

@RestController
public class ChatController {

    private final GoogleGenAiChatModel chatModel;

    @Autowired
    public ChatController(GoogleGenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}

Manual Configuration

GoogleGenAiChatModel implements ChatModel and uses com.google.genai.Client to connect to the Google GenAI service.

Add the spring-ai-google-genai dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-google-genai</artifactId>
</dependency>

Or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-google-genai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create a GoogleGenAiChatModel and use it for text generation:

Using an API Key

Client genAiClient = Client.builder()
    .apiKey(System.getenv("GOOGLE_API_KEY"))
    .build();

var chatModel = new GoogleGenAiChatModel(genAiClient,
    GoogleGenAiChatOptions.builder()
        .model(ChatModel.GEMINI_2_0_FLASH)
        .temperature(0.4)
    .build());

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

Using Vertex AI

Client genAiClient = Client.builder()
    .project(System.getenv("GOOGLE_CLOUD_PROJECT"))
    .location(System.getenv("GOOGLE_CLOUD_LOCATION"))
    .vertexAI(true)
    .build();

var chatModel = new GoogleGenAiChatModel(genAiClient,
    GoogleGenAiChatOptions.builder()
        .model(ChatModel.GEMINI_2_0_FLASH)
        .temperature(0.4)
    .build());

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

GoogleGenAiChatOptions provides the configuration information for chat requests. GoogleGenAiChatOptions.Builder is a fluent options builder.

Migrating from Vertex AI Gemini

If you currently use the Vertex AI Gemini implementation (spring-ai-vertex-ai-gemini), you can migrate to Google GenAI with minimal changes:

Key Differences

SDK: Google GenAI uses the new com.google.genai.Client instead of com.google.cloud.vertexai.VertexAI
Authentication: supports both API keys and Google Cloud credentials
Package names: classes live in org.springframework.ai.google.genai instead of org.springframework.ai.vertexai.gemini
Property prefix: use spring.ai.google.genai instead of spring.ai.vertex.ai.gemini
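Because the prefix change is mechanical, renaming keys in an existing properties map can be sketched as below. PropertyPrefixMigration is hypothetical; only the prefix is rewritten, so verify each resulting key against the property tables, since individual option names may differ between the two modules:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: rewrites the old property prefix and leaves other keys as-is.
final class PropertyPrefixMigration {

    static final String OLD_PREFIX = "spring.ai.vertex.ai.gemini";
    static final String NEW_PREFIX = "spring.ai.google.genai";

    static Map<String, String> migrate(Map<String, String> properties) {
        Map<String, String> migrated = new LinkedHashMap<>();
        properties.forEach((key, value) -> {
            // Match the exact prefix or prefix + "." so unrelated keys are untouched.
            boolean matches = key.equals(OLD_PREFIX) || key.startsWith(OLD_PREFIX + ".");
            String newKey = matches ? NEW_PREFIX + key.substring(OLD_PREFIX.length()) : key;
            migrated.put(newKey, value);
        });
        return migrated;
    }
}
```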

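When to Use Google GenAI vs. Vertex AI Gemini

Use Google GenAI when you:
want to prototype quickly with API keys
need the latest Gemini features from the Developer API
want the flexibility to switch between API key and Vertex AI modes

Use Vertex AI Gemini when you:
have existing Vertex AI infrastructure
need specific Vertex AI enterprise features
are required by your organization to deploy on Google Cloud only

Low-level Java Client

The Google GenAI implementation is built on the new Google GenAI Java SDK, which provides a modern, streamlined API for accessing Gemini models.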