Docker モデルランナーチャット

Docker モデルランナー (英語) is an AI Inference Engine offering a wide range of models from various providers (英語) .

Spring AI は、既存の OpenAI を基盤とする ChatClient を再利用することで、Docker モデルランナーと統合します。これを行うには、ベース URL を localhost:12434/engines に設定し、提供されている LLM モデル (英語) のいずれかを選択します。

Spring AI で Docker モデルランナーを使用する方法の例については、DockerModelRunnerWithOpenAiChatModelIT.java [GitHub] (英語) テストを確認してください。

前提条件

Mac 4.40.0 用の Docker デスクトップをダウンロードしてください。

モデルランナーを有効にするには、次のいずれかのオプションを選択します。

オプション 1:

モデルランナー docker desktop enable model-runner --tcp 12434 を有効にします。
ベース URL を localhost:12434/engines に設定する

オプション 2:

モデルランナー docker desktop enable model-runner を有効にします。
Testcontainers を使用して、base-url を次のように設定します。

@Container
private static final SocatContainer socat = new SocatContainer().withTarget(80, "model-runner.docker.internal");

@Bean
public OpenAiApi chatCompletionApi() {
	var baseUrl = "http://%s:%d/engines".formatted(socat.getHost(), socat.getMappedPort(80));
	return OpenAiApi.builder().baseUrl(baseUrl).apiKey("test").build();
}

Docker モデルランナーの詳細については、Run LLMs Locally with Docker (英語) のブログ投稿を参照してください。

自動構成

The artifact IDs for Spring AI starter modules have been renamed since version 1.0.0.M7. Dependency names should now follow updated naming patterns for models, vector stores, and MCP starters. Please refer to the upgrade notes for more information.

Spring AI は、OpenAI チャットクライアント用の Spring Boot 自動構成を提供します。これを有効にするには、プロジェクトの Maven pom.xml ファイルに以下の依存関係を追加してください。

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

または、Gradle build.gradle ビルドファイルに次のコードを追加します。

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
}

Spring AI BOM をビルドファイルに追加するには、"依存関係管理" セクションを参照してください。

チャットのプロパティ

再試行プロパティ

プレフィックス spring.ai.retry は、OpenAI チャットモデルの再試行メカニズムを構成できるプロパティプレフィックスとして使用されます。

プロパティ説明デフォルト

プロパティ	説明	デフォルト
spring.ai.retry.max-attempts	再試行の最大回数。	10
spring.ai.retry.backoff.initial-interval	指数関数的バックオフポリシーの初期スリープ期間。	2 秒
spring.ai.retry.backoff.multiplier	バックオフ間隔の乗数。	5
spring.ai.retry.backoff.max-interval	最大バックオフ期間。	3 分
spring.ai.retry.on-client-errors	false の場合、NonTransientAiException をスローし、`4xx` クライアントエラーコードの再試行を試行しません。	false
spring.ai.retry.exclude-on-http-codes	再試行をトリガーすべきではない HTTP ステータスコードのリスト (NonTransientAiException をスローするなど)。	空
spring.ai.retry.on-http-codes	再試行をトリガーする必要がある HTTP ステータスコードのリスト (例: TransientAiException をスローする)。	空

spring.ai.retry.max-attempts

再試行の最大回数。

spring.ai.retry.backoff.initial-interval

指数関数的バックオフポリシーの初期スリープ期間。

2 秒

spring.ai.retry.backoff.multiplier

バックオフ間隔の乗数。

spring.ai.retry.backoff.max-interval

最大バックオフ期間。

3 分

spring.ai.retry.on-client-errors

false の場合、NonTransientAiException をスローし、4xx クライアントエラーコードの再試行を試行しません。

false

spring.ai.retry.exclude-on-http-codes

再試行をトリガーすべきではない HTTP ステータスコードのリスト (NonTransientAiException をスローするなど)。

空

spring.ai.retry.on-http-codes

再試行をトリガーする必要がある HTTP ステータスコードのリスト (例: TransientAiException をスローする)。

空

接続プロパティ

接頭辞 spring.ai.openai は、OpenAI への接続を可能にするプロパティ接頭辞として使用されます。

プロパティ説明デフォルト

プロパティ	説明	デフォルト
spring.ai.openai.base-url	接続先の URL。`hub.docker.com/u/ai (英語)` に設定する必要があります	-
spring.ai.openai.api-key	任意の文字列	-

spring.ai.openai.base-url

接続先の URL。hub.docker.com/u/ai (英語) に設定する必要があります

spring.ai.openai.api-key

任意の文字列

プロパティの構成

チャットの自動構成の有効化と無効化は、プレフィックス spring.ai.model.chat を持つトップレベルのプロパティを介して実行されるようになりました。

有効にするには、spring.ai.model.chat=openai (デフォルトで有効になっています)

無効にするには、spring.ai.model.chat=none (または openai と一致しない値)

この変更により、アプリケーション内で複数のモデルを構成できるようになります。

プレフィックス spring.ai.openai.chat は、OpenAI のチャットモデル実装を構成できるプロパティプレフィックスです。

プロパティ説明デフォルト

プロパティ	説明	デフォルト
spring.ai.openai.chat.enabled (削除され、無効になりました)	OpenAI チャットモデルを有効にします。	true
spring.ai.model.chat	OpenAI チャットモデルを有効にします。	開く
spring.ai.openai.chat.base-url	オプション: `spring.ai.openai.base-url` をオーバーライドしてチャット専用の URL を提供します。`localhost:12434/engines` に設定する必要があります	-
spring.ai.openai.chat.api-key	オプションで spring.ai.openai.api-key をオーバーライドしてチャット固有の API キーを提供します	-
spring.ai.openai.chat.options.model	使用する LLM model (英語)	-
spring.ai.openai.chat.options.temperature	The sampling temperature that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict.	0.8
spring.ai.openai.chat.options.frequencyPenalty	-2.0 から 2.0 までの数値。正の値を指定すると、これまでのテキスト内の既存の頻度に基づいて新しいトークンにペナルティが課され、モデルが同じ行をそのまま繰り返す可能性が低くなります。	0.0f
spring.ai.openai.chat.options.maxTokens	チャット補完で生成するトークンの最大数。入力トークンと生成されたトークンの合計の長さは、モデルのコンテキストの長さによって制限されます。	-
spring.ai.openai.chat.options.n	各入力メッセージに対して生成するチャット補完の選択肢の数。すべての選択肢にわたって生成されたトークンの数に基づいて料金が請求されることに注意してください。コストを最小限に抑えるために、n を 1 に保ちます。	1
spring.ai.openai.chat.options.presencePenalty	-2.0 から 2.0 までの数値。正の値を指定すると、これまでにテキストに出現したかどうかに基づいて新しいトークンにペナルティが課され、モデルが新しいトピックについて話す可能性が高まります。	-
spring.ai.openai.chat.options.responseFormat	モデルが出力する必要がある形式を指定するオブジェクト。`{ "type": "json_object" }` に設定すると JSON モードが有効になり、モデルが生成するメッセージが有効な JSON であることが保証されます。	-
spring.ai.openai.chat.options.seed	この機能はベータ版です。指定した場合、システムは、同じシードとパラメーターを使用した繰り返しリクエストが同じ結果を返すように、決定論的にサンプリングするために最善の努力をします。	-
spring.ai.openai.chat.options.stop	API がさらなるトークンの生成を停止する最大 4 つのシーケンス。	-
spring.ai.openai.chat.options.topP	核サンプリングと呼ばれる、温度によるサンプリングの代替方法。モデルは、top_p 確率質量を使用してトークンの結果を考慮します。0.1 は、上位 10% の確率質量を構成するトークンのみが考慮されることを意味します。通常、これまたは温度を変更することをお勧めしますが、両方を変更することは推奨しません。	-
spring.ai.openai.chat.options.tools	モデルが呼び出す可能性のあるツールのリスト。現在、ツールとしては関数のみがサポートされています。これを使用して、モデルが JSON 入力を生成する可能性のある関数のリストを提供します。	-
spring.ai.openai.chat.options.toolChoice	モデルによって呼び出される関数 (存在する場合) を制御します。none は、モデルが関数を呼び出さず、代わりにメッセージを生成することを意味します。auto は、モデルがメッセージを生成するか関数を呼び出すかを選択できることを意味します。{"type: "function" , "function" : {"name" : "my_function" }} で特定の関数を指定すると、モデルは強制的にその関数を呼び出します。関数が存在しない場合は none がデフォルトです。auto はデフォルトです。機能が存在します。	-
spring.ai.openai.chat.options.user	エンドユーザーを表す一意の識別子。OpenAI が不正使用を監視および検出できます。	-
spring.ai.openai.chat.options.functions	単一のプロンプトリクエストで関数呼び出しを有効にするために、名前で識別される関数のリスト。これらの名前を持つ関数は、functionCallbacks レジストリに存在する必要があります。	-
spring.ai.openai.chat.options.stream-usage	(ストリーミングのみ) リクエスト全体のトークン使用統計を含む追加のチャンクを追加するように設定します。このチャンクの `choices` フィールドは空の配列であり、他のすべてのチャンクにも使用状況フィールドが含まれますが、値は null になります。	false
spring.ai.openai.chat.options.proxy-tool-calls	true の場合、Spring AI は関数呼び出しを内部で処理せず、クライアントにプロキシします。関数呼び出しを処理し、適切な関数にディスパッチして、結果を返すのはクライアントの責任です。false (デフォルト) の場合、Spring AI は関数呼び出しを内部で処理します。関数呼び出しをサポートするチャットモデルにのみ適用されます。	false

spring.ai.openai.chat.enabled (削除され、無効になりました)

OpenAI チャットモデルを有効にします。

true

spring.ai.model.chat

OpenAI チャットモデルを有効にします。

開く

spring.ai.openai.chat.base-url

オプション: spring.ai.openai.base-url をオーバーライドしてチャット専用の URL を提供します。localhost:12434/engines に設定する必要があります

spring.ai.openai.chat.api-key

オプションで spring.ai.openai.api-key をオーバーライドしてチャット固有の API キーを提供します

spring.ai.openai.chat.options.model

使用する LLM model (英語)

spring.ai.openai.chat.options.temperature

The sampling temperature that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict.

0.8

spring.ai.openai.chat.options.frequencyPenalty

-2.0 から 2.0 までの数値。正の値を指定すると、これまでのテキスト内の既存の頻度に基づいて新しいトークンにペナルティが課され、モデルが同じ行をそのまま繰り返す可能性が低くなります。

0.0f

spring.ai.openai.chat.options.maxTokens

チャット補完で生成するトークンの最大数。入力トークンと生成されたトークンの合計の長さは、モデルのコンテキストの長さによって制限されます。

spring.ai.openai.chat.options.n

各入力メッセージに対して生成するチャット補完の選択肢の数。すべての選択肢にわたって生成されたトークンの数に基づいて料金が請求されることに注意してください。コストを最小限に抑えるために、n を 1 に保ちます。

spring.ai.openai.chat.options.presencePenalty

-2.0 から 2.0 までの数値。正の値を指定すると、これまでにテキストに出現したかどうかに基づいて新しいトークンにペナルティが課され、モデルが新しいトピックについて話す可能性が高まります。

spring.ai.openai.chat.options.responseFormat

モデルが出力する必要がある形式を指定するオブジェクト。{ "type": "json_object" } に設定すると JSON モードが有効になり、モデルが生成するメッセージが有効な JSON であることが保証されます。

spring.ai.openai.chat.options.seed

この機能はベータ版です。指定した場合、システムは、同じシードとパラメーターを使用した繰り返しリクエストが同じ結果を返すように、決定論的にサンプリングするために最善の努力をします。

spring.ai.openai.chat.options.stop

API がさらなるトークンの生成を停止する最大 4 つのシーケンス。

spring.ai.openai.chat.options.topP

核サンプリングと呼ばれる、温度によるサンプリングの代替方法。モデルは、top_p 確率質量を使用してトークンの結果を考慮します。0.1 は、上位 10% の確率質量を構成するトークンのみが考慮されることを意味します。通常、これまたは温度を変更することをお勧めしますが、両方を変更することは推奨しません。

spring.ai.openai.chat.options.tools

モデルが呼び出す可能性のあるツールのリスト。現在、ツールとしては関数のみがサポートされています。これを使用して、モデルが JSON 入力を生成する可能性のある関数のリストを提供します。

spring.ai.openai.chat.options.toolChoice

モデルによって呼び出される関数 (存在する場合) を制御します。none は、モデルが関数を呼び出さず、代わりにメッセージを生成することを意味します。auto は、モデルがメッセージを生成するか関数を呼び出すかを選択できることを意味します。{"type: "function" , "function" : {"name" : "my_function" }} で特定の関数を指定すると、モデルは強制的にその関数を呼び出します。関数が存在しない場合は none がデフォルトです。auto はデフォルトです。機能が存在します。

spring.ai.openai.chat.options.user

エンドユーザーを表す一意の識別子。OpenAI が不正使用を監視および検出できます。

spring.ai.openai.chat.options.functions

単一のプロンプトリクエストで関数呼び出しを有効にするために、名前で識別される関数のリスト。これらの名前を持つ関数は、functionCallbacks レジストリに存在する必要があります。

spring.ai.openai.chat.options.stream-usage

(ストリーミングのみ) リクエスト全体のトークン使用統計を含む追加のチャンクを追加するように設定します。このチャンクの choices フィールドは空の配列であり、他のすべてのチャンクにも使用状況フィールドが含まれますが、値は null になります。

false

spring.ai.openai.chat.options.proxy-tool-calls

true の場合、Spring AI は関数呼び出しを内部で処理せず、クライアントにプロキシします。関数呼び出しを処理し、適切な関数にディスパッチして、結果を返すのはクライアントの責任です。false (デフォルト) の場合、Spring AI は関数呼び出しを内部で処理します。関数呼び出しをサポートするチャットモデルにのみ適用されます。

false

spring.ai.openai.chat.options というプレフィックスが付いたすべてのプロパティは、リクエスト固有のランタイムオプションを Prompt 呼び出しに追加することで実行時にオーバーライドできます。

ランタイムオプション

OpenAiChatOptions.java [GitHub] (英語) は、使用するモデル、温度、周波数ペナルティなどのモデル構成を提供します。

起動時に、OpenAiChatModel(api, options) コンストラクターまたは spring.ai.openai.chat.options.* プロパティを使用してデフォルトのオプションを構成できます。

At run-time you can override the default options by adding new, request specific, options to the Prompt call. For example, to override the default model and temperature for a specific request:

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OpenAiChatOptions.builder()
            .model("ai/gemma3:4B-F16")
        .build()
    ));

モデル固有の OpenAiChatOptions [GitHub] (英語) に加えて、ChatOptions#builder() [GitHub] (英語) で作成されたポータブル ChatOptions [GitHub] (英語) インスタンスを使用できます。

関数呼び出し

Docker Model Runner supports Tool/Function calling when selecting a model that supports it.

You can register custom Java functions with your ChatModel and have the provided model intelligently choose to output a JSON object containing arguments to call one or many of the registered functions. This is a powerful technique for connecting the LLM capabilities with external tools and API.

ツールの例

Here’s a simple example of how to use Docker Model Runner function calling with Spring AI:

spring.ai.openai.api-key=test
spring.ai.openai.base-url=http://localhost:12434/engines
spring.ai.openai.chat.options.model=ai/gemma3:4B-F16

@SpringBootApplication
public class DockerModelRunnerLlmApplication {

    public static void main(String[] args) {
        SpringApplication.run(DockerModelRunnerLlmApplication.class, args);
    }

    @Bean
    CommandLineRunner runner(ChatClient.Builder chatClientBuilder) {
        return args -> {
            var chatClient = chatClientBuilder.build();

            var response = chatClient.prompt()
                .user("What is the weather in Amsterdam and Paris?")
                .functions("weatherFunction") // reference by bean name.
                .call()
                .content();

            System.out.println(response);
        };
    }

    @Bean
    @Description("Get the weather in location")
    public Function<WeatherRequest, WeatherResponse> weatherFunction() {
        return new MockWeatherService();
    }

    public static class MockWeatherService implements Function<WeatherRequest, WeatherResponse> {

        public record WeatherRequest(String location, String unit) {}
        public record WeatherResponse(double temp, String unit) {}

        @Override
        public WeatherResponse apply(WeatherRequest request) {
            double temperature = request.location().contains("Amsterdam") ? 20 : 25;
            return new WeatherResponse(temperature, request.unit);
        }
    }
}

In this example, when the model needs weather information, it will automatically call the weatherFunction bean, which can then fetch real-time weather data. The expected response is: "The weather in Amsterdam is currently 20 degrees Celsius, and the weather in Paris is currently 25 degrees Celsius."

OpenAI 関数呼び出しの詳細を参照してください。

サンプルコントローラー

新しい Spring Boot プロジェクトを作成し、spring-ai-starter-model-openai を pom (または gradle) の依存関係に追加します。

src/main/resources ディレクトリに application.properties ファイルを追加して、OpenAi チャットモデルを有効にして構成します。

spring.ai.openai.api-key=test
spring.ai.openai.base-url=http://localhost:12434/engines
spring.ai.openai.chat.options.model=ai/gemma3:4B-F16

# Docker Model Runner doesn't support embeddings, so we need to disable them.
spring.ai.openai.embedding.enabled=false

Here is an example of a simple @Controller class that uses the chat model for text generation.

@RestController
public class ChatController {

    private final OpenAiChatModel chatModel;

    @Autowired
    public ChatController(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
	public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}