OpenSearch

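This section walks you through setting up OpenSearchVectorStore to store document embeddings and perform similarity searches.

OpenSearch is an open-source search and analytics engine, originally forked from Elasticsearch and distributed under the Apache License 2.0. It enhances AI application development by simplifying the integration and management of AI-generated assets. OpenSearch supports vector, lexical, and hybrid search, and uses advanced vector database capabilities to enable low-latency queries and similarity searches, as detailed on the vector database page.

The OpenSearch k-NN functionality lets users query vector embeddings from large datasets. An embedding is a numerical representation of a data object such as text, an image, audio, or a document. Embeddings can be stored in an index and queried using various similarity functions.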

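Prerequisites

A running OpenSearch instance. The following options are available:
- Self-Managed OpenSearch
- Amazon OpenSearch Service
If required, an API key for the EmbeddingModel to generate the embeddings stored by the OpenSearchVectorStore.

Auto-configuration

There has been a significant change in the Spring AI auto-configuration and starter module artifact names. See the upgrade notes for details.

Spring AI provides Spring Boot auto-configuration for the OpenSearch vector store. To enable it, add the following dependency to your project's Maven pom.xml file: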

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-opensearch</artifactId>
</dependency>

Or add the following to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-opensearch'
}

Refer to the "Dependency Management" section to add the Spring AI BOM to your build file.

For Amazon OpenSearch Service, use the following dependency instead:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-opensearch</artifactId>
</dependency>

Or for Gradle:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-opensearch'
}

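See the list of configuration parameters for the vector store to learn about the default values and configuration options.

Additionally, you need a configured EmbeddingModel bean. Refer to the "EmbeddingModel" section for details.

You can now auto-wire the OpenSearchVectorStore as a vector store in your application: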

@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to OpenSearch
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());

Configuration Properties

To connect to OpenSearch and use the OpenSearchVectorStore, you need to provide access details for your instance. A simple configuration can be provided through Spring Boot's application.yml:

spring:
  ai:
    vectorstore:
      opensearch:
        uris: <opensearch instance URIs>
        username: <opensearch username>
        password: <opensearch password>
        index-name: spring-ai-document-index
        initialize-schema: true
        similarity-function: cosinesimil
        read-timeout: <time to wait for response>
        connect-timeout: <time to wait until connection established>
        path-prefix: <custom path prefix>
        ssl-bundle: <name of SSL bundle>
        aws:  # Only for Amazon OpenSearch Service
          host: <aws opensearch host>
          service-name: <aws service name>
          access-key: <aws access key>
          secret-key: <aws secret key>
          region: <aws region>

Properties starting with spring.ai.vectorstore.opensearch.* are used to configure the OpenSearchVectorStore:

Property	Description	Default Value
`spring.ai.vectorstore.opensearch.uris`	URIs of the OpenSearch cluster endpoints	-
`spring.ai.vectorstore.opensearch.username`	Username for accessing the OpenSearch cluster	-
`spring.ai.vectorstore.opensearch.password`	Password for the specified username	-
`spring.ai.vectorstore.opensearch.index-name`	Name of the index that stores the vectors	`spring-ai-document-index`
`spring.ai.vectorstore.opensearch.initialize-schema`	Whether to initialize the required schema	`false`
`spring.ai.vectorstore.opensearch.similarity-function`	Similarity function to use	`cosinesimil`
`spring.ai.vectorstore.opensearch.read-timeout`	Time to wait for a response from the opposite endpoint. 0 - infinity.	-
`spring.ai.vectorstore.opensearch.connect-timeout`	Time to wait until the connection is established. 0 - infinity.	-
`spring.ai.vectorstore.opensearch.path-prefix`	Path prefix for the OpenSearch API endpoints. Useful when OpenSearch is behind a reverse proxy with a non-root path.	-
`spring.ai.vectorstore.opensearch.ssl-bundle`	Name of the SSL bundle to use for SSL connections	-
`spring.ai.vectorstore.opensearch.aws.host`	Hostname of the OpenSearch instance	-
`spring.ai.vectorstore.opensearch.aws.service-name`	AWS service name	-
`spring.ai.vectorstore.opensearch.aws.access-key`	AWS access key	-
`spring.ai.vectorstore.opensearch.aws.secret-key`	AWS secret key	-
`spring.ai.vectorstore.opensearch.aws.region`	AWS region	-

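You can use the spring.ai.vectorstore.opensearch.aws.enabled property to control whether the AWS-specific OpenSearch auto-configuration is enabled.

- When this property is set to false, the non-AWS OpenSearch configuration is activated even if AWS SDK classes are present on the classpath. This lets you use self-managed or third-party OpenSearch clusters in environments where the AWS SDK is present for other services.
- If AWS SDK classes are not present, the non-AWS configuration is always used.
- If AWS SDK classes are present and the property is unset or set to true, the AWS-specific configuration is used by default.

This fallback logic gives users explicit control over the type of OpenSearch integration and prevents AWS-specific logic from being activated accidentally when it is not wanted.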
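
The selection rules for the AWS and non-AWS configurations can be summarized as a small decision function. The following is only a sketch of the documented behavior, not Spring AI's actual auto-configuration code; the class name OpenSearchConfigSelector, the Mode enum, and the select method are hypothetical names introduced for illustration.

```java
/** Illustrative helper that mirrors the documented fallback rules (not part of Spring AI). */
final class OpenSearchConfigSelector {

    /** Which flavor of OpenSearch configuration would be activated. */
    enum Mode { AWS, NON_AWS }

    /**
     * @param awsSdkOnClasspath whether AWS SDK classes are present on the classpath
     * @param awsEnabled        value of spring.ai.vectorstore.opensearch.aws.enabled,
     *                          or null when the property is not set
     */
    static Mode select(boolean awsSdkOnClasspath, Boolean awsEnabled) {
        if (!awsSdkOnClasspath) {
            return Mode.NON_AWS; // no AWS SDK: the non-AWS configuration is always used
        }
        if (Boolean.FALSE.equals(awsEnabled)) {
            return Mode.NON_AWS; // explicit opt-out even though the SDK is present
        }
        return Mode.AWS; // SDK present and property unset or true
    }

    public static void main(String[] args) {
        System.out.println(select(true, null));   // SDK present, property unset
        System.out.println(select(true, false));  // SDK present, explicitly disabled
        System.out.println(select(false, true));  // SDK absent
    }
}
```

In an application that uses the AWS SDK for other services but talks to a self-managed cluster, the second case is the relevant one: setting the property to false keeps the non-AWS configuration active.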

The path-prefix property lets you specify a custom path prefix when OpenSearch runs behind a reverse proxy that uses a non-root path. For example, if your OpenSearch instance is reachable at example.com/opensearch/ instead of example.com/, set path-prefix: /opensearch.
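
To make the effect of a path prefix concrete, the sketch below shows how a prefix would sit between the base URI and an API path. It is a plain-Java illustration under the assumption that the prefix is simply prepended to every request path; the PathPrefixExample class and its resolve method are hypothetical and do not reflect how the OpenSearch client implements this internally.

```java
import java.net.URI;

/** Illustrative composition of base URI + path prefix + API path (hypothetical helper). */
final class PathPrefixExample {

    static URI resolve(String baseUri, String pathPrefix, String apiPath) {
        // Drop a trailing slash from the base so the parts join cleanly.
        String base = baseUri.endsWith("/") ? baseUri.substring(0, baseUri.length() - 1) : baseUri;

        // Normalize the prefix to the form "/segment" (or empty when not configured).
        String prefix = (pathPrefix == null) ? "" : pathPrefix.trim();
        if (!prefix.isEmpty() && !prefix.startsWith("/")) {
            prefix = "/" + prefix;
        }
        if (prefix.endsWith("/")) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }

        String path = apiPath.startsWith("/") ? apiPath : "/" + apiPath;
        return URI.create(base + prefix + path);
    }

    public static void main(String[] args) {
        // With path-prefix: /opensearch, requests are routed through the proxy path.
        System.out.println(resolve("https://example.com", "/opensearch", "/_cluster/health"));
        // Without a prefix, requests go to the root path.
        System.out.println(resolve("https://example.com", null, "/_cluster/health"));
    }
}
```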

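The following similarity functions are available:

- cosinesimil - Default, suitable for most use cases. Measures the cosine similarity between vectors.
- l1 - Manhattan distance between vectors.
- l2 - Euclidean distance between vectors.
- linf - Chebyshev distance between vectors.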
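
For readers who want to see what these four measures compute, here is a minimal plain-Java sketch of the underlying formulas. It shows the raw mathematical definitions only; the scores that OpenSearch actually returns may be transformed from these values, and the VectorSimilarity class is a hypothetical name used for illustration.

```java
/** Reference formulas for the four similarity functions (illustrative only). */
final class VectorSimilarity {

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
    }

    /** cosinesimil: cosine of the angle between a and b (NaN if either vector is all zeros). */
    static double cosineSimilarity(double[] a, double[] b) {
        requireSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** l1: Manhattan distance, the sum of absolute component differences. */
    static double l1(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** l2: Euclidean distance, the square root of the sum of squared differences. */
    static double l2(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** linf: Chebyshev distance, the largest absolute component difference. */
    static double linf(double[] a, double[] b) {
        requireSameLength(a, b);
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 6.0, 3.0};
        System.out.println("cosine = " + cosineSimilarity(a, b));
        System.out.println("l1     = " + l1(a, b));   // 3 + 4 + 0 = 7.0
        System.out.println("l2     = " + l2(a, b));   // sqrt(9 + 16) = 5.0
        System.out.println("linf   = " + linf(a, b)); // max(3, 4, 0) = 4.0
    }
}
```

Note that cosine similarity grows as vectors become more alike, while the three distance measures shrink, so the two kinds of values should not be compared directly.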

Manual Configuration

Instead of using the Spring Boot auto-configuration, you can configure the OpenSearch vector store manually. To do so, add spring-ai-opensearch-store to your project:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-opensearch-store</artifactId>
</dependency>

Or add the following to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-opensearch-store'
}

Refer to the "Dependency Management" section to add the Spring AI BOM to your build file.

Create an OpenSearch client bean:

@Bean
public OpenSearchClient openSearchClient() {
    RestClient restClient = RestClient.builder(
        HttpHost.create("http://localhost:9200"))
        .build();

    return new OpenSearchClient(new RestClientTransport(
        restClient, new JacksonJsonpMapper()));
}

Then create the OpenSearchVectorStore bean using the builder pattern:

@Bean
public VectorStore vectorStore(OpenSearchClient openSearchClient, EmbeddingModel embeddingModel) {
    return OpenSearchVectorStore.builder(openSearchClient, embeddingModel)
        .index("custom-index")                // Optional: defaults to "spring-ai-document-index"
        .similarityFunction("l2")             // Optional: defaults to "cosinesimil"
        .initializeSchema(true)               // Optional: defaults to false
        .batchingStrategy(new TokenCountBatchingStrategy()) // Optional: defaults to TokenCountBatchingStrategy
        .build();
}

// This can be any EmbeddingModel implementation
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("OPENAI_API_KEY")));
}

Metadata Filtering

You can use the generic, portable metadata filters with OpenSearch as well.

For example, you can use the text expression language:

vectorStore.similaritySearch(
    SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'").build());

Or programmatically, using the Filter.Expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .similarityThreshold(SIMILARITY_THRESHOLD)
    .filterExpression(b.and(
        b.in("author", "john", "jill"),
        b.eq("article_type", "blog")).build()).build());

These (portable) filter expressions are automatically converted into the proprietary OpenSearch query string query.

For example, this portable filter expression:

author in ['john', 'jill'] && 'article_type' == 'blog'

is converted into the proprietary OpenSearch filter format:

(metadata.author:john OR jill) AND metadata.article_type:blog
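
The shape of this conversion can be imitated with a few string-building helpers. The sketch below is not Spring AI's converter: the QueryStringSketch class and its in, eq, and and methods are hypothetical, they cover only the operators used in the example above, and they assume values that need no escaping or quoting.

```java
import java.util.List;

/** Toy re-creation of the portable-filter to query-string mapping (hypothetical helper). */
final class QueryStringSketch {

    /** Maps `key in [v1, v2, ...]` to `(metadata.key:v1 OR v2 ...)`. */
    static String in(String key, List<String> values) {
        return "(metadata." + key + ":" + String.join(" OR ", values) + ")";
    }

    /** Maps `key == value` to `metadata.key:value`. */
    static String eq(String key, String value) {
        return "metadata." + key + ":" + value;
    }

    /** Maps `left && right` to `left AND right`. */
    static String and(String left, String right) {
        return left + " AND " + right;
    }

    public static void main(String[] args) {
        String query = and(
                in("author", List.of("john", "jill")),
                eq("article_type", "blog"));
        // Prints: (metadata.author:john OR jill) AND metadata.article_type:blog
        System.out.println(query);
    }
}
```

The point to take away is that metadata keys are addressed under the metadata. prefix in the generated query, which is why filters refer to keys such as author rather than metadata.author.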

Accessing the Native Client

The OpenSearch vector store implementation provides access to the underlying native OpenSearch client (OpenSearchClient) through the getNativeClient() method:

OpenSearchVectorStore vectorStore = context.getBean(OpenSearchVectorStore.class);
Optional<OpenSearchClient> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    OpenSearchClient client = nativeClient.get();
    // Use the native client for OpenSearch-specific operations
}

The native client gives you access to OpenSearch-specific features and operations that might not be exposed through the VectorStore interface.