Transcription API
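Spring AI provides a unified API for speech-to-text transcription through the TranscriptionModel interface. This lets you write portable code that works across different transcription providers.
Common Interface
All transcription providers implement the following shared interface.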
TranscriptionModel
The TranscriptionModel interface provides methods for converting audio to text.
public interface TranscriptionModel extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse> {

    /**
     * Transcribes the audio from the given prompt.
     */
    AudioTranscriptionResponse call(AudioTranscriptionPrompt transcriptionPrompt);

    /**
     * A convenience method for transcribing an audio resource.
     */
    default String transcribe(Resource resource) {
        AudioTranscriptionPrompt prompt = new AudioTranscriptionPrompt(resource);
        return this.call(prompt).getResult().getOutput();
    }

    /**
     * A convenience method for transcribing an audio resource with options.
     */
    default String transcribe(Resource resource, AudioTranscriptionOptions options) {
        AudioTranscriptionPrompt prompt = new AudioTranscriptionPrompt(resource, options);
        return this.call(prompt).getResult().getOutput();
    }
}
Writing Provider-Agnostic Code
One of the key benefits of the shared transcription interface is the ability to write code that works with any transcription provider without modification. The actual provider (OpenAI, Azure OpenAI, etc.) is determined by your Spring Boot configuration, allowing you to switch providers without changing application code.
Basic Service Example
The following service depends only on TranscriptionModel, so it compiles and runs unchanged against any transcription provider:
@Service
public class TranscriptionService {

    private final TranscriptionModel transcriptionModel;

    public TranscriptionService(TranscriptionModel transcriptionModel) {
        this.transcriptionModel = transcriptionModel;
    }

    public String transcribeAudio(Resource audioFile) {
        return transcriptionModel.transcribe(audioFile);
    }

    public String transcribeWithOptions(Resource audioFile, AudioTranscriptionOptions options) {
        AudioTranscriptionPrompt prompt = new AudioTranscriptionPrompt(audioFile, options);
        AudioTranscriptionResponse response = transcriptionModel.call(prompt);
        return response.getResult().getOutput();
    }
}
This service works with OpenAI, Azure OpenAI, or any other transcription provider; the actual implementation is determined by your Spring Boot configuration.
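One practical consequence of depending only on the interface is that the provider can be replaced without touching the service, including by a stub in tests. The following is a minimal, self-contained sketch of that pattern. It does not use the Spring AI classes: SketchTranscriptionModel, SketchService, providerA, and providerB are hypothetical stand-ins invented for illustration, and audio is modeled as a plain byte array rather than a Resource.

```java
public class ProviderAgnosticSketch {

    /** Hypothetical stand-in for the shared model interface; NOT the Spring AI type. */
    interface SketchTranscriptionModel {
        String call(byte[] audio);
    }

    /** Application code depends only on the interface, as TranscriptionService does. */
    static final class SketchService {
        private final SketchTranscriptionModel model;

        SketchService(SketchTranscriptionModel model) {
            this.model = model;
        }

        String transcribeAudio(byte[] audio) {
            return model.call(audio);
        }
    }

    /** Hypothetical provider A, standing in for one configured implementation. */
    static SketchTranscriptionModel providerA() {
        return audio -> "A:" + audio.length + " bytes";
    }

    /** Hypothetical provider B, standing in for a different implementation. */
    static SketchTranscriptionModel providerB() {
        return audio -> "B:" + audio.length + " bytes";
    }

    public static void main(String[] args) {
        byte[] audio = new byte[] {1, 2, 3};
        // Swapping the provider changes only the constructor argument;
        // SketchService itself is untouched.
        System.out.println(new SketchService(providerA()).transcribeAudio(audio));
        System.out.println(new SketchService(providerB()).transcribeAudio(audio));
    }
}
```

In a real application the constructor argument would be supplied by Spring's dependency injection rather than chosen by hand, but the service code stays the same in both cases.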
Provider-Specific Features
While the shared interface provides portability, each provider also offers specific features through provider-specific options classes (e.g., OpenAiAudioTranscriptionOptions, AzureOpenAiAudioTranscriptionOptions). These classes implement the AudioTranscriptionOptions interface while adding provider-specific capabilities.
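The relationship between the portable options interface and a provider-specific options class can be sketched as follows. This is an illustration of the general pattern only, under stated assumptions: SketchOptions, SketchProviderOptions, and the language setting are hypothetical names, not the real Spring AI types or their actual fields.

```java
public class ProviderOptionsSketch {

    /** Hypothetical stand-in for the portable options interface. */
    interface SketchOptions {
        String getModel();
    }

    /** Hypothetical provider-specific options: the portable accessor plus one extra setting. */
    static final class SketchProviderOptions implements SketchOptions {
        private final String model;
        private final String language; // provider-only setting, invented for illustration

        SketchProviderOptions(String model, String language) {
            this.model = model;
            this.language = language;
        }

        @Override
        public String getModel() {
            return model;
        }

        String getLanguage() {
            return language;
        }
    }

    /** Accepts any options; reads the extra setting only when the provider type matches. */
    static String describe(SketchOptions options) {
        String summary = "model=" + options.getModel();
        if (options instanceof SketchProviderOptions) {
            SketchProviderOptions specific = (SketchProviderOptions) options;
            summary += ", language=" + specific.getLanguage();
        }
        return summary;
    }

    public static void main(String[] args) {
        // Provider-specific options expose the additional setting.
        System.out.println(describe(new SketchProviderOptions("model-x", "en")));
        // Portable options carry only what the shared interface defines.
        System.out.println(describe(() -> "model-x"));
    }
}
```

Code that accepts the shared interface stays portable, while code that needs an extra capability has to name the provider-specific type explicitly, which makes the loss of portability visible at the call site.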
For detailed information about provider-specific features, see the individual provider documentation pages.