Azure OpenAI speech to speech chat



Article 02/15/2024

Reference documentation | Package (NuGet) | Additional samples on GitHub

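In this how-to guide, you use Azure AI Speech to converse with Azure OpenAI Service. The text recognized by the Speech service is sent to Azure OpenAI. The Speech service then synthesizes speech from the text response that Azure OpenAI returns.

Speak into the microphone to start a conversation with Azure OpenAI.

The Speech service recognizes your speech and converts it into text (speech to text). Your request is sent to Azure OpenAI as text. The Speech service text to speech feature synthesizes the response from Azure OpenAI and plays it on the default speaker.

Although the experience of this example is a back-and-forth exchange, Azure OpenAI doesn't remember the context of your conversation.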
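To make the last point concrete: each sample in this guide sends a single user message per request, so the model sees only the current utterance. One possible way to carry context, which is not part of the samples, is to keep earlier turns yourself and resend them with each request. The sketch below assumes the role/content message format used by the Python sample; build_messages and max_turns are hypothetical names introduced here for illustration.

```python
def build_messages(history, user_text, max_turns=5):
    """Build the message list for one request.

    history is a list of (user_text, assistant_text) pairs kept by the caller.
    Only the most recent max_turns pairs are resent, followed by the new utterance.
    """
    # history[-0:] would return the whole list, so handle zero explicitly.
    recent = history[-max_turns:] if max_turns > 0 else []
    messages = []
    for past_user, past_assistant in recent:
        messages.append({"role": "user", "content": past_user})
        messages.append({"role": "assistant", "content": past_assistant})
    messages.append({"role": "user", "content": user_text})
    return messages
```

The returned list could be passed as the messages argument of the chat completions call. Resending history increases the number of prompt tokens in every request, which is why max_turns caps how much is kept.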

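Important

To complete the steps in this guide, you must have access to Microsoft Azure OpenAI Service in your Azure subscription. Currently, access to this service is granted only by application. Apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.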

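Prerequisites

- An Azure subscription. You can create one for free.
- Create a Microsoft Azure OpenAI Service resource in the Azure portal.
- Deploy a model in your Azure OpenAI resource. For more information about model deployment, see the Azure OpenAI resource deployment guide.
- Get the Azure OpenAI resource key and endpoint. After your Azure OpenAI resource is deployed, select "Go to resource" to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select "Go to resource" to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.

Set up the environment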

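The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the SDK installation guide for any further requirements.

Set environment variables

This example requires environment variables named OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION.

Your application must be authenticated to access Azure AI services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine that runs the application.

Tip

Don't include the key directly in your code, and never post it publicly. For more authentication options, such as Azure Key Vault, see Azure AI services security.

To set the environment variables, open a console window and follow the instructions for your operating system and development environment.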

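- To set the OPEN_AI_KEY environment variable, replace your-openai-key with one of the keys for your Azure OpenAI resource.
- To set the OPEN_AI_ENDPOINT environment variable, replace your-openai-endpoint with the endpoint of your Azure OpenAI resource.
- To set the OPEN_AI_DEPLOYMENT_NAME environment variable, replace your-openai-deployment-name with the name of your model deployment.
- To set the SPEECH_KEY environment variable, replace your-speech-key with one of the keys for your Speech resource.
- To set the SPEECH_REGION environment variable, replace your-speech-region with the region of your Speech resource.

Windows

setx OPEN_AI_KEY your-openai-key
setx OPEN_AI_ENDPOINT your-openai-endpoint
setx OPEN_AI_DEPLOYMENT_NAME your-openai-deployment-name
setx SPEECH_KEY your-speech-key
setx SPEECH_REGION your-speech-region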

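Note

If you only need to access the environment variables in the current console, you can set them with set instead of setx.

After you add the environment variables, you might need to restart any running programs that need to read them, including the console window. For example, if Visual Studio is your editor, restart Visual Studio before you run the example.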

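Linux

Edit your .bashrc file, and add the environment variables:

export OPEN_AI_KEY=your-openai-key
export OPEN_AI_ENDPOINT=your-openai-endpoint
export OPEN_AI_DEPLOYMENT_NAME=your-openai-deployment-name
export SPEECH_KEY=your-speech-key
export SPEECH_REGION=your-speech-region

After you add the environment variables, run source ~/.bashrc from your console window to make the changes effective.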

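macOS

Bash

Edit your .bash_profile file, and add the same export lines that are shown for Linux.

After you add the environment variables, run source ~/.bash_profile from your console window to make the changes effective.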

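Xcode

For iOS and macOS development, you set the environment variables in Xcode. For example, follow these steps to set an environment variable in Xcode 13.4.1.

1. Select Product > Scheme > Edit scheme.
2. Select Arguments on the Run (Debug Run) page.
3. Under Environment Variables, select the plus (+) sign to add a new environment variable.
4. Enter SPEECH_KEY for the Name, and enter your Speech resource key for the Value.

Repeat the steps to set the other required environment variables.

For more configuration options, see the Xcode documentation.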
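Before running either sample, it can help to confirm that all five variables are visible to the process, because a variable set with setx or added to a profile file is not seen by a console that was already open. The following is a minimal sketch of such a check; the variable names come from this guide, while missing_env_vars is a hypothetical helper name.

```python
import os

# Names taken from this guide; both samples read all five.
REQUIRED_VARS = (
    "OPEN_AI_KEY",
    "OPEN_AI_ENDPOINT",
    "OPEN_AI_DEPLOYMENT_NAME",
    "SPEECH_KEY",
    "SPEECH_REGION",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

The function accepts any mapping, so it can be exercised with a plain dictionary instead of the real environment.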

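Recognize speech from a microphone

Follow these steps to create a new console application.

Open a command prompt window in the folder where you want the new project. Run this command to create a console application with the .NET CLI.

dotnet new console

The command creates a Program.cs file in the project directory.

Install the Speech SDK in your new project with the .NET CLI.

dotnet add package Microsoft.CognitiveServices.Speech

Install the Azure OpenAI SDK (prerelease) in your new project with the .NET CLI.

dotnet add package Azure.AI.OpenAI --prerelease

Replace the contents of Program.cs with the following code.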

using System.Text;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Azure;
using Azure.AI.OpenAI;

// This example requires environment variables named "OPEN_AI_KEY", "OPEN_AI_ENDPOINT" and "OPEN_AI_DEPLOYMENT_NAME"
// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
string openAIKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY") ??
                   throw new ArgumentException("Missing OPEN_AI_KEY");
string openAIEndpoint = Environment.GetEnvironmentVariable("OPEN_AI_ENDPOINT") ??
                        throw new ArgumentException("Missing OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
string engine = Environment.GetEnvironmentVariable("OPEN_AI_DEPLOYMENT_NAME") ??
                throw new ArgumentException("Missing OPEN_AI_DEPLOYMENT_NAME");

// This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY") ??
                   throw new ArgumentException("Missing SPEECH_KEY");
string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION") ??
                      throw new ArgumentException("Missing SPEECH_REGION");

// Sentence end symbols for splitting the response into sentences.
List<string> sentenceSeparators = new() { ".", "!", "?", ";", "。", "！", "？", "；", "\n" };

try
{
    await ChatWithOpenAI();
}
catch (Exception ex)
{
    Console.WriteLine(ex);
}

// Prompts Azure OpenAI with a request and synthesizes the response.
async Task AskOpenAI(string prompt)
{
    object consoleLock = new();
    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);

    // The language of the voice that speaks.
    speechConfig.SpeechSynthesisVoiceName = "en-US-JennyMultilingualNeural";
    var audioOutputConfig = AudioConfig.FromDefaultSpeakerOutput();
    using var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioOutputConfig);
    speechSynthesizer.Synthesizing += (sender, args) =>
    {
        lock (consoleLock)
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.Write($"[Audio]");
            Console.ResetColor();
        }
    };

    // Ask Azure OpenAI
    OpenAIClient client = new(new Uri(openAIEndpoint), new AzureKeyCredential(openAIKey));
    var completionsOptions = new ChatCompletionsOptions()
    {
        DeploymentName = engine,
        Messages = { new ChatRequestUserMessage(prompt) },
        MaxTokens = 100,
    };
    var responseStream = await client.GetChatCompletionsStreamingAsync(completionsOptions);

    // Buffer streamed text and synthesize it one sentence at a time.
    StringBuilder gptBuffer = new();
    await foreach (var completionUpdate in responseStream)
    {
        var message = completionUpdate.ContentUpdate;
        if (string.IsNullOrEmpty(message))
        {
            continue;
        }

        lock (consoleLock)
        {
            Console.ForegroundColor = ConsoleColor.DarkBlue;
            Console.Write($"{message}");
            Console.ResetColor();
        }

        gptBuffer.Append(message);

        if (sentenceSeparators.Any(message.Contains))
        {
            var sentence = gptBuffer.ToString().Trim();
            if (!string.IsNullOrEmpty(sentence))
            {
                await speechSynthesizer.SpeakTextAsync(sentence);
                gptBuffer.Clear();
            }
        }
    }
}

// Continuously listens for speech input to recognize and send as text to Azure OpenAI
async Task ChatWithOpenAI()
{
    // Should be the locale for the speaker's language.
    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
    speechConfig.SpeechRecognitionLanguage = "en-US";

    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    var conversationEnded = false;

    while (!conversationEnded)
    {
        Console.WriteLine("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.");

        // Get audio from the microphone and then send it to the speech to text service.
        var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

        switch (speechRecognitionResult.Reason)
        {
            case ResultReason.RecognizedSpeech:
                if (speechRecognitionResult.Text == "Stop.")
                {
                    Console.WriteLine("Conversation ended.");
                    conversationEnded = true;
                }
                else
                {
                    Console.WriteLine($"Recognized speech: {speechRecognitionResult.Text}");
                    await AskOpenAI(speechRecognitionResult.Text);
                }
                break;
            case ResultReason.NoMatch:
                Console.WriteLine($"No speech could be recognized: ");
                break;
            case ResultReason.Canceled:
                var cancellationDetails = CancellationDetails.FromResult(speechRecognitionResult);
                Console.WriteLine($"Speech Recognition canceled: {cancellationDetails.Reason}");
                if (cancellationDetails.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error details={cancellationDetails.ErrorDetails}");
                }
                break;
        }
    }
}

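To increase or decrease the number of tokens returned by Azure OpenAI, change the MaxTokens property in the ChatCompletionsOptions class instance. For more information about tokens and cost implications, see Azure OpenAI tokens and Azure OpenAI pricing.

Run your new console application to start speech recognition from a microphone: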

dotnet run

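Important

Make sure that you set the OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION environment variables as described. If you don't set these variables, the sample fails with an error message.

Speak into your microphone when prompted. The console output includes the prompt for you to begin speaking, then your request as text, and then the response from Azure OpenAI as text. The response from Azure OpenAI is also converted from text to speech and played on the default speaker.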

PS C:\dev\openai\csharp> dotnet run
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech:Make a comma separated list of all continents.
Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America
Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses.
Azure OpenAI response:Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)
Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Conversation ended.
PS C:\dev\openai\csharp>

Remarks

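Here are some more considerations:

- To change the speech recognition language, replace en-US with another supported language. For example, es-ES stands for Spanish (Spain). The default language is en-US. For details about how to identify one of several languages that might be spoken, see language identification.
- To change the voice that you hear, replace en-US-JennyMultilingualNeural with another supported voice. If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To use a different model, set OPEN_AI_DEPLOYMENT_NAME to the ID of another deployment (for example, in place of gpt-35-turbo-instruct). The deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in Azure OpenAI Studio.
- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the content filtering article.

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.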

Reference documentation | Package (PyPi) | Additional samples on GitHub


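The Speech SDK for Python is available as a Python Package Index (PyPI) module. It is compatible with Windows, Linux, and macOS.

On Windows, install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022 for your platform. Installing this package for the first time might require a restart. On Linux, you must use the x64 target architecture.

Install Python 3.7 or later. First check the SDK installation guide for any further requirements.

The guide lists the Python libraries os, requests, and json. The os and json modules ship with Python's standard library and need no installation; requests is a separate package that you install with pip.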

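Set environment variables

This example requires environment variables named OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION. Set them as described for your operating system earlier in this guide.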


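Recognize speech from a microphone

Follow these steps to create a new console application.

Open a command prompt window in the folder where you want the new project.

Run this command to install the Speech SDK:

pip install azure-cognitiveservices-speech

Run the following command to install the OpenAI SDK:

pip install openai

Note

This library is maintained by OpenAI, not Microsoft Azure. Refer to the release history or the version.py commit history to track the latest updates to the library.

Create a file named openai-speech.py. Copy the following code into that file: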

import os
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI

# This example requires environment variables named "OPEN_AI_KEY", "OPEN_AI_ENDPOINT" and "OPEN_AI_DEPLOYMENT_NAME"
# Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
client = AzureOpenAI(
    azure_endpoint=os.environ.get('OPEN_AI_ENDPOINT'),
    api_key=os.environ.get('OPEN_AI_KEY'),
    api_version="2023-05-15"
)

# This will correspond to the custom name you chose for your deployment when you deployed a model.
deployment_id = os.environ.get('OPEN_AI_DEPLOYMENT_NAME')

# This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

# Should be the locale for the speaker's language.
speech_config.speech_recognition_language = "en-US"
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# The language of the voice that responds on behalf of Azure OpenAI.
speech_config.speech_synthesis_voice_name = 'en-US-JennyMultilingualNeural'
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)

# Sentence end marks used to decide when to send buffered text to text to speech
tts_sentence_end = [".", "!", "?", ";", "。", "！", "？", "；", "\n"]

# Prompts Azure OpenAI with a request and synthesizes the response.
def ask_openai(prompt):
    # Ask Azure OpenAI in a streaming way
    response = client.chat.completions.create(model=deployment_id, max_tokens=200, stream=True, messages=[
        {"role": "user", "content": prompt}
    ])
    collected_messages = []
    last_tts_request = None

    # Iterate through the streamed response
    for chunk in response:
        if len(chunk.choices) > 0:
            chunk_message = chunk.choices[0].delta.content  # extract the message
            if chunk_message is not None:
                collected_messages.append(chunk_message)  # save the message
                if chunk_message in tts_sentence_end:  # sentence end found
                    text = ''.join(collected_messages).strip()  # join the received messages together to build a sentence
                    if text != '':  # if the sentence only has \n or spaces, skip it
                        print(f"Speech synthesized to speaker for: {text}")
                        last_tts_request = speech_synthesizer.speak_text_async(text)
                        collected_messages.clear()
    if last_tts_request:
        last_tts_request.get()

# Continuously listens for speech input to recognize and send as text to Azure OpenAI
def chat_with_open_ai():
    while True:
        print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
        try:
            # Get audio from the microphone and then send it to the speech to text service.
            speech_recognition_result = speech_recognizer.recognize_once_async().get()

            # If speech is recognized, send it to Azure OpenAI and listen for the response.
            if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                if speech_recognition_result.text == "Stop.":
                    print("Conversation ended.")
                    break
                print("Recognized speech: {}".format(speech_recognition_result.text))
                ask_openai(speech_recognition_result.text)
            elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
                break
            elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = speech_recognition_result.cancellation_details
                print("Speech Recognition canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(cancellation_details.error_details))
        except EOFError:
            break

# Main
try:
    chat_with_open_ai()
except Exception as err:
    print("Encountered exception. {}".format(err))
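The sentence buffering inside ask_openai can be looked at in isolation. The following is a minimal sketch, not the sample's own code: sentences_from_chunks is a hypothetical helper that takes any iterable of text chunks in place of the streamed response. It differs from the sample in two ways, both of which are assumptions about what you might want: it flushes when a chunk contains a separator (the sample flushes only when the chunk equals one), and it yields any trailing text left when the stream ends, which the sample leaves unspoken.

```python
SENTENCE_END = (".", "!", "?", ";", "。", "！", "？", "；", "\n")


def sentences_from_chunks(chunks, separators=SENTENCE_END):
    """Yield complete sentences from an iterable of streamed text chunks."""
    buffer = []
    for chunk in chunks:
        if not chunk:  # skip None or empty deltas
            continue
        buffer.append(chunk)
        # Flush as soon as the newest chunk contains a sentence end mark.
        if any(sep in chunk for sep in separators):
            sentence = "".join(buffer).strip()
            buffer.clear()
            if sentence:  # ignore buffers that hold only whitespace
                yield sentence
    # Yield whatever is left, for example when the reply was cut off mid-sentence.
    tail = "".join(buffer).strip()
    if tail:
        yield tail
```

Each yielded sentence could be passed to speech_synthesizer.speak_text_async. Splitting on every period is coarse: a chunk such as "3.14" also triggers a flush.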

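To increase or decrease the number of tokens returned by Azure OpenAI, change the max_tokens parameter. For more information about tokens and cost implications, see Azure OpenAI tokens and Azure OpenAI pricing.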
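If you adjust the token limit often, one option is to read it from the environment instead of editing the script. The sketch below is an illustration under assumptions: OPEN_AI_MAX_TOKENS is a hypothetical variable name that this guide does not define, and max_tokens_from_env and build_chat_request are hypothetical helpers. The request fields mirror the ones that the sample passes to client.chat.completions.create.

```python
import os


def max_tokens_from_env(env=None, default=200):
    """Read the token limit from OPEN_AI_MAX_TOKENS (hypothetical), or use default."""
    env = os.environ if env is None else env
    raw = env.get("OPEN_AI_MAX_TOKENS")
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 1:
        raise ValueError("OPEN_AI_MAX_TOKENS must be a positive integer")
    return value


def build_chat_request(prompt, deployment_id, max_tokens=200):
    """Return keyword arguments shaped like the sample's streaming chat request."""
    return {
        "model": deployment_id,
        "max_tokens": max_tokens,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
```

In the sample, the dictionary could be unpacked into the existing call, for example client.chat.completions.create(**build_chat_request(prompt, deployment_id, max_tokens_from_env())).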

Run your new console application to start speech recognition from a microphone:

python openai-speech.py

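Important

Make sure that you set the OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION environment variables as described earlier. If you don't set these variables, the sample fails with an error message.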
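Both samples end the conversation only when the recognized text is exactly "Stop.", so a result that differs in case or punctuation keeps the loop running. If that proves too strict, a more tolerant comparison is one option. The following sketch uses only the Python standard library; is_stop_command is a hypothetical helper, not part of the sample.

```python
import string

# ASCII punctuation plus the full-width marks that the samples treat as sentence ends.
_PUNCTUATION = string.punctuation + "。！？；"


def is_stop_command(text, stop_word="stop"):
    """Return True if text is just the stop word, ignoring case, spaces, and punctuation."""
    cleaned = text.strip().strip(_PUNCTUATION).strip()
    return cleaned.casefold() == stop_word.casefold()
```

In the Python sample, the check speech_recognition_result.text == "Stop." could be replaced with is_stop_command(speech_recognition_result.text). A longer utterance that merely starts with the stop word, such as "Stop the music.", does not match.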

PS C:\dev\openai\python> python.exe .\openai-speech.py
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech:Make a comma separated list of all continents.
Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America
Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses.
Azure OpenAI response:Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)
Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Conversation ended.
PS C:\dev\openai\python>

Remarks

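Here are some more considerations:

- To change the speech recognition language, replace en-US with another supported language. For example, es-ES stands for Spanish (Spain). The default language is en-US. For details about how to identify one of several languages that might be spoken, see language identification.
- To change the voice that you hear, replace en-US-JennyMultilingualNeural with another supported voice. If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To use a different model, set OPEN_AI_DEPLOYMENT_NAME to the ID of another deployment (for example, in place of gpt-35-turbo-instruct). Keep in mind that the deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in Azure OpenAI Studio.
- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the content filtering article.

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.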

Related content

- Learn more about Speech
- Learn more about Azure OpenAI
