Category: provider
Model Studio CosyVoice Voice Clone
Use the CosyVoice voice enrollment API to create cloned voices from public reference audio.
Critical model names
Use model="voice-enrollment" and one of these target_model values:
- cosyvoice-v3.5-plus
- cosyvoice-v3.5-flash
- cosyvoice-v3-plus
- cosyvoice-v3-flash
- cosyvoice-v2
Recommended default in this repo:
- target_model=cosyvoice-v3.5-plus
Region and compatibility
- cosyvoice-v3.5-plus and cosyvoice-v3.5-flash are available only in China mainland deployment mode (Beijing endpoint).
- In international deployment mode (Singapore endpoint), cosyvoice-v3-plus and cosyvoice-v3-flash do not support voice clone/design.
- The target_model used during enrollment must match the model used later in speech synthesis; otherwise synthesis fails.
Endpoint
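- Domestic: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
- International: https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization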
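The region rules and endpoints above can be sketched as a small lookup. This is a minimal illustration, not the repo's implementation: the function name resolve_endpoint and the region keys "cn" and "intl" are hypothetical, and the check only rejects the pairings this document explicitly rules out. It does not confirm that every remaining pairing is supported.

```python
# Endpoints as listed in this document.
ENDPOINTS = {
    "cn": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
}

# Restrictions taken from the compatibility notes above.
CN_ONLY = {"cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash"}
NO_CLONE_INTL = {"cosyvoice-v3-plus", "cosyvoice-v3-flash"}


def resolve_endpoint(target_model: str, region: str) -> str:
    """Return the enrollment endpoint, or raise if the pairing is ruled out.

    The "cn"/"intl" region keys are hypothetical labels for this sketch.
    """
    if region not in ENDPOINTS:
        raise ValueError(f"unknown region: {region!r}")
    if region == "intl" and target_model in CN_ONLY:
        raise ValueError(f"{target_model} is only available on the Beijing endpoint")
    if region == "intl" and target_model in NO_CLONE_INTL:
        raise ValueError(
            f"{target_model} does not support voice clone on the Singapore endpoint"
        )
    return ENDPOINTS[region]
```

Failing early here avoids creating a voice that cannot be used later, since the target_model chosen at enrollment must also be the synthesis model.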
Prerequisites
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
- Provide a public audio URL for the enrollment sample.
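One way the key lookup described above might work is sketched below. The document does not specify the layout of ~/.alibabacloud/credentials, so this sketch assumes an INI-style file with at least one section header; the function name load_api_key is hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def load_api_key(credentials_path: Optional[Path] = None) -> str:
    """Return the DashScope API key from the environment or a credentials file."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key

    # Assumption: the credentials file is INI-formatted with section headers.
    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped without raising
    for section in parser.sections():
        if parser.has_option(section, "dashscope_api_key"):
            return parser.get(section, "dashscope_api_key")

    raise RuntimeError(
        "Set DASHSCOPE_API_KEY or add dashscope_api_key to ~/.alibabacloud/credentials"
    )
```

The environment variable is checked first so that a per-shell override takes precedence over the shared file.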
Normalized interface (cosyvoice.voice_clone)
Request
- model (string, optional): fixed to voice-enrollment
- target_model (string, optional): default cosyvoice-v3.5-plus
- prefix (string, required): letters/digits only, max 10 chars
- voice_sample_url (string, required): public audio URL
- language_hints (array[string], optional): only first item is used
- max_prompt_audio_length (float, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
- enable_preprocess (bool, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
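The request rules above can be enforced before any API call. The following is a minimal sketch of building the normalized request as a flat dictionary. The function name build_request is hypothetical, "letters/digits only" is read as ASCII letters and digits (an assumption), and the output is this repo's normalized shape, not necessarily the raw payload the DashScope endpoint expects.

```python
import re
from typing import Any, Dict, List, Optional

# Models that accept max_prompt_audio_length and enable_preprocess, per the list above.
EXTRA_PARAM_MODELS = {"cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash", "cosyvoice-v3-flash"}


def build_request(
    prefix: str,
    voice_sample_url: str,
    target_model: str = "cosyvoice-v3.5-plus",
    language_hints: Optional[List[str]] = None,
    max_prompt_audio_length: Optional[float] = None,
    enable_preprocess: Optional[bool] = None,
) -> Dict[str, Any]:
    """Validate the fields and return a normalized voice-clone request."""
    # Assumption: "letters/digits" means ASCII alphanumerics.
    if not re.fullmatch(r"[A-Za-z0-9]{1,10}", prefix):
        raise ValueError("prefix must be 1-10 letters or digits")
    if not voice_sample_url.startswith(("http://", "https://")):
        raise ValueError("voice_sample_url must be a public http(s) URL")

    request: Dict[str, Any] = {
        "model": "voice-enrollment",  # fixed value
        "target_model": target_model,
        "prefix": prefix,
        "voice_sample_url": voice_sample_url,
    }
    if language_hints:
        request["language_hints"] = language_hints[:1]  # only the first item is used

    extras = {
        "max_prompt_audio_length": max_prompt_audio_length,
        "enable_preprocess": enable_preprocess,
    }
    for name, value in extras.items():
        if value is None:
            continue
        if target_model not in EXTRA_PARAM_MODELS:
            raise ValueError(f"{name} is not supported for {target_model}")
        request[name] = value
    return request
```

Rejecting an unsupported optional parameter, instead of silently dropping it, makes a mismatched target_model visible before quota is consumed.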
Response
- voice_id (string): use this as the voice parameter in later TTS calls
- request_id (string)
- usage.count (number, optional)
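As an illustration of consuming the normalized response, the sketch below maps it onto a small dataclass. The names CloneResult and parse_response are hypothetical, and reading usage.count from a nested "usage" object is an assumption about how the dotted field name is represented.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class CloneResult:
    voice_id: str                       # pass as the voice parameter in later TTS calls
    request_id: str
    usage_count: Optional[float] = None


def parse_response(payload: Mapping[str, Any]) -> CloneResult:
    """Extract the documented fields from a normalized voice-clone response."""
    # Assumption: usage.count is nested as {"usage": {"count": ...}}.
    usage = payload.get("usage") or {}
    return CloneResult(
        voice_id=payload["voice_id"],
        request_id=payload["request_id"],
        usage_count=usage.get("count"),
    )
```

A missing voice_id or request_id raises KeyError, which surfaces a malformed response immediately instead of at synthesis time.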
Operational guidance
- For Chinese dialect reference audio, keep language_hints=["zh"]; control dialect style later in synthesis via text or instruct.
- For cosyvoice-v3.5-plus, supported language_hints include zh, en, fr, de, ja, ko, ru, pt, th, id, vi.
- Avoid frequent enrollment calls; each call creates a new custom voice and consumes quota.
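The language hint guidance can be checked locally before enrolling. This sketch uses a hypothetical helper name, normalize_language_hints, and validates hints only for cosyvoice-v3.5-plus, since that is the only model whose supported hints are listed here; hints for other models pass through unchecked.

```python
from typing import List, Optional

# Hints listed above for cosyvoice-v3.5-plus.
V35_PLUS_LANGUAGE_HINTS = {"zh", "en", "fr", "de", "ja", "ko", "ru", "pt", "th", "id", "vi"}


def normalize_language_hints(target_model: str, hints: Optional[List[str]]) -> List[str]:
    """Keep only the first hint and validate it where the supported set is known."""
    if not hints:
        return []
    first = hints[0]
    if target_model == "cosyvoice-v3.5-plus" and first not in V35_PLUS_LANGUAGE_HINTS:
        raise ValueError(f"unsupported language hint for {target_model}: {first!r}")
    return [first]
```

For Chinese dialect audio, callers would still pass ["zh"] and steer the dialect at synthesis time, as noted above.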
Local helper script
Prepare a normalized request JSON:
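```bash
python skills/ai/audio/aliyun-cosyvoice-voice-clone/scripts/prepare_cosyvoice_clone_request.py \
  --target-model cosyvoice-v3.5-plus \
  --prefix myvoice \
  --voice-sample-url https://example.com/voice.wav \
  --language-hint zh
```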
Validation
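```bash
mkdir -p output/aliyun-cosyvoice-voice-clone
for f in skills/ai/audio/aliyun-cosyvoice-voice-clone/scripts/*.py; do
  python3 -m py_compile "$f"
done
echo "py_compile_ok" > output/aliyun-cosyvoice-voice-clone/validate.txt
```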
Pass criteria: command exits 0 and output/aliyun-cosyvoice-voice-clone/validate.txt is generated.
Output And Evidence
- Save artifacts, command outputs, and API response summaries under output/aliyun-cosyvoice-voice-clone/.
- Include target_model, prefix, and sample URL in the evidence file.
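A minimal sketch of writing the evidence file follows. The document fixes the output directory and the three required fields but not the file name or format, so evidence.json and the JSON layout are assumptions, as is the function name write_evidence.

```python
import json
from pathlib import Path


def write_evidence(
    target_model: str,
    prefix: str,
    voice_sample_url: str,
    out_dir: str = "output/aliyun-cosyvoice-voice-clone",
) -> Path:
    """Write the required evidence fields as JSON and return the file path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "evidence.json"  # hypothetical file name
    record = {
        "target_model": target_model,
        "prefix": prefix,
        "voice_sample_url": voice_sample_url,
    }
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```

Writing the record next to validate.txt keeps all artifacts for a run in one directory.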
References
- references/api_reference.md
- references/sources.md
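Skill name: aliyun-cosyvoice-voice-clone
Detailed description:
Category: provider
Model Studio CosyVoice Voice Clone
Use the CosyVoice voice enrollment API to create cloned voices from public reference audio.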
Critical model names
Use model="voice-enrollment" and one of these target_model values:
- cosyvoice-v3.5-plus
- cosyvoice-v3.5-flash
- cosyvoice-v3-plus
- cosyvoice-v3-flash
- cosyvoice-v2
Recommended default in this repo:
- target_model=cosyvoice-v3.5-plus
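Region and compatibility
- cosyvoice-v3.5-plus and cosyvoice-v3.5-flash are available only in China mainland deployment mode (Beijing endpoint).
- In international deployment mode (Singapore endpoint), cosyvoice-v3-plus and cosyvoice-v3-flash do not support voice clone/design.
- The target_model used during enrollment must match the model used later in speech synthesis; otherwise synthesis fails.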
Endpoint
- Domestic: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
- International: https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Prerequisites
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
- Provide a public audio URL for the enrollment sample.
Normalized interface (cosyvoice.voice_clone)
Request
- model (string, optional): fixed to voice-enrollment
- target_model (string, optional): default cosyvoice-v3.5-plus
- prefix (string, required): letters/digits only, max 10 chars
- voice_sample_url (string, required): public audio URL
- language_hints (array[string], optional): only first item is used
- max_prompt_audio_length (float, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
- enable_preprocess (bool, optional): only for cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash
Response
- voice_id (string): use this as the voice parameter in later TTS calls
- request_id (string)
- usage.count (number, optional)
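Operational guidance
- For Chinese dialect reference audio, keep language_hints=["zh"]; control dialect style later in synthesis via text or instruct.
- For cosyvoice-v3.5-plus, supported language_hints include zh, en, fr, de, ja, ko, ru, pt, th, id, vi.
- Avoid frequent enrollment calls; each call creates a new custom voice and consumes quota.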
Local helper script
Prepare a normalized request JSON:
```bash
python skills/ai/audio/aliyun-cosyvoice-voice-clone/scripts/prepare_cosyvoice_clone_request.py \
  --target-model cosyvoice-v3.5-plus \
  --prefix myvoice \
  --voice-sample-url https://example.com/voice.wav \
  --language-hint zh
```
Validation
```bash
mkdir -p output/aliyun-cosyvoice-voice-clone
for f in skills/ai/audio/aliyun-cosyvoice-voice-clone/scripts/*.py; do
  python3 -m py_compile "$f"
done
echo "py_compile_ok" > output/aliyun-cosyvoice-voice-clone/validate.txt
```
Pass criteria: command exits 0 and output/aliyun-cosyvoice-voice-clone/validate.txt is generated.
Output And Evidence
- Save artifacts, command outputs, and API response summaries under output/aliyun-cosyvoice-voice-clone/.
- Include target_model, prefix, and sample URL in the evidence file.
References
- references/api_reference.md
- references/sources.md