Category: provider
Model Studio Qwen TTS Realtime
Use realtime TTS models for low-latency streaming speech output.
Critical model names
Use one of these exact model strings:
- `qwen3-tts-flash-realtime`
- `qwen3-tts-instruct-flash-realtime`
- `qwen3-tts-instruct-flash-realtime-2026-01-22`
- `qwen3-tts-vd-realtime-2026-01-15`
- `qwen3-tts-vc-realtime-2026-01-15`
Prerequisites
- Install the SDK in a virtual environment:

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```

- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
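For a shell session, the key can be exported directly; the value below is a placeholder, not a real key:

```shell
# Export the DashScope API key for the current shell session
# (placeholder value; use your real key from the Model Studio console).
export DASHSCOPE_API_KEY="sk-example-key"

# Confirm that child processes such as Python can see it.
python3 -c 'import os; assert os.environ.get("DASHSCOPE_API_KEY")'
```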
Normalized interface (tts.realtime)
Request
- `text` (string, required)
- `voice` (string, required)
- `instruction` (string, optional)
- `sample_rate` (int, optional)
Response
- `audio_base64_pcm_chunks` (array<string>)
- `sample_rate` (int)
- `finish_reason` (string)
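As a sketch of how a client might consume this response, the snippet below base64-decodes the PCM chunks and wraps them in a WAV container using only the standard library. The response dict and the 16-bit mono PCM format are assumptions based on the fields above, not a documented guarantee:

```python
import base64
import wave

def pcm_chunks_to_wav(response: dict, path: str) -> None:
    """Decode base64 PCM chunks from a tts.realtime-style response
    and write them to a WAV file. Assumes 16-bit mono PCM."""
    pcm = b"".join(base64.b64decode(c) for c in response["audio_base64_pcm_chunks"])
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)                        # mono (assumed)
        wav.setsampwidth(2)                        # 16-bit samples (assumed)
        wav.setframerate(response["sample_rate"])  # taken from the response
        wav.writeframes(pcm)

# Example with synthetic data: two chunks of silence at 24 kHz.
demo = {
    "audio_base64_pcm_chunks": [
        base64.b64encode(b"\x00\x00" * 240).decode(),
        base64.b64encode(b"\x00\x00" * 240).decode(),
    ],
    "sample_rate": 24000,
    "finish_reason": "stop",
}
pcm_chunks_to_wav(demo, "demo.wav")
```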
Operational guidance
- Use the WebSocket or streaming endpoint for realtime mode.
- Keep each utterance short for lower latency.
- For instruction models, keep instruction explicit and concise.
- Some SDK/runtime combinations may reject realtime model calls over `MultiModalConversation`; use the probe script below to verify compatibility.
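To keep each utterance short, as advised above, longer text can be split at sentence boundaries before streaming. This is a minimal sketch using only the standard library; the punctuation set and length limit are arbitrary choices, not values mandated by the API:

```python
import re

def split_utterances(text: str, max_chars: int = 80) -> list[str]:
    """Split text into short utterances at sentence-ending punctuation
    (ASCII and CJK), merging short neighbours up to max_chars each."""
    pieces = [p for p in re.split(r"(?<=[.!?。!?])\s*", text) if p]
    utterances: list[str] = []
    for piece in pieces:
        if utterances and len(utterances[-1]) + len(piece) + 1 <= max_chars:
            utterances[-1] += " " + piece  # merge while under the limit
        else:
            utterances.append(piece)
    return utterances
```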
Local demo script
Use the probe script to verify realtime compatibility with your current SDK/runtime, optionally falling back to a non-realtime model for immediate output:

```bash
.venv/bin/python skills/ai/audio/aliyun-qwen-tts-realtime/scripts/realtime_tts_demo.py \
  --text "这是一个实时语音演示。" \
  --fallback \
  --output output/ai-audio-tts-realtime/audio/fallback-demo.wav
```
Strict mode (for CI / gating):
```bash
.venv/bin/python skills/ai/audio/aliyun-qwen-tts-realtime/scripts/realtime_tts_demo.py \
  --text "实时健康检查" \
  --strict
```
Output location
- Default output: `output/ai-audio-tts-realtime/audio/`
- Override the base directory with `OUTPUT_DIR`.
Validation
```bash
mkdir -p output/aliyun-qwen-tts-realtime
for f in skills/ai/audio/aliyun-qwen-tts-realtime/scripts/*.py; do
  python3 -m py_compile "$f"
done
echo pycompileok > output/aliyun-qwen-tts-realtime/validate.txt
```
Pass criteria: the command exits with status 0 and `output/aliyun-qwen-tts-realtime/validate.txt` is generated.
Output and evidence
- Save artifacts, command outputs, and API response summaries under `output/aliyun-qwen-tts-realtime/`.
- Include key parameters (region/resource ID/time range) in evidence files for reproducibility.
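A small helper along these lines can record key parameters alongside each run; the file layout and field names here are illustrative, not part of the skill:

```python
import json
import os
import time

def write_evidence(summary: dict, base_dir: str = "output/aliyun-qwen-tts-realtime") -> str:
    """Save an API-call summary (model, parameters, timing) as a
    timestamped JSON evidence file under the skill's output directory."""
    os.makedirs(base_dir, exist_ok=True)
    record = {"timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()), **summary}
    path = os.path.join(base_dir, f"evidence-{int(time.time())}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return path
```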
Workflow
1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.