qwen3-audio

High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT).

Author: admin | Source: ClawHub
Version: V 0.1.1
Security check: Passed
Downloads: 390
Favorites: 0

# Qwen3-Audio

## Overview

Qwen3-Audio is a high-performance audio processing library optimized for Apple Silicon (M1/M2/M3/M4). It delivers fast, efficient TTS and STT with support for multiple models, languages, and audio formats.

## Prerequisites

- Python 3.10+
- Apple Silicon Mac (M1/M2/M3/M4)

### Environment checks

Before using any capability, verify that all items in `./references/env-check-list.md` are complete.

## Capabilities

### Text to Speech

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav"
```

**Returns (JSON):**

```json
{
  "audio_path": "/path_to_save.wav",
  "duration": 1.234,
  "sample_rate": 24000
}
```

### Voice Cloning

Clone any voice using a reference audio sample. Provide the wav file and its transcript:

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --ref_audio "sample_audio.wav" --ref_text "This is what my voice sounds like."
```

- `ref_audio`: reference audio to clone
- `ref_text`: transcript of the reference audio

### Use Created Voice (Shortcut)

Use a voice created with `voice create` by its ID:

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --ref_voice "my-voice-id"
```

This automatically loads `ref_audio` and `ref_text` from the voice profile.

### CustomVoice (Emotion Control)

Use predefined voices with emotion/style instructions:

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --speaker "Ryan" --language "English" --instruct "Very happy and excited."
```

### VoiceDesign (Create Any Voice)

Create any voice from a text description:

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --language "English" --instruct "A cheerful young female voice with high pitch and energetic tone."
```

### Automatic Speech Recognition (STT)

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" stt --audio "/sample_audio.wav" --output "/path_to_save.txt" --output-format srt
```

- Test audio: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav
- `output-format`: "txt" | "ass" | "srt" | "all"

**Returns (JSON):**

```json
{
  "text": "transcribed text content",
  "duration": 10.5,
  "sample_rate": 16000,
  "files": ["/path_to_save.txt", "/path_to_save.srt"]
}
```

### Voice Management

Voices are stored in the `voices/` directory at the skill root level. Each voice has its own folder containing:

- `ref_audio.wav` - Reference audio file
- `ref_text.txt` - Reference text transcript
- `ref_instruct.txt` - Voice style description

#### Create a Voice

Create a reusable voice profile using the VoiceDesign model. The `--instruct` parameter is required to describe the voice style:

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" voice create --text "This is a sample voice reference text." --instruct "A warm, friendly female voice with a professional tone." --language "English"
```

Optional: `--id "my-voice-id"` to specify a custom voice ID.

**Returns (JSON):**

```json
{
  "id": "abc12345",
  "ref_audio": "/path/to/skill/voices/abc12345/ref_audio.wav",
  "ref_text": "This is a sample voice reference text.",
  "instruct": "A warm, friendly female voice with a professional tone.",
  "duration": 3.456,
  "sample_rate": 24000
}
```

#### List Voices

List all created voice profiles:

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" voice list
```

**Returns (JSON):**

```json
[
  {
    "id": "abc12345",
    "ref_audio": "/path/to/skill/voices/abc12345/ref_audio.wav",
    "ref_text": "This is a sample voice reference text.",
    "instruct": "A warm, friendly female voice with a professional tone.",
    "duration": 3.456,
    "sample_rate": 24000
  }
]
```

#### Use a Created Voice

After creating a voice, use it for TTS with the `--ref_voice` parameter. The instruct will be automatically loaded:

```bash
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "New text to speak" --output "/output.wav" --ref_voice "abc12345"
```

## Predefined Speakers (CustomVoice)

For `Qwen3-TTS-12Hz-1.7B/0.6B-CustomVoice` models, the supported speakers and their descriptions are listed below. We recommend using each speaker's native language for best quality. Each speaker can still speak any language supported by the model.

| Speaker | Voice Description | Native Language |
| --- | --- | --- |
| Vivian | Bright, slightly edgy young female voice. | Chinese |
| Serena | Warm, gentle young female voice. | Chinese |
| Uncle_Fu | Seasoned male voice with a low, mellow timbre. | Chinese |
| Dylan | Youthful Beijing male voice with a clear, natural timbre. | Chinese (Beijing Dialect) |
| Eric | Lively Chengdu male voice with a slightly husky brightness. | Chinese (Sichuan Dialect) |
| Ryan | Dynamic male voice with strong rhythmic drive. | English |
| Aiden | Sunny American male voice with a clear midrange. | English |
| Ono_Anna | Playful Japanese female voice with a light, nimble timbre. | Japanese |
| Sohee | Warm Korean female voice with rich emotion. | Korean |

### Released Models

| Model | Features | Language Support | Instruction Control |
|---|---|---|---|
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | Performs voice design based on user-provided descriptions. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | ✅ |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | Provides style control over target timbres via user instructions; supports 9 premium timbres covering various combinations of gender, age, language, and dialect. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | ✅ |
| Qwen3-TTS-12Hz-1.7B-Base | Base model capable of 3-second rapid voice clone from user audio input; can be used for fine-tuning (FT) other models. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | |
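
### Calling the CLI from Python (sketch)

The `tts` commands above all share one shape: a fixed `uv run` prefix, `--text` and `--output`, plus optional flags. The following is a minimal sketch of wrapping that shape in Python. The helper names (`build_tts_command`, `parse_tts_result`, `run_tts`) are hypothetical and not part of the skill, and the sketch assumes the script prints exactly one JSON object to stdout, which the README implies but does not state.

```python
import json
import subprocess
from typing import Optional

# Paths copied from the README examples; adjust them to your checkout.
PYTHON = ".venv/bin/python"
SCRIPT = "./scripts/mlx-audio.py"

# Optional flags that appear in the README's `tts` examples.
ALLOWED_OPTIONS = {"ref_audio", "ref_text", "ref_voice", "speaker", "language", "instruct"}


def build_tts_command(text: str, output: str, **options: Optional[str]) -> list[str]:
    """Build the argv list for the `tts` subcommand; options set to None are skipped."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    cmd = ["uv", "run", "--python", PYTHON, SCRIPT, "tts", "--text", text, "--output", output]
    for name, value in options.items():
        if value is not None:
            cmd += [f"--{name}", value]
    return cmd


def parse_tts_result(stdout: str) -> dict:
    """Parse the result; assumes stdout holds exactly one JSON object (an assumption)."""
    result = json.loads(stdout)
    missing = {"audio_path", "duration", "sample_rate"} - set(result)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    return result


def run_tts(text: str, output: str, **options: Optional[str]) -> dict:
    """Run the command and return the parsed result (needs the skill installed)."""
    proc = subprocess.run(
        build_tts_command(text, output, **options),
        capture_output=True, text=True, check=True,
    )
    return parse_tts_result(proc.stdout)
```

For example, `run_tts("New text to speak", "/output.wav", ref_voice="abc12345")` would issue the same command as the "Use a Created Voice" example. Passing arguments as a list rather than a shell string avoids quoting problems when the text or the `instruct` description contains quotes or spaces.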

Tags

skill ai

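Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the qwen3-audio-1776278964 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the qwen3-audio-1776278964 skill

Install via command line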

skillhub install qwen3-audio-1776278964

Download Zip package

Download qwen3-audio v0.1.1

File size: 9.9 KB | Published: 2026-4-16 18:37

v0.1.1 (latest), 2026-4-16 18:37
Voice profile management updated to require and support style descriptions.

- Voice profiles now include a mandatory instruct (style description) field.
- voices/ directory structure updated: each voice now contains ref_instruct.txt.
- voice create command requires --instruct to describe voice style (used with VoiceDesign model).
- Listing or using voices now shows and applies the instruct field automatically.
- Documentation updated to reflect new requirements and workflow for voice profile creation and use.
