cosyvoice3 (CosyVoice 3)

Author: admin | Source: ClawHub
Version: 1.0.0
Security check: passed
Downloads: 650
Price: free
Favorites: 1

cosyvoice3

CosyVoice3 TTS

Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon.

Overview

CosyVoice3 is an advanced TTS system based on large language models, supporting:

  • 9 languages: Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian
  • 18+ Chinese dialects: Cantonese, Sichuan, Dongbei, Shanghai, etc.
  • Zero-shot voice cloning: Clone any voice from 3-10 seconds of audio
  • Cross-lingual synthesis: Speak Chinese with English voice or vice versa
  • Fine-grained control: Emotions, speed, volume via text tags

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3)
  • Python 3.10
  • Conda installed
  • ~5GB disk space for models
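
The prerequisites above can be checked before running the installer. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` and its parameters are hypothetical and not part of this skill. Note that a Python interpreter running under Rosetta reports `x86_64` rather than `arm64`, so the first check may warn even on an Apple Silicon Mac.

```python
import platform
import shutil
import sys


def check_prerequisites(path=".", min_free_gb=5.0):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    # Native Apple Silicon Python reports system "Darwin" and machine "arm64".
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("not running natively on macOS with Apple Silicon")
    # The prerequisites list Python 3.10.
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.10 expected, found {found}")
    # Conda must be on PATH so an environment can be created.
    if shutil.which("conda") is None:
        problems.append("conda not found on PATH")
    # Roughly 5 GB of free disk space is needed for the models.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, about {min_free_gb:.0f} GB needed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```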

Installation

Run the installation script:

CODEBLOCK0

This will:

  1. Create the conda environment INLINECODE0
  2. Install PyTorch (CPU version for Apple Silicon)
  3. Install CosyVoice dependencies
  4. Download Fun-CosyVoice3-0.5B model (~2GB)

Usage

Quick Start - Basic TTS

Important: CosyVoice3 requires the <|endofprompt|> marker in the reference text!

CODEBLOCK1
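
To avoid forgetting the marker, the reference text can be built with a small helper. This is a sketch under assumptions: the helper name `build_prompt_text` is hypothetical, and the layout (a short instruction, then `<|endofprompt|>`, then the transcript of the reference audio) and the default instruction string are assumptions that should be checked against the example above.

```python
END_OF_PROMPT = "<|endofprompt|>"


def build_prompt_text(transcript, instruction="You are a helpful assistant."):
    """Return reference text that contains the <|endofprompt|> marker.

    Assumed layout (verify before relying on it): instruction + marker + transcript.
    """
    if END_OF_PROMPT in transcript:
        # The caller already tagged the text; leave it untouched.
        return transcript
    return f"{instruction}{END_OF_PROMPT}{transcript}"
```

Passing text that already contains the marker returns it unchanged, so the helper is safe to apply twice.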

Using the TTS Script

Generate speech from text:

CODEBLOCK2

Available Assets

Reference audio files in cosyvoice3-repo/asset/:

  • zero_shot_prompt.wav - Default Chinese female voice
  • INLINECODE4 - English prompt for cross-lingual

Advanced Features

Voice Cloning

Clone a voice from 3-10 seconds of reference audio:

CODEBLOCK3

Fine-Grained Control

Control prosody with special tags:

CODEBLOCK4

Dialect Support

Use instruct mode for dialects:

CODEBLOCK5

Troubleshooting

Model not found

If you get "model not found" errors, download models manually:

CODEBLOCK6

Memory issues

For long text, split into sentences:

CODEBLOCK7
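
One way to split mixed Chinese and English text into sentences is sketched below, using only the standard library. The function name `split_sentences` is hypothetical. It splits directly after full-width punctuation (。!?), and after ASCII punctuation (. ! ?) only when whitespace follows, so decimals such as 3.5 stay intact.

```python
import re

# Zero-width split after full-width enders, or on the whitespace after ASCII enders.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。!?])|(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation on each one."""
    parts = _SENTENCE_BOUNDARY.split(text)
    return [part.strip() for part in parts if part and part.strip()]
```

Each returned sentence can then be synthesized on its own and the resulting audio segments concatenated, which keeps the memory needed per inference call small.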

Audio format

Reference audio requirements:

  • Format: WAV or MP3
  • Sample rate: 16kHz+ (automatically resampled)
  • Duration: 3-10 seconds optimal
  • Content: Clear speech, minimal background noise
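
The sample-rate and duration requirements can be verified before cloning. The sketch below uses Python's standard `wave` module, so it only handles uncompressed PCM WAV files (not MP3); the function name `check_reference_wav` and its thresholds are hypothetical defaults taken from the list above.

```python
import wave


def check_reference_wav(path, min_rate=16000, min_seconds=3.0, max_seconds=10.0):
    """Return (sample_rate, duration_seconds, problems) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not min_seconds <= duration <= max_seconds:
        problems.append(
            f"duration {duration:.1f} s is outside {min_seconds:.0f}-{max_seconds:.0f} s"
        )
    return rate, duration, problems
```

Clarity of speech and background noise cannot be judged this way and still need a listening check.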

Resources

Scripts

  • install.sh - Installation script for macOS
  • INLINECODE6 - Main TTS script with CLI interface
  • INLINECODE7 - Download pretrained models

References

Model Files

Located in cosyvoice3-repo/pretrained_models/:

  • Fun-CosyVoice3-0.5B/ - Main model (recommended)
  • INLINECODE10 - Previous version
  • INLINECODE11 - Lighter model
  • INLINECODE12 - SFT version
  • INLINECODE13 - Instruct version
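
A quick way to find out which model directories are actually present, for example before debugging a "model not found" error, is sketched below. The function name `missing_models` is hypothetical; the default root is the path given above, resolved relative to the current working directory.

```python
from pathlib import Path


def missing_models(names, root="cosyvoice3-repo/pretrained_models"):
    """Return the model directory names from `names` that do not exist under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    # Only the main model name is taken from the list above.
    print(missing_models(["Fun-CosyVoice3-0.5B"]))
```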

Notes

  • First inference takes ~30 seconds (model warmup)
  • Subsequent inferences are faster
  • Apple Silicon uses CPU mode (no CUDA)
  • RTF (real-time factor) ~0.3-0.5 on M-series chips
  • Model files are cached locally after first download
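
The real-time factor is the synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. A minimal sketch for measuring it on your own machine follows; the function name `real_time_factor` is hypothetical.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Return RTF: time spent synthesizing divided by the length of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds
```

For example, producing 10 seconds of audio in 4 seconds gives an RTF of 0.4, which falls inside the range quoted above. Time the call after the first inference so that model warmup is excluded.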

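Tags

skill, ai

Install via chat

This skill can be installed through conversation on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the cosyvoice3-macos-1776419987 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the cosyvoice3-macos-1776419987 skill

Install via command line

skillhub install cosyvoice3-macos-1776419987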

Download

cosyvoice3 v1.0.0 (free)

File size: 8.1 KB | Published: 2026-4-17 20:08

v1.0.0 (latest), 2026-4-17 20:08
Initial release: Alibaba CosyVoice3 TTS for macOS Apple Silicon. Supports Chinese, English, 18+ dialects, zero-shot voice cloning.
