MiniMax Multimodal Toolkit

Generate voice, music, image, and video content via MiniMax APIs. Pure Python: works on Windows, macOS, and Linux with no third-party dependencies.

Prerequisites

- MINIMAX_API_KEY environment variable (starts with sk-)
- INLINECODE2 environment variable (optional, default: https://api.minimaxi.com)
- Python 3.6+
- ffprobe (optional, for video duration detection)
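
As a rough illustration of the prerequisites above, the following sketch checks the API key and builds an endpoint URL from the default base URL. The helper names load_api_key and endpoint_url are hypothetical and are not taken from the toolkit's scripts; the base-URL override is passed as an argument here because the name of its environment variable is not shown above.

```python
import os

# Default base URL as stated in the prerequisites.
DEFAULT_BASE_URL = "https://api.minimaxi.com"


def load_api_key(env=None):
    """Return the key from MINIMAX_API_KEY, or raise if it is missing or malformed.

    Hypothetical helper: the toolkit's own scripts may validate differently.
    """
    env = os.environ if env is None else env
    key = env.get("MINIMAX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("MINIMAX_API_KEY is not set")
    if not key.startswith("sk-"):
        raise RuntimeError("MINIMAX_API_KEY should start with 'sk-'")
    return key


def endpoint_url(path, base_url=DEFAULT_BASE_URL):
    """Join a base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

Failing early on a missing or malformed key tends to give a clearer message than waiting for the API to reject the request.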

Quick Start

CODEBLOCK0

Or use the CLI directly:
CODEBLOCK1

Output Convention

All generated files MUST be saved to minimax-output/ under the agent's working directory.

TTS (Text-to-Speech)

Endpoint: POST /v1/t2a_v2. Returns hex-encoded audio, which is decoded and saved as a file.

Models: speech-2.8-hd (recommended, best quality), speech-2.8-turbo (faster), speech-02-hd, INLINECODE10

CODEBLOCK2

Common voice IDs: female-shaonv, male-qn-qingse, male-qn-jingying, presenter_male, presenter_female
Emotions: happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper (empty = auto)

Music Generation

Endpoint: POST /v1/music_generation. Lyrics are required; the response contains an audio URL. Generation takes 30-300 seconds.

CODEBLOCK3

Image Generation (Text-to-Image)

Endpoint: POST /v1/image_generation. Returns image URLs immediately, with no polling.

CODEBLOCK4

Aspect ratios: 1:1 (default), 16:9, 4:3, 3:2, 2:3, 3:4, 9:16, INLINECODE34

Image-to-Image Generation

Endpoint: POST /v1/image_generation with image_file. Generates new images from a reference image.

CODEBLOCK5

Video Generation

Endpoint: POST /v1/video_generation (async), then GET /v1/query/video_generation. Polling is required.

CODEBLOCK6

Models: MiniMax-Hailuo-2.3 (default), MiniMax-Hailuo-2.3-Fast (i2v), MiniMax-Hailuo-02 (1080P, 10s)
Modes: t2v, i2v, sef (start-end frame), ref (subject reference)
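
The submit-then-poll flow for video generation might look roughly like the loop below. It is a sketch under stated assumptions: fetch_status is a caller-supplied function that queries GET /v1/query/video_generation and maps the response to a (state, payload) pair, and the state names "pending", "success", and "failed" are placeholders rather than the API's actual values.

```python
import time


def poll_until_done(fetch_status, interval=10.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or time runs out.

    fetch_status must return (state, payload); the state names used here
    are placeholders chosen for this sketch.
    """
    deadline = clock() + timeout
    while True:
        state, payload = fetch_status()
        if state == "success":
            return payload
        if state == "failed":
            raise RuntimeError("video generation failed: %r" % (payload,))
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        # Wait before the next query to avoid hammering the endpoint.
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop easy to exercise in tests without real waiting.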

Video Prompt Tips

Structure prompts as: Main subject + Scene + Movement + Camera motion + Aesthetic. For i2v, describe the motion only; don't repeat what is already in the image.
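
To make the ordering concrete, here is a small hypothetical helper (build_video_prompt is not part of the toolkit) that joins the parts in the recommended order. For i2v it keeps only movement and camera motion, which is one possible reading of "describe motion only".

```python
def build_video_prompt(subject="", scene="", movement="", camera="",
                       aesthetic="", mode="t2v"):
    """Join the non-empty prompt parts in the recommended order."""
    if mode == "i2v":
        # The image already shows subject and scene, so keep motion parts only.
        parts = [movement, camera]
    else:
        parts = [subject, scene, movement, camera, aesthetic]
    return ", ".join(p.strip() for p in parts if p and p.strip())


# Example use with made-up prompt text:
prompt = build_video_prompt(
    subject="a red fox",
    scene="snowy forest at dawn",
    movement="trotting between the trees",
    camera="slow tracking shot",
    aesthetic="cinematic, soft light",
)
```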

Generate & Send to Feishu

Use generate_and_send.py to generate content and prepare it for Feishu delivery via the feishu-media skill:

CODEBLOCK7

After generation, the script prints the output file paths and feishu-media send instructions. Use the feishu-media skill to actually deliver the content. Set the FEISHUCHATID environment variable to avoid passing --feishu-chat on every call.

Legacy PowerShell Script

The original scripts/minimax-api.ps1 is preserved for backward compatibility but is deprecated. Use the Python scripts instead.

Error Handling

Error Code	Meaning	Solution
2061	Plan doesn't support model	Try `speech-02-turbo` for TTS
1008
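
The suggested handling of error 2061 can be sketched as a single retry with a fallback model. Everything here is illustrative: MiniMaxError is a hypothetical exception type, and synthesize stands in for whatever function actually calls the TTS endpoint and raises on an API error.

```python
class MiniMaxError(Exception):
    """Hypothetical exception carrying an API error code."""

    def __init__(self, code, message=""):
        super().__init__("MiniMax error %s: %s" % (code, message))
        self.code = code


def tts_with_fallback(synthesize, text, model="speech-2.8-hd",
                      fallback="speech-02-turbo"):
    """Try the preferred model; on error 2061, retry once with the fallback."""
    try:
        return synthesize(text, model)
    except MiniMaxError as err:
        # Only the "plan doesn't support model" case is worth retrying.
        if err.code != 2061 or model == fallback:
            raise
        return synthesize(text, fallback)
```

Any other error code is re-raised unchanged, so callers still see failures that a model switch would not fix.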

References

See the references/ folder for detailed API docs, voice catalogs, and prompt guides.

