# IMA AI Creation
## ⚠️ Important: Model ID Reference
**CRITICAL:** When calling the script, you MUST use the exact **model_id** (second column), NOT the friendly model name. Do NOT infer model_id from the friendly name (e.g., ❌ `nano-banana-pro` is WRONG; ✅ `gemini-3-pro-image` is CORRECT).
**Quick Reference Table:**
### 图像模型 (Image Models)
| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|-------------------------|----------|-------------|
| Nano Banana2 | `gemini-3.1-flash-image` | ❌ NOT nano-banana-2; budget pick, 4-13 pts |
| Nano Banana Pro | `gemini-3-pro-image` | ❌ NOT nano-banana-pro; high quality, 10-18 pts |
| SeeDream 4.5 | `doubao-seedream-4.5` | ✅ Recommended default, 5 pts |
| Midjourney | `midjourney` | ✅ Same as friendly name, 8-10 pts |
### 视频模型 (Video Models)
| 友好名称 (Friendly Name) | model_id (t2v) | model_id (i2v) | 说明 (Notes) |
|-------------------------|---------------|----------------|-------------|
| Wan 2.6 | `wan2.6-t2v` | `wan2.6-i2v` | ⚠️ Note -t2v/-i2v suffix |
| IMA Video Pro (Sevio 1.0) | `ima-pro` | `ima-pro` | ✅ IMA native quality model |
| IMA Video Pro Fast (Sevio 1.0-Fast) | `ima-pro-fast` | `ima-pro-fast` | ✅ IMA native low-latency model |
| Kling O1 | `kling-video-o1` | `kling-video-o1` | ⚠️ Note video- prefix |
| Kling 2.6 | `kling-v2-6` | `kling-v2-6` | ⚠️ Note v prefix |
| Hailuo 2.3 | `MiniMax-Hailuo-2.3` | `MiniMax-Hailuo-2.3` | ⚠️ Note MiniMax- prefix |
| Hailuo 2.0 | `MiniMax-Hailuo-02` | `MiniMax-Hailuo-02` | ⚠️ Note 02 not 2.0 |
| Google Veo 3.1 | `veo-3.1-generate-preview` | `veo-3.1-generate-preview` | ⚠️ Note -generate-preview suffix |
| Sora 2 Pro | `sora-2-pro` | `sora-2-pro` | ✅ Straightforward |
| Pixverse | `pixverse` | `pixverse` | ✅ Same as friendly name |
### 音乐模型 (Music Models)
| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|-------------------------|----------|-------------|
| Suno (sonic-v5) | `sonic` | ⚠️ Simplified to sonic |
| DouBao BGM | `GenBGM` | ❌ NOT doubao-bgm |
| DouBao Song | `GenSong` | ❌ NOT doubao-song |
### 语音模型 (Speech/TTS Models)
| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|-------------------------|----------|-------------|
| seed-tts-2.0 | `seed-tts-2.0` | ✅ Same as friendly name (default) |
**How to get the correct model_id:**
1. Check this table first
2. Use `--list-models --task-type <type>` to query available models
3. Refer to command examples in this SKILL.md
> Runtime truth source: `GET /open/v1/product/list` (or `--list-models`).
> Any table in this document is guidance; actual availability depends on current product list.
**Example:**
```bash
# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro
# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
```
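The table above can also be expressed as a lookup map, so an agent never infers a model_id from a friendly name. A minimal sketch (the helper name is illustrative; the runtime product list remains the source of truth):

```python
# Friendly-name → model_id lookup built from the image-model table above.
# Hypothetical helper: fail loudly instead of guessing from the name.
IMAGE_MODEL_IDS = {
    "nano banana2": "gemini-3.1-flash-image",
    "nano banana pro": "gemini-3-pro-image",
    "seedream 4.5": "doubao-seedream-4.5",
    "midjourney": "midjourney",
}

def resolve_image_model_id(friendly_name: str) -> str:
    """Return the exact model_id; never derive it from the friendly name."""
    key = friendly_name.strip().lower()
    if key not in IMAGE_MODEL_IDS:
        raise ValueError(f"Unknown model {friendly_name!r}; run --list-models")
    return IMAGE_MODEL_IDS[key]

print(resolve_image_model_id("Nano Banana Pro"))  # gemini-3-pro-image
```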
---
## 📚 Optional Knowledge Enhancement (ima-knowledge-ai)
This skill is fully runnable as a standalone package.
If `ima-knowledge-ai` is installed, the agent may read its references for workflow decomposition and consistency guidance.
Recommended optional reads:
1. **Check for workflow complexity** — Read `ima-knowledge-ai/references/workflow-design.md` if:
- User mentions: "MV"、"宣传片"、"完整作品"、"配乐"、"soundtrack"
- Task spans multiple media types (image + video, video + music, etc.)
- Complex multi-step workflows that need task decomposition
2. **Check for visual consistency needs** — Read `ima-knowledge-ai/references/visual-consistency.md` if:
- User mentions: "系列"、"多张"、"同一个"、"角色"、"续"、"series"、"same"
- Task involves: multiple images/videos, character continuity, product shots
- Second+ request about same subject (e.g., "旺财在游泳" after "生成旺财照片")
3. **Check video modes** — Read `ima-knowledge-ai/references/video-modes.md` if:
- Any video generation task
- Need to understand: image_to_video vs reference_image_to_video difference
4. **Check model selection** — Read `ima-knowledge-ai/references/model-selection.md` if:
- Unsure which model to use
- Need cost/quality trade-off guidance
- User specifies budget or quality requirements
**Why this matters:**
- Multi-media workflows need proper task sequencing (e.g., video duration → matching music duration)
- AI generation defaults to **independent generation** (独立生成) on every call; without reference images, results will be inconsistent
- Wrong video mode = wrong result (image_to_video ≠ reference_image_to_video)
- Model choice affects cost and quality significantly
**Example multi-media workflow:**
```
User: "帮我做个产品宣传MV,有背景音乐,主角是旺财小狗"
      (Make a product promo MV with background music, starring the puppy 旺财)
❌ Wrong:
1. Generate dog image (random look)
2. Generate video (different dog)
3. Generate music (unrelated)
✅ Right:
1. Read workflow-design.md + visual-consistency.md
2. Generate Master Reference: an image of the puppy 旺财
3. Generate video shots using image_to_video with 旺财 as the first frame
4. Get video duration (e.g., 15s)
5. Generate BGM with matching duration and mood
```
**How to check:**
```python
# Step 0: determine the media type first (image / video / music / speech):
#   "画"/"生成图"/"image" → image; "视频"/"video" → video;
#   "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech
# Then choose task_type and model from the corresponding section
# (image: text_to_image/image_to_image; video: text_to_video/...;
#  music: text_to_music; speech: text_to_speech)

# Step 1: read the knowledge base based on task shape
if multi_media_workflow:  # image + video, video + music, etc.
    read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md")
if any(k in request for k in ("same subject", "series", "character", "系列", "同一个")):
    read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md")
if video_generation:  # any video task_type
    read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md")

# Step 2: execute with proper sequencing and reference images
# (see workflow-design.md for specific patterns)
```
For simple single-media requests, you can proceed directly. For complex multi-media workflows, read the knowledge base first; no exceptions.
---
## 📥 User Input Parsing (Media Type & Task Routing)
**Purpose:** to make every agent parse user intent consistently: **first determine the media type** from the user's request, then choose the task_type and model.
### 1. User phrasing → media type (do this first)
| User intent / keywords | Media type | task_type examples |
|------------------------|------------|---------------------|
| 画 / 生成图 / 图片 / image / 画一张 / 图生图 | **image** | `text_to_image`, `image_to_image` |
| 视频 / 生成视频 / video / 图生视频 / 文生视频 | **video** | `text_to_video`, `image_to_video`, `first_last_frame_to_video`, `reference_image_to_video` |
| 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | **music** | `text_to_music` |
| 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | **speech** | `text_to_speech` |
If the request mixes media (e.g. "宣传片+配乐"), treat as **multi-media workflow**: read `workflow-design.md`, then plan image → video → music steps and use the correct task_type for each step.
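The keyword routing above can be sketched as a small classifier; more than one hit means a multi-media workflow. The helper and the keyword subset are illustrative, not exhaustive:

```python
# Keyword → media-type routing (keywords taken from the table above).
MEDIA_KEYWORDS = {
    "image": ["画", "生成图", "图片", "image", "图生图"],
    "video": ["视频", "video", "图生视频", "文生视频"],
    "music": ["音乐", "歌", "BGM", "背景音乐", "music", "作曲"],
    "speech": ["语音", "朗读", "TTS", "语音合成", "配音", "speech"],
}

def detect_media_types(request: str) -> list[str]:
    """Return every media type mentioned; >1 hit = multi-media workflow."""
    lowered = request.lower()
    return [media for media, words in MEDIA_KEYWORDS.items()
            if any(w.lower() in lowered for w in words)]

print(detect_media_types("生成视频并配上背景音乐"))  # ['video', 'music']
```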
### 2. Model and parameter parsing
- **Image:** For model name → model_id and size/aspect_ratio parsing, follow the same rules as in **ima-image-ai** skill (User Input Parsing section).
- **Video:** For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow **ima-video-ai** skill (User Input Parsing section).
Sevio alias normalization in `ima-all-ai`:
- `Ima Sevio 1.0` → `ima-pro`
- `Ima Sevio 1.0-Fast` / `Ima Sevio 1.0 Fast` → `ima-pro-fast`
Routing rule:
- Normalize alias first
- Then resolve against runtime product list for the selected `task_type`
- If model is absent in current category, return available model_ids from `--list-models`
- **Music:** Suno (`sonic`) vs DouBao BGM/Song — infer from "BGM"/"背景音乐" → BGM; "带歌词"/"人声" → Suno or Song. Use model_id `sonic`, `GenBGM`, `GenSong` per "Recommended Defaults" and "Music Generation" tables below.
- **Speech (TTS):** Get model_id from `GET /open/v1/product/list?category=text_to_speech` or run script with `--task-type text_to_speech --list-models`. Map user intent to parameters using product `form_config`:
| User intent / phrasing | Parameter (if in form_config) | Notes |
|------------------------|--------------------------------|--------|
| 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
| 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
| 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | volume | If supported |
If the user does not specify, use form_config defaults. Pass extra params via `--extra-params '{"speed":1.0}'`. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry).
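Building `--extra-params` from parsed intent while honoring form_config can be sketched as follows; the intent field names (`voice`, `speed`) are assumptions for illustration, and only parameters the product declares are kept:

```python
import json

# Sketch: map parsed user intent to TTS parameters, sending only the
# parameters the product's form_config actually declares.
def build_tts_extra_params(intent: dict, form_config_fields: set) -> str:
    candidate = {}
    if intent.get("voice"):              # e.g. a voice_id from form_config options
        candidate["voice_id"] = intent["voice"]
    if intent.get("speed") is not None:  # e.g. 0.8-1.2
        candidate["speed"] = intent["speed"]
    # Drop anything the product does not support
    supported = {k: v for k, v in candidate.items() if k in form_config_fields}
    return json.dumps(supported)

# voice_id is dropped because this product only declares speed/pitch
print(build_tts_extra_params({"speed": 1.1, "voice": "female_01"},
                             {"speed", "pitch"}))  # {"speed": 1.1}
```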
---
## ⚙️ How This Skill Works
**For transparency:** This skill uses a bundled Python script (`scripts/ima_create.py`) to call the IMA Open API. The script:
- Sends your prompt to **two IMA-owned domains** (see "Network Endpoints" below)
- Uses `--user-id` **only locally** as a key for storing your model preferences
- Returns image/video/music URLs when generation is complete
**What gets sent to IMA servers:**
- ✅ Your prompt/description (image/video/music)
- ✅ Model selection (SeeDream/Wan/Suno/etc.)
- ✅ Generation parameters (size, duration, style, etc.)
- ❌ NO API key in prompts (key is used for authentication only)
- ❌ NO user_id (it's only used locally)
**What's stored locally:**
- `~/.openclaw/memory/ima_prefs.json` - Your model preferences (< 1 KB)
- `~/.openclaw/logs/ima_skills/` - Generation logs (auto-deleted after 7 days)
---
## 🌐 Network Endpoints Used
| Domain | Owner | Purpose | Data Sent | Privacy |
|--------|-------|---------|-----------|---------|
| `api.imastudio.com` | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, **your API key** | Standard HTTPS, data processed for AI generation |
| `imapi.liveme.com` | IMA Studio | Image/Video upload service (presigned URL generation) | **Your API key**, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
| `*.aliyuncs.com`, `*.esxscloud.com` | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, **NO API key**) | IMA-managed OSS buckets, presigned URLs expire after 7 days |
**Key Points:**
- **Music tasks** (`text_to_music`) and **TTS tasks** (`text_to_speech`) only use `api.imastudio.com`.
- **Image/video tasks** require `imapi.liveme.com` to obtain presigned URLs for uploading input images.
- Your API key is sent to **both `api.imastudio.com` and `imapi.liveme.com`** (both owned by IMA Studio).
- Verify network calls yourself: `tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'`. See the **⚠️ Credential Security Notice** below for full disclosure.
---
## ⚠️ Credential Security Notice
**Your API key is sent to both IMA-owned domains:**
1. `Authorization: Bearer ima_xxx...` → `api.imastudio.com` (main API)
2. Query param `appUid=ima_xxx...` → `imapi.liveme.com` (upload service)
**Security best practices:**
- 🧪 **Use test keys for experiments:** Generate a separate API key for testing.
- 🔍 **Monitor usage:** Check `https://imastudio.com/dashboard` for unauthorized activity.
- ⏱️ **Rotate keys:** Regenerate your API key periodically (monthly recommended).
- 📊 **Review logs:** Check `~/.openclaw/logs/ima_skills/` for unexpected API calls.
**Why two domains?** IMA Studio uses a microservices architecture:
- **`api.imastudio.com`**: Core AI generation API
- **`imapi.liveme.com`**: Specialized image/video upload service (shared infrastructure)
Both domains are operated by IMA Studio. The same API key grants access to both services.
### Agent Execution (Internal Reference)
> **Note for users:** You can review the script source at `scripts/ima_create.py` anytime.
> The agent uses this script to simplify API calls. **Music tasks** use only `api.imastudio.com`, while **image/video tasks** also call `imapi.liveme.com` for file uploads (see "Network Endpoints" above).
Use the bundled script internally for all task types — it ensures correct parameter construction:
```bash
# ─── Image Generation ──────────────────────────────────────────────────────────
# Basic text-to-image (default model)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \
--user-id {user_id} --output-json
# Text-to-image with size override (Nano Banana2)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \
--size 2k --user-id {user_id} --output-json
# Image-to-image with input URL
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_image \
--model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# ─── Video Generation ──────────────────────────────────────────────────────────
# Basic text-to-video (default model, 5s 720P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \
--user-id {user_id} --output-json
# Text-to-video with extra params (10s 1080P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \
--extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \
--user-id {user_id} --output-json
# Image-to-video (animate static image)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_video \
--model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# First-last frame video (two images)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type first_last_frame_to_video \
--model-id kling-video-o1 --prompt "smooth transition between frames" \
--input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \
--user-id {user_id} --output-json
# ─── Music Generation ──────────────────────────────────────────────────────────
# Basic text-to-music (Suno default)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \
--user-id {user_id} --output-json
# Music with custom lyrics (Suno custom mode)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "pop ballad, emotional" \
--extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \
--user-id {user_id} --output-json
# Background music (DouBao BGM)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id GenBGM --prompt "relaxing ambient music for meditation" \
--user-id {user_id} --output-json
# ─── Text-to-Speech (TTS) ─────────────────────────────────────────────────────
# List TTS models first to get model_id, then generate speech
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech --list-models
# TTS: use model_id from list above (prompt = text to speak)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech \
--model-id <model_id from list> --prompt "Text to be spoken here." \
--user-id {user_id} --output-json
```
The script outputs JSON with `url`, `model_name`, `credit` — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users.
---
## Overview
Call IMA Open API to create AI-generated content. All endpoints require an `ima_*` API key. The core flow is: **query products → create task → poll until done**.
---
## 🔒 Security & Transparency Policy
> **This skill is community-maintained and open for inspection.**
### ✅ What Users CAN Do
**Full transparency:**
- ✅ **Review all source code**: Check `scripts/ima_create.py` and `ima_logger.py` anytime
- ✅ **Verify network calls**: Music tasks use `api.imastudio.com` only; image/video tasks also use `imapi.liveme.com` (see "Network Endpoints" section)
- ✅ **Inspect local data**: View `~/.openclaw/memory/ima_prefs.json` and log files
- ✅ **Control privacy**: Delete preferences/logs anytime, or disable file writes (see below)
**Configuration allowed:**
- ✅ **Set API key** in environment or agent config:
- Environment variable: `export IMA_API_KEY=ima_your_key_here`
- OpenClaw/MCP config: Add `IMA_API_KEY` to agent's environment configuration
- Get your key at: https://imastudio.com
- ✅ **Use scoped/test keys**: Test with limited API keys, rotate after testing
- ✅ **Disable file writes**: Make prefs/logs read-only or symlink to `/dev/null`
**Data control:**
- ✅ **View stored data**: `cat ~/.openclaw/memory/ima_prefs.json`
- ✅ **Delete preferences**: `rm ~/.openclaw/memory/ima_prefs.json` (resets to defaults)
- ✅ **Delete logs**: `rm -rf ~/.openclaw/logs/ima_skills/` (auto-cleanup after 7 days anyway)
### ⚠️ Advanced Users: Fork & Modify
If you need to modify this skill for your use case:
1. **Fork the repository** (don't modify the original)
2. **Update your fork** with your changes
3. **Test thoroughly** with limited API keys
4. **Document your changes** for troubleshooting
**Note:** Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.
### ❌ What to AVOID (Security Risks)
**Actions that could compromise security:**
- ❌ Sharing API keys publicly or in skill files
- ❌ Modifying API endpoints to unknown servers
- ❌ Disabling SSL/TLS certificate verification
- ❌ Logging sensitive user data (prompts, IDs, etc.)
- ❌ Bypassing authentication or billing mechanisms
**Why this matters:**
1. **API Compatibility**: Skill logic aligns with IMA Open API schema
2. **Security**: Malicious modifications could leak credentials or bypass billing
3. **Support**: Modified skills may not be supported
4. **Community**: Breaking changes affect all users
### 📋 Privacy & Data Handling Summary
**What this skill does with your data:**
| Data Type | Sent to IMA? | Stored Locally? | User Control |
|-----------|-------------|-----------------|--------------|
| Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) |
| API key | ✅ Yes (authentication header) | ❌ No | Set via env var |
| user_id (optional CLI arg) | ❌ **Never** (local preference key only) | ✅ Yes (as prefs file key) | Change `--user-id` value |
| Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime |
| Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup 7 days |
**Privacy recommendations:**
1. **Use test/scoped API keys** for initial testing
2. **Note**: `--user-id` is **never sent to IMA servers** - it's only used locally as a key for storing preferences in `~/.openclaw/memory/ima_prefs.json`
3. **Review source code** at `scripts/ima_create.py` to verify network calls (search for `create_task` function)
4. **Rotate API keys** after testing or if compromised
**Get your IMA API key:** Visit https://imastudio.com to register and get started.
### 🔧 For Skill Maintainers Only
**Version control:**
- All changes must go through Git with proper version bumps (semver)
- CHANGELOG.md must document all changes
- Production deployments require code review
**File checksums (optional):**
```bash
# Verify skill integrity
sha256sum SKILL.md scripts/ima_create.py
```
If users report issues, verify file integrity first.
---
## 🧠 User Preference Memory (Image)
> User preferences have **highest priority** when they exist. But preferences are only saved when users **explicitly express** model preferences — not from automatic model selection.
### Storage: `~/.openclaw/memory/ima_prefs.json`
Single file, shared across all IMA skills:
```json
{
"user_{user_id}": {
"text_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"text_to_speech": { "model_id": "<from product list>", "model_name": "...", "credit": 2, "last_used": "..." }
}
}
```
### Model Selection Flow (Image Generation)
**Step 1: Get knowledge-ai recommendation** (if installed)
```python
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "SeeDream 4.5"
```
**Step 2: Check user preference**
```python
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "midjourney", ...}
```
**Step 3: Decide which model to use**
```python
if user_pref:  # preference exists
    use_model = user_pref["model_id"]  # Highest priority
else:
    use_model = knowledge_recommended_model or fallback_default
```
**Step 4: Check for mismatch (for later hint)**
```python
if user_pref and knowledge_recommended_model != user_pref["model_id"]:
    mismatch = True  # Will add a hint in the success message
```
### When to Write (User Explicit Preference ONLY)
**✅ Save preference when user explicitly specifies a model:**
| User says | Action |
|-----------|--------|
| `用XXX` / `换成XXX` / `改用XXX` | Switch to model XXX + save as preference |
| `以后都用XXX` / `默认用XXX` / `always use XXX` | Save + confirm: `✅ 已记住!以后图片生成默认用 [XXX]` |
| `我喜欢XXX` / `我更喜欢XXX` | Save as preference |
**❌ Do NOT save when:**
- Agent auto-selects from knowledge-ai → not user preference
- Agent uses fallback default → not user preference
- User says generic quality requests (see "Clear Preference" below) → clear preference instead
### When to Clear (User Abandons Preference)
**🗑️ Clear preference when user wants automatic selection:**
| User says | Action |
|-----------|--------|
| `用最好的` / `用最合适的` / `best` / `recommended` | Clear pref + use knowledge-ai recommendation |
| `推荐一个` / `你选一个` / `自动选择` | Clear pref + use knowledge-ai recommendation |
| `用默认的` / `用新的` | Clear pref + use knowledge-ai recommendation |
| `试试别的` / `换个试试` (without specific model) | Clear pref + use knowledge-ai recommendation |
| `重新推荐` | Clear pref + use knowledge-ai recommendation |
**Implementation:**
```python
del prefs[f"user_{user_id}"][task_type]
save_prefs(prefs)
```
---
## ⭐ Model Selection Priority (Image)
**Selection flow:**
1. **User preference** (if exists) → Highest priority, always respect
2. **ima-knowledge-ai skill** (if installed) → Professional recommendation based on task
3. **Fallback defaults** → Use table below (only if neither 1 nor 2 exists)
**Important notes:**
- User preference is only saved when user **explicitly specifies** a model (see "When to Write" above)
- Knowledge-ai is **always consulted** (even when user pref exists) to detect mismatches
- When mismatch detected → add gentle hint in success message (does NOT interrupt generation)
> The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
When using user preference for image generation, show a line like:
```
🎨 根据你的使用习惯,将用 [Model Name] 帮你生成…
• 模型:[Model Name](你的常用模型)
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
```
### Preference Change Confirmation
When user switches to a different model than their saved preference:
```
💡 你之前喜欢用 [Old Model],这次换成了 [New Model]。
要把 [New Model] 设为以后的默认吗?
回复「是」保存 / 回复「否」仅本次使用
```
---
## ⭐ Recommended Defaults
> **These are fallback defaults — only used when no user preference exists.**
> **Always default to the newest and most popular model. Do NOT default to the cheapest.**
| Task Type | Default Model | model_id | version_id | Cost | Why |
|-----------|--------------|----------|------------|------|-----|
| **text_to_image** | **SeeDream 4.5** | `doubao-seedream-4.5` | `doubao-seedream-4-5-251128` | 5 pts | Latest doubao flagship, photorealistic 4K |
| text_to_image (budget) | **Nano Banana2** | `gemini-3.1-flash-image` | `gemini-3.1-flash-image` | 4 pts | Fastest and cheapest option |
| text_to_image (premium) | **Nano Banana Pro** | `gemini-3-pro-image` | `gemini-3-pro-image-preview` | 10/10/18 pts | Premium quality, 1K/2K/4K options |
| text_to_image (artistic) | **Midjourney** 🎨 | `midjourney` | `v6` | 8/10 pts | Artist-level aesthetics, creative styles |
| **image_to_image** | **SeeDream 4.5** | `doubao-seedream-4.5` | `doubao-seedream-4-5-251128` | 5 pts | Latest, best i2i quality |
| image_to_image (budget) | **Nano Banana2** | `gemini-3.1-flash-image` | `gemini-3.1-flash-image` | 4 pts | Cheapest option |
| image_to_image (premium) | **Nano Banana Pro** | `gemini-3-pro-image` | `gemini-3-pro-image-preview` | 10 pts | Premium quality |
| image_to_image (artistic) | **Midjourney** 🎨 | `midjourney` | `v6` | 8/10 pts | Artist-level aesthetics, style transfer |
| **text_to_video** | **Wan 2.6** | `wan2.6-t2v` | `wan2.6-t2v` | 25 pts | 🔥 Most popular t2v, balanced cost |
| text_to_video (premium) | **Hailuo 2.3** | `MiniMax-Hailuo-2.3` | `MiniMax-Hailuo-2.3` | 38 pts | Higher quality |
| text_to_video (budget) | **Vidu Q2** | `viduq2` | `viduq2` | 5 pts | Lowest cost t2v |
| **image_to_video** | **Wan 2.6** | `wan2.6-i2v` | `wan2.6-i2v` | 25 pts | 🔥 Most popular i2v, 1080P |
| image_to_video (premium) | **Kling 2.6** | `kling-v2-6` | `kling-v2-6` | 40-160 pts | Premium Kling i2v |
| **first_last_frame_to_video** | **Kling O1** | `kling-video-o1` | `kling-video-o1` | 48 pts | Newest Kling reasoning model |
| **reference_image_to_video** | **Kling O1** | `kling-video-o1` | `kling-video-o1` | 48 pts | Best reference fidelity |
| **text_to_music** | **Suno (sonic-v5)** | `sonic` | `sonic` | 25 pts | Latest Suno engine, best quality |
| **text_to_speech** | (query product list) | — | — | — | Run `--task-type text_to_speech --list-models`; use first or user-preferred model_id |
**Premium options:**
- **Image**: Nano Banana Pro — Highest quality with size control (1K/2K/4K), higher cost (10-18 pts for text_to_image, 10 pts for image_to_image)
- **Video**: Kling O1, Sora 2 Pro, Google Veo 3.1 — Premium quality with longer duration options
**Quick selection guide (production as of 2026-02-27, sorted by popularity):**
- **Image (4 models available)** → **SeeDream 4.5** (5, default); artistic → Midjourney 🎨 (8-10); budget → Nano Banana2 (4, 512px); premium → Nano Banana Pro (10-18)
- **🔥 Video from text (most popular)** → **Wan 2.6** (25, balanced); premium → Hailuo 2.3 (38); budget → Vidu Q2 (5)
- **🔥 Video from image (most popular)** → **Wan 2.6** (25)
- Music → **Suno** (25); DouBao BGM/Song (30 each)
- Cheapest → Nano Banana2 512px (4) for image; Vidu Q2 (5) for video
**Selection guide by use case:**
**Image Generation:**
- General image generation → **SeeDream 4.5** (5pts)
- **Custom aspect ratio (16:9, 9:16, 4:3, etc.)** → **SeeDream 4.5** 🌟 or **Nano Banana Pro/2** 🆕 (native support)
- Budget-conscious / fast generation → **Nano Banana2** (4pts)
- Highest quality with size control (1K/2K/4K) → **Nano Banana Pro** (text_to_image: 10-18pts, image_to_image: 10pts)
- **Artistic/creative styles, illustrations, paintings** → **Midjourney** 🎨 (8-10pts)
- Style transfer / image editing → **SeeDream 4.5** (5pts) or **Midjourney** 🎨 (artistic)
**Video Generation:**
- General video generation → **Wan 2.6** (25pts, most popular)
- Premium cinematic quality → **Google Veo 3.1** (70-330pts) or **Sora 2 Pro** (122+pts)
- Budget video → **Vidu Q2** (5pts) or **Hailuo 2.0** (5pts)
- With audio support → **Kling O1** (48+pts) or **Google Veo 3.1** (70+pts)
- First/last frame animation → **Kling O1** (48+pts)
- Reference image consistency → **Kling O1** (48+pts) or **Google Veo 3.1** (70+pts)
**Music Generation:**
- **Custom song with lyrics, vocals, style** → **Suno sonic-v5** (25pts, default, ~2min)
- Full control: custom_mode, lyrics, vocal_gender, tags, negative_tags
- Best for: complete songs, vocal tracks, artistic compositions
- **Background music / ambient loop** → **DouBao BGM** (30pts, ~30s)
- Simplified: prompt-only, no advanced parameters
- Best for: video backgrounds, ambient music, short loops
- **Simple song generation** → **DouBao Song** (30pts, ~30s)
- Simplified: prompt-only
- Best for: quick song generation, structured vocal compositions
- **User explicitly asks for cheapest** → DouBao BGM/Song (6pts each) — only if explicitly requested
**Speech (TTS) Generation:**
- **Text-to-speech / 语音合成 / 朗读** → `text_to_speech`. Always query `GET /open/v1/product/list?category=text_to_speech` (or `--list-models`) to get current model_id and credit. No fixed default; use first available or user preference. Voice/speed/format parameters: see "Model and parameter parsing" (TTS table) and "Speech (TTS) — text_to_speech" in this document.
**⚠️ Technical Note for Suno:**
> `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly when creating Suno tasks.
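A sketch of the payload shape this note describes (the field layout is assumed from the note itself; verify against the actual API schema):

```python
# Assumed Suno task payload shape: the outer model_version stays "sonic",
# while the nested parameters.parameters.model_version selects the engine.
suno_task = {
    "model_id": "sonic",
    "model_version": "sonic",             # outer field: always "sonic"
    "parameters": {
        "parameters": {
            "model_version": "sonic-v5",  # inner field: engine version
            "custom_mode": True,
            "vocal_gender": "female",
        }
    },
}

# The two fields intentionally differ; setting both to the same value is a bug.
assert suno_task["model_version"] != (
    suno_task["parameters"]["parameters"]["model_version"]
)
```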
**⚠️ Production Image Models (4 available):**
- SeeDream 4.5 (`doubao-seedream-4.5`) — 5 pts, default
- Midjourney 🎨 (`midjourney`) — 8/10 pts for 480p/720p, artistic styles
- Nano Banana2 (`gemini-3.1-flash-image`) — 4/6/10/13 pts for 512px/1K/2K/4K
- Nano Banana Pro (`gemini-3-pro-image`) — 10/10/18 pts for 1K/2K/4K
**All other image models mentioned in older documentation are no longer available in production.**
**🌟 Parameter Support Notes (All Task Types):**
### Image Models (text_to_image / image_to_image)
**🆕 MAJOR UPDATE: the Nano Banana series now supports `aspect_ratio` NATIVELY!**
**Parameter support details:**
- ✅ **aspect_ratio**:
  - **SeeDream 4.5**: ✅ Supports 8 ratios via virtual params (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9)
  - **Nano Banana2**: ✅ **Native support** for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4)
  - **Nano Banana Pro**: ✅ **Native support** for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4)
  - **Midjourney**: ❌ 1:1 only (fixed 1024x1024)
- ✅ **size**:
- **Nano Banana2**: 512px, 1K, 2K, 4K (via different `attribute_id`s, 4-13 pts)
- **Nano Banana Pro**: 1K, 2K, 4K (via different `attribute_id`s, 10-18 pts)
- **SeeDream 4.5**: Adaptive default (5 pts)
- **Midjourney**: 480p/720p (via `attribute_id`, 8/10 pts)
- ❌ **8K**: No model supports 8K (max is 4K via Nano Banana Pro)
- ❌ **Non-standard aspect ratios** (7:3, 8:5, etc.): Not supported. Use closest supported ratio or video models.
- ✅ **n**: Multiple outputs supported (1-4), credit × n
**When user requests unsupported combinations for images:**
- **Midjourney + aspect_ratio (16:9, etc.)**: Recommend **SeeDream 4.5** or **Nano Banana series** instead
```
❌ Midjourney 暂不支持自定义 aspect_ratio(仅支持 1024x1024 方形)
✅ 推荐方案:
1. SeeDream 4.5(支持虚拟参数 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9
• 成本:5 积分(性价比最佳)
2. Nano Banana Pro/2(原生支持 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4
• 成本:4-18 积分(按尺寸)
需要我帮你用 SeeDream 4.5 生成吗?
```
- **Any model + 8K**: Inform user no model supports 8K, max is 4K (Nano Banana Pro)
- **Any model + non-standard ratio (7:3, 8:5, etc.)**: Not supported. Suggest the closest supported ratio instead (e.g., 21:9 for ultra-wide, 2:3 for portrait)
### Video Models (text_to_video / image_to_video / first_last_frame / reference_image)
- ✅ **resolution**: 540P, 720P, 1080P, 2K, 4K (model-dependent, higher res = higher cost)
- ✅ **aspect_ratio**: 16:9, 9:16, 1:1, 4:3 (model-dependent, check `form_config`)
- ✅ **duration**: 4s, 5s, 10s, 15s (model-dependent, longer = higher cost)
- ⚠️ **generate_audio**: Supported by Veo 3.1, Kling O1, Hailuo (check `form_config`)
- ✅ **prompt_extend**: AI-powered prompt enhancement (most models support)
- ✅ **negative_prompt**: Content exclusion (most models support)
- ✅ **shot_type**: Single/multi-shot control (model-dependent)
- ✅ **seed**: Reproducibility control (most models support, -1 = random)
- ✅ **n**: Multiple outputs (1-4), credit × n
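Putting the parameters above together, a text_to_video request might carry a payload like this. The field names follow this document's parameter list; verify the exact schema against the model's `form_config` before sending.

```python
# Hypothetical video request parameters — illustrative only.
video_params = {
    "model_id": "wan2.6-t2v",        # from the model ID reference table
    "prompt": "Aerial shot of a seaside city at sunset",
    "resolution": "1080P",           # 540P/720P/1080P/2K/4K, model-dependent
    "aspect_ratio": "16:9",          # check form_config per model
    "duration": 5,                   # seconds; longer = higher cost
    "prompt_extend": True,           # AI-powered prompt enhancement
    "negative_prompt": "blurry, shaky",
    "seed": -1,                      # -1 = random
    "n": 1,                          # outputs; credit × n
}
```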
#### 🆕 Special Case: Pixverse Model Parameter (v1.0.7+)
**Auto-Inference Logic for Pixverse V5.5/V5/V4:**
- **Problem**: Pixverse V5.5, V5, V4 lack `model` field in `form_config` from Product List API
- **Backend Requirement**: Backend requires `model` parameter (e.g., `"v5.5"`, `"v5"`, `"v4"`)
- **Auto-Fix**: System automatically extracts version from `model_name` and injects it
- Example: `model_name: "Pixverse V5.5"` → auto-inject `model: "v5.5"`
- Example: `model_name: "Pixverse V4"` → auto-inject `model: "v4"`
- **Note**: V4.5 and V3.5 include `model` in `form_config` (no auto-inference needed)
- **Relevant Task Types**: All video modes (text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video)
**Error Prevention:**
- Without auto-inference: `err_code=400017 err_msg=Invalid value for model`
- With auto-inference (v1.0.7+): Pixverse V5.5/V5/V4 work seamlessly ✅
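The auto-inference described above amounts to extracting the version suffix from `model_name` when `form_config` lacks a `model` field. A minimal sketch (function name and dict shape are assumptions):

```python
import re

def infer_pixverse_model(model_name: str, form_config: dict) -> dict:
    """Inject the `model` parameter for Pixverse V5.5/V5/V4, whose
    form_config from the Product List API lacks the field (v1.0.7+ fix)."""
    params = dict(form_config)
    if "model" not in params:
        m = re.search(r"V(\d+(?:\.\d+)?)", model_name)
        if m:
            # e.g. "Pixverse V5.5" → model: "v5.5"
            params["model"] = f"v{m.group(1)}"
    return params

print(infer_pixverse_model("Pixverse V5.5", {}))  # → {'model': 'v5.5'}
```

V4.5 and V3.5 already carry `model` in `form_config`, so the existing value is left untouched.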
### Music Models (text_to_music)
**Suno sonic-v5 (Full-Featured):**
- ✅ **custom_mode**: Suno only (enables vocal_gender, lyrics, tags support)
- ✅ **vocal_gender**: Suno only (male/female/mixed, requires custom_mode=True)
- ✅ **lyrics**: Suno only (custom lyrics support, requires custom_mode=True)
- ✅ **make_instrumental**: Suno only (force instrumental, no vocals)
- ✅ **auto_lyrics**: Suno only (AI-generated lyrics)
- ✅ **tags**: Suno only (genre/style tags)
- ✅ **negative_tags**: Suno only (exclude unwanted styles)
- ✅ **title**: Suno only (song title)
- ❌ **duration**: Fixed-length output (DouBao ~30s, Suno ~2min, not user-controllable)
- ✅ **n**: Multiple outputs supported (1-2), credit × n
**DouBao BGM/Song (Simplified):**
- ✅ **prompt**: Text description only
- ❌ **No advanced parameters** (no custom_mode, lyrics, vocal control)
- ❌ **duration**: Fixed ~30s output
**🎵 Suno Prompt Writing Guide (for `gpt_description_prompt`):**
When using Suno, structure your prompt with these elements:
1. **Genre/Style:**
- Examples: `"lo-fi hip hop"`, `"orchestral cinematic"`, `"upbeat pop"`, `"dark ambient"`, `"indie folk"`, `"electronic dance"`
2. **Tempo/BPM:**
- Examples: `"80 BPM"`, `"fast tempo"`, `"slow ballad"`, `"moderate pace 110 BPM"`
3. **Vocals Control:**
- **No vocals**: `"no vocals"` → set `make_instrumental=true`
- **With vocals**: `"female vocals"` → set `vocal_gender="female"`
- **Male vocals**: `"male vocals"` → set `vocal_gender="male"`
- **Mixed**: Set `vocal_gender="mixed"`
4. **Mood/Emotion:**
- Examples: `"happy and energetic"`, `"melancholic"`, `"tense and dramatic"`, `"peaceful and calming"`
5. **Negative Tags (exclude styles):**
- Use `negative_tags`: `"heavy metal, distortion, screaming"` to exclude unwanted elements
6. **Duration Hint:**
- Examples: `"60 seconds"`, `"30 second loop"`, `"2 minute track"`
- Note: Suno typically generates ~2min, not strictly controllable
**Example Suno prompts:**
```
"upbeat lo-fi hip hop, 90 BPM, no vocals, relaxed and chill"
→ Set: make_instrumental=true
"emotional pop ballad, slow tempo, female vocals, melancholic"
→ Set: vocal_gender="female"
"orchestral cinematic trailer music, epic and dramatic, 120 BPM, no vocals"
→ Set: make_instrumental=true, tags="orchestral,cinematic,epic"
"acoustic indie folk, gentle guitar, male vocals, warm and nostalgic"
→ Set: vocal_gender="male", tags="acoustic,indie,folk"
```
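The prompt-to-parameter mapping shown in the examples can be sketched as a simple keyword heuristic. This is an illustration of the rules above, not the skill's actual implementation; the parameter names follow the Suno list earlier in this section.

```python
def suno_params_from_prompt(prompt: str) -> dict:
    """Derive Suno parameter flags from prompt keywords (heuristic sketch)."""
    params = {"gpt_description_prompt": prompt}
    text = prompt.lower()
    if "no vocals" in text or "instrumental" in text:
        params["make_instrumental"] = True   # per the "Vocals Control" rule
    elif "female vocals" in text:
        params["vocal_gender"] = "female"
    elif "male vocals" in text:
        params["vocal_gender"] = "male"
    return params

print(suno_params_from_prompt("upbeat lo-fi hip hop, 90 BPM, no vocals"))
```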
**⚠️ Technical Note for Suno:**
> `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly.
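Concretely, the two `model_version` fields sit at different nesting levels. The shape below is inferred from the note above; verify it against the actual request schema.

```python
# Outer model_version is the model_id ("sonic"); the inner
# parameters.parameters carries the detailed version ("sonic-v5").
suno_request = {
    "model_version": "sonic",             # outer field: the model_id
    "parameters": {
        "parameters": {
            "model_version": "sonic-v5",  # inner field: detailed version
            "make_instrumental": True,
        }
    },
}
```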
### Common Parameter Patterns
- **n (batch generation)**: Supported by ALL models. Cost = base_credit × n. Creates n independent resources.
- **seed**: Supported by most models (-1 = random, >0 = reproducible results)
- **prompt_extend**: AI-powered prompt enhancement (video models only)
### Decision Tree: When User Requests Unsupported Features
```
User asks for custom aspect ratio image (e.g. "7:3 landscape")
→ ❌ Image models don't support custom ratios
→ ✅ Solution: "图片模型不支持自定义比例。建议用视频模型(Wan 2.6 t2v)生成16:9视频,然后截取首帧作为图片。"
User asks for 8K image
→ ❌ No model supports 8K
→ ✅ Solution: "当前最高支持4K分辨率(Nano Banana Pro,18积分)。要使用吗?"
User asks for video with audio
→ Check model: Veo 3.1 / Kling O1 / Hailuo have generate_audio
→ ✅ Solution: "Veo 3.1 和 Kling O1 支持音频生成(需在参数中设置 generate_audio=True)。要用哪个?"
User asks for long music (e.g. "5 minute track")
→ ❌ Duration not user-controllable
→ ✅ Solution: "Suno 生成约2分钟音乐。需要更长时长可以生成多段后拼接。"
User asks for 30s video
→ Check model: Most models max 15s
→ ✅ Solution: "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)。"
```
**When user requests unsupported combinations:**
- Video + audio (unsupported model) → "该模型不支持音频。建议用 Veo 3.1 或 Kling O1 (支持 generate_audio 参数)"
- Music + custom duration → "音乐时长由模型固定(Suno约2分钟,DouBao约30秒),无法自定义"
- Video duration > 15s → "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)"
> **Note:** Image-specific unsupported combinations (Midjourney + aspect_ratio, 8K, non-standard ratios) are documented in the "Image Models" section above.
---
## 🧠 User Preference Memory (Video)
> User preferences have **highest priority** when they exist. But preferences are only saved when users **explicitly express** model preferences — not from automatic model selection.
### Storage: `~/.openclaw/memory/ima_prefs.json`
```json
{
"user_{user_id}": {
"text_to_video": { "model_id": "wan2.6-t2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
"image_to_video": { "model_id": "wan2.6-i2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
"first_last_frame_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." },
"reference_image_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." }
}
}
```
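The `load_prefs` / `save_prefs` helpers referenced in the selection flow might look like this. A minimal sketch: the path matches the storage location above, but error handling and file layout details are assumptions.

```python
import json
import os

PREFS_PATH = os.path.expanduser("~/.openclaw/memory/ima_prefs.json")

def load_prefs() -> dict:
    """Return the preference map, or {} when no file exists yet."""
    try:
        with open(PREFS_PATH, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def save_prefs(prefs: dict) -> None:
    """Persist preferences, creating the memory directory if needed."""
    os.makedirs(os.path.dirname(PREFS_PATH), exist_ok=True)
    with open(PREFS_PATH, "w", encoding="utf-8") as f:
        json.dump(prefs, f, ensure_ascii=False, indent=2)
```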
### Model Selection Flow (Video Generation)
**Step 1: Get knowledge-ai recommendation** (if installed)
```python
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "Wan 2.6"
```
**Step 2: Check user preference**
```python
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "kling-video-o1", ...}
```
**Step 3: Decide which model to use**
```python
if user_pref exists:
use_model = user_pref["model_id"] # Highest priority
else:
use_model = knowledge_recommended_model or fallback_default
```
**Step 4: Check for mismatch (for later hint)**
```python
if user_pref exists and knowledge_recommended_model != user_pref["model_id"]:
mismatch = True # Will add hint in success message
```
### When to Write (User Explicit Preference ONLY)
**✅ Save preference when user explicitly specifies a model:**
| User says | Action |
|-----------|--------|
| `用XXX` / `换成XXX` / `改用XXX` | Switch to model XXX + save as preference |
| `以后都用XXX` / `默认用XXX` / `always use XXX` | Save + confirm: `✅ 已记住!以后视频生成默认用 [XXX]` |
| `我喜欢XXX` / `我更喜欢XXX` | Save as preference |
**❌ Do NOT save when:**
- Agent auto-selects from knowledge-ai → not user preference
- Agent uses fallback default → not user preference
- User says generic quality requests (see "Clear Preference" below) → clear preference instead
### When to Clear (User Abandons Preference)
**🗑️ Clear preference when user wants automatic selection:**
| User says | Action |
|-----------|--------|
| `用最好的` / `用最合适的` / `best` / `recommended` | Clear pref + use knowledge-ai recommendation |
| `推荐一个` / `你选一个` / `自动选择` | Clear pref + use knowledge-ai recommendation |
| `用默认的` / `用新的` | Clear pref + use knowledge-ai recommendation |
| `试试别的` / `换个试试` (without specific model) | Clear pref + use knowledge-ai recommendation |
| `重新推荐` | Clear pref + use knowledge-ai recommendation |
**Implementation:**
```python
prefs.get(f"user_{user_id}", {}).pop(task_type, None)  # safe no-op if no pref saved
save_prefs(prefs)
```
---
## ⭐ Model Selection Priority (Video)
**Selection flow:**
1. **User preference** (if exists) → Highest priority, always respect
2. **ima-knowledge-ai skill** (if installed) → Professional recommendation based on task
3. **Fallback defaults** → Use table below (only if neither 1 nor 2 exists)
**Important notes:**
- User preference is only saved when user **explicitly specifies** a model (see "When to Write" above)
- Knowledge-ai is **always consulted** (even when user pref exists) to detect mismatches
- When mismatch detected → add gentle hint in success message (does NOT interrupt generation)
> The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
---
## 💬 User Experience Protocol (IM / Feishu / Discord) v2.0 🆕
> **v2.0 Updates (aligned with ima-image-ai v1.3):**
> - Added Step 0 for correct message ordering (fixes group chat bug)
> - Added Step 5 for explicit task completion
> - Enhanced Midjourney support with proper timing estimates
> - Now 6 steps total (0-5): Acknowledgment → Pre-Gen → Progress → Success/Failure → Done
>
> This skill runs inside IM platforms (Feishu, Discord via OpenClaw).
> Generation takes anywhere from 10 seconds (music) to 6 minutes (video). **Never let users wait in silence.**
> Always follow all 6 steps below, every single time.
### 🗣️ User-Friendly First, Transparent on Request
Default to plain-language updates in normal user flows.
If users ask for technical details, provide them transparently (script name, endpoints, and key parameters).
In standard progress messages, prioritize: **model name, estimated/actual time, credits consumed, result URL, and natural-language status updates.**
---
### Estimated Generation Time (All Task Types)
| Task Type | Model | Estimated Time | Poll Every | Send Progress Every |
|-----------|-------|---------------|------------|---------------------|
| **text_to_image** | SeeDream 4.5 | 25~60s | 5s | 20s |
| | Nano Banana2 💚 | 20~40s | 5s | 15s |
| | Nano Banana Pro | 60~120s | 5s | 30s |
| | Midjourney 🎨 | 40~90s | 8s | 25s |
| **image_to_image** | SeeDream 4.5 | 25~60s | 5s | 20s |
| | Nano Banana2 💚 | 20~40s | 5s | 15s |
| | Nano Banana Pro | 60~120s | 5s | 30s |
| | Midjourney 🎨 | 40~90s | 8s | 25s |
| **text_to_video** | Wan 2.6, Hailuo 2.0/2.3, Vidu Q2, Pixverse | 60~120s | 8s | 30s |
| | SeeDance 1.5 Pro, Kling 2.6, Veo 3.1 | 90~180s | 8s | 40s |
| | Kling O1, Sora 2 Pro | 180~360s | 8s | 60s |
| **image_to_video** | Same ranges as text_to_video | — | 8s | 40s |
| **first_last_frame / reference** | Kling O1, Veo 3.1 | 180~360s | 8s | 60s |
| **text_to_music** | DouBao BGM / Song | 10~25s | 5s | 10s |
| | Suno (sonic-v5) | 20~45s | 5s | 15s |
| **text_to_speech** | (varies by model) | 5~30s | 3s | 10s |
`estimated_max_seconds` = upper bound of the range (e.g. 60 for SeeDream 4.5, 40 for Nano Banana2, 120 for Nano Banana Pro, 90 for Midjourney, 180 for Kling 2.6, 360 for Kling O1).
---
### Step 0 — Initial Acknowledgment Reply (Normal Reply) 🆕
**⚠️ CRITICAL:** This step is essential for correct message ordering in IM platforms (Feishu, Discord).
**Before doing anything else**, reply to the user with a friendly acknowledgment message using your **normal reply** (not `message` tool). This reply will automatically appear FIRST in the conversation.
**Example acknowledgment messages:**
For images:
```
好的!来帮你画一只萌萌的猫咪 🐱
```
```
收到!马上为你生成一张 16:9 的风景照 🏔️
```
```
OK! Starting image generation with SeeDream 4.5 🎨
```
For videos:
```
好的!来帮你生成一段视频 🎬
```
```
收到!开始用 Wan 2.6 生成视频 🎥
```
For music:
```
好的!来帮你创作一首音乐 🎵
```
**Rules:**
- Keep it short and warm (< 15 words)
- Match the user's language (Chinese/English)
- Include relevant emoji (🐱/🎨/🎬/🎵/✨)
- This is your ONLY normal reply — all subsequent updates use `message` tool
**Why this matters:**
- Normal replies automatically appear FIRST in the conversation thread
- `message` tool pushes appear in chronological order AFTER your initial reply
- This ensures users see: "好的!" → "🎨 开始生成..." → "⏳ 进度..." → "✅ 成功!" (correct order)
- Without Step 0, the confirmation might appear LAST, confusing users
---
### Step 1 — Pre-Generation Notification (Push via message tool)
**After Step 0 reply**, use the `message` tool to push a notification immediately:
```
[Emoji] 开始生成 [内容类型],请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
```
**Emoji by content type:**
- 图片 → `🎨`
- 视频 → `🎬`(加注:视频生成需要较长时间,我会定时汇报进度)
- 音乐 → `🎵`
**Cost transparency (new requirement):**
- Always show credit cost with model tier context
- For expensive models (>50 pts), offer cheaper alternative proactively
- Examples:
- Balanced (default): "使用 Wan 2.6(25 积分,最新 Wan)"
- Premium (user explicit): "使用高端模型 Kling O1(48-120 积分),质量最佳"
- Premium (auto-selected): "使用 Wan 2.6(25 积分)。若需更高质量可选 Kling O1(48 积分起)"
- Budget (user asked): "使用 Vidu Q2(5 积分,最省钱)"
> Adapt language to match the user (Chinese / English). For video, always add a note that it takes longer. For expensive models, always mention cheaper alternatives unless user explicitly requested premium.
---
### Step 2 — Progress Updates
Poll the task detail API every `[Poll Every]` seconds per the table.
Send a progress update every `[Send Progress Every]` seconds.
```
⏳ 正在生成中… [P]%
已等待 [elapsed]s,预计最长 [max]s
```
**Progress formula:**
```
P = min(95, floor(elapsed_seconds / estimated_max_seconds * 100))
```
- **Cap at 95%** — never reach 100% until the API confirms `success`
- If `elapsed > estimated_max`: freeze at 95%, append `「快了,稍等一下…」`
- For video with max=360s: at 120s → 33%, at 250s → 69%, at 400s → 95% (frozen)
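The formula and the 360s examples above translate directly into code:

```python
import math

def progress_percent(elapsed_seconds: float, estimated_max_seconds: float) -> int:
    """P = min(95, floor(elapsed / max * 100)) — capped at 95 until the API
    confirms success, and frozen there once elapsed exceeds the estimate."""
    return min(95, math.floor(elapsed_seconds / estimated_max_seconds * 100))
```

With `estimated_max_seconds=360` this reproduces the examples: 120s → 33%, 250s → 69%, 400s → 95% (frozen).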
---
### Step 3 — Success Notification
When task status = `success`:
#### For Video Tasks (text_to_video / image_to_video / first_last_frame / reference_image)
**3.1 Send video player first** (IM platforms like Feishu will render inline player):
```python
# Get result URL from script output or task detail API
result = get_task_result(task_id)
video_url = result["medias"][0]["url"]
# Build caption
caption = f"""✅ 视频生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
[视频描述]"""
# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""
# Send video with caption (use message tool if available)
message(
action="send",
media=video_url, # ⚠️ Use HTTPS URL directly, NOT local file path
caption=caption
)
```
**Important:**
- Hint is **non-intrusive** — does NOT interrupt generation
- Only shown when user pref conflicts with knowledge-ai recommendation
- User can ignore the hint; video is already delivered
**3.2 Then send link as text** (for copying/sharing):
```python
# Send link message immediately after video
message(action="send", text=f"🔗 视频链接(可复制分享):\n{video_url}")
```
**⚠️ Critical for video:**
- Send video player FIRST (inline preview)
- Send text link SECOND (for copying)
- Include first-frame thumbnail URL if available: `result["medias"][0]["cover"]`
#### For Image Tasks (text_to_image / image_to_image)
```python
# Build caption
caption = f"""✅ 图片生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:{image_url}"""
# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""
# Send image with caption
message(
action="send",
media=image_url,
caption=caption
)
```
**Important:**
- Hint is **non-intrusive** — does NOT interrupt generation
- Only shown when user pref conflicts with knowledge-ai recommendation
- User can ignore the hint; image is already delivered
#### For Music Tasks (text_to_music)
Send audio file with player:
```
✅ 音乐生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
• 时长:约 [duration]
[音频URL或直接发送音频文件]
```
#### For TTS Tasks (text_to_speech) — Full UX Protocol (Steps 0–5)
**Step 0 — Initial acknowledgment (normal reply)**
First reply with a short acknowledgment, e.g.: 好的,正在帮你把这段文字转成语音。 / OK, converting this text to speech.
**Step 1 — Pre-generation (message tool)**
Push once:
```
🔊 开始语音合成,请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
```
**Step 2 — Progress**
Poll every 2–5s. Every 10–15s send: `⏳ 语音合成中… [P]%,已等待 [elapsed]s,预计最长 [max]s`. Cap progress at 95% until the API returns success.
**Step 3 — Success (message tool)**
When `resource_status == 1` and `status != "failed"`, send **media** = `medias[0].url` and **caption**:
```
✅ 语音合成成功!
• 模型:[Model Name]
• 耗时:实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:[url]
```
Use the **URL** from the API (do not use local file paths).
**Step 4 — Failure (message tool)**
On failure, send user-friendly message. **TTS error translation (do not expose raw API errors):**
| Technical | ✅ Say (CN) | ✅ Say (EN) |
|-----------|-------------|-------------|
| 401 Unauthorized | 密钥无效或未授权,请至 imaclaw.ai 生成新密钥 | API key invalid; generate at imaclaw.ai |
| 4008 Insufficient points | 积分不足,请至 imaclaw.ai 购买积分 | Insufficient points; buy at imaclaw.ai |
| Invalid product attribute | 参数配置异常,请稍后重试 | Configuration error, try again later |
| Error 6006 / 6010 | 积分或参数不匹配,请换模型或重试 | Points/params mismatch, try another model |
| resource_status == 2 / status failed | 语音合成失败,建议换模型或缩短文本 | Synthesis failed, try another model or shorter text |
| timeout | 合成超时,请稍后重试 | Timed out, try again later |
| Network error | 网络不稳定,请检查后重试 | Network unstable, check and retry |
| Text too long (TTS) | 文本过长,请缩短后重试 | Text too long, please shorten |
Links: API key — https://www.imaclaw.ai/imaclaw/apikey; Credits — https://www.imaclaw.ai/imaclaw/subscription
**Step 5 — Done**
After Steps 0–4, no further reply is needed. Do not send duplicate confirmations.
---
### Step 4 — Failure Notification
When task status = `failed` or any API/network error, send:
```
❌ [内容类型]生成失败
• 原因:[natural_language_error_message]
• 建议改用:
- [Alt Model 1]([特点],[N pts])
- [Alt Model 2]([特点],[N pts])
需要我帮你用其他模型重试吗?
```
**⚠️ CRITICAL: Error Message Translation**
**NEVER show technical error messages to users.** Always translate API errors into natural language.
**API key & credits:** 密钥与积分管理入口为 imaclaw.ai(与 imastudio.com 同属 IMA 平台)。Key and subscription management: imaclaw.ai (same IMA platform as imastudio.com).
| Technical Error | ❌ Never Say | ✅ Say Instead (Chinese) | ✅ Say Instead (English) |
|----------------|-------------|------------------------|------------------------|
| `401 Unauthorized` 🆕 | Invalid API key / 401 Unauthorized | ❌ API密钥无效或未授权<br>💡 **生成新密钥**: https://www.imaclaw.ai/imaclaw/apikey | ❌ API key is invalid or unauthorized<br>💡 **Generate API Key**: https://www.imaclaw.ai/imaclaw/apikey |
| `4008 Insufficient points` 🆕 | Insufficient points / Error 4008 | ❌ 积分不足,无法创建任务<br>💡 **购买积分**: https://www.imaclaw.ai/imaclaw/subscription | ❌ Insufficient points to create this task<br>💡 **Buy Credits**: https://www.imaclaw.ai/imaclaw/subscription |
| `"Invalid product attribute"` / `"Insufficient points"` | Invalid product attribute | 生成参数配置异常,请稍后重试 | Configuration error, please try again later |
| `Error 6006` (credit mismatch) | Error 6006 | 积分计算异常,系统正在修复 | Points calculation error, system is fixing |
| `Error 6009` (no matching rule) | Error 6009 | 参数组合不匹配,已自动调整 | Parameter mismatch, auto-adjusted |
| `Error 6010` (attribute_id mismatch) | Attribute ID does not match | 模型参数不匹配,请尝试其他模型 | Model parameters incompatible, try another model |
| `error 400` (bad request) | error 400 / Bad request | 请求参数有误,请稍后重试 | Invalid request parameters, please try again |
| `resource_status == 2` | Resource status 2 / Failed | 生成过程遇到问题,建议换个模型试试 | Generation failed, please try another model |
| `status == "failed"` (no details) | Task failed | 这次生成没成功,要不换个模型试试? | Generation unsuccessful, try a different model? |
| `timeout` | Task timed out / Timeout error | 生成时间过长已超时,建议用更快的模型 | Generation took too long, try a faster model |
| Network error / Connection refused | Connection refused / Network error | 网络连接不稳定,请检查网络后重试 | Network connection unstable, check network and retry |
| Rate limit exceeded | 429 Too Many Requests / Rate limit | 请求过于频繁,请稍等片刻再试 | Too many requests, please wait a moment |
| Prompt moderation (Sora only) | Content policy violation | 提示词包含敏感内容,请修改后重试 | Prompt contains restricted content, please modify |
| Model unavailable | Model not available / 503 Service Unavailable | 当前模型暂时不可用,建议换个模型 | Model temporarily unavailable, try another model |
| **Lyrics format error (Suno only)** 🎵 | Invalid lyrics format | 歌词格式有误,请调整后重试 | Lyrics format error, adjust and retry |
| **Prompt too short/long (Music)** 🎵 | Prompt length invalid | 音乐描述过短或过长,请调整到合适长度 (建议20-100字) | Music description too short or long, adjust to appropriate length (20-100 chars recommended) |
| **Text too long (TTS)** 🔊 | TTS text length | 文本过长,请缩短后重试 | Text too long, please shorten and retry |
**Generic fallback (when error is unknown):**
- Chinese: `生成过程遇到问题,请稍后重试或换个模型试试`
- English: `Generation encountered an issue, please try again or use another model`
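The translation table plus the generic fallback can be sketched as a simple lookup. This is a minimal illustration with a few representative entries (Chinese messages); a real implementation would cover the full table and both languages.

```python
# Sketch of the error-translation table above — never expose raw API errors.
ERROR_MESSAGES = {
    "401": "API密钥无效或未授权。生成新密钥: https://www.imaclaw.ai/imaclaw/apikey",
    "4008": "积分不足,无法创建任务。购买积分: https://www.imaclaw.ai/imaclaw/subscription",
    "6010": "模型参数不匹配,请尝试其他模型",
    "timeout": "生成时间过长已超时,建议用更快的模型",
}
FALLBACK_CN = "生成过程遇到问题,请稍后重试或换个模型试试"

def user_facing_error(code: str) -> str:
    """Map a technical error code to a natural-language message."""
    return ERROR_MESSAGES.get(code, FALLBACK_CN)
```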
**Best Practices:**
1. **Focus on user action**: Tell users what to do next, not what went wrong technically
2. **Be reassuring**: Use phrases like "建议换个模型试试" instead of "失败了"
3. **Avoid blame**: Never say "你的提示词有问题" → say "提示词需要调整一下"
4. **Provide alternatives**: Always suggest 1-2 alternative models in the failure message
5. **🆕 Include actionable links (v1.0.8+)**: For 401/4008 errors, provide clickable links to API key generation or credit purchase pages
6. **🎵 Music-specific (v1.2.0+)**:
- For Suno lyrics errors, suggest simplifying lyrics or using auto-generated lyrics (`auto_lyrics=true`)
- For prompt length errors, give example length (e.g., "建议20-100字")
- For BGM requests, recommend DouBao BGM over Suno
7. **🔊 TTS-specific:** Use the TTS error translation table in "For TTS Tasks (text_to_speech)" above; suggest another model via `--list-models` or shortening text.
---
### Step 5 — Done (No Further Action Needed) 🆕
**After sending Step 3 (success) or Step 4 (failure):**
1. **DO NOT send any additional messages** unless the user asks a follow-up question
2. **The task is complete** — wait for the user's next request
3. **User preference has been saved** (if generation succeeded)
4. **The conversation is ready** for the next generation request
**Why this step matters:**
- Prevents unnecessary "anything else?" messages that clutter the chat
- Allows users to naturally continue the conversation when ready
- Respects the asynchronous nature of IM platforms
**Exception:** If the user explicitly asks "还有别的吗?" or similar, then respond naturally.
---
**🆕 Enhanced Error Handling (v1.0.8):**
The Reflection mechanism (3 automatic retries) now provides **specific, actionable suggestions** for common errors:
- **401 Unauthorized**: System suggests generating a new API key with clickable link
- **4008 Insufficient Points**: System suggests purchasing credits with clickable link
- **500 Internal Server Error**: Automatic parameter degradation (size, resolution, duration, quality)
- **6009 No Rule Match**: Automatic parameter completion from credit_rules
- **6010 Attribute Mismatch**: Automatic credit_rule reselection
- **Timeout**: Helpful info with dashboard link for background task status
All error handling is **automatic and transparent** — users receive natural language explanations with next steps.
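The retry-with-degradation behavior for 500 errors might be structured as follows. This is a sketch under assumptions: `create_task`, the degradation order, and the backoff policy are illustrative, not the actual Reflection implementation.

```python
import time

# Assumed degradation ladder for the `resolution` parameter on server errors.
DEGRADE_ORDER = {"4K": "2K", "2K": "1080P", "1080P": "720P"}

def generate_with_retry(create_task, params: dict, max_retries: int = 3):
    """Retry up to max_retries, degrading resolution on 500-class errors."""
    for attempt in range(max_retries):
        try:
            return create_task(params)
        except RuntimeError as err:
            if "500" in str(err) and params.get("resolution") in DEGRADE_ORDER:
                params["resolution"] = DEGRADE_ORDER[params["resolution"]]
            time.sleep(0.1 * attempt)  # small backoff between attempts
    raise RuntimeError("Generation failed after retries")
```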
**Failure fallback by task type:**
| Task Type | Failed Model | First Alt | Second Alt |
|-----------|-------------|-----------|------------|
| text_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10-18pts, premium) |
| text_to_image | Nano Banana2 | SeeDream 4.5 (5pts, better quality) | Nano Banana Pro (10-18pts) |
| text_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts, budget) |
| image_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana2 | SeeDream 4.5 (5pts) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts) |
| text_to_video | Kling O1 | Wan 2.6 (25pts) | Vidu Q2 (5pts) |
| text_to_video | Google Veo 3.1 | Kling O1 (48pts) | Sora 2 Pro (122pts) |
| text_to_video | Any | Wan 2.6 (25pts, most popular) | Hailuo 2.0 (5pts) |
| image_to_video | Wan 2.6 | Kling O1 (48pts) | Hailuo 2.0 i2v (25pts) |
| image_to_video | Any | Wan 2.6 (25pts, most popular) | Vidu Q2 Pro (20pts) |
| first_last / reference | Kling O1 | Kling 2.6 (80pts) | Veo 3.1 (70pts+) |
| **text_to_music** 🎵 | **Suno** | **DouBao BGM (30pts, 背景音乐)** | **DouBao Song (30pts, 歌曲生成)** |
| **text_to_music** 🎵 | **DouBao BGM** | **DouBao Song (30pts)** | **Suno (25pts, 功能最强)** |
| **text_to_music** 🎵 | **DouBao Song** | **DouBao BGM (30pts)** | **Suno (25pts, 功能最强)** |
| **text_to_speech** 🔊 | (any) | Query `--list-models` for alternatives | Use another model_id from product list |
**Music-specific failure guidance:**
- If Suno fails → Recommend DouBao BGM (for background music) or DouBao Song (for songs)
- If DouBao BGM fails → Try DouBao Song first (similar pricing), then Suno (more powerful)
- If DouBao Song fails → Try DouBao BGM first (similar pricing), then Suno (more powerful)
- For lyrics errors in Suno → Suggest simplifying lyrics or using `auto_lyrics=true`
- For prompt length errors → Recommend 20-100 characters
**TTS-specific failure guidance:**
- If TTS fails → Run `--task-type text_to_speech --list-models` and suggest another model_id; or shorten text / simplify content. Use the TTS error translation table in "For TTS Tasks" above for user-facing messages.
---
## Supported Models at a Glance
> **Source: production `GET /open/v1/product/list` (2026-02-27).** Model count reduced significantly. Always query product list API at runtime.
### Image Generation (4 models each)
| Category | Name | model_id | Cost |
|----------|------|----------|------|
| **text_to_image** | SeeDream 4.5 🌟 | `doubao-seedream-4.5` | 5 pts |
| text_to_image | Midjourney 🎨 | `midjourney` | 8/10 pts (480p/720p) |
| text_to_image | Nano Banana2 💚 | `gemini-3.1-flash-image` | 4/6/10/13 pts |
| text_to_image | Nano Banana Pro | `gemini-3-pro-image` | 10/10/18 pts |
| **image_to_image** | SeeDream 4.5 🌟 | `doubao-seedream-4.5` | 5 pts |
| image_to_image | Midjourney 🎨 | `midjourney` | 8/10 pts (480p/720p) |
| image_to_image | Nano Banana2 💚 | `gemini-3.1-flash-image` | 4/6/10/13 pts |
| image_to_image | Nano Banana Pro | `gemini-3-pro-image` | 10 pts |
**Midjourney attribute_ids**: 5451/5452 (text_to_image), 5453/5454 (image_to_image)
**Nano Banana2 size options**: 512px (4pts), 1K (6pts), 2K (10pts), 4K (13pts)
**Nano Banana Pro size options**: 1K (10pts), 2K (10pts), 4K (18pts for t2i / 10pts for i2i)
#### Image Model Capabilities (Parameter Support)
⚠️ **Critical**: Models have **varying parameter support**. Custom aspect ratios are now **supported by multiple models**.
| Model | Custom Aspect Ratio | Max Resolution | Size Options | Notes |
|-------|---------------------|----------------|--------------|-------|
| **SeeDream 4.5** | ✅ (via virtual params) | 4K (adaptive) | 8 aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 (5 pts) |
| **Nano Banana2** | ✅ **Native support** 🆕 | 4K (4096×4096) | 512px/1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` |
| **Nano Banana Pro** | ✅ **Native support** 🆕 | 4K (4096×4096) | 1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` |
| **Midjourney** 🎨 | ❌ (1:1 only) | 1024px (square) | 480p/720p via `attribute_id` | Fixed 1024x1024, artistic style focus |
**Key Capabilities**:
- ✅ **Aspect ratio control**: **SeeDream 4.5** (virtual params), **Nano Banana Pro/2** (native support)
- ❌ **8K**: Not supported by any model (max is 4K)
- ✅ **Size control**: **Nano Banana2**, **Nano Banana Pro**, and **Midjourney** support multiple size options via different `attribute_id`s
**Tags:** skill, ai