# IMA AI Creation
## ⚠️ Important: Model ID Reference
**CRITICAL:** When calling the script, you MUST use the exact **model_id** (second column), NOT the friendly model name. Do NOT infer model_id from the friendly name (e.g., ❌ `nano-banana-pro` is WRONG; ✅ `gemini-3-pro-image` is CORRECT).
**Quick Reference Table:**
### 图像模型 (Image Models)
| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|-------------------------|----------|-------------|
| Nano Banana2 | `gemini-3.1-flash-image` | ❌ NOT nano-banana-2; budget pick, 4-13 pts |
| Nano Banana Pro | `gemini-3-pro-image` | ❌ NOT nano-banana-pro; high quality, 10-18 pts |
| SeeDream 4.5 | `doubao-seedream-4.5` | ✅ Recommended default, 5 pts |
| Midjourney | `midjourney` | ✅ Same as friendly name, 8-10 pts |
### 视频模型 (Video Models)
| 友好名称 (Friendly Name) | model_id (t2v) | model_id (i2v) | 说明 (Notes) |
|-------------------------|---------------|----------------|-------------|
| Wan 2.6 | `wan2.6-t2v` | `wan2.6-i2v` | ⚠️ Note -t2v/-i2v suffix |
| IMA Video Pro (Sevio 1.0) | `ima-pro` | `ima-pro` | ✅ IMA native quality model |
| IMA Video Pro Fast (Sevio 1.0-Fast) | `ima-pro-fast` | `ima-pro-fast` | ✅ IMA native low-latency model |
| Kling O1 | `kling-video-o1` | `kling-video-o1` | ⚠️ Note video- prefix |
| Kling 2.6 | `kling-v2-6` | `kling-v2-6` | ⚠️ Note v prefix |
| Hailuo 2.3 | `MiniMax-Hailuo-2.3` | `MiniMax-Hailuo-2.3` | ⚠️ Note MiniMax- prefix |
| Hailuo 2.0 | `MiniMax-Hailuo-02` | `MiniMax-Hailuo-02` | ⚠️ Note 02 not 2.0 |
| Google Veo 3.1 | `veo-3.1-generate-preview` | `veo-3.1-generate-preview` | ⚠️ Note -generate-preview suffix |
| Sora 2 Pro | `sora-2-pro` | `sora-2-pro` | ✅ Straightforward |
| Pixverse | `pixverse` | `pixverse` | ✅ Same as friendly name |
### 音乐模型 (Music Models)
| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|-------------------------|----------|-------------|
| Suno (sonic-v5) | `sonic` | ⚠️ Simplified to sonic |
| DouBao BGM | `GenBGM` | ❌ NOT doubao-bgm |
| DouBao Song | `GenSong` | ❌ NOT doubao-song |
### 语音模型 (Speech/TTS Models)
| 友好名称 (Friendly Name) | model_id | 说明 (Notes) |
|-------------------------|----------|-------------|
| seed-tts-2.0 | `seed-tts-2.0` | ✅ Same as friendly name (default) |
**How to get the correct model_id:**
1. Check this table first
2. Use `--list-models --task-type <type>` to query available models
3. Refer to command examples in this SKILL.md
> Runtime truth source: `GET /open/v1/product/list` (or `--list-models`).
> Any table in this document is guidance; actual availability depends on current product list.
**Example:**
```bash
# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro
# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
```
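The table above can also be expressed as a lookup map, so an agent never infers a model_id from a friendly name. A minimal sketch (the helper name is illustrative; the runtime product list remains the source of truth):

```python
# Friendly-name → model_id lookup built from the image-model table above.
# Hypothetical helper: fail loudly instead of guessing from the name.
IMAGE_MODEL_IDS = {
    "nano banana2": "gemini-3.1-flash-image",
    "nano banana pro": "gemini-3-pro-image",
    "seedream 4.5": "doubao-seedream-4.5",
    "midjourney": "midjourney",
}

def resolve_image_model_id(friendly_name: str) -> str:
    """Return the exact model_id; never derive it from the friendly name."""
    key = friendly_name.strip().lower()
    if key not in IMAGE_MODEL_IDS:
        raise ValueError(f"Unknown model {friendly_name!r}; run --list-models")
    return IMAGE_MODEL_IDS[key]

print(resolve_image_model_id("Nano Banana Pro"))  # gemini-3-pro-image
```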
---
## 📚 Optional Knowledge Enhancement (ima-knowledge-ai)
This skill is fully runnable as a standalone package.
If `ima-knowledge-ai` is installed, the agent may read its references for workflow decomposition and consistency guidance.
Recommended optional reads:
1. **Check for workflow complexity** — Read `ima-knowledge-ai/references/workflow-design.md` if:
- User mentions: "MV"、"宣传片"、"完整作品"、"配乐"、"soundtrack"
- Task spans multiple media types (image + video, video + music, etc.)
- Complex multi-step workflows that need task decomposition
2. **Check for visual consistency needs** — Read `ima-knowledge-ai/references/visual-consistency.md` if:
- User mentions: "系列"、"多张"、"同一个"、"角色"、"续"、"series"、"same"
- Task involves: multiple images/videos, character continuity, product shots
- Second+ request about same subject (e.g., "旺财在游泳" after "生成旺财照片")
3. **Check video modes** — Read `ima-knowledge-ai/references/video-modes.md` if:
- Any video generation task
- Need to understand: image_to_video vs reference_image_to_video difference
4. **Check model selection** — Read `ima-knowledge-ai/references/model-selection.md` if:
- Unsure which model to use
- Need cost/quality trade-off guidance
- User specifies budget or quality requirements
**Why this matters:**
- Multi-media workflows need proper task sequencing (e.g., video duration → matching music duration)
- AI generation defaults to **independent generation** (独立生成) on every call; without reference images, results will be inconsistent
- Wrong video mode = wrong result (image_to_video ≠ reference_image_to_video)
- Model choice affects cost and quality significantly
**Example multi-media workflow:**
```
User: "帮我做个产品宣传MV,有背景音乐,主角是旺财小狗"
      (Make a product promo MV with background music, starring the puppy 旺财)
❌ Wrong:
1. Generate dog image (random look)
2. Generate video (different dog)
3. Generate music (unrelated)
✅ Right:
1. Read workflow-design.md + visual-consistency.md
2. Generate Master Reference: an image of the puppy 旺财
3. Generate video shots using image_to_video with 旺财 as the first frame
4. Get video duration (e.g., 15s)
5. Generate BGM with matching duration and mood
```
**How to check:**
```python
# Step 0: determine the media type first (image / video / music / speech):
#   "画"/"生成图"/"image" → image; "视频"/"video" → video;
#   "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech
# Then choose task_type and model from the corresponding section
# (image: text_to_image/image_to_image; video: text_to_video/...;
#  music: text_to_music; speech: text_to_speech)

# Step 1: read the knowledge base based on task shape
if multi_media_workflow:  # image + video, video + music, etc.
    read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md")
if any(k in request for k in ("same subject", "series", "character", "系列", "同一个")):
    read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md")
if video_generation:  # any video task_type
    read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md")

# Step 2: execute with proper sequencing and reference images
# (see workflow-design.md for specific patterns)
```
For simple single-media requests, you can proceed directly. For complex multi-media workflows, read the knowledge base first; no exceptions.
---
## 📥 User Input Parsing (Media Type & Task Routing)
**Purpose:** to make every agent parse user intent consistently: **first determine the media type** from the user's request, then choose the task_type and model.
### 1. User phrasing → media type (do this first)
| User intent / keywords | Media type | task_type examples |
|------------------------|------------|---------------------|
| 画 / 生成图 / 图片 / image / 画一张 / 图生图 | **image** | `text_to_image`, `image_to_image` |
| 视频 / 生成视频 / video / 图生视频 / 文生视频 | **video** | `text_to_video`, `image_to_video`, `first_last_frame_to_video`, `reference_image_to_video` |
| 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | **music** | `text_to_music` |
| 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | **speech** | `text_to_speech` |
If the request mixes media (e.g. "宣传片+配乐"), treat as **multi-media workflow**: read `workflow-design.md`, then plan image → video → music steps and use the correct task_type for each step.
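The keyword routing above can be sketched as a small classifier; more than one hit means a multi-media workflow. The helper and the keyword subset are illustrative, not exhaustive:

```python
# Keyword → media-type routing (keywords taken from the table above).
MEDIA_KEYWORDS = {
    "image": ["画", "生成图", "图片", "image", "图生图"],
    "video": ["视频", "video", "图生视频", "文生视频"],
    "music": ["音乐", "歌", "BGM", "背景音乐", "music", "作曲"],
    "speech": ["语音", "朗读", "TTS", "语音合成", "配音", "speech"],
}

def detect_media_types(request: str) -> list[str]:
    """Return every media type mentioned; >1 hit = multi-media workflow."""
    lowered = request.lower()
    return [media for media, words in MEDIA_KEYWORDS.items()
            if any(w.lower() in lowered for w in words)]

print(detect_media_types("生成视频并配上背景音乐"))  # ['video', 'music']
```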
### 2. Model and parameter parsing
- **Image:** For model name → model_id and size/aspect_ratio parsing, follow the same rules as in **ima-image-ai** skill (User Input Parsing section).
- **Video:** For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow **ima-video-ai** skill (User Input Parsing section).
Sevio alias normalization in `ima-all-ai`:
- `Ima Sevio 1.0` → `ima-pro`
- `Ima Sevio 1.0-Fast` / `Ima Sevio 1.0 Fast` → `ima-pro-fast`
Routing rule:
- Normalize alias first
- Then resolve against runtime product list for the selected `task_type`
- If model is absent in current category, return available model_ids from `--list-models`
- **Music:** Suno (`sonic`) vs DouBao BGM/Song — infer from "BGM"/"背景音乐" → BGM; "带歌词"/"人声" → Suno or Song. Use model_id `sonic`, `GenBGM`, `GenSong` per "Recommended Defaults" and "Music Generation" tables below.
- **Speech (TTS):** Get model_id from `GET /open/v1/product/list?category=text_to_speech` or run script with `--task-type text_to_speech --list-models`. Map user intent to parameters using product `form_config`:
| User intent / phrasing | Parameter (if in form_config) | Notes |
|------------------------|--------------------------------|--------|
| 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
| 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
| 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | volume | If supported |
If the user does not specify, use form_config defaults. Pass extra params via `--extra-params '{"speed":1.0}'`. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry).
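Building `--extra-params` from parsed intent while honoring form_config can be sketched as follows; the intent field names (`voice`, `speed`) are assumptions for illustration, and only parameters the product declares are kept:

```python
import json

# Sketch: map parsed user intent to TTS parameters, sending only the
# parameters the product's form_config actually declares.
def build_tts_extra_params(intent: dict, form_config_fields: set) -> str:
    candidate = {}
    if intent.get("voice"):              # e.g. a voice_id from form_config options
        candidate["voice_id"] = intent["voice"]
    if intent.get("speed") is not None:  # e.g. 0.8-1.2
        candidate["speed"] = intent["speed"]
    # Drop anything the product does not support
    supported = {k: v for k, v in candidate.items() if k in form_config_fields}
    return json.dumps(supported)

# voice_id is dropped because this product only declares speed/pitch
print(build_tts_extra_params({"speed": 1.1, "voice": "female_01"},
                             {"speed", "pitch"}))  # {"speed": 1.1}
```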
---
## ⚙️ How This Skill Works
**For transparency:** This skill uses a bundled Python script (`scripts/ima_create.py`) to call the IMA Open API. The script:
- Sends your prompt to **two IMA-owned domains** (see "Network Endpoints" below)
- Uses `--user-id` **only locally** as a key for storing your model preferences
- Returns image/video/music URLs when generation is complete
**What gets sent to IMA servers:**
- ✅ Your prompt/description (image/video/music)
- ✅ Model selection (SeeDream/Wan/Suno/etc.)
- ✅ Generation parameters (size, duration, style, etc.)
- ❌ NO API key in prompts (key is used for authentication only)
- ❌ NO user_id (it's only used locally)
**What's stored locally:**
- `~/.openclaw/memory/ima_prefs.json` - Your model preferences (< 1 KB)
- `~/.openclaw/logs/ima_skills/` - Generation logs (auto-deleted after 7 days)
---
## 🌐 Network Endpoints Used
| Domain | Owner | Purpose | Data Sent | Privacy |
|--------|-------|---------|-----------|---------|
| `api.imastudio.com` | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, **your API key** | Standard HTTPS, data processed for AI generation |
| `imapi.liveme.com` | IMA Studio | Image/Video upload service (presigned URL generation) | **Your API key**, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
| `*.aliyuncs.com`, `*.esxscloud.com` | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, **NO API key**) | IMA-managed OSS buckets, presigned URLs expire after 7 days |
**Key Points:**
- **Music tasks** (`text_to_music`) and **TTS tasks** (`text_to_speech`) only use `api.imastudio.com`.
- **Image/video tasks** require `imapi.liveme.com` to obtain presigned URLs for uploading input images.
- Your API key is sent to **both `api.imastudio.com` and `imapi.liveme.com`** (both owned by IMA Studio).
- Verify network calls yourself: `tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'`. See the **⚠️ Credential Security Notice** below for full disclosure.
---
## ⚠️ Credential Security Notice
**Your API key is sent to both IMA-owned domains:**
1. `Authorization: Bearer ima_xxx...` → `api.imastudio.com` (main API)
2. Query param `appUid=ima_xxx...` → `imapi.liveme.com` (upload service)
**Security best practices:**
- 🧪 **Use test keys for experiments:** Generate a separate API key for testing.
- 🔍 **Monitor usage:** Check `https://imastudio.com/dashboard` for unauthorized activity.
- ⏱️ **Rotate keys:** Regenerate your API key periodically (monthly recommended).
- 📊 **Review logs:** Check `~/.openclaw/logs/ima_skills/` for unexpected API calls.
**Why two domains?** IMA Studio uses a microservices architecture:
- **`api.imastudio.com`**: Core AI generation API
- **`imapi.liveme.com`**: Specialized image/video upload service (shared infrastructure)
Both domains are operated by IMA Studio. The same API key grants access to both services.
### Agent Execution (Internal Reference)
> **Note for users:** You can review the script source at `scripts/ima_create.py` anytime.
> The agent uses this script to simplify API calls. **Music tasks** use only `api.imastudio.com`, while **image/video tasks** also call `imapi.liveme.com` for file uploads (see "Network Endpoints" above).
Use the bundled script internally for all task types — it ensures correct parameter construction:
```bash
# ─── Image Generation ──────────────────────────────────────────────────────────
# Basic text-to-image (default model)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \
--user-id {user_id} --output-json
# Text-to-image with size override (Nano Banana2)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \
--size 2k --user-id {user_id} --output-json
# Image-to-image with input URL
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_image \
--model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# ─── Video Generation ──────────────────────────────────────────────────────────
# Basic text-to-video (default model, 5s 720P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \
--user-id {user_id} --output-json
# Text-to-video with extra params (10s 1080P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \
--extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \
--user-id {user_id} --output-json
# Image-to-video (animate static image)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_video \
--model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# First-last frame video (two images)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type first_last_frame_to_video \
--model-id kling-video-o1 --prompt "smooth transition between frames" \
--input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \
--user-id {user_id} --output-json
# ─── Music Generation ──────────────────────────────────────────────────────────
# Basic text-to-music (Suno default)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \
--user-id {user_id} --output-json
# Music with custom lyrics (Suno custom mode)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "pop ballad, emotional" \
--extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \
--user-id {user_id} --output-json
# Background music (DouBao BGM)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id GenBGM --prompt "relaxing ambient music for meditation" \
--user-id {user_id} --output-json
# ─── Text-to-Speech (TTS) ─────────────────────────────────────────────────────
# List TTS models first to get model_id, then generate speech
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech --list-models
# TTS: use model_id from list above (prompt = text to speak)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech \
--model-id <model_id from list> --prompt "Text to be spoken here." \
--user-id {user_id} --output-json
```
The script outputs JSON with `url`, `model_name`, `credit` — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users.
---
## Overview
Call IMA Open API to create AI-generated content. All endpoints require an `ima_*` API key. The core flow is: **query products → create task → poll until done**.
---
## 🔒 Security & Transparency Policy
> **This skill is community-maintained and open for inspection.**
### ✅ What Users CAN Do
**Full transparency:**
- ✅ **Review all source code**: Check `scripts/ima_create.py` and `ima_logger.py` anytime
- ✅ **Verify network calls**: Music tasks use `api.imastudio.com` only; image/video tasks also use `imapi.liveme.com` (see "Network Endpoints" section)
- ✅ **Inspect local data**: View `~/.openclaw/memory/ima_prefs.json` and log files
- ✅ **Control privacy**: Delete preferences/logs anytime, or disable file writes (see below)
**Configuration allowed:**
- ✅ **Set API key** in environment or agent config:
- Environment variable: `export IMA_API_KEY=ima_your_key_here`
- OpenClaw/MCP config: Add `IMA_API_KEY` to agent's environment configuration
- Get your key at: https://imastudio.com
- ✅ **Use scoped/test keys**: Test with limited API keys, rotate after testing
- ✅ **Disable file writes**: Make prefs/logs read-only or symlink to `/dev/null`
**Data control:**
- ✅ **View stored data**: `cat ~/.openclaw/memory/ima_prefs.json`
- ✅ **Delete preferences**: `rm ~/.openclaw/memory/ima_prefs.json` (resets to defaults)
- ✅ **Delete logs**: `rm -rf ~/.openclaw/logs/ima_skills/` (auto-cleanup after 7 days anyway)
### ⚠️ Advanced Users: Fork & Modify
If you need to modify this skill for your use case:
1. **Fork the repository** (don't modify the original)
2. **Update your fork** with your changes
3. **Test thoroughly** with limited API keys
4. **Document your changes** for troubleshooting
**Note:** Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.
### ❌ What to AVOID (Security Risks)
**Actions that could compromise security:**
- ❌ Sharing API keys publicly or in skill files
- ❌ Modifying API endpoints to unknown servers
- ❌ Disabling SSL/TLS certificate verification
- ❌ Logging sensitive user data (prompts, IDs, etc.)
- ❌ Bypassing authentication or billing mechanisms
**Why this matters:**
1. **API Compatibility**: Skill logic aligns with IMA Open API schema
2. **Security**: Malicious modifications could leak credentials or bypass billing
3. **Support**: Modified skills may not be supported
4. **Community**: Breaking changes affect all users
### 📋 Privacy & Data Handling Summary
**What this skill does with your data:**
| Data Type | Sent to IMA? | Stored Locally? | User Control |
|-----------|-------------|-----------------|--------------|
| Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) |
| API key | ✅ Yes (authentication header) | ❌ No | Set via env var |
| user_id (optional CLI arg) | ❌ **Never** (local preference key only) | ✅ Yes (as prefs file key) | Change `--user-id` value |
| Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime |
| Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup 7 days |
**Privacy recommendations:**
1. **Use test/scoped API keys** for initial testing
2. **Note**: `--user-id` is **never sent to IMA servers** - it's only used locally as a key for storing preferences in `~/.openclaw/memory/ima_prefs.json`
3. **Review source code** at `scripts/ima_create.py` to verify network calls (search for `create_task` function)
4. **Rotate API keys** after testing or if compromised
**Get your IMA API key:** Visit https://imastudio.com to register and get started.
### 🔧 For Skill Maintainers Only
**Version control:**
- All changes must go through Git with proper version bumps (semver)
- CHANGELOG.md must document all changes
- Production deployments require code review
**File checksums (optional):**
```bash
# Verify skill integrity
sha256sum SKILL.md scripts/ima_create.py
```
If users report issues, verify file integrity first.
---
## 🧠 User Preference Memory (Image)
> User preferences have **highest priority** when they exist. But preferences are only saved when users **explicitly express** model preferences — not from automatic model selection.
### Storage: `~/.openclaw/memory/ima_prefs.json`
Single file, shared across all IMA skills:
```json
{
"user_{user_id}": {
"text_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"text_to_speech": { "model_id": "<from product list>", "model_name": "...", "credit": 2, "last_used": "..." }
}
}
```
### Model Selection Flow (Image Generation)
**Step 1: Get knowledge-ai recommendation** (if installed)
```python
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "SeeDream 4.5"
```
**Step 2: Check user preference**
```python
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "midjourney", ...}
```
**Step 3: Decide which model to use**
```python
if user_pref:  # preference exists
    use_model = user_pref["model_id"]  # Highest priority
else:
    use_model = knowledge_recommended_model or fallback_default
```
**Step 4: Check for mismatch (for later hint)**
```python
if user_pref and knowledge_recommended_model != user_pref["model_id"]:
    mismatch = True  # Will add a hint in the success message
```
### When to Write (User Explicit Preference ONLY)
**✅ Save preference when user explicitly specifies a model:**
| User says | Action |
|-----------|--------|
| `用XXX` / `换成XXX` / `改用XXX` | Switch to model XXX + save as preference |
| `以后都用XXX` / `默认用XXX` / `always use XXX` | Save + confirm: `✅ 已记住!以后图片生成默认用 [XXX]` |
| `我喜欢XXX` / `我更喜欢XXX` | Save as preference |
**❌ Do NOT save when:**
- Agent auto-selects from knowledge-ai → not user preference
- Agent uses fallback default → not user preference
- User says generic quality requests (see "Clear Preference" below) → clear preference instead
### When to Clear (User Abandons Preference)
**🗑️ Clear preference when user wants automatic selection:**
| User says | Action |
|-----------|--------|
| `用最好的` / `用最合适的` / `best` / `recommended` | Clear pref + use knowledge-ai recommendation |
| `推荐一个` / `你选一个` / `自动选择` | Clear pref + use knowledge-ai recommendation |
| `用默认的` / `用新的` | Clear pref + use knowledge-ai recommendation |
| `试试别的` / `换个试试` (without specific model) | Clear pref + use knowledge-ai recommendation |
| `重新推荐` | Clear pref + use knowledge-ai recommendation |
**Implementation:**
```python
del prefs[f"user_{user_id}"][task_type]
save_prefs(prefs)
```
---
## ⭐ Model Selection Priority (Image)
**Selection flow:**
1. **User preference** (if exists) → Highest priority, always respect
2. **ima-knowledge-ai skill** (if installed) → Professional recommendation based on task
3. **Fallback defaults** → Use table below (only if neither 1 nor 2 exists)
**Important notes:**
- User preference is only saved when user **explicitly specifies** a model (see "When to Write" above)
- Knowledge-ai is **always consulted** (even when user pref exists) to detect mismatches
- When mismatch detected → add gentle hint in success message (does NOT interrupt generation)
> The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
When using user preference for image generation, show a line like:
```
🎨 根据你的使用习惯,将用 [Model Name] 帮你生成…
• 模型:[Model Name](你的常用模型)
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
```
### Preference Change Confirmation
When user switches to a different model than their saved preference:
```
💡 你之前喜欢用 [Old Model],这次换成了 [New Model]。
要把 [New Model] 设为以后的默认吗?
回复「是」保存 / 回复「否」仅本次使用
```
---
## ⭐ Recommended Defaults
> **These are fallback defaults — only used when no user preference exists.**
> **Always default to the newest and most popular model. Do NOT default to the cheapest.**
| Task Type | Default Model | model_id | version_id | Cost | Why |
|-----------|--------------|----------|------------|------|-----|
| **text_to_image** | **SeeDream 4.5** | `doubao-seedream-4.5` | `doubao-seedream-4-5-251128` | 5 pts | Latest doubao flagship, photorealistic 4K |
| text_to_image (budget) | **Nano Banana2** | `gemini-3.1-flash-image` | `gemini-3.1-flash-image` | 4 pts | Fastest and cheapest option |
| text_to_image (premium) | **Nano Banana Pro** | `gemini-3-pro-image` | `gemini-3-pro-image-preview` | 10/10/18 pts | Premium quality, 1K/2K/4K options |
| text_to_image (artistic) | **Midjourney** 🎨 | `midjourney` | `v6` | 8/10 pts | Artist-level aesthetics, creative styles |
| **image_to_image** | **SeeDream 4.5** | `doubao-seedream-4.5` | `doubao-seedream-4-5-251128` | 5 pts | Latest, best i2i quality |
| image_to_image (budget) | **Nano Banana2** | `gemini-3.1-flash-image` | `gemini-3.1-flash-image` | 4 pts | Cheapest option |
| image_to_image (premium) | **Nano Banana Pro** | `gemini-3-pro-image` | `gemini-3-pro-image-preview` | 10 pts | Premium quality |
| image_to_image (artistic) | **Midjourney** 🎨 | `midjourney` | `v6` | 8/10 pts | Artist-level aesthetics, style transfer |
| **text_to_video** | **Wan 2.6** | `wan2.6-t2v` | `wan2.6-t2v` | 25 pts | 🔥 Most popular t2v, balanced cost |
| text_to_video (premium) | **Hailuo 2.3** | `MiniMax-Hailuo-2.3` | `MiniMax-Hailuo-2.3` | 38 pts | Higher quality |
| text_to_video (budget) | **Vidu Q2** | `viduq2` | `viduq2` | 5 pts | Lowest cost t2v |
| **image_to_video** | **Wan 2.6** | `wan2.6-i2v` | `wan2.6-i2v` | 25 pts | 🔥 Most popular i2v, 1080P |
| image_to_video (premium) | **Kling 2.6** | `kling-v2-6` | `kling-v2-6` | 40-160 pts | Premium Kling i2v |
| **first_last_frame_to_video** | **Kling O1** | `kling-video-o1` | `kling-video-o1` | 48 pts | Newest Kling reasoning model |
| **reference_image_to_video** | **Kling O1** | `kling-video-o1` | `kling-video-o1` | 48 pts | Best reference fidelity |
| **text_to_music** | **Suno (sonic-v5)** | `sonic` | `sonic` | 25 pts | Latest Suno engine, best quality |
| **text_to_speech** | (query product list) | — | — | — | Run `--task-type text_to_speech --list-models`; use first or user-preferred model_id |
**Premium options:**
- **Image**: Nano Banana Pro — Highest quality with size control (1K/2K/4K), higher cost (10-18 pts for text_to_image, 10 pts for image_to_image)
- **Video**: Kling O1, Sora 2 Pro, Google Veo 3.1 — Premium quality with longer duration options
**Quick selection guide (production as of 2026-02-27, sorted by popularity):**
- **Image (4 models available)** → **SeeDream 4.5** (5, default); artistic → Midjourney 🎨 (8-10); budget → Nano Banana2 (4, 512px); premium → Nano Banana Pro (10-18)
- **🔥 Video from text (most popular)** → **Wan 2.6** (25, balanced); premium → Hailuo 2.3 (38); budget → Vidu Q2 (5)
- **🔥 Video from image (most popular)** → **Wan 2.6** (25)
- Music → **Suno** (25); DouBao BGM/Song (30 each)
- Cheapest → Nano Banana2 512px (4) for image; Vidu Q2 (5) for video
**Selection guide by use case:**
**Image Generation:**
- General image generation → **SeeDream 4.5** (5pts)
- **Custom aspect ratio (16:9, 9:16, 4:3, etc.)** → **SeeDream 4.5** 🌟 or **Nano Banana Pro/2** 🆕 (native support)
- Budget-conscious / fast generation → **Nano Banana2** (4pts)
- Highest quality with size control (1K/2K/4K) → **Nano Banana Pro** (text_to_image: 10-18pts, image_to_image: 10pts)
- **Artistic/creative styles, illustrations, paintings** → **Midjourney** 🎨 (8-10pts)
- Style transfer / image editing → **SeeDream 4.5** (5pts) or **Midjourney** 🎨 (artistic)
**Video Generation:**
- General video generation → **Wan 2.6** (25pts, most popular)
- Premium cinematic quality → **Google Veo 3.1** (70-330pts) or **Sora 2 Pro** (122+pts)
- Budget video → **Vidu Q2** (5pts) or **Hailuo 2.0** (5pts)
- With audio support → **Kling O1** (48+pts) or **Google Veo 3.1** (70+pts)
- First/last frame animation → **Kling O1** (48+pts)
- Reference image consistency → **Kling O1** (48+pts) or **Google Veo 3.1** (70+pts)
**Music Generation:**
- **Custom song with lyrics, vocals, style** → **Suno sonic-v5** (25pts, default, ~2min)
- Full control: custom_mode, lyrics, vocal_gender, tags, negative_tags
- Best for: complete songs, vocal tracks, artistic compositions
- **Background music / ambient loop** → **DouBao BGM** (30pts, ~30s)
- Simplified: prompt-only, no advanced parameters
- Best for: video backgrounds, ambient music, short loops
- **Simple song generation** → **DouBao Song** (30pts, ~30s)
- Simplified: prompt-only
- Best for: quick song generation, structured vocal compositions
- **User explicitly asks for cheapest** → DouBao BGM/Song (6pts each) — only if explicitly requested
**Speech (TTS) Generation:**
- **Text-to-speech / 语音合成 / 朗读** → `text_to_speech`. Always query `GET /open/v1/product/list?category=text_to_speech` (or `--list-models`) to get current model_id and credit. No fixed default; use first available or user preference. Voice/speed/format parameters: see "Model and parameter parsing" (TTS table) and "Speech (TTS) — text_to_speech" in this document.
**⚠️ Technical Note for Suno:**
> `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly when creating Suno tasks.
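A sketch of the payload shape this note describes (the field layout is assumed from the note itself; verify against the actual API schema):

```python
# Assumed Suno task payload shape: the outer model_version stays "sonic",
# while the nested parameters.parameters.model_version selects the engine.
suno_task = {
    "model_id": "sonic",
    "model_version": "sonic",             # outer field: always "sonic"
    "parameters": {
        "parameters": {
            "model_version": "sonic-v5",  # inner field: engine version
            "custom_mode": True,
            "vocal_gender": "female",
        }
    },
}

# The two fields intentionally differ; setting both to the same value is a bug.
assert suno_task["model_version"] != (
    suno_task["parameters"]["parameters"]["model_version"]
)
```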
**⚠️ Production Image Models (4 available):**
- SeeDream 4.5 (`doubao-seedream-4.5`) — 5 pts, default
- Midjourney 🎨 (`midjourney`) — 8/10 pts for 480p/720p, artistic styles
- Nano Banana2 (`gemini-3.1-flash-image`) — 4/6/10/13 pts for 512px/1K/2K/4K
- Nano Banana Pro (`gemini-3-pro-image`) — 10/10/18 pts for 1K/2K/4K
**All other image models mentioned in older documentation are no longer available in production.**
**🌟 Parameter Support Notes (All Task Types):**
### Image Models (text_to_image / image_to_image)
**🆕 MAJOR UPDATE: the Nano Banana series now supports `aspect_ratio` NATIVELY!**
**Parameter support details:**
- ✅ **aspect_ratio**:
  - **SeeDream 4.5**: ✅ Supports 8 ratios via virtual params (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9)
  - **Nano Banana2**: ✅ **Native support** for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4)
  - **Nano Banana Pro**: ✅ **Native support** for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4)
  - **Midjourney**: ❌ 1:1 only (fixed 1024x1024)
- ✅ **size**:
- **Nano Banana2**: 512px, 1K, 2K, 4K (via different `attribute_id`s, 4-13 pts)
- **Nano Banana Pro**: 1K, 2K, 4K (via different `attribute_id`s, 10-18 pts)
- **SeeDream 4.5**: Adaptive default (5 pts)
- **Midjourney**: 480p/720p (via `attribute_id`, 8/10 pts)
- ❌ **8K**: No model supports 8K (max is 4K via Nano Banana Pro)
- ❌ **Non-standard aspect ratios** (7:3, 8:5, etc.): Not supported. Use closest supported ratio or video models.
- ✅ **n**: Multiple outputs supported (1-4), credit × n
**When user requests unsupported combinations for images:**
- **Midjourney + aspect_ratio (16:9, etc.)**: Recommend **SeeDream 4.5** or **Nano Banana series** instead
```
❌ Midjourney 暂不支持自定义 aspect_ratio(仅支持 1024x1024 方形)
✅ 推荐方案:
1. SeeDream 4.5(支持虚拟参数 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9
• 成本:5 积分(性价比最佳)
2. Nano Banana Pro/2(原生支持 aspect_ratio)
• 支持比例:1:1, 16:9, 9:16, 4:3, 3:4
• 成本:4-18 积分(按尺寸)
需要我帮你用 SeeDream 4.5 生成吗?
```
- **Any model + 8K**: Inform user no model supports 8K, max is 4K (Nano Banana Pro)
- **Any model + non-standard ratio (7:3, 8:5, etc.)**: Not supported. Suggest the closest supported ratio instead (e.g., 21:9 for ultra-wide, 2:3 for portrait)
### Video Models (text_to_video / image_to_video / first_last_frame / reference_image)
- ✅ **resolution**: 540P, 720P, 1080P, 2K, 4K (model-dependent, higher res = higher cost)
- ✅ **aspect_ratio**: 16:9, 9:16, 1:1, 4:3 (model-dependent, check `form_config`)
- ✅ **duration**: 4s, 5s, 10s, 15s (model-dependent, longer = higher cost)
- ⚠️ **generate_audio**: Supported by Veo 3.1, Kling O1, Hailuo (check `form_config`)
- ✅ **prompt_extend**: AI-powered prompt enhancement (most models support)
- ✅ **negative_prompt**: Content exclusion (most models support)
- ✅ **shot_type**: Single/multi-shot control (model-dependent)
- ✅ **seed**: Reproducibility control (most models support, -1 = random)
- ✅ **n**: Multiple outputs (1-4), credit × n
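Putting the parameters above together, a text_to_video request might carry a payload like this. The field names follow this document's parameter list; verify the exact schema against the model's `form_config` before sending.

```python
# Hypothetical video request parameters — illustrative only.
video_params = {
    "model_id": "wan2.6-t2v",        # from the model ID reference table
    "prompt": "Aerial shot of a seaside city at sunset",
    "resolution": "1080P",           # 540P/720P/1080P/2K/4K, model-dependent
    "aspect_ratio": "16:9",          # check form_config per model
    "duration": 5,                   # seconds; longer = higher cost
    "prompt_extend": True,           # AI-powered prompt enhancement
    "negative_prompt": "blurry, shaky",
    "seed": -1,                      # -1 = random
    "n": 1,                          # outputs; credit × n
}
```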
#### 🆕 Special Case: Pixverse Model Parameter (v1.0.7+)
**Auto-Inference Logic for Pixverse V5.5/V5/V4:**
- **Problem**: Pixverse V5.5, V5, V4 lack `model` field in `form_config` from Product List API
- **Backend Requirement**: Backend requires `model` parameter (e.g., `"v5.5"`, `"v5"`, `"v4"`)
- **Auto-Fix**: System automatically extracts version from `model_name` and injects it
- Example: `model_name: "Pixverse V5.5"` → auto-inject `model: "v5.5"`
- Example: `model_name: "Pixverse V4"` → auto-inject `model: "v4"`
- **Note**: V4.5 and V3.5 include `model` in `form_config` (no auto-inference needed)
- **Relevant Task Types**: All video modes (text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video)
**Error Prevention:**
- Without auto-inference: `err_code=400017 err_msg=Invalid value for model`
- With auto-inference (v1.0.7+): Pixverse V5.5/V5/V4 work seamlessly ✅
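The auto-inference described above amounts to extracting the version suffix from `model_name` when `form_config` lacks a `model` field. A minimal sketch (function name and dict shape are assumptions):

```python
import re

def infer_pixverse_model(model_name: str, form_config: dict) -> dict:
    """Inject the `model` parameter for Pixverse V5.5/V5/V4, whose
    form_config from the Product List API lacks the field (v1.0.7+ fix)."""
    params = dict(form_config)
    if "model" not in params:
        m = re.search(r"V(\d+(?:\.\d+)?)", model_name)
        if m:
            # e.g. "Pixverse V5.5" → model: "v5.5"
            params["model"] = f"v{m.group(1)}"
    return params

print(infer_pixverse_model("Pixverse V5.5", {}))  # → {'model': 'v5.5'}
```

V4.5 and V3.5 already carry `model` in `form_config`, so the existing value is left untouched.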
### Music Models (text_to_music)
**Suno sonic-v5 (Full-Featured):**
- ✅ **custom_mode**: Suno only (enables vocal_gender, lyrics, tags support)
- ✅ **vocal_gender**: Suno only (male/female/mixed, requires custom_mode=True)
- ✅ **lyrics**: Suno only (custom lyrics support, requires custom_mode=True)
- ✅ **make_instrumental**: Suno only (force instrumental, no vocals)
- ✅ **auto_lyrics**: Suno only (AI-generated lyrics)
- ✅ **tags**: Suno only (genre/style tags)
- ✅ **negative_tags**: Suno only (exclude unwanted styles)
- ✅ **title**: Suno only (song title)
- ❌ **duration**: Fixed-length output (DouBao ~30s, Suno ~2min, not user-controllable)
- ✅ **n**: Multiple outputs supported (1-2), credit × n
**DouBao BGM/Song (Simplified):**
- ✅ **prompt**: Text description only
- ❌ **No advanced parameters** (no custom_mode, lyrics, vocal control)
- ❌ **duration**: Fixed ~30s output
**🎵 Suno Prompt Writing Guide (for `gpt_description_prompt`):**
When using Suno, structure your prompt with these elements:
1. **Genre/Style:**
- Examples: `"lo-fi hip hop"`, `"orchestral cinematic"`, `"upbeat pop"`, `"dark ambient"`, `"indie folk"`, `"electronic dance"`
2. **Tempo/BPM:**
- Examples: `"80 BPM"`, `"fast tempo"`, `"slow ballad"`, `"moderate pace 110 BPM"`
3. **Vocals Control:**
- **No vocals**: `"no vocals"` → set `make_instrumental=true`
- **With vocals**: `"female vocals"` → set `vocal_gender="female"`
- **Male vocals**: `"male vocals"` → set `vocal_gender="male"`
- **Mixed**: Set `vocal_gender="mixed"`
4. **Mood/Emotion:**
- Examples: `"happy and energetic"`, `"melancholic"`, `"tense and dramatic"`, `"peaceful and calming"`
5. **Negative Tags (exclude styles):**
- Use `negative_tags`: `"heavy metal, distortion, screaming"` to exclude unwanted elements
6. **Duration Hint:**
- Examples: `"60 seconds"`, `"30 second loop"`, `"2 minute track"`
- Note: Suno typically generates ~2min, not strictly controllable
**Example Suno prompts:**
```
"upbeat lo-fi hip hop, 90 BPM, no vocals, relaxed and chill"
→ Set: make_instrumental=true
"emotional pop ballad, slow tempo, female vocals, melancholic"
→ Set: vocal_gender="female"
"orchestral cinematic trailer music, epic and dramatic, 120 BPM, no vocals"
→ Set: make_instrumental=true, tags="orchestral,cinematic,epic"
"acoustic indie folk, gentle guitar, male vocals, warm and nostalgic"
→ Set: vocal_gender="male", tags="acoustic,indie,folk"
```
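The prompt-to-parameter mapping shown in the examples can be sketched as a simple keyword heuristic. This is an illustration of the rules above, not the skill's actual implementation; the parameter names follow the Suno list earlier in this section.

```python
def suno_params_from_prompt(prompt: str) -> dict:
    """Derive Suno parameter flags from prompt keywords (heuristic sketch)."""
    params = {"gpt_description_prompt": prompt}
    text = prompt.lower()
    if "no vocals" in text or "instrumental" in text:
        params["make_instrumental"] = True   # per the "Vocals Control" rule
    elif "female vocals" in text:
        params["vocal_gender"] = "female"
    elif "male vocals" in text:
        params["vocal_gender"] = "male"
    return params

print(suno_params_from_prompt("upbeat lo-fi hip hop, 90 BPM, no vocals"))
```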
**⚠️ Technical Note for Suno:**
> `model_version` inside `parameters.parameters` (e.g., `"sonic-v5"`) is different from the outer `model_version` field (which is `"sonic"`). Always set both correctly.
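Concretely, the two `model_version` fields sit at different nesting levels. The shape below is inferred from the note above; verify it against the actual request schema.

```python
# Outer model_version is the model_id ("sonic"); the inner
# parameters.parameters carries the detailed version ("sonic-v5").
suno_request = {
    "model_version": "sonic",             # outer field: the model_id
    "parameters": {
        "parameters": {
            "model_version": "sonic-v5",  # inner field: detailed version
            "make_instrumental": True,
        }
    },
}
```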
### Common Parameter Patterns
- **n (batch generation)**: Supported by ALL models. Cost = base_credit × n. Creates n independent resources.
- **seed**: Supported by most models (-1 = random, >0 = reproducible results)
- **prompt_extend**: AI-powered prompt enhancement (video models only)
### Decision Tree: When User Requests Unsupported Features
```
User asks for custom aspect ratio image (e.g. "7:3 landscape")
→ ❌ Image models don't support custom ratios
→ ✅ Solution: "图片模型不支持自定义比例。建议用视频模型(Wan 2.6 t2v)生成16:9视频,然后截取首帧作为图片。"
User asks for 8K image
→ ❌ No model supports 8K
→ ✅ Solution: "当前最高支持4K分辨率(Nano Banana Pro,18积分)。要使用吗?"
User asks for video with audio
→ Check model: Veo 3.1 / Kling O1 / Hailuo have generate_audio
→ ✅ Solution: "Veo 3.1 和 Kling O1 支持音频生成(需在参数中设置 generate_audio=True)。要用哪个?"
User asks for long music (e.g. "5 minute track")
→ ❌ Duration not user-controllable
→ ✅ Solution: "Suno 生成约2分钟音乐。需要更长时长可以生成多段后拼接。"
User asks for 30s video
→ Check model: Most models max 15s
→ ✅ Solution: "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)。"
```
**When user requests unsupported combinations:**
- Video + audio (unsupported model) → "该模型不支持音频。建议用 Veo 3.1 或 Kling O1 (支持 generate_audio 参数)"
- Music + custom duration → "音乐时长由模型固定(Suno约2分钟,DouBao约30秒),无法自定义"
- Video duration > 15s → "当前最长15秒。可选模型:Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)"
> **Note:** Image-specific unsupported combinations (Midjourney + aspect_ratio, 8K, non-standard ratios) are documented in the "Image Models" section above.
---
## 🧠 User Preference Memory (Video)
> User preferences have **highest priority** when they exist. But preferences are only saved when users **explicitly express** model preferences — not from automatic model selection.
### Storage: `~/.openclaw/memory/ima_prefs.json`
```json
{
"user_{user_id}": {
"text_to_video": { "model_id": "wan2.6-t2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
"image_to_video": { "model_id": "wan2.6-i2v", "model_name": "Wan 2.6", "credit": 25, "last_used": "..." },
"first_last_frame_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." },
"reference_image_to_video": { "model_id": "kling-video-o1", "model_name": "Kling O1", "credit": 48, "last_used": "..." }
}
}
```
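The `load_prefs` / `save_prefs` helpers referenced in the selection flow might look like this. A minimal sketch: the path matches the storage location above, but error handling and file layout details are assumptions.

```python
import json
import os

PREFS_PATH = os.path.expanduser("~/.openclaw/memory/ima_prefs.json")

def load_prefs() -> dict:
    """Return the preference map, or {} when no file exists yet."""
    try:
        with open(PREFS_PATH, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def save_prefs(prefs: dict) -> None:
    """Persist preferences, creating the memory directory if needed."""
    os.makedirs(os.path.dirname(PREFS_PATH), exist_ok=True)
    with open(PREFS_PATH, "w", encoding="utf-8") as f:
        json.dump(prefs, f, ensure_ascii=False, indent=2)
```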
### Model Selection Flow (Video Generation)
**Step 1: Get knowledge-ai recommendation** (if installed)
```python
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "Wan 2.6"
```
**Step 2: Check user preference**
```python
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "kling-video-o1", ...}
```
**Step 3: Decide which model to use**
```python
if user_pref exists:
use_model = user_pref["model_id"] # Highest priority
else:
use_model = knowledge_recommended_model or fallback_default
```
**Step 4: Check for mismatch (for later hint)**
```python
if user_pref exists and knowledge_recommended_model != user_pref["model_id"]:
mismatch = True # Will add hint in success message
```
### When to Write (User Explicit Preference ONLY)
**✅ Save preference when user explicitly specifies a model:**
| User says | Action |
|-----------|--------|
| `用XXX` / `换成XXX` / `改用XXX` | Switch to model XXX + save as preference |
| `以后都用XXX` / `默认用XXX` / `always use XXX` | Save + confirm: `✅ 已记住!以后视频生成默认用 [XXX]` |
| `我喜欢XXX` / `我更喜欢XXX` | Save as preference |
**❌ Do NOT save when:**
- Agent auto-selects from knowledge-ai → not user preference
- Agent uses fallback default → not user preference
- User says generic quality requests (see "Clear Preference" below) → clear preference instead
### When to Clear (User Abandons Preference)
**🗑️ Clear preference when user wants automatic selection:**
| User says | Action |
|-----------|--------|
| `用最好的` / `用最合适的` / `best` / `recommended` | Clear pref + use knowledge-ai recommendation |
| `推荐一个` / `你选一个` / `自动选择` | Clear pref + use knowledge-ai recommendation |
| `用默认的` / `用新的` | Clear pref + use knowledge-ai recommendation |
| `试试别的` / `换个试试` (without specific model) | Clear pref + use knowledge-ai recommendation |
| `重新推荐` | Clear pref + use knowledge-ai recommendation |
**Implementation:**
```python
prefs.get(f"user_{user_id}", {}).pop(task_type, None)  # safe no-op if no pref saved
save_prefs(prefs)
```
---
## ⭐ Model Selection Priority (Video)
**Selection flow:**
1. **User preference** (if exists) → Highest priority, always respect
2. **ima-knowledge-ai skill** (if installed) → Professional recommendation based on task
3. **Fallback defaults** → Use table below (only if neither 1 nor 2 exists)
**Important notes:**
- User preference is only saved when user **explicitly specifies** a model (see "When to Write" above)
- Knowledge-ai is **always consulted** (even when user pref exists) to detect mismatches
- When mismatch detected → add gentle hint in success message (does NOT interrupt generation)
> The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
---
## 💬 User Experience Protocol (IM / Feishu / Discord) v2.0 🆕
> **v2.0 Updates (aligned with ima-image-ai v1.3):**
> - Added Step 0 for correct message ordering (fixes group chat bug)
> - Added Step 5 for explicit task completion
> - Enhanced Midjourney support with proper timing estimates
> - Now 6 steps total (0-5): Acknowledgment → Pre-Gen → Progress → Success/Failure → Done
>
> This skill runs inside IM platforms (Feishu, Discord via OpenClaw).
> Generation takes anywhere from 10 seconds (music) to 6 minutes (video). **Never let users wait in silence.**
> Always follow all 6 steps below, every single time.
### 🗣️ User-Friendly First, Transparent on Request
Default to plain-language updates in normal user flows.
If users ask for technical details, provide them transparently (script name, endpoints, and key parameters).
In standard progress messages, prioritize: **model name, estimated/actual time, credits consumed, result URL, and natural-language status updates.**
---
### Estimated Generation Time (All Task Types)
| Task Type | Model | Estimated Time | Poll Every | Send Progress Every |
|-----------|-------|---------------|------------|---------------------|
| **text_to_image** | SeeDream 4.5 | 25~60s | 5s | 20s |
| | Nano Banana2 💚 | 20~40s | 5s | 15s |
| | Nano Banana Pro | 60~120s | 5s | 30s |
| | Midjourney 🎨 | 40~90s | 8s | 25s |
| **image_to_image** | SeeDream 4.5 | 25~60s | 5s | 20s |
| | Nano Banana2 💚 | 20~40s | 5s | 15s |
| | Nano Banana Pro | 60~120s | 5s | 30s |
| | Midjourney 🎨 | 40~90s | 8s | 25s |
| **text_to_video** | Wan 2.6, Hailuo 2.0/2.3, Vidu Q2, Pixverse | 60~120s | 8s | 30s |
| | SeeDance 1.5 Pro, Kling 2.6, Veo 3.1 | 90~180s | 8s | 40s |
| | Kling O1, Sora 2 Pro | 180~360s | 8s | 60s |
| **image_to_video** | Same ranges as text_to_video | — | 8s | 40s |
| **first_last_frame / reference** | Kling O1, Veo 3.1 | 180~360s | 8s | 60s |
| **text_to_music** | DouBao BGM / Song | 10~25s | 5s | 10s |
| | Suno (sonic-v5) | 20~45s | 5s | 15s |
| **text_to_speech** | (varies by model) | 5~30s | 3s | 10s |
`estimated_max_seconds` = upper bound of the range (e.g. 60 for SeeDream 4.5, 40 for Nano Banana2, 120 for Nano Banana Pro, 90 for Midjourney, 180 for Kling 2.6, 360 for Kling O1).
---
### Step 0 — Initial Acknowledgment Reply (Normal Reply) 🆕
**⚠️ CRITICAL:** This step is essential for correct message ordering in IM platforms (Feishu, Discord).
**Before doing anything else**, reply to the user with a friendly acknowledgment message using your **normal reply** (not `message` tool). This reply will automatically appear FIRST in the conversation.
**Example acknowledgment messages:**
For images:
```
好的!来帮你画一只萌萌的猫咪 🐱
```
```
收到!马上为你生成一张 16:9 的风景照 🏔️
```
```
OK! Starting image generation with SeeDream 4.5 🎨
```
For videos:
```
好的!来帮你生成一段视频 🎬
```
```
收到!开始用 Wan 2.6 生成视频 🎥
```
For music:
```
好的!来帮你创作一首音乐 🎵
```
**Rules:**
- Keep it short and warm (< 15 words)
- Match the user's language (Chinese/English)
- Include relevant emoji (🐱/🎨/🎬/🎵/✨)
- This is your ONLY normal reply — all subsequent updates use `message` tool
**Why this matters:**
- Normal replies automatically appear FIRST in the conversation thread
- `message` tool pushes appear in chronological order AFTER your initial reply
- This ensures users see: "好的!" → "🎨 开始生成..." → "⏳ 进度..." → "✅ 成功!" (correct order)
- Without Step 0, the confirmation might appear LAST, confusing users
---
### Step 1 — Pre-Generation Notification (Push via message tool)
**After Step 0 reply**, use the `message` tool to push a notification immediately:
```
[Emoji] 开始生成 [内容类型],请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
```
**Emoji by content type:**
- 图片 → `🎨`
- 视频 → `🎬`(加注:视频生成需要较长时间,我会定时汇报进度)
- 音乐 → `🎵`
**Cost transparency (new requirement):**
- Always show credit cost with model tier context
- For expensive models (>50 pts), offer cheaper alternative proactively
- Examples:
- Balanced (default): "使用 Wan 2.6(25 积分,最新 Wan)"
- Premium (user explicit): "使用高端模型 Kling O1(48-120 积分),质量最佳"
- Premium (auto-selected): "使用 Wan 2.6(25 积分)。若需更高质量可选 Kling O1(48 积分起)"
- Budget (user asked): "使用 Vidu Q2(5 积分,最省钱)"
> Adapt language to match the user (Chinese / English). For video, always add a note that it takes longer. For expensive models, always mention cheaper alternatives unless user explicitly requested premium.
---
### Step 2 — Progress Updates
Poll the task detail API every `[Poll Every]` seconds per the table.
Send a progress update every `[Send Progress Every]` seconds.
```
⏳ 正在生成中… [P]%
已等待 [elapsed]s,预计最长 [max]s
```
**Progress formula:**
```
P = min(95, floor(elapsed_seconds / estimated_max_seconds * 100))
```
- **Cap at 95%** — never reach 100% until the API confirms `success`
- If `elapsed > estimated_max`: freeze at 95%, append `「快了,稍等一下…」`
- For video with max=360s: at 120s → 33%, at 250s → 69%, at 400s → 95% (frozen)
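The formula and the 360s examples above translate directly into code:

```python
import math

def progress_percent(elapsed_seconds: float, estimated_max_seconds: float) -> int:
    """P = min(95, floor(elapsed / max * 100)) — capped at 95 until the API
    confirms success, and frozen there once elapsed exceeds the estimate."""
    return min(95, math.floor(elapsed_seconds / estimated_max_seconds * 100))
```

With `estimated_max_seconds=360` this reproduces the examples: 120s → 33%, 250s → 69%, 400s → 95% (frozen).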
---
### Step 3 — Success Notification
When task status = `success`:
#### For Video Tasks (text_to_video / image_to_video / first_last_frame / reference_image)
**3.1 Send video player first** (IM platforms like Feishu will render inline player):
```python
# Get result URL from script output or task detail API
result = get_task_result(task_id)
video_url = result["medias"][0]["url"]
# Build caption
caption = f"""✅ 视频生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
[视频描述]"""
# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""
# Send video with caption (use message tool if available)
message(
action="send",
media=video_url, # ⚠️ Use HTTPS URL directly, NOT local file path
caption=caption
)
```
**Important:**
- Hint is **non-intrusive** — does NOT interrupt generation
- Only shown when user pref conflicts with knowledge-ai recommendation
- User can ignore the hint; video is already delivered
**3.2 Then send link as text** (for copying/sharing):
```python
# Send link message immediately after video
message(action="send", text=f"🔗 视频链接(可复制分享):\n{video_url}")
```
**⚠️ Critical for video:**
- Send video player FIRST (inline preview)
- Send text link SECOND (for copying)
- Include first-frame thumbnail URL if available: `result["medias"][0]["cover"]`
#### For Image Tasks (text_to_image / image_to_image)
```python
# Build caption
caption = f"""✅ 图片生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:{image_url}"""
# Add mismatch hint if user pref conflicts with knowledge-ai recommendation
if user_pref_exists and knowledge_recommended_model != used_model:
caption += f"""
💡 提示:当前任务也许用 {knowledge_recommended_model} 也会不错({reason},{cost} pts)"""
# Send image with caption
message(
action="send",
media=image_url,
caption=caption
)
```
**Important:**
- Hint is **non-intrusive** — does NOT interrupt generation
- Only shown when user pref conflicts with knowledge-ai recommendation
- User can ignore the hint; image is already delivered
#### For Music Tasks (text_to_music)
Send audio file with player:
```
✅ 音乐生成成功!
• 模型:[Model Name]
• 耗时:预计 [X~Y]s,实际 [actual]s
• 消耗积分:[N pts]
• 时长:约 [duration]
[音频URL或直接发送音频文件]
```
#### For TTS Tasks (text_to_speech) — Full UX Protocol (Steps 0–5)
**Step 0 — Initial acknowledgment (normal reply)**
First reply with a short acknowledgment, e.g.: 好的,正在帮你把这段文字转成语音。 / OK, converting this text to speech.
**Step 1 — Pre-generation (message tool)**
Push once:
```
🔊 开始语音合成,请稍候…
• 模型:[Model Name]
• 预计耗时:[X ~ Y 秒]
• 消耗积分:[N pts]
```
**Step 2 — Progress**
Poll every 2–5s. Every 10–15s send: `⏳ 语音合成中… [P]%,已等待 [elapsed]s,预计最长 [max]s`. Cap progress at 95% until the API returns success.
**Step 3 — Success (message tool)**
When `resource_status == 1` and `status != "failed"`, send **media** = `medias[0].url` and **caption**:
```
✅ 语音合成成功!
• 模型:[Model Name]
• 耗时:实际 [actual]s
• 消耗积分:[N pts]
🔗 原始链接:[url]
```
Use the **URL** from the API (do not use local file paths).
**Step 4 — Failure (message tool)**
On failure, send user-friendly message. **TTS error translation (do not expose raw API errors):**
| Technical | ✅ Say (CN) | ✅ Say (EN) |
|-----------|-------------|-------------|
| 401 Unauthorized | 密钥无效或未授权,请至 imaclaw.ai 生成新密钥 | API key invalid; generate at imaclaw.ai |
| 4008 Insufficient points | 积分不足,请至 imaclaw.ai 购买积分 | Insufficient points; buy at imaclaw.ai |
| Invalid product attribute | 参数配置异常,请稍后重试 | Configuration error, try again later |
| Error 6006 / 6010 | 积分或参数不匹配,请换模型或重试 | Points/params mismatch, try another model |
| resource_status == 2 / status failed | 语音合成失败,建议换模型或缩短文本 | Synthesis failed, try another model or shorter text |
| timeout | 合成超时,请稍后重试 | Timed out, try again later |
| Network error | 网络不稳定,请检查后重试 | Network unstable, check and retry |
| Text too long (TTS) | 文本过长,请缩短后重试 | Text too long, please shorten |
Links: API key — https://www.imaclaw.ai/imaclaw/apikey; Credits — https://www.imaclaw.ai/imaclaw/subscription
**Step 5 — Done**
After Steps 0–4, no further reply is needed. Do not send duplicate confirmations.
---
### Step 4 — Failure Notification
When task status = `failed` or any API/network error, send:
```
❌ [内容类型]生成失败
• 原因:[natural_language_error_message]
• 建议改用:
- [Alt Model 1]([特点],[N pts])
- [Alt Model 2]([特点],[N pts])
需要我帮你用其他模型重试吗?
```
**⚠️ CRITICAL: Error Message Translation**
**NEVER show technical error messages to users.** Always translate API errors into natural language.
**API key & credits:** 密钥与积分管理入口为 imaclaw.ai(与 imastudio.com 同属 IMA 平台)。Key and subscription management: imaclaw.ai (same IMA platform as imastudio.com).
| Technical Error | ❌ Never Say | ✅ Say Instead (Chinese) | ✅ Say Instead (English) |
|----------------|-------------|------------------------|------------------------|
| `401 Unauthorized` 🆕 | Invalid API key / 401 Unauthorized | ❌ API密钥无效或未授权<br>💡 **生成新密钥**: https://www.imaclaw.ai/imaclaw/apikey | ❌ API key is invalid or unauthorized<br>💡 **Generate API Key**: https://www.imaclaw.ai/imaclaw/apikey |
| `4008 Insufficient points` 🆕 | Insufficient points / Error 4008 | ❌ 积分不足,无法创建任务<br>💡 **购买积分**: https://www.imaclaw.ai/imaclaw/subscription | ❌ Insufficient points to create this task<br>💡 **Buy Credits**: https://www.imaclaw.ai/imaclaw/subscription |
| `"Invalid product attribute"` / `"Insufficient points"` | Invalid product attribute | 生成参数配置异常,请稍后重试 | Configuration error, please try again later |
| `Error 6006` (credit mismatch) | Error 6006 | 积分计算异常,系统正在修复 | Points calculation error, system is fixing |
| `Error 6009` (no matching rule) | Error 6009 | 参数组合不匹配,已自动调整 | Parameter mismatch, auto-adjusted |
| `Error 6010` (attribute_id mismatch) | Attribute ID does not match | 模型参数不匹配,请尝试其他模型 | Model parameters incompatible, try another model |
| `error 400` (bad request) | error 400 / Bad request | 请求参数有误,请稍后重试 | Invalid request parameters, please try again |
| `resource_status == 2` | Resource status 2 / Failed | 生成过程遇到问题,建议换个模型试试 | Generation failed, please try another model |
| `status == "failed"` (no details) | Task failed | 这次生成没成功,要不换个模型试试? | Generation unsuccessful, try a different model? |
| `timeout` | Task timed out / Timeout error | 生成时间过长已超时,建议用更快的模型 | Generation took too long, try a faster model |
| Network error / Connection refused | Connection refused / Network error | 网络连接不稳定,请检查网络后重试 | Network connection unstable, check network and retry |
| Rate limit exceeded | 429 Too Many Requests / Rate limit | 请求过于频繁,请稍等片刻再试 | Too many requests, please wait a moment |
| Prompt moderation (Sora only) | Content policy violation | 提示词包含敏感内容,请修改后重试 | Prompt contains restricted content, please modify |
| Model unavailable | Model not available / 503 Service Unavailable | 当前模型暂时不可用,建议换个模型 | Model temporarily unavailable, try another model |
| **Lyrics format error (Suno only)** 🎵 | Invalid lyrics format | 歌词格式有误,请调整后重试 | Lyrics format error, adjust and retry |
| **Prompt too short/long (Music)** 🎵 | Prompt length invalid | 音乐描述过短或过长,请调整到合适长度 (建议20-100字) | Music description too short or long, adjust to appropriate length (20-100 chars recommended) |
| **Text too long (TTS)** 🔊 | TTS text length | 文本过长,请缩短后重试 | Text too long, please shorten and retry |
**Generic fallback (when error is unknown):**
- Chinese: `生成过程遇到问题,请稍后重试或换个模型试试`
- English: `Generation encountered an issue, please try again or use another model`
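The translation table plus the generic fallback can be sketched as a simple lookup. This is a minimal illustration with a few representative entries (Chinese messages); a real implementation would cover the full table and both languages.

```python
# Sketch of the error-translation table above — never expose raw API errors.
ERROR_MESSAGES = {
    "401": "API密钥无效或未授权。生成新密钥: https://www.imaclaw.ai/imaclaw/apikey",
    "4008": "积分不足,无法创建任务。购买积分: https://www.imaclaw.ai/imaclaw/subscription",
    "6010": "模型参数不匹配,请尝试其他模型",
    "timeout": "生成时间过长已超时,建议用更快的模型",
}
FALLBACK_CN = "生成过程遇到问题,请稍后重试或换个模型试试"

def user_facing_error(code: str) -> str:
    """Map a technical error code to a natural-language message."""
    return ERROR_MESSAGES.get(code, FALLBACK_CN)
```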
**Best Practices:**
1. **Focus on user action**: Tell users what to do next, not what went wrong technically
2. **Be reassuring**: Use phrases like "建议换个模型试试" instead of "失败了"
3. **Avoid blame**: Never say "你的提示词有问题" → say "提示词需要调整一下"
4. **Provide alternatives**: Always suggest 1-2 alternative models in the failure message
5. **🆕 Include actionable links (v1.0.8+)**: For 401/4008 errors, provide clickable links to API key generation or credit purchase pages
6. **🎵 Music-specific (v1.2.0+)**:
- For Suno lyrics errors, suggest simplifying lyrics or using auto-generated lyrics (`auto_lyrics=true`)
- For prompt length errors, give example length (e.g., "建议20-100字")
- For BGM requests, recommend DouBao BGM over Suno
7. **🔊 TTS-specific:** Use the TTS error translation table in "For TTS Tasks (text_to_speech)" above; suggest another model via `--list-models` or shortening text.
---
### Step 5 — Done (No Further Action Needed) 🆕
**After sending Step 3 (success) or Step 4 (failure):**
1. **DO NOT send any additional messages** unless the user asks a follow-up question
2. **The task is complete** — wait for the user's next request
3. **User preference has been saved** (if generation succeeded)
4. **The conversation is ready** for the next generation request
**Why this step matters:**
- Prevents unnecessary "anything else?" messages that clutter the chat
- Allows users to naturally continue the conversation when ready
- Respects the asynchronous nature of IM platforms
**Exception:** If the user explicitly asks "还有别的吗?" or similar, then respond naturally.
---
**🆕 Enhanced Error Handling (v1.0.8):**
The Reflection mechanism (3 automatic retries) now provides **specific, actionable suggestions** for common errors:
- **401 Unauthorized**: System suggests generating a new API key with clickable link
- **4008 Insufficient Points**: System suggests purchasing credits with clickable link
- **500 Internal Server Error**: Automatic parameter degradation (size, resolution, duration, quality)
- **6009 No Rule Match**: Automatic parameter completion from credit_rules
- **6010 Attribute Mismatch**: Automatic credit_rule reselection
- **Timeout**: Helpful info with dashboard link for background task status
All error handling is **automatic and transparent** — users receive natural language explanations with next steps.
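The retry-with-degradation behavior for 500 errors might be structured as follows. This is a sketch under assumptions: `create_task`, the degradation order, and the backoff policy are illustrative, not the actual Reflection implementation.

```python
import time

# Assumed degradation ladder for the `resolution` parameter on server errors.
DEGRADE_ORDER = {"4K": "2K", "2K": "1080P", "1080P": "720P"}

def generate_with_retry(create_task, params: dict, max_retries: int = 3):
    """Retry up to max_retries, degrading resolution on 500-class errors."""
    for attempt in range(max_retries):
        try:
            return create_task(params)
        except RuntimeError as err:
            if "500" in str(err) and params.get("resolution") in DEGRADE_ORDER:
                params["resolution"] = DEGRADE_ORDER[params["resolution"]]
            time.sleep(0.1 * attempt)  # small backoff between attempts
    raise RuntimeError("Generation failed after retries")
```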
**Failure fallback by task type:**
| Task Type | Failed Model | First Alt | Second Alt |
|-----------|-------------|-----------|------------|
| text_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10-18pts, premium) |
| text_to_image | Nano Banana2 | SeeDream 4.5 (5pts, better quality) | Nano Banana Pro (10-18pts) |
| text_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts, budget) |
| image_to_image | SeeDream 4.5 | Nano Banana2 (4pts, fast) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana2 | SeeDream 4.5 (5pts) | Nano Banana Pro (10pts) |
| image_to_image | Nano Banana Pro | SeeDream 4.5 (5pts) | Nano Banana2 (4pts) |
| text_to_video | Kling O1 | Wan 2.6 (25pts) | Vidu Q2 (5pts) |
| text_to_video | Google Veo 3.1 | Kling O1 (48pts) | Sora 2 Pro (122pts) |
| text_to_video | Any | Wan 2.6 (25pts, most popular) | Hailuo 2.0 (5pts) |
| image_to_video | Wan 2.6 | Kling O1 (48pts) | Hailuo 2.0 i2v (25pts) |
| image_to_video | Any | Wan 2.6 (25pts, most popular) | Vidu Q2 Pro (20pts) |
| first_last / reference | Kling O1 | Kling 2.6 (80pts) | Veo 3.1 (70pts+) |
| **text_to_music** 🎵 | **Suno** | **DouBao BGM (30pts, 背景音乐)** | **DouBao Song (30pts, 歌曲生成)** |
| **text_to_music** 🎵 | **DouBao BGM** | **DouBao Song (30pts)** | **Suno (25pts, 功能最强)** |
| **text_to_music** 🎵 | **DouBao Song** | **DouBao BGM (30pts)** | **Suno (25pts, 功能最强)** |
| **text_to_speech** 🔊 | (any) | Query `--list-models` for alternatives | Use another model_id from product list |
**Music-specific failure guidance:**
- If Suno fails → Recommend DouBao BGM (for background music) or DouBao Song (for songs)
- If DouBao BGM fails → Try DouBao Song first (similar pricing), then Suno (more powerful)
- If DouBao Song fails → Try DouBao BGM first (similar pricing), then Suno (more powerful)
- For lyrics errors in Suno → Suggest simplifying lyrics or using `auto_lyrics=true`
- For prompt length errors → Recommend 20-100 characters
**TTS-specific failure guidance:**
- If TTS fails → Run `--task-type text_to_speech --list-models` and suggest another model_id; or shorten text / simplify content. Use the TTS error translation table in "For TTS Tasks" above for user-facing messages.
---
## Supported Models at a Glance
> **Source: production `GET /open/v1/product/list` (2026-02-27).** Model count reduced significantly. Always query product list API at runtime.
### Image Generation (4 models each)
| Category | Name | model_id | Cost |
|----------|------|----------|------|
| **text_to_image** | SeeDream 4.5 🌟 | `doubao-seedream-4.5` | 5 pts |
| text_to_image | Midjourney 🎨 | `midjourney` | 8/10 pts (480p/720p) |
| text_to_image | Nano Banana2 💚 | `gemini-3.1-flash-image` | 4/6/10/13 pts |
| text_to_image | Nano Banana Pro | `gemini-3-pro-image` | 10/10/18 pts |
| **image_to_image** | SeeDream 4.5 🌟 | `doubao-seedream-4.5` | 5 pts |
| image_to_image | Midjourney 🎨 | `midjourney` | 8/10 pts (480p/720p) |
| image_to_image | Nano Banana2 💚 | `gemini-3.1-flash-image` | 4/6/10/13 pts |
| image_to_image | Nano Banana Pro | `gemini-3-pro-image` | 10 pts |
**Midjourney attribute_ids**: 5451/5452 (text_to_image), 5453/5454 (image_to_image)
**Nano Banana2 size options**: 512px (4pts), 1K (6pts), 2K (10pts), 4K (13pts)
**Nano Banana Pro size options**: 1K (10pts), 2K (10pts), 4K (18pts for t2i / 10pts for i2i)
#### Image Model Capabilities (Parameter Support)
⚠️ **Critical**: Models have **varying parameter support**. Custom aspect ratios are now **supported by multiple models**.
| Model | Custom Aspect Ratio | Max Resolution | Size Options | Notes |
|-------|---------------------|----------------|--------------|-------|
| **SeeDream 4.5** | ✅ (via virtual params) | 4K (adaptive) | 8 aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 (5 pts) |
| **Nano Banana2** | ✅ **Native support** 🆕 | 4K (4096×4096) | 512px/1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` |
| **Nano Banana Pro** | ✅ **Native support** 🆕 | 4K (4096×4096) | 1K/2K/4K + aspect ratios | Supports 1:1, 16:9, 9:16, 4:3, 3:4; size via `attribute_id` |
| **Midjourney** 🎨 | ❌ (1:1 only) | 1024px (square) | 480p/720p via `attribute_id` | Fixed 1024x1024, artistic style focus |
**Key Capabilities**:
- ✅ **Aspect ratio control**: **SeeDream 4.5** (virtual params), **Nano Banana Pro/2** (native support)
- ❌ **8K**: Not supported by any model (max is 4K)
- ✅ **Size control**: **Nano Banana2**, **Nano Banana Pro**, and **Midjourney** support multiple size options via different `attribute_id`s
**Tags:** skill, ai