
vision-skill

Use this skill for computer vision tasks including image recognition (OCR, object detection) and image generation (text-to-image, image-to-image). Supports asynchronous task execution with Tencent COS storage and Doubao AI models.

Author: admin | Source: ClawHub
Version: 1.0.0 | Security check: passed | Downloads: 321 | Favorites: 0

# Vision Skill

## Overview

This skill provides capabilities for visual recognition and image generation using Doubao AI models. It handles image storage via Tencent Cloud COS and executes tasks asynchronously.

## Capabilities

### 1. Vision Recognition

Analyze images to describe content, extract text (OCR), or answer questions about the image.

- **Input**: Local image path or URL, optional prompt.
- **Process**: Uploads local images to COS, then calls Doubao Vision API.
- **Output**: Text description or answer.

### 2. Image Generation

Generate images from text prompts, optionally using reference images.

- **Text-to-Image**: Generate images from a text description.
- **Image-to-Image**: Generate images based on a reference image and text prompt.
- **Sequential Generation**: Generate a series of consistent images (e.g., storyboards).

## Usage

The skill is exposed via a CLI script `scripts/vision_cli.py`.

### Prerequisites

Environment variables must be set in `.env` or the system environment:

- `COS_SECRET_ID`, `COS_SECRET_KEY`, `COS_REGION`, `COS_BUCKET_NAME`
- `DOUBAO_API_KEY`, `DOUBAO_VISION_MODEL`, `DOUBAO_IMAGE_MODEL`

### Commands

#### Vision Recognition

```bash
# Basic Usage
python3 scripts/vision_cli.py recognize <image_path> --prompt "Describe this image"

# Using Presets (--format)
# Available formats: invoice, contract, form, slide, whiteboard, table, json, key_value, markdown_note, qa_pairs, code, ocr, analysis
python3 scripts/vision_cli.py recognize ./invoice.jpg --format json
python3 scripts/vision_cli.py recognize ./screenshot.png --format code

# Batch recognition
python3 scripts/vision_cli.py recognize ./a.jpg ./b.jpg ./c.jpg --format table --wait --output ./batch_result.json

# Quality mode and retry
python3 scripts/vision_cli.py recognize ./contract.png --format contract --quality high --retry 3 --wait

# Wait for result and save to file
python3 scripts/vision_cli.py recognize ./doc.jpg --format ocr --wait --output ./result.txt
```

#### Image Generation

```bash
# Text to Image with Style Presets (--style)
# Available styles: ppt, business_flat, cartoon, tech_isometric, hand_drawn, icon, photo, anime, sketch
python3 scripts/vision_cli.py generate "A cyberpunk city" --style anime

# Image to Image
python3 scripts/vision_cli.py generate "Make it snowy" --ref <image_path>

# Sequential Generation
python3 scripts/vision_cli.py generate "A story about a cat" --seq 4 --style cartoon

# Wait for result and save image
python3 scripts/vision_cli.py generate "App icon for a camera" --style icon --wait --output ./icon.png

# Quality mode and retry
python3 scripts/vision_cli.py generate "A SaaS architecture illustration" --style tech_isometric --quality high --retry 3 --wait
```

#### Check Status

```bash
python3 scripts/vision_cli.py status <task_id>

# Or save result if completed
python3 scripts/vision_cli.py status <task_id> --output ./final_result.png
```

## Task Management

All tasks are executed asynchronously by default.

- Use `--wait` flag to block until completion (useful for Agent workflow).
- Use `--output` flag to automatically save text or download images.
- Task data is stored in `.tasks/` directory.
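
For agent workflows that drive the CLI from Python, the prerequisites and the `recognize` command described above can be wrapped in a small preflight helper. The following is a minimal sketch, not part of the skill itself: `missing_env_vars` and `build_recognize_command` are hypothetical names introduced here, and only the environment variable names, the script path, and the `--format`, `--prompt`, `--wait`, and `--output` flags come from this README. The check inspects only the process environment, so values defined solely in `.env` would be reported as missing. The README does not say whether `--prompt` and `--format` may be combined.

```python
import os
import sys

# Variable names are taken from the Prerequisites section of this README.
REQUIRED_ENV_VARS = (
    "COS_SECRET_ID",
    "COS_SECRET_KEY",
    "COS_REGION",
    "COS_BUCKET_NAME",
    "DOUBAO_API_KEY",
    "DOUBAO_VISION_MODEL",
    "DOUBAO_IMAGE_MODEL",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper: it looks only at the given mapping (the process
    environment by default) and does not read a .env file.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def build_recognize_command(image_paths, fmt=None, prompt=None, wait=True, output=None):
    """Build the argument list for a `recognize` call (hypothetical helper).

    Only flags shown in the README examples are emitted. Whether --prompt
    and --format can be used together is an assumption left to the caller.
    """
    if not image_paths:
        raise ValueError("at least one image path is required")
    cmd = [sys.executable, "scripts/vision_cli.py", "recognize", *image_paths]
    if fmt:
        cmd += ["--format", fmt]
    if prompt:
        cmd += ["--prompt", prompt]
    if wait:
        cmd.append("--wait")
    if output:
        cmd += ["--output", output]
    return cmd
```

A caller could first verify that `missing_env_vars()` returns an empty list, then pass the result of `build_recognize_command(["./invoice.jpg"], fmt="json")` to `subprocess.run` from the skill's root directory. Building the command as a list rather than a shell string avoids quoting problems with prompts and file paths that contain spaces.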

Tags

skill, ai

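Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill (send this message in chat)

Help me install SkillHub and the vision-skill-1776119419 skill

Option 2: Set SkillHub as the preferred skill installation source (send this message in chat)

Set SkillHub as my preferred skill installation source, then help me install the vision-skill-1776119419 skill

Install via command line

skillhub install vision-skill-1776119419

Download Zip package

Download vision-skill v1.0.0

File size: 20.72 KB | Published: 2026-4-14 10:02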

v1.0.0 (latest) | 2026-4-14 10:02
Initial release of vision-skill, providing end-to-end computer vision and image generation capabilities.

- Supports image recognition (OCR, object detection, content description, Q&A) and flexible image generation (text-to-image, image-to-image, sequential images).
- Integrates with Tencent Cloud COS for image storage and uses Doubao AI models for processing.
- CLI interface via `vision_cli.py` with options for batch tasks, style/format presets, quality modes, and retries.
- All tasks execute asynchronously, with options to wait for completion and save outputs.
- Comprehensive environment variable setup and task management through a local `.tasks/` directory.
