Skill name: aliyun-wan-digital-human
Category: provider

Model Studio Digital Human
Validation

```bash
mkdir -p output/aliyun-wan-digital-human
python -m py_compile skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py && echo py_compile_ok > output/aliyun-wan-digital-human/validate.txt
```

Pass criteria: the command exits 0 and output/aliyun-wan-digital-human/validate.txt is generated.
Output and evidence

- Save normalized request payloads, the chosen resolution, and task polling snapshots under output/aliyun-wan-digital-human/.
- Record the image/audio URLs and whether the input image passed detection.
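A minimal sketch of how these evidence artifacts might be written, assuming each payload or polling snapshot is held as a plain dict (the `save_evidence` helper name is hypothetical, not part of this skill's scripts):

```python
import json
import os
import time


def save_evidence(name, data, base_dir="output/aliyun-wan-digital-human"):
    """Write one evidence artifact (request payload or polling snapshot) as JSON."""
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, f"{name}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    return path


# Example: record the normalized request payload and one polling snapshot.
payload = {"model": "wan2.2-s2v", "image_url": "https://example.com/anchor.png"}
save_evidence("request", payload)
save_evidence(f"poll_{int(time.time())}", {"task_status": "RUNNING"})
```

Timestamping the polling snapshots keeps every poll result as a separate file instead of overwriting the last one.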
Use this skill for image- and audio-driven speaking, singing, or performing characters.
Critical model names

Use these exact model strings:

- wan2.2-s2v-detect
- wan2.2-s2v

Selection guidance:

- Run wan2.2-s2v-detect first to validate the image.
- Use wan2.2-s2v for the actual video generation job.
Prerequisites

- China mainland (Beijing) region only.
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
- Input audio should contain clear speech or singing, and the input image should depict a clear subject.
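The key lookup described above can be sketched as follows. The exact layout of ~/.alibabacloud/credentials is an assumption here (an INI-style file containing a dashscope_api_key entry); the `resolve_api_key` helper name is hypothetical:

```python
import configparser
import os


def resolve_api_key(credentials_path="~/.alibabacloud/credentials"):
    """Prefer DASHSCOPE_API_KEY from the environment, then fall back to the credentials file."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    # Assumed INI-style credentials file; scan every section for the key.
    parser = configparser.ConfigParser()
    parser.read(os.path.expanduser(credentials_path))
    for section in parser.sections():
        if parser.has_option(section, "dashscope_api_key"):
            return parser.get(section, "dashscope_api_key")
    raise RuntimeError("No DashScope API key found in environment or credentials file")
```

The environment variable deliberately wins over the file, so a shell export can override a stored credential for one-off runs.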
Normalized interface (video.digital_human)

Detect Request

- model (string, optional): default wan2.2-s2v-detect
- image_url (string, required)

Generate Request

- model (string, optional): default wan2.2-s2v
- image_url (string, required)
- audio_url (string, required)
- resolution (string, optional): 480P or 720P
- scenario (string, optional): talk, sing, or perform

Response

- task_id (string)
- task_status (string)
- video_url (string, when finished)
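The generate request above can be assembled with a small builder. The field names and allowed values follow the interface listed here; the validation rules and the `build_generate_request` name are a sketch, not this skill's actual script:

```python
def build_generate_request(image_url, audio_url, resolution="480P", scenario="talk"):
    """Normalize a video.digital_human generate request, rejecting unsupported values."""
    if resolution not in ("480P", "720P"):
        raise ValueError(f"unsupported resolution: {resolution}")
    if scenario not in ("talk", "sing", "perform"):
        raise ValueError(f"unsupported scenario: {scenario}")
    # Inputs must be public HTTP/HTTPS URLs (see operational guidance below).
    for name, url in (("image_url", image_url), ("audio_url", audio_url)):
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be a public HTTP/HTTPS URL")
    return {
        "model": "wan2.2-s2v",
        "image_url": image_url,
        "audio_url": audio_url,
        "resolution": resolution,
        "scenario": scenario,
    }
```

Failing fast on an unsupported resolution or scenario keeps a bad value from surfacing only after a task has been submitted.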
Quick start

```bash
python skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py \
  --image-url https://example.com/anchor.png \
  --audio-url https://example.com/voice.mp3 \
  --resolution 720P \
  --scenario talk
```
Operational guidance

- Use a portrait, half-body, or full-body image with a clear face and stable framing.
- Match the audio length to the desired output duration; the output follows the audio length up to the model limit.
- Keep the image and audio as public HTTP/HTTPS URLs.
- If the image fails detection, do not proceed to video generation.
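Since tasks are asynchronous, the task_id from the response has to be polled until task_status reaches a terminal state. A generic polling sketch follows; `fetch_status` is a hypothetical callable standing in for the real task-query API (not specified here), and the SUCCEEDED/FAILED status strings are assumptions:

```python
import time


def wait_for_video(task_id, fetch_status, poll_interval=5.0, timeout=600.0):
    """Poll a task until it finishes; fetch_status(task_id) must return the normalized response dict."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = fetch_status(task_id)
        status = resp.get("task_status")
        if status == "SUCCEEDED":
            return resp["video_url"]  # present only when the task is finished
        if status == "FAILED":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

Injecting `fetch_status` keeps the loop testable with a stub and independent of whichever SDK or HTTP client actually queries the task.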
Output location

- Default output: output/aliyun-wan-digital-human/request.json
- Override the base directory with OUTPUT_DIR.
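The override can be sketched as below; whether OUTPUT_DIR replaces only the output/ base (as assumed here) or the whole path is an assumption, and the `resolve_output_path` name is hypothetical:

```python
import os


def resolve_output_path(filename="request.json"):
    """Resolve the output file path, honoring an OUTPUT_DIR base-directory override."""
    base = os.environ.get("OUTPUT_DIR", "output")
    out_dir = os.path.join(base, "aliyun-wan-digital-human")
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, filename)
```

With OUTPUT_DIR unset this yields the default output/aliyun-wan-digital-human/request.json.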
References