CAID Multi-Agent Coordination

This skill implements the Centralized Asynchronous Isolated Delegation (CAID) paradigm for coordinating multiple agents working on shared artifacts.

⚠️ CRITICAL WARNINGS FROM PAPER:

- Use CAID from the outset: Don't run single-agent first as a fallback. The sequential strategy costs nearly 2x with minimal gain.
- Physical worktree isolation is mandatory: Soft isolation (instruction-only) degrades performance on complex tasks.
- Engineer limits are strict: 2 for PaperBench-style, 4 for Commit0-style, never exceed 8.
- Higher cost/runtime trade-off: CAID improves accuracy, not speed. Integration is sequential and test-gated.

Core Principles

1. Centralized Task Delegation: A manager agent decomposes tasks into dependency-aware units
2. Asynchronous Execution: Multiple engineer agents work concurrently
3. Isolated Workspaces: Each agent works in its own isolated branch/worktree
4. Structured Integration: Progress is merged via git commit/merge with test verification

When to Use This Skill

Use CAID from the outset for:

- Long-horizon tasks with multiple interdependent files
- Tasks with a clear dependency structure (imports, test mappings)
- Tasks where parallelizable work exists
- Tasks whose integration can be verified by executable tests

Don't use as fallback: Running single-agent first then CAID is inefficient (cost/runtime nearly additive, minimal performance gain).

Use single-agent for:

- Isolated, single-file changes
- Tasks with no clear parallelization opportunities
- Exploratory/research-oriented tasks

Coordination Workflow

0. Manager Pre-Setup (CRITICAL)

Before ANY delegation, the manager must:

1. Prepare runtime environment
   - Ensure dependencies are installed
   - Set up virtual environment

2. Organize entry points
   - Create main entry files
   - Ensure import paths work

3. Add minimal function stubs
   - Empty function definitions so imports don't fail
   - Type signatures if available

4. Commit to main branch
   - All engineer branches are created from a consistent base
   - Without this, engineers start from divergent states

CODEBLOCK0

1. Task Analysis & Dependency Graph Creation

Manager's role: Before delegating, analyze the task structure:

- Identify atomic units of work (files, functions, modules)
- Build a dependency graph: G=(V,E) where edges indicate dependencies
- Define Ready(v) ⇔ all dependencies of v are completed
- Only delegate tasks that are Ready (all dependencies satisfied)
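The Ready(v) rule above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `DEPS` graph, the file names, and the `ready_units` helper are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical example graph: each unit maps to the units it depends on.
DEPS: Dict[str, Set[str]] = {
    "utils.py": set(),
    "models.py": {"utils.py"},
    "api.py": {"models.py", "utils.py"},
}


def ready_units(deps: Dict[str, Set[str]], completed: Set[str]) -> Set[str]:
    """Return units that are not yet completed and whose dependencies all are."""
    return {
        unit
        for unit, needs in deps.items()
        if unit not in completed and needs <= completed
    }


# Nothing is done yet, so only the unit without dependencies is Ready.
print(ready_units(DEPS, set()))
```

Calling `ready_units` again after each successful merge gives the manager the next set of units it may delegate.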

Commit0-style tasks (clear file structure):

1. Check import statements to identify file-level dependencies
2. Collect executable test cases from repository
3. Examine which files tests exercise
4. Identify components to implement earlier (upstream dependencies)
5. Delegate at file level first: only split to function level if a file has many unimplemented functions
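Step 1 of the list above (reading import statements to find file-level dependencies) could look roughly like this with Python's standard `ast` module. The helper names `local_imports` and `build_dependency_graph` are illustrative, and the sketch assumes one module per file with flat module names.

```python
import ast
from typing import Dict, Set


def local_imports(source: str, local_modules: Set[str]) -> Set[str]:
    """Return the names in local_modules that the given source code imports."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            # Skips non-import nodes and bare "from . import x" forms.
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in local_modules:
                found.add(top_level)
    return found


def build_dependency_graph(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each module name to the set of local modules it imports."""
    modules = set(sources)
    return {
        name: local_imports(code, modules) - {name}
        for name, code in sources.items()
    }
```

Third-party and standard-library imports are ignored because only modules present in `sources` count as dependencies.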

PaperBench-style tasks (inferred structure):

1. Read the paper to identify its main contribution
2. Infer implementation order from the contribution
3. Use max 2 engineers: the manager's task is harder, and more agents destabilize it

Dependency graph construction:
CODEBLOCK1

2. Workspace Isolation Setup

Create PHYSICALLY isolated worktrees (not soft isolation):

CODEBLOCK2

⚠️ WARNING: Soft isolation (same workspace, instruction-level constraints) degrades performance to below single-agent on PaperBench. Physical git worktree isolation is mandatory.

Key isolation principles:

- Each engineer operates in its own git worktree (physical filesystem isolation)
- All worktrees are derived from the main branch
- Engineers modify files only within their assigned workspace
- Restricted files (shared across engineers): __init__.py, config files, global constants. Engineers must NOT commit changes to these
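As a rough sketch of these principles, the snippet below builds a `git worktree add` command for one engineer and flags changes to restricted files. The branch and path naming scheme and the concrete names in `RESTRICTED_NAMES` are assumptions for illustration; the text only names `__init__.py` explicitly.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed examples of restricted file names; adapt to the repository at hand.
RESTRICTED_NAMES = {"__init__.py", "config.py", "constants.py"}


def worktree_add_command(engineer_id: str, base: str = "main") -> List[str]:
    """Build the git command that creates a new branch and worktree from base."""
    branch = f"engineer/{engineer_id}"      # hypothetical branch naming scheme
    path = f"../worktrees/{engineer_id}"    # hypothetical worktree location
    return ["git", "worktree", "add", "-b", branch, path, base]


def restricted_changes(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that touch restricted (shared) files."""
    return sorted(
        path
        for path in changed_paths
        if PurePosixPath(path).name in RESTRICTED_NAMES
    )
```

A manager could pass the command list to `subprocess.run` and reject any commit for which `restricted_changes` returns a non-empty list.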

3. Dependency-Aware Task Delegation

STRICT Engineer Limits:

Task Type	Max Engineers	Why
PaperBench-style	2	Inferred dependencies; more destabilizes
Commit0-style	4	Clear file structure

⚠️ Critical: Increasing engineers beyond optimal degrades performance due to integration overhead and conflict resolution costs.

Task prioritization heuristics:
Manager should prioritize tasks that:

1. Enable earlier test execution (expose evaluation signals sooner)
2. Lie closer to the upstream end of the dependency chain
3. Are simpler functions before complex ones
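One way to encode these three heuristics is a composite sort key. The `Candidate` fields below (`tests_unlocked`, `upstream_depth`, `complexity`) are hypothetical proxies chosen for illustration; the source does not say how the manager measures them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    tests_unlocked: int   # assumed proxy: tests that can run once this is done
    upstream_depth: int   # assumed proxy: 0 means no dependencies (most upstream)
    complexity: int       # assumed proxy: rough size or difficulty estimate


def prioritize(candidates: List[Candidate]) -> List[Candidate]:
    """Order by: more tests unlocked, then more upstream, then simpler."""
    return sorted(
        candidates,
        key=lambda c: (-c.tests_unlocked, c.upstream_depth, c.complexity, c.name),
    )
```

The final `name` component only makes the ordering deterministic when all three heuristics tie.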

Round definition:

One round = complete cycle of delegation → implementation → dependency update

Recommended iteration limits (from paper experiments):

Role	Max Iterations
Manager	50
Each Engineer	80
Total Rounds	~22 (varies by task)

Delegation algorithm:

CODEBLOCK3

Task assignment JSON format (structured communication — NO free-form dialog):

CODEBLOCK4

Key: All communication uses structured JSON, not free-form dialog. This prevents inter-agent misalignment (primary failure mode in multi-agent systems).
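To illustrate structured communication, here is a minimal sketch of a task assignment that round-trips through JSON. The field names are assumptions made for this example and are not the schema used by the paper or by any specific tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TaskAssignment:
    # All field names here are hypothetical.
    task_id: str
    engineer_id: str
    files: List[str]
    test_command: str
    depends_on: List[str] = field(default_factory=list)


def encode(assignment: TaskAssignment) -> str:
    """Serialize an assignment to a JSON string with stable key order."""
    return json.dumps(asdict(assignment), sort_keys=True)


def decode(payload: str) -> TaskAssignment:
    """Parse a JSON string; unknown or missing fields raise TypeError."""
    return TaskAssignment(**json.loads(payload))
```

Because `decode` rejects unexpected keys, a message that drifts toward free-form content fails loudly instead of being silently accepted.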

4. Asynchronous Execution Loop

Event loop pattern:

1. Delegate → Manager assigns tasks to available engineers
2. Execute → Engineers work concurrently in isolated worktrees
3. Self-Verify → Engineer runs tests, fixes failures
4. Complete → Engineer submits commit when ALL tests pass
5. Integrate → Manager attempts merge to main
6. Conflict Resolution (if needed) → Responsible engineer resolves
7. Update → Manager updates dependency graph
8. Repeat → Continue until all tasks complete or limits reached
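The loop above can be approximated with `asyncio`. This sketch simplifies it to batch rounds: each round delegates up to `max_engineers` Ready units, waits for all of them, then integrates them one by one. `engineer_work` is a stand-in for real implementation and self-verification, and every name here is hypothetical.

```python
import asyncio
from typing import Dict, List, Set


async def engineer_work(unit: str) -> str:
    """Stand-in for an engineer implementing and self-verifying one unit."""
    await asyncio.sleep(0)  # placeholder for real work
    return unit


async def run_rounds(
    deps: Dict[str, Set[str]], max_engineers: int, max_rounds: int
) -> List[List[str]]:
    """Run delegate/execute/integrate/update rounds; return units merged per round."""
    completed: Set[str] = set()
    history: List[List[str]] = []
    for _ in range(max_rounds):
        if len(completed) == len(deps):
            break  # every unit is integrated
        ready = sorted(
            unit
            for unit, needs in deps.items()
            if unit not in completed and needs <= completed
        )
        batch = ready[:max_engineers]
        if not batch:
            break  # nothing is Ready (cycle or unsatisfiable dependency)
        # Engineers execute concurrently.
        finished = await asyncio.gather(*(engineer_work(u) for u in batch))
        # Integration is sequential: merge one finished unit at a time.
        for unit in finished:
            completed.add(unit)
        history.append(list(finished))
    return history
```

A fully asynchronous manager would reassign an engineer as soon as it finishes rather than waiting for the whole batch; the batch form is used here only to keep the example short.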

Engineer self-verification (MANDATORY before submission):

- Run relevant tests that import/reference modified files
- If no explicit mapping exists, run the repository's default test command
- Any failed test or runtime exception MUST be resolved
- Use concrete error logs and tracebacks for iterative refinement
- Only submit a commit after ALL tests pass
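A self-verification gate along these lines can be sketched with `subprocess`. The `run_tests` helper and its timeout default are assumptions; the actual test command depends on the repository.

```python
import subprocess
import sys
from typing import List, Tuple


def run_tests(command: List[str], timeout_s: float = 600.0) -> Tuple[bool, str]:
    """Run a test command; return (passed, combined log for refinement)."""
    proc = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s
    )
    # A non-zero exit code covers both failed tests and runtime exceptions.
    return proc.returncode == 0, proc.stdout + proc.stderr


# Demo with a trivial command standing in for the real test suite.
passed, log = run_tests([sys.executable, "-c", "print('ok')"])
print(passed)
```

The engineer would submit a commit only when `passed` is true, and otherwise feed `log` back into the next fix attempt.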

5. Integration via Merge

Merge workflow:

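1. The manager attempts to merge the engineer's commit into main.
2. If the merge conflicts:
   - The engineer who produced the conflicting commit is responsible for resolving it.
   - The engineer pulls the latest main branch: git pull origin main
   - The engineer resolves the conflicts locally.
   - The engineer re-runs the tests to confirm the resolution did not break anything.
   - The engineer resubmits the commit.
   - The manager retries the merge.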

Main branch is single source of truth throughout execution.

6. Context Management for Manager

To prevent context explosion, manager uses LLMSummarizingCondenser pattern:

CODEBLOCK6

Compressed execution history format:
CODEBLOCK7
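The general condensing idea (keep the earliest and most recent events, replace the middle with a summary) can be sketched as follows. This does not reproduce the API of LLMSummarizingCondenser; `condense`, its parameters, and the `summarize` callable are hypothetical, and a real setup would have `summarize` call an LLM.

```python
from typing import Callable, List


def condense(
    events: List[str],
    summarize: Callable[[List[str]], str],
    max_events: int = 6,
    keep_first: int = 1,
    keep_last: int = 2,
) -> List[str]:
    """Replace the middle of a long history with a single summary entry."""
    if len(events) <= max_events:
        return list(events)  # short enough, keep everything
    tail_start = len(events) - keep_last
    middle = events[keep_first:tail_start]
    return events[:keep_first] + [summarize(middle)] + events[tail_start:]
```

Keeping the first events preserves the original task statement, while the last events keep the manager's most recent state intact.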

7. Worktree Synchronization & Cleanup

State synchronization when main advances:

CODEBLOCK8

Worktree cleanup (after completion or limit reached):

CODEBLOCK9

Worktrees are deleted after all assigned tasks are completed or when the engineer reaches the predefined iteration limit.

8. Termination Conditions

- Success: All units completed and integrated into main
- Failure: Maximum rounds/iterations reached with unresolved tasks
- Incomplete: The task is considered incomplete if any units remain unresolved
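These conditions amount to a small decision function. The sketch below is illustrative only: the `Outcome` names and the `classify` signature are assumptions, and the round limit is passed in rather than hard-coded.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"   # all units integrated into main
    FAILURE = "failure"   # limit reached with unresolved units (task incomplete)
    RUNNING = "running"   # unresolved units remain, limit not yet reached


def classify(
    total_units: int, integrated_units: int, rounds_used: int, max_rounds: int
) -> Outcome:
    """Decide whether the coordination loop should stop, and with what result."""
    if integrated_units >= total_units:
        return Outcome.SUCCESS
    if rounds_used >= max_rounds:
        return Outcome.FAILURE
    return Outcome.RUNNING
```

The manager would evaluate this after each dependency-graph update and exit the loop on anything other than `RUNNING`.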

Manager iteration limits (from paper):

- Manager: 50 iterations
- Each engineer: 80 iterations
- Total rounds: ~22 (varies by task)

9. Manager Final Review

After the asynchronous loop completes, the manager does a final review before submitting the final product.

Final review checklist:

1. Verify all tasks from the dependency graph are completed
2. Run the full test suite using the repository's default test command
3. Check integration completeness (all commits merged)
4. Review any unresolved errors or warnings
5. Validate that the final state matches the expected outcome
6. Submit the final product only after verification

CODEBLOCK10

Implementation Guidelines

Using OpenClaw Sub-agents

For OpenClaw, the sessions_spawn tool enables parallel agent execution:

Spawn engineer agents:

CODEBLOCK11

Check progress:

CODEBLOCK12

Worktree Synchronization

When main advances, update worktrees:

CODEBLOCK13

This ensures engineers work from latest integrated state.

Verification Intensity vs Efficiency Trade-off

From paper analysis (Section 4.4):

Strategy	Pass Rate	Runtime	When to Use
Round-Manager Review	60.2%	3689s	Maximum correctness required
Engineer Self-Verification	55.1%	2244s	Default - balanced
Efficiency-Prioritized	54.0%	1909s	Time-critical, acceptable risk

Default: Engineer self-verification without repeated manager review.
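To make the trade-off concrete, the following sketch compares each strategy with the default using the pass rates and runtimes reported above. The `relative_to_default` helper is a hypothetical convenience, not part of the paper.

```python
from typing import Dict, Tuple

# (pass rate in %, runtime in seconds), as reported in the table above.
STRATEGIES: Dict[str, Tuple[float, float]] = {
    "Round-Manager Review": (60.2, 3689),
    "Engineer Self-Verification": (55.1, 2244),
    "Efficiency-Prioritized": (54.0, 1909),
}
DEFAULT = "Engineer Self-Verification"


def relative_to_default(name: str) -> Tuple[float, float]:
    """Return (pass-rate difference in points, runtime ratio) vs the default."""
    rate, runtime = STRATEGIES[name]
    base_rate, base_runtime = STRATEGIES[DEFAULT]
    return round(rate - base_rate, 1), round(runtime / base_runtime, 2)


print(relative_to_default("Round-Manager Review"))    # (5.1, 1.64)
print(relative_to_default("Efficiency-Prioritized"))  # (-1.1, 0.85)
```

Read this way, round-level manager review buys about 5 points of pass rate for roughly 1.6x the runtime, while the efficiency-prioritized setting saves about 15% of runtime for about 1 point.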

Common Pitfalls & Solutions

Pitfall	Solution
Using CAID as fallback after single-agent fails	Use from outset; sequential costs ~2x with minimal gain
Soft isolation (instruction-only)	Use physical git worktree isolation

Cost/Runtime Expectations

CAID trade-offs (vs single-agent):

- Higher API cost: Multiple agents = more LLM calls
- Similar or longer wall-clock time: Integration is sequential and test-gated
- Substantially higher accuracy: +26.7% PaperBench, +14.3% Commit0

When worth it: Long-horizon shared-artifact tasks where correctness matters more than speed.

Example Workflows

See references/examples.md for concrete implementation examples including:

- Commit0-style library implementation
- PaperBench-style paper reproduction
- Bug fixing (single-file vs multi-file)
- Feature addition with API and frontend

References

- Paper: "Effective Strategies for Asynchronous Software Engineering Agents" (arXiv:2603.21489v1)
- GitHub: https://github.com/JiayiGeng/async-swe-agents
- Built on OpenHands agent SDK principles
