# Agent Audit
Scan your entire OpenClaw setup and get actionable cost/performance recommendations.
## What This Skill Does

1. Scans config — reads the OpenClaw config to map models to agents/tasks
2. Analyzes cron history — checks every cron job's model, token usage, runtime, and success rate
3. Classifies tasks — determines the complexity tier of each task
4. Calculates costs — per agent, per cron, per task type, using provider pricing
5. Recommends changes — with confidence levels and risk warnings
6. Generates a report — a markdown report with specific savings estimates
## Running the Audit
CODEBLOCK0
Options:
CODEBLOCK1
## How It Works

### Phase 1: Discovery

- Read the OpenClaw config (`~/.openclaw/openclaw.json` or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect the provider (Anthropic, OpenAI, Google, xAI) from model names
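Provider detection can be as simple as matching model-name prefixes. A minimal sketch, assuming a hypothetical prefix table (not the skill's actual mapping):

```python
# Hypothetical prefix-to-provider table; real model names vary.
PROVIDER_PREFIXES = {
    "claude": "Anthropic",
    "gpt": "OpenAI",
    "gemini": "Google",
    "grok": "xAI",
}

def detect_provider(model_name: str) -> str:
    """Guess the provider from a model name's prefix."""
    name = model_name.lower()
    for prefix, provider in PROVIDER_PREFIXES.items():
        if name.startswith(prefix):
            return provider
    return "unknown"
```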
### Phase 2: History Analysis

- Pull cron job run history (last 7 days by default)
- Calculate per-job: average tokens, average runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
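The per-job aggregation above can be sketched as a small reducer over run records. The field names (`tokens`, `runtime_s`, `success`) are assumptions for illustration, not the actual history schema:

```python
from statistics import mean

def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate one cron job's run history into per-job stats.

    Assumes each run record carries hypothetical 'tokens',
    'runtime_s', and 'success' fields.
    """
    return {
        "avg_tokens": mean(r["tokens"] for r in runs),
        "avg_runtime_s": mean(r["runtime_s"] for r in runs),
        "success_rate": sum(1 for r in runs if r["success"]) / len(runs),
    }
```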
### Phase 3: Task Classification

Classify each task into complexity tiers:

| Tier | Examples | Recommended Models |
|---|---|---|
| Simple | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| Medium | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| Complex | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |
Classification signals:

- Simple: short output (<500 tokens), little reasoning required, repetitive pattern, status/health tasks
- Medium: medium-length output, some reasoning needed, creative but templated, research tasks
- Complex: long output, multi-step reasoning, code generation, security-critical, or tasks that previously failed on weaker models
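The signals above lend themselves to a simple rule-based classifier. A hedged sketch — the thresholds and boolean features are illustrative assumptions, not the skill's exact heuristics:

```python
def classify_task(avg_output_tokens: int, is_code: bool,
                  is_security: bool, failed_on_weaker: bool) -> str:
    """Tier a task from the classification signals.

    Complex signals win outright; otherwise the (assumed)
    500-token output threshold separates simple from medium.
    """
    if is_code or is_security or failed_on_weaker:
        return "complex"
    if avg_output_tokens < 500:
        return "simple"
    return "medium"
```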
### Phase 4: Recommendations
For each task where the model tier doesn't match complexity:
CODEBLOCK2
Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
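The safety rules can be applied as a guard that runs before any downgrade is proposed. A minimal sketch, assuming hypothetical boolean flags on each task record:

```python
# Assumed flag names, one per safety rule above.
DOWNGRADE_BLOCKERS = (
    "coding", "security", "failed_on_weaker",
    "user_pinned_model", "multi_step", "critical",
)

def may_downgrade(task: dict) -> bool:
    """Return True only if no safety rule blocks a downgrade."""
    return not any(task.get(flag, False) for flag in DOWNGRADE_BLOCKERS)
```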
### Phase 5: Report Generation

Output a clean markdown report with:

1. Overview — total agents, crons, and monthly spend estimate
2. Per-agent breakdown — model, usage, cost
3. Per-cron breakdown — model, frequency, average tokens, cost
4. Recommendations — sorted by savings potential
5. Total potential savings — monthly estimate
6. One-liner config changes — exact model strings to swap
## Model Pricing Reference

See `references/model-pricing.md` for current pricing across all providers.
Update that file when prices change.
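Cost estimation from a pricing table is straightforward. A sketch with placeholder per-million-token prices (the real numbers live in `references/model-pricing.md` and change over time):

```python
# Placeholder prices in USD per million tokens; not authoritative.
PRICING = {
    "example-cheap-model": {"input": 0.80, "output": 4.00},
}

def monthly_cost(model: str, runs_per_month: int,
                 avg_in_tokens: int, avg_out_tokens: int) -> float:
    """Estimate a cron job's monthly cost from average token usage."""
    p = PRICING[model]
    per_run = (avg_in_tokens * p["input"]
               + avg_out_tokens * p["output"]) / 1_000_000
    return per_run * runs_per_month
```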
## Task Classification Details

See `references/task-classification.md` for detailed heuristics on how tasks are classified into complexity tiers.
## Important Notes

- This skill is read-only — it never changes your config automatically
- All recommendations include risk levels and confidence scores
- When unsure about a task's complexity, it defaults to keeping the current model
- Re-run the audit periodically (e.g. monthly), as usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing