agent-desktop

# agent-desktop

CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.

**Core principle:** agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.

## Installation

```bash
npm install -g agent-desktop
# or
bun install -g --trust agent-desktop
```

Requires macOS 12+ with Accessibility permission granted to your terminal.

## Reference Files

Detailed documentation is split into focused reference files. Read them as needed:

| Reference | Contents |
|-----------|----------|
| `references/commands-observation.md` | snapshot, find, get, is, screenshot, list-surfaces: all flags, output examples |
| `references/commands-interaction.md` | click, type, set-value, select, toggle, scroll, drag, keyboard, mouse: choosing the right command |
| `references/commands-system.md` | launch, close, windows, clipboard, wait, batch, status, permissions, version |
| `references/workflows.md` | 12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns |
| `references/macos.md` | macOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting |

## The Observe-Act Loop

Every automation follows this pattern:

```
1. OBSERVE → agent-desktop snapshot --app "App Name" -i
2. REASON  → Parse JSON, find target element by ref (@e1, @e2...)
3. ACT     → agent-desktop click @e5 (or type, select, toggle...)
4. VERIFY  → agent-desktop snapshot again to confirm state change
5. REPEAT  → Continue until task is complete
```

Always snapshot before acting. Refs are snapshot-scoped and become stale after UI changes.

## Ref System

- Refs assigned depth-first: `@e1`, `@e2`, `@e3`...
- Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
- Static text, groups, containers remain in tree for context but have no ref
- Refs are deterministic within a snapshot but NOT stable across snapshots if UI changed
- After any action that changes UI, run `snapshot` again for fresh refs

## JSON Output Contract

Every command returns a JSON envelope on stdout:

**Success:** `{ "version": "1.0", "ok": true, "command": "snapshot", "data": { ... } }`

**Error:** `{ "version": "1.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }`

Exit codes: `0` success, `1` structured error, `2` argument error.

### Error Codes

| Code | Meaning | Recovery |
|------|---------|----------|
| `PERM_DENIED` | Accessibility permission not granted | Grant in System Settings > Privacy > Accessibility |
| `ELEMENT_NOT_FOUND` | Ref not in current refmap | Re-run snapshot, use fresh ref |
| `APP_NOT_FOUND` | App not running | Launch it first |
| `ACTION_FAILED` | AX action rejected | Try alternative approach or coordinate-based click |
| `ACTION_NOT_SUPPORTED` | Element can't do this | Use different command |
| `STALE_REF` | Ref from old snapshot | Re-run snapshot |
| `WINDOW_NOT_FOUND` | No matching window | Check app name, use list-windows |
| `TIMEOUT` | Wait condition not met | Increase --timeout |
| `INVALID_ARGS` | Bad arguments | Check command syntax |

## Command Quick Reference (54 commands)

### Observation

```
agent-desktop snapshot --app "App" -i          # Accessibility tree with refs
agent-desktop screenshot --app "App" out.png   # PNG screenshot
agent-desktop find --app "App" --role button   # Search elements
agent-desktop get @e1 --property text          # Read element property
agent-desktop is @e1 --property enabled        # Check element state
agent-desktop list-surfaces --app "App"        # Available surfaces
```

### Interaction

```
agent-desktop click @e5                    # Click element
agent-desktop double-click @e3             # Double-click
agent-desktop triple-click @e2             # Triple-click (select line)
agent-desktop right-click @e5              # Right-click (context menu)
agent-desktop type @e2 "hello"             # Type text into element
agent-desktop set-value @e2 "new value"    # Set value directly
agent-desktop clear @e2                    # Clear element value
agent-desktop focus @e2                    # Set keyboard focus
agent-desktop select @e4 "Option B"        # Select dropdown option
agent-desktop toggle @e6                   # Toggle checkbox/switch
agent-desktop check @e6                    # Idempotent check
agent-desktop uncheck @e6                  # Idempotent uncheck
agent-desktop expand @e7                   # Expand disclosure
agent-desktop collapse @e7                 # Collapse disclosure
agent-desktop scroll @e1 --direction down  # Scroll element
agent-desktop scroll-to @e8                # Scroll into view
```

### Keyboard & Mouse

```
agent-desktop press cmd+c                  # Key combo
agent-desktop press return --app "App"     # Targeted key press
agent-desktop key-down shift               # Hold key
agent-desktop key-up shift                 # Release key
agent-desktop hover @e5                    # Cursor to element
agent-desktop hover --xy 500,300           # Cursor to coordinates
agent-desktop drag --from @e1 --to @e5     # Drag between elements
agent-desktop mouse-click --xy 500,300     # Click at coordinates
agent-desktop mouse-move --xy 100,200      # Move cursor
agent-desktop mouse-down --xy 100,200      # Press mouse button
agent-desktop mouse-up --xy 300,400        # Release mouse button
```

### App & Window

```
agent-desktop launch "System Settings"     # Launch and wait
agent-desktop close-app "TextEdit"         # Quit gracefully
agent-desktop close-app "TextEdit" --force # Force kill
agent-desktop list-windows --app "Finder"  # List windows
agent-desktop list-apps                    # List running GUI apps
agent-desktop focus-window --app "Finder"  # Bring to front
agent-desktop resize-window --app "App" --width 800 --height 600
agent-desktop move-window --app "App" --x 0 --y 0
agent-desktop minimize --app "App"
agent-desktop maximize --app "App"
agent-desktop restore --app "App"
```

### Notifications

```
agent-desktop list-notifications                             # List all notifications
agent-desktop list-notifications --app "Slack"               # Filter by app
agent-desktop list-notifications --text "deploy" --limit 5   # Filter by text
agent-desktop dismiss-notification 1                         # Dismiss by index
agent-desktop dismiss-all-notifications                      # Dismiss all
agent-desktop dismiss-all-notifications --app "Slack"        # Dismiss all from app
agent-desktop notification-action 1 --action "Reply"         # Click action button
```

### Clipboard

```
agent-desktop clipboard-get                # Read clipboard
agent-desktop clipboard-set "text"         # Write to clipboard
agent-desktop clipboard-clear              # Clear clipboard
```

### Wait

```
agent-desktop wait 1000                          # Pause 1 second
agent-desktop wait --element @e5 --timeout 5000  # Wait for element
agent-desktop wait --window "Title"              # Wait for window
agent-desktop wait --text "Done" --app "App"     # Wait for text
agent-desktop wait --menu --app "App"            # Wait for context menu
agent-desktop wait --menu-closed --app "App"     # Wait for menu dismissal
agent-desktop wait --notification --app "App"    # Wait for new notification
```

### System

```
agent-desktop status                          # Health check
agent-desktop permissions                     # Check permission
agent-desktop permissions --request           # Trigger permission dialog
agent-desktop version --json                  # Version info
agent-desktop batch '[...]' --stop-on-error   # Batch commands
```

## Key Principles for Agents

1. **Always snapshot first.** Never assume UI state.
2. **Use `-i` flag.** Filters to interactive elements only, reducing tokens.
3. **Refs are ephemeral.** Snapshot again after any UI-changing action.
4. **Prefer refs over coordinates.** `click @e5` > `mouse-click --xy 500,300`.
5. **Use `wait` for async UI.** After launch/dialog triggers, wait for expected state.
6. **Check permissions first.** Run `permissions` on first use.
7. **Handle errors.** Parse `error.code` and follow `error.suggestion`.
8. **Use `find` for targeted searches.** Faster than full snapshot when you know role/name.
9. **Use surfaces for menus.** `snapshot --surface menu` captures open menus.
10. **Batch for performance.** Multiple commands in one invocation.
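
## Example: Parsing the JSON Envelope

The JSON output contract and the "handle errors" principle can be sketched from the calling agent's side as follows. This is a minimal illustration, not part of agent-desktop itself: the names `parse_envelope`, `AgentDesktopError`, `RESNAPSHOT_CODES`, and `run` are hypothetical, and the shape of the `data` payload is left opaque because it is not specified here. Only the envelope fields (`ok`, `command`, `data`, `error.code`, `error.message`, `error.suggestion`) and the error codes come from the contract described above. What the CLI prints on an argument error (exit code `2`) is not stated, so that case is not handled.

```python
import json
import subprocess

# Error codes whose documented recovery is "re-run snapshot".
RESNAPSHOT_CODES = {"STALE_REF", "ELEMENT_NOT_FOUND"}


class AgentDesktopError(Exception):
    """Raised when an envelope has "ok": false (hypothetical wrapper type)."""

    def __init__(self, command, code, message, suggestion):
        super().__init__(f"{command}: {code}: {message}")
        self.command = command
        self.code = code
        self.suggestion = suggestion

    @property
    def needs_resnapshot(self):
        # True when the caller should take a fresh snapshot and pick a new ref.
        return self.code in RESNAPSHOT_CODES


def parse_envelope(stdout):
    """Return the `data` payload of a success envelope, or raise on error."""
    envelope = json.loads(stdout)
    if envelope.get("ok"):
        return envelope.get("data")
    error = envelope.get("error", {})
    raise AgentDesktopError(
        envelope.get("command", ""),
        error.get("code", "UNKNOWN"),
        error.get("message", ""),
        error.get("suggestion", ""),
    )


def run(*args):
    """Invoke the CLI and parse its stdout.

    Hypothetical helper; requires agent-desktop to be installed and on PATH.
    """
    proc = subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True
    )
    return parse_envelope(proc.stdout)
```

A calling agent might wrap each action in `try`/`except AgentDesktopError`, take a new snapshot when `needs_resnapshot` is true, and otherwise surface `suggestion` to its own reasoning step.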
