agent-paddleocr-vision

# Agent PaddleOCR Vision **OCR with Agent Actions — powered by PaddleOCR only.** Automatically classifies documents and provides actionable prompts. ## What It Does - OCR extraction via **PaddleOCR cloud API** (requires credentials) - 11 document types: invoice, business card, receipt, table, contract, ID card, passport, bank statement, driver's license, tax form, general - Action suggestion with structured parameters - Batch processing - Searchable PDF generation (with bbox alignment) ## Quick Start ```bash # Install dependencies pip3 install -r scripts/requirements.txt # Configure PaddleOCR API export PADDLEOCR_DOC_PARSING_API_URL=https://your-api.paddleocr.com/layout-parsing export PADDLEOCR_ACCESS_TOKEN=your_token # Process a file python3 scripts/doc_vision.py --file-path ./invoice.jpg --pretty --make-searchable-pdf ``` ## Batch ```bash python3 scripts/doc_vision.py --batch-dir ./inbox --output-dir ./out ``` ## Output See `docs/README.zh.md` for full JSON schema and integration guide. ## Supported Types | Type | Actions | |------|---------| | Invoice | create_expense, archive, tax_report | | Business Card | add_contact, save_vcard | | Receipt | create_expense, split_bill | | Table | export_csv, analyze_data | | Contract | summarize, extract_dates, flag_obligations | | ID Card | extract_id_info, verify_age | | Passport | store_passport_info, check_validity | | Bank Statement | categorize_transactions, generate_report | | Driver License | store_license_info, check_expiry | | Tax Form | summarize_tax, suggest_deductions | | General | summarize, translate, search_keywords | ## Configuration Required environment variables: - `PADDLEOCR_DOC_PARSING_API_URL` — API endpoint ending in `/layout-parsing` - `PADDLEOCR_ACCESS_TOKEN` — Access token Optional: - `PADDLEOCR_DOC_PARSING_TIMEOUT` — Default 600 seconds ## Searchable PDF With `--make-searchable-pdf`, embeds OCR text layer aligned to original layout using bounding boxes. Requires `pdf2image` + `poppler` (system) and `reportlab`, `pypdf`, `pillow` (Python). ## Full Documentation Detailed usage, troubleshooting, and development guide available in multiple languages under `docs/`: - 中文: `docs/README.zh.md` - English: `docs/README.en.md` - Español: `docs/README.es.md` - العربية: `docs/README.ar.md` ## License MIT-0 --- **Made for OpenClaw.** Let your agent see and act.

agent-paddleocr-vision

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

agent-paddleocr-vision