
liteparse

Parse, extract text from, and screenshot PDF and document files locally using the LiteParse CLI (`lit`). Use when asked to extract text from a PDF, parse a Word/Excel/PowerPoint file, batch-process a folder of documents, or generate page screenshots for LLM vision workflows. Runs entirely offline — no cloud, no API key. Supports PDF, DOCX, XLSX, PPTX, images (jpg/png/webp), and more. Triggers on phrases like "extract text from this PDF", "parse this document", "get the text out of", "screenshot

Author: admin | Source: ClawHub
Version: V 1.0.0
Security check: passed


# LiteParse

Local document parser built on PDF.js + Tesseract.js. Zero cloud dependencies.

**Binary:** `lit` (installed globally via npm)

**Docs:** https://developers.llamaindex.ai/liteparse/

## Quick Reference

```bash
# Parse a PDF to text (stdout)
lit parse document.pdf

# Parse to file
lit parse document.pdf -o output.txt

# Parse to JSON (includes bounding boxes)
lit parse document.pdf --format json -o output.json

# Specific pages only
lit parse document.pdf --target-pages "1-5,10,15-20"

# No OCR (faster, text-layer PDFs only)
lit parse document.pdf --no-ocr

# Batch parse a directory
lit batch-parse ./input-dir ./output-dir

# Screenshot pages (for vision model input)
lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" --dpi 300 -o ./screenshots
```

## Output Formats

| Format | Use case |
|--------|----------|
| `text` (default) | Plain text extraction, feeding into prompts |
| `json` | Structured output with bounding boxes, useful for layout-aware tasks |

## OCR Behavior

- OCR is **on by default** via Tesseract.js (downloads ~10 MB of English language data on first run)
- The first run is slow; subsequent runs use the cached data
- Use `--no-ocr` for PDFs that already have a text layer (faster, no network needed)
- For multiple languages: `--ocr-language fra+eng`

## Supported File Types

Works natively: **PDF**

Requires **LibreOffice** (`brew install --cask libreoffice`): .docx, .doc, .xlsx, .xls, .pptx, .ppt, .odt, .csv

Requires **ImageMagick** (`brew install imagemagick`): .jpg, .png, .gif, .bmp, .tiff, .webp

## Installation Notes

- Install via npm: `npm install -g @llamaindex/liteparse`
- A Homebrew formula exists (`brew tap run-llama/liteparse`), but it requires up-to-date macOS Command Line Tools (CLT). Use npm as the primary install path on this machine
- Binary path: `/opt/homebrew/bin/lit`

## Workflow Tips

- For **VA forms, job description PDFs, military docs**: run `lit parse file.pdf -o /tmp/output.txt`, then read the output into context
- For **scanned PDFs** (no text layer): OCR is required, and complex layouts may degrade. Consider LlamaParse (cloud) for critical documents
- For **vision model workflows**: use `lit screenshot` to generate page images, then pass them to the `image` tool or similar
- For **batch jobs**: use `lit batch-parse`. It reuses the PDF engine across files for efficiency

## Limitations

- Complex tables, multi-column layouts, and scanned government forms may produce imperfect output
- LlamaParse (cloud) handles the hard cases: https://cloud.llamaindex.ai
- Maximum recommended DPI for screenshots: 300 (higher DPI means slower runs and larger files)

## Reference

See `references/output-examples.md` for sample JSON/text output structure.
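
## Pre-flight Dependency Check

The supported-file-types list above can double as a quick check before a long `lit batch-parse` run. It flags missing converters up front. The script below is a minimal sketch and is not part of LiteParse. The functions `lit_dependency` and `have_dependency` and the file name `lit-preflight.sh` are hypothetical. The binary names it probes (`soffice` for LibreOffice, `magick` for ImageMagick 7) are assumptions about how those packages install, not about how `lit` itself locates them.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the LiteParse CLI).
# Binary names probed below (soffice, magick) are assumptions.

# Print the extra tool a file needs, following the supported-file-types list.
lit_dependency() {
  local ext
  # Take the text after the last dot and lower-case it portably
  # (macOS ships bash 3.2, which lacks ${var,,}).
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf) echo "none" ;;
    docx|doc|xlsx|xls|pptx|ppt|odt|csv) echo "libreoffice" ;;
    jpg|png|gif|bmp|tiff|webp) echo "imagemagick" ;;
    *) echo "unlisted" ;;
  esac
}

# Succeed if the tool appears to be on PATH (assumed binary names).
have_dependency() {
  case "$1" in
    none) return 0 ;;
    libreoffice) command -v soffice >/dev/null 2>&1 ;;
    imagemagick) command -v magick >/dev/null 2>&1 ;;
    *) return 1 ;;
  esac
}

# Report on each file passed on the command line.
for f in "$@"; do
  dep=$(lit_dependency "$f")
  if [ "$dep" = "unlisted" ]; then
    echo "skip     $f (type not in the supported list)"
  elif have_dependency "$dep"; then
    echo "ok       $f (needs: $dep)"
  else
    echo "missing  $f (install $dep first)"
  fi
done
```

Run it against the input folder first, for example `bash lit-preflight.sh ./input-dir/*`. Install anything reported as missing, then run `lit batch-parse ./input-dir ./output-dir`. Files marked `skip` are simply absent from the list above, and `lit` may or may not handle them.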

Tags: skill, ai

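Install via conversation

This skill can be installed through conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the liteparse-1776108302 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the liteparse-1776108302 skill

Install via command line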

skillhub install liteparse-1776108302

Download Zip package


File size: 3.11 KB | Published: 2026-4-14 14:34

v1.0.0 (latest) 2026-4-14 14:34
Initial release: local PDF/doc parser skill using LiteParse CLI
