
information-extraction

Extract structured information from unstructured text through a semi-automatic pipeline. Support entity extraction, relation extraction, attribute extraction, and event extraction from plain text and Markdown. Use when converting raw text into triples, graph-ready records, or normalized structured facts from documents, notes, reports, transcripts, and web content copied as text.

Author: admin | Source: ClawHub
Version: v1.0.0
Security check: passed
Downloads: 94
Favorites: 0


# Information Extraction

Extract entity, relation, attribute, and event information from text into a normalized intermediate structure, then export triples in JSON, JSONL, or TSV.

## Core workflow

1. Define extraction scope and output granularity.
2. Segment input text into sentences and paragraphs.
3. Extract entities with evidence.
4. Extract relations, attributes, and events.
5. Normalize aliases, predicates, and duplicated records.
6. Export triples. Default output is JSON.
7. Review ambiguities before treating output as final.

## Input scope

Prefer this skill for:

- Plain text strings
- Markdown text
- Text copied from webpages, notes, reports, transcripts, or documents

If the user provides a file in another format, convert it to text first, then use this skill.

## Output contract

Default output should contain:

```json
{
  "triples": [],
  "entities": [],
  "attributes": [],
  "events": [],
  "ambiguities": []
}
```

Support export formats:

- JSON (default)
- JSONL
- TSV

## Extraction principles

- Extract explicit facts before inference.
- Preserve evidence spans for important records.
- Prefer controlled predicates from `references/relation-taxonomy.md`.
- Keep attributes and events separate internally, even when final output is triples.
- Do not flatten complex events too early.
- Normalize before exporting.
- Record unresolved ambiguity instead of pretending certainty.

## Minimal internal schema

Use these record shapes during extraction.
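To make the output contract and the three export formats concrete, here is a minimal Python sketch. All names (`empty_result`, `export_triples`) are illustrative assumptions, not the skill's actual script API:

```python
import json

def empty_result():
    # Skeleton matching the default output contract above.
    return {"triples": [], "entities": [], "attributes": [],
            "events": [], "ambiguities": []}

def export_triples(triples, fmt="json"):
    """Serialize subject/predicate/object triples as JSON, JSONL, or TSV."""
    if fmt == "json":
        return json.dumps(triples, ensure_ascii=False, indent=2)
    if fmt == "jsonl":
        # One JSON object per line.
        return "\n".join(json.dumps(t, ensure_ascii=False) for t in triples)
    if fmt == "tsv":
        # Tab-separated subject, predicate, object columns.
        return "\n".join("\t".join((t["subject"], t["predicate"], t["object"]))
                         for t in triples)
    raise ValueError(f"unsupported format: {fmt}")
```

The actual scripts shipped with the skill handle this; the sketch only shows the shape of the contract.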
### Entity

```json
{
  "id": "ent_001",
  "mention": "OpenAI",
  "canonical_name": "OpenAI",
  "type": "Organization",
  "evidence": "OpenAI published the GPT-4 Technical Report.",
  "confidence": 0.95
}
```

### Relation

```json
{
  "subject": "ent_001",
  "predicate": "published",
  "object": "ent_002",
  "evidence": "OpenAI published the GPT-4 Technical Report.",
  "confidence": 0.93
}
```

### Attribute

```json
{
  "entity_id": "ent_002",
  "attribute": "year",
  "value": "2023",
  "evidence": "The report was released in 2023.",
  "confidence": 0.87
}
```

### Event

```json
{
  "id": "ev_001",
  "type": "Publication",
  "trigger": "published",
  "participants": {
    "agent": "ent_001",
    "object": "ent_002"
  },
  "time": "2023",
  "location": null,
  "evidence": "OpenAI published the GPT-4 Technical Report in 2023.",
  "confidence": 0.92
}
```

## How to use references

- Read `references/pipeline.md` for the end-to-end procedure.
- Read `references/schema.md` for types and intermediate record structure.
- Read `references/relation-taxonomy.md` before inventing new predicates.
- Read `references/triple-mapping.md` when exporting final triples.
- Read `references/event-modeling.md` when text describes complex events.
- Read `references/quality-checklist.md` before final delivery.

## Scripts

### Extract

```bash
python3 skills/information-extraction/scripts/extract.py --text "OpenAI published GPT-4." --output out.json
```

Or read from stdin:

```bash
echo "OpenAI published GPT-4." | python3 skills/information-extraction/scripts/extract.py --stdin --output out.json
```

### Normalize

```bash
python3 skills/information-extraction/scripts/normalize.py --input out.json --output normalized.json
```

### Export triples

```bash
python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format json --output triples.json
python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format jsonl --output triples.jsonl
python3 skills/information-extraction/scripts/export_triples.py --input normalized.json --format tsv --output triples.tsv
```

## Notes on automation

This is a semi-automatic pipeline, not a claim of perfect extraction. The scripts provide scaffolding, normalization, and export. For high-stakes outputs, keep evidence and perform manual review.
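The normalization step (aliases, predicates, duplicates) can be sketched as below. The alias and predicate maps here are illustrative assumptions, not the skill's actual taxonomy from `references/relation-taxonomy.md`:

```python
def normalize_triples(triples, alias_map=None, predicate_map=None):
    """Map entity aliases to canonical names, map free-form predicates to a
    controlled vocabulary, then drop exact duplicate (s, p, o) records."""
    alias_map = alias_map or {}
    predicate_map = predicate_map or {}
    seen, out = set(), []
    for t in triples:
        s = alias_map.get(t["subject"], t["subject"])
        p = predicate_map.get(t["predicate"], t["predicate"])
        o = alias_map.get(t["object"], t["object"])
        key = (s, p, o)
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        out.append({**t, "subject": s, "predicate": p, "object": o})
    return out
```

Two surface-different records that normalize to the same triple collapse into one, which is why normalization must run before export.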

Tags

skill ai

Install via conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the information-extraction-1776101651 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the information-extraction-1776101651 skill

Install via command line

skillhub install information-extraction-1776101651

Download Zip package

⬇ Download information-extraction v1.0.0

File size: 10.81 KB | Published: 2026-04-14 10:52

v1.0.0 (latest) 2026-04-14 10:52
Initial release: semi-automatic information extraction pipeline for entities, relations, attributes, events, and triple export (JSON/JSONL/TSV).
