literature-manager (Literature Manager)

Search, download, convert, organize, and audit academic literature collections. Use when asked to find papers, build a literature library, add papers to references, download PDFs, convert papers to markdown, organize references by category, audit a reference collection, or collect code/dataset links for tools mentioned in papers.

Author: admin | Source: ClawHub
Version: 1.1.1 | Security check: passed | Downloads: 869 | Price: free | Favorites: 2

Literature Manager

Manage academic literature collections: search → download → convert → organize → verify.

Dependencies

  • pdftotext (poppler-utils): PDF text extraction
  • INLINECODE1: downloading
  • INLINECODE2: JSON processing in audit
  • INLINECODE3 (coreutils): PDF validation
  • INLINECODE4 (optional): fallback PDF→MD converter (note: plain uvx markitdown does NOT work for PDFs; use uvx markitdown[pdf])

Quick Start

CODEBLOCK0

Workflow

1. Search

Use web_fetch on Google Scholar:

https://scholar.google.com/scholar?q=QUERY&as_ylo=YEAR

Extract: title, authors, year, journal, DOI, PDF links.

For each result, identify the best open-access PDF source (see Download Strategy).
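
The search URL above can be built programmatically. This is a minimal sketch; the function name `scholar_url` and the constant `SCHOLAR_BASE` are illustrative and not part of the skill. The `as_ylo` parameter limits results to the given year or later.

```python
from typing import Optional
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_url(query: str, since_year: Optional[int] = None) -> str:
    """Build a Google Scholar search URL for web_fetch.

    The query is URL-encoded; as_ylo is only added when a year is given.
    """
    params = {"q": query}
    if since_year is not None:
        params["as_ylo"] = str(since_year)
    return f"{SCHOLAR_BASE}?{urlencode(params)}"
```

Encoding the query with `urlencode` avoids broken URLs when the search terms contain spaces, quotes, or ampersands.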

2. Download

Run scripts/download.sh <DOI_or_URL> <output_dir/> per paper. The script tries sources in order:

  1. Direct publisher PDF (Nature, eLife, Frontiers, PNAS, bioRxiv, arXiv)
  2. EuropePMC (PMC_ID → PDF)
  3. bioRxiv/arXiv preprint
  4. Sci-Hub: https://sci-hub.box/<DOI> (use when the publisher is paywalled)

CODEBLOCK2

⚠️ Legal note: Sci-Hub may violate publisher terms of service or copyright law in some jurisdictions. Use only if you understand and accept the legal implications in your context.

If all sources fail (including Sci-Hub), flag as permanent paywall. Provide the user with the DOI and ask for manual download.
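
The ordered fallback can be sketched in Python as follows. This is not the logic of scripts/download.sh, whose contents are not shown here. The names `looks_like_pdf` and `first_working_source` are hypothetical, and the `fetch` callable is supplied by the caller (for example, a thin wrapper around an HTTP client). The sketch assumes a valid PDF starts with the bytes `%PDF-`, which also catches the common case where a paywall returns an HTML page instead of the paper.

```python
from typing import Callable, Iterable, Optional, Tuple


def looks_like_pdf(data: bytes) -> bool:
    """Return True when the payload starts with the PDF magic bytes."""
    return data.startswith(b"%PDF-")


def first_working_source(
    candidates: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes],
) -> Optional[Tuple[str, bytes]]:
    """Try (name, url) candidates in order; return the first real PDF.

    Returns None when every source fails, which corresponds to the
    "permanent paywall" case described above.
    """
    for name, url in candidates:
        try:
            data = fetch(url)
        except Exception:
            # Network or HTTP error: move on to the next source.
            continue
        if looks_like_pdf(data):
            return name, data
    return None
```

When the function returns `None`, the caller would report the DOI to the user and ask for a manual download.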

3. Convert

Run scripts/convert.sh <input.pdf> <output.md>. The script uses pdftotext first (reliable) and falls back to uvx markitdown[pdf].

CODEBLOCK3

Prefer uvx markitdown[pdf] over pdftotext when full fidelity (tables, figure captions) matters.
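
The convert-with-fallback idea can be sketched as below. The exact flags used by scripts/convert.sh are not shown in this document, so the argument lists are assumptions: `-layout` is an optional pdftotext flag chosen here for illustration, and `-o` is the markitdown CLI output option. The function names `conversion_commands` and `convert` are hypothetical.

```python
import subprocess
from typing import Callable, List


def conversion_commands(pdf_path: str, md_path: str) -> List[List[str]]:
    """Return converter command lines in the order they should be tried."""
    return [
        # Primary: poppler's pdftotext (flags are an assumption).
        ["pdftotext", "-layout", pdf_path, md_path],
        # Fallback: markitdown with the pdf extra, run through uvx.
        ["uvx", "markitdown[pdf]", pdf_path, "-o", md_path],
    ]


def convert(pdf_path: str, md_path: str, run: Callable = subprocess.run) -> bool:
    """Run converters in order; stop at the first one that exits with 0."""
    for argv in conversion_commands(pdf_path, md_path):
        try:
            if run(argv).returncode == 0:
                return True
        except FileNotFoundError:
            # The tool is not installed: try the next converter.
            continue
    return False
```

To prefer markitdown when tables and figure captions matter, reverse the order of the list returned by `conversion_commands`.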

4. Organize

Standard folder structure:
CODEBLOCK4

Categories are user-defined. Prefix each folder name with a number to control sort order (e.g., 01-theoretical-frameworks/).

index.json schema per paper

CODEBLOCK5

README.md pattern

Per category section, per paper: title, authors, year, journal, DOI, short summary in user's language.

4b. DOI-Based Filenames & Path Mapping

Downloaded files are often named using DOI format rather than AuthorYear:
CODEBLOCK6

When markdown_path entries in index.json become stale (e.g., after folder reorganization), maintain a separate mapping file:

CODEBLOCK7

To build this mapping: cross-reference each paper's DOI in index.json against actual files on disk. Use find + Python to automate.
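
The cross-referencing step might look like the sketch below. It rests on two assumptions that this document does not confirm: each index.json entry carries a `doi` field, and DOI-based filenames replace the `/` in the DOI with `_`. Adjust `doi_to_stem` to match the real naming convention. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def doi_to_stem(doi: str) -> str:
    """Assumed filename convention: lowercase DOI with '/' turned into '_'."""
    return doi.strip().lower().replace("/", "_")


def build_path_mapping(entries: List[dict], root: Path) -> Dict[str, str]:
    """Map each DOI in the index to the markdown file actually on disk.

    DOIs with no matching file are left out so they can be reviewed by hand.
    """
    # Index every markdown file under root by its lowercase stem.
    by_stem = {p.stem.lower(): p for p in root.rglob("*.md")}
    mapping: Dict[str, str] = {}
    for entry in entries:
        doi = entry.get("doi")
        if not doi:
            continue
        found = by_stem.get(doi_to_stem(doi))
        if found is not None:
            mapping[doi] = found.relative_to(root).as_posix()
    return mapping
```

The result can be written out with `json.dump` as the separate mapping file, and the set of DOIs missing from it is the list of papers to investigate.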

index.json Known Pitfalls

  • id: null corruption: If many entries have id=null and share the same pdf_path, the index was likely corrupted during a batch write. Rebuild it from the actual files on disk.
  • DOI errors: Verify that DOIs resolve correctly; typos in DOI fields are common (e.g., wrong suffix digits). Always cross-check with the publisher page.
  • Dead markdown_path: After restructuring folders, markdown_path in index.json often points to old locations. Use the mapping file above as the source of truth.
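
Two of these pitfalls can be detected mechanically. The sketch below assumes index.json has been loaded as a list of dicts that use the `id`, `pdf_path`, and `markdown_path` fields named above; the function name `find_index_problems` is hypothetical. DOI typos are not covered, since they need a check against the publisher page.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List


def find_index_problems(entries: List[dict], root: Path) -> Dict[str, list]:
    """Report shared pdf_path values among null-id entries and dead markdown paths."""
    # Count pdf_path values among entries whose id is null.
    null_id_paths = Counter(
        e.get("pdf_path") for e in entries if e.get("id") is None
    )
    shared = sorted(p for p, n in null_id_paths.items() if p and n > 1)

    # A markdown_path is dead when the file does not exist under root.
    dead = sorted(
        e["markdown_path"]
        for e in entries
        if e.get("markdown_path") and not (root / e["markdown_path"]).exists()
    )
    return {"null_id_shared_pdf_path": shared, "dead_markdown_path": dead}
```

A non-empty `null_id_shared_pdf_path` list suggests the batch-write corruption described above, in which case rebuilding from disk is safer than patching entries.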

5. Verify

Run scripts/audit.sh <references_dir/> for full verification:

  • Every PDF is valid (file -b output starts with PDF)
  • Every PDF title matches filename (pdftotext | head)
  • Every PDF has matching markdown (and vice versa)
  • index.json is valid, complete, paths exist, no duplicate IDs
  • README.md stats match actual counts
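
Two of the checks above are easy to express in Python. This sketch is not the content of scripts/audit.sh. It assumes a PDF and its markdown share the same filename stem regardless of which subfolder each lives in, and that README.md and RESOURCES.md are the only markdown files that are not papers. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import List, Tuple

# Markdown files that are documentation rather than converted papers.
NON_PAPER_STEMS = {"README", "RESOURCES"}


def unpaired_files(root: Path) -> Tuple[List[str], List[str]]:
    """Return (PDF stems without markdown, markdown stems without PDF)."""
    pdf_stems = {p.stem for p in root.rglob("*.pdf")}
    md_stems = {p.stem for p in root.rglob("*.md")} - NON_PAPER_STEMS
    return sorted(pdf_stems - md_stems), sorted(md_stems - pdf_stems)


def duplicate_ids(entries: List[dict]) -> List[str]:
    """Return ids that appear more than once in the index (null ids ignored)."""
    counts = Counter(e.get("id") for e in entries if e.get("id") is not None)
    return sorted(i for i, n in counts.items() if n > 1)
```

An audit passes these two checks when both lists from `unpaired_files` and the list from `duplicate_ids` are empty.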

6. Collect Resources

For tool/method papers, find GitHub repos and public datasets. Store in RESOURCES.md + resources.json.

Sub-agent Strategy

For large batches, parallelize:

  • Download: 1 sub-agent per batch of ~5-8 papers
  • Organize: 1 sub-agent to build indexes
  • Verify: 1 independent sub-agent (never the same as organizer)

Always use a separate sub-agent for verification (QC should not self-grade).
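
Splitting a paper list into download batches is a simple chunking step. The helper below is illustrative; the name `make_batches` and the default size of 6 (inside the ~5-8 range above) are assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[T], size: int = 6) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

Each returned batch would be handed to one download sub-agent; the last batch may be smaller than the others.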

⚠️ Sub-agent Rules (Learned from Practice)

  1. One batch at a time: do not spawn multiple note-writing batches simultaneously, because LLM rate limits will cause silent failures
  2. Set a cron monitor whenever you spawn long-running agents: agents can fail silently without triggering auto-announce, and cron catches this
  3. Cron monitor pattern:
CODEBLOCK8

Adding Papers Incrementally

To add papers to an existing collection:

  1. Download and convert new papers into the correct category folder
  2. Append entries to index.json
  3. Update README.md stats
  4. Run audit to verify consistency
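
Step 2 can be sketched as below, assuming index.json is a top-level JSON array of entries that each have an `id` (the real schema is not reproduced here). The function name `append_entry` is hypothetical. The file is written to a temporary path first and then swapped in, which reduces the risk of a half-written index if the process is interrupted.

```python
import json
from pathlib import Path


def append_entry(index_path: Path, entry: dict) -> int:
    """Append one entry to index.json and return the new entry count.

    Raises ValueError when the id already exists, so duplicates are
    caught before the audit step.
    """
    if index_path.exists():
        entries = json.loads(index_path.read_text(encoding="utf-8"))
    else:
        entries = []
    if any(e.get("id") == entry["id"] for e in entries):
        raise ValueError(f"duplicate id: {entry['id']}")
    entries.append(entry)

    # Write to a sibling temp file, then replace the index in one step.
    tmp_path = index_path.with_suffix(".json.tmp")
    tmp_path.write_text(
        json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    tmp_path.replace(index_path)
    return len(entries)
```

The returned count can be used to update the README.md stats in step 3.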

Tags

skill ai

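Install via conversation

This skill can be installed through conversation on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the literature-manager-1776420065 skill

Option 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill installation source, then help me install the literature-manager-1776420065 skill

Install via command line

skillhub install literature-manager-1776420065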

Download

File size: 8.31 KB | Published: 2026-4-17 20:17

v1.1.1 (latest), 2026-4-17 20:17
Fix doc/code mismatch: convert.sh now uses uvx markitdown[pdf]; download.sh now includes Sci-Hub as Strategy 5; add legal disclaimer for Sci-Hub in SKILL.md
