wechat-to-obsidian

# WeChat → Obsidian Clipper Clips a WeChat public article to your Obsidian vault: - **Headless** — uses `agent-browser` (no visible window, fast) - **Full fidelity** — extracts text and images in their original DOM order - **Images downloaded** — saves to `attachments/` next to the note using `curl` with correct `Referer` header (required to avoid 403) - **Clean Markdown** — Obsidian `![[filename]]` embed syntax ## Requirements - `agent-browser` ≥ 0.17: `npm install -g agent-browser && agent-browser install` - `curl` (pre-installed on macOS/Linux) - An Obsidian vault accessible on the local filesystem ## Trigger User sends a `mp.weixin.qq.com` link and says something like: - "存 Obsidian" / "剪藏" / "帮我存" - "save to Obsidian" / "clip this" / "save article" ## Standard Workflow ### Step 1 — Open page (headless) ```bash agent-browser open "<wechat_url>" agent-browser wait --load networkidle ``` ### Step 2 — Get title ```bash agent-browser get title ``` ### Step 3 — Scroll to trigger lazy-loaded images WeChat uses lazy loading — must scroll before extracting image URLs. ```bash agent-browser eval " (async () => { window.scrollTo(0, document.body.scrollHeight); await new Promise(r => setTimeout(r, 2000)); const step = 600; for (let y = 0; y < document.body.scrollHeight; y += step) { window.scrollTo(0, y); await new Promise(r => setTimeout(r, 300)); } return 'done'; })()" ``` ### Step 4 — Extract content in DOM order **Important:** Use classic `function(){}` syntax, not arrow functions — arrow functions inside `JSON.stringify()` cause a syntax error in agent-browser eval. ```bash agent-browser eval " (function() { var nodes = document.querySelectorAll('#js_content p, #js_content section, #js_content img'); var result = []; var imgIdx = 0; nodes.forEach(function(node) { if (node.tagName === 'IMG') { var src = node.currentSrc || node.src || node.dataset.src || ''; if (src && src.includes('mmbiz') && !src.includes('mmbiz.qlogo') && !src.includes('profile')) { var h = node.naturalHeight || node.height || 0; var alt = (node.alt || '').toLowerCase(); if ((h >= 50 || h === 0) && !alt.includes('二维码') && !alt.includes('引导') && !alt.includes('赞赏')) { result.push({ type: 'img', idx: imgIdx++, src: src }); } } } else { var text = node.innerText ? node.innerText.trim() : ''; if (text && text.length > 3) result.push({ type: 'text', text: text }); } }); return JSON.stringify(result); })()" ``` **Image filter rules** (skip these): - `mmbiz.qlogo` — author avatar - `mp_profile` — account profile image - height < 50px — decorative dividers - alt containing `二维码` / `引导` / `赞赏` — QR codes and tip prompts ### Step 5 — Confirm save location **Always ask the user** before writing. Suggest a directory based on topic: > 📂 Suggested: `<vault_root>/<topic_directory>/` > 📄 Filename: `<title-keywords-YYYY-MM-DD>.md` > 🖼 Images → same folder's `attachments/` directory > Confirm, or pick a different location? ### Step 6 — Download images ```bash mkdir -p "<note_dir>/attachments" # Must include Referer header — WeChat returns 403 without it curl -s -L --fail \ -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36' \ -e 'https://mp.weixin.qq.com/' \ "<image_url>" -o "<note_dir>/attachments/<filename>" ``` Naming convention: `<slug>-图00.png`, `图01.jpg` … numbered in DOM order. Extension: check `wx_fmt=jpeg` → `.jpg`, otherwise → `.png`. **Shell tip for batch download** — use a function, not `declare -A` (zsh doesn't support associative arrays): ```bash download_img() { local idx=$1 url=$2 local ext="png" echo "$url" | grep -q "wx_fmt=jpeg" && ext="jpg" local fname=$(printf "<slug>-图%02d.%s" "$idx" "$ext") curl -s -L --fail \ -A 'Mozilla/5.0' \ -e 'https://mp.weixin.qq.com/' \ "$url" -o "$fname" \ && echo "OK $fname" || echo "FAIL $fname" } ``` ### Step 7 — Write the note Strictly follow the DOM order from Step 4 — insert images at their original positions, not at the top or bottom. ```markdown # {Article Title} **Source:** WeChat — {Author / Account Name} **Original URL:** {URL} **Clipped:** {YYYY-MM-DD} **Tags:** #{tag1} #{tag2} --- {paragraph text} ![[slug-图00.jpg]] {more paragraph text} ![[slug-图01.png]] ... --- **References:** - {any links from the article} ``` ### Step 8 — Close browser ```bash agent-browser close ``` ### Step 9 — Report to user Tell the user: - Note path - Number of images downloaded - ⚠️ Reminder: if you move the note, move the `attachments/` folder with it. Obsidian's `![[]]` does a global vault search so links won't break immediately, but it's cleaner to keep them together. --- ## Gotchas (hard-won lessons) | # | Issue | Fix | |---|-------|-----| | 1 | Lazy-loaded images return empty `src` | Always scroll the full page before extracting | | 2 | WeChat images return 403 | `curl` must pass `-e 'https://mp.weixin.qq.com/'` as Referer | | 3 | Image order wrong | Traverse DOM nodes in order — don't collect images separately | | 4 | `zsh: bad substitution` | zsh doesn't support `declare -A`; use a shell function instead | | 5 | `SyntaxError: missing ) after argument list` in agent-browser eval | Use classic `function(){}` not arrow functions inside `JSON.stringify()` | | 6 | `async` eval hangs | Wrap in `(async () => { ... })()` — no outer `await` needed |

wechat-to-obsidian

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

wechat-to-obsidian