crawlee-web-scraper

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single URLs, bulk file input, and automatic fallback from requests to Crawlee on 403/429/503 responses.

Author: admin | Source: ClawHub

Version: 1.0.0

Security check: passed

Downloads: 127

Favorites: 0


# crawlee-web-scraper

Drop-in replacement for `web_fetch` when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.

## Scripts

- **`crawlee_fetch.py`**: main scraper; accepts a single URL or a file of URLs; returns JSON
- **`crawlee_http.py`**: library helper; tries `requests` first, falls back to Crawlee on 403/429/503

## Usage

```bash
# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"

# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text

# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json
```

### Library usage

```python
from crawlee_http import fetch_with_fallback

resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])
```

## Output

JSON array with one object per URL:

```json
[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]
```

## Installation

```bash
pip install crawlee requests
```

## When to use

- `web_fetch` returns 403 / 429 / empty
- Bulk scraping 10+ URLs
- Sites using Cloudflare or similar bot protection
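The fallback behaviour described for `crawlee_http.py` (try `requests` first, switch to Crawlee on 403/429/503) can be sketched roughly as follows. This is not the skill's actual implementation: `FetchResult`, `fetch_with_fallback_sketch`, and the injected `primary` / `fallback` callables are hypothetical names chosen for illustration, and the real fetchers are replaced by stubs so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Status codes that trigger the fallback, as listed in the README.
FALLBACK_STATUSES = frozenset({403, 429, 503})

# A fetcher takes a URL and returns (status_code, body_text).
Fetcher = Callable[[str], Tuple[int, str]]


@dataclass
class FetchResult:
    """Hypothetical result type; the real helper may return something else."""
    url: str
    status_code: int
    text: str
    via: str  # "primary" or "fallback"


def fetch_with_fallback_sketch(url: str, primary: Fetcher, fallback: Fetcher) -> FetchResult:
    """Try the cheap fetcher first; use the fallback only when blocked."""
    status, text = primary(url)
    if status not in FALLBACK_STATUSES:
        return FetchResult(url, status, text, via="primary")
    # The primary fetcher was blocked or throttled, so hand over to the fallback.
    status, text = fallback(url)
    return FetchResult(url, status, text, via="fallback")


if __name__ == "__main__":
    # Stub fetchers stand in for requests and Crawlee (assumption for the demo).
    blocked = lambda url: (403, "Forbidden")
    crawlee_stub = lambda url: (200, "<html>ok</html>")
    print(fetch_with_fallback_sketch("https://example.com", blocked, crawlee_stub))
```

In practice `primary` could wrap `requests.get(url, timeout=...)` and return `(resp.status_code, resp.text)`, while `fallback` would wrap whatever Crawlee crawler the skill uses. Passing both in as callables keeps the decision logic testable without network access.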

Tags: skill, ai

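Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the crawlee-web-scraper-1776108608 skill

Option 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill source, then help me install the crawlee-web-scraper-1776108608 skill

Install via command line

skillhub install crawlee-web-scraper-1776108608

Download Zip package

Download crawlee-web-scraper v1.0.0

File size: 4.05 KB | Published: 2026-4-14 11:24

v1.0.0 (latest), 2026-4-14 11:24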

Initial release of crawlee-web-scraper.

- Provides resilient web scraping with evasion for bot detection and rate limits using Crawlee.
- Supports both single URLs and bulk file input for scraping.
- Implements automatic fallback: tries regular requests, then uses Crawlee on 403/429/503 errors.
- Returns standardized JSON output per URL with metadata and extracted content.
- Drop-in replacement for web_fetch, with simple command-line and Python library usage.
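As an illustration of working with the standardized JSON output mentioned in the release notes, the following minimal sketch splits results into succeeded and failed URLs. It assumes every record carries the `url` and `status` fields shown in the README's sample output; `summarize_results` is a hypothetical helper, not part of the skill.

```python
import json


def summarize_results(raw_json: str) -> dict:
    """Group scraped URLs by whether their status was 200."""
    records = json.loads(raw_json)
    ok = [r["url"] for r in records if r.get("status") == 200]
    failed = [r["url"] for r in records if r.get("status") != 200]
    return {"ok": ok, "failed": failed}


# Sample record copied from the README's output example.
sample = """[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]"""

print(summarize_results(sample))
```

For a bulk run, the same function could be applied to the contents of the file passed to `--output`, for example `summarize_results(open("results.json").read())`.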
