
crawlee-web-scraper

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single URLs, bulk file input, and automatic fallback from requests to Crawlee on 403/429 responses.

Author: admin | Source: ClawHub
Version: v1.0.0
Security check: passed
Downloads: 127
Favorites: 0


# crawlee-web-scraper

Drop-in replacement for `web_fetch` when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.

## Scripts

- **`crawlee_fetch.py`** — main scraper; accepts a single URL or a file of URLs; returns JSON
- **`crawlee_http.py`** — library helper; tries `requests` first, falls back to Crawlee on 403/429/503

## Usage

```bash
# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"

# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text

# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json
```

### Library usage

```python
from crawlee_http import fetch_with_fallback

resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])
```

## Output

JSON array with one object per URL:

```json
[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]
```

## Installation

```bash
pip install crawlee requests
```

## When to use

- `web_fetch` returns 403 / 429 / empty
- Bulk scraping 10+ URLs
- Sites using Cloudflare or similar bot protection
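The helper's source is not included in this listing, but the fallback rule it describes (plain `requests` first, Crawlee only on 403/429/503) can be sketched with the fetchers injected as callables. The function name matches the README; the signature, the dict-shaped responses, and the stub fetchers are assumptions for illustration, not the skill's actual code:

```python
# Sketch of the requests-then-Crawlee fallback described above.
# The callable-injection signature and dict responses are assumptions.

FALLBACK_STATUSES = {403, 429, 503}  # blocked / rate-limited / unavailable

def fetch_with_fallback(url, plain_fetch, crawlee_fetch):
    """Try the plain fetcher first; re-fetch via the Crawlee-backed
    fetcher only when the status suggests bot detection or throttling."""
    resp = plain_fetch(url)
    if resp["status"] in FALLBACK_STATUSES:
        # Heavier client: session management, retries, evasion.
        resp = crawlee_fetch(url)
    return resp

# Demo with stub fetchers: the first returns 429, forcing the fallback.
blocked = lambda url: {"status": 429, "text": ""}
ok = lambda url: {"status": 200, "text": "<html>...</html>"}
print(fetch_with_fallback("https://example.com", blocked, ok)["status"])  # → 200
```

Injecting the fetchers keeps the decision logic testable without network access; the real helper presumably binds them to `requests.get` and a Crawlee crawler run.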

Tags

skill ai

Install via conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: install SkillHub and the skill

Help me install SkillHub and the crawlee-web-scraper-1776108608 skill

Option 2: set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the crawlee-web-scraper-1776108608 skill

Install via command line

skillhub install crawlee-web-scraper-1776108608

Download Zip package

⬇ Download crawlee-web-scraper v1.0.0

File size: 4.05 KB | Published: 2026-04-14 11:24

v1.0.0 (latest) 2026-04-14 11:24
Initial release of crawlee-web-scraper.

- Provides resilient web scraping with evasion for bot detection and rate limits using Crawlee.
- Supports both single URLs and bulk file input for scraping.
- Implements automatic fallback: tries regular requests, then uses Crawlee on 403/429/503 errors.
- Returns standardized JSON output per URL with metadata and extracted content.
- Drop-in replacement for web_fetch, with simple command-line and Python library usage.
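The standardized per-URL record can be sketched with the standard library alone. Field names follow the JSON shape in the README's Output section; the `fetched_at` timestamp format and the regex-based tag stripping are assumptions (the actual script may use a real HTML parser):

```python
import json
import re
from datetime import datetime, timezone

def build_record(url, status, body):
    """Build one result object in the shape shown in the README's
    Output section. Tag stripping here is a naive regex sketch."""
    text = re.sub(r"<[^>]+>", " ", body)      # drop HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return {
        "url": url,
        "status": status,
        "fetched_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "length": len(body),
        "text": text,
    }

record = build_record("https://example.com", 200, "<p>Page content...</p>")
print(json.dumps([record], indent=2))
```

Keeping `length` as the raw body size (not the stripped text) preserves a cheap signal for detecting truncated or empty fetches in bulk runs.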
