
simple-csc


Author: admin | Source: ClawHub
Version: V 1.0.1
Security check: Passed
Downloads: 86
Favorites: 1


# Simple CSC

A training-free approach to Chinese Spelling Correction using LLMs as pure language models with beam search and distortion modeling.

## Prerequisites

This skill is a usage guide for the [simple-csc](https://github.com/Jacob-Zhou/simple-csc) repository. Before using any commands or APIs described here, clone the repository and work from its root:

```bash
git clone https://github.com/Jacob-Zhou/simple-csc.git
cd simple-csc
```

All paths referenced below (e.g., `configs/`, `scripts/`, `data/`, `eval/`, `datasets/`) are relative to this repository root. The repository contains the actual code, config files, data dictionaries, and scripts; this skill provides the knowledge of how to use them.

## Quick Reference

### Environment Setup

```bash
# Standard setup (creates venv, installs deps)
bash scripts/set_environment.sh

# For Qwen3 models
bash scripts/set_environment_qwen3.sh

# Recommended: install flash-attn for better performance and lower VRAM
pip install flash-attn --no-build-isolation
```

**Qwen2/Qwen2.5 warning**: Without flash-attn, set `torch_dtype=torch.bfloat16` to avoid unexpected behavior.
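The Qwen2/Qwen2.5 warning can be turned into a small guard so the dtype is not forgotten. This is a minimal sketch, not part of the repository: `flash_attn_available` and `recommended_dtype_name` are hypothetical helper names, and the check assumes the flash-attn package is importable as `flash_attn`.

```python
import importlib.util


def flash_attn_available():
    """Return True if the flash-attn package can be imported (assumed module name: flash_attn)."""
    return importlib.util.find_spec("flash_attn") is not None


def recommended_dtype_name(model_name, has_flash_attn):
    """Return "bfloat16" for Qwen2/Qwen2.5 models without flash-attn, else None.

    None means "no override needed according to the warning above".
    """
    short_name = model_name.split("/")[-1].lower()
    is_qwen2_family = short_name.startswith("qwen2")  # matches Qwen2 and Qwen2.5
    if is_qwen2_family and not has_flash_attn:
        return "bfloat16"
    return None


dtype_name = recommended_dtype_name("Qwen/Qwen2.5-7B", flash_attn_available())
```

When `dtype_name` is not `None`, it can be resolved with `getattr(torch, dtype_name)` and passed as `torch_dtype` when constructing the corrector.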
### Python API

```python
import torch
from lmcsc import LMCorrector

corrector = LMCorrector(
    model="Qwen/Qwen2.5-7B",
    prompted_model="Qwen/Qwen2.5-7B",  # use same model to save VRAM
    config_path="configs/c2ec_config.yaml",  # or "configs/default_config.yaml" for substitution-only
    torch_dtype=torch.bfloat16,  # recommended for Qwen2/2.5 without flash-attn
)

# Single sentence
outputs = corrector("完善农产品上行发展机智。")
# => [('完善农产品上行发展机制。',)]

# Batch
outputs = corrector(["句子一", "句子二"])

# With context (same length lists)
outputs = corrector(["未挨前兆"], contexts=["患者提问:"])

# Streaming (batch_size=1 only)
for output in corrector("完善农产品上行发展机智。", stream=True):
    print(output[0][0], end="\r", flush=True)
```

### Config Selection

| Config | Use Case |
|--------|----------|
| `configs/default_config.yaml` | Substitution-only CSC (v1.0.0 style) |
| `configs/c2ec_config.yaml` | Full C2EC with insert/delete support (v2.0.0) |
| `configs/demo_config.yaml` | Same as c2ec_config, used by demo app |

Key difference: `c2ec_config.yaml` includes `ROR` (reorder), `MIS` (missing char), `RED` (redundant char) distortion types and `length_immutable_chars` data file.
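The Python API example shows the corrector returning a list with one tuple of candidate strings per input sentence. A minimal post-processing sketch under that assumption follows; `top_corrections` and `describe_edits` are hypothetical helpers built on the standard library, not functions from `lmcsc`, and the `outputs` value is copied from the example rather than produced by a model run.

```python
from difflib import SequenceMatcher


def top_corrections(outputs):
    """Return the first candidate string for each input sentence."""
    return [candidates[0] for candidates in outputs]


def describe_edits(source, corrected):
    """List (operation, original_span, corrected_span) for every changed span."""
    matcher = SequenceMatcher(a=source, b=corrected, autojunk=False)
    return [
        (tag, source[i1:i2], corrected[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Output shape taken from the API example; no model is loaded here.
source = "完善农产品上行发展机智。"
outputs = [("完善农产品上行发展机制。",)]

corrected = top_corrections(outputs)[0]
edits = describe_edits(source, corrected)
print(edits)  # [('replace', '智', '制')]
```

With `c2ec_config.yaml`, insertions and deletions would appear as `insert` and `delete` operations in the same list.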
### Recommended Models

- **v2.0.0 (C2EC)**: `Qwen/Qwen2.5-7B` or `Qwen/Qwen2.5-14B`, best performance/speed balance
- **v1.0.0 (CSC)**: `baichuan-inc/Baichuan2-13B-Base`, best performance
- Always prefer `Base` models over `Instruct`/`Chat` variants

### RESTful API Server

```bash
python api_server.py \
  --model "Qwen/Qwen2.5-7B" \
  --prompted_model "Qwen/Qwen2.5-7B" \
  --config_path "configs/c2ec_config.yaml" \
  --host 127.0.0.1 --port 8000 --workers 1 --bf16
```

Endpoints:

- `GET /health`: health check
- `POST /correction`: `{"input": "...", "stream": false, "contexts": null}`

```bash
# Non-streaming
curl -X POST 'http://127.0.0.1:8000/correction' \
  -H 'Content-Type: application/json' \
  -d '{"input": "完善农产品上行发展机智。"}'

# With context
curl -X POST 'http://127.0.0.1:8000/correction' \
  -H 'Content-Type: application/json' \
  -d '{"input": "未挨前兆", "contexts": "患者提问:"}'
```

For detailed API parameters, config options, evaluation pipeline, and dataset formats, see [references/details.md](references/details.md).

## Key Architecture Concepts

The approach works by:

1. Using an LLM as a pure language model (left-to-right generation)
2. At each step, computing a distortion probability for each candidate token based on how "similar" it is to the observed (possibly erroneous) character
3. Combining LM probability with distortion probability via beam search
4. Distortion types encode the relationship between observed and candidate characters (identical, same pinyin, similar shape, etc.)

The `prompted_model` parameter adds a second probability source: a prompt-based LLM that scores candidates given the full input sentence as context, improving correction quality.
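The scoring idea in steps 1 to 4 can be illustrated with a toy, substitution-only beam search. This is a sketch under loud assumptions, not the repository's implementation: the vocabulary, the bigram table standing in for the LLM, the confusable pair, and the distortion probabilities are all invented for illustration, the scores are unnormalized, and the `prompted_model` term is omitted.

```python
import math

# All tables below are hypothetical toy values, not data from simple-csc.
VOCAB = ["发", "展", "机", "智", "制"]
CONFUSABLE = {("智", "制")}  # pair treated as "similar" (e.g. same pinyin)
DISTORTION = {"identical": 0.9, "similar": 0.09, "other": 0.01}
BIGRAM = {("机", "制"): 0.8, ("机", "智"): 0.02}  # stand-in for LLM next-token scores
DEFAULT_LM = 0.1


def lm_logp(prefix, candidate):
    """Toy language-model score of `candidate` given the last character of `prefix`."""
    return math.log(BIGRAM.get((prefix[-1:], candidate), DEFAULT_LM))


def distortion_logp(observed, candidate):
    """Score how plausibly `candidate` was mistyped as `observed`."""
    if observed == candidate:
        kind = "identical"
    elif (observed, candidate) in CONFUSABLE or (candidate, observed) in CONFUSABLE:
        kind = "similar"
    else:
        kind = "other"
    return math.log(DISTORTION[kind])


def beam_correct(observed, beam_width=2):
    """Left-to-right beam search over LM score plus distortion score."""
    beams = [("", 0.0)]
    for obs_char in observed:
        expanded = [
            (prefix + cand, score + lm_logp(prefix, cand) + distortion_logp(obs_char, cand))
            for prefix, score in beams
            for cand in VOCAB
        ]
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][0]


print(beam_correct("发展机智"))  # 发展机制
```

Keeping the observed character is favored by the high "identical" distortion score, so a substitution only wins when the language model strongly prefers a similar-looking or similar-sounding candidate, as with 制 after 机 in this toy table.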

Tags: skill, ai

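Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the simple-csc-1776122382 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the simple-csc-1776122382 skill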

Install via command line

skillhub install simple-csc-1776122382

Download Zip package

Download simple-csc v1.0.1

File size: 7.61 KB | Published: 2026-4-14 13:11

v1.0.1 (latest), 2026-4-14 13:11
Version 1.0.1 of simple-csc

- Added explicit compatibility and prerequisite instructions, including GPU, Python version, and VRAM requirements.
- Clarified that this skill is a usage guide and that the simple-csc repository must be cloned locally before use.
- Noted that all file paths are relative to the repository root, improving user guidance.
- No changes to features, APIs, or behavior were made; this is a documentation update for improved clarity and onboarding.
