inception-token-optimizer

Optimize Inception Labs token usage to minimize costs. Use when choosing Inception models (Mercury, etc.), crafting prompts for Inception, analyzing token consumption, or when the user wants to reduce API costs. Covers caching strategies, context pruning, prompt compression, model selection tips, and free-tier budget management.

Author: admin | Source: ClawHub
Source
ClawHub
Version
V 1.0.0
Security check
Passed
100
Downloads
0
Favorites
Overview
Installation
Version history

inception-token-optimizer

# Inception Token Optimizer

Reduce Inception API token consumption through prompt engineering, context management, and budget enforcement.

## Free-Tier Limits (Inception Labs)

| Metric | Cap |
|---|---|
| Requests/min | 100 |
| Input tokens/min | 100,000 |
| Output tokens/min | 10,000 |

## Core Strategies

### 1. Prompt Compression

- Remove redundant instructions, filler words, and repeated context.
- Use short system prompts: "Concise answers. French." beats a 200-word persona block.
- Avoid re-sending unchanged context — only send deltas.
- Ask for short replies: "Réponds en < 100 mots."

### 2. Context Pruning

- Before sending, estimate tokens: `len(text) // 4` (rough heuristic).
- If total context > target budget, drop oldest messages and replace with a 1-2 sentence summary.
- Use `references/pruning-strategies.md` for detailed patterns.

### 3. Caching

- Identical prompts → reuse previous response. Do not re-call.
- Hash the prompt; if seen recently (within session), return cached reply.
- `scripts/lru_cache.py` provides a drop-in LRU cache (256 items default).

### 4. Model Selection

- Use cheaper/faster models for simple tasks (summarisation, classification).
- Reserve Mercury (or flagship) for complex reasoning only.
- Batch trivial queries into a single prompt instead of multiple calls.

### 5. Output Budgeting

- Set `max_tokens` explicitly — never leave it open-ended.
- Target 150-200 output tokens for conversational replies.
- Use `temperature=0.7` to reduce verbose wandering.

## Token Budget Guard

`scripts/token_bucket.py` enforces per-minute caps using a sliding window:

```python
from scripts.token_bucket import TokenBucket

bucket = TokenBucket(req_per_min=100, in_tok_per_min=100_000, out_tok_per_min=10_000)
bucket.wait_for_slot(in_tokens=500, out_tokens=200)
# proceed with API call
```

Blocks until a slot is available. Use before every Inception API call.

## When to Use This Skill

- Before sending a prompt to Inception → compress & prune first.
- When monitoring costs → check token estimates.
- When near free-tier limits → activate budget guard.
- When building automation → integrate caching + bucket guard.
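The context-pruning step in strategy 2 can be sketched as a small helper: estimate tokens with the rough `len(text) // 4` heuristic, drop the oldest messages until the history fits the budget, and replace them with a short summary stub. The message shape (`{"role", "content"}` dicts) and function names are assumptions for illustration, not part of the skill's documented API.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return len(text) // 4

def prune_context(messages: list[dict], budget: int) -> list[dict]:
    """Drop oldest messages until total estimated tokens fit the budget."""
    total = sum(estimate_tokens(m["content"]) for m in messages)
    pruned = list(messages)
    dropped = 0
    while pruned and total > budget:
        oldest = pruned.pop(0)           # evict from the front (oldest first)
        total -= estimate_tokens(oldest["content"])
        dropped += 1
    if dropped:
        # Stand-in for a real 1-2 sentence summary of the dropped turns.
        pruned.insert(0, {"role": "system",
                          "content": f"[Summary of {dropped} earlier messages omitted for budget.]"})
    return pruned
```

In a real integration the summary stub would be produced by a cheap summarisation call, as `references/pruning-strategies.md` presumably details.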
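The caching pattern in strategy 3 (hash the prompt, return a cached reply for identical prompts within a session) can be sketched with an `OrderedDict`-based LRU. This is a hypothetical stand-in for `scripts/lru_cache.py`, whose actual interface is not shown in the skill description; only the 256-item default is taken from it.

```python
import hashlib
from collections import OrderedDict

class PromptCache:
    """LRU cache keyed by a SHA-256 hash of the prompt text."""

    def __init__(self, max_items: int = 256):
        self.max_items = max_items
        self._store: OrderedDict = OrderedDict()

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        key = self._key(prompt)
        if key in self._store:
            self._store.move_to_end(key)     # mark as recently used
            return self._store[key]
        return None                          # cache miss: call the API

    def put(self, prompt: str, reply: str) -> None:
        key = self._key(prompt)
        self._store[key] = reply
        self._store.move_to_end(key)
        if len(self._store) > self.max_items:
            self._store.popitem(last=False)  # evict least recently used
```

Check `get()` before every call; on a miss, call the API and `put()` the reply.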
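For the budget guard, a sliding-window `TokenBucket` matching the usage snippet above might look like the following. This is a sketch of what `scripts/token_bucket.py` could contain; the real script's internals are not shown in the skill description, only its constructor and `wait_for_slot` signature.

```python
import time
from collections import deque

class TokenBucket:
    """Sliding 60-second window over (timestamp, in_tokens, out_tokens) events."""

    def __init__(self, req_per_min: int, in_tok_per_min: int, out_tok_per_min: int):
        self.caps = (req_per_min, in_tok_per_min, out_tok_per_min)
        self.events = deque()  # (timestamp, in_tokens, out_tokens)

    def _trim(self, now: float) -> None:
        # Drop events older than the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def wait_for_slot(self, in_tokens: int, out_tokens: int) -> None:
        """Block until the request fits under all three per-minute caps."""
        while True:
            now = time.monotonic()
            self._trim(now)
            reqs = len(self.events)
            ins = sum(e[1] for e in self.events)
            outs = sum(e[2] for e in self.events)
            if (reqs + 1 <= self.caps[0]
                    and ins + in_tokens <= self.caps[1]
                    and outs + out_tokens <= self.caps[2]):
                self.events.append((now, in_tokens, out_tokens))
                return
            if not self.events:
                raise ValueError("single request exceeds a per-minute cap")
            # Sleep until the oldest event ages out of the window.
            time.sleep(max(0.05, 60 - (now - self.events[0][0])))
```

Note the hard-error path: a request larger than a cap can never fit, so raising beats spinning forever.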

Tags

skill ai

Install via conversation

This skill can be installed via conversation on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Method 1: Install SkillHub and the skill

Help me install SkillHub and the inception-token-optimizer-1776121562 skill

Method 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the inception-token-optimizer-1776121562 skill

Install via command line

skillhub install inception-token-optimizer-1776121562

Download Zip package

⬇ Download inception-token-optimizer v1.0.0

File size: 4.42 KB | Released: 2026-4-14 14:41

v1.0.0 (latest) 2026-4-14 14:41
Initial release: token bucket rate limiter, LRU cache, prompt compression guide, context pruning strategies
