clawsaver
# ClawSaver
**Reduce model API costs by 20–40% through intelligent message batching and buffering.**
Most agent systems waste money on redundant API calls. When users send follow-up messages, you call the model separately for each one. ClawSaver fixes this by waiting ~800ms to collect related messages, then sending them together in a single optimized request. Same response quality. Lower cost. No user friction.
## How It Works: Batching & Buffering
```
WITHOUT CLAWSAVER (Context Overhead Hidden):
User: "What is ML?"
Model: → API Call #1 [Context: system prompt, chat history] (cost: $X)
Returns: definition
User: "Give an example"
Model: → API Call #2 [Context: system prompt, chat history, Q1, A1] (cost: $X)
Returns: example
User: "Apply to finance?"
Model: → API Call #3 [Context: system prompt, chat history, Q1–A2] (cost: $X)
Returns: finance application
Total: 3 calls × full context = 3X cost, each call repeats context overhead
───────────────────────────────────────
WITH CLAWSAVER (Single Context Load):
User: "What is ML?" ← Buffer (800ms wait)
User: "Give an example" ← Buffer (800ms wait)
User: "Apply to finance?" ← Flush: Send all 3 together
Model: → API Call #1 [Context loaded ONCE: system prompt, chat history]
Processes all 3 questions together
Returns: comprehensive answer addressing all three
Total: 1 call × full context = 1X cost, context overhead paid once
Actual savings (with context): 67% reduction
Cost per token: 1/3 (fewer context re-loads + consolidation)
```
**Why it matters:** Context (system prompts, history, instructions) gets re-sent on every API call. With ClawSaver, you pay that context overhead **once per batch instead of three times**. This compounds the savings beyond just "fewer calls."
**Example (4K token context, 200 output tokens):**
- Without ClawSaver: 3 calls × 4,200 tokens = 12,600 tokens
- With ClawSaver: 1 call × 4,600 tokens = 4,600 tokens
- **Actual savings: 63% token reduction** (even better than call reduction)
## The Problem
```
User: "What is machine learning?"
(pause)
User: "Give an example"
(pause)
User: "How does that apply to healthcare?"
```
Without optimization: **3 API calls = 3x cost**
With ClawSaver: **1 batched call = 1/3 the price**
Across thousands of conversations, this compounds fast.
## How It Works
1. User sends message → ClawSaver buffers it
2. Waits ~800ms for follow-ups from same user
3. If more messages arrive → keep buffering
4. Timer expires → send all messages together
5. Model responds once → you get complete answer
**Why users don't notice:** They're already waiting for your model response. Buffering input doesn't feel slower because the response comes right after the batch sends.
## Install
```bash
clawhub install clawsaver
```
## Quick Start (10 lines)
```javascript
import SessionDebouncer from 'clawsaver';
const debouncers = new Map();
function handleMessage(userId, text) {
if (!debouncers.has(userId)) {
debouncers.set(userId, new SessionDebouncer(
userId,
(msgs) => callModel(userId, msgs)
));
}
debouncers.get(userId).enqueue({ text });
}
```
## Impact
| Metric | Value |
|--------|-------|
| **Cost reduction** | 20–40% typical |
| **Setup time** | 10 minutes |
| **Code added** | ~10 lines |
| **Dependencies** | 0 |
| **File size** | 4.2 KB |
| **Latency added** | +800ms (user-imperceptible) |
| **Maintenance** | None |
## Three Profiles
Choose based on your use case:
### Balanced (Default)
- 25–35% savings
- 800ms buffer
- Chat, Q&A, general conversation
### Aggressive
- 35–45% savings
- 1.5s buffer
- Batch workflows, high-volume ingestion
### Real-Time
- 5–10% savings
- 200ms buffer
- Interactive, voice-first systems
## When to Use
✅ Chat applications
✅ Customer support bots
✅ Multi-turn Q&A
✅ Any conversation with follow-ups
❌ Single-request workflows
❌ Sub-100ms response requirements
## API
```javascript
new SessionDebouncer(userId, handler, {
debounceMs: 800, // wait time
maxWaitMs: 3000, // absolute max
maxMessages: 5, // batch size cap
maxTokens: 2048 // reserved
})
// Methods
debouncer.enqueue(message) // add to batch
debouncer.forceFlush(reason) // send now
debouncer.getState() // buffer + metrics
debouncer.getStatusString() // human-readable
```
## Docs
- **START_HERE.md** — Navigation (pick your role/timeline)
- **AUTO-INTEGRATION.md** — ⭐ Drop-in middleware wrapper (2 min setup)
- **QUICKSTART.md** — 5-minute integration
- **INTEGRATION.md** — Patterns, edge cases, full config
- **SUMMARY.md** — Metrics and ROI (decision makers)
- **SKILL.md** — Full API reference
- **example-integration.js** — Copy-paste templates
## Security
- **No telemetry** — Doesn't phone home
- **No network calls** — Runs locally
- **No dependencies** — Pure JavaScript
- **You control output** — You decide what goes to your model
Data never leaves your machine.
## License
MIT
---
**Start here:** Pick your path in **START_HERE.md**, or jump to **QUICKSTART.md** for 5-minute setup.
标签
skill
ai