Native Token Compression
AI models have a limit on how much text they can process at once. Token compression keeps your context small enough to fit while preserving the information that matters most.
Harness CLI does this natively: no extra tools to install, no shell hooks, no dependencies.
The Problem
Every time your agent starts a new session, ContextDB loads the history of what happened before. If your project has months of history, that can be a lot of text, often more than the model can handle.
Token compression solves this by:
- Keeping recent work, errors, decisions, and file paths
- Compressing repeated logs, verbose output, and stack traces
- Dropping low-priority content only when necessary
Two Layers
Input Compression (what goes TO the model)
Reduce the context pack before your agent reads it:
npm run contextdb -- context:pack \
  --session <session-id> \
  --limit 80 \
  --token-budget 1200 \
  --token-strategy balanced
| Strategy | When to use | What it does |
|---|---|---|
| `balanced` | Default | Compresses low-signal text, keeps errors and recent work |
| `aggressive` | Very small budgets | Maximum compression, minimal detail |
| `legacy` | Old behavior | Only keeps the tail end of history |
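
To make the difference between `legacy` and `balanced` concrete, here is a minimal Python sketch. It is not Harness CLI's implementation: the function names, the 4-characters-per-token estimate, and the rule that any line mentioning "error" counts as high signal are all assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def is_high_signal(event: str) -> bool:
    # Hypothetical rule: any line that mentions an error is high signal.
    return "error" in event.lower()


def pack_tail_only(events: list[str], budget: int) -> list[str]:
    """Legacy-style packing: keep the newest events that fit, nothing else."""
    kept: list[str] = []
    used = 0
    for event in reversed(events):
        cost = estimate_tokens(event)
        if used + cost > budget:
            break  # stop at the first event that does not fit
        kept.append(event)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def pack_signal_first(events: list[str], budget: int) -> list[str]:
    """Balanced-style packing: admit high-signal events first,
    then fill the remaining budget with the newest other events."""
    chosen: set[int] = set()
    used = 0
    for want_signal in (True, False):  # pass 1: signal, pass 2: the rest
        for index in range(len(events) - 1, -1, -1):  # newest first
            if is_high_signal(events[index]) != want_signal:
                continue
            cost = estimate_tokens(events[index])
            if used + cost <= budget:  # skip events that do not fit
                chosen.add(index)
                used += cost
    return [events[i] for i in sorted(chosen)]


history = [
    "ERROR: migration 0007 failed on users table",
    "ran npm test, 212 passed",
    "refactored src/cache.ts",
    "updated README wording",
]
print(pack_tail_only(history, 16))
print(pack_signal_first(history, 16))
```

With a budget of 16 estimated tokens, the tail-only version keeps the three newest events and loses the early error, while the signal-first version keeps the error and fills the remaining room with the newest event that still fits.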
What gets preserved (never dropped):
- Error messages and failure signals
- File paths and command outputs
- Recent state and decisions
What gets compressed (shortened, not always dropped):
- Repeated log lines
- Stack traces
- Verbose tool output
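
As a rough illustration of shortening rather than dropping, the sketch below collapses runs of identical log lines and trims long stack traces while leaving the error message itself untouched. The helper names, the `[xN]` suffix, and the assumption that stack frames are lines starting with `at ` are hypothetical. Harness CLI's actual rules may differ.

```python
from itertools import groupby


def collapse_repeats(lines: list[str]) -> list[str]:
    """Replace each run of identical consecutive lines with one line and a count."""
    out: list[str] = []
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        out.append(line if count == 1 else f"{line} [x{count}]")
    return out


def shorten_stack(lines: list[str], keep: int = 2) -> list[str]:
    """Keep the first `keep` frames of each stack trace and summarize the rest.

    Assumption: a stack frame is any line whose stripped text starts with "at ".
    """
    out: list[str] = []
    frames: list[str] = []  # current run of consecutive frame lines

    def flush() -> None:
        if len(frames) > keep:
            out.extend(frames[:keep])
            out.append(f"    ... {len(frames) - keep} more frames")
        else:
            out.extend(frames)
        frames.clear()

    for line in lines:
        if line.lstrip().startswith("at "):
            frames.append(line)
        else:
            flush()
            out.append(line)
    flush()  # handle a trace that ends the input
    return out


def compress_log(lines: list[str]) -> list[str]:
    return shorten_stack(collapse_repeats(lines))


log = [
    "retrying connection",
    "retrying connection",
    "retrying connection",
    "TypeError: x is undefined",
    "    at a (src/a.ts:1)",
    "    at b (src/b.ts:2)",
    "    at c (src/c.ts:3)",
    "    at d (src/d.ts:4)",
]
for line in compress_log(log):
    print(line)
```

The eight input lines shrink to five, and the `TypeError` line and the file paths in the first two frames survive unchanged.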
Output Compression (what comes FROM the model)
Control how verbose your agent's responses are:
| Level | Use for | Behavior |
|---|---|---|
| `tight` | Normal coding | Concise answer, no filler |
| `ultra` | Harness logs, checkpoints | One-line evidence + next action |
| `precise` | Browser actions, safety-critical | Full explicit wording |
/compress tight # Normal work
/compress ultra # Overnight runs
/compress precise # When precision matters
stop compress # Back to normal
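
The four commands above behave like a small state machine: a level stays in effect until another command replaces it or `stop compress` clears it. The Python sketch below models that reading. The function name, the use of `None` for "back to normal", and the choice to ignore unknown levels are assumptions, not documented Harness CLI behavior.

```python
from typing import Optional

# The three levels listed in the table above.
LEVELS = ("tight", "ultra", "precise")


def next_level(current: Optional[str], message: str) -> Optional[str]:
    """Return the compression level in effect after `message`.

    None stands for normal, uncompressed output (an assumption).
    """
    text = message.strip().lower()
    if text == "stop compress":
        return None
    parts = text.split()
    if len(parts) == 2 and parts[0] == "/compress" and parts[1] in LEVELS:
        return parts[1]
    # Anything else, including an unknown level, leaves the setting unchanged.
    return current


level = None
for message in ["/compress tight", "fix the failing test", "/compress ultra", "stop compress"]:
    level = next_level(level, message)
    print(repr(message), "->", level)
```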
Browser Reads
When your agent reads web pages, Harness CLI automatically prefers the most compact format. In order of preference:
- Semantic snapshot (smallest)
- Targeted text extraction
- Full text extraction
- Full HTML (largest)
- Screenshot (only when visual evidence is needed)
This means less token waste when agents browse the web.
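
The preference order above can be pictured as a simple fallback chain. In the sketch below, the format identifiers, the `available` set (which formats a page can currently provide), and the `needs_visual` flag are hypothetical names chosen for illustration. They are not part of Harness CLI's API.

```python
from typing import Optional

# Text formats from the list above, smallest first (identifiers are hypothetical).
TEXT_FORMATS = ["semantic_snapshot", "targeted_text", "full_text", "full_html"]


def choose_format(available: set[str], needs_visual: bool = False) -> Optional[str]:
    """Pick the most compact usable format for a browser read."""
    # A screenshot is chosen only when visual evidence is explicitly needed.
    if needs_visual and "screenshot" in available:
        return "screenshot"
    for name in TEXT_FORMATS:
        if name in available:
            return name
    return None  # nothing usable


print(choose_format({"full_html", "full_text"}))
print(choose_format({"semantic_snapshot", "screenshot"}, needs_visual=True))
```

Keeping the screenshot outside the ordered list reflects the note that it is used only when visual evidence is needed, regardless of size.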
Why Native?
Harness CLI's compression is built in rather than bolted on:
- No extra packages to install
- No shell hooks or command rewriting
- Everything stays auditable — you can see exactly what was compressed or dropped
- Works consistently across Codex, Claude, Gemini, and OpenCode
Where To Go Next
- ContextDB — how memory works with compression
- Solo Harness — long runs benefit most from compression
- Architecture — technical details of the compression pipeline