
Caveman.MD: The Complete 2026 Guide to AI Agent Token Compression (65% Output Reduction, v1.8.2 Update)

The Prompt Shelf

In April 2026, JuliusBrussee/caveman shipped a deceptively simple idea: make Claude (and 30+ other AI agents) write like a caveman to cut output tokens by an average of 65%. The tagline: “why use many token when few do trick.”

Within weeks, the project found its way into AGENTS.md rulesets, Claude Code plugin marketplaces, Codex extensions, and Gemini CLI configurations across the AI coding community. If you’ve been wondering what “caveman.md” means when you see it referenced — or how it differs from CLAUDE.md / AGENTS.md / .cursorrules — this is the complete reference.

This article covers what Caveman is, the four compression levels, the benchmark data, installation across every supported agent, and where Caveman fits in the rule-file ecosystem we’ve documented in our AGENTS.md vs CLAUDE.md vs .cursor/rules guide.

What Caveman Actually Is

Caveman is not a new rule file format. It’s a set of instructions that gets dropped into your existing AGENTS.md (or CLAUDE.md, or .cursorrules) telling the model to compress its output. Same delivery mechanism as any other AGENTS.md content, but the content is specifically about eliminating filler.

From the official README: “Caveman only affects output tokens — thinking/reasoning tokens untouched. Caveman no make brain smaller. Caveman make mouth smaller.”

The result, measured across 10 representative tasks, is an average 65% reduction in output tokens (range: 22–87%). Code, URLs, file paths, and technical accuracy remain byte-perfect — only the natural-language prose surrounding the technical content gets compressed.

Why “Caveman”?

The naming reflects the style. Instead of:

“I’ll examine the file and analyze the structure to understand what we’re working with.”

You get:

Read file. Structure clear.

This isn’t a gimmick. A March 2026 research paper cited in the Caveman docs reports that “constraining large models to brief responses improved accuracy by 26 points on certain benchmarks.” Verbose output may correlate with worse reasoning, not better — by removing the apologetic preamble and meta-commentary, Caveman pushes the model toward direct, accurate answers.

The 4 Compression Levels

Caveman ships with four distinct compression modes:

| Level | Behavior | Use case |
|---|---|---|
| lite | Removes filler words only | Conservative: keeps natural-sounding prose, just trims padding |
| full | Default caveman mode | Telegraphic but readable. The everyday recommendation. |
| ultra | Maximum telegraphic style | When you want absolute brevity. Slight readability cost. |
| wenyan | Classical Chinese variant | Shortest. Use for fun or when output is purely log-style. |

You pick a level via /caveman <level> once per session, or set a default in your AGENTS.md. The model continues thinking at full fidelity — only the surface text changes.

The 10 Rules (Distilled)

The Caveman ruleset distils the compression philosophy into ten directives:

  1. No filler phrases — drop “I’ll go ahead and…”, “Let me…”, “Sure, I can help with…”
  2. Execute before explaining — code/result first, prose second
  3. No meta-commentary — don’t narrate what you’re about to do
  4. No preamble — skip the warm-up
  5. No postamble — skip the “let me know if you need…” sign-off
  6. No tool announcements — silent tool use, only the result matters
  7. Explain only when needed — comment on non-obvious decisions, not obvious ones
  8. Let code speak for itself — minimize narration around code blocks
  9. Errors are things to fix, not narrate — debug silently, report the fix
  10. Compress prose, preserve technical artifacts — code/URLs/paths stay byte-perfect

These rules are why Caveman is so portable. The exact same ruleset, dropped into AGENTS.md, CLAUDE.md, or .cursorrules, produces compressed output across all major coding agents.
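To make rule 10 concrete, here is a toy Python illustration of the invariant it protects: prose around a reply can be trimmed while fenced code passes through byte-for-byte. This is not how Caveman works (Caveman instructs the model; it does not post-process replies), and the filler list and function names below are hypothetical.

```python
import re

# Hypothetical filler phrases, modelled on rule 1's examples. Caveman itself relies on
# model instructions, not on string replacement like this.
FILLER = [
    "I'll go ahead and ",
    "Let me ",
    "Sure, I can help with that. ",
]

# The fence marker is built indirectly so this snippet can itself sit inside a fenced block.
FENCE_MARK = "`" * 3
# Capturing group: re.split keeps each fenced block as its own list element.
FENCE = re.compile("(" + FENCE_MARK + ".*?" + FENCE_MARK + ")", re.DOTALL)


def strip_filler(prose: str) -> str:
    """Remove known filler phrases from a prose segment."""
    for phrase in FILLER:
        prose = prose.replace(phrase, "")
    return prose


def compress_reply(reply: str) -> str:
    """Trim filler from prose segments; leave fenced code blocks byte-identical."""
    parts = FENCE.split(reply)
    # With one capturing group, fenced blocks land at the odd indices.
    return "".join(
        part if i % 2 == 1 else strip_filler(part)
        for i, part in enumerate(parts)
    )


fence = "`" * 3
code = fence + "python\nprint('Let me in')\n" + fence
reply = "Sure, I can help with that. Let me fix it.\n" + code + "\nI'll go ahead and run tests."
# result starts with "fix it." and still contains the fenced block verbatim,
# including the "Let me" inside the code.
result = compress_reply(reply)
```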

Benchmarks (Real Numbers)

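Caveman’s repo includes benchmarks across 10 representative tasks, with token counts verified using tiktoken (OpenAI’s tokenizer, so the counts approximate rather than exactly match Claude’s own tokenization):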

| Task category | Baseline tokens | Caveman tokens | Reduction |
|---|---|---|---|
| React re-render bug explanation | 1,180 | 159 | 87% |
| PostgreSQL race condition debugging | 1,200 | 232 | 81% |
| Architecture discussion | | | 30% avg |
| Web search summarization | | | 68% |
| Code edits | | | 50% |
| Q&A | | | 72% |

Average across all 10 tasks: 65% output reduction.
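The two rows with full token counts can be checked directly. A short Python sketch using only the numbers in the table (the helper name is illustrative):

```python
def reduction_pct(baseline: int, compressed: int) -> float:
    """Percentage of baseline output tokens removed."""
    return 100 * (baseline - compressed) / baseline


# Token counts from the two fully reported rows of the benchmark table.
ROWS = {
    "React re-render bug explanation": (1_180, 159),
    "PostgreSQL race condition debugging": (1_200, 232),
}

for task, (baseline, caveman) in ROWS.items():
    print(f"{task}: {reduction_pct(baseline, caveman):.0f}%")
# React re-render bug explanation: 87%
# PostgreSQL race condition debugging: 81%
```

When aggregating, note that a mean of per-task percentages and a pooled ratio (total tokens saved over total baseline tokens) can differ; the article doesn't say which one produces the 65% figure.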

Memory file compression (running /caveman-compress against existing CLAUDE.md / project notes) shrinks those files by ~46% on average. Because memory files are loaded as input tokens at the start of every session, that saving recurs in every session.

For a Claude Code user spending $50/month on output tokens, a 65% reduction would bring that down to roughly $17.50/month: a saving of about $32.50 for adding ten lines to AGENTS.md, assuming your workload matches the benchmark average.
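The same arithmetic as a small Python sketch. Like the paragraph, it assumes the 65% average applies to the whole output bill; given the 22–87% range, a real workload will land elsewhere. The memory-file helper uses hypothetical file sizes and session counts.

```python
def output_cost_after(monthly_spend: float, reduction: float) -> tuple[float, float]:
    """Return (remaining spend, amount saved) for a fractional output-token reduction."""
    remaining = monthly_spend * (1 - reduction)
    return remaining, monthly_spend - remaining


def memory_tokens_saved(file_tokens: int, sessions: int, reduction: float = 0.46) -> float:
    """Input tokens saved when a memory file that loads once per session is compressed."""
    return file_tokens * reduction * sessions


remaining, saved = output_cost_after(50.0, 0.65)
print(f"remaining ${remaining:.2f}, saved ${saved:.2f}")  # remaining $17.50, saved $32.50

# Hypothetical: a 3,000-token CLAUDE.md loaded in 200 sessions a month.
print(f"{memory_tokens_saved(3_000, 200):,.0f} input tokens saved")  # 276,000 input tokens saved
```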

Installation Methods

Caveman supports 30+ agents through different delivery mechanisms.

Universal one-liner (Unix-like)

curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

Universal one-liner (Windows PowerShell 5.1+)

irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex

The installer detects which agents are present (Claude Code, Codex CLI, Gemini CLI, Cursor, Windsurf, Cline, Copilot, etc.) and configures each appropriately. Node ≥18 required. Re-running is safe.
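A rough sketch of what “detects which agents are present” can mean in practice: probe for each agent’s config directory under the home directory, and validate an explicit provider choice before doing anything. The marker paths and function name are assumptions for illustration, not the installer’s actual logic.

```python
from pathlib import Path
from typing import Optional

# Hypothetical marker paths (relative to the home directory) for a few agents.
# The real installer's detection rules may differ.
AGENT_MARKERS = {
    "claude-code": ".claude",
    "codex-cli": ".codex",
    "gemini-cli": ".gemini",
    "opencode": ".config/opencode",
    "openclaw": ".openclaw",
}


def detect_agents(home: Path, only: Optional[str] = None) -> list[str]:
    """Return agents whose marker directory exists, optionally limited to one provider."""
    if only is not None and only not in AGENT_MARKERS:
        # Reject an unknown provider up front, in the spirit of --only <provider> validation.
        raise ValueError(f"unknown provider: {only}")
    candidates = [only] if only is not None else sorted(AGENT_MARKERS)
    return [name for name in candidates if (home / AGENT_MARKERS[name]).is_dir()]


print(detect_agents(Path.home()))
```

Probing is idempotent, which is one reason a re-run of this kind of installer can be safe: it recomputes the same set and rewrites the same files.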

Per-agent specifics

| Agent | How Caveman delivers | Auto-activates? |
|---|---|---|
| Claude Code | Plugin via ~/.claude/plugins/caveman/ + CLAUDE.md import | Yes |
| Codex CLI | Plugin via ~/.codex/plugins/caveman/ | Yes |
| Gemini CLI | Extension via Gemini CLI plugin system | Yes |
| Cursor | .cursor/rules/caveman.mdc with alwaysApply: true | Via rule load |
| Windsurf | .windsurf/rules/caveman.md with trigger: always_on | Via rule load |
| Cline | .clinerules/caveman.md | Yes |
| Copilot | Custom instructions via AGENTS.md import | Yes |
| Aider | .aider.conf.yml instructions reference | Yes |

For agents without auto-activation, users invoke /caveman once per session to opt in.

The Tool Suite

Beyond the core ruleset, Caveman ships several utility commands:

| Command | Purpose |
|---|---|
| /caveman <level> | Set the compression level for this session |
| /caveman-commit | Generate conventional commit messages (≤50 chars) |
| /caveman-review | One-line PR feedback |
| /caveman-stats | Show session token consumption |
| /caveman-compress <file> | Rewrite an existing CLAUDE.md / memory file in caveman style |
| caveman-shrink MCP middleware | Compress MCP server tool descriptions (reduces context overhead) |

The caveman-shrink middleware is particularly useful — MCP servers often have verbose tool descriptions that bloat every session’s context window. Running them through caveman-shrink reclaims tokens without changing tool functionality.
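A minimal Python sketch of the idea behind caveman-shrink, assuming tool definitions shaped like an MCP tools/list result (name, description, inputSchema). Only the description text changes; names and schemas pass through untouched, which is why tool behaviour is unaffected. The first-sentence heuristic is a stand-in of our own; the real middleware rewrites descriptions in caveman style, which this sketch does not attempt.

```python
import copy
import re

# Stand-in heuristic (not caveman-shrink's actual rewrite): keep the first sentence.
FIRST_SENTENCE = re.compile(r"(.+?[.!?])(?:\s|$)")


def shrink_description(text: str) -> str:
    """Collapse whitespace and keep only the first sentence of a description."""
    text = " ".join(text.split())
    match = FIRST_SENTENCE.match(text)
    return match.group(1) if match else text


def shrink_tools(tools: list[dict]) -> list[dict]:
    """Return a copy of the tool list with shortened descriptions; names and schemas unchanged."""
    shrunk = copy.deepcopy(tools)
    for tool in shrunk:
        if isinstance(tool.get("description"), str):
            tool["description"] = shrink_description(tool["description"])
    return shrunk


tools = [{
    "name": "search_issues",
    "description": "Search the issue tracker.\n    This tool is very useful whenever you need to find\n"
                   "    issues by keyword, label, or assignee.",
    "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}},
}]
print(shrink_tools(tools)[0]["description"])  # Search the issue tracker.
```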

How Caveman Fits the AGENTS.md Ecosystem

Caveman is content for an existing rule file, not a new rule file format. Here’s the mapping:

| If you use… | Caveman drops into… |
|---|---|
| AGENTS.md | A section of AGENTS.md (or imported as @caveman.md) |
| CLAUDE.md | A section of CLAUDE.md (or imported via @AGENTS.md → caveman content) |
| .cursor/rules/ | A .mdc file with alwaysApply: true |
| .windsurf/rules/ | A .md file with trigger: always_on |
| Plugin marketplace | A standalone plugin that delivers the rules to any agent |

Because the rule content is plain markdown prose, it works equally well in any format. The installer’s job is just deciding where to put the file and how to wire it to the right agent.
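The mapping above amounts to one markdown body wrapped in different frontmatter and written to different paths. A Python sketch using the frontmatter keys the article names (alwaysApply: true for Cursor, trigger: always_on for Windsurf); the function, the target names, and the description field are illustrative assumptions.

```python
# Same rule text everywhere; only the destination path and wrapper differ per target.
def render_rule_file(target: str, body: str) -> tuple[str, str]:
    """Return (relative path, contents) for one rule-file target."""
    if target == "agents-md":
        # Plain markdown: a section to add to AGENTS.md.
        return "AGENTS.md", "## Caveman\n\n" + body
    if target == "cursor":
        frontmatter = "---\ndescription: Caveman output compression\nalwaysApply: true\n---\n"
        return ".cursor/rules/caveman.mdc", frontmatter + body
    if target == "windsurf":
        return ".windsurf/rules/caveman.md", "---\ntrigger: always_on\n---\n" + body
    if target == "cline":
        # Cline reads plain markdown from .clinerules/ with no wrapper.
        return ".clinerules/caveman.md", body
    raise ValueError(f"unsupported target: {target}")


RULES = "1. No filler phrases.\n2. Execute before explaining.\n"
for t in ("agents-md", "cursor", "windsurf", "cline"):
    path, contents = render_rule_file(t, RULES)
    print(path)
```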

For deeper coverage of the rule-file landscape, see AGENTS.md vs CLAUDE.md vs .cursor/rules.

The Caveman Ecosystem

Caveman has spawned a small constellation of related projects:

  • cavemem — Cross-agent memory compression. Periodically rewrites your CLAUDE.md / project notes in compressed form, automatically.
  • cavekit — Spec-driven build loops with caveman compression baked in. Useful for repeatable scaffolding tasks.
  • cavegemma — Fine-tuned Gemma 4 31B model with compression baked into the weights. Runs locally, no rule file needed.

For most users, the core caveman install is enough. The ecosystem is for users who want compression to extend beyond Claude/Codex/Cursor into self-hosted models or specialized workflows.

When Caveman Helps vs When It Hurts

Caveman is excellent for:

  • Long sessions where output tokens accumulate (refactoring, multi-file edits)
  • Cost-sensitive teams running Claude API at scale
  • Engineers who already speak in shorthand — caveman matches their preferred style
  • CI environments where verbose output makes logs unreadable

Caveman is less useful for:

  • Pair-programming sessions where you want the model to “think out loud”: caveman compresses the visible narration away (reasoning tokens themselves are unaffected)
  • Onboarding new engineers — caveman output is harder to follow if you’re learning what the agent does
  • Compliance-heavy workflows that need explicit audit trails of model reasoning
  • Customer-facing chat interfaces — caveman style sounds curt to non-engineers

The right answer is to toggle it per task: /caveman full for refactor-heavy work, /caveman lite (or Caveman off entirely) for exploratory sessions.

Should You Add Caveman to Your Project’s AGENTS.md?

A practical test: count the number of times in the last week you skim-read a Claude response and thought “just give me the code.” If that’s more than once a day, install Caveman. If you actually read the prose responses end-to-end and find them useful, hold off.

For most production engineering teams in 2026, the answer is yes. The 65% token reduction translates to direct cost savings and faster session iteration, with minimal accuracy cost.

Common Pitfalls

  1. Installing globally then forgetting it’s on. New teammates joining your project may be confused by terse responses. Document Caveman in your team README.
  2. Using wenyan mode in shared codebases. The classical Chinese variant is fun but unreadable to most teams. Stick with full for shared projects.
  3. Compressing CLAUDE.md too aggressively. caveman-compress rewrites the instructions themselves in caveman style, so compression stacks: terse instructions on top of terse output. Read the rewritten file before committing.
  4. Forgetting Caveman doesn’t compress code. Some users expect output reduction across the board; only natural-language prose gets compressed, code stays full.
  5. Using /caveman with prompt-type hooks. A prompt hook fires its own model call and expects a JSON response. The hook handler itself doesn’t run with Caveman context, so don’t count on Caveman shortening that call; if compression does reach the hook’s model output, check that it still returns valid JSON. Plan accordingly.

2026 Q2 Update: v1.6.0 → v1.8.2 Recent Releases

Caveman shipped four production releases between April 15 and May 12, 2026. If you installed before May, re-run the universal installer — several measurement and integration features are new.

v1.8.2 (May 12, 2026) — Installer Hardening

  • curl | bash skill auto-install now passes --yes --all so it completes without prompts in CI
  • Gemini commands/caveman-init.toml had stray YAML fences that broke gemini extensions install; removed
  • Codex setting key renamed from codex_hooks to hooks, matching the Codex CLI 2026.5 config schema

v1.8.0 (May 10, 2026) — OpenClaw + opencode Native Integration

  • OpenClaw: Caveman now ships as a native skill dropped into ~/.openclaw/workspace/. No more npx shim — the installer detects OpenClaw and writes the skill files directly into the gateway’s workspace.
  • opencode: Native plugin path moved to ~/.config/opencode/plugins/caveman/. Previous versions required an npx wrapper; v1.8.0 onward writes the plugin files directly.
  • Installer rewrite: 34 provider detection paths, --only <provider> validation, symlink-safe flag writes via O_NOFOLLOW + 0600
  • Repo restructure: src/ now holds hooks/, rules/, tools/, mcp-servers/ in one tree
  • 50/50 tests passing (+7 vs v1.7.0)

v1.7.0 (May 1, 2026) — Measurement, cavecrew, MCP Middleware

This is the biggest release of Q2. Four headline features:

  1. /caveman-stats skill — Reads your session JSONL files, sums input/output tokens, applies current Anthropic pricing, and reports cumulative USD saved. First skill in the ecosystem to report measured rather than estimated savings.
  2. Statusline savings badge: [CAVEMAN] ⛏ 12.4k renders inline in your Claude Code statusline, showing lifetime tokens saved. Default-on in v1.7.0+.
  3. caveman-shrink MCP middleware — Now formally published to npm. Wraps any MCP server and rewrites tool/prompt/resource descriptions in caveman style before they enter the context window.
  4. cavecrew subagents — Three predefined subagents (investigator/builder/reviewer) wired with Claude Haiku/Sonnet/Haiku respectively, demonstrating ~60% reduction in handoff tokens between agents.

v1.6.0 (April 15, 2026) — Security Hardening

  • Symlink attack mitigation in safeWriteFlag() (O_NOFOLLOW + atomic write + 0600 permissions)
  • ESM require error fix when ~/.claude/package.json declares "type": "module"
  • CLAUDE_CONFIG_DIR environment variable now respected by hooks and statusline
  • Natural-language activation ("talk like caveman" / "normal mode") shipped to GA
  • Per-turn reinforcement so caveman style persists across long sessions
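The v1.6.0 symlink mitigation combines three standard techniques. A minimal Python sketch of the same pattern (not Caveman’s safeWriteFlag(), which the article doesn’t show; the function name here is hypothetical): create a fresh temp file with O_EXCL and O_NOFOLLOW at mode 0600, write and fsync it, then atomically rename it over the flag path. Because rename replaces a symlink at the destination rather than following it, a planted link cannot redirect the write.

```python
import os
import secrets


def safe_write_flag(path: str, content: str) -> None:
    """Write content to path without following symlinks, leaving 0600 permissions."""
    directory = os.path.dirname(os.path.abspath(path))
    # Unpredictable temp name in the same directory, so the final rename stays atomic.
    tmp = os.path.join(directory, f".{os.path.basename(path)}.{secrets.token_hex(8)}.tmp")
    # O_EXCL refuses any pre-existing file or link; O_NOFOLLOW (POSIX only) refuses symlinks.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(tmp, flags, 0o600)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            if hasattr(os, "fchmod"):
                os.fchmod(f.fileno(), 0o600)  # exactly 0600, regardless of umask
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        # rename(2) replaces a symlink at `path` instead of writing through it.
        os.replace(tmp, path)
    except BaseException:
        if os.path.lexists(tmp):
            os.unlink(tmp)
        raise
```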

The combined effect of v1.6.0 → v1.8.2 is that Caveman is now a measurable, multi-agent, supply-chain-safe tool. If you adopted it in April for the novelty, the May releases are why teams are now adopting it for budget governance.

Measuring Real Savings: /caveman-stats and Statusline Badge

Until v1.7.0, Caveman’s 65% reduction figure came from the maintainer’s tiktoken benchmarks. v1.7.0 added per-user measurement:

# Inside Claude Code, anytime:
/caveman-stats

# Output (example, real session):
Caveman Savings Last 30 Days
  Output tokens saved: 187,420
  Output tokens spent: 102,180
  Estimated reduction: 64.7%
  USD saved (at Sonnet $15/M out): $2.81
  Lifetime saved: 1.42M tokens / $21.34

The skill reads the session JSONL files Claude Code writes to ~/.claude/projects/<project>/, sums tokens with and without Caveman context, and applies current Anthropic pricing.
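The figures in the example output are internally consistent: 187,420 / (187,420 + 102,180) ≈ 64.7%, and 187,420 × $15 / 1M ≈ $2.81. A Python sketch of that kind of calculation, under two stated assumptions: each JSONL line is an object whose model turns carry message.usage.input_tokens and message.usage.output_tokens, and the “tokens saved” baseline is supplied as an input, since the article doesn’t describe how the skill estimates uncompressed output. Function names are hypothetical.

```python
import json
from pathlib import Path


def sum_usage(jsonl_path: Path) -> tuple[int, int]:
    """Sum input/output tokens in one session log (assumed schema: message.usage)."""
    total_in = total_out = 0
    with jsonl_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or corrupt lines
            message = entry.get("message") if isinstance(entry, dict) else None
            usage = message.get("usage") if isinstance(message, dict) else None
            if isinstance(usage, dict):
                total_in += int(usage.get("input_tokens", 0))
                total_out += int(usage.get("output_tokens", 0))
    return total_in, total_out


def savings_report(saved_out: int, spent_out: int, usd_per_million_out: float) -> dict:
    """Reduction vs. the estimated uncompressed baseline, and the dollar value of saved tokens."""
    reduction = saved_out / (saved_out + spent_out)
    return {
        "reduction_pct": round(100 * reduction, 1),
        "usd_saved": round(saved_out * usd_per_million_out / 1_000_000, 2),
    }


# Figures from the example output above, at $15 per million output tokens.
print(savings_report(187_420, 102_180, 15.0))  # {'reduction_pct': 64.7, 'usd_saved': 2.81}
```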

The statusline badge is the persistent UI for the same data. After installing v1.7.0+, every Claude Code session shows [CAVEMAN] ⛏ 12.4k (or your actual lifetime number) in the statusline. The badge updates after each session. Hide it by setting caveman.statusline = false in your settings.

For teams managing API budgets, the combination of /caveman-stats (per-developer report) + statusline badge (always-visible nudge) makes Caveman the first ruleset in this category with closed-loop measurement rather than estimated savings.

Cavecrew: Multi-Agent Subagent Pattern

The biggest architectural addition in v1.7.0 is cavecrew — a worked example of compressing the handoff layer between Claude Code subagents.

A standard multi-agent pattern (one orchestrator delegating to specialists) wastes tokens on the handoff prose: “I’ll now delegate this to the X agent which will then return…”. Across a long session with multiple delegations, that handoff narration adds up to 30-60% of agent traffic.

Cavecrew defines three subagents with caveman-compressed system prompts and inter-agent contracts:

| Subagent | Model | Role | Handoff style |
|---|---|---|---|
| investigator | Claude Haiku | Search codebase, gather context | Returns bullet-list facts only |
| builder | Claude Sonnet | Write/edit code based on investigator’s report | Reports diffs + tests, no narration |
| reviewer | Claude Haiku | Run lint/typecheck on builder’s output, report regressions | Returns pass/fail + diff to revert |

The orchestrator runs them in sequence (or in parallel where safe). Maintainer benchmarks show ~60% reduction in inter-agent token traffic vs the same workflow with uncompressed defaults.
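The sequential flow and handoff contracts in the table can be sketched as typed handoffs. In this Python sketch the agents are plain callables standing in for real subagent calls; the class and function names are hypothetical, and only the role-to-model mapping and the shape of each handoff come from the table.

```python
from dataclasses import dataclass
from typing import Callable

# Model per role, from the cavecrew table.
ROLE_MODELS = {"investigator": "haiku", "builder": "sonnet", "reviewer": "haiku"}


@dataclass
class Facts:
    """Investigator -> builder contract: bullet-list facts only."""
    bullets: list[str]


@dataclass
class BuildReport:
    """Builder -> reviewer contract: a diff plus test outcome, no narration."""
    diff: str
    tests_passed: bool


@dataclass
class Review:
    """Reviewer verdict: pass/fail plus a diff to revert on failure."""
    passed: bool
    revert_diff: str = ""


def to_handoff(facts: Facts) -> str:
    """Render facts as the terse bullet list passed to the next agent."""
    return "\n".join(f"- {b}" for b in facts.bullets)


def run_crew(
    task: str,
    investigate: Callable[[str], Facts],
    build: Callable[[str, str], BuildReport],
    review: Callable[[BuildReport], Review],
) -> Review:
    """Run investigator -> builder -> reviewer in sequence."""
    facts = investigate(task)
    report = build(task, to_handoff(facts))
    return review(report)
```

In a real setup each callable would dispatch to the corresponding subagent and parse its terse reply into these types; the savings come from every handoff being a bullet list or a diff rather than narrated prose.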

You can install cavecrew standalone (curl ... | bash -s -- --only cavecrew) or use it as a template for writing your own compressed-handoff subagent stacks. For broader Claude Code subagent patterns, see our Claude Code Subagents complete reference.

OpenClaw and opencode Integration

v1.8.0 added first-class support for two newer agent platforms.

OpenClaw

OpenClaw is a self-hosted gateway that exposes Claude (and other model providers) over Slack/Discord/iMessage/MCP. v1.8.0 of Caveman ships a native skill drop at ~/.openclaw/workspace/:

# Detect and install for OpenClaw specifically
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh \
  | bash -s -- --only openclaw

The skill auto-activates per gateway session. If you bridge Slack messages to Claude through OpenClaw, every Slack reply that goes through the gateway is now caveman-compressed — useful when teams pay per-output-token and want predictable cost for support-channel workflows.

opencode

opencode is an open-source coding agent CLI. The v1.8.0 native plugin path:

~/.config/opencode/plugins/caveman/
  ├─ plugin.yml      # opencode plugin manifest
  ├─ rules/          # 10 rules in opencode-compatible markdown
  └─ commands/       # /caveman, /caveman-stats, etc.

Previous releases required wrapping with an npx shim; v1.8.0 writes the plugin files directly, so opencode sees Caveman as a native plugin alongside any others you’ve installed.

Both integrations follow the same philosophy: compression rules are markdown, the installer just figures out the right delivery path per agent. If you’re running a stack with Claude Code + Cursor + OpenClaw + opencode, one installer command wires Caveman into all four.

FAQ

What is Caveman.MD and how does it differ from AGENTS.md or CLAUDE.md?

Caveman is not a new rule file format. It’s a set of instructions (10 rules + 4 compression levels) that drops into your existing AGENTS.md, CLAUDE.md, .cursor/rules, .windsurf/rules, etc. The instructions tell the model to compress its prose output by 65% on average, while leaving code, URLs, file paths, and technical content byte-perfect.

How much money does Caveman actually save?

In benchmarks, Caveman cuts output tokens by an average of 65% (range 22–87%). For a user spending $50/month on Claude output tokens, that’s roughly $32.50/month back if your workload matches the average. Input-side compression (via caveman-compress on CLAUDE.md / memory files) shrinks those files by ~46%, a saving that recurs every session because they are loaded every session.

Does Caveman make Claude less accurate?

Not according to the evidence the project cites, and possibly the opposite. A March 2026 research paper cited in the Caveman docs reports that “constraining large models to brief responses improved accuracy by 26 points on certain benchmarks.” Verbose preambles may correlate with worse reasoning, not better. Caveman removes the preamble while leaving thinking-token computation untouched.

What’s the difference between lite, full, ultra, and wenyan modes?

lite removes filler words only — output still reads like natural prose. full is the default caveman mode, telegraphic but readable. ultra is maximum brevity, slight readability cost. wenyan mimics classical Chinese style and produces the shortest output (often for fun or pure log-style use).

Does Caveman work with Cursor, Windsurf, Cline, Aider?

Yes — the universal installer detects 30+ supported agents and configures each. Cursor gets .cursor/rules/caveman.mdc with alwaysApply: true. Windsurf gets .windsurf/rules/caveman.md with trigger: always_on. Cline reads .clinerules/caveman.md. Aider reads it via .aider.conf.yml.

Can I use Caveman with the official Claude Code plugin system?

Yes. Caveman ships as a Claude Code plugin installed at ~/.claude/plugins/caveman/. The plugin auto-activates, so no per-session opt-in is needed. To uninstall, remove the directory and restart Claude Code.

What is caveman-shrink MCP middleware?

A separate utility that compresses MCP server tool descriptions before they enter the context window. MCP servers often have verbose, multi-paragraph tool descriptions that bloat every session’s startup context. caveman-shrink rewrites these in compressed form, recovering tokens without changing tool functionality.

Does Caveman compress my prompt or only the model’s output?

Only output by default. The model’s input (your prompt + CLAUDE.md + tool descriptions) is unchanged unless you explicitly run caveman-compress against memory files or caveman-shrink against MCP descriptions. The cost-saving is mostly on the output side, which is also where models are most prone to filler.

Can I add Caveman rules manually instead of using the installer?

Yes. The repo’s README lists the 10 rules verbatim — copy them into your AGENTS.md (or CLAUDE.md, or .cursor/rules/caveman.mdc) as plain markdown. The installer is just a convenience for wiring multiple agents at once.

How do I measure actual savings from Caveman in my own sessions?

Run /caveman-stats inside Claude Code (v1.7.0+). It reads your session JSONL files at ~/.claude/projects/<project>/, sums input/output tokens, applies current Anthropic pricing, and reports cumulative USD saved. The same data drives the statusline badge ([CAVEMAN] ⛏ 12.4k), shown by default from v1.7.0 onward. This is the first ruleset in the AGENTS.md category with per-user measured savings rather than maintainer benchmarks.

What is cavecrew and how does it differ from running Claude Code subagents normally?

Cavecrew is a v1.7.0 reference implementation of caveman-compressed subagent handoffs. It defines three subagents — investigator (Haiku), builder (Sonnet), reviewer (Haiku) — with compressed system prompts and bullet-list inter-agent contracts. Maintainer benchmarks show ~60% reduction in handoff tokens vs the same workflow with uncompressed defaults. Install via curl ... | bash -s -- --only cavecrew or use it as a template for your own compressed-handoff subagent stacks.

Does Caveman work with OpenClaw or opencode?

Yes, since v1.8.0 (May 10, 2026). OpenClaw gets a native skill drop at ~/.openclaw/workspace/ — useful when Slack/Discord/iMessage replies go through the gateway and you want output-token compression at the bridge. opencode gets a native plugin at ~/.config/opencode/plugins/caveman/ (no more npx shim). Run the universal installer and it auto-detects both. Combined with Claude Code, Cursor, Codex CLI, and Gemini CLI support, one install command wires Caveman into the 34 supported provider paths.
