The Caveman Principle

Why AI Pricing Is Still Broken

A tool called Caveman hit #1 on Hacker News with 688 points.

It makes Claude speak like a prehistoric human.

Instead of: "Great question! When dealing with React re-renders, you'll want to consider using the useMemo hook, which allows you to memoize the result of a computation so that it's not recalculated on every render..." (1,180 tokens)

You get: "New object ref each render. Wrap in useMemo." (159 tokens)

No articles. No pleasantries. 87% fewer tokens. 688 people thought this was worth upvoting.

That's not a fun hack. That's revealed preference about what's broken in AI pricing.

The Real Cost Behind the Humor

Caveman exists because tokens cost money. Not abstractly — concretely, operationally, in a way that changes developer behavior.

The benchmarks in the repo are striking: React explanation goes from 1,180 to 159 tokens. PostgreSQL setup: 2,347 to 380. Average savings across the repo's benchmarks: 65%. For a developer making hundreds of API calls per day, this isn't optimization — it's survival math.
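The survival math is easy to check. Here's a quick sketch that recomputes the savings from the repo's two headline benchmarks; the per-token price is a hypothetical placeholder, not any provider's actual rate:

```python
# Token counts are from the Caveman repo's benchmarks.
# The price below is an illustrative placeholder, not a real rate card.
benchmarks = {
    "React explanation": (1180, 159),
    "PostgreSQL setup": (2347, 380),
}

PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # hypothetical rate, USD

for name, (before, after) in benchmarks.items():
    saved = 1 - after / before
    cost_before = before / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    cost_after = after / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    print(f"{name}: {saved:.0%} fewer tokens "
          f"(${cost_before:.4f} -> ${cost_after:.4f} per call)")
```

The two quoted benchmarks save 87% and 84% respectively; the 65% average presumably reflects the repo's wider benchmark set. Multiply the per-call delta by hundreds of calls a day and the repo's popularity stops looking like a joke.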

But here's what the Caveman readme doesn't say: why is anyone building token compression tools at all?

Because the pricing model that governs AI access was designed for a world that no longer exists.

How We Got Here

The mental model behind $20/month AI subscriptions comes from streaming. Netflix charges one price regardless of how many hours you watch. This works because human attention is bounded — nobody watches Netflix for 22 hours a day. The math holds.

AI subscriptions inherited this logic. Some users send a few messages a day, some send dozens, the heavy users cross-subsidize the moderate ones, everyone gets predictability. The abstraction holds — until agents arrive.

An agent doesn't have attention. It doesn't pause to think. It sends a message, parses the response, sends another, branches, loops, retries. A background agent working on a codebase overnight makes 500–2,000 API calls. A customer support agent runs continuously, with zero idle time.

A human power user might send 50 messages on a busy day. An agent sends 50 messages before you finish your morning coffee.

The flat subscription model doesn't fail for agents because providers are being restrictive. It fails because the math never worked. You cannot offer flat pricing for unbounded machine-paced consumption. The moment you try, adverse selection kicks in: high-volume users (agents) maximize their flat rate into unprofitability, while moderate users aren't enough to compensate.
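A back-of-envelope comparison makes the adverse selection concrete. Every number here is an assumption chosen for illustration (call volumes echo the figures above; prices and token counts are invented), not any provider's actual economics:

```python
# Why flat pricing breaks at machine pace. All numbers are hypothetical.
FLAT_MONTHLY_USD = 20.0
PRICE_PER_1K_TOKENS = 0.002   # blended input+output cost, assumed
TOKENS_PER_CALL = 3000        # context + response, assumed

def monthly_cost(calls_per_day: float) -> float:
    """Provider-side token cost for a month at this call rate."""
    return calls_per_day * 30 * TOKENS_PER_CALL / 1000 * PRICE_PER_1K_TOKENS

human = monthly_cost(50)      # busy human power user
agent = monthly_cost(2000)    # overnight background agent

print(f"human power user: ${human:.2f}/mo against ${FLAT_MONTHLY_USD} flat")
print(f"background agent: ${agent:.2f}/mo against ${FLAT_MONTHLY_USD} flat")
```

Under these assumptions the human costs $9 a month and the flat rate is profitable; the agent costs $360 and the flat rate is underwater by more than an order of magnitude. No amount of cross-subsidy from moderate users closes that gap.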

The Cascade

When Anthropic blocked third-party agentic tools from Claude subscriptions recently, the developer community erupted. OpenClaw users lost access overnight. Threads hit the front page.

Most of the anger targeted Anthropic's timing. But their technical explanation was honest: "Our subscriptions weren't built for the usage patterns of these third-party tools."

That's not spin. Anthropic's Boris Cherny spelled out the actual problem: their own Claude Code tool is built to maximize "prompt cache hit rates" — reusing previously processed context to save compute. Third-party tools aren't optimized this way. The math only works if caching works. Third-party tools break the math.
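Cache hit rate dominating the economics is easy to see in miniature. This sketch assumes cached input tokens are billed at roughly a tenth of the fresh-input rate — an illustrative figure, so check your provider's actual prompt-caching prices:

```python
# Why the math "only works if caching works". Prices are illustrative.
INPUT_PRICE = 0.003               # USD per 1K fresh input tokens, assumed
CACHED_PRICE = INPUT_PRICE / 10   # cached reads assumed ~10x cheaper

def input_cost(context_tokens: int, cache_hit_rate: float) -> float:
    """Input cost of one call, splitting context into cached vs fresh tokens."""
    cached = context_tokens * cache_hit_rate
    fresh = context_tokens - cached
    return (fresh * INPUT_PRICE + cached * CACHED_PRICE) / 1000

# Same 100K-token context, very different bills:
optimized = input_cost(100_000, cache_hit_rate=0.9)  # cache-optimized client
naive = input_cost(100_000, cache_hit_rate=0.0)      # third-party tool, uncached
print(f"90% cache hits: ${optimized:.3f} per call")
print(f" 0% cache hits: ${naive:.3f} per call")
```

Under these assumptions the uncached tool pays over five times more per call for the same context. Run that across thousands of agent calls and "third-party tools break the math" is just arithmetic.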

So now you have two communities responding to the same underlying problem through different lenses:

Caveman: make the AI say less so it costs less.
Claude Code: cache aggressively so compute gets reused.

Both are correct. Both are workarounds for a pricing model that wasn't built for agents.

What the Right Model Looks Like

The honest answer is that agentic workloads need usage-based billing. Pay per token, with caching as a first-class optimization lever.

This sounds worse for developers. It's actually better, for three reasons:

Transparent costs. You know exactly what an agent run costs. You can set spending limits, alert thresholds, kill switches. With flat subscriptions, cost is opaque until you get cut off.

Aligned incentives. When you pay per token, you're motivated to minimize waste. Caveman's value proposition is identical to cache optimization — both exist because the cost signal is real. Real cost signals create better software.

Predictable unit economics. If you're building a product on top of an AI API, your costs should scale with your usage, not with your provider's estimate of average human behavior. Subscription pricing makes agent cost modeling impossible. API pricing makes it straightforward.

The developers who will win in the machine-paced era are those who internalize this now. Not as a constraint — as a design principle. Every agent you build should have a token budget. Every workflow should have a cache strategy. Every API call should have a cost attribution.
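As a minimal sketch of that design principle — budget, alert threshold, kill switch, cost attribution — something like the following; the class and its interface are hypothetical, not any framework's API:

```python
# Hypothetical per-agent token budget with an alert line and a kill switch.
class TokenBudget:
    def __init__(self, limit: int, alert_at: float = 0.8):
        self.limit = limit        # hard cap in tokens
        self.alert_at = alert_at  # fraction of limit that triggers a warning
        self.spent = 0

    def charge(self, tokens: int, attribution: str) -> None:
        """Record usage against a named step; raise once the cap is hit."""
        self.spent += tokens
        if self.spent >= self.limit:
            raise RuntimeError(f"token budget exhausted by {attribution}")
        if self.spent >= self.limit * self.alert_at:
            print(f"warning: {self.spent}/{self.limit} tokens ({attribution})")

budget = TokenBudget(limit=50_000)
budget.charge(30_000, attribution="plan-step")
budget.charge(12_000, attribution="code-edit")  # crosses the 80% alert line
```

The attribution string is the point: when every call is charged to a named step, a runaway loop shows up as a line item instead of a surprise invoice.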

The Structural Truth

Caveman is funny. 688 people upvoted it. But the reason it exists is that the developer community is paying real money for tokens and knows it.

The streaming subscription model was built for human-paced consumption. We're in the machine-paced era now. The pricing models, the billing abstractions, the infrastructure assumptions — they all need to be rebuilt around a simple truth:

Agents don't pause. Pricing models that assume they do are priced for the world we left behind.

The flat subscription is ending for agents. Not because providers are hostile. Because the math is.


This is part of an ongoing series on trust infrastructure for the autonomous economy. Earlier essays: The Agent Passed All the Checks. That Was the Problem., Declarations Are Gameable, Who Decides What Agents Are Allowed to Buy? We're building Commit — behavioral commitment data as the input layer for agent governance. Reach out if you're thinking about runtime trust infrastructure for autonomous agents.
