Who Owns the Context Window?
Every AI coding agent has one scarce resource that matters more than its model, its tools, or its prompt: the context window, and with it the decision about what the model actually sees, each turn, in each phase of its work. Tokens are the line item on the invoice. Whoever owns the window decides what becomes a token at all.
For the past year I've been forced to answer that question the practical way, by building the same system twice. Forge, the SDLC engine I build at Entelligentsia, runs as a Claude Code plugin and as forge-cli, a standalone harness on the pi coding agent. Along the way I adopted context middleware, wrote a compression library, built a full context governor — and then commissioned a deep research pass that told me the platforms were absorbing the entire layer I'd been building.
This series is that journey, told in order, with the research woven in where the journey earned it. It's written for developers building with agents and for the managers paying for them. No part requires the others, but they compound — the way the problem did.
The six parts
Part 1 — Same Brain, Two Bodies: Forge as a Plugin, Forge as a Harness
Forge lives in two bodies. The plugin's orchestrators were prose — markdown an LLM executed — until Claude Code GA'd Dynamic Workflows last week and the port was so natural it felt like the platform shipping the part I'd been simulating. The second body, forge-cli, was code from day one. The two are converging on everything except one row of the comparison table: what my agents see, each turn, in each phase. That row is the series.
Part 2 — The Token Bill Arrives: Discovering lean-ctx
The first time I decomposed a month's token bill, the whale wasn't intelligence; it was transit. My first instinct was everyone's: reach for middleware. lean-ctx, rtk, and headroom form a real market, built in a matter of months on a measurable problem, each with its scalpel at a different layer. And somewhere between the install and the invoice, three questions that none of the available meters could answer started nagging.
Part 3 — The Three Meters Never Agree: Putting lean-ctx, rtk, and headroom on the Bench
Same real task, same harness, same models, one neutral meter — the provider's bill. A pre-registered protocol frozen in git before the runs, right-of-reply issues filed with every maintainer, every transcript public. What the bench said is not what any of the three meters promised — and the differences between the meters became the finding.
Part 4 — Then I Asked Whether Any of This Should Exist
Having built the layer three times, I stopped building and ran the research: a 99-agent deep-research workflow, 78 extracted claims, 25 adversarially verified. The verdict was uncomfortable. Anthropic has absorbed nearly the entire middleware surface into server-side primitives; OpenAI commoditized compaction into an endpoint; Google made the question invisible. I wasn't reading a market survey — I was triaging my own roadmap.
Part 5 — Who Audits the Meter?
This one began as my own accusation: you charge me for every token, compress before it hits your infrastructure, and pocket the difference. The charge failed on the facts — billing is post-edit, and every API response carries the proof. But it survived in sharper form: the per-request savings data evaporates before it reaches any dashboard. Cache savings get a first-class billing line; editing savings, you take on faith. Trust-me metering, and the question only a governor-builder knows to ask.
Part 6 — Where I Land: The Governor Belongs in the Harness
The conviction the journey produced. Middleware is transitional: the platforms ship your roadmap. Platform absorption is real but arrives below the line, as opacity. The durable home for context management is the harness layer: open, pluggable, phase-aware, cache-respecting, audit-trailed. And a watchlist: when the opt-in betas become defaults, and whether anyone builds the cross-provider token accounting that no vendor has an incentive to build.
Why you might care
If you build with agents: this is a field guide to a layer of your stack that is moving under your feet, written by someone who built on it before and after it moved.
If you manage teams that build with agents: Parts 2, 3, and 5 are about money — where token spend actually goes, what an independent benchmark says about saving it, and why no vendor dashboard will show you the number you most need.
If you build developer tools: Parts 3 and 4 are the uncomfortable ones. Read them before your next roadmap review.
Forge is open source: the Claude Code plugin and forge-cli, built on the pi coding agent. New parts publish weekly.