Technology Insights

Real-world implementation stories from our engineering team

The Three Meters Never Agree: Putting lean-ctx, rtk, and headroom on the Bench Part 3 of 6

The Three Meters Never Agree: Putting lean-ctx, rtk, and headroom on the Bench

Same real task, same harness, same models, one neutral meter — the provider's bill. A pre-registered protocol, right-of-reply issues filed with every maintainer, and every transcript public. Fourteen runs, ~76M tokens of my own money. What the bench said is not what any of the three meters promised.

How I Built a Capability Test I'd Actually Trust for a Coding Agent

How I Built a Capability Test I'd Actually Trust for a Coding Agent

A new model dropped, and I wanted to know if it was actually better for my agentic harness or just newer and pricier. Cost is the easy half; capability is the hard one — and most agent benchmarks fail because the task is too easy to separate the contestants. So I planted a one-token bug in git's version-comparison code, removed the version-control oracle, and had three Claude models find it blind — Opus, Fable, and Sonnet, one in the driver seat at a time. They all passed; precision and the rate card decided the rest. Plus the methodology twist where I almost reported a regression that wasn't there.

What Models Want: A Love Story About Tool Outputs, Runways, and Reading Minds

What Models Want: A Love Story About Tool Outputs, Runways, and Reading Minds

In 2000, MCP meant 'male chauvinist pig' — and Nick Marshall, the ad man who electrocutes himself into hearing women's thoughts, was Hollywood's prize specimen. In 2026, MCP means Model Context Protocol. We asked six AI models what they actually want from tool output, and one of them read its own harness's source code to answer. The verdict: the harness is the architect of the model's reality — and it never confesses. The models don't want less. They want honesty.

The Token Bill Arrives: Discovering lean-ctx Part 2 of 6

The Token Bill Arrives: Discovering lean-ctx

The first time I decomposed a month's token bill, the whale wasn't intelligence — it was transit. My agents were re-reading, re-sending, and narrating build logs at frontier-model prices. My first instinct was the same as everyone else's: reach for middleware. A whole market had sprung up in a matter of months.

Who Owns the Context Window? - Series Overview Part 0 of 6

Who Owns the Context Window? - Series Overview

A builder's six-part journey from a Claude Code plugin to a coding harness — through token bills, a public middleware benchmark, and a deep research pass — ending at a market-wide question: the platforms are absorbing context management, so where should it actually live?