May 22, 2026

Why your multi-agent workflow costs 10x what you budgeted in 2026

Last updated May 22, 2026

Four agents coordinating a market-research task entered an infinite loop in 2026. The Analyzer kept asking the Verifier for confirmation, and the Verifier kept asking back. Eleven days later the bill was $47,000, run up under monitoring dashboards, Slack alerts at 80 percent of budget, and a provider-level cap that did nothing to stop it. The same task, run by a single agent, costs about twenty cents.

Tokens are the units a language model bills on, each roughly a word or a punctuation mark. Every token the agent reads in or writes back gets counted. Once you accept that, the rest of the math becomes obvious.

The triangular tax

Every time an agent takes a step, the model re-reads the entire conversation up to that point. Message 201 costs as much to send as messages 1 through 200 combined. The bill doesn’t grow linearly with the number of steps. It grows with roughly the square of it.
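A minimal sketch of that arithmetic, assuming every message is the same size (the `tokens_per_message` default is a made-up figure) and that each step re-sends all earlier messages as input with no caching:

```python
def cumulative_input_tokens(steps: int, tokens_per_message: int = 500) -> int:
    """Total input tokens billed across `steps` steps when each step
    re-reads every earlier message (assumed: equal-sized messages, no caching)."""
    total = 0
    for step in range(1, steps + 1):
        # Step k re-sends the k - 1 messages that came before it.
        total += (step - 1) * tokens_per_message
    return total


def closed_form(steps: int, tokens_per_message: int = 500) -> int:
    """Same quantity via the triangular number 0 + 1 + ... + (steps - 1)."""
    return tokens_per_message * steps * (steps - 1) // 2


if __name__ == "__main__":
    for n in (10, 100):
        print(n, cumulative_input_tokens(n), closed_form(n))
```

Under these assumptions, 10 steps bill 22,500 input tokens and 100 steps bill 2,475,000. Ten times the steps costs about 110 times the input.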

Anthropic’s prompt-caching announcement put a public number on this overhead for the first time. Cached content gets billed at 10 percent of the normal rate. Anthropic calls that “up to 90% cost reduction” in its own pricing copy. Read it the other way around: without caching, the bulk of an agent’s bill is paying for the same context, again and again, while the actual thinking is a rounding error.

An April 2026 Stanford paper argues that most claimed multi-agent advantages are “better explained by unaccounted computation and context effects rather than inherent architectural benefits.” The thing being paid for is rarely the intelligence.

Fan-out makes it worse

Throwing more agents at a hard problem is the natural response. The math punishes it.

Anthropic published exactly what its own multi-agent research system costs to run: about 15x the tokens of an equivalent chat, before any of the actual work gets done. The justification in the same post is that the architecture suits tasks “where the value of the task is high enough to pay for the increased performance.” Honest, but it doesn’t make the bill smaller.

Cognition, the team behind Devin, argues against multi-agent designs altogether. Their “Don’t Build Multi-Agents” position holds that one agent running end to end, with a separate compression model, beats peer collaboration in nearly every production case the company has shipped. The Stanford paper finds the same thing under controlled budgets. When you give a single agent the same token allowance as a multi-agent swarm, it matches or beats the swarm on multi-hop reasoning. The supposed gains from peer collaboration look like an artifact of giving the swarm more rope.

Don’t build multi-agents. A single agent with shared context beats peer collaboration in nearly every production setting we’ve shipped.

— Cognition AI, Don't Build Multi-Agents

Multi-agent earns its 15x on a narrow class of work: genuinely breadth-first tasks, information that exceeds a single context window, or too many complex tools for one agent to coordinate. Anthropic’s strongest reported result was a 90 percent gain on parallel lookups across S&P 500 board memberships, where every lead can be chased independently. Most production workloads don’t look like that, which is why most production agents shouldn’t be multi-agent.

Retries don’t retry, they double

Most frameworks replay the full conversation on every retry. A 20 percent per-step failure rate compounds into roughly 2x the baseline bill, not 1.2x. Each failed step pays the accumulated context cost a second time.

A 2026 retry-budget analysis pinned the math. At a 20 percent step-failure rate, a three-step agent runs at 1.7 to 1.9x baseline, and a five-step agent at 2.2 to 2.5x. The same write-up documents a single user session that burned 1.67 billion tokens in five hours, costing between $16,000 and $50,000, because a context-reloading bug fired over and over.
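Why the multiplier climbs with run length can be seen in a toy model. The sketch below is not the cited analysis's method and does not reproduce its figures. It assumes step k costs k units (it re-reads k messages of context), that failures are independent, and that any failure restarts the run from step 1. It compares that against a policy that re-sends only the failed step. Both function names are illustrative.

```python
def per_step_retry_multiplier(p_fail: float) -> float:
    """Expected cost multiplier when only the failed step is re-sent."""
    return 1.0 / (1.0 - p_fail)


def full_replay_multiplier(steps: int, p_fail: float) -> float:
    """Expected cost multiplier when any failure restarts the run from step 1.

    Toy assumptions: step k costs k units, failed steps are still billed,
    and each step fails independently with probability p_fail.
    """
    p_ok = 1.0 - p_fail
    baseline = steps * (steps + 1) / 2  # cost of one clean run
    # Expected cost of a single attempt, which may stop early at a failure:
    # step k is only reached if the k - 1 steps before it all succeeded.
    cost_per_attempt = sum(k * p_ok ** (k - 1) for k in range(1, steps + 1))
    # Expected number of attempts until one runs clean end to end.
    expected_attempts = 1.0 / p_ok ** steps
    return cost_per_attempt * expected_attempts / baseline


if __name__ == "__main__":
    print(round(per_step_retry_multiplier(0.2), 2))
    for n in (3, 5, 10):
        print(n, round(full_replay_multiplier(n, 0.2), 2))
```

At a 20 percent failure rate this model gives a flat 1.25x for per-step retries, against roughly 1.47x for a three-step run and 1.75x for a five-step run under full replay. The exact numbers depend on how retries are accounted for. What carries over is that full replay gets more expensive the longer the run.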

Datadog logged the production side of this in its 2026 State of AI Engineering report. Five percent of model calls return errors in production, and 60 percent of those errors are rate limits. The provider refused to take the call. The request never ran. The retry chain tries again, and the bill keeps growing while no work gets done.

Multi-agent systems use about 15× more tokens than chats. Token usage alone explains 80% of performance variance.

— Anthropic, How we built our multi-agent research system

The fixes that stick

ProjectDiscovery’s cache hit rate moved from 7 percent to 84 percent over two months. The change was structural: the parts of the prompt that change every call got moved out of the cacheable section, so the cache actually held across calls. The result was a 59 percent reduction in overall token bill, with 9.8 billion tokens served from cache. Two of their tasks ran at identical token volume but 91.8 percent and 3.2 percent cache rates. The cost difference between them was about 60x.
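The structural idea can be illustrated with a small sketch. This is not ProjectDiscovery's code: the prompt blocks, `build_prompt`, and `shared_prefix_chars` are all hypothetical. It assumes prefix-based caching, where only the leading part of a prompt that is identical between calls can be reused, and it measures that in characters as a rough proxy for tokens.

```python
import os

# Hypothetical blocks that stay identical on every call.
STATIC_PREFIX = [
    "SYSTEM: You are a security triage agent.",
    "TOOLS: scan(target), report(finding)",
]


def build_prompt(volatile: list[str], volatile_first: bool) -> str:
    """Join the prompt blocks, putting per-call content first or last."""
    blocks = volatile + STATIC_PREFIX if volatile_first else STATIC_PREFIX + volatile
    return "\n".join(blocks)


def shared_prefix_chars(a: str, b: str) -> int:
    """Length of the leading text two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


if __name__ == "__main__":
    call_1 = ["TIME: 2026-05-22T10:00:00Z", "TARGET: a.example"]
    call_2 = ["TIME: 2026-05-22T10:00:05Z", "TARGET: b.example"]
    for volatile_first in (True, False):
        p1 = build_prompt(call_1, volatile_first)
        p2 = build_prompt(call_2, volatile_first)
        print(volatile_first, shared_prefix_chars(p1, p2), "of", len(p1))
```

With the timestamp at the top, the two prompts diverge within the first line and almost nothing is reusable. With the stable blocks first, the whole static section is shared. Real providers match on tokens and have their own rules about cache boundaries and minimum lengths, so treat this only as a picture of the ordering principle.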

Anthropic’s own pricing sets the ceiling. Cached reads cost 10 percent of fresh input, so a workload served almost entirely from cache runs at 10 to 20 percent of what an uncached one does. The 90 percent figure in the marketing is that best case. ProjectDiscovery’s 59 percent is closer to what a careful team actually reaches.
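A rough way to see where the 10 to 20 percent range comes from, assuming cached reads are billed at 10 percent of fresh input and ignoring output tokens and any premium for writing to the cache (the function name is illustrative):

```python
def relative_input_cost(cache_hit_rate: float, cached_read_price: float = 0.10) -> float:
    """Input cost as a fraction of the fully uncached cost.

    cache_hit_rate: share of input tokens served from cache (0.0 to 1.0).
    cached_read_price: price of a cached token relative to a fresh one.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    fresh_share = 1.0 - cache_hit_rate
    return fresh_share + cache_hit_rate * cached_read_price


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9, 1.0):
        print(rate, round(relative_input_cost(rate), 3))
```

A 90 percent hit rate lands at 19 percent of the uncached input cost, and only a 100 percent hit rate reaches the 10 percent floor. Output tokens and cache writes are billed on top, which is one reason a real bill falls by less than the headline figure.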

Beyond caching, two changes show up at every team that has cut its agent bill. The work that doesn’t need the most expensive model gets routed to a cheaper one, like Haiku or Flash. That strips a large chunk off the bill at almost no quality cost. And budget ceilings get enforced inside the agent loop, before the call lands at the provider. The $47,000 incident had monitoring at the dashboard and alert layers. Neither layer is fast enough to stop a runaway call mid-flight.
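What an in-loop ceiling might look like, as a minimal sketch rather than any particular framework's API: `TokenBudget`, `BudgetExceeded`, `run_agent`, and the `call_model` callable (assumed to return the reply text and the tokens it used) are all hypothetical names, and the four-characters-per-token estimate is a crude heuristic.

```python
class BudgetExceeded(RuntimeError):
    """Raised before a call is sent when it would cross the ceiling."""


class TokenBudget:
    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.spent = 0

    def reserve(self, estimated_tokens: int) -> None:
        # Checked before the request leaves the process, not after the bill.
        if self.spent + estimated_tokens > self.max_tokens:
            raise BudgetExceeded(
                f"would spend {self.spent + estimated_tokens} of {self.max_tokens} tokens"
            )

    def record(self, actual_tokens: int) -> None:
        self.spent += actual_tokens


def run_agent(call_model, budget: TokenBudget, max_steps: int = 50,
              reply_headroom: int = 1000) -> list[str]:
    """Run the loop, refusing any step the budget cannot cover."""
    history: list[str] = []
    for _ in range(max_steps):
        # Crude estimate: ~4 characters per token of context, plus reply headroom.
        estimate = sum(len(m) for m in history) // 4 + reply_headroom
        budget.reserve(estimate)            # raises before the call is made
        reply, tokens_used = call_model(history)
        budget.record(tokens_used)
        history.append(reply)
        if reply == "DONE":
            break
    return history
```

The check runs synchronously before each call, so a looping agent stops at the first step it cannot afford instead of at the next dashboard refresh. The `max_steps` cap is a second, independent brake on a loop that never finishes.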

The Monday-morning test

Most agent teams in 2026 still wire their budget enforcement the same way the $47,000 team did, on dashboards and alerts that fire after the spend has already happened. The next twelve months will sort agent shops into two groups: teams that treat token spend as a real budget constraint inside the agent loop, and teams that find out on Monday morning what their agents did over the weekend.

The Integration Layer

Cost, context, and architecture for agentic AI.