June 1, 2026

You can't underwrite a bill you can't reproduce

Give an AI agent the same job twice and the two bills won’t match. An agent is software that works through a task on its own, deciding as it goes how much work the job needs, so the cost comes out different each time you run it. Not wildly different on a calm day. Different enough that the number you hand finance for next quarter is a guess, not a quote.

This is the cost problem nobody warns you about up front. Traditional software has a price you can quote in advance. An agent’s cost is a range you can only put a number on after the fact. You don’t find out what a run cost until the agent has decided, on its own, how much work to do.

The reason is structural and a better model won’t fix it. Ordinary software does a fixed amount of work and bills a fixed amount. An agent reads a result, decides what to do next, and may loop twice or twenty times. How much it costs isn’t set when it starts. It’s set by choices the model makes while it’s running, against inputs you haven’t seen yet.

The pilot hides the variance

This is why a pilot looks so clean. A pilot runs on curated inputs: one or two operators, a short list of cases someone picked on purpose. The costs cluster because the inputs do. Production is the opposite, with messy inputs, many users at once, and edge cases nobody staged. The variance was there the whole time. A pilot just never has enough runs, or strange enough ones, to show the tail.

Gartner expects most of these projects to die at exactly that seam.

“Over 40% of agentic AI projects will be canceled by the end of 2027, due to escalating costs, unclear business value or inadequate risk controls.”

Gartner, June 2025

Those three look like separate problems. They’re the same one. An agent you can’t predict, on cost or on behavior, is one you can’t price and can’t defend to a board. The capability was never what stood in the way.

What finance is being asked to sign

Put yourself on the other side of the budget. Most software you buy is a fixed monthly cost: the same number this month, next month, and every month after. An agent asks finance to sign off on a number the system itself can’t promise. The forecast isn’t a single figure; it’s a wide range, and the worst case sits a long way above the typical one. No CFO underwrites a range that wide on a system still proving it’s worth the spend.

The agent projects that stall here almost never stall on capability. They stall on a finance team that can’t get a straight answer to one question: what does this cost next month. The engineering works. The number underneath it won’t hold still long enough to defend.

The FinOps Foundation, the people whose entire job is accounting for cloud spend, put it plainly for AI workloads: “predictability is generally lower,” and “higher volatility of costs makes trend-based forecasting in general more challenging.” Their own guidance is that you have to revise the forecast far more often than you would for ordinary cloud.

The size of the move is easy to feel. At twenty-five dollars per million output tokens on the top model, the difference between a run that answers in one pass and one that keeps going, re-reading everything it has done so far each time it continues, is the difference between a rounding error and a line item finance asks about. Same task, same prompt, a different decision made mid-run.
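To put rough numbers on that gap, here is a minimal sketch. Only the twenty-five dollars per million output tokens comes from the text above. The input price, the prompt size, and the tokens produced per step are assumptions picked for illustration, and `run_cost` is a hypothetical helper, not any vendor’s billing formula.

```python
# Output price is the figure quoted in the article.
OUTPUT_PRICE_PER_MTOK = 25.00
# ASSUMPTION: input price is not given in the article; 5.00 is a placeholder.
INPUT_PRICE_PER_MTOK = 5.00


def run_cost(steps, prompt_tokens=2_000, output_tokens_per_step=1_000):
    """Dollar cost of one run in which every step re-reads all prior context.

    ASSUMPTION: token counts are illustrative defaults, and each step's
    output is appended to the context that the next step reads.
    """
    context = prompt_tokens
    input_tokens = 0
    output_tokens = 0
    for _ in range(steps):
        input_tokens += context              # re-read everything so far
        output_tokens += output_tokens_per_step
        context += output_tokens_per_step    # this step's output joins the context
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


one_pass = run_cost(steps=1)
twenty_steps = run_cost(steps=20)
print(f"one pass:    ${one_pass:.3f}")      # $0.035
print(f"twenty steps: ${twenty_steps:.2f}")  # $1.65
print(f"ratio: {twenty_steps / one_pass:.0f}x")
```

Under these assumptions, twenty steps cost about 47 times one pass, not 20 times, because the input side grows with every re-read. The exact multiple depends entirely on the assumed prices and token counts.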

Figure (illustrative): A deterministic call lands at one price every time. An agent decides at runtime how many steps to take, so its cost spreads into a long right tail. A spend cap chops the tail off, which doesn’t make those runs easier to predict. It makes where they fail harder to predict.

A cap truncates the work, not the risk

The standard answer is a spend cap. Set a ceiling, sleep at night. A cap does stop the catastrophic run, and every agent in production should have one. But look at what it does to the picture above. It doesn’t narrow the distribution. It draws a vertical line and discards everything to the right of it.

The runs in that tail ran up the cost for different reasons. Some were genuinely hard cases that needed more work to reach a right answer. Others were a loosely scoped agent wandering because nothing told it when to stop. A cap can’t tell the two apart. It stops both half-finished, and a half-finished run can still return something confident and wrong. You haven’t removed the uncertainty; you’ve moved it from the invoice to the output, where it’s harder to see and far worse to explain to whoever relied on the answer.
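A toy sketch of that blind spot, with made-up runs. The `reason` label is something this simulation knows and a real cap never sees; the step counts and the cap of ten steps are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    steps_needed: int  # steps the agent would take if left alone (hypothetical)
    reason: str        # why it needs them; invisible to the cap


def apply_cap(runs, cap_steps):
    """Split runs into those that finish and those the cap stops early.

    The decision reads only steps_needed, never reason.
    """
    completed = [r for r in runs if r.steps_needed <= cap_steps]
    cut_off = [r for r in runs if r.steps_needed > cap_steps]
    return completed, cut_off


# ASSUMPTION: an invented workload, mostly routine with a few tail runs.
RUNS = [
    Run(2, "routine"),
    Run(3, "routine"),
    Run(2, "routine"),
    Run(4, "routine"),
    Run(14, "hard_case"),
    Run(18, "wandering"),
    Run(25, "hard_case"),
]

CAP_STEPS = 10
completed, cut_off = apply_cap(RUNS, CAP_STEPS)

print(f"{len(completed)} runs finished under the cap")
for run in cut_off:
    print(f"stopped at step {CAP_STEPS} of {run.steps_needed}: {run.reason}")
```

The spend is now bounded, but all three tail runs end at step ten with partial work, and nothing in the cap’s logic distinguishes the two hard cases from the one that was wandering.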

What makes a run forecastable

The fix isn’t a tighter ceiling. It’s making the run reproducible enough that its cost lands in a narrow band before any cap is involved. That means bounding what the agent is allowed to do, not only what it’s allowed to spend. A ceiling on how much work one run can take on. A defined set of tools instead of open-ended reach. A prompt that spells out the goal and the exact shape of the output, so the model isn’t left improvising to fill in what you left vague.
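One way those three bounds might look in code, as a sketch under stated assumptions. `RunBounds`, `check_tool_call`, `check_output`, and `worst_case_cost` are hypothetical names, not any framework’s API; the tool names, token figures, and input price are invented. The cost model also assumes each step re-reads the full context and that tool results fit inside the per-step token ceiling, which is a simplification.

```python
from dataclasses import dataclass


class BoundsViolation(Exception):
    """Raised when a run tries to step outside its declared bounds."""


@dataclass(frozen=True)
class RunBounds:
    max_steps: int                   # ceiling on work, not on spend
    allowed_tools: frozenset         # a defined tool set, no open-ended reach
    required_output_keys: frozenset  # the exact shape of the final answer
    max_output_tokens_per_step: int


def check_tool_call(bounds, step_index, tool_name):
    """Reject a step before it runs. step_index is 0-based."""
    if step_index >= bounds.max_steps:
        raise BoundsViolation(f"step {step_index + 1} exceeds max_steps={bounds.max_steps}")
    if tool_name not in bounds.allowed_tools:
        raise BoundsViolation(f"tool {tool_name!r} is not in the allowed set")


def check_output(bounds, output):
    """Require the final output dict to have exactly the declared keys."""
    if set(output) != set(bounds.required_output_keys):
        raise BoundsViolation("output does not match the declared shape")


def worst_case_cost(bounds, prompt_tokens, input_price_per_mtok, output_price_per_mtok):
    """Upper bound on one run's dollar cost, computable before the run starts.

    ASSUMPTION: every step re-reads the whole context and emits the
    maximum allowed tokens.
    """
    context = prompt_tokens
    input_tokens = 0
    output_tokens = 0
    for _ in range(bounds.max_steps):
        input_tokens += context
        output_tokens += bounds.max_output_tokens_per_step
        context += bounds.max_output_tokens_per_step
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000


# ASSUMPTION: illustrative bounds for an invented invoice-lookup agent.
EXAMPLE_BOUNDS = RunBounds(
    max_steps=6,
    allowed_tools=frozenset({"search_orders", "get_invoice"}),
    required_output_keys=frozenset({"status", "amount"}),
    max_output_tokens_per_step=1_000,
)

ceiling = worst_case_cost(EXAMPLE_BOUNDS, prompt_tokens=2_000,
                          input_price_per_mtok=5.0, output_price_per_mtok=25.0)
print(f"worst case per run: ${ceiling:.3f}")  # $0.285
```

The design choice is that the ceiling falls out of the bounds themselves. Once the step limit and per-step token limit are declared, the worst case is arithmetic you can do before the run, not a figure you discover on the invoice.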

Do that and the fat curve pulls in toward the spike. The bill becomes a number you can quote, because the run becomes a thing you can repeat. That is the work, and it’s less glamorous than autonomy. It’s also the entire distance between a demo that wowed a steering committee and a system finance will sign off on twice.

The agents that reach production this year won’t be the most autonomous. They’ll be the ones whose cost you can name before they run. Every vendor is selling more independence and less supervision. The agents that clear a budget review will be the ones that gave some of it back.
