May 29, 2026
Anthropic is about to become the first profitable AI lab. Every Opus 4.8 default is quietly tuned to make you spend more.
Last updated May 29, 2026
Anthropic released Opus 4.8 today and put an effort control next to the model selector in the Claude app. The slider itself is not new, Claude Code has had it since 4.7, but now it sits in front of every consumer user, and it defaults to high. Read what Anthropic says the lower setting does: on lower effort settings, Claude “will respond faster and use up a user’s rate limits more slowly.” Read it the other way around. The setting they turn on by default burns your limit faster than the one they leave off.
I spent the last piece on why a multi-agent workflow bills 10x what you budgeted. That was the mechanism. What should bother you more is who builds it into the defaults. The most candid research lab on agent cost keeps setting every product default to the expensive end of the curve it spent two years documenting.
Anthropic is the honest one
Anthropic publishes numbers most vendors bury. When their research team wrote up how they built their own multi-agent system, they put the cost in plain text: about 15x the tokens of a normal chat, and token usage alone explaining 80 percent of the performance difference. No other frontier lab tells you that about its own flagship architecture.
Their guidance on context engineering is sharper still. The whole point, they argue, is restraint: find the fewest tokens that still get the job done, because the model works against a finite attention budget and every wasted token degrades the result. That is the correct advice. It is also the exact opposite of what their product surface now nudges you to do.
Context engineering is finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome.
Every new default points the same way
Opus 4.8’s headline feature is workflows: Claude can “plan the work and then run hundreds of parallel subagents in a single session.” That is a real capability, and for a genuine codebase-wide migration it earns its keep. But read it next to the 15x number from the same company. A multi-agent run is the worst case for token spend there is. Every subagent carries its own copy of the context, every one of them replays that context on every step, and the coordinator pays again to stitch the results back together. Anthropic told you this in its own engineering post. The workflows feature productizes the most expensive shape an agent run can take and puts it one click from default.
The effort slider does the same thing at the level of a single prompt. High effort means Claude “will think more frequently and more deeply,” and thinking tokens bill at the full output rate. The model defaults to high because, in Anthropic’s words, it gives “the best overall balance of quality and user experience.” It also quietly defaults every casual question to the setting that drains your weekly allowance fastest.
None of these features is a trick. Each one is genuinely useful on the task it was built for. The problem is the direction they all point. Longer runs, deeper thinking, more parallelism, more context held in memory. Every default released this year moves you up the same cost curve Anthropic’s own research arm tells you to move down.
You see a limit, not a bill
Most people running Claude on a Max or Team plan never see a token count. They see a usage bar that empties, and a message that says come back in a few hours. The feedback that would tell you a feature is expensive has been abstracted away into a limit that just arrives sooner.
That abstraction used to be cushioned by subsidy. It is not anymore. The industry has started calling the pattern tokenmaxxing, and PulseMCP’s recent write-up notes that Anthropic has “largely pulled out of subsidizing token costs for individuals” on its Max plans. When the slack comes out of the plan, every default that spends more shows up directly as a limit you hit at lunch instead of dinner.
For a single user that is an annoyance. For a company running Claude across a team, it is a budget line nobody can read. Productive work and pure token-burn produce the same signal: a higher bill and a depleted limit. You cannot tell which one you bought, because the one number that would tell you is the one the interface hides.
It’s incentive, not malice
The honest read here is structural. Anthropic bills on tokens, and the model is working: the company is on track to post its first profitable quarter, the only frontier lab in the black while OpenAI still projects billions in losses. Every feature that increases token throughput per session increases revenue per user, whether or not it increases value per user. A vendor does not need a conspiracy to keep its defaults leaning expensive. It needs an income statement, and the incentive does the rest.
The tell is in which way the defaults break. High effort on, not off. Hundreds of subagents one click away, with no budget ceiling beside the button. Longer autonomous runs framed as a capability, never as a spend. When the convenient default and the expensive default are always the same default, that is a pricing decision wearing the costume of a quality decision.
What to turn down
The defaults are adjustable. For any team running Claude against a real budget, these are the first settings to revisit.
- Drop effort to medium for routine prompts and save high for the work that genuinely needs deep reasoning.
- Do not reach for workflows or subagents unless the task is truly parallel; one agent with good context beats a swarm on anything with chained steps.
- Start a fresh session per task instead of one ballooning thread, because every turn replays the whole history back to the model.
- Route bulk and low-stakes work to a cheaper model like Sonnet, and keep Opus for the parts that earn it.
- Keep the stable parts of a prompt stable so they stay cached at roughly a tenth of the price.
Treat effort and parallelism as cost controls, because that is what they are. The teams that win on agent economics over the next year will not be the ones running the most subagents. They will be the ones who read the defaults as the billing decisions they are, and adjusted accordingly. The rest will keep discovering, every Monday, that their Claude limit is a throttle they forgot they were allowed to turn.
The Integration Layer
Cost, context, and architecture for agentic AI. Weekly in your inbox.