How can organizations move AI from token maxxing to production value?

A recent video from a popular content creator who is also a software developer pokes fun at the sometimes unreliable nature of AI coding agents. At the beginning of the clip, a screen shows a blue circle with the text “coding then.” She wants it green, opens the file, edits one line, and it turns green. Then “coding now” pops up. The developer types her requests to an AI assistant in plain English. Asked to change the color, it adds a stray blue square. Asked to remove the square, it deletes the circle. “Are you dumb?” she writes. “Can you just remove the square and add a green circle.” It manages the green circle, but the circle is too small. She asks it to make the circle bigger. A red triangle appears.

The clip is a joke, but it highlights a real failure mode. Safe Superintelligence (SSI) founder and former OpenAI chief scientist Ilya Sutskever told Dwarkesh Patel in November 2025 that models are confounding because they can score well on hard benchmarks while their performance remains inconsistent in the real world. Their economic impact, too, appears to lag behind the apparently steady gains on benchmarks, which are themselves susceptible to contamination and testing exploits.

Sometimes, spending more tokens doesn’t equal better outcomes

Agent reliability is one theme. Cost is another, and a phenomenon known as token maxxing made it impossible to ignore. Nvidia CEO Jensen Huang had given the metric its aspirational ceiling at GTC, saying a $500,000 engineer or AI researcher should consume at least $250,000 in tokens a year. Otherwise, Huang said on the All-In Podcast, he would be “deeply alarmed.”

Not long after, the volume of AI tokens became something of a status symbol. At Google I/O in May, Alphabet CEO Sundar Pichai put the number on a keynote slide: Google now processes more than 3.2 quadrillion tokens a month across its products, a sevenfold jump in a year. A total of 375 of its cloud customers each ran more than a trillion tokens in the past year.

Inside companies, the number became a de facto scoreboard. An engineer at Meta built an intranet leaderboard called Claudeonomics that ranked more than 85,000 employees by token consumption. It handed out titles like “Token Legend” and “Session Immortal” to the top 250 power users. In 30 days those employees burned through about 60 trillion tokens, with the single heaviest user averaging 281 billion, according to The Information.

By late spring, the mood had turned. Amazon shut down an internal token leaderboard after leadership told employees to solve customer and business problems rather than chase usage for its own sake. Uber burned through its entire 2026 AI coding budget in four months after gamifying consumption the same way, and its COO is now asking whether the spend is worth it. “All of a sudden you get the bill and ask, ‘Why are we spending all this money? What are we even doing with it?’” said Ben Schein, chief AI and analytics officer at Domo.

That disconnect is what companies run into when the demo meets the org chart, and two executives describe the same wall from opposite sides. You can vibe-code a slick prototype in an afternoon, but “you can’t vibe code governance, security, and distribution,” Schein said.

On AI pilot ‘addiction’

Kore.ai Chief Strategy Officer Cathal McCarthy names the organizational version of it. Firms get “addicted to pilots,” he says, mistaking a run of impressive demos for progress.

The risk is that organizations don’t necessarily learn from pilots at scale. “People typically take the low-hanging fruit, the quick wins, and that’s not where the organizational learning happens,” McCarthy said. “The organizational learning happens at production scale.”

The usage numbers can mislead before any demo reaches the org chart. In a cross-national survey of more than 2,000 employees by Fractional Insights and Ferrazzi Greenlight published in Harvard Business Review, workers most anxious about AI reported that roughly 65% of their job was AI-assisted, against about 42% for the least anxious, even as the anxious group logged more than double the resistance to adopting it. The researchers read that gap as adoption that is performative rather than participatory, where fear of replacement pushes people to rack up tokens without real buy-in.

The popularity of vibe coding has made it relatively easy for employees with little to no software development experience to build app prototypes, but many of these are stuck in a demo phase. Such prototypes may have a front end with “made-up data or data you loaded into memory,” Schein said. “You can’t distribute such apps to thousands of employees, or to people in stores, the way we’ve done with pro-code apps.”

Part of the reason for pilot purgatory in the vibe coding domain is that employees can reach a phase where they think, ‘I tried this, cool idea,’ and then get stuck. “They move on to the next thing because it never really worked,” Schein said.

Institutional change

Developing streams of demos can hinder organizations’ focus on pursuing AI transformation as an “institutional change,” McCarthy said. “The company’s blind spot is that they’re missing the organizational re-architecture that needs to take place,” he said. Focusing on AI pilots for their own sake can be a commitment to “not deciding anything.”

The way out of that holding pattern, both McCarthy and Schein argue, runs through the same move. LLMs are probabilistic and lack inherent stateful memory, making them imperfect partners for architecting, building, and updating complex projects over time. Part of the solution is to use such probabilistic models to find an answer, then lock the repeatable part into a deterministic path. Schein shared an example where he pointed Claude at Domo via MCP to reason through a question from his CEO, then solidified the good answer into a Domo data transformation or an app. That way, the model re-enters only when the locked version stops matching expectations. “I don’t want to pay for tokens every time my CEO looks at it,” he said.

Domo’s App Catalyst tool, in open beta since January and used by more than 800 companies, lets non-developers build apps on top of their data. The company is repositioning the broader platform as a governance and control-tower layer over models from Anthropic, OpenAI, and Google.
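
The explore-then-lock pattern Schein describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Domo's or Anthropic's actual API: `AnswerPipeline`, `LockedAnswer`, and `fake_explore` are hypothetical names, and `fake_explore` is a stand-in for whatever expensive model session produces the first good answer. The point is only that the model is called once, its result is frozen into an ordinary function, and the model is called again only when a sanity check on the frozen result fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Rows = List[Dict[str, float]]


@dataclass
class LockedAnswer:
    """A deterministic transformation promoted from a one-off model answer."""
    transform: Callable[[Rows], float]    # cheap, repeatable computation
    still_valid: Callable[[float], bool]  # expectation the result must keep meeting


class AnswerPipeline:
    """Hypothetical wrapper: explore with a model once, then reuse the locked path."""

    def __init__(self, explore: Callable[[Rows], LockedAnswer]) -> None:
        self.explore = explore            # assumed to be the costly, probabilistic step
        self.locked: Optional[LockedAnswer] = None
        self.model_calls = 0              # rough proxy for token spend

    def answer(self, rows: Rows) -> float:
        # Fast path: reuse the locked transformation while it meets expectations.
        if self.locked is not None:
            result = self.locked.transform(rows)
            if self.locked.still_valid(result):
                return result
        # Slow path: no locked answer yet, or it stopped matching expectations.
        self.model_calls += 1
        self.locked = self.explore(rows)
        return self.locked.transform(rows)


def fake_explore(rows: Rows) -> LockedAnswer:
    # Stand-in for a model session; a real one would reason over the data first.
    return LockedAnswer(
        transform=lambda rs: sum(r["revenue"] for r in rs),
        still_valid=lambda total: total >= 0,
    )


pipeline = AnswerPipeline(fake_explore)
week1 = [{"revenue": 120.0}, {"revenue": 80.0}]
week2 = [{"revenue": 150.0}, {"revenue": 50.0}]

# Three dashboard views, one model call.
totals = [pipeline.answer(week1), pipeline.answer(week2), pipeline.answer(week1)]
calls_after_normal_use = pipeline.model_calls

# Data that violates the expectation sends the question back to the model.
bad_week = [{"revenue": -500.0}, {"revenue": 100.0}]
pipeline.answer(bad_week)
calls_after_drift = pipeline.model_calls
```

In this sketch, repeated views of the same question cost nothing after the first exploration, and the call counter rises only when the check fails. What counts as "stops matching expectations" is the real design decision, and a production check would likely be richer than a single sign test.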

Kore.ai’s enterprise agent platform, Artemis, promises predictability in production, an audit trail for every agent session, and logic that outlasts any single model. McCarthy pushes that auditing logic past the single session to the business itself, where the harder question becomes how many agents a company is even running. He frames it through a problem firms already have with their human headcount. “Ask any company going through a reorganization how many employees they have, and they’ll give you the wrong answer,” he said. Stand up agents without orchestration and approvals, and the same blind spot returns, now with digital workers that look alike and multiply unchecked. “Would you hire two sales engineers for the southeast?” he asked. “Perhaps not.”

The discipline sounds almost too obvious to state. In essence, figure out what you want, then commit to it. If an individual is trying to get in shape, hours in the gym track fairly well with results, so chasing more of them makes sense. Tokens are a looser proxy. The aforementioned HBR survey found the heaviest AI users tended to be the most anxious, which is why the researchers urge leaders to stop reading usage as buy-in and to separate activity from impact.

Schein draws the line between the probabilistic work of “figuring out what you want” and the moment you’ve “gotten to a good point” and decide it is worth keeping. “It’s good to know the art of the possible,” he said, “but it’s better to know the art of the possible and then make it possible.”
