Why Your CFO Is Quietly Panicking About Your AI Bill
AI vendors don’t sell you outcomes. They sell you activity. Your finance team is about to find out the difference.
You’ve dipped your toes into AI tools at work, watched the demos, signed the procurement forms, and told the team to “just use AI more.”
Six months later, finance forwards you an invoice with a question mark in the subject line.
If that sounds dramatic, ask Uber.
The ride-hailing giant burned through its entire 2026 AI budget in four months. By April, the CTO was, in his own words, “back to the drawing board.” More on that in a bit.
Welcome to the conversation nobody warned you about.
Today, we are going to talk about something that sounds boring but is quietly becoming the next cloud bill shock for enterprises.
Tokens.
Yes, tokens. The tiny invisible units that AI models use to understand your prompt, (supposedly) “think” about it, and reply.
These tokens are the units of AI economics, in a way.
The AI economy is not priced around value. It is priced around activity.
Your business does not want tokens. Your business wants resolved support tickets, shipped features, reviewed pull requests, clean reports, and fewer meetings that should have been emails.
But the AI stack does not bill you for “business value created.” It bills you for text in, text out, hidden reasoning, retries, tool calls, agent loops, and context windows big enough to swallow your entire Confluence. (Hello, Mr. Claude 🤠)
Welcome to token inflation.
Wait, What Even Is a Token?
A token is roughly a chunk of text. Not exactly a word. Not exactly a character.
When you ask the AI for that witty one-liner joke, your prompt is split into tokens, and each token is mapped to a number. So the LLM takes in numbers and outputs numbers. The output numbers are then ‘decoded’ back into text to show us the English version of the joke.
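To make that round trip concrete, here is a toy sketch. It is not how any real LLM tokenizer works (those use learned subword vocabularies with tens of thousands of entries); the vocabulary and the `encode` and `decode` helpers below are invented purely for illustration.

```python
# Toy vocabulary. Every piece here is made up for illustration;
# real tokenizers learn their pieces from large amounts of text.
VOCAB = ["tell", " me", " a", " joke", " un", "believ", "able", "!"]
TOKEN_TO_ID = {piece: i for i, piece in enumerate(VOCAB)}


def encode(text: str) -> list[int]:
    """Greedy longest-match encoding of text into token ids."""
    ids = []
    pos = 0
    while pos < len(text):
        # Pick the longest vocabulary piece that matches at this position.
        match = max(
            (piece for piece in VOCAB if text.startswith(piece, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no token covers the text at position {pos}")
        ids.append(TOKEN_TO_ID[match])
        pos += len(match)
    return ids


def decode(ids: list[int]) -> str:
    """Map token ids back to their text pieces and join them."""
    return "".join(VOCAB[i] for i in ids)


ids = encode("tell me a joke unbelievable!")
print(len(ids), "tokens:", ids)
print(decode(ids))
```

Notice that one long word gets split into several tokens. That is why token counts and word counts never quite line up.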
How are Tokens the Units of Billing?
If you look at the billing model of any LLM provider, you will notice that their usage is counted in:
Input tokens - What you send in
Output tokens - What the model generates back
Cached tokens - The reused parts of previous context, often priced lower.
So when you ask:
“Summarize this meeting transcript.”
You are paying for:
The transcript going in.
The model’s answer coming out.
Possibly the conversation history (in the same chat).
Possibly hidden reasoning.
Possibly tool calls.
Possibly retries if your app or agent keeps going.
Simple, right?
Of course not. This is AI. We made autocomplete expensive and then added a whole not-so-AI billing department.
The Pricing Tells You Where the Incentives Are
Look at the economics. Let’s see the pricing for two leading models.
OpenAI’s API pricing page currently lists GPT-5.5 at $5 per million input tokens, $0.50 per million cached input tokens, and $30 per million output tokens.
Claude’s API pricing page lists Claude Opus 4.7 at $5 per million input tokens, $0.50 per million cached input tokens, and $25 per million output tokens.
Read that again.
Output tokens cost 5x to 6x as much as input tokens.
In Claude’s case, there is an additional charge for writing to the cache, depending on how long you want the cache to stay active: $6.25 per million tokens for 5 minutes of retention, and $10 per million tokens for 1 hour.
Google’s Gemini pricing also charges more for output than input, and importantly says output pricing includes “thinking tokens.” OpenAI’s reasoning model docs say reasoning tokens are not visible through the API, but they still occupy context window space and are billed as output tokens.
This is where things get spicy.
The most expensive part of the system is not always the part you see. The model may give you a short answer. But behind the scenes, it may have spent a lot of tokens “thinking.”
This is the AI equivalent of ordering one idly and getting charged for the chef’s emotional journey.
When the most expensive part of your bill is the part you cannot see, every model provider has a built-in incentive to make models that "think" a little longer, "explain" a little more, and "explore" a few extra steps before answering. The pricing model does the nudging on its own.
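A quick back-of-the-envelope sketch shows how much the invisible part can dominate. It uses the GPT-5.5 prices quoted above; the token counts are invented for illustration, and the `Pricing` class and `request_cost` function are hypothetical helpers, not any provider's SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens, using the GPT-5.5 figures quoted in this article."""
    input_per_m: float = 5.00
    cached_input_per_m: float = 0.50
    output_per_m: float = 30.00


def request_cost(
    pricing: Pricing,
    input_tokens: int,
    cached_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int = 0,
) -> float:
    """Cost of one request in USD.

    Assumes cached tokens are counted separately from regular input tokens,
    and that reasoning tokens are billed at the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (
        input_tokens * pricing.input_per_m
        + cached_tokens * pricing.cached_input_per_m
        + billed_output * pricing.output_per_m
    )
    return total / 1_000_000


prices = Pricing()
# Same prompt, same 200-token visible answer. Only the hidden reasoning differs.
plain = request_cost(prices, input_tokens=2_000, cached_tokens=0, visible_output_tokens=200)
with_reasoning = request_cost(
    prices, input_tokens=2_000, cached_tokens=0, visible_output_tokens=200, reasoning_tokens=3_000
)
print(f"no reasoning:   ${plain:.3f}")
print(f"with reasoning: ${with_reasoning:.3f}")
```

In this made-up example the user sees the same short answer both times, but the second request costs more than six times as much.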
Is This Evil? Not Exactly.
Let us be fair.
Output tokens are not fake cost. Generating output is computationally expensive. Reasoning models do more work. Better models use more compute. Bigger context windows are not free. Agents running multiple steps are not magic.
So no, the takeaway is not:
“AI companies are villains trying to burn your tokens.”
That would be too easy. The takeaway is:
AI companies built a business model where the meter runs on activity, while users care about outcomes.
That gap is where token inflation lives.
When users cannot easily see whether extra tokens created extra value, the system naturally drifts toward more tokens. Longer answers. Bigger context. More retries. More agent steps. More “thinking.” More cost.
The First Token Bill Shock Is Already Here
This is not theoretical. And no, this is not happening only at scrappy startups burning runway. This is happening at one of the most operationally disciplined companies in the world.
The Uber Incident
In April 2026, Uber’s CTO Praveen Neppalli Naga admitted the company had exhausted its entire 2026 AI budget in just four months. His words: “I’m back to the drawing board, because the budget I thought I would need is blown away already.”
Let that sit for a second.
Uber is not a cash-strapped seed-stage shop. Its R&D spend is $3.4 billion a year. They have planning teams, finance controllers, and procurement systems that have survived a decade of hyper-growth. And yet, four months into 2026, the AI line item was already a smoking crater.
The driver? Claude Code, rolled out to engineers in December 2025. By February, usage had nearly doubled. By April, 95% of Uber engineers were using AI coding tools monthly. 70% of committed code now originates from AI. Per-engineer API costs ran between $500 and $2,000 a month.
What started as a productivity experiment turned into a runaway success that the company could not afford at scale.
This is the new pattern. And Uber is not alone.
Not an Isolated Case
Gergely Orosz recently wrote in The Pragmatic Engineer that token spend is already breaking budgets inside tech companies. He spoke with developers at 15 businesses, and the pattern was clear. At two companies, leadership said token spend had increased by around 10x in six months.
One director said their company had to raise API budget limits multiple times in April after switching to a higher-effort Claude setting that significantly increased the cost per pull request.
A seed-stage AI infra company said spend went from about $200 per developer per month to around $3,000 per developer per month in six months.
We are not talking about one random developer accidentally leaving a script running overnight. We are talking about normal AI adoption patterns inside companies.
This is the cloud bill shock all over again. Except this time, the bill is attached to “thinking.”
The Two Corporate Strategies
The Pragmatic Engineer piece describes two broad responses.
Strategy one: Let it rip and start measuring.
These companies are saying yes, spend is going up, but maybe productivity is going up even faster. Let developers use the tools. Measure later.
Strategy two: Curb spending.
These companies are setting cheaper default models, adding spend caps, and forcing users to justify expensive models.
Both strategies make sense. And both can go wrong.
If you curb too early, you may kill real productivity gains before you understand them. If you let it rip forever, congratulations, you have invented a token bonfire.
Tokenmaxxing: When Metrics Become Targets
Here is the weirdest part.
The same Pragmatic Engineer article mentions a “tokenmaxxing” trend, where developers run agents partly to boost their personal token stats and avoid looking like they are not using AI enough.
This is Goodhart’s Law wearing a hoodie.
When “AI usage” becomes a performance signal, people will optimize for AI usage. Not necessarily useful output. Not necessarily shipped work.
Just usage. Tokens burned. Agent sessions started. Copilot chats opened.
This is how you get dashboards that say “AI adoption is up 400%” while nobody asks the only question that matters:
Did the work actually get better?
Vendor-Side Inflation vs User-Side Sprawl
There are two forces here.
The first is vendor-side token inflation. This happens when AI providers and tools benefit from more metered usage. Longer outputs. Larger context windows. Hidden reasoning tokens. Expensive default models. Agent loops. Premium routing. Tool-call chains.
The second is user-side token sprawl. This happens when employees treat AI like free magic.
“Summarize this entire repo.”
“Rewrite this again.”
“Try one more approach.”
“Run the agent overnight.”
“Use the best model, just in case.”
“Analyze all 400 files.”
“Think harder.”
Each individual request feels tiny. But so did each AWS instance when we started the cloud migration.
Then, finance saw the bill.
The Agent Problem
Chatbots are expensive enough. Agents are where token spend goes feral.
A chatbot does this:
Prompt → answer.
An agent does this:
Prompt → plan → search → tool call → read result → think → retry → inspect files → generate patch → run tests → fix errors → retry → summarize → ask for approval → continue anyway because someone forgot the stop condition.
That is not one model call. That is a tiny software team made of loops. And every loop can burn tokens.
The “cost per token” is the wrong metric. The real metric is cost per completed task.
If a cheap model needs 20 attempts, it may not be cheap. If an expensive model solves the task once, it may be worth it. If an agent spends ₹10,000 worth of tokens and produces code nobody reviews, that is not productivity. That is performance theatre.
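Here is a rough sketch of that arithmetic. It assumes attempts are independent and that you retry until one succeeds, so the expected number of attempts is 1 divided by the success rate. The per-attempt costs and success rates are invented, and the sketch ignores the human time spent reviewing failed attempts.

```python
def cost_per_completed_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful result.

    Assumes independent attempts retried until success, so the expected
    number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Illustrative numbers only: a cheap model that succeeds 1 time in 20,
# and an expensive model that succeeds 9 times in 10.
cheap = cost_per_completed_task(cost_per_attempt=0.02, success_rate=0.05)
expensive = cost_per_completed_task(cost_per_attempt=0.25, success_rate=0.90)

print(f"cheap model:     ${cheap:.2f} per completed task")
print(f"expensive model: ${expensive:.2f} per completed task")
```

With these made-up numbers, the model that is 12x pricier per attempt ends up cheaper per finished task.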
The Enterprise Risk
The risk is not “AI will cost money.” Of course, AI will cost money.
The risk is that companies will confuse three very different things.
AI usage means people are using the tool.
AI productivity means work is getting done faster.
AI value means the right work is getting done with acceptable quality, risk, and cost.
A company can have high AI usage and low AI value. In fact, many will. Especially if leadership says “use AI more” without ever defining what “better work” looks like.
So What Should Builders Do?
If you are building AI into an enterprise product, budget for token control early. Not later. Early.
Just like security. Just like observability. Just like retries. Just like boring infrastructure that nobody appreciates until production catches fire.
Here is the basic checklist.
1. Set Token Budgets Per Task
Do not give agents infinite runway. Set hard limits.
Maximum output tokens. Maximum reasoning effort. Maximum tool calls. Maximum retries. Maximum files inspected. Maximum wall-clock time.
A good agent should know when to stop. A bad agent keeps “thinking” until your CFO starts thinking too.
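A minimal sketch of what a hard limit can look like in code. The `TaskBudget` class, the `run_agent` loop, and all the numbers are hypothetical; a real agent framework would report usage in its own way, and this covers only a subset of the limits listed above.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a task crosses any of its hard limits."""


@dataclass
class TaskBudget:
    # Hard limits for one task. Names and numbers are illustrative.
    max_output_tokens: int
    max_tool_calls: int
    max_retries: int
    max_seconds: float
    # Running totals, updated as the agent works.
    output_tokens: int = 0
    tool_calls: int = 0
    retries: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, output_tokens: int = 0, tool_calls: int = 0, retries: int = 0) -> None:
        """Record usage from one step, then fail fast if a limit is crossed."""
        self.output_tokens += output_tokens
        self.tool_calls += tool_calls
        self.retries += retries
        if self.output_tokens > self.max_output_tokens:
            raise BudgetExceeded("output token limit exceeded")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool call limit exceeded")
        if self.retries > self.max_retries:
            raise BudgetExceeded("retry limit exceeded")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("wall-clock limit exceeded")


def run_agent(budget: TaskBudget, steps):
    """Stand-in for an agent loop: each step reports (output_tokens, tool_calls)."""
    completed = 0
    for output_tokens, tool_calls in steps:
        try:
            budget.charge(output_tokens=output_tokens, tool_calls=tool_calls)
        except BudgetExceeded as exc:
            return completed, str(exc)
        completed += 1
    return completed, "finished"


budget = TaskBudget(max_output_tokens=1000, max_tool_calls=5, max_retries=2, max_seconds=60)
# Ten identical steps of 400 output tokens and 1 tool call each.
completed, reason = run_agent(budget, [(400, 1)] * 10)
print(completed, reason)
```

The point of the design is that the stop condition lives outside the model. The agent does not get to decide whether it deserves more runway.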
2. Default to Cheaper Models
Most tasks do not need the most expensive model.
Classification? Cheap model. Formatting? Cheap model. Simple summarization? Cheap model. Extracting dates from text? Please do not summon the frontier model like it is a temple deity.
Use expensive models only when the task actually needs them. And make model upgrades intentional, not sticky.
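One way to keep upgrades intentional is to make escalation a per-request argument rather than a saved setting. The sketch below assumes that approach; the model names and task labels are placeholders, not real model identifiers.

```python
# Placeholder names, not real model identifiers.
CHEAP_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Task types that should stay on the cheap model. Illustrative list.
ROUTINE_TASKS = {"classification", "formatting", "simple_summarization", "date_extraction"}


def pick_model(task_type: str, escalate: bool = False) -> str:
    """Default to the cheap model.

    Escalation must be requested on every call, so it never becomes a
    sticky default, and routine tasks are never escalated at all.
    """
    if task_type in ROUTINE_TASKS:
        return CHEAP_MODEL
    return FRONTIER_MODEL if escalate else CHEAP_MODEL


print(pick_model("date_extraction", escalate=True))
print(pick_model("architecture_review"))
print(pick_model("architecture_review", escalate=True))
```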
3. Measure Cost Per Outcome
Do not celebrate token volume. Measure what matters.
Cost per support ticket resolved. Cost per PR merged. Cost per bug fixed. Cost per document reviewed. Cost per sales call summarized. Cost per analyst report generated.
The unit of value is not the token. The unit of value is the job done.
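If you already log spend per request, rolling it up by outcome is a few lines. This sketch assumes a hypothetical log format where each record carries an outcome label, a cost, and whether the work was actually completed; the records themselves are invented.

```python
from collections import defaultdict


def cost_per_outcome(records):
    """Total spend per outcome type divided by the number of completed outcomes.

    Failed attempts still count toward spend, which is the whole point.
    Returns None for an outcome type with spend but nothing completed.
    """
    spend = defaultdict(float)
    completed = defaultdict(int)
    for record in records:
        spend[record["outcome"]] += record["cost_usd"]
        if record["completed"]:
            completed[record["outcome"]] += 1
    return {
        outcome: (spend[outcome] / completed[outcome] if completed[outcome] else None)
        for outcome in spend
    }


# Invented usage log for illustration.
usage_log = [
    {"outcome": "ticket_resolved", "cost_usd": 0.10, "completed": True},
    {"outcome": "ticket_resolved", "cost_usd": 0.30, "completed": False},
    {"outcome": "ticket_resolved", "cost_usd": 0.20, "completed": True},
    {"outcome": "pr_merged", "cost_usd": 4.00, "completed": False},
]

report = cost_per_outcome(usage_log)
print(report)
```

An outcome type that shows spend but no completions is exactly the line your finance team will ask about first.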
4. Expose the Meter
Users behave differently when they can see the cost.
Before running a large task, show an estimate.
“This may process 200k tokens and cost around $X. Continue?”
For internal tools, show team-level dashboards. Not to shame people. To prevent accidental bonfires.
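A rough sketch of such a pre-flight estimate. It leans on the common rule of thumb of about 4 characters per token for English text, which is an approximation and not an exact count, and it uses the input and output prices quoted earlier in this article. The `estimate_prompt` function is a hypothetical helper.

```python
CHARS_PER_TOKEN = 4        # rough rule of thumb for English text, not exact
INPUT_USD_PER_M = 5.00     # input price quoted in this article
OUTPUT_USD_PER_M = 30.00   # output price quoted in this article


def estimate_prompt(text: str, expected_output_tokens: int) -> str:
    """Build a confirmation message with an approximate token count and cost."""
    input_tokens = len(text) // CHARS_PER_TOKEN
    cost = (
        input_tokens * INPUT_USD_PER_M
        + expected_output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    total_k = (input_tokens + expected_output_tokens) / 1000
    return f"This may process about {total_k:.0f}k tokens and cost around ${cost:.2f}. Continue?"


# An 800,000-character document and a 5,000-token answer budget (made-up sizes).
message = estimate_prompt("x" * 800_000, expected_output_tokens=5_000)
print(message)
```

The estimate does not need to be precise. It needs to be visible before the user clicks.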
5. Cap Hidden Reasoning
Reasoning models are powerful. But “think harder” should not be the default for everything.
If the provider exposes reasoning effort controls, use them. If the model supports low, medium, and high reasoning modes, start low and escalate only when needed.
One of the examples in The Pragmatic Engineer piece involved companies seeing costs jump after switching to higher-effort settings. That is the kind of toggle that needs a big warning label.
“This button may improve quality. It may also quietly eat your budget.”
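“Start low and escalate only when needed” can be sketched as a small loop. Everything here is an assumption for illustration: `call_model` stands in for whatever your provider's SDK exposes, the effort labels are placeholders, and `fake_model` is a stub so the example runs on its own.

```python
def solve_with_escalation(call_model, is_good_enough, prompt, efforts=("low", "medium", "high")):
    """Try the cheapest reasoning effort first and step up only on failure.

    call_model(prompt, effort) and is_good_enough(answer) are supplied by the
    caller. Returns the last answer and the effort level that produced it.
    """
    answer = None
    used = None
    for effort in efforts:
        answer = call_model(prompt, effort)
        used = effort
        if is_good_enough(answer):
            break
    return answer, used


# Stub model for the example: only gives a usable answer above "low" effort.
calls = []


def fake_model(prompt, effort):
    calls.append(effort)
    return "vague answer" if effort == "low" else "good answer"


answer, effort_used = solve_with_escalation(
    fake_model,
    is_good_enough=lambda text: text.startswith("good"),
    prompt="Fix the failing test",
)
print(answer, effort_used, calls)
```

One caveat: a failed low-effort attempt is not free. This pattern only saves money if most tasks succeed at the lower setting, and it needs a quality check you actually trust.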
6. Cache Aggressively
If the same context is reused again and again, cache it.
Cached input tokens can be dramatically cheaper than regular input tokens. OpenAI lists cached GPT-5.5 input at $0.50 per million tokens compared with $5.00 for normal input. That is a 10x difference for sending the same thing twice.
This matters for enterprise apps that repeatedly send the same policies, schemas, docs, product catalogs, or system instructions.
Do not pay full price to remind the model of the same thing 10,000 times. That is not intelligence. That is bad plumbing.
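To put a number on the plumbing, here is a sketch using the two input prices quoted above. The request count and prefix size are invented. It also makes two optimistic assumptions: every request after the first hits the cache, and there is no cache-write surcharge or expiry. Real hit rates will be lower.

```python
INPUT_USD_PER_M = 5.00          # regular input price quoted above
CACHED_INPUT_USD_PER_M = 0.50   # cached input price quoted above


def prefix_cost(requests: int, prefix_tokens: int, cached: bool) -> float:
    """Cost in USD of sending the same shared prefix with every request.

    With caching, the first request pays full price and every later request
    is assumed to hit the cache (no write surcharge, no expiry).
    """
    if requests <= 0:
        return 0.0
    if not cached:
        return requests * prefix_tokens * INPUT_USD_PER_M / 1_000_000
    first = prefix_tokens * INPUT_USD_PER_M / 1_000_000
    rest = (requests - 1) * prefix_tokens * CACHED_INPUT_USD_PER_M / 1_000_000
    return first + rest


# 10,000 requests that each resend a 20,000-token policy document (made-up sizes).
uncached = prefix_cost(10_000, 20_000, cached=False)
cached = prefix_cost(10_000, 20_000, cached=True)
print(f"uncached: ${uncached:,.2f}")
print(f"cached:   ${cached:,.2f}")
```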
What Should Individual Users Do?
You cannot control the entire AI economy. But you can control your prompts.
Try these:
“Answer in under 150 words.”
“No preamble.”
“Ask before doing a long analysis.”
“Give me the shortest useful answer.”
“Do not restate my question.”
“Use a cheaper or faster model unless this requires deep reasoning.”
“Give me the answer first, then optional details.”
Small prompt changes can reduce output bloat dramatically.
For paid tools, be careful with words like deep research, agent mode, thinking mode, high effort, best model, analyze everything, continue, and try again. These are often useful. They are also where the meter starts running faster.
Don’t Be Fooled by the Hype
AI is useful. Very useful in many cases. Not so much in many others.
I use it. You use it. Your company probably wants everyone to use it more.
But “use AI more” is not a strategy. It is a slogan.
The grounded version is:
Use AI where the value exceeds the cost, risk, and supervision burden.
That sounds less exciting. It also survives contact with the finance team.
The next wave of enterprise AI will not just be about model quality. It will be about AI FinOps.
Who gets to use expensive models? Which tasks deserve high reasoning? When should agents stop? How do we measure value? How do we avoid rewarding people for burning tokens?
These questions are boring. Which means they are probably important.
Wrapping Up
Token inflation is what happens when the unit of billing gets detached from the unit of value.
You want outcomes. The AI stack bills activity.
That gap is manageable when usage is small. But as agents enter IDEs, support desks, CRMs, data tools, and internal workflows, the gap becomes expensive.
Today, teams are excited that AI can do more work. Tomorrow, they will ask why the bill looks like a small engineering team’s payroll.
And somewhere, in a beautiful dashboard, there will be a chart showing token usage going up and to the right.
The real question is:
Is the value going up with AI usage?






