Your OpenAI bill came in. It's three times what you budgeted.
You go looking for the cause. The usage dashboard shows aggregate numbers. The logs show completed runs. Nothing looks obviously wrong.
But somewhere in your production stack, an AI agent has been burning tokens on a problem it can't solve — and every tool you have was designed to catch errors, not waste.
The Problem
AI agent cost spikes don't look like errors. They look like activity.
An agent in a retry loop calls the same tool 50 times before timing out. Each call consumes tokens. Each call registers as a legitimate API request. Your infrastructure sees 50 successful HTTP calls. Your billing dashboard sees 50x normal consumption.
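The pattern above can be sketched in a few lines of Python. Here `call_tool` is a hypothetical stand-in for a real tool or LLM call, and the token count is invented; the point is that every iteration returns a successful status while tokens quietly accumulate:

```python
def call_tool(prompt: str) -> dict:
    # Stand-in for a real API call: always returns HTTP 200 and a
    # token count, but never the result the agent is waiting for.
    return {"status": 200, "tokens_used": 1_500, "solved": False}

def run_agent(prompt: str, max_retries: int = 50) -> int:
    total_tokens = 0
    for _ in range(max_retries):
        result = call_tool(prompt)
        total_tokens += result["tokens_used"]
        if result["solved"]:  # never true for this input
            break
    return total_tokens

# 50 "successful" calls, zero errors, 75,000 tokens spent.
print(run_agent("summarize feedback"))  # 75000
```

From any HTTP-level monitor, this run is fifty healthy requests in a row.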
Nobody fires an alert. Nobody sends a notification. You find out when the invoice arrives — or when you manually check the dashboard because something felt off.
By then, the spend has already happened. And more importantly, the problem is still running.
Why It's Hard to Catch
LLM cost anomalies are structurally invisible to standard monitoring.
Token usage isn't logged per-run by default — Most application logging setups capture requests and responses, but they don't automatically compute and store token consumption per individual agent run. Without run-level cost data, you can't identify which run spiked.
Aggregate dashboards hide the signal — If your agent runs 1,000 times per day and one run consumes 100x normal tokens, that spike is smoothed out in your daily aggregate. It disappears. The average looks fine.
Retry behavior is silent — When an agent can't make progress, it retries. The retry loop is valid behavior from the API's perspective. There's no error to catch. There's just spend.
Context stuffing is hard to detect — Sometimes agents receive bloated inputs — large documents, full database dumps, verbose tool responses — that inflate per-run costs without any obvious failure signal.
Cost alerts are set on monthly ceilings — By the time a monthly budget alert fires, you've already overspent. You need run-level cost alerts, not account-level billing alerts.
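Run-level recording is the instrumentation missing from all of these. A minimal sketch: with the OpenAI Python client, each chat completion exposes `response.usage.total_tokens`; here that value is faked so the sketch runs without an API key, and `record_run`, `RUN_LOG`, and the per-token rate are illustrative names and numbers, not a real schema or price:

```python
import time

RUN_LOG = []  # in production, a database table, not an in-memory list

def record_run(agent_name: str, total_tokens: int, cost_usd: float) -> None:
    # One row per agent run: the granularity that aggregate
    # dashboards and default request logs don't give you.
    RUN_LOG.append({
        "agent": agent_name,
        "ts": time.time(),
        "total_tokens": total_tokens,
        "cost_usd": cost_usd,
    })

tokens = 1_850           # would come from response.usage.total_tokens
rate_per_1k = 0.002      # illustrative rate, check your model's pricing
record_run("feedback-summarizer", tokens,
           cost_usd=tokens / 1_000 * rate_per_1k)
```

Once every run writes one row like this, both the baseline and the per-run alert described below become simple queries.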
Real Example
A startup runs an AI agent that analyzes customer feedback and generates weekly summary reports. Normal token cost: approximately $0.20 per run.
A schema change in the feedback database causes the agent to receive the entire unfiltered feedback table instead of just the latest week's entries. The table contains 400,000 entries.
The agent runs. It tries to process all 400,000 entries. It hits context limits. It truncates. It generates a summary of garbage data. It calls the reformatting tool 12 times trying to fix the output.
Total token cost for one run: $47. Normal cost: $0.20.
Multiply by seven days of daily runs before anyone notices: $329 spent on a broken report nobody read.
No error was thrown. No alert fired. The agent technically completed every run.
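A cheap pre-flight guard would have stopped this particular failure before any tokens were spent. A sketch, with `MAX_INPUT_ROWS` as a hypothetical per-agent limit you'd tune to what a normal week of feedback looks like:

```python
MAX_INPUT_ROWS = 5_000  # hypothetical ceiling; set per agent

def check_input(rows: list) -> list:
    # The $47 run started with 400,000 rows where roughly a week's
    # worth was expected. Refusing obviously bloated inputs costs
    # nothing and fails loudly instead of silently overspending.
    if len(rows) > MAX_INPUT_ROWS:
        raise ValueError(
            f"input has {len(rows)} rows; expected <= {MAX_INPUT_ROWS}"
        )
    return rows
```

A guard like this turns a silent $47 run into an immediate, debuggable error pointing at the schema change.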
Why Existing Solutions Fall Short
OpenAI Billing Alerts — You can set monthly spend limits. These fire after you've already spent the money. They're a circuit breaker, not an early warning system.
Usage Dashboard — Shows daily and monthly aggregates by model. No run-level breakdown. No anomaly detection. You're looking for a needle in a stack of averages.
Application Logging — You can log token counts per request if you build that instrumentation yourself. Most teams don't build it until after the first surprise invoice. And even then, logs don't alert — they record.
Manual Review — Checking usage dashboards manually doesn't scale. And the spike that matters is the one that happens at 2am on a Saturday.
What Actually Works
Early cost spike detection requires three things:
- Run-level cost tracking — Every agent run should have its token consumption recorded and stored as a data point. Not just the total for the day — the cost per individual run.
- Baseline comparison — Once you have run-level data, you can establish a normal cost range for each agent. Any run that exceeds 2x or 3x the baseline triggers an alert.
- Real-time alerting — The alert needs to fire during the problematic run, not after the billing cycle closes. Minutes matter when an agent is in a cost-amplifying loop.
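The first two requirements can be sketched together. This example uses the median of recent runs as the baseline, since the median resists skew from a previous spike sitting in the window; the multiplier is the 2x-3x range mentioned above, and the numbers are illustrative:

```python
import statistics

def should_alert(run_tokens: int, recent_runs: list,
                 multiplier: float = 3.0) -> bool:
    # Baseline = median token count of recent runs for this agent.
    baseline = statistics.median(recent_runs)
    return run_tokens > multiplier * baseline

history = [900, 1_100, 1_000, 950, 1_050]   # normal runs
print(should_alert(1_200, history))    # False: within 3x baseline
print(should_alert(47_000, history))   # True: the $47 run
```

The third requirement is where this check runs: inside the agent loop after each response, not in a nightly batch job, so the alert fires while the spike is still in progress.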
RootBrief instruments your agent runs with cost tracking at the run level. It establishes your baseline automatically from your first weeks of production data, then fires alerts the moment any run exceeds your defined threshold — before the next run compounds the problem.
You stop finding out about cost spikes from your invoice. You find out from an alert, within minutes of the spike starting.
If you're already running workflows in production, you need visibility — not just logs.
How to Start
Before your next invoice arrives, do three things:
Step 1 — Pull your last 30 days of run-level token data. If you don't have run-level data, that's your first problem to fix.
Step 2 — Calculate your average token consumption per agent run. Set your alert threshold at 3x that average.
Step 3 — Configure that threshold to fire in real time, not at end-of-day.
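Step 2 reduces to a few lines once step 1's data exists. A sketch, with sample per-run token counts standing in for your real 30-day window:

```python
def alert_threshold(run_tokens: list, multiplier: float = 3.0) -> float:
    # Step 2: average per-run consumption over the window, times 3.
    avg = sum(run_tokens) / len(run_tokens)
    return multiplier * avg

last_30_days = [1_000, 1_200, 900, 1_100]  # sample per-run counts
print(alert_threshold(last_30_days))  # 3150.0
```

Any run crossing that number is worth an immediate page, per step 3.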
That setup catches 90% of cost spike scenarios before they become invoice line items.
Learn what else breaks when AI agents go to production
See the full OpenAI agent production monitoring checklist
AI agent cost spikes are predictable. They happen when agents encounter input conditions they weren't tested against — and they retry, loop, and consume resources trying to solve problems they can't solve.
The spend accumulates silently. The invoice arrives as a surprise. The root cause takes hours to debug after the fact.
None of that is inevitable. Run-level cost monitoring catches the spike as it starts. The fix happens in minutes, not days.