Budget intelligence: knowing what AI costs you

We talked to a team of eight engineers in January. They had been using Claude Code for two months. When we asked how much they'd spent on API costs, the answer was: they didn't know. They'd been paying the invoices, but nobody had looked at which sessions or which developers were generating most of the spend. One person had burned $800 in a single day debugging a particularly gnarly issue.

This is the standard pattern. Teams adopt AI coding tools, costs grow with usage, and the first meaningful visibility anyone has is the monthly invoice. By then, the expensive session that caused the spike happened three weeks ago and nobody remembers what they were working on.

Token counting

Cost starts with token counting. For Claude Code with hooks enabled, we get exact token counts from the hook payload: Claude reports input_tokens and output_tokens with each tool call. For other tools, where we only have PTY output to work from, we estimate token counts from the prompt and response text at roughly 3.5 characters per token.
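A minimal sketch of the character-based fallback, assuming the 3.5 characters/token figure above. The names estimateTokens, estimateUsage, and the estimated flag are illustrative, not taken from Operon's codebase, and string length here counts UTF-16 code units, which is only an approximation for non-ASCII text.

```typescript
// Hypothetical helpers: estimate token counts from raw text when exact
// counts are unavailable (for example, text captured from PTY output).
const CHARS_PER_TOKEN = 3.5; // heuristic, not an exact tokenizer

function estimateTokens(text: string): number {
  if (text.length === 0) return 0;
  // Round up so a short non-empty string never estimates to zero tokens.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface EstimatedUsage {
  inputTokens: number;
  outputTokens: number;
  estimated: true; // lets downstream code tell estimates from exact counts
}

function estimateUsage(prompt: string, response: string): EstimatedUsage {
  return {
    inputTokens: estimateTokens(prompt),
    outputTokens: estimateTokens(response),
    estimated: true,
  };
}
```

Carrying an explicit estimated flag is one way to keep approximate numbers from being mistaken for the exact counts that come from hooks.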

Once we have a token count, we apply per-model pricing. We maintain a pricing table for the three Claude tiers (Haiku, Sonnet, Opus) and update it when Anthropic changes rates. The model used in a session is detected either from the hook payload (model field) or from PTY output patterns — each model has a slightly different response header in stream mode.

```typescript
// src/main/engines/cost-calculator.ts

// Token counts for a single trace (shape inferred from its use below).
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Prices in USD per 1M tokens, keyed by model ID.
const MODEL_PRICING: Record<string, { input: number; output: number }> = {
  "claude-haiku-4-5":  { input: 0.80,  output: 4.00  },
  "claude-sonnet-4-5": { input: 3.00,  output: 15.00 },
  "claude-opus-4-5":   { input: 15.00, output: 75.00 },
};

// Returns the cost of one trace in USD.
// Unknown model IDs fall back to Sonnet pricing.
function calculateCost(usage: TokenUsage, model: string): number {
  const pricing = MODEL_PRICING[model] ?? MODEL_PRICING["claude-sonnet-4-5"];
  const inputCost  = (usage.inputTokens  / 1_000_000) * pricing.input;
  const outputCost = (usage.outputTokens / 1_000_000) * pricing.output;
  return inputCost + outputCost;
}
```

Real-time cost events

Cost isn't just stored — it's streamed to the renderer. Every time a trace is completed, we emit a token:usage IPC event with the trace cost and running session total. The StatsBar at the top of every session view shows a live cost number that ticks up as each trace completes. There's something viscerally useful about watching $0.03 appear next to a prompt — it calibrates your intuition for which types of requests are expensive.
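The flow described above can be sketched roughly as follows. Only the token:usage channel name comes from the text; the payload fields, the SessionCostTracker class, and the send callback (a stand-in for whatever IPC transport the app uses) are assumptions made for illustration.

```typescript
// Hypothetical payload for the token:usage event; field names are assumptions.
interface TokenUsageEvent {
  sessionId: string;
  traceCost: number;    // USD for the trace that just completed
  sessionTotal: number; // USD running total for the session
}

// Stand-in for the real IPC transport to the renderer.
type Send = (channel: string, payload: TokenUsageEvent) => void;

class SessionCostTracker {
  private totals = new Map<string, number>();
  private send: Send;

  constructor(send: Send) {
    this.send = send;
  }

  // Call once per completed trace: updates the running total, then emits.
  onTraceCompleted(sessionId: string, traceCost: number): void {
    const sessionTotal = (this.totals.get(sessionId) ?? 0) + traceCost;
    this.totals.set(sessionId, sessionTotal);
    this.send("token:usage", { sessionId, traceCost, sessionTotal });
  }
}
```

Sending the running total with every event means the renderer can display the latest value directly instead of summing trace costs itself.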

Per-developer attribution

Cost is attributed at the session level, and sessions are owned by users. On team plans, the Analytics view shows cost broken down by developer: total spend, cost per session, cost per prompt, and cost trend over time. This isn't about surveillance; it's about understanding workflow patterns. The developer spending 10x the team average might be doing something inefficient (massive file dumps in every prompt) or might be the person doing the hardest work. The data shows the pattern; the team decides what it means.
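As a rough illustration of the per-developer breakdown, the sketch below rolls session records up into total spend, cost per session, and cost per prompt. The SessionRecord and DeveloperSummary shapes are hypothetical, and the cost trend over time is left out for brevity.

```typescript
// Hypothetical record of one finished session.
interface SessionRecord {
  developer: string;
  costUsd: number;
  promptCount: number;
}

interface DeveloperSummary {
  totalSpend: number;     // USD across all sessions
  sessions: number;
  costPerSession: number; // USD
  costPerPrompt: number;  // USD; 0 when the developer has no prompts
}

function summarizeByDeveloper(sessions: SessionRecord[]): Map<string, DeveloperSummary> {
  // First pass: accumulate raw totals per developer.
  const acc = new Map<string, { spend: number; sessions: number; prompts: number }>();
  for (const s of sessions) {
    const a = acc.get(s.developer) ?? { spend: 0, sessions: 0, prompts: 0 };
    a.spend += s.costUsd;
    a.sessions += 1;
    a.prompts += s.promptCount;
    acc.set(s.developer, a);
  }

  // Second pass: derive the averages shown in the breakdown.
  const out = new Map<string, DeveloperSummary>();
  acc.forEach((a, developer) => {
    out.set(developer, {
      totalSpend: a.spend,
      sessions: a.sessions,
      costPerSession: a.spend / a.sessions,
      costPerPrompt: a.prompts > 0 ? a.spend / a.prompts : 0,
    });
  });
  return out;
}
```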

Budget thresholds and spike detection

Each project can have a monthly budget configured. We track three threshold levels: warning (80% of budget), critical (95%), and exceeded (100%). When any threshold is crossed, we push an alert to the session dashboard and send an email notification.
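The three levels map naturally onto a small classification function. The sketch below uses the 80%, 95%, and 100% cutoffs from the text; the function names and the choice to alert only when the level rises are assumptions, not a description of Operon's implementation.

```typescript
type BudgetLevel = "ok" | "warning" | "critical" | "exceeded";

// Classifies month-to-date spend against the configured monthly budget.
function budgetLevel(spentUsd: number, monthlyBudgetUsd: number): BudgetLevel {
  if (monthlyBudgetUsd <= 0) throw new Error("monthly budget must be positive");
  const ratio = spentUsd / monthlyBudgetUsd;
  if (ratio >= 1.0) return "exceeded";
  if (ratio >= 0.95) return "critical";
  if (ratio >= 0.8) return "warning";
  return "ok";
}

const LEVEL_ORDER: BudgetLevel[] = ["ok", "warning", "critical", "exceeded"];

// Returns the newly reached level if this spend update moved the project
// to a higher level, or null if no alert is needed.
function crossedThreshold(
  previousSpentUsd: number,
  currentSpentUsd: number,
  monthlyBudgetUsd: number,
): BudgetLevel | null {
  const before = budgetLevel(previousSpentUsd, monthlyBudgetUsd);
  const after = budgetLevel(currentSpentUsd, monthlyBudgetUsd);
  return LEVEL_ORDER.indexOf(after) > LEVEL_ORDER.indexOf(before) ? after : null;
}
```

Comparing the level before and after each update means an alert fires once per crossing rather than on every trace past the threshold. If a single update jumps several levels, this version reports only the highest one.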

Spike detection is more nuanced. We compute a rolling 14-day average cost per session for each developer. When a session's cost exceeds 300% of their rolling average, it triggers a spike alert. The alert links directly to the session and surfaces the top traces by cost — so you can see immediately which prompts drove the spike.

The 300% spike threshold was tuned to avoid alert fatigue. Expensive sessions happen legitimately (debugging complex bugs, large refactors). 300% catches genuine anomalies — the session where someone accidentally dumped their entire monorepo into context — without crying wolf on normal expensive days.
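To make the spike rule concrete, here is a small sketch of the check for one developer: average the session costs from the last 14 days, then flag a session that costs more than 3x that average. The type and function names are illustrative, and treating a developer with no recent history as "no spike" is an assumption about how a missing baseline might be handled.

```typescript
// Hypothetical record of one past session for a single developer.
interface SessionCost {
  endedAt: Date;
  costUsd: number;
}

const WINDOW_DAYS = 14;
const SPIKE_MULTIPLIER = 3.0; // 300% of the rolling average
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Mean cost per session over the trailing window, or null with no history.
function rollingAverage(history: SessionCost[], now: Date): number | null {
  const end = now.getTime();
  const start = end - WINDOW_DAYS * MS_PER_DAY;
  const recent = history.filter((s) => {
    const t = s.endedAt.getTime();
    return t >= start && t <= end;
  });
  if (recent.length === 0) return null;
  const total = recent.reduce((sum, s) => sum + s.costUsd, 0);
  return total / recent.length;
}

function isSpike(sessionCostUsd: number, history: SessionCost[], now: Date): boolean {
  const average = rollingAverage(history, now);
  // Without a positive baseline there is nothing to compare against.
  if (average === null || average <= 0) return false;
  return sessionCostUsd > SPIKE_MULTIPLIER * average;
}
```

Because the baseline is per developer, someone whose sessions are routinely expensive gets a higher bar than a teammate with lighter usage.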

The result

Predictable AI spending with no surprises. Teams that deploy Operon's budget intelligence typically reduce their AI API costs by 15–25% within the first month — not by using AI less, but by using it more deliberately. When you can see the cost of each prompt in real time, you naturally start crafting better prompts.
