You shouldn't learn what your AI costs from the invoice

OBTO Team · Insights from the Glass Box

The invoice is the worst possible place to learn what your AI cost. By the time it lands, the money is spent, the runaway workflow has been running all month, and all you can do is squint at one large number and try to reverse-engineer where it came from. For something you might call thousands of times a day, that is a strange way to run a budget.

Per-token pricing is part of the reason. Every call is cheap enough to feel free, so nobody adds them up until the total stops being cheap. The fix isn't a spreadsheet at month end. It's making cost a number you can see per run, while routing each run to a model that doesn't cost more than it has to.

Why per-token pricing hides the bill until it's too late

A single model call costs a fraction of a cent. That's the trap: each one is beneath notice, so it goes unnoticed — times ten thousand a day, times every workflow, times a frontier model you reached for out of habit. Three multipliers nobody watches in real time, surfacing as one line on an invoice weeks later.
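To put rough numbers on that multiplication, here is a minimal sketch. The prices, token counts, and model labels are made-up assumptions for illustration, not any provider's real rates; the point is only how a sub-cent call compounds.

```python
# All figures are illustrative assumptions, not real provider prices.
PRICE_PER_1K_TOKENS = {"small": 0.0005, "frontier": 0.015}  # USD, hypothetical


def monthly_cost(calls_per_day: int, tokens_per_call: int, model: str, days: int = 30) -> float:
    """Cost of one workflow over a month: per-call price x volume x days."""
    per_call = tokens_per_call / 1000 * PRICE_PER_1K_TOKENS[model]
    return per_call * calls_per_day * days


# One 500-token call on the hypothetical frontier model: under a cent.
per_call = 500 / 1000 * PRICE_PER_1K_TOKENS["frontier"]
print(f"one call:             ${per_call:.4f}")

# The same call, ten thousand times a day for a month.
print(f"one month, frontier:  ${monthly_cost(10_000, 500, 'frontier'):,.2f}")
print(f"one month, small:     ${monthly_cost(10_000, 500, 'small'):,.2f}")
```

Under these assumed prices, a call that costs three quarters of a cent becomes roughly $2,250 a month for a single workflow, against about $75 on the small model. Multiply by the number of workflows and the invoice stops being a surprise in hindsight.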

And the invoice can't tell you the thing you actually need to know: which run, which workflow, which decision drove the cost. A monthly total is an autopsy. By the time you read it, you can't change what it's reporting on.

The two ways teams overpay

There are really two separate problems hiding inside "our AI bill is too high," and they have different fixes.

1. Routing everything to a frontier model

The strongest model is the safe default: it handles anything, so why think about it? Because most calls don't need it. Classifying a ticket, extracting a field, summarizing a paragraph, deciding which tool to call: these are easy, and a small, cheap model typically does them just as well for a fraction of the price. Sending them to a frontier model is paying first-class fare to ride one stop. The easy majority of your calls is where the money quietly leaks.

2. Not seeing cost until the invoice

Even with perfect routing, if you can't see what a run costs as it happens, you're flying blind. The workflow that doubled in cost this week looks identical to last week until the bill says otherwise. Cost has to be observable per run — attached to the actual call, not aggregated into a monthly lump — or you can't manage it.
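As a rough illustration of what per-run visibility buys you: if every run were logged with its workflow name and cost, spotting the workflow that doubled is a small grouping exercise. The data shape, workflow names, and the 2x threshold below are all assumptions made for this sketch.

```python
from collections import defaultdict


def cost_by_workflow(runs):
    """Sum cost per workflow from (workflow, cost_usd) pairs."""
    totals = defaultdict(float)
    for workflow, cost in runs:
        totals[workflow] += cost
    return dict(totals)


def flag_jumps(last_week, this_week, ratio=2.0):
    """Return workflows whose total cost grew by at least `ratio` week over week.

    Workflows with no spend last week are skipped here; a real check
    would probably want to surface those separately.
    """
    before = cost_by_workflow(last_week)
    after = cost_by_workflow(this_week)
    return sorted(
        w for w, cost in after.items()
        if before.get(w, 0) > 0 and cost >= ratio * before[w]
    )


# Hypothetical per-run costs for two workflows.
last_week = [("triage", 1.20), ("digest", 0.40)]
this_week = [("triage", 1.25), ("digest", 0.95)]
print(flag_jumps(last_week, this_week))
```

None of this is possible from a monthly total, because the total has already thrown away the workflow and the run.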

The fix: right-size the call, then show the price

Put the two halves together and the problem mostly disappears.

First, route by difficulty. Send the easy calls to a cheap, fast model and reserve the strong, expensive one for the calls that genuinely need it. You set the policy — what counts as easy, where the line sits — and most workloads find that the large majority of calls are easy. That alone can cut the bill substantially, without touching quality on the hard calls.
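A routing policy like this can be very small. The sketch below is one hypothetical way to write it down, not OBTO's routing API: the model names, task labels, and length cutoff are placeholders you would replace with your own definition of "easy".

```python
from dataclasses import dataclass

# Hypothetical model names and task labels, chosen for illustration only.
CHEAP_MODEL = "small-fast"
STRONG_MODEL = "frontier"
EASY_TASKS = {"classify", "extract", "summarize", "route"}


@dataclass(frozen=True)
class Call:
    task: str                   # what kind of work this call does
    prompt: str                 # the text sent to the model
    force_strong: bool = False  # override for workflows that always get the strong model


def choose_model(call: Call, max_easy_chars: int = 4000) -> str:
    """Pick a model under a simple, explicit policy.

    Easy task types with short prompts go to the cheap model;
    everything else, and anything explicitly overridden, goes to the strong one.
    """
    if call.force_strong:
        return STRONG_MODEL
    if call.task in EASY_TASKS and len(call.prompt) <= max_easy_chars:
        return CHEAP_MODEL
    return STRONG_MODEL


print(choose_model(Call("classify", "Is this ticket about billing?")))
print(choose_model(Call("plan", "Draft a migration plan for the billing service.")))
```

The design choice worth noting is that unknown task types fall through to the strong model, so a gap in the policy costs money rather than quality. Whether that is the right default is a judgment call for your workload.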

Second, attach the cost to the run. Every call should record what it used — which model, how many tokens, what it cost — right there with the call, so you can read the price of a workflow the moment it runs, not four weeks later. When cost is a per-run number, the expensive workflow announces itself the first time it runs, while you can still do something about it.
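In code, "attach the cost to the run" can be as plain as writing one record per call. This is a minimal sketch under stated assumptions: the field names and prices are invented for illustration, it is not the format of any real receipt, and it uses one flat price per token where real providers usually price input and output tokens differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical flat prices in USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = {"small-fast": 0.0005, "frontier": 0.015}


@dataclass
class CallRecord:
    workflow: str
    run_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_call(workflow: str, run_id: str, model: str,
                input_tokens: int, output_tokens: int) -> CallRecord:
    """Build the record for one call, computing its cost at call time."""
    total_tokens = input_tokens + output_tokens
    cost = total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]
    return CallRecord(workflow, run_id, model, input_tokens, output_tokens, cost)


def run_cost(records: list[CallRecord], run_id: str) -> float:
    """Price of one run: the sum of the calls recorded under its id."""
    return sum(r.cost_usd for r in records if r.run_id == run_id)


records = [
    record_call("triage", "run-1", "small-fast", 800, 200),
    record_call("triage", "run-1", "frontier", 1500, 500),
]
print(f"run-1 cost: ${run_cost(records, 'run-1'):.4f}")
```

Because each record carries the workflow, the model, and the cost, the price of a run is a sum over its own calls, available as soon as the run finishes.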

Where OBTO fits

This is the shape of OBTO's answer, and it falls out of how the platform already works. Routing is a policy you set, so the easy calls go to a cheap model and the hard ones to a strong one without you wiring it by hand every time. And because every action on OBTO leaves a Glass Receipt — a per-call record of what ran and what it cost — cost isn't a separate analytics project. It's already attached to the run: readable before the invoice, and traceable to the exact workflow that spent it.

What stays yours is the judgment — where the easy/hard line sits, what your budget thresholds are, which workflows are worth a frontier model. OBTO doesn't decide that you're overpaying; it makes overpaying visible, per run, so you can. For the deeper mechanics, we go into routing across models and tracking what your agents cost in their own pieces. This one is about the principle that ties them together: you should never meet your AI cost for the first time on the invoice.

When you want to see it, getting started puts a governed tool and a real, costed receipt in front of you in minutes, and pricing starts at zero.

Frequently asked questions

Why is per-token AI pricing so hard to budget for?

Because each call is individually trivial — a fraction of a cent — so the cost stays invisible until volume, model choice, and frequency multiply it into a large number on a monthly invoice. The unit that's easy to ignore is the unit that adds up.

How does routing to different models reduce cost?

Most calls — classification, extraction, summarization, routing decisions — are easy and run just as well on a small, cheap model. Reserving the expensive frontier model for the calls that actually need it means the easy majority of your traffic stops paying premium prices, often the largest single saving available.

What does "cost per run" mean?

It's the cost of a single workflow execution — the models and tokens that one run used — recorded with the run itself rather than aggregated into a monthly total. Per-run cost lets you see which workflow is expensive the moment it executes, instead of reverse-engineering it from an invoice.

How does OBTO show cost before the invoice?

Every call on OBTO leaves a Glass Receipt that records what ran and what it cost. Because cost is attached to each call, you can read the price of a run as it happens and trace any figure back to the workflow that produced it — no separate analytics pipeline, no waiting for the bill.

Do I still control which model is used?

Yes. You set the routing policy — what counts as an easy call, where the cheap/strong line is, which workflows always get the frontier model. OBTO makes the cost of those choices visible per run; it doesn't make the choices for you.