An adoption playbook from the AI CoE

Stop paying flagship prices for commodity tasks.

Flagship models cost 8–25x more than the smallest model from the same provider — and most teams run everything on flagship by default. This site teaches the program we use to fix that without touching quality: measure the baseline, stack the savings levers in risk order, and enforce the result with a gateway.

Start the walkthrough →Jump to the interactive analyzer

Pricing verified against official sources · June 20264 providers analyzed — Anthropic · OpenAI · Google · Fireworks12 gateways & routers compared, primary sources only

8–25x

price spread between flagship and smallest model, within one provider

Most teams run everything on flagship by default.

~90%

discount on cache-read input tokens, all four providers

Mechanical to capture. Zero quality risk.

50%

flat batch discount for anything that can wait

Evals, enrichment, nightly jobs — half price.

What the demo dataset looks like before optimization

Synthetic demo data — not real spend

$44,898

monthly run-rate across 4 providers

54%

of spend on flagship-tier models

overall cache hit rate (low = money on the table)

of spend on the 50%-off batch tier

This is the exact failure pattern the playbook targets: flagship-heavy model mix, weak caching, almost no batch. Explore it — or load your own export.

The program in three phases

Baseline

Export token-level usage from all four providers and normalize it into one schema. Tokens by model, not just dollars.

Phase 1 →

→

Levers

Apply four levers in risk order: caching and batch (mechanical, zero quality risk), then governed right-sizing, then — carefully — routing.

Phase 2 →

→

Tools

Put a gateway in front of every provider so policy, budgets, and attribution are config — not a memo.

Phase 3 →

Mechanical first

Every provider gives ~90% off cache reads and a flat 50% off batch. Those two levers carry zero quality risk — capture them before touching model choice.

Evidence, not vibes

Savings are computed per workload from token counts at verified prices — never as a vendor-brochure percentage of the bill. If a model claims >60% total savings, we re-check the classification.

Quality is the constraint

No silent downgrades. Eval-before-downgrade on every right-sizing move, and no ML routing until an eval harness exists. The GPT-5 router backlash is the cautionary tale.

Prefer to watch it first?

A short walkthrough of the whole program — baseline, levers, gateway — is in production. Until then, the interactive explainers on the Levers and Tools pages let you run the same numbers yourself.

The AI Spend Playbook in 5 minutes

Video coming soon · ~5 min

In the meantime, the interactive explainers cover the same ground.

Ready to run this for your org?

Start with Phase 1: pull three months of usage from each provider console or admin API. The walkthrough shows you exactly where each export lives.

Phase 1: Baseline your spend →