An adoption playbook from the AI CoE
Stop paying flagship prices for commodity tasks.
Flagship models cost 8–25x more than the smallest model from the same provider — and most teams run everything on flagship by default. This site teaches the program we use to fix that without touching quality: measure the baseline, stack the savings levers in risk order, and enforce the result with a gateway.
8–25x
price spread between flagship and smallest model, within one provider
Most teams run everything on flagship by default.
~90%
discount on cache-read input tokens, all four providers
Mechanical to capture. Zero quality risk.
50%
flat batch discount for anything that can wait
Evals, enrichment, nightly jobs — half price.
What the demo dataset looks like before optimization
Synthetic demo data, not real spend
$44,898
monthly run-rate across 4 providers
54%
of spend on flagship-tier models
7%
overall cache hit rate (low = money on the table)
0%
of spend on the 50%-off batch tier
This is the exact failure pattern the playbook targets: flagship-heavy model mix, weak caching, almost no batch. Explore it — or load your own export.
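The four headline numbers above could be computed from a normalized usage export roughly like this. This is a minimal sketch. The `UsageRecord` schema, its field names, the `tier` labels, and the cache-hit definition (cache-read tokens divided by all input tokens) are assumptions for illustration, not the playbook's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical normalized schema: one row per provider/model/period.
    provider: str
    model: str
    tier: str               # assumed labels: "flagship", "mid", "small"
    input_tokens: int       # uncached input tokens
    cache_read_tokens: int  # input tokens served from the prompt cache
    output_tokens: int
    cost_usd: float         # cost as billed for this row
    is_batch: bool = False  # True if the row was billed at the batch tier


def baseline_kpis(records: list[UsageRecord]) -> dict[str, float]:
    """Compute run-rate, flagship share, cache hit rate, and batch share."""
    total_cost = sum(r.cost_usd for r in records)
    flagship_cost = sum(r.cost_usd for r in records if r.tier == "flagship")
    batch_cost = sum(r.cost_usd for r in records if r.is_batch)
    # Assumed definition: share of all input tokens that were cache reads.
    all_input = sum(r.input_tokens + r.cache_read_tokens for r in records)
    cached = sum(r.cache_read_tokens for r in records)
    return {
        "run_rate_usd": total_cost,
        "flagship_share": flagship_cost / total_cost if total_cost else 0.0,
        "cache_hit_rate": cached / all_input if all_input else 0.0,
        "batch_share": batch_cost / total_cost if total_cost else 0.0,
    }
```

Run-rate and the two spend shares are computed on billed cost, while cache hit rate is computed on tokens. That split is why the baseline needs tokens by model, not just dollars.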
The program in three phases
Baseline
Export token-level usage from all four providers and normalize it into one schema. Tokens by model, not just dollars.
Phase 1 →
Levers
Apply four levers in risk order: caching and batch (mechanical, zero quality risk), then governed right-sizing, then — carefully — routing.
Phase 2 →
Tools
Put a gateway in front of every provider so policy, budgets, and attribution are config — not a memo.
Phase 3 →
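Phase 3 puts policy, budgets, and attribution in the gateway rather than in a memo. A toy sketch of that idea follows. `TeamPolicy`, `Gateway`, `authorize`, and `record` are hypothetical names. Real gateway products have their own configuration formats, and this sketch does not model any of them.

```python
from dataclasses import dataclass


@dataclass
class TeamPolicy:
    # Hypothetical per-team policy; a real gateway would load this from config.
    allowed_models: set[str]
    monthly_budget_usd: float
    spent_usd: float = 0.0


class Gateway:
    def __init__(self, policies: dict[str, TeamPolicy]):
        self.policies = policies
        # Attribution ledger: (team, model, cost_usd) for every completed call.
        self.ledger: list[tuple[str, str, float]] = []

    def authorize(self, team: str, model: str, est_cost_usd: float) -> tuple[bool, str]:
        """Check model allowlist and remaining budget before forwarding a call."""
        policy = self.policies.get(team)
        if policy is None:
            return False, "unknown team"
        if model not in policy.allowed_models:
            return False, f"model {model!r} not allowed for {team}"
        if policy.spent_usd + est_cost_usd > policy.monthly_budget_usd:
            return False, "monthly budget exceeded"
        return True, "ok"

    def record(self, team: str, model: str, cost_usd: float) -> None:
        """Attribute actual cost to the team after the provider call returns."""
        self.policies[team].spent_usd += cost_usd
        self.ledger.append((team, model, cost_usd))
```

Because the allowlist and budget live in data rather than in each application, changing policy is a config change that every provider call sees at once.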
Mechanical first
Every provider gives ~90% off cache reads and a flat 50% off batch. Those two levers carry zero quality risk — capture them before touching model choice.
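Caching and batch count as mechanical because they only change the price applied to tokens you already send. A minimal cost calculator makes that concrete. Prices are hypothetical inputs in USD per million tokens, and the 90% and 50% defaults come from the text. The sketch rests on two labeled assumptions: it ignores any cache-write surcharge, and it applies the batch discount on top of cached pricing. Check each provider's current terms before relying on either.

```python
def workload_cost(
    input_tokens: int,
    cache_read_tokens: int,
    output_tokens: int,
    input_price: float,   # USD per 1M input tokens (hypothetical list price)
    output_price: float,  # USD per 1M output tokens (hypothetical list price)
    cache_read_discount: float = 0.90,  # ~90% off cache reads, per the text
    batch: bool = False,
    batch_discount: float = 0.50,       # flat 50% batch discount, per the text
) -> float:
    per_tok_in = input_price / 1_000_000
    per_tok_out = output_price / 1_000_000
    cost = (
        input_tokens * per_tok_in
        + cache_read_tokens * per_tok_in * (1 - cache_read_discount)
        + output_tokens * per_tok_out
    )
    # Assumption: batch discount applies to the whole bill, cache reads included.
    # Assumption: cache-write surcharges are ignored.
    return cost * (1 - batch_discount) if batch else cost


# Same workload (10M input, 2M output) at hypothetical $2.50 / $10.00 per 1M tokens.
no_levers = workload_cost(10_000_000, 0, 2_000_000, 2.50, 10.00)
cached = workload_cost(2_000_000, 8_000_000, 2_000_000, 2.50, 10.00)
cached_batch = workload_cost(2_000_000, 8_000_000, 2_000_000, 2.50, 10.00, batch=True)
```

With these hypothetical prices, the workload costs about $45 with no levers, about $27 once 8M of its input tokens are served from cache, and about $13.50 when it also runs through batch. The model and the output quality are unchanged in all three cases.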
Evidence, not vibes
Savings are computed per workload from token counts at verified prices, never as a vendor-brochure percentage of the bill. If the estimate projects more than 60% total savings, we re-check how the workloads were classified.
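The per-workload rule and the 60% sanity check can be expressed in a few lines of code. `WorkloadEstimate` and `total_savings` are hypothetical names. The baseline and optimized figures are assumed to come from token counts priced per workload, and the flag is a prompt to re-check classification, not an automatic rejection.

```python
from dataclasses import dataclass

RECHECK_THRESHOLD = 0.60  # from the text: >60% total savings triggers a re-check


@dataclass(frozen=True)
class WorkloadEstimate:
    name: str
    baseline_usd: float   # monthly cost today, from token counts x verified prices
    optimized_usd: float  # monthly cost after the levers this workload qualifies for


def total_savings(estimates: list[WorkloadEstimate]) -> tuple[float, bool]:
    """Return (savings share of total spend, whether it needs a re-check)."""
    baseline = sum(e.baseline_usd for e in estimates)
    optimized = sum(e.optimized_usd for e in estimates)
    share = (baseline - optimized) / baseline if baseline else 0.0
    return share, share > RECHECK_THRESHOLD
```

Summing per-workload dollars before taking the ratio keeps the headline percentage tied to actual token volumes instead of a blanket discount applied to the whole bill.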
Quality is the constraint
No silent downgrades. Eval-before-downgrade on every right-sizing move, and no ML routing until an eval harness exists. The GPT-5 router backlash is the cautionary tale.
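A minimal eval-before-downgrade gate might look like the following sketch. It assumes both models have already been scored on the same eval set, with scores on a 0-to-1 scale. The function name and the 0.02 tolerance (two percentage points of mean score) are hypothetical placeholders. The right metric and threshold depend on the workload.

```python
def approve_downgrade(
    baseline_scores: list[float],
    candidate_scores: list[float],
    max_drop: float = 0.02,  # hypothetical tolerance on mean score (0-to-1 scale)
) -> bool:
    """Approve a right-sizing move only if the smaller model holds quality."""
    # Both models must be scored on the same, non-empty eval set.
    if not baseline_scores or len(baseline_scores) != len(candidate_scores):
        raise ValueError("score both models on the same, non-empty eval set")
    base = sum(baseline_scores) / len(baseline_scores)
    cand = sum(candidate_scores) / len(candidate_scores)
    return base - cand <= max_drop
```

Refusing to run on mismatched or empty score lists enforces "no silent downgrades": without an eval result, there is no approval.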
Prefer to watch it first?
A short walkthrough of the whole program — baseline, levers, gateway — is in production. Until then, the interactive explainers on the Levers and Tools pages let you run the same numbers yourself.
The AI Spend Playbook in 5 minutes
Ready to run this for your org?
Start with Phase 1: pull three months of usage from each provider console or admin API. The walkthrough shows you exactly where each export lives.
Phase 1: Baseline your spend →