axagent experiencelivev0.35.0
cost routing · since v0.27

Your frontier model is doing intern work.

Every sub-task your agent spawns — grepping 200 files, running the test suite, applying a spec’d-out patch — runs on your most expensive model by default. ax keeps the frontier model where it earns its keep and flags the mechanical work down a tier. Same work lands; you just get more of it before your usage window resets. ax learns the fix from your own history and proves the result — all on your laptop.

and the default leaves money on the table · one real machine, 14 days of receipts

$19,270agent spend · 14 days
663sub-tasks spawned
75%ran on the expensive default
28:1expensive-to-cheap spend

$2,301 of sub-task spend on fable/opus vs $83 on sonnet, on this machine. Run ax cost split to see yours.

01 / The loop

Measure. Nudge. Tune. Verify.

Four commands close the loop — see where the expensive defaults hide, get warned before the next one, learn the fix from what your agents actually do, then reprice against the tokens your sub-tasks actually burned.

01measure
ax cost split --days=7

Breaks your spend into main session vs the sub-tasks your agent spawns, by model. The 28:1 above came out of this table — run it to see your ratio.

02nudge
ax hooks install ~/.ax/hooks/route-dispatch.ts --providers=claude

The route-dispatch hook warns in the moment, right as a routine sub-task is about to run on the expensive default.

03tune
ax routing tune

Finds the routine work you keep overpaying for, from what your agents actually do. Deterministic pattern-matching — no AI guessing.

04verify
ax dispatches --candidates

Reprices every overbilled sub-task against what the cheaper model would have cost, from the actual tokens it burned. Your savings, not projections.

02 / The proof it’s not worse

Same work landed. Fewer windows burned.

The honest answer to “how do I know it didn’t get worse” is: you measure it. Hold the work constant — commits landed, turns taken — and watch the plan window stretch. ax routing impact captures a routing-off block and a routing-on block live on your machine, then diffs them.

~/Projects/axzsh
$ ax routing impact report

matched work across both blocks      routing off    routing on
────────────────────────────────────────────────────────────
assistant turns (work proxy)             312            318
commits landed                            11             11
5h-window utilization burned            1.00x          0.82x
────────────────────────────────────────────────────────────
1.22x more work per plan window, same output landed

Illustrative shape, not a recorded receipt — this is the one number on the page that’s yours to capture, not ours to claim. Run the two blocks on your machine (ax routing impact begin --arm=off end, then the same with --arm=on) and ax routing impact report fills in your own off-vs-on diff. Your plan window is the budget, so the receipt is in windows, not invoices.

03 / The receipts

One machine, verbatim.

Over 30 days: 20 patterns of routine work found, $591.57 of addressable spend. Reprice the last 14 days against the cheaper model and $512.91 is recoverable — the same machine as the numbers above. ax routing tune --dry-run prints yours in one command.

~/Projects/axzsh
$ ax routing tune --dry-run --days=30

20 proposals  addressable spend: $591.57  (30 days)
apply non-judgment: ax routing tune --days=30   brief: ax routing tune --emit-brief

$ ax dispatches --candidates --days=14

total est savings: $512.91
top classes: well-specified-impl ($222.78), spec-review ($69.75), bug-fix ($62.08)

Honest numbers, on purpose: “addressable spend” is what the flagged sub-tasks actually cost over the window — yours included. ax reprices what already happened, from the actual tokens burned, and never reports fabricated projected savings.

04 / The safety rule

Routing only touches the mechanical lanes.

Your obvious objection: won’t quality drop? The honest guarantee isn’t “it can’t” — it’s that the risk is bounded, visible, and reversible. ax only flags deterministic mechanical classes (bounded), you own the routing table and see every class (visible), and the impact receipt catches a regression before it compounds (reversible). Your reviews, design calls, and plans never leave the frontier model.

flagged to tier down

  • File search & repo recon (billed at opus rates today)
  • Well-specified implementation (the spec did the thinking)
  • Bug fixes with a repro (mechanical once located)
  • Verification & test runs (pass/fail, not judgment)

stays on the frontier model

  • Code review (quality gate, never flagged)
  • Design & UX (taste does not tier down)
  • Planning & architecture (the expensive mistakes)
  • Audits (the thing checking the cheap work)
ax advises — you route. ax flags mechanical dispatches and nudges the cheaper model at dispatch time; your agent still makes the call (a Claude Code hook can’t rewrite a dispatch, so it suggests, it doesn’t enforce). Judgment-shaped work never gets nudged — it ships as a written brief your agent stress-tests against your own history before any class is added.

Route the expensive model where it earns its keep.

Everything runs local. Your transcripts never leave your machine.

$ curl -fsSL https://ax.necmttn.com/install | bash$ ax ingest # first run: build the graph from your history$ ax routing tune