Bound your costs
Four axes — input tokens, output tokens, USD, wall-clock. Enforced inside the runtime, not after the fact. A trip emits BudgetExceeded and unwinds.
Budget is a four-axis cap enforced at runtime. A trip emits a
BudgetExceeded event with exact context and unwinds the run with
RunFailed{ErrorType:"budget"}. The checks run inline in the runtime; they
are not after-the-fact dashboards.
The struct
type Budget struct {
	MaxInputTokens  int64
	MaxOutputTokens int64
	MaxUSD          float64
	MaxWallClock    time.Duration
}
Zero on any field disables that axis.
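The zero-disables rule can be sketched as a single guard. This is a minimal illustration, not the library's code: `exceeded` is a hypothetical helper, the `Budget` type is a local copy of the struct above, and tripping only when usage goes strictly past the cap is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Budget is a local copy of the struct above so the sketch compiles on its own.
type Budget struct {
	MaxInputTokens  int64
	MaxOutputTokens int64
	MaxUSD          float64
	MaxWallClock    time.Duration
}

// exceeded reports whether actual is past limit, treating a zero limit as
// "axis disabled". Hypothetical helper; strict ">" is an assumption.
func exceeded(limit, actual float64) bool {
	return limit > 0 && actual > limit
}

func main() {
	b := Budget{MaxOutputTokens: 8_000} // only the output-token axis is set

	fmt.Println(exceeded(float64(b.MaxOutputTokens), 9_000))    // true
	fmt.Println(exceeded(float64(b.MaxInputTokens), 1_000_000)) // false: axis disabled
}
```

Because every axis shares the same zero convention, a partially filled Budget needs no extra flags to say which axes are active.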
Where each axis trips
| Axis | Type | Where enforced | BudgetExceeded.Where |
|---|---|---|---|
| MaxInputTokens | int64 | Pre-call, before every step.LLMCall | "pre_call" |
| MaxOutputTokens | int64 | Mid-stream, on every ChunkUsage | "mid_stream" |
| MaxUSD | float64 | Mid-stream, using budget/prices.go per-model rates | "mid_stream" |
| MaxWallClock | time.Duration | context.WithDeadline wrapping the run | "mid_stream" |
The token and USD axes count cumulatively across the whole run, not per
turn. MaxWallClock is wall-clock from Run entry to terminal event.
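To make "cumulative across the whole run" concrete, here is a rough sketch of an accumulator that never resets between turns. The `usage` and `tracker` types and the `add` method are hypothetical names for illustration; the runtime's real bookkeeping is not shown in this document.

```go
package main

import "fmt"

// usage is a hypothetical per-turn token report.
type usage struct{ in, out int64 }

// tracker sums usage over the whole run; it is never reset between turns.
type tracker struct {
	maxIn, maxOut int64 // zero disables the axis
	in, out       int64 // running totals
}

// add records one turn and returns the first axis now past its cap, or "".
func (t *tracker) add(u usage) string {
	t.in += u.in
	t.out += u.out
	switch {
	case t.maxIn > 0 && t.in > t.maxIn:
		return "input_tokens"
	case t.maxOut > 0 && t.out > t.maxOut:
		return "output_tokens"
	}
	return ""
}

func main() {
	tr := &tracker{maxIn: 100_000, maxOut: 8_000}
	turns := []usage{{40_000, 3_000}, {40_000, 3_000}, {40_000, 3_000}}
	for i, u := range turns {
		if axis := tr.add(u); axis != "" {
			fmt.Printf("turn %d tripped %s\n", i+1, axis)
			return
		}
	}
	fmt.Println("no trip")
}
```

No single turn is over the cap here; the third turn trips only because the totals carry over from the first two.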
Wiring a Budget
a := &starling.Agent{
	Provider: prov,
	Log:      log,
	Tools:    tools,
	Budget: &starling.Budget{
		MaxInputTokens:  100_000,
		MaxOutputTokens: 8_000,
		MaxUSD:          1.50,
		MaxWallClock:    2 * time.Minute,
	},
	Config: starling.Config{Model: "gpt-4o-mini", MaxTurns: 12},
}
nil Budget disables every axis.
The BudgetExceeded event
type BudgetExceeded struct {
	Limit         string  // "input_tokens" | "output_tokens" | "usd" | "wall_clock"
	Cap           float64
	Actual        float64
	Where         string // "pre_call" | "mid_stream"
	TurnID        string // omitempty
	CallID        string // omitempty
	PartialText   string // omitempty
	PartialTokens int64  // omitempty
}
Mid-stream trips include PartialText and PartialTokens so you can
recover what the model produced before the cap was hit. Pre-call trips
don't, because the call never started.
After BudgetExceeded, the run unwinds with:
// RunFailed payload (truncated):
{
	ErrorType: "budget",
	Limit:     "usd", // matches BudgetExceeded.Limit
	// ...
}
Reading a trip in CI
// eventLog is the run's event log, named so it does not shadow the standard log package.
events, err := eventLog.Read(ctx, runID)
if err != nil {
	return err
}
for _, ev := range events {
	if ev.Kind != event.KindBudgetExceeded {
		continue
	}
	var be event.BudgetExceeded
	if err := event.AsBudgetExceeded(ev, &be); err != nil {
		return err
	}
	log.Printf("budget %s tripped at %s: cap=%.2f actual=%.2f",
		be.Limit, be.Where, be.Cap, be.Actual)
	if be.PartialText != "" {
		log.Printf("partial output (%d tokens): %s", be.PartialTokens, be.PartialText)
	}
}
USD pricing
The built-in price table lives in budget/prices.go and ships
rates for the major-vendor models (OpenAI, Anthropic, Gemini,
Bedrock foundation models). Rates are USD per million input /
output tokens.
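As a worked example of per-million-token pricing, the arithmetic can be sketched as below. `costUSD` is a hypothetical helper and the rates are made up for illustration; they are not taken from budget/prices.go.

```go
package main

import "fmt"

// costUSD prices a call from token counts and per-million-token rates.
// Hypothetical helper, not the library's implementation.
func costUSD(inTokens, outTokens int64, inPerMtok, outPerMtok float64) float64 {
	return float64(inTokens)/1e6*inPerMtok + float64(outTokens)/1e6*outPerMtok
}

func main() {
	// Made-up rates: $0.50 per million input tokens, $2.00 per million output tokens.
	// 100k input -> 0.1 * 0.50 = 0.05; 8k output -> 0.008 * 2.00 = 0.016.
	fmt.Printf("%.4f\n", costUSD(100_000, 8_000, 0.50, 2.00)) // 0.0660
}
```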
For custom in-house models or vendor models the table doesn't yet cover, register pricing at runtime:
import "github.com/jerkeyray/starling/budget"
// inPerMtok and outPerMtok are USD per million input / output tokens.
budget.RegisterPricing("my-finetune", inPerMtok, outPerMtok)
RegisterPricing clears the unknown-model warn-once memo so a
stale "no price entry for ..." warning doesn't outlive the
registration. Negative or zero rates are accepted (they multiply
through unmodified); the intended use is custom models, not
overriding shipped rates.
For models still not in the table, the runtime skips the MaxUSD
axis (it can't price what it doesn't know). Always set
MaxInputTokens and MaxOutputTokens as defense-in-depth.
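The sketch below illustrates why token caps matter when a model has no price entry. Everything in it is an assumption made for illustration: the `prices` map, the `usdOver` and `tokensOver` helpers, and the rates are stand-ins, not the contents or API of budget/prices.go.

```go
package main

import "fmt"

// rate holds USD per million input / output tokens.
type rate struct{ inPerMtok, outPerMtok float64 }

// prices stands in for the built-in table; the entry and its rates are made up.
var prices = map[string]rate{"known-model": {inPerMtok: 0.50, outPerMtok: 2.00}}

// usdOver reports whether the USD cap is exceeded. priced is false when the
// model has no table entry, in which case the axis cannot be enforced.
func usdOver(model string, inTok, outTok int64, maxUSD float64) (over, priced bool) {
	r, ok := prices[model]
	if !ok {
		return false, false
	}
	cost := float64(inTok)/1e6*r.inPerMtok + float64(outTok)/1e6*r.outPerMtok
	return maxUSD > 0 && cost > maxUSD, true
}

// tokensOver is the fallback check that needs no price entry.
func tokensOver(used, limit int64) bool { return limit > 0 && used > limit }

func main() {
	over, priced := usdOver("my-finetune", 5_000_000, 500_000, 1.50)
	fmt.Println(over, priced)                   // false false: USD axis skipped
	fmt.Println(tokensOver(5_000_000, 100_000)) // true: the token cap still trips
}
```

With an unpriced model the USD check silently passes no matter how much is spent, so the token cap is the only axis that stops the run.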
Picking values
| Workload | Sane starting caps |
|---|---|
| Small QA bot, 1-3 turns | MaxInputTokens: 8k, MaxOutputTokens: 2k, MaxUSD: 0.05, MaxWallClock: 30s |
| Multi-tool research, 8-12 turns | MaxInputTokens: 100k, MaxOutputTokens: 8k, MaxUSD: 1.50, MaxWallClock: 2m |
| Long-running incident triage | MaxInputTokens: 250k, MaxOutputTokens: 16k, MaxUSD: 5.00, MaxWallClock: 5m |
| Replay only (no provider call) | All axes 0; replay never pays cost. |
Production agents should always set MaxUSD, even if generous. It's
the only axis that scales 1:1 with billing surprises.
Recovering after a trip
The terminal RunFailed{ErrorType:"budget"} is final — the run cannot
continue under the same id. If you want a follow-up turn under a fresh
budget:
- Read the recorded run and find BudgetExceeded.PartialText, if any.
- Construct a new run with a new goal that incorporates the partial output as context.
- The old run stays in the log for audit.
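The steps above could be sketched as a small helper that folds the partial output into a new goal. `followUpGoal` and the prompt wording are hypothetical, and the `BudgetExceeded` type here is a local subset of the event shown earlier; starting the new run is left to your own agent setup.

```go
package main

import (
	"fmt"
	"strings"
)

// BudgetExceeded is a local subset of the event fields used by this sketch.
type BudgetExceeded struct {
	Limit       string
	PartialText string
}

// followUpGoal builds the goal for a fresh run. With no partial output
// (for example after a pre-call trip) it returns the original goal unchanged.
func followUpGoal(original string, be BudgetExceeded) string {
	if strings.TrimSpace(be.PartialText) == "" {
		return original
	}
	return original +
		"\n\nPartial output from an earlier run (stopped by the " + be.Limit + " budget):\n" +
		be.PartialText
}

func main() {
	be := BudgetExceeded{Limit: "usd", PartialText: "Root cause: disk full"}
	fmt.Println(followUpGoal("Summarize the incident", be))
}
```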
Budgets vs MaxTurns
Config.MaxTurns caps the ReAct loop count. It's not a budget axis;
it does not emit BudgetExceeded. A turn cap trip terminates with
RunFailed{ErrorType:"max_turns"}. Use both: budgets for cost and
time, MaxTurns for runaway tool-use loops.
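A caller that wants to treat the two terminal failures differently might branch on ErrorType, roughly as follows. The `RunFailed` type is a local subset built from the fields shown earlier, and `describe` is a hypothetical helper.

```go
package main

import "fmt"

// RunFailed is a local subset of the terminal payload shown earlier.
type RunFailed struct {
	ErrorType string
	Limit     string // set for budget trips; matches BudgetExceeded.Limit
}

// describe separates budget trips from turn-cap trips.
func describe(rf RunFailed) string {
	switch rf.ErrorType {
	case "budget":
		return "budget axis " + rf.Limit + " tripped"
	case "max_turns":
		return "turn cap reached"
	default:
		return "other failure: " + rf.ErrorType
	}
}

func main() {
	fmt.Println(describe(RunFailed{ErrorType: "budget", Limit: "usd"}))
	fmt.Println(describe(RunFailed{ErrorType: "max_turns"}))
}
```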
Anti-patterns
- Setting only MaxUSD. Models not in the price table aren't enforced. Add token caps as defense-in-depth.
- MaxWallClock shorter than your slowest tool. A long step.SideEffect HTTP call counts against wall-clock. Pick a value that accounts for tool latency, not just LLM latency.
- Reading Cap and Actual as integers. Both are float64 for USD compatibility. Cast explicitly when comparing token counts.
- Treating budget trips as exceptional. They're a normal signal in prod: instrument the starling_budget_exceeded_total{axis=...} metric and alert when an axis trips at unexpected frequency.
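The wall-clock anti-pattern is easy to reproduce with the standard library alone. The sketch below wraps a simulated slow tool in context.WithDeadline, the mechanism the axis table names for MaxWallClock. `slowTool`, `runWithWallClock`, and the two durations are hypothetical stand-ins, scaled down to milliseconds so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Illustrative durations, scaled down so the sketch runs fast.
const (
	short = 20 * time.Millisecond
	long  = 200 * time.Millisecond
)

// slowTool simulates a tool call that takes d unless ctx ends first.
func slowTool(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runWithWallClock runs the tool under a wall-clock cap and reports
// whether the deadline was hit before the tool finished.
func runWithWallClock(maxWallClock, toolLatency time.Duration) bool {
	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(maxWallClock))
	defer cancel()
	return errors.Is(slowTool(ctx, toolLatency), context.DeadlineExceeded)
}

func main() {
	fmt.Println(runWithWallClock(short, long)) // true: cap shorter than the tool
	fmt.Println(runWithWallClock(long, short)) // false: cap leaves room for the tool
}
```

The deadline does not distinguish model time from tool time, which is why the cap has to cover the slowest tool call as well as LLM latency.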