Bound your costs

Budget is a four-axis cap enforced at runtime. When an axis trips, the runtime emits a BudgetExceeded event with exact context and unwinds the run with RunFailed{ErrorType:"budget"}. The checks run inline during the run, not in after-the-fact dashboards.

The struct

type Budget struct {
    MaxInputTokens  int64
    MaxOutputTokens int64
    MaxUSD          float64
    MaxWallClock    time.Duration
}

Zero on any field disables that axis.
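
A minimal sketch of the zero-disables rule, assuming each axis reduces to comparing a running total against its cap. The local Budget type mirrors the struct above so the example compiles on its own, and exceeds is a hypothetical helper, not part of Starling's API. Whether the real check trips at the cap or only past it is not stated here; this sketch trips on strictly greater.

```go
package main

import (
	"fmt"
	"time"
)

// Budget mirrors the struct above so the sketch is self-contained.
type Budget struct {
	MaxInputTokens  int64
	MaxOutputTokens int64
	MaxUSD          float64
	MaxWallClock    time.Duration
}

// exceeds is a hypothetical helper: a zero cap means the axis is disabled
// and can never trip; otherwise the axis trips once actual passes the cap.
func exceeds(limit, actual float64) bool {
	if limit == 0 {
		return false
	}
	return actual > limit
}

func main() {
	b := Budget{MaxOutputTokens: 8_000} // every other axis left at zero

	fmt.Println(exceeds(float64(b.MaxInputTokens), 1_000_000)) // disabled axis
	fmt.Println(exceeds(float64(b.MaxOutputTokens), 8_001))    // past the cap
	fmt.Println(exceeds(float64(b.MaxOutputTokens), 7_999))    // under the cap
}
```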

Where each axis trips

Axis	Type	Where enforced	`BudgetExceeded.Where`
`MaxInputTokens`	`int64`	Pre-call, before every `step.LLMCall`	`"pre_call"`
`MaxOutputTokens`	`int64`	Mid-stream, on every `ChunkUsage`	`"mid_stream"`
`MaxUSD`	`float64`	Mid-stream, using `budget/prices.go` per-model rates	`"mid_stream"`
`MaxWallClock`	`time.Duration`	`context.WithDeadline` wrapping the run	`"mid_stream"`

The token and USD axes count cumulatively across the whole run, not per turn. MaxWallClock measures elapsed wall-clock time from Run entry to the terminal event.
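
To make the cumulative accounting concrete, here is a rough sketch of a run-scoped counter. The tracker type and its methods are hypothetical and are not Starling's internals. It also assumes the pre-call check adds the upcoming prompt's token count to the running total, which may differ from the real accounting.

```go
package main

import "fmt"

// tracker is a hypothetical run-scoped accumulator. Caps of zero disable
// the corresponding axis.
type tracker struct {
	maxIn, maxOut   int64
	usedIn, usedOut int64
}

// preCall runs before each LLM call. It reports "pre_call" when the input
// tokens already spent plus the upcoming prompt would pass the cap.
func (t *tracker) preCall(promptTokens int64) (where string, tripped bool) {
	if t.maxIn > 0 && t.usedIn+promptTokens > t.maxIn {
		return "pre_call", true
	}
	t.usedIn += promptTokens
	return "", false
}

// onChunk runs for each streamed usage update. It reports "mid_stream"
// once the run's total output tokens pass the cap.
func (t *tracker) onChunk(outputTokens int64) (where string, tripped bool) {
	t.usedOut += outputTokens
	if t.maxOut > 0 && t.usedOut > t.maxOut {
		return "mid_stream", true
	}
	return "", false
}

func main() {
	tr := &tracker{maxIn: 1_000, maxOut: 100}

	// Each turn fits on its own, but the totals are shared across the run.
	_, tripped := tr.preCall(600)
	fmt.Println("turn 1 tripped:", tripped)
	where, tripped := tr.preCall(600)
	fmt.Println("turn 2 tripped:", tripped, "where:", where)
}
```

Because the counters live on the run rather than the turn, the second 600-token prompt trips a 1,000-token cap even though neither prompt exceeds it alone.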

Wiring a Budget

a := &starling.Agent{
    Provider: prov,
    Log:      log,
    Tools:    tools,
    Budget: &starling.Budget{
        MaxInputTokens:  100_000,
        MaxOutputTokens: 8_000,
        MaxUSD:          1.50,
        MaxWallClock:    2 * time.Minute,
    },
    Config: starling.Config{Model: "gpt-4o-mini", MaxTurns: 12},
}

A nil Budget disables every axis.

The BudgetExceeded event

type BudgetExceeded struct {
    Limit         string  // "input_tokens" | "output_tokens" | "usd" | "wall_clock"
    Cap           float64
    Actual        float64
    Where         string  // "pre_call" | "mid_stream"
    TurnID        string  // omitempty
    CallID        string  // omitempty
    PartialText   string  // omitempty
    PartialTokens int64   // omitempty
}

Mid-stream trips include PartialText and PartialTokens so you can recover what the model produced before the cap was hit. Pre-call trips don't — the call never started.
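
The sketch below illustrates how a mid-stream trip can still hand back partial output. The chunk type and consume function are hypothetical stand-ins for the streaming loop, and it assumes the chunk that crosses the cap is included in the partial text; the real runtime may cut differently.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a hypothetical streamed piece of model output with its token count.
type chunk struct {
	Text   string
	Tokens int64
}

// consume reads chunks until the output-token cap is passed. On a trip it
// returns what was produced so far, in the spirit of PartialText and
// PartialTokens. A cap of zero disables the check.
func consume(chunks []chunk, maxOutputTokens int64) (partialText string, partialTokens int64, tripped bool) {
	var b strings.Builder
	var used int64
	for _, c := range chunks {
		b.WriteString(c.Text)
		used += c.Tokens
		if maxOutputTokens > 0 && used > maxOutputTokens {
			return b.String(), used, true
		}
	}
	return b.String(), used, false
}

func main() {
	stream := []chunk{{"Hello ", 2}, {"wor", 2}, {"ld", 2}}
	text, tokens, tripped := consume(stream, 3)
	fmt.Printf("%q %d %v\n", text, tokens, tripped)
}
```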

After BudgetExceeded, the run unwinds with:

// RunFailed payload (truncated):
{
    ErrorType: "budget",
    Limit:     "usd",            // matches BudgetExceeded.Limit
    // ...
}

Reading a trip in CI

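// evLog is the run's event log (the value passed as Agent.Log). It is named
// evLog here so it does not shadow the standard library "log" package.
events, err := evLog.Read(ctx, runID)
if err != nil {
    return err
}

for _, ev := range events {
    if ev.Kind != event.KindBudgetExceeded {
        continue
    }
    var be event.BudgetExceeded
    if err := event.AsBudgetExceeded(ev, &be); err != nil {
        return err
    }

    // Cap and Actual are float64 on every axis, so %.2f covers tokens and USD.
    log.Printf("budget %s tripped at %s: cap=%.2f actual=%.2f",
        be.Limit, be.Where, be.Cap, be.Actual)
    if be.PartialText != "" {
        log.Printf("partial output (%d tokens): %s", be.PartialTokens, be.PartialText)
    }
}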

USD pricing

The built-in price table lives in budget/prices.go and ships rates for the major-vendor models (OpenAI, Anthropic, Gemini, Bedrock foundation models). Rates are USD per million input / output tokens.

For custom in-house models or vendor models the table doesn't yet cover, register pricing at runtime:

import "github.com/jerkeyray/starling/budget"

budget.RegisterPricing("my-finetune", inPerMtok, outPerMtok)

RegisterPricing clears the unknown-model warn-once memo, so a stale "no price entry for ..." warning doesn't outlive the registration. Negative or zero rates are accepted and multiply through unmodified. The intended use is pricing custom models, not overriding shipped rates.

For models still not in the table, the runtime skips MaxUSD enforcement, because it can't price what it doesn't know. Always set MaxInputTokens and MaxOutputTokens as defense-in-depth.
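
As a worked example of per-million-token pricing, the sketch below computes a run's cost from cumulative token counts and reports when a model has no price entry. The prices map, the model name, and the rates are made up for illustration; they are not the contents of budget/prices.go, and costUSD is not a Starling function.

```go
package main

import "fmt"

// rate holds USD per million tokens, the unit used by the price table.
type rate struct {
	inPerMtok, outPerMtok float64
}

// prices is a stand-in table with an invented model and invented rates.
var prices = map[string]rate{
	"example-model": {inPerMtok: 2.00, outPerMtok: 8.00},
}

// costUSD prices cumulative token usage. ok is false when the model has no
// entry, in which case a caller has nothing to compare against a USD cap.
func costUSD(model string, inTokens, outTokens int64) (usd float64, ok bool) {
	r, found := prices[model]
	if !found {
		return 0, false
	}
	usd = float64(inTokens)/1e6*r.inPerMtok + float64(outTokens)/1e6*r.outPerMtok
	return usd, true
}

func main() {
	// 0.5 Mtok in at 2.00 plus 0.25 Mtok out at 8.00.
	usd, ok := costUSD("example-model", 500_000, 250_000)
	fmt.Printf("%.2f %v\n", usd, ok)

	// No price entry: the USD axis cannot be evaluated for this model.
	_, ok = costUSD("unknown-model", 500_000, 250_000)
	fmt.Println(ok)
}
```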

Picking values

Workload	Sane starting caps
Small QA bot, 1-3 turns	`MaxInputTokens: 8k`, `MaxOutputTokens: 2k`, `MaxUSD: 0.05`, `MaxWallClock: 30s`
Multi-tool research, 8-12 turns	`MaxInputTokens: 100k`, `MaxOutputTokens: 8k`, `MaxUSD: 1.50`, `MaxWallClock: 2m`
Long-running incident triage	`MaxInputTokens: 250k`, `MaxOutputTokens: 16k`, `MaxUSD: 5.00`, `MaxWallClock: 5m`
Replay only (no provider call)	All axes 0; replay never pays cost.

Production agents should always set MaxUSD, even if generous. It is the only axis expressed directly in billed dollars, so it is the one that bounds a billing surprise.

Recovering after a trip

The terminal RunFailed{ErrorType:"budget"} is final — the run cannot continue under the same id. If you want a follow-up turn under a fresh budget:

1. Read the recorded run and find BudgetExceeded.PartialText, if any.
2. Construct a new run whose goal incorporates the partial output as context.
3. The old run stays in the log for audit.

Budgets vs `MaxTurns`

Config.MaxTurns caps the ReAct loop count. It's not a budget axis; it does not emit BudgetExceeded. A turn cap trip terminates with RunFailed{ErrorType:"max_turns"}. Use both: budgets for cost and time, MaxTurns for runaway tool-use loops.
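
Since the two caps fail with different ErrorType values, a caller can branch on that string. The sketch below is illustrative only: describeFailure and its messages are invented, and "provider" is a placeholder for any other error type rather than a documented value.

```go
package main

import "fmt"

// describeFailure maps the RunFailed ErrorType values named above to an
// operator-facing hint. The function and its messages are illustrative.
func describeFailure(errorType string) string {
	switch errorType {
	case "budget":
		return "budget cap hit: inspect the BudgetExceeded event for the axis"
	case "max_turns":
		return "turn cap hit: check for a runaway tool-use loop"
	default:
		return "other failure: " + errorType
	}
}

func main() {
	// "provider" is a placeholder for any error type other than the two above.
	for _, et := range []string{"budget", "max_turns", "provider"} {
		fmt.Println(describeFailure(et))
	}
}
```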

Anti-patterns

Setting only MaxUSD. MaxUSD is not enforced for models missing from the price table. Add token caps as defense-in-depth.
MaxWallClock shorter than your slowest tool. A long step.SideEffect HTTP call counts against wall-clock. Pick a value that accounts for tool latency, not just LLM latency.
Reading Cap and Actual as integers. Both are float64 for USD compatibility. Cast explicitly when comparing token counts.
Treating budget trips as exceptional. They're a normal signal in prod — instrument the starling_budget_exceeded_total{axis=...} metric and alert when an axis trips at unexpected frequency.
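
Two of the points above, explicit casts for token axes and per-axis trip counts, can be sketched together. The budgetExceeded type copies only the fields needed from the event, and both helpers are hypothetical. The per-axis tally stands in for what a counter metric labelled by axis would record.

```go
package main

import "fmt"

// budgetExceeded copies only the fields this sketch needs from the event.
type budgetExceeded struct {
	Limit  string
	Cap    float64
	Actual float64
}

// overshootTokens converts the float64 fields to int64 before subtracting.
// It is only meaningful for the two token axes.
func overshootTokens(be budgetExceeded) int64 {
	return int64(be.Actual) - int64(be.Cap)
}

// tripsByAxis counts trips per Limit value.
func tripsByAxis(trips []budgetExceeded) map[string]int {
	counts := make(map[string]int)
	for _, be := range trips {
		counts[be.Limit]++
	}
	return counts
}

func main() {
	trips := []budgetExceeded{
		{Limit: "output_tokens", Cap: 8000, Actual: 8012},
		{Limit: "usd", Cap: 1.50, Actual: 1.53},
		{Limit: "output_tokens", Cap: 8000, Actual: 8003},
	}
	fmt.Println(tripsByAxis(trips)["output_tokens"])
	fmt.Println(overshootTokens(trips[0]))
}
```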