why starling.

eight categories of capability, all wired around the event log as source of truth. audit, replay, recovery, cost control, providers, tools, storage, observability.

01 audit & integrity

every run is tamper-evident.

append-only event log. blake3 hash chain. merkle root committed in the terminal event.

hash chain on append

Each event carries PrevHash = BLAKE3(canonical CBOR of prior event). Mutate any prior event and the chain breaks.
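A minimal sketch of the chaining idea, under stated assumptions: the `Event` struct, `encode`, `Append`, and `Verify` are hypothetical names, SHA-256 from the standard library stands in for BLAKE3, and a fixed text layout stands in for canonical CBOR.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Event is a hypothetical, simplified record, not Starling's real event type.
type Event struct {
	Seq      uint64
	PrevHash [32]byte
	Payload  []byte
}

// encode is a stand-in for canonical CBOR: any deterministic byte layout works here.
func encode(e Event) []byte {
	var buf bytes.Buffer
	fmt.Fprintf(&buf, "%d|%x|", e.Seq, e.PrevHash)
	buf.Write(e.Payload)
	return buf.Bytes()
}

// Append links the new event to the hash of the previous one.
// SHA-256 is used only because it ships with the standard library.
func Append(log []Event, payload []byte) []Event {
	e := Event{Seq: uint64(len(log)) + 1, Payload: payload}
	if n := len(log); n > 0 {
		e.PrevHash = sha256.Sum256(encode(log[n-1]))
	}
	return append(log, e)
}

// Verify returns the index of the first event whose PrevHash
// does not match its predecessor, or -1 if the chain is intact.
func Verify(log []Event) int {
	for i := 1; i < len(log); i++ {
		if sha256.Sum256(encode(log[i-1])) != log[i].PrevHash {
			return i
		}
	}
	return -1
}

func main() {
	var log []Event
	for _, p := range []string{"start", "llm call", "done"} {
		log = Append(log, []byte(p))
	}
	fmt.Println(Verify(log)) // -1

	log[0].Payload = []byte("tampered")
	fmt.Println(Verify(log)) // 1
}
```

A chain alone cannot detect a change to the final event, since nothing follows it; that gap is what a commitment in the terminal event closes.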

merkle root commit

RunCompleted, RunFailed, and RunCancelled embed a Merkle root over every prior event. Sign it for cross-process attestation.
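One way a Merkle root over prior events could be computed is sketched below. The tree shape is an assumption (an odd node is promoted unchanged, leaves and interior nodes get distinct prefixes), and SHA-256 again stands in for BLAKE3; `MerkleRoot` is a hypothetical helper, not the library's function.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// MerkleRoot folds leaf hashes pairwise until a single hash remains.
// The odd-node and prefix conventions are assumptions for this sketch.
func MerkleRoot(leaves [][]byte) [32]byte {
	if len(leaves) == 0 {
		return sha256.Sum256(nil)
	}
	level := make([][32]byte, len(leaves))
	for i, l := range leaves {
		// 0x00 prefix marks a leaf so it cannot be confused with an interior node.
		level[i] = sha256.Sum256(append([]byte{0x00}, l...))
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // odd node: promote unchanged
				continue
			}
			pair := append([]byte{0x01}, level[i][:]...) // 0x01 marks an interior node
			pair = append(pair, level[i+1][:]...)
			next = append(next, sha256.Sum256(pair))
		}
		level = next
	}
	return level[0]
}

func main() {
	events := [][]byte{
		[]byte("RunStarted"),
		[]byte("ToolCallCompleted"),
		[]byte("AssistantMessageCompleted"),
	}
	committed := MerkleRoot(events)
	fmt.Printf("root: %x\n", committed)

	events[1] = []byte("tampered")
	fmt.Println("still matches:", MerkleRoot(events) == committed) // still matches: false
}
```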

eventlog.validate

Headless integrity check: seq monotonicity, hash chain, terminal placement, Merkle root, and semantic pairing rules.

raw response digest

AssistantMessageCompleted carries a BLAKE3 digest the adapter computes over the SDK-level response. Optional strict mode rejects empty digests.

canonical cbor

RFC 8949 §4.2: shortest integer form, sorted map keys, no indefinite-length items. The byte representation is deterministic.
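To make two of these rules concrete, here is a small hand-rolled encoder covering shortest-form integers and bytewise-sorted map keys. It is an illustration only: `head` and `EncodeMap` are hypothetical names, it handles just `map[string]uint64`, and a real runtime would use a full CBOR library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// head writes a CBOR initial byte for the given major type,
// using the shortest argument encoding that fits n.
func head(buf *bytes.Buffer, major byte, n uint64) {
	m := major << 5
	var b [8]byte
	switch {
	case n < 24:
		buf.WriteByte(m | byte(n))
	case n <= 0xff:
		buf.WriteByte(m | 24)
		buf.WriteByte(byte(n))
	case n <= 0xffff:
		buf.WriteByte(m | 25)
		binary.BigEndian.PutUint16(b[:2], uint16(n))
		buf.Write(b[:2])
	case n <= 0xffffffff:
		buf.WriteByte(m | 26)
		binary.BigEndian.PutUint32(b[:4], uint32(n))
		buf.Write(b[:4])
	default:
		buf.WriteByte(m | 27)
		binary.BigEndian.PutUint64(b[:], n)
		buf.Write(b[:])
	}
}

// EncodeMap emits a definite-length map whose keys are sorted by the
// bytewise order of their encoded form, so Go's random map iteration
// order never leaks into the output.
func EncodeMap(m map[string]uint64) []byte {
	type pair struct{ key, val []byte }
	pairs := make([]pair, 0, len(m))
	for k, v := range m {
		var kb, vb bytes.Buffer
		head(&kb, 3, uint64(len(k))) // major type 3: text string
		kb.WriteString(k)
		head(&vb, 0, v) // major type 0: unsigned integer
		pairs = append(pairs, pair{kb.Bytes(), vb.Bytes()})
	}
	sort.Slice(pairs, func(i, j int) bool {
		return bytes.Compare(pairs[i].key, pairs[j].key) < 0
	})
	var out bytes.Buffer
	head(&out, 5, uint64(len(pairs))) // major type 5: map
	for _, p := range pairs {
		out.Write(p.key)
		out.Write(p.val)
	}
	return out.Bytes()
}

func main() {
	fmt.Printf("%x\n", EncodeMap(map[string]uint64{"seq": 500, "a": 1}))
	// a2616101637365711901f4
}
```

Because the bytes are identical for identical values, hashing the encoding is meaningful: two processes that agree on an event also agree on its hash.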

02 determinism & replay

re-execute any run, byte-for-byte.

recorded side effects + deterministic loop = portable runs. replay never re-contacts the provider.

starling.replay

Re-runs the agent against a recorded log. The first event that does not byte-match surfaces as a typed Divergence.

step.now & step.random

Wall-clock and RNG are recorded once on the live run, returned from the log on replay.

step.sideeffect

Wrap any non-deterministic effect (HTTP, filesystem, MCP). Replay reads the recorded value, skips the closure.
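The record-once, read-on-replay pattern behind these helpers can be sketched in a few lines. `Recorder` and its `SideEffect` method are hypothetical stand-ins written for this example; the real step API and its storage in the event log are not reproduced here.

```go
package main

import (
	"fmt"
	"time"
)

// Recorder is a hypothetical stand-in for the runtime's step context.
type Recorder struct {
	Replaying bool
	Log       []string // recorded results, in call order
	next      int
}

// SideEffect runs fn on a live run and records its result.
// On replay it returns the recorded value and never calls fn.
func (r *Recorder) SideEffect(fn func() (string, error)) (string, error) {
	if r.Replaying {
		if r.next >= len(r.Log) {
			return "", fmt.Errorf("replay: no recorded value for call %d", r.next)
		}
		v := r.Log[r.next]
		r.next++
		return v, nil
	}
	v, err := fn()
	if err != nil {
		return "", err
	}
	r.Log = append(r.Log, v)
	return v, nil
}

func main() {
	// Live run: the wall clock is read once and recorded.
	live := &Recorder{}
	now, _ := live.SideEffect(func() (string, error) {
		return time.Now().UTC().Format(time.RFC3339Nano), nil
	})

	// Replay: the closure is skipped, so the same value comes back.
	replay := &Recorder{Replaying: true, Log: live.Log}
	again, _ := replay.SideEffect(func() (string, error) {
		panic("closure must not run on replay")
	})
	fmt.Println(now == again) // true
}
```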

divergence type

replay.Divergence carries Seq, Kind, ExpectedKind, Class, and Reason. errors.Is + errors.As give you structured access.
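The sketch below shows how `errors.Is` and `errors.As` typically work with such a type. The field names come from the description above, but their Go types, the `ErrDivergence` sentinel, and the helper functions are assumptions made for this example, not the library's definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDivergence is a hypothetical sentinel for errors.Is checks.
var ErrDivergence = errors.New("replay: divergence")

// Divergence mirrors the documented field names; the field types are assumed.
type Divergence struct {
	Seq          uint64
	Kind         string
	ExpectedKind string
	Class        string
	Reason       string
}

func (d *Divergence) Error() string {
	return fmt.Sprintf("divergence at seq %d: got %s, want %s (%s: %s)",
		d.Seq, d.Kind, d.ExpectedKind, d.Class, d.Reason)
}

// Is lets errors.Is(err, ErrDivergence) match any *Divergence in the chain.
func (d *Divergence) Is(target error) bool { return target == ErrDivergence }

// replayRun is a hypothetical replay call that fails with a wrapped Divergence.
func replayRun() error {
	d := &Divergence{
		Seq:          7,
		Kind:         "ToolCallScheduled",
		ExpectedKind: "AssistantMessageCompleted",
		Class:        "kind_mismatch",
		Reason:       "agent took a different branch",
	}
	return fmt.Errorf("replay failed: %w", d)
}

// DivergedAt extracts the sequence number of the first mismatch, if any.
func DivergedAt(err error) (uint64, bool) {
	var d *Divergence
	if errors.As(err, &d) {
		return d.Seq, true
	}
	return 0, false
}

func main() {
	err := replayRun()
	if errors.Is(err, ErrDivergence) {
		fmt.Println("run diverged")
	}
	if seq, ok := DivergedAt(err); ok {
		fmt.Println("first mismatch at seq", seq) // first mismatch at seq 7
	}
}
```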

stream & inspector mode

replay.Stream yields a ReplayStep per event so the inspector can render recorded vs produced side-by-side.

03 recovery

resume a crashed run from its last seq.

reconstruct conversation state from the log, reissue pending tool calls, mark the boundary with a seam event.

agent.resume

Re-enters a run in a new process. Pending tool calls reissue under fresh CallIDs; orphaned schedules stay for audit.

runresumed seam

A non-terminal event marking the boundary between processes. Carries AtSeq, ExtraMessage, ReissueTools, and PendingCalls.

withreissuetools(false)

Refuses to re-fire pending tools and returns ErrPartialToolCall instead. Use when tools are mutating and you want manual intervention.
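As a rough illustration of the resume decision, the sketch below finds tool calls that were scheduled but never completed and either returns them for reissue or refuses. The event kind strings, the `Event` struct, and the `Resume` signature are hypothetical; only the `ErrPartialToolCall` name is taken from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a simplified record; the kind names used below are assumptions.
type Event struct {
	Kind   string
	CallID string
}

// ErrPartialToolCall is a local stand-in for the error named in the docs.
var ErrPartialToolCall = errors.New("resume: pending tool calls and reissue disabled")

// PendingCalls returns CallIDs that were scheduled but have no completion event.
func PendingCalls(log []Event) []string {
	done := map[string]bool{}
	for _, e := range log {
		if e.Kind == "ToolCallCompleted" {
			done[e.CallID] = true
		}
	}
	var pending []string
	for _, e := range log {
		if e.Kind == "ToolCallScheduled" && !done[e.CallID] {
			pending = append(pending, e.CallID)
		}
	}
	return pending
}

// Resume reports which calls would be reissued, or refuses when
// reissueTools is false and work is still pending.
func Resume(log []Event, reissueTools bool) ([]string, error) {
	pending := PendingCalls(log)
	if len(pending) > 0 && !reissueTools {
		return pending, ErrPartialToolCall
	}
	return pending, nil
}

func main() {
	log := []Event{
		{Kind: "ToolCallScheduled", CallID: "c1"},
		{Kind: "ToolCallCompleted", CallID: "c1"},
		{Kind: "ToolCallScheduled", CallID: "c2"}, // process crashed before completion
	}
	fmt.Println(Resume(log, true))  // [c2] <nil>
	fmt.Println(Resume(log, false)) // [c2] resume: pending tool calls and reissue disabled
}
```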

schema preflight

Run and Resume call eventlog.Preflight on startup. Stale or too-new schemas fail fast with a remediation message.

04 cost control

budgets enforced inside the runtime.

four axes. inline checks, not after-the-fact dashboards. a trip emits budgetexceeded and unwinds the run.

maxinputtokens

Pre-call check before every step.LLMCall. Counts the planned prompt and refuses the call if it would exceed the cap.

maxoutputtokens

Mid-stream check on every ChunkUsage. Cancels the stream the moment the cap is crossed.

maxusd

Mid-stream USD enforcement using a per-model price table in the budget package. Edit the table to add or override rates.

maxwallclock

context.WithDeadline wrapping the run. The deadline triggers RunFailed{ErrorType:"budget", Limit:"wall_clock"}.
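A sketch of how a wall-clock cap can be layered on `context.WithDeadline` follows. The `RunFailed` struct here is a local stand-in carrying only the two fields shown above, and `RunWithWallClock`, `slowRun`, and `demo` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// RunFailed is a local stand-in with the two fields named in the text.
type RunFailed struct {
	ErrorType string
	Limit     string
}

func (f *RunFailed) Error() string {
	return "run failed: " + f.ErrorType + " limit " + f.Limit
}

// RunWithWallClock wraps run in a deadline and maps a deadline error
// to a budget failure. Other errors pass through unchanged.
func RunWithWallClock(parent context.Context, max time.Duration, run func(context.Context) error) error {
	ctx, cancel := context.WithDeadline(parent, time.Now().Add(max))
	defer cancel()
	err := run(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		return &RunFailed{ErrorType: "budget", Limit: "wall_clock"}
	}
	return err
}

// slowRun is a hypothetical agent loop that honors cancellation.
func slowRun(ctx context.Context) error {
	select {
	case <-time.After(time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs the slow loop under a 20ms cap.
func demo() error {
	return RunWithWallClock(context.Background(), 20*time.Millisecond, slowRun)
}

func main() {
	fmt.Println(demo()) // run failed: budget limit wall_clock
}
```

The cap only works if the wrapped code observes `ctx.Done()`; a loop that ignores its context will run past the deadline.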

budgetexceeded event

Emitted with limit, cap, actual, and where (pre_call | mid_stream | post_call) so post-mortems are exact.
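The inline checks can be pictured as below. This is a sketch under assumptions: the `Budget` and `BudgetExceeded` types are written for the example, the limit strings `input_tokens` and `output_tokens` are guesses, and a zero cap is treated as "no limit".

```go
package main

import "fmt"

// BudgetExceeded mirrors the four documented fields; types are assumed.
type BudgetExceeded struct {
	Limit  string
	Cap    int64
	Actual int64
	Where  string // pre_call | mid_stream | post_call
}

func (e *BudgetExceeded) Error() string {
	return fmt.Sprintf("budget exceeded: %s cap=%d actual=%d where=%s",
		e.Limit, e.Cap, e.Actual, e.Where)
}

// Budget holds two of the four axes; zero means unlimited in this sketch.
type Budget struct {
	MaxInputTokens  int64
	MaxOutputTokens int64
}

// CheckInput is the pre-call check: refuse before any tokens are spent.
func (b Budget) CheckInput(planned int64) error {
	if b.MaxInputTokens > 0 && planned > b.MaxInputTokens {
		return &BudgetExceeded{Limit: "input_tokens", Cap: b.MaxInputTokens, Actual: planned, Where: "pre_call"}
	}
	return nil
}

// CheckStream walks cumulative output-token counts, one per usage chunk,
// and returns the index of the chunk at which the stream would be cancelled.
func (b Budget) CheckStream(usage []int64) (int, error) {
	for i, total := range usage {
		if b.MaxOutputTokens > 0 && total > b.MaxOutputTokens {
			return i, &BudgetExceeded{Limit: "output_tokens", Cap: b.MaxOutputTokens, Actual: total, Where: "mid_stream"}
		}
	}
	return len(usage), nil
}

func main() {
	b := Budget{MaxInputTokens: 1000, MaxOutputTokens: 50}
	fmt.Println(b.CheckInput(1200))
	// budget exceeded: input_tokens cap=1000 actual=1200 where=pre_call

	n, err := b.CheckStream([]int64{10, 30, 60, 90})
	fmt.Println(n, err)
	// 2 budget exceeded: output_tokens cap=50 actual=60 where=mid_stream
}
```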

05 providers

bring your model. or all of them.

adapters share a streaming contract and a conformance suite. openai-compatible endpoints plug in via withbaseurl.

openai adapter

Plus Groq, Together, Ollama, vLLM, LM Studio, Azure, anything else OpenAI-compatible.

anthropic adapter

Tool use, extended thinking with per-block signatures, prompt caching metadata.

gemini adapter

Native Google Gemini through the Google AI streaming API.

bedrock adapter

Amazon Bedrock via native ConverseStream: tool use, reasoning with signatures, redacted thinking, cache-aware usage.

openrouter adapter

Thin wrapper over the OpenAI adapter with attribution headers and routing.

conformance suite

Reusable harness in provider/conformance asserting request shape, chunk ordering, tool-call IDs, usage, and cancellation.

capability declaration

provider.Capabler exposes which features each adapter supports. Tests skip what the adapter cannot do.

06 tools

typed go tools. mcp for the rest.

tool.tool is one interface. tool.typed derives json schema from your input type. the mcp adapter mounts remote servers.

tool.typed

Wrap a typed Go function as a Starling tool. JSON Schema is derived from the input type via reflection.
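Deriving a schema by reflection might look roughly like this. `SchemaOf` and `FetchInput` are hypothetical, the sketch assumes a flat struct input type, and it covers only a handful of kinds; how the library itself treats tags and nested types is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// jsonType maps a Go kind to a JSON Schema type name.
func jsonType(k reflect.Kind) string {
	switch k {
	case reflect.String:
		return "string"
	case reflect.Bool:
		return "boolean"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return "integer"
	case reflect.Float32, reflect.Float64:
		return "number"
	case reflect.Slice, reflect.Array:
		return "array"
	default:
		return "object"
	}
}

// SchemaOf builds a minimal JSON Schema for a flat struct type T.
// Fields tagged omitempty are treated as optional (an assumption).
func SchemaOf[T any]() map[string]any {
	t := reflect.TypeOf((*T)(nil)).Elem() // must be a struct in this sketch
	props := map[string]any{}
	required := []string{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name, opts, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		props[name] = map[string]any{"type": jsonType(f.Type.Kind())}
		if !strings.Contains(opts, "omitempty") {
			required = append(required, name)
		}
	}
	return map[string]any{"type": "object", "properties": props, "required": required}
}

// FetchInput is a hypothetical tool input type.
type FetchInput struct {
	URL    string `json:"url"`
	MaxKiB int    `json:"max_kib,omitempty"`
}

func main() {
	out, _ := json.MarshalIndent(SchemaOf[FetchInput](), "", "  ")
	fmt.Println(string(out))
}
```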

mcp adapter

Three transports: stdio subprocess, streamable HTTP, custom mcp.Transport. Calls route through step.SideEffect for replay safety.

built-in tools

tool/builtin ships Fetch (15s timeout, 1 MiB cap) and ReadFile(baseDir) with path-escape rejection.
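Path-escape rejection is commonly done by joining the requested name onto the base directory and checking that the result is still inside it. The sketch below shows that check with a hypothetical `ResolveInside` helper; it is not the built-in tool's code, and it does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ResolveInside joins name onto baseDir and rejects any result that
// climbs out of baseDir. Symlinks are not resolved in this sketch.
func ResolveInside(baseDir, name string) (string, error) {
	full := filepath.Join(baseDir, name) // Join also cleans "." and ".." segments
	rel, err := filepath.Rel(baseDir, full)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes base directory", name)
	}
	return full, nil
}

func main() {
	for _, name := range []string{"notes/a.txt", "../etc/passwd", "notes/../../secret"} {
		p, err := ResolveInside("/srv/data", name)
		fmt.Println(p, err)
	}
}
```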

idempotency & retry

step.ToolCall{Idempotent: true, MaxAttempts: N} retries on tool.ErrTransient. Same CallID, incrementing Attempt.
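The retry rule reads naturally as a small loop. In this sketch `ErrTransient` is a local stand-in for `tool.ErrTransient`, and `Call` and `Do` are hypothetical; backoff between attempts is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTransient is a local stand-in for tool.ErrTransient.
var ErrTransient = errors.New("transient failure")

// Call carries the retry settings named in the text.
type Call struct {
	CallID      string
	Idempotent  bool
	MaxAttempts int
}

// Do invokes fn with the same CallID and an incrementing attempt number.
// It retries only when the call is idempotent and the error is transient.
func Do(c Call, fn func(callID string, attempt int) error) (attempts int, err error) {
	max := c.MaxAttempts
	if !c.Idempotent || max < 1 {
		max = 1 // non-idempotent calls are never retried
	}
	for attempt := 1; attempt <= max; attempt++ {
		err = fn(c.CallID, attempt)
		if err == nil || !errors.Is(err, ErrTransient) {
			return attempt, err
		}
	}
	return max, err
}

func main() {
	flaky := func(callID string, attempt int) error {
		fmt.Println("call", callID, "attempt", attempt)
		if attempt < 3 {
			return fmt.Errorf("upstream 503: %w", ErrTransient)
		}
		return nil
	}
	n, err := Do(Call{CallID: "c1", Idempotent: true, MaxAttempts: 5}, flaky)
	fmt.Println(n, err) // 3 <nil>
}
```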

per-call timeouts

WithCallTimeout on the MCP adapter. Local tools enforce timeouts via context.WithDeadline inside the tool.

07 storage

three backends. same interface.

in-memory for tests. sqlite for single-host. postgres for multi-host. all three share the migration contract.

eventlog.newinmemory

Tests, demos, ephemeral CLIs. Same EventLog interface, no persistence.

eventlog.newsqlite

WAL mode, per-run _txlock=immediate. Auto-migrates on open. One writer, many readers.

eventlog.newpostgres

Per-run advisory locks serialize appenders by run. Different runs are independent. PITR via WAL archiving.

schema migrations

Forward-only, idempotent. CLI subcommands: starling migrate, starling schema-version. Preflight refuses stale or too-new schemas.

ndjson export

starling export <db> <runID> dumps a run to portable NDJSON. Archive cold, delete hot.
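NDJSON is simply one JSON object per line, which is what `encoding/json`'s `Encoder` produces. The sketch below writes a run in that shape; the `Event` fields and the `Export` helpers are hypothetical and do not reflect the CLI's actual output schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// Event is a simplified record; the real export carries more fields.
type Event struct {
	Seq  uint64 `json:"seq"`
	Kind string `json:"kind"`
}

// Export writes one JSON object per line. Encoder.Encode appends
// the trailing newline that makes the stream NDJSON.
func Export(w io.Writer, events []Event) error {
	enc := json.NewEncoder(w)
	for _, e := range events {
		if err := enc.Encode(e); err != nil {
			return fmt.Errorf("export seq %d: %w", e.Seq, err)
		}
	}
	return nil
}

// ExportString is a convenience wrapper for in-memory use.
func ExportString(events []Event) (string, error) {
	var sb strings.Builder
	err := Export(&sb, events)
	return sb.String(), err
}

func main() {
	events := []Event{{Seq: 1, Kind: "RunStarted"}, {Seq: 2, Kind: "RunCompleted"}}
	if err := Export(os.Stdout, events); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	// {"seq":1,"kind":"RunStarted"}
	// {"seq":2,"kind":"RunCompleted"}
}
```

Line-delimited output can be streamed, appended to, and processed with ordinary line-oriented tools, which suits cold archives.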

runlister

All three backends expose ListRuns for inspector-style run indexes, ordered newest first.

08 observability

production-grade out of the box.

prometheus metrics, opentelemetry spans, structured slog. an embedded read-only web ui for runs, timelines, replays.

prometheus metrics

starling.NewMetrics(reg) registers run, provider, tool, eventlog, and budget collectors. Histograms cover every hot path.

opentelemetry tracing

agent.run → agent.turn → provider.stream + step.tool. Wire any OTLP exporter; the runtime emits the spans.

structured slog

Run lifecycle and divergence events emit slog records with stable fields. Plug your own handler.

embedded inspector

Read-only HTTP UI. Runs list, per-run timeline, payload detail, live tail (SSE), replay controls, divergence rendering.

bearer auth + csrf

inspect.WithAuth + BearerAuth. CSRF protection on replay POST endpoints. Front with TLS for non-loopback access.
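For readers unfamiliar with bearer checks, here is a generic standard-library middleware that does the same job. It is not the `inspect` package's implementation; `validBearer` and `RequireBearer` are hypothetical names, and the token is hard-coded only for the demo.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// validBearer compares the presented token in constant time
// so response timing does not leak how many bytes matched.
func validBearer(header, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := header[len(prefix):]
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// RequireBearer rejects requests without the expected Authorization header.
func RequireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validBearer(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := RequireBearer("s3cret", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "runs: []")
	}))
	for _, auth := range []string{"", "Bearer s3cret"} {
		req := httptest.NewRequest(http.MethodGet, "/runs", nil)
		req.Header.Set("Authorization", auth)
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		fmt.Println(rec.Code) // 401, then 200
	}
}
```

A bearer token travels in every request header, which is why TLS matters for anything beyond loopback.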

dual-mode binary

Embed InspectCommand(factory) in your service binary so the inspector can replay against your live agent code.

ship one. replay forever.

the runtime is small on purpose. the wedge is production debugging via replay, not framework breadth.