why starling.
eight categories of capability, all wired around the event log as source of truth. audit, replay, recovery, cost control, providers, tools, storage, observability.
every run is tamper-evident.
append-only event log. blake3 hash chain. merkle root committed in the terminal event.
hash chain on append
Each event carries PrevHash = BLAKE3(canonical CBOR of prior event). Mutate any prior event and the chain breaks.
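A minimal sketch of the chaining idea, not Starling's implementation. It assumes SHA-256 from the Go standard library as a stand-in for BLAKE3 and an ad hoc byte layout in place of canonical CBOR; the `event` type and all function names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// event is a hypothetical, simplified log entry.
type event struct {
	Seq      uint64
	PrevHash [32]byte // hash of the prior event; zero for the first event
	Payload  []byte
}

// hashEvent hashes every field of an event. SHA-256 stands in for BLAKE3 and
// this byte layout stands in for canonical CBOR (both are assumptions).
func hashEvent(e event) [32]byte {
	var buf bytes.Buffer
	fmt.Fprintf(&buf, "%d|", e.Seq)
	buf.Write(e.PrevHash[:])
	buf.Write(e.Payload)
	return sha256.Sum256(buf.Bytes())
}

// appendEvent links a new event to the current tail of the log.
func appendEvent(log []event, payload []byte) []event {
	e := event{Seq: uint64(len(log) + 1), Payload: payload}
	if n := len(log); n > 0 {
		e.PrevHash = hashEvent(log[n-1])
	}
	return append(log, e)
}

// verifyChain returns the index of the first event whose PrevHash does not
// match its predecessor, or -1 when every link holds.
func verifyChain(log []event) int {
	for i := 1; i < len(log); i++ {
		if log[i].PrevHash != hashEvent(log[i-1]) {
			return i
		}
	}
	return -1
}

func main() {
	var log []event
	for _, p := range []string{"run started", "tool called", "run completed"} {
		log = appendEvent(log, []byte(p))
	}
	fmt.Println("intact:", verifyChain(log) == -1)

	log[0].Payload = []byte("tampered") // mutate a prior event
	fmt.Println("first broken link at index:", verifyChain(log))
}
```

The chain alone cannot detect a change to the final event, since nothing links forward from it.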
merkle root commit
RunCompleted, RunFailed, and RunCancelled embed a Merkle root over every prior event. Sign it for cross-process attestation.
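A sketch of how a Merkle root over prior events can be computed, under stated assumptions: SHA-256 stands in for BLAKE3, an unpaired node is carried up unchanged, and leaves are plain payload hashes. The real tree layout may differ; `merkleRoot` and `rootOf` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleRoot reduces leaf hashes pairwise until a single root remains.
// Assumption: an unpaired node at the end of a level is promoted as-is.
func merkleRoot(leaves [][32]byte) [32]byte {
	if len(leaves) == 0 {
		return [32]byte{}
	}
	level := append([][32]byte(nil), leaves...) // copy so the input is untouched
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				continue
			}
			var pair [64]byte
			copy(pair[:32], level[i][:])
			copy(pair[32:], level[i+1][:])
			next = append(next, sha256.Sum256(pair[:]))
		}
		level = next
	}
	return level[0]
}

// rootOf hashes each payload into a leaf and returns the Merkle root.
func rootOf(payloads ...string) [32]byte {
	leaves := make([][32]byte, len(payloads))
	for i, p := range payloads {
		leaves[i] = sha256.Sum256([]byte(p))
	}
	return merkleRoot(leaves)
}

func main() {
	original := rootOf("run started", "tool called", "tool returned")
	tampered := rootOf("run started", "tool CALLED", "tool returned")
	fmt.Printf("root prefix: %x\n", original[:8])
	fmt.Println("root changed after tampering:", original != tampered)
}
```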
eventlog.validate
Headless integrity check: seq monotonicity, hash chain, terminal placement, Merkle root, and semantic pairing rules.
raw response digest
AssistantMessageCompleted carries a BLAKE3 digest the adapter computes over the SDK-level response. Optional strict mode rejects empty digests.
canonical cbor
RFC 8949 §4.2: shortest integer form, sorted map keys, no indefinite-length items. The byte representation is deterministic.
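To make the "shortest integer form" rule concrete, here is a small sketch that encodes unsigned integers (CBOR major type 0) the way RFC 8949 requires for deterministic encoding. It covers only this one rule, and `encodeUint` is a hypothetical helper, not part of Starling.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUint encodes v as a CBOR unsigned integer using the shortest form:
// values below 24 fit in the initial byte; larger values use the smallest of
// the 1-, 2-, 4-, or 8-byte arguments that can hold them.
func encodeUint(v uint64) []byte {
	switch {
	case v < 24:
		return []byte{byte(v)}
	case v <= 0xff:
		return []byte{0x18, byte(v)}
	case v <= 0xffff:
		b := []byte{0x19, 0, 0}
		binary.BigEndian.PutUint16(b[1:], uint16(v))
		return b
	case v <= 0xffffffff:
		b := []byte{0x1a, 0, 0, 0, 0}
		binary.BigEndian.PutUint32(b[1:], uint32(v))
		return b
	default:
		b := make([]byte, 9)
		b[0] = 0x1b
		binary.BigEndian.PutUint64(b[1:], v)
		return b
	}
}

func main() {
	for _, v := range []uint64{10, 24, 1000, 1000000} {
		fmt.Printf("%d -> %x\n", v, encodeUint(v))
	}
}
```

Because each value has exactly one permitted encoding, two encoders that follow the rule produce identical bytes, which is what makes hashing the encoded event meaningful.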
re-execute any run, byte-for-byte.
recorded side effects + deterministic loop = portable runs. replay never re-contacts the provider.
starling.replay
Re-runs the agent against a recorded log. The first event that does not byte-match surfaces as a typed Divergence.
step.now & step.random
Wall-clock and RNG are recorded once on the live run, returned from the log on replay.
step.sideeffect
Wrap any non-deterministic effect (HTTP, filesystem, MCP). Replay reads the recorded value, skips the closure.
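The record-once, read-on-replay pattern can be sketched in a few lines. This is an illustration only: the `recorder` type and its `sideEffect` method are hypothetical stand-ins for the event log and the step helpers, and values are plain strings to keep it short.

```go
package main

import (
	"errors"
	"fmt"
)

// recorder is a hypothetical stand-in for the event log.
type recorder struct {
	replaying bool
	values    []string // recorded results, in call order
	next      int      // read cursor used during replay
}

var errLogExhausted = errors.New("replay: no recorded value left")

// sideEffect runs fn on a live run and records its result. On replay it
// returns the recorded value and never invokes fn.
func (r *recorder) sideEffect(fn func() (string, error)) (string, error) {
	if r.replaying {
		if r.next >= len(r.values) {
			return "", errLogExhausted
		}
		v := r.values[r.next]
		r.next++
		return v, nil
	}
	v, err := fn()
	if err != nil {
		return "", err
	}
	r.values = append(r.values, v)
	return v, nil
}

func main() {
	calls := 0
	fetch := func() (string, error) { calls++; return "200 OK", nil }

	live := &recorder{}
	v1, _ := live.sideEffect(fetch)

	replay := &recorder{replaying: true, values: live.values}
	v2, _ := replay.sideEffect(fetch)

	fmt.Println("same value:", v1 == v2, "closure calls:", calls)
}
```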
divergence type
replay.Divergence carries Seq, Kind, ExpectedKind, Class, and Reason. errors.Is + errors.As give you structured access.
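A sketch of what structured access through `errors.As` can look like. The `Divergence` struct below only mirrors the field names listed above; its field types, the `Class` values, and the `compareKinds` and `asDivergence` helpers are assumptions for illustration, and the comparison is over event kinds rather than bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// Divergence mirrors the field names above. Field types are assumptions.
type Divergence struct {
	Seq          uint64
	Kind         string // what the re-run produced
	ExpectedKind string // what the log recorded
	Class        string
	Reason       string
}

func (d *Divergence) Error() string {
	return fmt.Sprintf("divergence at seq %d: got %q, want %q (%s)",
		d.Seq, d.Kind, d.ExpectedKind, d.Reason)
}

// compareKinds reports the first position where the re-run differs from the
// recorded log. Extra produced events past the end are ignored in this sketch.
func compareKinds(recorded, produced []string) error {
	for i, want := range recorded {
		seq := uint64(i + 1)
		if i >= len(produced) {
			return fmt.Errorf("replay: %w", &Divergence{
				Seq: seq, ExpectedKind: want,
				Class: "missing", Reason: "re-run ended early",
			})
		}
		if produced[i] != want {
			return fmt.Errorf("replay: %w", &Divergence{
				Seq: seq, Kind: produced[i], ExpectedKind: want,
				Class: "kind_mismatch", Reason: "event kind differs",
			})
		}
	}
	return nil
}

// asDivergence unwraps err into a *Divergence if one is in the chain.
func asDivergence(err error) (*Divergence, bool) {
	var d *Divergence
	ok := errors.As(err, &d)
	return d, ok
}

func main() {
	recorded := []string{"RunStarted", "ToolCall", "RunCompleted"}
	produced := []string{"RunStarted", "LLMCall", "RunCompleted"}
	if d, ok := asDivergence(compareKinds(recorded, produced)); ok {
		fmt.Println(d.Seq, d.Kind, d.ExpectedKind, d.Class)
	}
}
```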
stream & inspector mode
replay.Stream yields a ReplayStep per event so the inspector can render recorded vs produced side-by-side.
resume a crashed run from its last seq.
reconstruct conversation state from the log, reissue pending tool calls, mark the boundary with a seam event.
agent.resume
Re-enters a run in a new process. Pending tool calls reissue under fresh CallIDs; orphaned schedules stay for audit.
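One way to picture the "pending tool calls" step is a scan that finds scheduled calls with no matching completion. The sketch below is illustrative only: the `event` type and the kind names `ToolCallScheduled` and `ToolCallCompleted` are assumptions, not confirmed Starling identifiers.

```go
package main

import "fmt"

// event is a hypothetical, trimmed-down log entry.
type event struct {
	Seq    uint64
	Kind   string // kind names used here are assumptions
	CallID string
}

// pendingCalls returns the CallIDs that were scheduled but never completed,
// in the order they were scheduled.
func pendingCalls(log []event) []string {
	completed := map[string]bool{}
	for _, e := range log {
		if e.Kind == "ToolCallCompleted" {
			completed[e.CallID] = true
		}
	}
	var pending []string
	for _, e := range log {
		if e.Kind == "ToolCallScheduled" && !completed[e.CallID] {
			pending = append(pending, e.CallID)
		}
	}
	return pending
}

func main() {
	log := []event{
		{1, "RunStarted", ""},
		{2, "ToolCallScheduled", "c1"},
		{3, "ToolCallCompleted", "c1"},
		{4, "ToolCallScheduled", "c2"}, // process crashed before completion
	}
	fmt.Println("reissue under fresh CallIDs:", pendingCalls(log))
}
```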
runresumed seam
A non-terminal event marking the boundary between processes. Carries AtSeq, ExtraMessage, ReissueTools, and PendingCalls.
withreissuetools(false)
Refuses to re-fire pending tools and returns ErrPartialToolCall instead. Use when tools mutate external state and you want manual intervention.
schema preflight
Run and Resume call eventlog.Preflight on startup. Stale or too-new schemas fail fast with a remediation message.
budgets enforced inside the runtime.
four axes. inline checks, not after-the-fact dashboards. a trip emits budgetexceeded and unwinds the run.
maxinputtokens
Pre-call check before every step.LLMCall. Counts the planned prompt and refuses the call if it would exceed the cap.
maxoutputtokens
Mid-stream check on every ChunkUsage. Cancels the stream the moment the cap is crossed.
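A minimal sketch of a mid-stream cap, assuming usage arrives as per-chunk token deltas. The `chunk` type, `stream`, and `consume` are hypothetical; the point is only that the consumer cancels the producer's context as soon as the running total crosses the cap.

```go
package main

import (
	"context"
	"fmt"
)

// chunk is a hypothetical usage update carrying an output-token delta.
type chunk struct{ OutputTokens int }

// stream emits n chunks and stops early once ctx is cancelled.
func stream(ctx context.Context, n, tokensPerChunk int) <-chan chunk {
	out := make(chan chunk)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			select {
			case <-ctx.Done():
				return
			case out <- chunk{OutputTokens: tokensPerChunk}:
			}
		}
	}()
	return out
}

// consume sums usage as chunks arrive and cancels the stream the moment the
// running total exceeds maxOutputTokens.
func consume(chunks, tokensPerChunk, maxOutputTokens int) (used int, tripped bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	for c := range stream(ctx, chunks, tokensPerChunk) {
		used += c.OutputTokens
		if used > maxOutputTokens {
			cancel() // unblocks and stops the producer
			return used, true
		}
	}
	return used, false
}

func main() {
	used, tripped := consume(100, 10, 25)
	fmt.Println("used:", used, "tripped:", tripped)
}
```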
maxusd
Mid-stream USD enforcement using a per-model price table in the budget package. Edit the table to add or override rates.
maxwallclock
context.WithDeadline wrapping the run. The deadline triggers RunFailed{ErrorType:"budget", Limit:"wall_clock"}.
budgetexceeded event
Emitted with limit, cap, actual, and where (pre_call | mid_stream | post_call) so post-mortems are exact.
bring your model. or all of them.
adapters share a streaming contract and a conformance suite. openai-compatible endpoints plug in via withbaseurl.
openai adapter
Plus Groq, Together, Ollama, vLLM, LM Studio, Azure, anything else OpenAI-compatible.
anthropic adapter
Tool use, extended thinking with per-block signatures, prompt caching metadata.
gemini adapter
Native Google Gemini through the Google AI streaming API.
bedrock adapter
Amazon Bedrock via native ConverseStream: tool use, reasoning with signatures, redacted thinking, cache-aware usage.
openrouter adapter
Thin wrapper over the OpenAI adapter with attribution headers and routing.
conformance suite
Reusable harness in provider/conformance asserting request shape, chunk ordering, tool-call IDs, usage, and cancellation.
capability declaration
provider.Capabler exposes which features each adapter supports. Tests skip what the adapter cannot do.
typed go tools. mcp for the rest.
tool.tool is one interface. tool.typed derives json schema from your input type. the mcp adapter mounts remote servers.
tool.typed
Wrap a typed Go function as a Starling tool. JSON Schema is derived from the input type via reflection.
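To show what reflection-based schema derivation involves, here is a reduced sketch using only the standard library. It is not `tool.Typed`: `schemaFor`, `schemaJSON`, and `searchInput` are hypothetical, only a handful of kinds are handled, and treating fields without `omitempty` as required is an assumption of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// searchInput is a hypothetical tool input type.
type searchInput struct {
	Query string   `json:"query"`
	Limit int      `json:"limit,omitempty"`
	Tags  []string `json:"tags,omitempty"`
}

// schemaFor maps a Go type to a JSON Schema fragment.
func schemaFor(t reflect.Type) map[string]any {
	switch t.Kind() {
	case reflect.String:
		return map[string]any{"type": "string"}
	case reflect.Bool:
		return map[string]any{"type": "boolean"}
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return map[string]any{"type": "integer"}
	case reflect.Float32, reflect.Float64:
		return map[string]any{"type": "number"}
	case reflect.Slice:
		return map[string]any{"type": "array", "items": schemaFor(t.Elem())}
	case reflect.Struct:
		props := map[string]any{}
		required := []string{}
		for i := 0; i < t.NumField(); i++ {
			f := t.Field(i)
			if !f.IsExported() {
				continue
			}
			name, opts, _ := strings.Cut(f.Tag.Get("json"), ",")
			if name == "-" {
				continue
			}
			if name == "" {
				name = f.Name
			}
			props[name] = schemaFor(f.Type)
			// Assumption: fields without omitempty are required.
			if !strings.Contains(opts, "omitempty") {
				required = append(required, name)
			}
		}
		return map[string]any{"type": "object", "properties": props, "required": required}
	default:
		return map[string]any{} // unconstrained in this sketch
	}
}

// schemaJSON renders the schema for v's type as compact JSON.
func schemaJSON(v any) string {
	b, err := json.Marshal(schemaFor(reflect.TypeOf(v)))
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(schemaJSON(searchInput{}))
}
```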
mcp adapter
Three transports: stdio subprocess, streamable HTTP, custom mcp.Transport. Calls route through step.SideEffect for replay safety.
built-in tools
tool/builtin ships Fetch (15s timeout, 1 MiB cap) and ReadFile(baseDir) with path-escape rejection.
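Path-escape rejection for a base-directory reader can be sketched with `path/filepath` alone. This illustrates the general technique, not the built-in tool's code: `resolve` and `errPathEscape` are hypothetical names, and the check is lexical, so it does not follow symlinks.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var errPathEscape = errors.New("path escapes base directory")

// resolve joins name onto baseDir and rejects results that land outside it.
// The check is purely lexical; symlinks are not resolved in this sketch.
func resolve(baseDir, name string) (string, error) {
	full := filepath.Join(baseDir, name) // Join also cleans ".." segments
	rel, err := filepath.Rel(baseDir, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errPathEscape
	}
	return full, nil
}

func main() {
	for _, name := range []string{"notes/a.txt", "../secrets.txt", "notes/../../etc/passwd"} {
		_, err := resolve("/srv/data", name)
		fmt.Printf("%-24s allowed=%v\n", name, err == nil)
	}
}
```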
idempotency & retry
step.ToolCall{Idempotent: true, MaxAttempts: N} retries on tool.ErrTransient. Same CallID, incrementing Attempt.
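The retry behaviour described above can be sketched as a loop that keeps the CallID fixed and bumps an attempt counter. Everything here is hypothetical (`errTransient` stands in for tool.ErrTransient, and `callWithRetry` is not the runtime's function); in particular, treating non-idempotent calls as single-attempt is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a stand-in for a retryable tool error.
var errTransient = errors.New("transient tool error")

// call identifies one attempt of a tool invocation.
type call struct {
	CallID  string
	Attempt int
}

// callWithRetry invokes fn up to maxAttempts times while it returns a
// transient error. Non-idempotent calls get exactly one attempt.
func callWithRetry(callID string, idempotent bool, maxAttempts int, fn func(call) error) (attempts int, err error) {
	if !idempotent || maxAttempts < 1 {
		maxAttempts = 1
	}
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		attempts = attempt
		err = fn(call{CallID: callID, Attempt: attempt}) // same CallID every time
		if err == nil || !errors.Is(err, errTransient) {
			return attempts, err
		}
	}
	return attempts, err
}

func main() {
	flaky := func(c call) error {
		if c.Attempt < 3 {
			return fmt.Errorf("attempt %d: %w", c.Attempt, errTransient)
		}
		return nil
	}
	attempts, err := callWithRetry("call-1", true, 5, flaky)
	fmt.Println("attempts:", attempts, "err:", err)
}
```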
per-call timeouts
WithCallTimeout on the MCP adapter. Local tools enforce timeouts via context.WithDeadline inside the tool.
three backends. same interface.
in-memory for tests. sqlite for single-host. postgres for multi-host. all three share the migration contract.
eventlog.newinmemory
Tests, demos, ephemeral CLIs. Same EventLog interface, no persistence.
eventlog.newsqlite
WAL mode, per-run _txlock=immediate. Auto-migrates on open. One writer, many readers.
eventlog.newpostgres
Per-run advisory locks serialize appenders by run. Different runs are independent. PITR via WAL archiving.
schema migrations
Forward-only, idempotent. CLI subcommands: starling migrate, starling schema-version. Preflight refuses stale or too-new schemas.
ndjson export
starling export <db> <runID> dumps a run to portable NDJSON. Archive cold, delete hot.
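NDJSON is one JSON object per line, which `encoding/json` produces directly because `Encoder.Encode` appends a newline after each value. A sketch of the format, with a hypothetical `event` shape that is not Starling's real export schema:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// event is a hypothetical export record; field names are assumptions.
type event struct {
	Seq  uint64 `json:"seq"`
	Kind string `json:"kind"`
}

// writeNDJSON writes one JSON object per line.
func writeNDJSON(w io.Writer, events []event) error {
	enc := json.NewEncoder(w)
	for _, e := range events {
		if err := enc.Encode(e); err != nil { // Encode appends "\n"
			return err
		}
	}
	return nil
}

// exportString is a convenience wrapper that returns the NDJSON as a string.
func exportString(events []event) string {
	var buf bytes.Buffer
	if err := writeNDJSON(&buf, events); err != nil {
		return ""
	}
	return buf.String()
}

func main() {
	fmt.Print(exportString([]event{{1, "RunStarted"}, {2, "RunCompleted"}}))
}
```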
runlister
All three backends expose ListRuns for inspector-style run indexes, ordered newest first.
production-grade out of the box.
prometheus metrics, opentelemetry spans, structured slog. an embedded read-only web ui for runs, timelines, replays.
prometheus metrics
starling.NewMetrics(reg) registers run, provider, tool, eventlog, and budget collectors. Histograms cover every hot path.
opentelemetry tracing
agent.run → agent.turn → provider.stream + step.tool. Wire any OTLP exporter; the runtime emits the spans.
structured slog
Run lifecycle and divergence events emit slog records with stable fields. Plug your own handler.
embedded inspector
Read-only HTTP UI. Runs list, per-run timeline, payload detail, live tail (SSE), replay controls, divergence rendering.
bearer auth + csrf
inspect.WithAuth + BearerAuth. CSRF protection on replay POST endpoints. Front with TLS for non-loopback access.
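For readers unfamiliar with bearer-token middleware, the general shape looks like the sketch below. It is a generic `net/http` example, not the inspect package: `bearerAuth` and `statusFor` are hypothetical, and the token is compared in constant time to avoid leaking how many leading bytes matched.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bearerAuth rejects requests whose Authorization header is not exactly
// "Bearer <token>".
func bearerAuth(token string, next http.Handler) http.Handler {
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			w.Header().Set("WWW-Authenticate", "Bearer")
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through the middleware and returns
// the response status code.
func statusFor(token, header string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	req := httptest.NewRequest(http.MethodGet, "/runs", nil)
	if header != "" {
		req.Header.Set("Authorization", header)
	}
	rec := httptest.NewRecorder()
	bearerAuth(token, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("s3cret", "Bearer s3cret"))
	fmt.Println(statusFor("s3cret", "Bearer wrong"))
	fmt.Println(statusFor("s3cret", ""))
}
```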
dual-mode binary
Embed InspectCommand(factory) in your service binary so the inspector can replay against your live agent code.
ship one. replay forever.
the runtime is small on purpose. the wedge is production debugging via replay, not framework breadth.