starling

Reference

Per-package API reference: types, signatures, and short examples for the Starling Go runtime.

starling (root)

The agent loop, run lifecycle, and replay surface.

Agent

type Agent struct {
    Provider  provider.Provider
    Tools     []tool.Tool
    Log       eventlog.EventLog
    Config    Config
    Budget    *Budget
    Metrics   *Metrics
    Namespace string  // optional run-id prefix
}

Agent holds no per-run state. Two instances pointing at the same log are interchangeable.

Config

| Field | Default | Notes |
| --- | --- | --- |
| Model | required | Provider-specific model id, e.g. "gpt-4o-mini". |
| MaxTurns | 0 = ∞ | Caps the ReAct loop. 0 is allowed but not recommended. |
| SystemPrompt | "" | Prepended to every conversation. Captured into RunStarted. |
| Params | nil | Provider-specific param blob (CBOR). Hashed into RunStarted.ParamsHash. |
| RequireRawResponseHash | false | Fail any turn whose ChunkEnd lacks a 32-byte raw-response digest. |
| AppVersion | "" | Stamped into RunStarted alongside the Starling library version. |
| EmitTimeout | 0 = ∞ | Bounds each event-log Append under context.WithoutCancel. |
| SkipSchemaCheck | false | Disables eventlog.Preflight on Run / Resume. Tests only. |
| Logger | slog.Default() | Structured slog records for run lifecycle. |

Run / Resume / Replay

  • Run(ctx, goal) (*RunResult, error) — live entry. Mints a fresh run id (namespaced when Namespace != ""), emits RunStarted, runs the loop, returns the terminal *RunResult.
  • Resume(ctx, runID, extraMessage) (*RunResult, error) — re-enters a run from its last seq. Pending tool calls reissue under fresh CallIDs; the orphan stays for audit. ResumeWith(...opts) adds WithReissueTools(false) for manual recovery.
  • Replay(ctx, log, runID, agent, opts...) error — re-executes against the same wiring. Returns nil on a clean replay, wraps a *replay.Divergence with ErrNonDeterminism on the first mismatch, or ErrProviderModelMismatch when the agent's Provider.ID / APIVersion / Config.Model disagree with the recording. WithForceProvider() disables the identity check.
  • RunStream(ctx, goal) (string, <-chan AgentEvent, error) — typed event stream layered over Stream. Variants: TextDelta, ToolCallStarted, ToolCallEnded, Done. Channel closes after a single Done.
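
As a rough illustration of the RunStream contract (deltas, then one Done, then channel close), a consumer can simply range over the channel. The sketch below does not import Starling: agentEvent and its string Kind are hypothetical stand-ins for the real typed variants, and the producer goroutine only simulates a run.

```go
package main

import (
	"fmt"
	"strings"
)

// agentEvent is a hypothetical, flattened stand-in for AgentEvent; the real
// stream uses one typed variant per kind.
type agentEvent struct {
	Kind string // "text_delta", "tool_call_started", "tool_call_ended", "done"
	Text string
}

// collect drains the stream until it closes, joining text deltas and
// reporting whether a done event arrived.
func collect(events <-chan agentEvent) (text string, done bool) {
	var b strings.Builder
	for ev := range events {
		switch ev.Kind {
		case "text_delta":
			b.WriteString(ev.Text)
		case "done":
			done = true
		}
	}
	return b.String(), done
}

func main() {
	ch := make(chan agentEvent)
	go func() {
		defer close(ch) // simulated producer: close after the single done event
		ch <- agentEvent{Kind: "text_delta", Text: "Hello, "}
		ch <- agentEvent{Kind: "text_delta", Text: "world"}
		ch <- agentEvent{Kind: "done"}
	}()
	fmt.Println(collect(ch)) // prints: Hello, world true
}
```

Ranging until close, rather than returning on the done event, keeps the consumer correct even if it is later pointed at a stream that ends early.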

RunResult carries RunID, FinalText, totals (TurnCount, ToolCallCount, TotalCostUSD, InputTokens, OutputTokens, Duration), TerminalKind, MerkleRoot, and CacheStats (Hits, Misses, ReadTokens, CreateTokens). All recoverable from the log; the struct is a convenience.

Sentinel errors

| Error | Meaning |
| --- | --- |
| ErrNonDeterminism | Replay diverged from the recording. Wraps *replay.Divergence. |
| ErrPartialToolCall | Resume saw pending tool calls and WithReissueTools(false) was set. |
| ErrRunNotFound | Resume target run id is absent from the log. |
| ErrRunAlreadyTerminal | Resume target ended with a terminal event. |
| ErrRunInUse | Another writer already advanced the chain. |
| ErrSchemaVersionMismatch | The recording's schema version is unsupported by this binary. |
| ErrProviderModelMismatch | Replay agent's Provider.ID / APIVersion / Config.Model disagrees with RunStarted. |

budget

Budget has four axes; a zero value on any field disables that axis. A trip emits BudgetExceeded{Limit, Cap, Actual, Where} and unwinds with RunFailed{ErrorType:"budget"}.

| Axis (field) | Type | When |
| --- | --- | --- |
| MaxInputTokens | int64 | Pre-call, before every step.LLMCall. |
| MaxOutputTokens | int64 | Mid-stream on every ChunkUsage. |
| MaxUSD | float64 | Mid-stream using budget/prices.go per-model rates. |
| MaxWallClock | time.Duration | context.WithDeadline wrapping the run. |

budget.RegisterPricing(model, inPerMtok, outPerMtok) registers or overrides per-model USD pricing at runtime; resets the unknown-model warn-once memo so a stale warning doesn't outlive the call. Built-in rates ship for major-vendor models in budget/prices.go.
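
To make the "zero disables the axis" rule and the per-model pricing concrete, here is a minimal sketch that does not import Starling. The limits type, costUSD, and tripped are hypothetical names. The sketch assumes rates are quoted in USD per million tokens (as the inPerMtok / outPerMtok parameter names suggest) and that a cap trips only when strictly exceeded. The rates used are made up.

```go
package main

import "fmt"

// limits is a hypothetical stand-in for two of the four Budget axes.
type limits struct {
	MaxOutputTokens int64   // 0 disables the axis
	MaxUSD          float64 // 0 disables the axis
}

// costUSD assumes rates are quoted in USD per million tokens.
func costUSD(inTok, outTok int64, inPerMtok, outPerMtok float64) float64 {
	return float64(inTok)/1e6*inPerMtok + float64(outTok)/1e6*outPerMtok
}

// tripped returns the first axis whose non-zero cap is exceeded, or "" if
// none. Strictly-greater-than is an assumption of this sketch.
func tripped(l limits, outTok int64, usd float64) string {
	if l.MaxOutputTokens > 0 && outTok > l.MaxOutputTokens {
		return "MaxOutputTokens"
	}
	if l.MaxUSD > 0 && usd > l.MaxUSD {
		return "MaxUSD"
	}
	return ""
}

func main() {
	l := limits{MaxUSD: 0.01}                      // MaxOutputTokens stays zero, so it never trips
	usd := costUSD(2_000_000, 1_000_000, 0.5, 1.5) // made-up rates
	fmt.Printf("cost=%.2f tripped=%q\n", usd, tripped(l, 1_000_000, usd))
	// prints: cost=2.50 tripped="MaxUSD"
}
```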

event

The wire format. Every event carries:

type Event struct {
    RunID     string
    Seq       uint64
    PrevHash  []byte           // BLAKE3 of canonical CBOR of prev event
    Timestamp int64            // Unix nanoseconds
    Kind      Kind
    Payload   cborenc.RawMessage  // kind-specific struct, CBOR-encoded
}

The full schema with payload definitions, the kinds the runtime emits, the reserved kinds, and the invariants live on the Events page.

Encoding helpers: Marshal, Unmarshal, Hash, ToJSON. event.HashSize is 32. Each typed payload has an EncodePayload[T] helper; each kind has a matching accessor (AsRunStarted, AsToolCallCompleted, …).
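
The PrevHash field is what turns a run into a tamper-evident chain. The sketch below illustrates the linking and checking idea only. It is not the library's encoding: it swaps in SHA-256 over an ad hoc byte layout where the runtime uses BLAKE3 over canonical CBOR, the event type is a trimmed hypothetical stand-in, and starting Seq at 1 is an assumption.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// event is a trimmed, hypothetical stand-in for event.Event.
type event struct {
	Seq      uint64
	PrevHash []byte
	Payload  []byte
}

// hash uses SHA-256 over a simple byte layout as a stand-in for BLAKE3 over
// canonical CBOR. Both produce 32-byte digests.
func hash(e event) []byte {
	h := sha256.New()
	var seq [8]byte
	binary.BigEndian.PutUint64(seq[:], e.Seq)
	h.Write(seq[:])
	h.Write(e.PrevHash)
	h.Write(e.Payload)
	return h.Sum(nil)
}

// appendEvent links a new event to the tail of the chain.
func appendEvent(chain []event, payload string) []event {
	e := event{Seq: uint64(len(chain) + 1), Payload: []byte(payload)}
	if n := len(chain); n > 0 {
		e.PrevHash = hash(chain[n-1])
	}
	return append(chain, e)
}

// firstBreak returns the index of the first event whose PrevHash does not
// match its predecessor, or -1 when the chain is intact.
func firstBreak(chain []event) int {
	for i := 1; i < len(chain); i++ {
		if !bytes.Equal(chain[i].PrevHash, hash(chain[i-1])) {
			return i
		}
	}
	return -1
}

func main() {
	var chain []event
	for _, p := range []string{"RunStarted", "TurnStarted", "AssistantMessageCompleted"} {
		chain = appendEvent(chain, p)
	}
	fmt.Println(firstBreak(chain)) // prints: -1
	chain[1].Payload = []byte("tampered")
	fmt.Println(firstBreak(chain)) // prints: 2
}
```

Editing an event is detected at its successor, because that is where the stale digest is stored. A Merkle root over the run is what protects the final event.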

eventlog

type EventLog interface {
    Append(ctx, runID, ev) error
    Read(ctx, runID) ([]Event, error)
    Stream(ctx, runID) (<-chan Event, error)
    Close() error
}

RunLister adds ListRuns(ctx) ([]RunSummary, error). RunPageLister adds ListRunsPage(ctx, opts) (RunPage, error) for filtered, server-side pagination. RunPruner adds explicit whole-run retention cleanup with PruneRuns(ctx, opts) (PruneReport, error). All three built-in backends implement these optional interfaces.

RunSummary carries per-run aggregates (TurnCount, ToolCallCount, InputTokens, OutputTokens, CostUSD, DurationMs) so dashboards don't have to re-aggregate event streams.

Helpers: eventlog.AggregateRun(events) returns the same totals over a chained event slice (single source of truth for the inspector and the MCP server). eventlog.ForkSQLite(ctx, src, dst, runID, beforeSeq) is a WAL-safe SQLite branch via VACUUM INTO, truncating one run's events at a sequence boundary. The BLAKE3 chain helpers used by Agent.Run are public at github.com/jerkeyray/starling/merkle.

Backends

| Constructor | Use when |
| --- | --- |
| NewInMemory() | Tests, demos, ephemeral CLI tools. |
| NewSQLite(path, opts...) | Single-host services, edge nodes. |
| NewPostgres(db, opts...) | Multi-host services. Per-run advisory locks serialize appenders. |

Options: WithReadOnly() / WithReadOnlyPG() for inspector mode, WithAutoMigratePG() to run migrations on connect.

Validation, migrations, preflight

  • Validate(events) — seq monotonicity, hash chain, terminal placement, Merkle root, and the semantic pairing rules from the Event schema.
  • SchemaVersion(ctx, log) / Migrate(ctx, log, opts...) — forward-only migration API. Migrate returns a MigrationReport.
  • Preflight(ctx, log) — fails fast with ErrSchemaOutdated or ErrSchemaTooNew. Agent.Run, Agent.Resume, and the inspector all call it unless Config.SkipSchemaCheck = true.
  • WithMetrics(log, obs) — wraps any EventLog so direct Append callers see the same latency histograms step.emit records.

Sentinel errors: ErrLogClosed, ErrLogCorrupt, ErrInvalidAppend, ErrReadOnly, ErrSchemaOutdated, ErrSchemaTooNew.

step

The determinism layer. Anything non-deterministic in the agent loop must go through step so replay can reproduce it byte-for-byte.

Helpers

func Now(ctx context.Context) time.Time
func Random(ctx context.Context) int64
func SideEffect[T any](ctx context.Context, name string, fn func() (T, error)) (T, error)

Live mode runs fn and emits a SideEffectRecorded event. Replay reads the recorded value back without invoking fn. T must be CBOR-serializable.
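
The record-then-replay behaviour can be sketched without the library. In the example below, recorder and sideEffect are hypothetical stand-ins. Values are keyed by name for brevity, whereas a real log would presumably match on event sequence. JSON is swapped in for CBOR so the sketch needs only the standard library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// recorder is a hypothetical stand-in for the run's event log.
type recorder struct {
	replaying bool
	values    map[string][]byte // name -> encoded result
}

// sideEffect runs fn live and records its result. In replay it decodes the
// recorded value and never calls fn. JSON stands in for CBOR here.
func sideEffect[T any](r *recorder, name string, fn func() (T, error)) (T, error) {
	var zero T
	if r.replaying {
		raw, ok := r.values[name]
		if !ok {
			return zero, errors.New("no recorded value for " + name)
		}
		var v T
		err := json.Unmarshal(raw, &v)
		return v, err
	}
	v, err := fn()
	if err != nil {
		return zero, err
	}
	raw, err := json.Marshal(v)
	if err != nil {
		return zero, err
	}
	r.values[name] = raw
	return v, nil
}

func main() {
	calls := 0
	fetch := func() (int, error) { calls++; return 42, nil }

	r := &recorder{values: map[string][]byte{}}
	live, _ := sideEffect(r, "answer", fetch)

	r.replaying = true
	replayed, _ := sideEffect(r, "answer", fetch)

	fmt.Println(live, replayed, calls) // prints: 42 42 1
}
```

The call count staying at 1 is the point: replay gets the same value without touching the outside world, which is also why T has to survive a round trip through the encoder.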

LLM calls

LLMCall(ctx, req) drives a streaming completion through the configured provider. Emits TurnStarted, optional ReasoningEmitted, and AssistantMessageCompleted. Enforces input/output/USD budgets inline. Validates the chunk state machine (no EOF before ChunkEnd, no duplicate ChunkToolUseStart, no chunks after ChunkEnd).
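
The three chunk-ordering rules are small enough to sketch as a standalone validator. This is an illustration, not the step.LLMCall implementation: chunk, its string Kind values, and errInvalidStream are hypothetical stand-ins, and treating "duplicate" as "same call id seen twice" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// chunk is a hypothetical, trimmed stand-in for provider.StreamChunk.
type chunk struct {
	Kind   string // "text", "tool_use_start", "end", ...
	CallID string // set on tool-use chunks
}

var errInvalidStream = errors.New("invalid stream")

// validate enforces three rules: the stream must finish with an end chunk,
// nothing may follow it, and a tool-use start may not repeat for a call id.
func validate(chunks []chunk) error {
	started := map[string]bool{}
	ended := false
	for i, c := range chunks {
		if ended {
			return fmt.Errorf("%w: chunk %d after end", errInvalidStream, i)
		}
		switch c.Kind {
		case "tool_use_start":
			if started[c.CallID] {
				return fmt.Errorf("%w: duplicate tool-use start %q", errInvalidStream, c.CallID)
			}
			started[c.CallID] = true
		case "end":
			ended = true
		}
	}
	if !ended {
		return fmt.Errorf("%w: EOF before end chunk", errInvalidStream)
	}
	return nil
}

func main() {
	ok := []chunk{{Kind: "text"}, {Kind: "tool_use_start", CallID: "c1"}, {Kind: "end"}}
	fmt.Println(validate(ok))     // prints: <nil>
	fmt.Println(validate(ok[:2])) // prints: invalid stream: EOF before end chunk
}
```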

Tool dispatch

type ToolCall struct {
    CallID, TurnID, Name string
    Args                 json.RawMessage
    Idempotent           bool
    MaxAttempts          int
    Backoff              func(attempt int) time.Duration
}

  • CallTool(ctx, c) — sequential dispatch.
  • CallTools(ctx, calls) — fan-out with a semaphore (cap is step.DefaultMaxParallelTools, 8).
  • Retries kick in on tool.ErrTransient when Idempotent and MaxAttempts > 1. NewCallID() mints fresh IDs.
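
Semaphore-bounded fan-out, as described for CallTools, is a standard Go pattern. The sketch below shows one common form using a buffered channel. fanOut and maxParallel are hypothetical names, the jobs are plain closures rather than real tool calls, and returning results in input order is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

const maxParallel = 8 // mirrors the documented default cap

// fanOut runs every job concurrently, with at most maxParallel in flight,
// and returns results in input order.
func fanOut(jobs []func() string) []string {
	results := make([]string, len(jobs))
	sem := make(chan struct{}, maxParallel)
	var wg sync.WaitGroup
	for i, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxParallel jobs are running
		go func(i int, job func() string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the job finishes
			results[i] = job()       // each goroutine writes its own index
		}(i, job)
	}
	wg.Wait()
	return results
}

func main() {
	jobs := make([]func() string, 20)
	for i := range jobs {
		i := i // per-iteration copy, needed before Go 1.22
		jobs[i] = func() string { return fmt.Sprintf("tool-%d", i) }
	}
	out := fanOut(jobs)
	fmt.Println(len(out), out[0], out[19]) // prints: 20 tool-0 tool-19
}
```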

Replay errors

MismatchError carries Seq, Kind, ExpectedKind, Class ("exhausted" | "kind" | "payload" | "turn_id"), and Reason. It satisfies errors.Is(err, ErrReplayMismatch); use errors.As to reach the structured fields. Other sentinels: ErrInvalidStream, ErrMissingRawResponseHash. The replay package lifts these into replay.Divergence (see the replay section).
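
The errors.Is / errors.As split looks like this in practice. The sketch uses local stand-ins (errReplayMismatch, mismatchError) that mirror a subset of the documented fields. How the real type satisfies errors.Is is not stated on this page, so the Is method here is one plausible mechanism, not a description of the library.

```go
package main

import (
	"errors"
	"fmt"
)

var errReplayMismatch = errors.New("replay mismatch") // stand-in for ErrReplayMismatch

// mismatchError is a hypothetical mirror of some documented fields.
type mismatchError struct {
	Seq    uint64
	Class  string // "exhausted" | "kind" | "payload" | "turn_id"
	Reason string
}

func (e *mismatchError) Error() string {
	return fmt.Sprintf("seq %d: %s mismatch: %s", e.Seq, e.Class, e.Reason)
}

// Is lets errors.Is(err, errReplayMismatch) match any *mismatchError.
func (e *mismatchError) Is(target error) bool { return target == errReplayMismatch }

// describe shows the two-step check: errors.Is to classify, errors.As to
// reach the structured fields.
func describe(err error) string {
	if !errors.Is(err, errReplayMismatch) {
		return "not a replay mismatch"
	}
	var m *mismatchError
	if errors.As(err, &m) {
		return fmt.Sprintf("%s at seq %d", m.Class, m.Seq)
	}
	return "replay mismatch without details"
}

func main() {
	err := fmt.Errorf("replay run-1: %w", &mismatchError{Seq: 7, Class: "payload", Reason: "args differ"})
	fmt.Println(describe(err)) // prints: payload at seq 7
}
```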

tool

type Tool interface {
    Name() string
    Description() string
    Schema() json.RawMessage   // JSON Schema for input
    Execute(ctx, in) (json.RawMessage, error)
}

tool.Typed[In, Out](name, description, fn) derives the JSON Schema from In via reflection. Errors wrapping tool.ErrTransient opt the call into retry under step.ToolCall{Idempotent: true, MaxAttempts: N}.
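
The retry opt-in can be pictured as a small loop: retry only when the call is marked idempotent, has more than one attempt, and the error wraps the transient sentinel. The sketch below uses local stand-ins (errTransient, call, execute) and is not the step package's implementation. Passing the 1-based attempt number to Backoff is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTransient = errors.New("transient") // stand-in for tool.ErrTransient

// call is a trimmed, hypothetical mirror of step.ToolCall's retry fields.
type call struct {
	Idempotent  bool
	MaxAttempts int
	Backoff     func(attempt int) time.Duration
}

// execute retries fn only when the call is idempotent, has attempts left,
// and the error wraps errTransient. It returns the number of attempts made.
func execute(c call, fn func() error) (int, error) {
	attempts := 1
	if c.Idempotent && c.MaxAttempts > 1 {
		attempts = c.MaxAttempts
	}
	var err error
	for n := 1; n <= attempts; n++ {
		if err = fn(); err == nil || !errors.Is(err, errTransient) {
			return n, err
		}
		if n < attempts && c.Backoff != nil {
			time.Sleep(c.Backoff(n))
		}
	}
	return attempts, err
}

func main() {
	failures := 2
	flaky := func() error {
		if failures > 0 {
			failures--
			return fmt.Errorf("upstream 503: %w", errTransient)
		}
		return nil
	}
	c := call{Idempotent: true, MaxAttempts: 3, Backoff: func(int) time.Duration { return time.Millisecond }}
	n, err := execute(c, flaky)
	fmt.Println(n, err) // prints: 3 <nil>
}
```

A tool signals eligibility by wrapping the sentinel with %w, so the original cause stays visible in the error chain.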

tool.Wrap(t Tool, mw ...Middleware) Tool composes middleware around Execute while passing Name, Description, and Schema through unchanged. Last middleware passed runs first (net/http.Handler ordering); short-circuiting middleware can skip the inner call entirely. Useful for logging, timing, span injection, request authentication, output redaction.
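
The ordering rule is easiest to see by tracing execution. The following sketch shows one way a Wrap-style helper could produce "last passed runs first". execFn, middleware, wrap, and tag are hypothetical stand-ins for illustration, not the tool package's types.

```go
package main

import "fmt"

// execFn and middleware are hypothetical stand-ins for a tool's Execute
// method and tool.Middleware.
type execFn func(in string) string
type middleware func(next execFn) execFn

// wrap applies middleware in argument order, so each one wraps the result
// of the previous and the last one passed ends up outermost.
func wrap(inner execFn, mw ...middleware) execFn {
	for _, m := range mw {
		inner = m(inner)
	}
	return inner
}

// tag returns a middleware that appends name to trace when it runs.
func tag(name string, trace *[]string) middleware {
	return func(next execFn) execFn {
		return func(in string) string {
			*trace = append(*trace, name)
			return next(in)
		}
	}
}

func main() {
	var trace []string
	echo := func(in string) string { trace = append(trace, "tool"); return in }
	run := wrap(echo, tag("logging", &trace), tag("auth", &trace))
	run("hi")
	fmt.Println(trace) // prints: [auth logging tool]
}
```

Under this ordering, a check that must run before anything else (authentication, for instance) goes last in the argument list.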

Test scaffolding (starlingtest/)

ScriptedProvider is a deterministic provider.Provider driven by a slice of canned chunks per turn. Helpers NewStream, AppendRunStarted, AssertReplayMatches, and AssertReplayDiverges cover the common test shapes without contacting an LLM.

MCP adapter (tool/mcp)

Three constructors mount remote MCP tools as ordinary Starling tools:

  • New(ctx, transport, opts...) — any mcp.Transport.
  • NewCommand(ctx, exec.Cmd, opts...) — stdio subprocess.
  • NewHTTP(ctx, endpoint, client, opts...) — streamable HTTP.

Each connects, lists remote tools, and exposes them via client.Tools(ctx). Calls route through step.SideEffect so replay never re-contacts the server. The full options table is on the MCP tools page. The inbound counterpart, a read-only MCP server that exposes a recorded log to AI clients, is documented on the MCP server page.

HTTP daemon (starlingd)

starlingd.Command(factory) builds a CLI entrypoint for serving your own agent over HTTP. starlingd.New(config) returns an http.Handler for apps that already own server setup. The daemon exposes async run creation, bounded in-process queueing, SSE streams, read APIs, Prometheus metrics, bearer auth, and an optional inspector mount. The full reference lives on the HTTP daemon page.

Built-in tools

tool/builtin/ ships Fetch() (public http/https only, 15s timeout, 1 MiB cap, local/private-address and unsafe redirect rejection) and ReadFile(baseDir) (path-escape rejection). Use directly or as templates.

provider

The streaming-completion abstraction.

type Provider interface {
    Info() Info
    Stream(ctx, req) (EventStream, error)
}

Optional Capabler exposes Capabilities() so the conformance suite can skip what the adapter doesn't support. A Request carries Model, SystemPrompt, Messages, Tools, ToolChoice ("" | "auto" | "any" | "none" | tool name), StopSequences, TopK, MaxOutputTokens, and a vendor-specific Params blob.

EventStream yields StreamChunk values: ChunkText, ChunkReasoning, ChunkRedactedThinking, ChunkToolUseStart/Delta/End, ChunkUsage, ChunkEnd. The state machine is enforced by step.LLMCall.

Adapters

| Package | Use when |
| --- | --- |
| provider/openai | OpenAI, Groq, Together, Ollama, vLLM, LM Studio, Azure, anything else OpenAI-compatible (set WithBaseURL). |
| provider/anthropic | Messages API. Tool use, extended thinking with signature, prompt caching. |
| provider/gemini | Native Google Gemini. |
| provider/bedrock | Amazon Bedrock via native ConverseStream (AWS SDK v2). |
| provider/openrouter | OpenRouter: thin wrapper over the OpenAI adapter with attribution headers. |
| provider/conformance | The contract test every adapter passes. |

Each adapter advertises its support set via provider.Capabler.Capabilities(). The conformance suite skips capability-gated assertions when the adapter reports false.

Error classification

Adapters wrap underlying SDK / HTTP errors with one of four sentinels for retry policy via errors.Is:

| Sentinel | When |
| --- | --- |
| provider.ErrRateLimit | 429 / quota |
| provider.ErrAuth | 401 / 403 |
| provider.ErrServer | 5xx |
| provider.ErrNetwork | DNS / dial / TLS / broken stream |

Helpers: provider.WrapHTTPStatus(err, status) annotates by HTTP status (delegates to ClassifyTransport when status == 0); provider.ClassifyTransport(err) wraps net.Error and *url.Error with ErrNetwork. 4xx errors that are neither auth nor rate-limit pass through unwrapped on purpose, because they reflect caller bugs rather than transient conditions.
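
A caller's retry policy built on these sentinels might look like the sketch below. It uses local stand-ins only: errRateLimit, errAuth, errServer, classified, and wrapStatus are hypothetical and merely mimic the status mapping in the table. The transport (status == 0) path is left out, and the retryable policy is one possible choice rather than something the library prescribes.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the provider sentinels.
var (
	errRateLimit = errors.New("rate limit")
	errAuth      = errors.New("auth")
	errServer    = errors.New("server")
	errUpstream  = errors.New("upstream said no") // sample underlying error
)

// classified pairs a sentinel with the underlying cause so that both
// errors.Is(err, sentinel) and errors.Is(err, cause) succeed.
type classified struct{ sentinel, cause error }

func (c *classified) Error() string   { return c.sentinel.Error() + ": " + c.cause.Error() }
func (c *classified) Unwrap() error   { return c.cause }
func (c *classified) Is(t error) bool { return t == c.sentinel }

// wrapStatus annotates err by HTTP status. Other 4xx statuses pass through
// unwrapped.
func wrapStatus(err error, status int) error {
	switch {
	case status == 429:
		return &classified{errRateLimit, err}
	case status == 401 || status == 403:
		return &classified{errAuth, err}
	case status >= 500 && status <= 599:
		return &classified{errServer, err}
	}
	return err
}

// retryable is one possible policy: retry rate limits and server errors,
// never auth failures or unclassified 4xx.
func retryable(err error) bool {
	return errors.Is(err, errRateLimit) || errors.Is(err, errServer)
}

func main() {
	for _, s := range []int{429, 401, 503, 404} {
		fmt.Println(s, retryable(wrapStatus(errUpstream, s)))
	}
	// prints:
	// 429 true
	// 401 false
	// 503 true
	// 404 false
}
```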

replay

  • Verify(ctx, log, runID, agent) — headless check. Returns nil on a clean replay or wraps *Divergence with ErrNonDeterminism on the first mismatch. starling.Replay is a thin wrapper that takes *Agent directly.
  • Stream(ctx, factory, log, runID) — inspector path. Yields a ReplayStep per emitted event so the UI can render recorded vs produced side-by-side. The final step has Diverged: true when the replay didn't reach the recorded terminal.

Divergence carries RunID, Seq, Kind, ExpectedKind, Class, Reason. Factory is func(ctx) (Agent, error).

inspect

Embedded HTTP handler. Serves the runs list, per-run timeline, event detail, live tail (SSE), and replay. The standalone binary at cmd/starling-inspect opens any SQLite log read-only.

inspect.New(log, opts...) (*Server, error). Options:

  • WithAuth(authenticator) — protect every endpoint.
  • BearerAuth(token) — convenience Authenticator.
  • WithReplayer(factory) — enable replay re-execution.
  • WithDBPath(path) — show the DB basename in the topbar context chip (full path on hover).

Read-only by construction, CSRF-protected on the replay POST endpoints. Front it with TLS in production: see Operations.

CLI (cmd/starling)

starling validate <db> [<runID>]   # hash chain + Merkle check
starling export   <db> <runID>     # NDJSON event dump
starling prune    [flags] <db>     # dry-run-first retention deletion
starling inspect  [flags] <db>     # local web inspector (read-only)
starling mcp      <db>             # read-only MCP server over stdio for AI clients
starling replay   <db> <runID>     # headless replay (dual-mode binary only)
starling migrate  [-dry-run] <db>  # apply pending schema migrations
starling schema-version <db>       # print the current schema version
starling doctor                    # quick health check: version, env vars, schema, validation
starling version                   # print the binary's Starling version (also -v / --version)

The stock binary is SQLite-only. Building a dual-mode binary that links your agent factory enables starling replay and starling inspect with replay re-execution.

Examples

| Path | What it shows |
| --- | --- |
| examples/m1_hello | Minimal hello agent, dual-mode inspector, OTel stdout exporter. |
| examples/incident_triage | Multi-tool agent, budgets, Resume, replay regression test, Postgres, Prometheus, OTel. |
