Concepts

The event log is the source of truth. The runtime, the inspector, and replay all read the same shape. Every other design choice falls out of that.

The event log

Every run is an append-only sequence of events:

seq=1  RunStarted        (model, tools, system prompt, params hash)
seq=2  TurnStarted       (turn id, prompt hash)
seq=3  AssistantMessageCompleted  (full text, tool plans, raw response hash)
seq=4  ToolCallScheduled
seq=5  ToolCallCompleted
seq=6  TurnStarted
…
seq=N  RunCompleted | RunFailed | RunCancelled  (Merkle root of all priors)

Every event carries a PrevHash field equal to the BLAKE3 hash of the previous event's canonical CBOR encoding. The terminal event commits a Merkle root over all prior events. Tampering with any earlier event breaks both the chain and the root, and eventlog.Validate then returns ErrLogCorrupt.

The full schema lives on the Event schema page.

The determinism contract

Starling borrows Temporal's determinism model. The agent loop is allowed to do exactly two things:

Read events from the log.
Emit commands that the runtime reifies as new events.

Anything else (wall clock, RNG, HTTP, filesystem) must go through the step package:

now := step.Now(ctx)        // recorded once, returned on replay
n := step.Random(ctx)        // same idea, deterministic
val, err := step.SideEffect(ctx, "name", func() (T, error) {
    // any non-deterministic effect goes here
})

On a live run, each call appends a SideEffectRecorded event. On replay, it reads the recorded value and skips the closure. The name argument is the lookup key: reuse it for the same logical effect, and change it when the effect changes.
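The record-then-replay behaviour can be sketched as follows. The recorder type and the sideEffect function are hypothetical stand-ins written for this example; they are not the step package, and they key recorded values by name only, which is a simplification.

```go
package main

import (
	"errors"
	"fmt"
)

// recorder is a hypothetical stand-in for the runtime's log access.
// It is not Starling's implementation.
type recorder struct {
	replaying bool
	recorded  map[string]any // name -> recorded value
}

// sideEffect runs fn on a live run and records the result under name.
// On replay it returns the recorded value and never calls fn.
func sideEffect[T any](r *recorder, name string, fn func() (T, error)) (T, error) {
	var zero T
	if r.replaying {
		v, ok := r.recorded[name]
		if !ok {
			return zero, errors.New("no recorded value for " + name)
		}
		t, ok := v.(T)
		if !ok {
			return zero, errors.New("recorded type mismatch for " + name)
		}
		return t, nil
	}
	v, err := fn()
	if err != nil {
		return zero, err
	}
	r.recorded[name] = v
	return v, nil
}

func main() {
	calls := 0
	// fetch returns a different value on every real call.
	fetch := func() (int, error) { calls++; return 40 + calls, nil }

	live := &recorder{recorded: map[string]any{}}
	a, _ := sideEffect(live, "fetch-quota", fetch)

	replay := &recorder{replaying: true, recorded: live.recorded}
	b, _ := sideEffect(replay, "fetch-quota", fetch)

	fmt.Println(a, b, calls) // prints "41 41 1"
}
```

The closure ran once, on the live run. The replay returned the same value without touching the counter, which is the property the determinism contract depends on.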

Replay

starling.Replay(ctx, log, runID, agent) re-executes a recorded run against the same agent wiring. Every event the loop attempts to emit is compared to the recording at the matching seq:

Kind mismatch: the loop produced a different event type.
Payload mismatch: same kind, different bytes.
Exhausted: the loop ran past the end of the recording.

Mismatches surface as a typed *replay.Divergence carrying RunID, Seq, Kind, ExpectedKind, Class, and Reason. Use errors.Is(err, starling.ErrNonDeterminism) to detect a divergence and errors.As to get the structured fields.

Replay never calls the provider. Tool code still runs, but any work wrapped in step.SideEffect returns its recorded result from the log instead of executing again.

Resume

When a run crashes mid-flight, (*Agent).Resume(ctx, runID, extra) reconstructs the conversation state from the log and re-enters the agent loop in a new process. Pending tool calls are re-issued under fresh CallIDs; the orphaned ToolCallScheduled from the prior process stays in the log for audit. A RunResumed seam event marks the boundary.
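One way to picture the reconstruction step is a scan for scheduled calls that never completed. The sketch below assumes a simplified, hypothetical event shape with a CallID field; it shows the idea only and says nothing about how Resume is actually implemented.

```go
package main

import "fmt"

// event is a simplified, hypothetical record; Kind and CallID are the
// only fields this sketch needs.
type event struct {
	Kind   string
	CallID string
}

// pendingCalls returns CallIDs that were scheduled but never
// completed, in the order they were scheduled.
func pendingCalls(log []event) []string {
	done := map[string]bool{}
	for _, e := range log {
		if e.Kind == "ToolCallCompleted" {
			done[e.CallID] = true
		}
	}
	var pending []string
	for _, e := range log {
		if e.Kind == "ToolCallScheduled" && !done[e.CallID] {
			pending = append(pending, e.CallID)
		}
	}
	return pending
}

func main() {
	log := []event{
		{Kind: "ToolCallScheduled", CallID: "c1"},
		{Kind: "ToolCallCompleted", CallID: "c1"},
		{Kind: "ToolCallScheduled", CallID: "c2"}, // process crashed after this
	}
	fmt.Println(pendingCalls(log)) // prints "[c2]"
}
```

In the model described above, c2 would be re-issued under a fresh CallID, and the original ToolCallScheduled would stay in the log untouched.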

Tools

A tool is anything implementing tool.Tool:

type Tool interface {
    Name() string
    Description() string
    Schema() json.RawMessage  // JSON Schema for input
    Execute(ctx context.Context, in json.RawMessage) (json.RawMessage, error)
}

tool.Typed[In, Out](name, description, fn) is the convenience wrapper that derives the JSON Schema from your input type via reflection. For non-deterministic tools (HTTP, filesystem, anything beyond pure compute), wrap the work in step.SideEffect so replay returns the recorded result.

Budgets

Four axes:

Axis	Where enforced
`MaxInputTokens`	Pre-call, before every LLM call
`MaxOutputTokens`	Mid-stream, on every usage chunk
`MaxUSD`	Mid-stream, using per-model prices
`MaxWallClock`	`context.WithDeadline` wrapping the run

A trip emits a BudgetExceeded event with (limit, cap, actual, where) and unwinds the run with RunFailed{ErrorType:"budget"}. Budgets are inline runtime checks, not after-the-fact dashboards.

Backends

Three event-log implementations:

eventlog.NewInMemory(): tests, demos, ephemeral CLI tools.
eventlog.NewSQLite(path): single-host services. WAL mode plus per-run _txlock=immediate keeps the one-writer, many-readers pattern correct.
eventlog.NewPostgres(db): multi-host services. Per-run advisory locks serialize appenders by run.

All three satisfy EventLog and share the migration contract.