
Persist runs

Pick a backend. In-memory for tests, SQLite for single-host, Postgres for multi-host. Migrations, preflight, read-only inspectors.

eventlog.EventLog is the persistence interface. Three built-in backends share it; your code only sees the interface, so swapping is a constructor change.

The interface

type EventLog interface {
    Append(ctx context.Context, runID string, ev event.Event) error
    Read(ctx context.Context, runID string) ([]event.Event, error)
    Stream(ctx context.Context, runID string) (<-chan event.Event, error)
    Close() error
}

type RunLister interface {
    ListRuns(ctx context.Context) ([]RunSummary, error)
}

type RunPageLister interface {
    ListRunsPage(ctx context.Context, opts RunPageOptions) (RunPage, error)
}

type RunPruner interface {
    PruneRuns(ctx context.Context, opts PruneOptions) (PruneReport, error)
}

type RunPageOptions struct {
    Limit            int
    Offset           int
    Status           string
    Query            string
    StartedAfter     time.Time
    RequireToolCalls bool
}

type RunPage struct {
    Runs          []RunSummary
    TotalMatching int
    Limit         int
    Offset        int
}

type RunSummary struct {
    RunID         string
    StartedAt     time.Time
    LastSeq       uint64
    TerminalKind  event.Kind

    // Aggregates over the run's events. Computed by every backend's
    // ListRuns implementation; zero values are valid for runs that
    // haven't produced an AssistantMessageCompleted yet.
    TurnCount     int
    ToolCallCount int
    InputTokens   int64
    OutputTokens  int64
    CostUSD       float64
    DurationMs    int64  // wall time from RunStarted to last event
}

All three built-in backends satisfy RunLister, RunPageLister, and RunPruner. The inspector uses RunPageLister when available, and falls back to RunLister for custom backends that only implement the older listing interface. The aggregate fields on RunSummary are computed at list time so dashboards don't have to re-aggregate event streams.

Picking a backend

| Backend | Use when | Avoid when |
| --- | --- | --- |
| NewInMemory() | Tests, demos, ephemeral CLIs. | Anything you want to replay later. |
| NewSQLite(path, ...) | Single-host services, edge nodes. WAL mode, one writer, many readers. | Multi-host writers (no cross-host locking). |
| NewPostgres(db, ...) | Multi-host services, regulated workloads, anything wanting PITR. | Workloads where the DB is unavailable for stretches. |

In-memory

log := eventlog.NewInMemory()

No migration, no schema check, no persistence. The whole log is gone when the process exits. Useful for go test and one-shot CLIs.

SQLite

log, err := eventlog.NewSQLite("starling.db")
if err != nil { return err }
defer log.Close()

What you get:

  • WAL mode + synchronous=NORMAL — fast appends, fsync on commit.
  • Auto-migration on open — first open installs the schema; later opens migrate forward to the binary's schema version.
  • Per-run _txlock=immediate — one writer, many readers.
  • File permissions 0600, owned by the agent user.

Options:

| Option | Purpose |
| --- | --- |
| WithReadOnly() | Open with mode=ro. Append returns ErrReadOnly. Inspector mode. |

Read-only example (e.g., a separate inspector binary against the same file):

log, err := eventlog.NewSQLite("starling.db", eventlog.WithReadOnly())

You can back up a live SQLite log without stopping the agent:

sqlite3 starling.db ".backup /tmp/starling-backup.db"

Postgres

import (
    "database/sql"
    "os"

    _ "github.com/jackc/pgx/v5/stdlib"
    "github.com/jerkeyray/starling/eventlog"
)

db, err := sql.Open("pgx", os.Getenv("DATABASE_URL"))
if err != nil { return err }
db.SetMaxOpenConns(8)

log, err := eventlog.NewPostgres(db, eventlog.WithAutoMigratePG())
if err != nil { return err }
defer log.Close()

What you get:

  • Per-run pg_advisory_xact_lock on the run id hash — appenders to the same run serialize; different runs are independent.
  • Multi-host safe — any number of writers across hosts.
  • PITR / replication — standard Postgres tooling works.
  • Postgres ≥ 11 required (uses hashtextextended).

Options:

| Option | Purpose |
| --- | --- |
| WithAutoMigratePG() | Run InstallSchema at open. Without it, run migrations explicitly. |
| WithReadOnlyPG() | Append returns ErrReadOnly. Inspector mode. |

Use a Postgres role with the minimum privileges you need:

-- writer
GRANT SELECT, INSERT ON eventlog_events TO starling_writer;
-- reader (inspector)
GRANT SELECT ON eventlog_events TO starling_reader;

Migrations

import "github.com/jerkeyray/starling/eventlog"

// Print current version.
v, err := eventlog.SchemaVersion(ctx, log)

// Apply pending migrations (forward-only).
report, err := eventlog.Migrate(ctx, log)

// Dry-run for CI.
report, err := eventlog.Migrate(ctx, log, eventlog.WithDryRun())

CLI equivalents:

starling schema-version /var/lib/starling/log.db
starling migrate /var/lib/starling/log.db
starling migrate --dry-run /var/lib/starling/log.db

Preflight

Agent.Run and Agent.Resume call eventlog.Preflight(ctx, log) on startup. It returns:

  • nil if the schema matches.
  • ErrSchemaOutdated if the database is older than the binary (run Migrate).
  • ErrSchemaTooNew if the database is newer than the binary (deploy a newer binary or rollback the schema).

In-memory backends skip the check (return nil). Disable with Config.SkipSchemaCheck = true in tests only.

Validation

eventlog.Validate(events) re-checks an entire run end to end. Use it in CI to verify a recorded fixture hasn't drifted:

events, err := log.Read(ctx, runID)
if err != nil { return err }
if err := eventlog.Validate(events); err != nil {
    return err // wraps ErrLogCorrupt with a diagnostic
}

Validate checks:

  1. Slice non-empty, events[0].Seq == 1, monotonic seq with no gaps.
  2. RunID consistent across all events.
  3. Hash chain unbroken.
  4. Exactly one terminal event, last in the slice.
  5. First event is RunStarted with a supported SchemaVersion.
  6. TurnStarted paired with a same-turn terminal.
  7. ToolCallScheduled paired with ToolCallCompleted or ToolCallFailed under the same (CallID, Attempt).
  8. Merkle root matches over every pre-terminal event.

Reading and streaming

// One-shot read of a finished run.
events, err := log.Read(ctx, runID)

// Stream as the run unfolds (historical + live).
ch, err := log.Stream(ctx, runID)
for ev := range ch {
    // ...
}

Stream delivers historical events first, then live events. The channel closes on context cancel, log close, or buffer overflow (internal buffer is 256 events).

Paged listings

Use ListRunsPage for UI or API surfaces that browse many runs:

page, err := log.ListRunsPage(ctx, eventlog.RunPageOptions{
    Limit:  50,
    Offset: 0,
    Status: "completed",
    Query:  "support-ticket",
})

Limit <= 0 uses the backend default. SQLite and Postgres apply filters, ordering, and pagination in SQL before materializing run summaries, so large logs do not need to load every run just to render the first page. No schema migration is required.

Retention pruning

Pruning is an explicit operator action outside the append-only EventLog contract. It deletes whole runs only; it never removes a single event or suffix from a run.

starling prune --older-than 720h /var/lib/starling/log.db          # dry run
starling prune --older-than 720h --confirm /var/lib/starling/log.db
starling prune --before 2026-01-01T00:00:00Z --status completed /var/lib/starling/log.db

The default selection is terminal runs (completed, failed, and cancelled) older than the cutoff. In-progress runs are kept unless you pass --status "in progress" or --include-in-progress.

For Postgres, wire the same retention policy as a maintenance job with a role that has SELECT and DELETE on eventlog_events:

report, err := log.(eventlog.RunPruner).PruneRuns(ctx, eventlog.PruneOptions{
    Before: time.Now().Add(-90 * 24 * time.Hour),
    DryRun: true,
})
if err != nil { return err }
fmt.Printf("would delete %d runs\n", report.MatchedRuns)

_, err = log.(eventlog.RunPruner).PruneRuns(ctx, eventlog.PruneOptions{
    Before: time.Now().Add(-90 * 24 * time.Hour),
})

Keep inspector roles read-only (SELECT only).

Helpers

turns, tools, inTok, outTok, cost, durNs :=
    eventlog.AggregateRun(events)

AggregateRun is the single source of truth for per-run totals across the runtime: the inspector's totals strip, the MCP server's summarize_run tool, and RunSummary's aggregate fields all share this implementation. An event whose payload fails to decode is skipped rather than failing the whole aggregation, since callers are typically presentation surfaces where one broken row should not blank the dashboard.

err := eventlog.ForkSQLite(ctx, srcPath, dstPath, runID, beforeSeq)

WAL-safe SQLite branching. Copies the source via VACUUM INTO (the only way to copy a live WAL-mode database without leaking the .db-wal and .db-shm sidecars) and truncates runID's events to those with seq < beforeSeq. Other runs are preserved verbatim.

beforeSeq=0 keeps every event for runID (forks the run as-is); returns ErrForkNotFound when nothing matches in the source. See docs/cookbook/branching.md in the runtime repo for a worked example.

Public merkle package

import "github.com/jerkeyray/starling/merkle"

The BLAKE3 hash-chain helpers used by Agent.Run are exposed as a public package. Third parties writing their own event producers can reuse the chain implementation rather than copying it. This is useful for non-Agent.Run recorders that need to write into an EventLog and maintain compatible chain output, e.g. importers, replay harnesses, or custom dashboards that want to rebuild a Merkle root.

Sentinel errors

var (
    ErrLogClosed       = errors.New("eventlog: log is closed")
    ErrLogCorrupt      = errors.New("eventlog: log is corrupt")
    ErrInvalidAppend   = errors.New("eventlog: invalid append")
    ErrReadOnly        = errors.New("eventlog: log is read-only")
    ErrSchemaOutdated  = errors.New("eventlog: schema outdated; run migrate")
    ErrSchemaTooNew    = errors.New("eventlog: schema too new for this binary")
)

Wrapping a backend with metrics

If you call Append directly outside step.emit, wrap the log to capture the same latency histograms:

import "github.com/jerkeyray/starling/eventlog"

obs := starling.NewMetrics(reg).EventLogObserver()
log = eventlog.WithMetrics(log, obs)

Anti-patterns

  • Multiple processes writing to one SQLite file. Use Postgres.
  • SkipSchemaCheck: true in production. Hides migrations you forgot to run.
  • Calling Migrate on every process start without coordination. It's idempotent but wastes a transaction. Run it from your release pipeline; let the binary preflight on startup.
  • Reusing a runID across runs. Once recorded, IDs are retired. The agent mints fresh ULIDs; don't pass synthetic IDs.
