event-sourced agent runtime in Go

replay past runs. resume crashed ones. stop runaway costs.

$ go get github.com/jerkeyray/starling
run · 01hz8…xkj3 · 10 events · ✓ validated

  #01 RunStarted · model · tools · system prompt pinned
  #02 TurnStarted · turn 1 · prompt hash committed
  #03 AssistantMessageCompleted · tool plan: search · fetch
  #04 ToolCallScheduled · search · attempt 1
  #05 ToolCallScheduled · fetch · attempt 1
  #06 ToolCallCompleted · search · 28ms
  #07 ToolCallCompleted · fetch · 312ms
  #08 TurnStarted · turn 2
  #09 AssistantMessageCompleted · final answer
  #10 RunCompleted · merkle root committed
what you get

what you get without writing it yourself.

Replay, audit, multi-provider, MCP, budgets, and operator tooling - all in the box, with one Go import.

Replay any past run

Re-run a recorded run against your current code. The first step that behaves differently shows up as a test failure.

Audit-grade history

Each run is hash-signed end to end. If anyone edits a past event, validation breaks and you know.

Use any model

OpenAI, Anthropic, Gemini, Bedrock, OpenRouter, and any OpenAI-compatible endpoint. Swap models without touching agent code.

Tools and MCP

Write tools as plain Go functions, or mount any MCP server. Both behave the same in live runs and in replay.

Hard cost limits

Cap tokens, dollars, and wall-clock per run. The runtime stops when a cap trips - not after the bill arrives.

Production basics, included

Postgres or SQLite storage, schema migrations, Prometheus metrics, structured logs, and a built-in web inspector.

how a run flows

five phases, start to finish.

From a goal in to a verified answer out - every step recorded in order, replayable later.

  01 · define

     Wire up the agent. Pick a model, give it tools, set a budget. No I/O yet.

     model · tools · budget

  02 · run

     Call Run with a goal. The runtime mints a run id and starts recording from the first byte.

     fresh run id · recording started

  03 · loop

     The model thinks, calls tools, reads the results, thinks again. Every step lands in the recording as it happens.

     turns · tool calls · tokens

  04 · finish

     When the run ends, the recording is sealed and signed. You can prove later that nothing was edited.

     final answer · signed history

  05 · replay

     Re-run the recording against your current code. Any difference shows up as a typed error pointing at the exact step.

     same wiring · diff at exact step
how it works

every meaningful runtime action is an event.

Starling treats the event log as the source of truth. The runtime, the inspector, and replay verification all read the same shape.

Every event is hash-chained on append. The terminal event commits a Merkle root over all priors. Mutate any prior event and eventlog.Validate fails.

Replay re-executes the agent against the same wiring. The first event that does not byte-match surfaces as a typed replay.Divergence carrying seq, kind, expected kind, class, and reason.

merkle root

  root: 8af1…91e2
    L: 04c3…9b1a  (e1, e2)
    R: 7d20…f4ee  (e3, e4)

chain valid · 4/4 verified

in-tree adapters

OpenAI · Anthropic · Gemini · Bedrock · OpenRouter

OpenAI-compatible endpoints (Groq, Together, Ollama, vLLM, LM Studio, Azure OpenAI, …) plug in via openai.WithBaseURL.

open source

contributions welcome.

issues, pull requests, and design discussions are genuinely appreciated. star the repo to follow along.