Engineering Blog · Architecture

Scaling Managed Agents: Decoupling the Brain from the Model

April 9, 2026 · 15 min read · Lattice Runtime Team

Managed agent infrastructure should not assume which model is the brain. Lattice Runtime is built around interfaces that stay stable as models, harnesses, and providers change.


A running theme in agent infrastructure is that harnesses encode assumptions — about what models cannot do, about where code runs, and about which model is doing the thinking. Those assumptions go stale fast.

Anthropic described this well in their recent engineering blog post on Claude Managed Agents. They found that workarounds built for Sonnet were dead weight on Opus. Context resets, retry logic, premature-completion guards — all tuned for one model's limitations, all wrong for the next.

We hit the same wall, but from a different angle. We were not just swapping model versions. We were swapping model providers. A planning agent running Opus. A code agent running Sonnet. A review agent running GPT for a second opinion. A local agent running Ollama for sensitive data that cannot leave the machine.

We built Lattice Runtime to solve both problems: a managed agent infrastructure that is opinionated about the interfaces — brain, session, hands — but not about which model is the brain or where the hands live.

The same old problem, one layer up

Anthropic framed their challenge as an old problem in computing: how to design a system for “programs as yet unthought of.” Operating systems solved this by virtualizing hardware into abstractions general enough for programs that did not exist yet.

We agree with the pattern. We disagree with where they drew the boundary.

Anthropic virtualized everything except the model. Their brain interface assumes Claude. Their session lives on Anthropic's cloud. Their sandbox runs in Anthropic's containers. They decoupled the components — brain, session, hands — but coupled them all to a single provider.

Lattice Runtime virtualizes one layer further: the model itself becomes a pluggable interface.

harness.go

```go
// The brain doesn't know what model is behind this.
response := brain.Generate(ctx, GenerateRequest{
    Messages: session.GetEvents(lastCheckpoint),
    Tools:    runtime.AvailableTools(),
})

// Could be Claude, GPT, Gemini, or Ollama.
// Config change, not code change.
```

The brain calls generate(messages, tools) → response. It does not know whether Claude, GPT, Gemini, or a local Ollama instance is behind it. The session lives wherever you run it. The abstraction outlasts the provider.

Do not adopt a vendor

Anthropic described the “pets vs cattle” problem: when everything lives in one container, that container becomes a pet you cannot afford to lose.

We had the same problem, but the pet was not a container — it was a vendor.

When your managed agent infrastructure is Claude-only, your entire agent system becomes a pet tethered to one provider's API, pricing, uptime, and context window.

Vendor-locked architecture:

- Brain: Claude only
- Session: their servers
- Sandbox: their container

Provider fails → everything fails:

- Model deprecation breaks agents
- Data must leave your infra
- Pay Opus prices for commodity work
- Outage = total system down

The fix is the same one Anthropic arrived at for containers: decouple. But we decouple at the provider level, not just the component level.

Decouple the brain from the model

In Lattice Runtime, the brain is a harness that calls a model-agnostic interface. It knows how to:

  1. Build context from the durable session log
  2. Call generate() with messages and tools — against any model
  3. Route tool calls to the appropriate runtime backend
  4. Write events back to the session for durability

Switching from Claude to GPT to Ollama is a config change, not a code change. Each component is an interface. Each can fail independently. Each can be swapped without disturbing the others.

Lattice Runtime: decoupled architecture:

- The Brain: model-agnostic harness. generate(messages, tools) → response. Claude · GPT · Gemini · Ollama.
- The Session: durable append-only log. getEvents() → event stream. Lives wherever you run it, survives crashes and model swaps, SHA-256 audit chain.
- The Hands: 6 runtime backends. execute(name, input) → string. Local · Worktree · SSH · Docker · Devcontainer · Lattice SSH.

The session is not the context window

Long-horizon tasks exceed the context window. The standard fix — compaction, trimming, memory tools — involves irreversible decisions about what to keep, and it is difficult to know which tokens future turns will need.

In Lattice Runtime, we separated two concerns:

The session is durable. Every event is written to an append-only log per workspace. Malformed entries are filtered at load time, never fatal. If streaming crashes mid-token, the partial message lifecycle ensures the session is repairable on next load.

The context window is constructed. A stream context builder reads from the session log and builds what the model sees on each turn. Different models get different context strategies. Compaction marks a boundary but does not delete — the originals are still in the log, queryable.

context.go

```go
// Context is constructed, not accumulated.
events := session.GetEvents(
    session.FromCheckpoint(lastCompaction),
    session.WithLimit(contextBudget),
)

// Different models get different strategies.
context := contextBuilder.Build(events, model.ContextConfig())
```
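Non-destructive compaction falls out of this split: a boundary is just an index into the log. A minimal sketch (Compact and Window are hypothetical names):

```go
package main

import "fmt"

type Event struct{ Kind, Payload string }

type Session struct {
	log      []Event
	boundary int // index of the last compaction boundary
}

// Compact appends a summary and marks a boundary. Nothing is deleted:
// events before the boundary stay in the log and remain queryable.
func (s *Session) Compact(summary string) {
	s.log = append(s.log, Event{Kind: "compaction", Payload: summary})
	s.boundary = len(s.log) - 1
}

// Window is what the context builder feeds the model: the summary
// plus everything after the boundary.
func (s *Session) Window() []Event { return s.log[s.boundary:] }

func main() {
	s := &Session{log: []Event{
		{Kind: "user", Payload: "long task"},
		{Kind: "tool_result", Payload: "10k lines of diff"},
	}}
	s.Compact("ran git diff; 3 files changed")
	s.log = append(s.log, Event{Kind: "user", Payload: "now fix the tests"})

	fmt.Println(len(s.Window())) // 2: summary + new message
	fmt.Println(len(s.log))      // 4: originals still in the log
}
```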

Session vs context window:

Session log (durable):

- event_001: user message
- event_002: tool call: git diff
- event_003: tool result
- event_004: model response
- event_005: compaction boundary
- ... append-only, queryable, portable

Context window (constructed):

- Build: stream context builder reads from the session log
- Transform: different models get different context strategies
- Compact: summarize and mark a boundary; originals stay in the log
- Recover: getEvents() can rewind, slice, or replay

The harness became cattle

Because the brain is model-agnostic, a crashed session can resume with a different model. If your Opus session hits a provider outage, the retry manager can fall back to Sonnet or GPT. The session does not care which brain is driving.

This is the structural advantage of decoupling at the provider level. In a vendor-locked system, a provider outage is a total system outage. In Lattice Runtime, it is a config change.

Crash → auto-recovery:

1. Process crashes: kill, OOM, or provider outage
2. Temporal replays: deterministic replay from the last checkpoint
3. Model swapped: resume with a different model if needed
4. Audit logged: KILLED → RESTARTING → RUNNING, SHA-256 chained
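The tamper-evident property comes from an ordinary hash chain: each audit entry hashes its payload together with the previous entry's hash, so rewriting any entry breaks every hash after it. A sketch of the idea, not the actual log format:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type AuditEntry struct {
	Action   string
	PrevHash string
	Hash     string
}

// appendEntry chains the new entry to the previous one. Rewriting any
// earlier Action invalidates every Hash that follows it.
func appendEntry(log []AuditEntry, action string) []AuditEntry {
	prev := ""
	if len(log) > 0 {
		prev = log[len(log)-1].Hash
	}
	sum := sha256.Sum256([]byte(prev + action))
	return append(log, AuditEntry{Action: action, PrevHash: prev, Hash: hex.EncodeToString(sum[:])})
}

// verify recomputes the chain and reports whether every link holds.
func verify(log []AuditEntry) bool {
	prev := ""
	for _, e := range log {
		sum := sha256.Sum256([]byte(prev + e.Action))
		if e.PrevHash != prev || e.Hash != hex.EncodeToString(sum[:]) {
			return false
		}
		prev = e.Hash
	}
	return true
}

func main() {
	var log []AuditEntry
	for _, a := range []string{"KILLED", "RESTARTING", "RUNNING"} {
		log = appendEntry(log, a)
	}
	fmt.Println(verify(log)) // true

	log[1].Action = "NEVER RESTARTED" // tamper with history
	fmt.Println(verify(log))          // false
}
```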

The security boundary

Credential forwarding, not credential sharing. In the coupled design, untrusted code runs next to credentials. A prompt injection only needs to convince the model to read its environment. The structural fix is to ensure credentials are never reachable from the sandbox.

In Lattice Runtime, authentication is wired into runtimes without the agent session ever seeing raw credentials. Git tokens are baked into the clone during sandbox init. MCP OAuth tokens live in a secure vault, accessed through a proxy that fetches them per-session.
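The shape of the fix, sketched in Go: provisioning sees secrets once, in the control plane, and the Execute surface the agent can reach has no credential parameters at all. The names here (Secrets, provision, Sandbox) are hypothetical; the real flow uses a vault and a per-session proxy:

```go
package main

import "fmt"

// Secrets are only visible during provisioning, which runs in the
// control plane, not in the sandbox.
type Secrets struct{ GitToken string }

type Sandbox struct {
	remoteURL string // token baked in at clone time
}

// provision wires credentials into the sandbox once, at init.
func provision(s Secrets, repo string) *Sandbox {
	return &Sandbox{remoteURL: fmt.Sprintf("https://x-access-token:%s@%s", s.GitToken, repo)}
}

// Execute is the only surface the agent session can reach.
// Note the signature: no credentials in, no credentials out.
func (sb *Sandbox) Execute(name, input string) string {
	switch name {
	case "git_fetch":
		_ = sb.remoteURL // the real implementation shells out with this URL; it is never returned
		return "fetched " + input
	default:
		return "unknown tool"
	}
}

func main() {
	sb := provision(Secrets{GitToken: "ghs_secret"}, "github.com/acme/repo.git")
	// A prompt-injected model can call tools, but the tool surface
	// has no path back to the token.
	fmt.Println(sb.Execute("git_fetch", "main")) // fetched main
}
```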

Five governance gates. Every agent action passes through: Identity → Authorization → Constraints → Execute → Audit. Policy violations are structurally impossible — the infrastructure will not execute actions that fail any gate.

Five governance gates:

1. Identity: OAuth2 / SAML / mTLS / API key
2. Authorization: RBAC + ABAC / Rego → SQL
3. Constraints: budget / PII / model lock / tool gate
4. Execute: Temporal durable workflow
5. Audit: SHA-256 chain / diff capture

Many brains, many hands

Many brains. Each workspace runs its own agent session with its own model configuration. An orchestrator running Opus can delegate to workers running Sonnet. A review agent can use GPT for a second opinion. A sensitive-data agent can use Ollama so nothing leaves the machine.

Many hands. Each brain connects to hands through execute(name, input) → string. The harness does not know whether the sandbox is a container, a remote server, or a local shell. Because no hand is coupled to any brain, brains can pass hands to one another.

Lazy provisioning. Runtimes are provisioned on the first tool call that needs them, not at session start. A session that never touches the sandbox does not wait for one. This dropped our p50 time-to-first-token by roughly 60%.
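Lazy provisioning is, at its core, a sync.Once around the expensive part. A sketch (lazyRuntime and dockerRuntime are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Runtime is the "hands" interface every backend implements.
type Runtime interface {
	Execute(name, input string) string
}

type dockerRuntime struct{}

func (dockerRuntime) Execute(name, input string) string { return "ran " + name }

// lazyRuntime defers provisioning until the first tool call.
// A session that never touches the sandbox never pays for one.
type lazyRuntime struct {
	once        sync.Once
	provisioned bool
	inner       Runtime
}

func (l *lazyRuntime) Execute(name, input string) string {
	l.once.Do(func() {
		// the expensive work happens here: pull image, start container
		l.inner = dockerRuntime{}
		l.provisioned = true
	})
	return l.inner.Execute(name, input)
}

func main() {
	rt := &lazyRuntime{}
	fmt.Println(rt.provisioned) // false: session started, no sandbox yet
	rt.Execute("shell", "ls")
	fmt.Println(rt.provisioned) // true: provisioned on first tool call
}
```

Wrapping each backend this way moves provisioning cost off the session-start path, which is where the time-to-first-token win comes from.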

Many brains:

- Orchestrator: Opus, for planning and reasoning
- Code Worker: Sonnet, for execution and edits
- Review Agent: GPT, for a second opinion
- Sensitive Data Agent: Ollama, so nothing leaves the machine

Many hands, all behind execute(name, input) → string:

- Local: direct filesystem
- Worktree: Git isolation
- SSH: remote servers
- Docker: container sandbox
- Devcontainer: VS Code compatibility
- Lattice SSH: managed tunnel

Same interface. Any hand. Any brain can pass hands to another.

What is different

A side-by-side comparison of vendor-locked managed agent infrastructure vs Lattice Runtime.

| | Vendor-locked | Lattice Runtime |
| --- | --- | --- |
| Model | One provider only | Any model, any provider |
| Hosting | Provider cloud | Your cloud, your machine, or ours |
| Data | Provider servers | Wherever you run it |
| Multi-agent | Single-model | Mix models per agent role |
| Governance | Not built-in | 5 gates, budgets, crypto audit |
| Runtimes | Cloud container | 6 backends (local → K8s) |
| Crash recovery | Provider-dependent | Resume with any model |
| Session | Provider-managed | Durable, portable, yours |
| Context | Model-coupled | Constructed from durable log |
| Credentials | Shared environment | Forwarded, never exposed |
| Audit trail | Basic logging | SHA-256 hash chain, tamper-evident |
| Provisioning | Upfront container | Lazy, on first tool call |

Building in public

We are two people. We ship every day. We are rolling out access in phases.

The bet is simple: today's best model will not be tomorrow's best model. The team that bets on one provider will rewrite their agent infrastructure every time the leaderboard shifts. The team that bets on interfaces will swap a config line and keep shipping.

Lattice Runtime is that interface layer. One Go binary. Any model behind it. Your data stays on your machines. Every action passes through five governance gates before it executes. Every event is written to a tamper-evident audit log.

The abstraction outlasts the provider. That is the whole point.


Written by the Lattice Runtime team. View the full architecture →

Run agents on your terms.

Star the repo to get on our radar. We reach out to stargazers first.

Star on GitHub: lattice-runtime

Star · Open an issue with your use case · Watch for the invite
