rockbot.dev

Agentic Architecture in Practice

Harness · LLM · Directives

A working model for how agentic systems are
designed, governed, and extended — with RockBot as the example.

Rockford Lhotka  ·  rockbot.dev  ·  blog.lhotka.net

The thesis

What is an agent?

Not "an LLM with tools bolted on." In practice it's three distinct parts:

Agent = Harness + LLM + Directives
Harness
The code that controls execution — the loop.
LLM
Reasons and generates. A black box.
Directives
Shape behavior — scope, tone, rules.

Pull any one out and you don't have an agent.

Part 1 of 3 — the LLM

The LLM is a black box

It's a microservice: text in → text out. Nothing more.

prompt (text) → LLM (stateless · no memory · no tools) → completion (text)
It doesn't remember the last call.
It can't call a tool — it just writes that it wants to.
It takes no action. It only generates text.
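
A minimal sketch of that contract, assuming a hypothetical `fake_llm` function rather than any real provider API. It is only meant to show the shape: text goes in, text comes out, nothing is remembered, and a "tool call" is just more text that something else has to parse.

```python
import json


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: text in, text out, no state kept."""
    if "weather" in prompt.lower():
        # The model cannot run a tool; it can only write that it wants one.
        return json.dumps({"tool": "get_weather", "args": {"city": "Oslo"}})
    return "I have no memory of earlier calls."


first = fake_llm("What is the weather in Oslo?")
second = fake_llm("What did I just ask you?")
```

Here `first` is a JSON string describing a wish, not a weather report, and `second` shows that the second call knows nothing about the first.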

So where does all the "agent" behavior come from?  →

Part 2 of 3 — the harness

The harness is the loop

This is where the real engineering lives.

User input (a request) → Build prompt (+ directives, memory, tools) → Call LLM → Parse result (text? or tool call?)
tool call → Run tool → feed result back in (loop)
final text → Show user

The inner loop — call a tool, feed the result back, call the LLM again — is what makes an agent feel agentic.
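
The loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RockBot's implementation: `run_agent`, `scripted_llm`, the JSON tool-call format, and the `max_steps` guard are all hypothetical names and choices made for this example.

```python
import json
from typing import Callable


def run_agent(user_input: str,
              llm: Callable[[str], str],
              tools: dict,
              directives: str,
              max_steps: int = 5) -> str:
    """Build prompt, call LLM, parse, run tool, feed result back, repeat."""
    transcript = [f"User: {user_input}"]
    for _ in range(max_steps):
        prompt = directives + "\n" + "\n".join(transcript)
        reply = llm(prompt)
        try:
            request = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text: this is the final answer
        if not isinstance(request, dict) or "tool" not in request:
            return reply  # valid JSON but not a tool call: treat as text
        name = request["tool"]
        if name not in tools:
            result = f"error: unknown tool {name}"
        else:
            result = tools[name](**request.get("args", {}))
        # Feed the tool result back in so the next LLM call can see it.
        transcript.append(f"Tool {name} returned: {result}")
    return "Stopped: step limit reached."


def scripted_llm(prompt: str) -> str:
    """Hypothetical model: asks for a tool once, then answers."""
    if "Tool add returned" in prompt:
        return "The answer is 5."
    return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})


answer = run_agent("What is 2 + 3?", scripted_llm,
                   {"add": lambda a, b: str(a + b)}, "Be brief.")
```

The step limit is a design choice for the sketch: the harness, not the model, decides when the loop ends.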

Part 3 of 3 — directives

Directives shape behavior

Scope · tone · rules-for-success. In RockBot, literally markdown files, composed into the system prompt.

📜 soul.md
Scope & identity.
Who the agent is, what it's for.
⚖️ directives.md
Rules.
Constraints that keep it safe and on-task.
🎨 style.md
Tone.
How it communicates.

Layered & composable — base persona, then per-task, then per-skill:

soul + directives + style + task directive + skill system prompt

Change behavior without changing code.
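
A minimal sketch of layered composition, assuming the files sit in one folder and are simply concatenated in order. The function name `compose_system_prompt`, the `inbox-patrol.md` task file, and the skip-if-missing rule are assumptions for illustration; only `soul.md`, `directives.md`, and `style.md` come from the slide.

```python
import tempfile
from pathlib import Path
from typing import Sequence

BASE_LAYERS = ["soul.md", "directives.md", "style.md"]


def compose_system_prompt(directive_dir: Path,
                          extra_layers: Sequence[str] = ()) -> str:
    """Concatenate base layers, then task or skill layers, in order."""
    parts = []
    for name in [*BASE_LAYERS, *extra_layers]:
        path = directive_dir / name
        if path.exists():  # assumption: a missing layer is skipped
            parts.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "soul.md").write_text("You are a helpful assistant.", encoding="utf-8")
    (root / "style.md").write_text("Be concise.", encoding="utf-8")
    (root / "inbox-patrol.md").write_text("Triage mail; flag urgent items.",
                                          encoding="utf-8")
    system_prompt = compose_system_prompt(root, ["inbox-patrol.md"])
```

Editing any one markdown file changes the composed prompt on the next call, with no code change.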

The concrete example

Meet RockBot

An event-driven, process-isolated autonomous agent framework for .NET.

"Nothing trusts
the LLM."

message bus (RabbitMQ) · isolated processes · least privilege · MIT · open source

Everything that follows is a real subsystem in this codebase.

Reaching outward

Tools: how the harness acts

The LLM only asks for a tool. The harness routes and runs it — then feeds the result back.

LLM ("call X") → Harness (tool registry · dispatch) →
MCP servers: email, calendar, drive…
CLI / scripts: sandboxed Python
Web & APIs: search · fetch page

RockBot leans on MCP heavily — secrets live in the MCP servers, not the agent. (more on that later)
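
A sketch of the registry-and-dispatch idea, assuming a hypothetical `ToolRegistry` class and a stubbed `fetch_page` tool. None of these names come from RockBot or from MCP. The point is that the model supplies only a name and arguments, and the harness decides what actually runs.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical registry: maps tool names to callables the harness owns."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def register(self, name: str) -> Callable:
        def decorator(fn: Callable) -> Callable:
            self._tools[name] = fn
            return fn
        return decorator

    def dispatch(self, name: str, args: dict) -> str:
        tool = self._tools.get(name)
        if tool is None:
            return f"error: unknown tool '{name}'"
        try:
            return tool(**args)
        except Exception as exc:
            # Errors go back to the LLM as text rather than crashing the loop.
            return f"error: {type(exc).__name__}: {exc}"


registry = ToolRegistry()


@registry.register("fetch_page")
def fetch_page(url: str) -> str:
    # Stub: a real tool would fetch the page and treat it as untrusted input.
    return f"<contents of {url}>"


ok = registry.dispatch("fetch_page", {"url": "https://example.com"})
missing = registry.dispatch("delete_everything", {})
bad_args = registry.dispatch("fetch_page", {"uri": "x"})
```

An unregistered name or malformed arguments produce an error string that is fed back to the model like any other tool result.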

Reaching outward

Peers: agents calling agents

The A2A protocol lets RockBot delegate to other agents — or be delegated to.

RockBot (orchestrator) → Peer agent: delegate task
Peer agent → RockBot: …or calls in

A tool call reaches a service. An A2A call reaches a reasoning peer — its own harness, LLM, and directives.

The pivot

The harness handles a lot

Once the core (loop + LLM + directives) is solid, this is what becomes possible:

Harness (the core): Memory · Skills · Tool calling · A2A peers · Scheduled tasks · Self-optimization · LLM efficiency

The rest of the talk walks the spokes.

Spoke 1 — context that persists

Memory: three tiers + a graph

💬 Conversation
Last N turns. Session-scoped, ephemeral. The here-and-now.
🗂️ Working
Shared, namespaced scratch space with a TTL. Hand-off between sessions, tasks & subagents.
🧠 Long-term
Permanent facts & preferences. Recalled per-turn by BM25 (+ vector) search.
🕸️ Knowledge graph
Entities & relationships for structured, relational reasoning.

The trick isn't storing it — it's injecting only what's relevant, every turn, without token bloat.

Delta injection: a memory recalled once isn't re-injected the same session.
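
Delta injection can be sketched as a set of already-injected ids per session. The `MemoryRecall` class is hypothetical, and the word-overlap score is a deliberately crude stand-in for BM25, used only to keep the example self-contained.

```python
class MemoryRecall:
    """Hypothetical per-session recall with delta injection."""

    def __init__(self, facts: dict) -> None:
        self.facts = facts          # fact id -> text
        self.injected = set()       # ids already injected this session

    def _score(self, query: str, text: str) -> int:
        # Stand-in for BM25: count shared lowercase words.
        return len(set(query.lower().split()) & set(text.lower().split()))

    def recall(self, query: str, top_k: int = 2) -> list:
        ranked = sorted(self.facts,
                        key=lambda fid: self._score(query, self.facts[fid]),
                        reverse=True)
        fresh = [fid for fid in ranked
                 if self._score(query, self.facts[fid]) > 0
                 and fid not in self.injected][:top_k]
        self.injected.update(fresh)  # never re-inject these this session
        return [self.facts[fid] for fid in fresh]


memory = MemoryRecall({
    "f1": "user prefers morning meetings",
    "f2": "favorite editor is vim",
})
turn1 = memory.recall("schedule meetings for the user")
turn2 = memory.recall("move the user meetings to friday")
turn3 = memory.recall("which editor should I open")
```

The second turn matches the same fact as the first, so nothing new is injected; the third turn surfaces a different fact.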

Spoke 2 — repeatable behavior

Skills: fixed guides & evolving know-how

Memory stores facts. Skills store how to do something — markdown procedures, recalled by BM25.

📘 Subsystem guides
Authored, fixed. "How to use the email MCP," "how working memory works." Ship with the agent.
🌱 Mutable skills
The agent writes its own — captures a procedure that worked, then refines it over time.
do task → write skill → reuse → refine / merge

The refine step happens offline — in the dream. →

Spoke 3 — autonomy

Tasks on a schedule

No user required. Cron-driven "patrol" tasks fire on their own — each with its own directive.

cron →
📨 Inbox patrol · directive: triage, flag urgent
💓 Heartbeat / briefing · directive: summarize, alert
🔎 Watch / monitor · directive: detect change, report
→ working memory: findings for you, later

Same harness, same LLM — a different directive turns it into a different worker.
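
A sketch of "same harness, different directive", assuming a hypothetical `PatrolTask` record and a fixed minute interval as a stand-in for a real cron expression. The `patrol/` key prefix for working memory is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PatrolTask:
    name: str
    every_minutes: int  # stand-in for a cron expression
    directive: str


def run_due_tasks(tasks: list, minute: int,
                  harness: Callable[[str], str],
                  working_memory: dict) -> None:
    """Run every task that is due; park its findings in working memory."""
    for task in tasks:
        if minute % task.every_minutes == 0:
            # Same harness for every task; only the directive differs.
            working_memory[f"patrol/{task.name}"] = harness(task.directive)


tasks = [
    PatrolTask("inbox", 15, "Triage the inbox; flag urgent mail."),
    PatrolTask("briefing", 60, "Summarize the day; alert on problems."),
]
findings = {}
run_due_tasks(tasks, 30, lambda d: f"ran with directive: {d}", findings)
```

At minute 30 only the inbox patrol is due, so `findings` holds one entry for the user to read later.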

Spoke 4 — cost & resilience

LLM routing: tiers & fallback

The LLM is swappable — so RockBot routes work across cost/capability tiers, and falls back when a provider fails.

cheaper · faster ↔ more capable · costlier
Low: cheap, simple work (optional)
Balanced: the workhorse tier (REQUIRED)
High: hard reasoning (optional)
Optional fallback chain: Primary provider A (down) → Fallback provider B (no tokens) → Fallback provider C

Balanced is the one requirement. Low & High are levers for cost and capability — and the fallback chain keeps the agent working when a provider is down or out of tokens.
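
A sketch of tier routing with a fallback chain. The names `route`, `call_with_fallback`, and `ProviderError` are hypothetical, and the rule that an unconfigured Low or High tier falls back to Balanced is an assumption made for this example, not a documented RockBot behavior.

```python
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider that is down or out of tokens."""


def call_with_fallback(prompt: str, chain: list) -> str:
    """Try each provider in order; return the first successful reply."""
    errors = []
    for provider in chain:
        try:
            return provider(prompt)
        except ProviderError as exc:
            errors.append(str(exc))
    raise ProviderError("all providers failed: " + "; ".join(errors))


def route(prompt: str, tier: str, tiers: dict) -> str:
    # Assumption: Balanced is always configured; other tiers default to it.
    chain = tiers.get(tier) or tiers["balanced"]
    return call_with_fallback(prompt, chain)


def provider_a(prompt: str) -> str:
    raise ProviderError("provider A down")


def provider_b(prompt: str) -> str:
    raise ProviderError("provider B out of tokens")


def provider_c(prompt: str) -> str:
    return f"C answered: {prompt}"


tiers = {"balanced": [provider_a, provider_b, provider_c]}
reply = route("classify this email", "low", tiers)
```

No Low tier is configured here, so the request lands on the Balanced chain and the third provider answers.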

Spoke 4 — cost & resilience

The orchestrator delegates down

The primary agent runs the conversation — and pushes each task to the lightest executor that can do it. Routine work drops to a cheap model, or no model at all.

Primary agent · full model · $$$$
User-facing orchestrator: runs the conversation, does some work itself.
↓ delegates
Subagents · all 3 tiers · $$$$
Run independently, off your thread. Same toolkit and all 3 LLM tiers, but never talk to the user.
↓ spawn
Workers · Low tier · $
Small, single-purpose agents. Usually spawned by a subagent for one focused task.
↓ or run a wisp
Wisps · no LLM · ~free
No LLM at all: just run a pre-built script of tool calls. A pure workflow executor.

Primary and subagents both reach for any tier. The savings come below: workers on the Low tier, wisps with no model at all.
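
A sketch of the bottom of that ladder. `run_wisp` shows a wisp as nothing more than a list of tool calls executed in order, and `pick_executor` shows one possible "lightest executor" rule. Both functions, the task fields `script` and `single_purpose`, and the selection rule itself are assumptions for illustration.

```python
def run_wisp(script: list, tools: dict) -> list:
    """Execute a pre-built list of tool calls in order; no LLM involved."""
    results = []
    for step in script:
        results.append(tools[step["tool"]](**step["args"]))
    return results


def pick_executor(task: dict) -> str:
    """Hypothetical rule: choose the cheapest executor that fits the task."""
    if task.get("script"):
        return "wisp"      # a known script exists: no model needed
    if task.get("single_purpose"):
        return "worker"    # one focused job: Low tier
    return "subagent"      # open-ended work: any tier


tools = {
    "upper": lambda text: text.upper(),
    "count": lambda text: str(len(text)),
}
script = [
    {"tool": "upper", "args": {"text": "hi"}},
    {"tool": "count", "args": {"text": "hello"}},
]
results = run_wisp(script, tools)
choice = pick_executor({"script": script})
```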

Spoke 5 — self-learning

RockBot dreams

A background cycle that runs offline and refactors the agent's own knowledge — no user, no goals changed.

🧠 Consolidate memory
Merge duplicates, decay stale facts, mine anti-patterns from corrections.
🌱 Optimize skills
Refine, merge, and cluster the skills it has written.
🔀 Tune LLM routing
Learn which work belongs in which tier — Low, Balanced, or High — for the best cost/quality.
🪞 Infer preferences
Notice patterns across conversations; learn how you like to work.

Yesterday's experience makes tomorrow's agent sharper — automatically.
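
One part of a dream cycle, memory consolidation, can be sketched as merge-then-decay. Everything specific here is an assumption: the `Fact` record, merging on whitespace-and-case-normalized text, exponential half-life decay, and the drop threshold are illustrative choices, not RockBot's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    last_used_day: int
    weight: float = 1.0


def consolidate(facts: list, today: int,
                half_life_days: float = 30.0,
                drop_below: float = 0.1) -> list:
    """Merge duplicate facts, then decay and drop the stale ones."""
    merged = {}
    for fact in facts:
        key = " ".join(fact.text.lower().split())  # normalized text
        kept = merged.get(key)
        if kept is None or fact.last_used_day > kept.last_used_day:
            merged[key] = fact  # keep the most recently used duplicate
    survivors = []
    for fact in merged.values():
        age = today - fact.last_used_day
        decayed = fact.weight * 0.5 ** (age / half_life_days)
        if decayed >= drop_below:
            survivors.append(Fact(fact.text, fact.last_used_day, decayed))
    return survivors


facts = [
    Fact("Prefers morning meetings", last_used_day=90),
    Fact("prefers  morning meetings", last_used_day=100),
    Fact("Uses a pager", last_used_day=0),
]
survivors = consolidate(facts, today=120)
```

The two meeting facts collapse into one, and the fact untouched for 120 days decays below the threshold and is dropped.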

Governance — trust & isolation

Principle of least privilege

The core philosophy, restated: nothing trusts the LLM. The core agent holds no service keys and no passwords, only its LLM keys.

Core agent: 🚫 no secrets, only LLM keys
🔐 MCP server: holds the email / API credentials
🔐 Isolated service: scoped, auditable, revocable
🔐 Message bus: no in-process LLM code

A compromised prompt can't leak a secret the agent never had.
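
A sketch of that boundary, assuming a hypothetical `EmailService` class in place of a real MCP server and an in-memory `queue.Queue` in place of the message bus. Real isolation is at the process level; this single-process example only illustrates that requests carry no secret and that the service, not the agent, decides which actions are allowed.

```python
import queue


class EmailService:
    """Stand-in for an isolated server: the only holder of the credential."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key

    def handle(self, request: dict) -> dict:
        if request.get("action") != "list_unread":
            return {"ok": False, "error": "action not allowed"}
        # A real server would use self._api_key to call the mail provider.
        return {"ok": True, "unread": ["Invoice due", "Lunch?"]}


bus = queue.Queue()  # stand-in for the message bus
service = EmailService(api_key="held-only-by-the-service")

# The agent side only ever puts plain requests on the bus.
bus.put({"action": "list_unread"})
bus.put({"action": "export_credentials"})
replies = [service.handle(bus.get()) for _ in range(2)]
```

Even a request crafted by a hostile prompt gets back either data or a refusal, never the credential.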

Governance — trust & isolation

Untrusted code runs sandboxed

RockBot can run Python and pull web pages — but never in-process.

🐍 Python execution
Runs in an ephemeral, low-privilege container. Spun up per run, destroyed after. No host access, no secrets, minimal blast radius.
🌐 Web search & fetch
Search the web and retrieve pages — but that content is untrusted input, handled with the same suspicion as the LLM's output.

Same pattern everywhere: assume the LLM (and the web) can be wrong or hostile — and contain it.

The working model

Agent = Harness + LLM + Directives

Get the core right, and it extends:

memory · skills · LLM routing · self-learning · scheduled tasks · dreams · MCP · A2A · least privilege

From abstract "agent" talk to a working model for how agentic
systems are designed, governed, and extended.

rockbot.dev  ·  github.com/MarimerLLC/rockbot  ·  blog.lhotka.net