Hermes Agent, Kilo Code, and OpenClaw all run the same commodity ReAct loop — each project's own documentation says so. Mapped against the eleven-layer harness architecture, here's what each one actually built around that loop, and what all three still leave thin. Read the loop-first version on Design a Loop →
Each project reports its own headline numbers in its own units — stars, users, tokens — so the cards below show each on its own terms.
Each agent's documented behavior mapped against the same eleven canonical layers used across Build A Harness — not an ad hoc feature list.
| Harness layer | Hermes Agent | Kilo Code | OpenClaw |
|---|---|---|---|
| 1. Caller State | ○Not described | ◐Agent "modes" (code/ask/plan/debug) fix tools + instructions per task | ◐Messaging channel/persona selects context — not a formal constraint system |
| 2. World Model | ○Raw history only | ◐Static Memory Bank files | ◐Static persona/bootstrap files |
| 3. Reasoning | ○Not described | ○Not described | ○Not described |
| 4. Control State | ○Not described | ◐Binary approval gate, not a tiered resolver | ○Queueing prevents races, isn't a risk control |
| 5. Planning | ◐Subagents get their own capped budget via delegate_task — no full dependency graph | ◐Subagent delegation for isolated subtasks | ○Not described |
| 6. Execution | ◐70+ tools, ~28 toolsets, 6 backends, concurrent dispatch | ●Git-snapshot checkpoints + diff-repair + tool-call translation | ◐Broadest action surface (device/OS tools), weakly gated |
| 7. Verification | ○Not described | ◐Informal completion check + diff-repair | ○Not described |
| 8. Recovery | ◐Provider fallback, fixed iteration budget | ◐Checkpoint rollback, diff-repair | ◐Auto-compaction retry only |
| 9. Memory | ●SQLite+FTS5, context compression with a 20-message floor | ◐Memory Bank, context condensing | ◐Raw JSONL transcript, token-limit + compaction reserve |
| 10. Learning | ●Self-improving skills — the project's headline differentiator | ○Not described | ○Explicitly static — no self-improvement |
| 11. Output & Reviewer Pass | ○Not described | ○Not described — human review substitutes | ○Reply shaping is cosmetic, not a quality gate |
Reading the pattern: Execution, Memory, and Learning are the only layers where any agent reaches full coverage — Kilo's checkpoints, and Hermes's persistent memory and self-improving skills, respectively. Reasoning and Output & Reviewer Pass get zero coverage across all three. The rest — Caller State, World Model, Control State, Planning, Verification, and Recovery — are partial at best everywhere: never fully absent, never fully built out either. Which is a fair description of where most production agents are today, popular or not.
Beyond the layer-by-layer mapping, each project reads differently in practice:
| Practical dimension | Hermes Agent | Kilo Code | OpenClaw |
|---|---|---|---|
| Model / provider breadth | 18+ providers, open-weight models | 500+ models, 60+ providers via the Kilo Gateway | Claude, GPT, DeepSeek, or local — OAuth piggyback on existing subscriptions |
| Primary reach | CLI, 20 messaging platforms, IDE (ACP), API, cron | VS Code, JetBrains, CLI, cloud agent, mobile | WhatsApp, Telegram, Discord, Signal, iMessage, Slack, Teams, native device apps |
| Business model | Free, MIT; runs on a few-dollar-a-month VPS | Free/BYOK or Kilo credits, zero markup on provider rates; $8M seed round | Free, MIT, self-hosted; foundation stewardship after the founder joined OpenAI |
| Known risk | Newest of the three (Feb 2026) — memory and self-improving skills lack a long track record | Parsing reliability is a constant fight across a 500-model matrix — not a novel loop, an inherited one | RCE CVE (CVSS 8.8), 1,000+ malicious marketplace skills reported, prompt-injection susceptible; restricted for state use in China |
All 11 layers ship as drawable, composable canvas nodes — Reasoning, Control State, Verification, and Output & Reviewer Pass included, the four layers thinnest across all three agents above. See the full architecture →
Every layer above wraps around a loop that, on its own, isn't a differentiator. Each project's own documentation says as much — and mapped against the same seven-stage loop, the pattern is clear.
| Loop stage | Hermes Agent | Kilo Code | OpenClaw |
|---|---|---|---|
| 1. Perceive | ○Raw session history, no typed beliefs | ◐Static Memory Bank files | ◐Static bootstrap/persona files |
| 2. Reason | ○Not described | ○Not described | ○Not described |
| 3. Decide | ○Provider fallback only | ◐Binary human-approval gate | ○Queueing, not risk-tiering |
| 4. Act | ◐Concurrent tool dispatch, no risk review | ●Git checkpoint before every edit | ◐Broad device/OS action surface |
| 5. Verify | ○Not described | ◐Informal check + diff-repair | ○Cosmetic reply shaping only |
| 6. Recover | ◐Fallback provider, iteration budget | ◐Checkpoint rollback, diff-repair | ◐Auto-compaction retry only |
| 7. Learn & Repeat | ●Persistent memory + self-improving skills | ◐Memory Bank persists across sessions | ○Static — explicitly no self-improvement |
{runId, acceptedAt} immediately"On its own, this loop is a fairly conventional ReAct-style tool-calling loop — structurally similar to what Claude Code, OpenClaw, and most other agent harnesses do..."
"On its own, this is a conventional ReAct-style tool-calling loop — structurally the same shape used by Cline, Roo Code, Claude Code, Cursor, and most other agent harnesses."
"On its own, this is a conventional ReAct-style tool-calling loop... The loop mechanics are not OpenClaw's differentiator."
They're three of the most-used open-source AI agents running today, each with public architecture documentation detailed enough to map against a common harness taxonomy — and each one explicitly describes its own loop as conventional in its own docs.
None covers more than 2 of 11 canonical layers at full strength. Hermes leads on Memory and Learning. Kilo Code leads on Execution. OpenClaw leads on reach, but its broad, weakly-gated action surface lines up with its documented security incidents — an RCE CVE and a marketplace incident involving over a thousand malicious skills.
The canonical architecture used across Build A Harness: Caller State, World Model, Reasoning, Control State, Planning, Execution, Verification, Recovery, Memory, Learning, and Output & Reviewer Pass. Full definition →
Build A Harness implements all 11 layers as drawable, composable canvas nodes — so you can build the layers these three agents leave thin (Reasoning, Control State, Verification, Output & Reviewer Pass) without writing that infrastructure yourself.
This comparison draws on each project's own architecture docs and independent reporting, current as of mid-2026. Token-volume figures are live from OpenRouter's global app rankings, accessed July 2026 — those numbers move daily and will drift from this snapshot.
The layer-by-layer coverage mapping above is our own reading of each project's public documentation against Build A Harness's canonical taxonomy, not an official audit or a claim endorsed by any of the three projects.
Reasoning, Control State, Verification, and Output & Reviewer Pass — the layers where all three of the most-used open-source agents are thinnest — ship as drawable canvas nodes in Build A Harness. Apache 2.0. Runs locally via Docker.