# Build A Harness

> Open-source visual canvas for production AI agent harnesses. Draw workflows on a canvas, compile to any major AI framework, trace every decision. Apache 2.0.

Build A Harness is a **working tool** (v0.8.0, public alpha) — a visual canvas plus four framework adapters plus built-in Langfuse observability. It implements a complete AI agent harness architecture: a system that goes beyond simple prompt routing to give agents a world model, multi-tier control state, structured verification, recovery strategies, and cross-run learning. The full 11-layer harness (22 nodes) is implemented and tested (379 tests passing).

## AI Agent Frameworks Observatory

A live-updated, ranked, and searchable index of open-source AI agent frameworks with real GitHub stats (stars, forks, language, license, topics) refreshed daily via GitHub Actions. Covers orchestration, multi-agent, swarm, RAG, no-code/visual, autonomous, code-generation, Microsoft, memory, infrastructure, and research frameworks.

- **URL**: https://buildaharness.com/ai_agent_frameworks.html
- **Data coverage**: ~58 frameworks as of June 2026, updated daily
- **Source list**: repos.txt in the buildaharness-pages repository (owner/repo format with optional manual tags)
- **Sorting**: by stars (default), forks, name, language, license
- **Filtering**: by programming language and GitHub topic tags
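
The sorting and filtering options above can be sketched in a few lines. This is a minimal illustration, not the observatory's actual code: the `Repo` record, the `rank` function, the descending order for numeric keys, and the placeholder rows are all assumptions made for the example.

```python
from dataclasses import dataclass

# Sort keys named in the list above
SORT_KEYS = ("stars", "forks", "name", "language", "license")


@dataclass(frozen=True)
class Repo:
    """Hypothetical record shape for one indexed framework."""
    name: str          # "owner/repo"
    stars: int
    forks: int
    language: str
    license: str
    topics: tuple = ()


def rank(repos, sort_key="stars", language=None, topic=None):
    """Filter by language and topic, then sort by the chosen key."""
    if sort_key not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {sort_key!r}")
    selected = [
        r for r in repos
        if (language is None or r.language == language)
        and (topic is None or topic in r.topics)
    ]
    # Assumption: numeric keys sort high-to-low, text keys sort A-to-Z
    numeric = sort_key in ("stars", "forks")

    def key(repo):
        value = getattr(repo, sort_key)
        return value if numeric else value.lower()

    return sorted(selected, key=key, reverse=numeric)


# Placeholder rows, not real GitHub stats
SAMPLE = [
    Repo("example/alpha", 120, 10, "Python", "MIT", ("rag",)),
    Repo("example/beta", 300, 5, "TypeScript", "Apache-2.0", ("multi-agent",)),
    Repo("example/gamma", 50, 40, "Python", "MIT", ("multi-agent", "rag")),
]
```

Calling `rank(SAMPLE, sort_key="forks", language="Python")` returns only the Python rows, ordered by fork count.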

## AI Agent Memory Frameworks Observatory

A curated index of open-source memory frameworks for AI agents, classified by layer (Framework, Infra, Research), memory class (Personalization, Institutional, Both), architecture pattern, license, and maturity (Production, Emerging, Research, Experimental). GitHub stars refreshed daily.

- **URL**: https://buildaharness.com/ai_agent_memory_frameworks.html
- **Data coverage**: ~24 frameworks as of June 2026
- **Source list**: memory-repos.txt in the buildaharness-pages repository (pipe-delimited: url | layer | memoryClass | architecture | license | maturity)
- **Notable entries**: mem0, Letta, Zep, Graphiti, LangMem, LlamaIndex, Cognee, MemOS, Chroma
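
A line in the pipe-delimited source list could be parsed roughly as follows. This is a sketch under assumptions: the `MemoryRepo` and `parse_line` names are hypothetical, skipping blank and `#` comment lines is a guess about the file, and the sample URL and architecture value are placeholders. The allowed layer, memory class, and maturity values are taken from the description above.

```python
from typing import NamedTuple, Optional

LAYERS = {"Framework", "Infra", "Research"}
MEMORY_CLASSES = {"Personalization", "Institutional", "Both"}
MATURITIES = {"Production", "Emerging", "Research", "Experimental"}


class MemoryRepo(NamedTuple):
    """One row: url | layer | memoryClass | architecture | license | maturity."""
    url: str
    layer: str
    memory_class: str
    architecture: str
    license: str
    maturity: str


def parse_line(line: str) -> Optional[MemoryRepo]:
    """Parse one line; return None for blank or comment lines (assumed convention)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    parts = [part.strip() for part in stripped.split("|")]
    if len(parts) != len(MemoryRepo._fields):
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    row = MemoryRepo(*parts)
    # Reject values outside the classifications listed in this section
    if row.layer not in LAYERS:
        raise ValueError(f"unknown layer: {row.layer!r}")
    if row.memory_class not in MEMORY_CLASSES:
        raise ValueError(f"unknown memory class: {row.memory_class!r}")
    if row.maturity not in MATURITIES:
        raise ValueError(f"unknown maturity: {row.maturity!r}")
    return row


# Placeholder entry, not a real row from memory-repos.txt
EXAMPLE = parse_line(
    "https://github.com/example/memory-lib | Framework | Personalization"
    " | vector store | MIT | Emerging"
)
```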

## Harness engineering

Harness engineering is the discipline of designing, building, and operating the infrastructure that wraps around an AI model to make it reliable in production. The harness is everything the model does not do on its own: tool execution, memory, state management, control flow, verification, recovery, and observability. The model decides; the harness governs.

Build A Harness is the purpose-built open-source tool for harness engineering. It is the only visual canvas that targets the full harness architecture — world model, control state, verification, recovery, and cross-run learning — and compiles it to any major AI framework via FlowSpec.

**Harness engineering vs prompt engineering**: Prompt engineering optimises a single model call. Harness engineering designs the full execution loop across every call, including state, tools, verification, and recovery. In production, agent reliability gains tend to come mostly from harness-level changes rather than from prompt changes.

**Harness engineering vs workflow building**: A workflow routes prompts from node to node. A harness governs what the agent believes, what it is allowed to do, how it verifies outputs, and how it recovers from failures. Harnesses suit agents operating in environments where state, failure, and uncertainty are real.

**Dedicated page**: https://buildaharness.com/harness-engineering

## Who is this for?

Build A Harness is for **AI engineers and platform teams** building production-grade AI agents who need more than a simple prompt-routing workflow. It is a good fit for:

- Teams who want a **visual, framework-agnostic design tool** that does not lock them into a single runtime
- Engineers who need structured **control, verification, and recovery** baked into every agent run
- Teams already using LangGraph, CrewAI, Mastra, or Microsoft Agent Framework who want observability and HITL without rewriting their flows
- Open-source practitioners who need a **self-hosted, no-cloud-required** tool (Apache 2.0, runs on Docker)
- Developers who want to expose agent flows as REST endpoints, MCP tools, or A2A agents without extra boilerplate

It is **not** a low-code chatbot builder or a drag-and-drop tool for non-engineers. It targets developers who are comfortable with Docker and AI agent frameworks.

## Key facts

- **License**: Apache 2.0
- **Current version**: v0.8.0
- **Repository**: https://github.com/3IVIS/buildaharness
- **Homepage**: https://buildaharness.com
- **Status**: Public alpha — canvas, 4 framework adapters, and Langfuse observability are working and ready to use
- **Deployment**: fully local via Docker (`./scripts/setup-env.sh && docker compose up`); no cloud account required

## What ships today (v0.8.0)

- **Visual canvas** with 27 node types (14 execution + 13 harness) for designing AI agent workflows
- **Full harness layer** — world model, 5-tier control state, 9-layer verification, 6 recovery strategies, experience store, and adversarial reviewer pass, all implemented and tested (379 tests passing)
- **FlowSpec v0.2.0** — open, portable JSON format; the runtime-neutral intermediate representation
- **4 framework adapters**: a single FlowSpec compiles to LangGraph, CrewAI, Mastra, or Microsoft Agent Framework without rewriting
- **Langfuse observability** built in — every run is traced automatically across all 4 runtimes
- **HITL** (human-in-the-loop) pause and resume
- **Deployment options**: REST API, MCP server, A2A protocol
- 9 services start from a single `docker compose up`

## Supported runtimes

- **LangGraph** (Python / JS, MIT) — production standard for stateful agent orchestration
- **CrewAI** (Python, MIT) — multi-agent teams
- **Mastra** (TypeScript, Apache 2.0) — TypeScript-native orchestrator
- **Microsoft Agent Framework** (C# / Python / Java, MIT) — covers Semantic Kernel and AutoGen users (Microsoft merged both into one SDK)
- **A2A protocol** — framework-agnostic invocation interop with Google ADK, OpenAI Agents SDK, Claude Agent SDK, and any A2A-compatible runtime; no custom adapter required

## FlowSpec v0.2.0

The runtime-neutral intermediate representation. A single FlowSpec compiles to any supported adapter. Key fields: `spec_version`, `id`, `name`, `description`, `runtime_hints`, `state_schema`, `nodes`, `edges`, `tools`, `agents`, `memory_stores`, `model_defaults`, `flow_config`.
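
To make the field list concrete, here is a minimal skeleton built as a Python dict. Only the top-level key names come from the list above. Everything else is an assumption for illustration: the value shapes, the `id`/`type` keys on nodes, the `source`/`target` keys on edges, and the two helper functions. Check the published schema before relying on any of it.

```python
import json

# Top-level keys as listed above
FLOWSPEC_KEYS = (
    "spec_version", "id", "name", "description", "runtime_hints",
    "state_schema", "nodes", "edges", "tools", "agents",
    "memory_stores", "model_defaults", "flow_config",
)

# Hypothetical three-node flow; inner shapes are illustrative guesses
flow = {
    "spec_version": "0.2.0",
    "id": "example-flow",
    "name": "Example flow",
    "description": "Illustrative skeleton only",
    "runtime_hints": {},
    "state_schema": {},
    "nodes": [
        {"id": "in", "type": "input"},
        {"id": "answer", "type": "llm_call"},
        {"id": "out", "type": "output"},
    ],
    "edges": [
        {"source": "in", "target": "answer"},
        {"source": "answer", "target": "out"},
    ],
    "tools": [],
    "agents": [],
    "memory_stores": [],
    "model_defaults": {},
    "flow_config": {},
}


def missing_keys(spec: dict) -> list:
    """Return listed top-level keys that are absent from the spec."""
    return [key for key in FLOWSPEC_KEYS if key not in spec]


def dangling_edges(spec: dict) -> list:
    """Return edges that point at node ids not defined in the spec."""
    node_ids = {node["id"] for node in spec["nodes"]}
    return [
        edge for edge in spec["edges"]
        if edge["source"] not in node_ids or edge["target"] not in node_ids
    ]


# FlowSpec is JSON, so the dict serialises directly
flow_json = json.dumps(flow, indent=2)
```

Whether every key is required is not stated here, so `missing_keys` is best read as a lint check rather than a validator.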

### Execution nodes (14 of 27 node types)

- **i/o**: `input`, `output`
- **llm**: `llm_call`
- **tools**: `tool_invoke`
- **agents**: `agent_role`, `agent_debate`
- **flow control**: `condition`, `parallel_fork`, `parallel_join`
- **human**: `hitl_breakpoint`
- **memory**: `memory_read`, `memory_write`
- **composition**: `subgraph`, `transform`

### Harness nodes (13)

These nodes implement the 11-layer control architecture; every node compiles to all four runtimes.

- **world model**: `world_model`, `update_wm`, `hypothesis_set`, `gather_evidence`, `evidence_store`
- **control & planning**: `control_state`, `task_graph`, `apply_tool_rel`
- **verification & recovery**: `verify_gate`, `recovery`, `reviewer_pass`
- **learning & process**: `exp_store`, `process_concept`
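
The two node lists can be captured as a small lookup table, which also makes the 14 + 13 = 27 count easy to check. The group names and node type identifiers are copied from the lists above; the `node_family` helper is a hypothetical convenience, not part of FlowSpec.

```python
EXECUTION_NODES = {
    "i/o": ["input", "output"],
    "llm": ["llm_call"],
    "tools": ["tool_invoke"],
    "agents": ["agent_role", "agent_debate"],
    "flow control": ["condition", "parallel_fork", "parallel_join"],
    "human": ["hitl_breakpoint"],
    "memory": ["memory_read", "memory_write"],
    "composition": ["subgraph", "transform"],
}

HARNESS_NODES = {
    "world model": [
        "world_model", "update_wm", "hypothesis_set",
        "gather_evidence", "evidence_store",
    ],
    "control & planning": ["control_state", "task_graph", "apply_tool_rel"],
    "verification & recovery": ["verify_gate", "recovery", "reviewer_pass"],
    "learning & process": ["exp_store", "process_concept"],
}


def flatten(groups: dict) -> list:
    """Collapse a {group: [node types]} mapping into one flat list."""
    return [name for names in groups.values() for name in names]


def node_family(node_type: str) -> str:
    """Classify a node type as 'execution' or 'harness'."""
    if node_type in flatten(EXECUTION_NODES):
        return "execution"
    if node_type in flatten(HARNESS_NODES):
        return "harness"
    raise ValueError(f"unknown node type: {node_type!r}")
```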

## What a harness is (and why it matters)

A workflow routes prompts from node to node. A harness governs what the AI believes, what it is allowed to do, how it catches its own mistakes, and what it learns for next time.

The full harness target has 11 layers across 22 nodes:

1. **Caller State** — constraints, clarification requests, escalation propagation
2. **World Model** — typed beliefs, contradictions, generation_id tracking
3. **Reasoning** — evidence store, hypotheses from 4 generation sources, value-of-information gate before every action
4. **Control** — 5-tier state resolver (NORMAL → CAUTIOUS → BLOCKED), diagnostic health vectors, deadlock detection
5. **Planning** — task graph with 6 task states, parallel concurrency management
6. **Execution** — VOI-gated actions, pre-execution review gate across 5 dimensions, reversibility strategy
7. **Verification** — 9 verification layers, adequacy critic, adversarial reviewer pass
8. **Recovery** — 6 named strategies, typed failure library, local vs global replanning scope
9. **Memory** — context compression, execution journal, budget management
10. **Learning** — experience store for cross-run structural reuse (optional)
11. **Output & Reviewer Pass** — output contract validation, 3-lens review
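
As a rough picture of how control, verification, and recovery interact, the sketch below runs an action, checks it against a list of verifiers, calls a recovery hook on failure, and escalates the control tier. It is a simplified illustration and not the project's implementation. Only the three tier names that appear above are used; the remaining tiers are not listed here, so the sketch does not guess them. The `run_step` function, the `max_failures` threshold, and the callback signatures are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three control tiers named above; the full resolver has five."""
    NORMAL = 0
    CAUTIOUS = 1
    BLOCKED = 2


def run_step(action, verifiers, recover, max_failures=2):
    """Run `action` until every verifier passes or the tier reaches BLOCKED.

    action:    zero-argument callable producing a result
    verifiers: callables taking the result and returning True on success
    recover:   callable receiving the names of the failed verifiers
    """
    tier = Tier.NORMAL
    failures = 0
    while tier != Tier.BLOCKED:
        result = action()
        failed = [check.__name__ for check in verifiers if not check(result)]
        if not failed:
            return result, tier
        failures += 1
        # Assumed policy: first failure slows down, repeated failure stops
        tier = Tier.BLOCKED if failures >= max_failures else Tier.CAUTIOUS
        if tier != Tier.BLOCKED:
            recover(failed)
    return None, tier
```

A workflow engine would return the first result regardless; here a result only leaves the loop once verification passes, and a run that keeps failing ends in `BLOCKED` instead of returning bad output.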

## Architecture phases (all shipped)

All 12 build phases are complete; the full harness is implemented and tested (379 tests passing).

- **Foundation & State Architecture** — shipped
- **Evidence & Reasoning** — shipped
- **World Model & Contradiction Detection** — shipped
- **Diagnostics & Control State** — shipped
- **Planning & Task Graph** — shipped
- **Execution & Verification** — shipped
- **Recovery & Memory** — shipped
- **Caller State & Escalation** — shipped
- **Experience Store** — shipped
- **Reviewer Pass & Output Contract** — shipped
- **Canvas Integration** — shipped (13 harness node types, diagnostic health dashboard)
- **E2E Integration & Testing** — shipped (all architectural invariants across all 4 frameworks)

Each phase is independently deployable. All flows drawn today are forward-compatible.

## Quick start

```sh
# 1. Generate secrets and configure environment
./scripts/setup-env.sh

# 2. Start all services
docker compose up
```

Services that start include:
- **Canvas** — `localhost:3000` (visual workflow editor)
- **API** — `localhost:8000` (compiles and runs flows)
- **Langfuse** — `localhost:3001` (observability dashboard)

Nine services start in total. `scripts/setup-env.sh` generates all secrets automatically, so the only thing you need to provide is an LLM API key (or skip it and use a local model).
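
Once the stack is up, a quick way to confirm the three listed ports are accepting connections is a plain TCP check. This is a small convenience sketch using only the Python standard library. The `port_open` and `report` helpers are hypothetical, and a successful connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports as listed above
SERVICES = {"Canvas": 3000, "API": 8000, "Langfuse": 3001}


def port_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(services=SERVICES, host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(port, host) for name, port in services.items()}
```

Calling `report()` after `docker compose up` should show all three as `True` once the containers have finished starting.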

## How to contribute

- **Run it**: `docker compose up`, point it at a real flow, open a bug report with your FlowSpec JSON, the runtime, and the full error
- **Shape it**: open feature requests on GitHub — concrete use cases carry more weight than abstract asks; phase priority is shaped by community demand
- **Build with it**: FlowSpec v0.2.0 is stable for third-party node packs (`@buildaharness/nodes/…`); harness phases are open for community contribution

## Frequently asked questions

**What is Build A Harness?**
Build A Harness is an open-source (Apache 2.0) visual canvas for designing, testing, and deploying production AI agent harnesses. Draw the full architecture using 27 node types — 14 execution nodes plus 13 harness nodes implementing the complete 11-layer control architecture: world model, 5-tier control state, 9-layer verification, 6 recovery strategies, and cross-run learning. A single FlowSpec export compiles the same design to LangGraph, CrewAI, Mastra, or Microsoft Agent Framework without rewriting. Every run is traced automatically via Langfuse, HITL controls let humans pause and resume any flow, and one API call publishes the harness as a REST endpoint, an MCP tool, and an A2A agent simultaneously. Runs locally via Docker — no cloud account required.

**What is FlowSpec?**
FlowSpec is Build A Harness's open, runtime-neutral JSON format for describing AI agent workflows. It is the intermediate representation between the visual canvas and any execution runtime. A single FlowSpec file compiles to LangGraph, CrewAI, Mastra, or Microsoft Agent Framework without rewriting. FlowSpec v0.2.0 is stable and open for third-party node packs (`@buildaharness/nodes/…`). Key fields: `spec_version`, `id`, `name`, `description`, `runtime_hints`, `state_schema`, `nodes`, `edges`, `tools`, `agents`, `memory_stores`, `model_defaults`, `flow_config`.

**What is a harness vs a workflow?**
A workflow routes prompts from node to node. A harness governs what the AI believes, what it is allowed to do, how it catches its own mistakes, and what it learns for next time. The 5-tier control state resolver knows when to slow down or stop. Nine verification layers check outputs before they land. Recovery strategies handle failures systematically rather than crashing.

**Why runtime-neutral FlowSpec?**
A runtime-neutral format decouples the canvas from any runtime. Adding a new runtime means one adapter file. Canvas features (HITL, observability, versioning) are independent of runtime choice, and one flow works across all 4 supported runtimes without rewriting.
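
The "one adapter per runtime" idea can be pictured as a registry keyed by runtime name. This is a conceptual sketch only, not the project's adapter interface: the `register` decorator, the `compile_flow` function, the lowercase runtime keys, and the stub adapters that return placeholder strings are all hypothetical.

```python
from typing import Callable, Dict

# An adapter takes a FlowSpec dict and returns target-specific output
Adapter = Callable[[dict], str]
ADAPTERS: Dict[str, Adapter] = {}


def register(runtime: str):
    """Decorator that adds an adapter function to the registry."""
    def decorator(fn: Adapter) -> Adapter:
        ADAPTERS[runtime] = fn
        return fn
    return decorator


@register("langgraph")
def to_langgraph(spec: dict) -> str:
    # Stub: a real adapter would emit runnable LangGraph code
    return f"langgraph target for {spec['id']} ({len(spec['nodes'])} nodes)"


@register("crewai")
def to_crewai(spec: dict) -> str:
    # Stub: a real adapter would emit a CrewAI crew definition
    return f"crewai target for {spec['id']} ({len(spec['nodes'])} nodes)"


def compile_flow(spec: dict, runtime: str) -> str:
    """Dispatch the same spec to whichever adapter is registered."""
    try:
        adapter = ADAPTERS[runtime]
    except KeyError:
        raise ValueError(f"no adapter registered for {runtime!r}") from None
    return adapter(spec)
```

In this picture, supporting another runtime means adding one more registered function; the spec and the calling code stay unchanged.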

**Why these four runtimes?**
LangGraph is the production standard for stateful agents. CrewAI has the largest audience for multi-agent team patterns. Mastra is the only TypeScript-native orchestrator with first-class support. Microsoft Agent Framework merged Semantic Kernel and AutoGen, covering enterprise .NET and Python users.

**Is it open source?**
Yes. The canvas, adapters, FlowSpec, and all harness phases are released under Apache 2.0.

**Does it require a cloud account?**
No. Everything runs locally via Docker.

## Legal

**Impressum**: https://buildaharness.com/impressum.html — 3IVIS GmbH, Plöner Str. 25, 14193 Berlin, Germany. Managing Director: Dr. Seyed Ali Nezamoddini. VAT: DE304631216.