# Show HN: Statewright - Visual state machines that make AI agents more reliable

- Source: Hacker News top stories (Chinese translation from buzzing.cc)
- Author: azurewraith
- Published: 2026-05-13 21:08
- AIHOT score: 70
- AIHOT link: https://aihot.virxact.com/items/cmp43i3fi02mvsljxe3rox607
- Original link: https://github.com/statewright/statewright

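## AI summary

Statewright has released an open-source visual state machine tool that aims to make AI agents more reliable through a graphical interface. It lets developers design and monitor an agent's state transitions visually, which simplifies development, reduces errors, and improves system stability. On Hacker News the project received 101 upvotes, a sign of the technical community's interest in its novelty. Developers can get the code on GitHub and integrate it into AI projects to improve maintainability and performance.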

## Body

statewright

Agents are suggestions, states are laws.

State machine guardrails that control which tools your AI agent can use in each phase. Define a workflow once, enforce it across Claude Code, Codex, Cursor, opencode, and Pi. Full docs →

### The problem

AI agents are brittle. Give a model 40+ tools and an open-ended problem and it re-reads the same file five times, calls Edit during review, deploys before tests pass. The common fix is bigger models and longer prompts... it helps sometimes. Observability tells you what went wrong after the fact; it doesn't prevent it.

### The approach

Instead of making the model bigger, make the problem smaller.

State machines constrain the tool and solution spaces so the model reasons in a focused context at each step. A planning state gets read-only tools. When the agent transitions to implementation, edit tools unlock with limited shell access. Write-via-redirect and destructive ops are still blocked even when Bash is allowed. Testing only permits designated test commands.

Call a tool that's not in the current phase and you get rejected with a message telling you what IS available and how to transition. State machines loop and retry (unlike DAGs), which is what agentic work actually needs.
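The gating described above can be sketched in a few lines of Python. This is a minimal illustration, not Statewright's engine (which the project describes as pure Rust): the `PhaseGate` class, its method names, and the wording of the rejection message are assumptions made for this example. Only the ideas of per-state `allowed_tools`, event-driven transitions, and a rejection that names the available tools come from the text.

```python
from dataclasses import dataclass


@dataclass
class PhaseGate:
    """Toy per-state tool gate. All names here are illustrative assumptions."""

    states: dict  # state name -> {"allowed_tools": [...], "on": {event: target}}
    current: str

    def check(self, tool: str) -> tuple[bool, str]:
        """Return (allowed, message) for a tool call in the current state."""
        spec = self.states[self.current]
        allowed = spec.get("allowed_tools", [])
        if tool in allowed:
            return True, "ok"
        events = ", ".join(sorted(spec.get("on", {}))) or "none"
        return False, (
            f"'{tool}' is not available in '{self.current}'. "
            f"Available tools: {', '.join(allowed) or 'none'}. "
            f"Transition events: {events}."
        )

    def transition(self, event: str) -> str:
        """Advance on a known event; unknown events leave the state unchanged."""
        edges = self.states[self.current].get("on", {})
        self.current = edges.get(event, self.current)
        return self.current


gate = PhaseGate(
    states={
        "planning": {"allowed_tools": ["Read", "Grep"], "on": {"READY": "implementing"}},
        # DONE loops back to planning: cycles are legal here, unlike in a DAG.
        "implementing": {"allowed_tools": ["Read", "Edit"], "on": {"DONE": "planning"}},
    },
    current="planning",
)

ok, message = gate.check("Edit")  # rejected: Edit is not a planning tool
gate.transition("READY")
ok_after, _ = gate.check("Edit")  # allowed once implementing is active
```

The rejection string carries both the allowed tools and the events that leave the state, so a model that hits the wall has what it needs to recover in the next turn.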

Works on frontier and local models alike. Below 13GB, models can produce tool calls but can't retain enough file content to make accurate edits. Above that threshold, the guardrails start turning failures into completions.

### Quickstart

Install into Claude Code:

    /plugin marketplace add statewright/statewright
    /plugin install statewright

Your browser opens → sign up at statewright.ai → generate a key → paste it → done.

Then start a workflow:

    ❯ start the bugfix workflow — fix the failing tests in calc.py

    ◆ statewright — statewright_start (workflow: bugfix)
    ◆ [statewright] Workflow activated: bugfix
    ◆ statewright — statewright_get_state (MCP)
    ◆ Current phase: planning. Let me read the code first.
      Read 2 files
      [statewright] planning => implementing
    ◆ statewright — statewright_transition (READY)
      Edit calc.py: 1 line changed
      [statewright] implementing => testing
    ◆ statewright — statewright_transition (DONE)
      Bash: pytest -x — 7 passed
      [statewright] testing => completed
    ◆ [statewright] Workflow complete. 46 seconds.

You can also use the slash command directly: `/statewright start bugfix`.

### Research results

In our 5-task SWE-bench subset (not the full 2294-instance benchmark), two local models went from 2 of 10 attempts passing to 10 of 10 with statewright constraints. Same tasks, same hardware.

| Model | Size | Bug Fix (26 lines) | SWE-bench (5 tasks) |
| --- | --- | --- | --- |
| gemma3 | 3.3GB | FAIL | FAIL |
| gemma4:e2b | 7.2GB | PASS\* | FAIL |
| gpt-oss:20b | 13.8GB | PASS | PASS (5/5) |
| gemma4:31b | 19.9GB | PASS | PASS (5/5) |
| llama3.3 | 42.5GB | PASS | PASS (2/2)† |

\*with specialized edit_line tool adaptation

†tested on 2 of the 5 tasks (added after initial experiment run)

The floor is around 13GB. Below that, models identify bugs correctly but can't serialize surgical edits (they rewrite entire files). That's a model limitation, not ours.

The structural win on larger models is breaking read-loop death spirals and keeping the tool space small enough that the model reasons instead of flailing. Research brief →

### How it works

#### Architecture

Three layers, each independently useful:

- Engine (`crates/engine`): Pure Rust state machine evaluator. States, transitions, guards, tool restrictions. Deterministic. No LLM in the loop. No runtime dependencies.
- Agent binary (`crates/cli`, binary: `sw-agent`): Direct-to-Ollama agent executor. Loads a workflow, runs the LLM in a constrained loop, enforces tool access, and streams structured JSONL events. Supports per-state model routing via `--config`, and single-state execution via `--state` (the TUI or MCP gateway orchestrates, `sw-agent` executes one state at a time and exits).
- Plugin layer (`crates/mcp-gateway` + `plugins/`): MCP gateway that integrates with coding agents (Claude Code, Codex, Pi, etc.). When you activate a workflow, hooks enforce tool restrictions per state. The model sees 5 tools instead of 30. It gets clear instructions for the current phase and transitions when conditions are met. The `statewright_run_agent` MCP tool spawns the Rust binary for states that benefit from direct Ollama execution.

The TUI (`crates/tui`, binary: `statewright`) is a ratatui terminal interface that spawns `sw-agent` as a subprocess and renders its JSONL event stream in real time. It handles keyboard input, demo mode, and fixture selection.

#### Per-state model routing

States can specify which model to use via the `model` field. A `default_model` in `meta` applies to states without an explicit override. Clients that support programmatic model switching (Pi, the Rust harness) enforce this; others treat it as advisory.

    {
      "meta": { "default_model": "claude-sonnet-4-20250514" },
      "states": {
        "diagnose": {
          "model": "claude-haiku-4-5-20251001",
          "allowed_tools": ["Read", "Bash"]
        },
        "propose_fix": {
          "model": "anthropic/claude-opus-4-6",
          "allowed_tools": ["Read"]
        },
        "execute": {
          "allowed_tools": ["Read", "Edit", "Bash"]
        }
      }
    }

In this example, `diagnose` uses Haiku (fast, cheap reconnaissance), `propose_fix` escalates to Opus (high-stakes reasoning), and `execute` inherits the `default_model` (Sonnet). The `sw-agent` binary also accepts a `--config` file with a `model_routing` block for per-state Ollama URL, temperature, and context window overrides.
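The fallback rule (a state's own `model`, otherwise `meta.default_model`) is small enough to show directly. The sketch below is a hedged Python illustration: `resolve_model` is a hypothetical helper written for this example, not a function from Statewright, and it reads the same JSON shape as the routing example in this section.

```python
import json

WORKFLOW = json.loads("""
{
  "meta": { "default_model": "claude-sonnet-4-20250514" },
  "states": {
    "diagnose": { "model": "claude-haiku-4-5-20251001", "allowed_tools": ["Read", "Bash"] },
    "propose_fix": { "model": "anthropic/claude-opus-4-6", "allowed_tools": ["Read"] },
    "execute": { "allowed_tools": ["Read", "Edit", "Bash"] }
  }
}
""")


def resolve_model(workflow: dict, state: str):
    """Pick the state's own model, else meta.default_model, else None."""
    spec = workflow["states"][state]
    return spec.get("model") or workflow.get("meta", {}).get("default_model")


# Map every state to the model it would run on.
routing = {name: resolve_model(WORKFLOW, name) for name in WORKFLOW["states"]}
```

A client that can switch models programmatically would apply this mapping on each transition; an advisory client could only surface it to the user.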

#### Guardrails

| Guardrail | What it does |
| --- | --- |
| Per-state tool enforcement | Agent can't see or call tools outside `allowed_tools` for the current state |
| Bash discernment | Blocks `echo > file`, `rm -rf`, `sed -i`, and scripting interpreters (`python`, `node`) when Write/Edit aren't allowed. Even if Bash itself is permitted. |
| Edit guards | Rejects diffs exceeding `max_edit_lines`, caps files edited per state |
| Command allow-lists | Only prefix-matched commands run (e.g. `pytest`, `cargo test`) |
| Conditional transitions | Programmatic guards on context data: `test_result eq pass`, `coverage gt 80` |
| Approval gates | `requires_approval` pauses for human review |
| Interrupts | Edit a file matching a glob pattern? Auto-transition to a validation state, then return where you were |
| Fork/join | Run branches sequentially or in parallel, join when all (or any) complete |
| Environment scoping | Hide `PROD_DB_URL` via `blocked_env`, substitute with `env_overrides` |
| Session isolation | Per-session state via `CLAUDE_SESSION_ID` |
| Per-state model routing | Route cheap states to small models, expensive states to frontier models. `model` per state, `default_model` in `meta`. |
| Thinking level control | Per-state `thinking_level` field (`high`, `medium`, `low`, `off`) for clients that support reasoning effort tuning. |
| Tool escalation detection | Validator warns when a state jumps 2+ privilege levels without an approval gate |

Full guardrail reference in the docs.
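Two of these guardrails, Bash discernment and command allow-lists, come down to inspecting a shell command before it runs. The Python sketch below shows one way such checks could work. It is an assumption-laden illustration rather than Statewright's actual rules: the function names, the interpreter list, and the decision to refuse any chained command are all invented for this example, and the heuristics are deliberately conservative.

```python
import shlex

# Assumed deny rules, loosely modeled on the examples in the guardrail table.
INTERPRETERS = {"python", "python3", "node"}
CHAIN_OPERATORS = (";", "&", "|", "`", "$(")


def tokenize(command: str):
    """Split like a POSIX shell; return None when quoting is unbalanced."""
    try:
        return shlex.split(command)
    except ValueError:
        return None


def is_write_like(command: str) -> bool:
    """Conservative heuristic for commands that write or destroy files."""
    tokens = tokenize(command)
    if tokens is None:
        return True  # refuse rather than guess
    if not tokens:
        return False
    if ">" in command:  # any redirect counts, even a harmless 2>&1
        return True
    head, args = tokens[0], tokens[1:]
    short_flags = [a.lower() for a in args if a.startswith("-") and not a.startswith("--")]
    if head == "rm" and any("r" in f and "f" in f for f in short_flags):
        return True
    if head == "sed" and any(f.startswith("-i") for f in short_flags):
        return True
    return head in INTERPRETERS


def is_allowed_command(command: str, allowed_commands: list) -> bool:
    """Whole-token prefix match: 'pytest' admits 'pytest -x' but not 'pytestx'."""
    tokens = tokenize(command)
    if not tokens or any(op in command for op in CHAIN_OPERATORS):
        return False  # empty, unparseable, or chained commands are refused
    prefixes = [p.split() for p in allowed_commands if p.split()]
    return any(tokens[: len(p)] == p for p in prefixes)
```

Matching on whole tokens rather than raw string prefixes is the design choice that matters here: a plain `startswith("pytest")` would also admit `pytestx` or `pytest && rm -rf /`.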

### Define your own workflows

    {
      "id": "bugfix",
      "initial": "planning",
      "meta": { "default_model": "claude-sonnet-4-20250514" },
      "states": {
        "planning": {
          "allowed_tools": ["Read", "Grep", "Glob"],
          "model": "claude-haiku-4-5-20251001",
          "thinking_level": "low",
          "max_iterations": 8,
          "on": { "READY": "implementing" }
        },
        "implementing": {
          "allowed_tools": ["Read", "Edit", "Write"],
          "max_edit_lines": 20,
          "max_files_per_state": 3,
          "on": { "DONE": "testing" }
        },
        "testing": {
          "allowed_tools": ["Read", "Bash"],
          "allowed_commands": ["pytest", "cargo test", "npm test"],
          "on": {
            "PASS": { "target": "completed", "guard": "tests_passed" },
            "FAIL_TEST": "implementing"
          }
        },
        "completed": { "type": "final" }
      },
      "guards": {
        "tests_passed": { "field": "test_result", "op": "eq", "value": "pass" }
      }
    }

Point your agent at the JSON schema and it generates a workflow via `statewright_create_workflow`. Tweak tools, commands, and environment blocks in the visual editor.
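To make the `on` and `guards` blocks of the bugfix workflow concrete, here is a minimal Python sketch of how a guarded transition could be evaluated. It is not the Rust engine: `guard_passes` and `step` are hypothetical helpers, the operator table beyond `eq` and `gt` is assumed, and treating a missing context field as a failed guard is a choice made for this example.

```python
import operator

# eq and gt appear in the guardrail examples; ne and lt are assumed additions.
OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt, "lt": operator.lt}

WORKFLOW = {
    "initial": "planning",
    "states": {
        "planning": {"on": {"READY": "implementing"}},
        "implementing": {"on": {"DONE": "testing"}},
        "testing": {
            "on": {
                "PASS": {"target": "completed", "guard": "tests_passed"},
                "FAIL_TEST": "implementing",
            }
        },
        "completed": {"type": "final"},
    },
    "guards": {"tests_passed": {"field": "test_result", "op": "eq", "value": "pass"}},
}


def guard_passes(workflow: dict, name: str, context: dict) -> bool:
    """Evaluate one named guard against the run's context data."""
    guard = workflow["guards"][name]
    if guard["field"] not in context:
        return False  # missing data never satisfies a guard
    return OPS[guard["op"]](context[guard["field"]], guard["value"])


def step(workflow: dict, state: str, event: str, context: dict) -> str:
    """Return the next state; stay put if the event is unknown or guarded off."""
    edge = workflow["states"][state].get("on", {}).get(event)
    if edge is None:
        return state
    if isinstance(edge, str):
        return edge
    if "guard" in edge and not guard_passes(workflow, edge["guard"], context):
        return state
    return edge["target"]


# A run that claims PASS too early, loops back through implementing, then passes.
state = WORKFLOW["initial"]
trace = [state]
for event, context in [
    ("READY", {}),
    ("DONE", {}),
    ("PASS", {"test_result": "fail"}),  # guard blocks: stays in testing
    ("FAIL_TEST", {}),
    ("DONE", {}),
    ("PASS", {"test_result": "pass"}),
]:
    state = step(WORKFLOW, state, event, context)
    trace.append(state)
```

The trace visits `implementing` and `testing` twice, which is the retry loop a DAG cannot express.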

### Supported agents

Hard enforcement means tool calls are intercepted at the hook layer before execution. Advisory means rules are injected into context but the model isn't prevented from ignoring them.

| Agent | Integration | Enforcement |
| --- | --- | --- |
| Claude Code | Hooks + MCP | Hard |
| Codex | Hooks + MCP | Hard |
| Oh My Codex | Hooks + MCP | Hard |
| Pi | TypeScript extension | Hard\* |
| opencode | TypeScript plugin | Hard (alpha) |
| Cursor | MCP + rules | Advisory |

\*Pi includes tool name normalization and tool-call recovery for local models (Ollama, LM Studio).

### MCP tools

The gateway exposes these tools to the connected agent:

| Tool | Purpose |
| --- | --- |
| `statewright_load_workflow` | Activate a named workflow, optionally resuming a paused run |
| `statewright_get_state` | Current state, allowed tools, transitions, iteration count, model, thinking level |
| `statewright_transition` | Emit an event to advance the state machine |
| `statewright_list_workflows` | List available workflows and which is active |
| `statewright_create_workflow` | Create a new workflow from a JSON definition |
| `statewright_pause` | Pause the current run; resume later with `load_workflow(resume=true)` |
| `statewright_deactivate` | Turn off enforcement; all tools pass through |
| `statewright_get_status` | Gateway health: active workflow, state, available workflows |
| `statewright_run_agent` | Spawn the Rust agent executor (`sw-agent`) for direct-to-Ollama bug fixing |
| `statewright_force_state` | Jump to any state bypassing guards (debug mode only, gated on `meta.debug`) |

### Pricing

The managed cloud at statewright.ai handles workflow storage, run history, and the MCP gateway. Prices won't go up.

| Plan | Workflows | Transitions/mo | Run History | Price |
| --- | --- | --- | --- | --- |
| Free | 3 | 200 | 72 hours | $0 |
| Pro | 10 | 2500 | 7 days | $29/mo |
| Team | 30 | 10000 | 90 days | $99/mo |
| Enterprise | Unlimited | Unlimited | to Specification | Contact us |

### Self-hosting

Run the full stack locally with Docker Compose — PocketBase, MCP gateway, and workflow editor. BYO Ollama. Self-hosted guide →

    cd self-hosted && docker compose up --build

The engine (`crates/engine`) and agent layer (`crates/agent`) are Apache 2.0, embeddable with no runtime dependencies. The MCP gateway is FSL-1.1-ALv2 (converts to Apache 2.0 in 2029). Single-developer and single-team self-hosting is permitted under the FSL license.

### Tradeoffs

- Requires MCP support in the agent (or hooks for non-MCP agents like Codex)
- Workflow definitions are authored by hand, though agents can generate them via `statewright_create_workflow`
- Cursor enforcement is advisory, not hard. MCP alone can't gate tool calls in Cursor's architecture
- Research results are from a 5-task SWE-bench subset, not the full 2294-instance benchmark
- If a workflow is too restrictive, the agent gets stuck. `statewright_deactivate` is the escape hatch

### Docs

docs.statewright.ai — install guide, workflow authoring, schema reference, MCP tool reference, and agent-generated workflows.

### Contributing

Workflow definitions, templates, and bug reports welcome. See Create Your Own for how to write workflows.

Report an issue

Discussions & feedback

### License

Apache 2.0 — portions FSL-1.1-ALv2 (converts to Apache 2.0 on May 3, 2029). Managed cloud at statewright.ai.

This project includes a patent pledge covering independent implementations of the techniques described in the patent. Solo developers, researchers, open source projects, and single-team self-hosted deployments are covered regardless of whether they use Statewright software.

One hook to rule them all.
