Skip to content

Autonomous Green-Field Build

/mk:autobuild orchestrates a full product build autonomously — planner → contract → generator ⇄ evaluator loop → ship.

Best for: Green-field product builds ("build me a X")
Time estimate: 30–180 minutes (multi-hour autonomous runs supported)
Skills used: mk:autobuild, mk:sprint-contract, mk:evaluate, mk:rubric

When to Use

Use /mk:autobuild when you want to build a new product or substantial green-field system autonomously:

  • "Build me a kanban app with drag/drop and user auth"
  • "Create a time-tracking CLI with SQLite storage"
  • "Scaffold a REST API with authentication and tests"

Not for:

  • Single features on an existing codebase → use /mk:cook
  • Bug fixes or refactors → use /mk:fix
  • Architecture exploration → use /mk:party

What You Get

  • Adaptive scaffolding — density (MINIMAL / FULL / LEAN) auto-selected per model tier; no manual setup
  • Generator/evaluator separation — developer agent builds; evaluator agent grades the running build from a fresh context, preventing self-eval bias
  • Weighted rubric verdict — each criterion scored independently; hard-fail propagation means one broken dimension fails the sprint
  • Full audit trail — run report, contract, evidence directory, and evaluator verdict all written to tasks/

Prerequisites

  • Opus 4.5+ recommended (Sonnet works with FULL density, Haiku short-circuits to mk:cook)
  • If on Opus 4.6+, set export MEOWKIT_MODEL_HINT=opus-4-6 before starting Claude Code — enables LEAN auto-detect
  • Budget awareness: use --budget 50 or export MEOWKIT_BUDGET_CAP=50 for long runs
  • MeowKit v2.2.0+ installed (npx mewkit doctor to verify)

Quick Start

bash
/mk:autobuild "build a kanban app"

That's it. The harness reads MEOWKIT_MODEL_HINT, selects density, and runs the full pipeline.

Worked Example: Kanban App

This walks through what actually happens when you run:

/mk:autobuild "build a kanban app with drag/drop cards and user auth"

Step 0 — Density Auto-Select

The harness reads MEOWKIT_MODEL_HINT and the model tier:

mk:autobuild: model hint = opus-4-6 → tier = COMPLEX → density = LEAN
Scaffolding: single-session, contract optional, 0–1 evaluator iterations
Run ID: harness-260408-1423-kanban

On Sonnet (no hint set): density = FULL (contract required, up to 3 iterations).

Step 1 — Product-Level Plan

The planner agent runs in --product-level mode. It emits user stories and design language — NOT file paths or class names:

Plan: Kanban App
  User stories:
  - As a user, I can create boards and add named lists to them
  - As a user, I can drag cards between lists to update status
  - As a user, I can sign in with email/password to save my boards
  Design language: clean utility-first UI, dark mode, smooth drag feedback

Gate 1 fires — you review and approve the product spec.

Step 2 — Sprint Contract (FULL density only)

The sprint-contract skill drafts acceptance criteria tied to evaluator rubrics:

Contract: harness-260408-1423-kanban-sprint-1.md
  AC-1: Board CRUD — create, rename, delete boards (→ functionality rubric)
  AC-2: Card drag/drop — reorder within list and move between lists (→ functionality)
  AC-3: Auth flow — sign up, sign in, sign out, persist session (→ functionality)
  AC-4: UI quality — no broken layouts, consistent spacing (→ design-quality)
Signed. Gate enforced by gate-enforcement.sh.

In LEAN mode this step is skipped if estimated ACs < 5.

Step 3 — Generator Build

The developer agent builds in 4 subphases. It cannot read the contract evaluator results — context is fresh per harness-rules.md Rule 2:

Subphase 1 — Project setup (Vite + React + Tailwind + Pinia)
Subphase 2 — Data model (Board, List, Card stores; SQLite via Drizzle)
Subphase 3 — UI (BoardView, ListColumn, CardItem with @dnd-kit/core drag)
Subphase 4 — Auth (JWT login/signup, protected routes)
Tests: 31/31 passing ✓

Step 4 — Evaluator Verdict

The evaluator agent starts with a fresh context and the frontend-app rubric preset (4 rubrics: product-depth, functionality, design-quality, originality). It drives the running build directly:

Active verification:
  → browser: open http://localhost:5173
  → click "Create board" → board appears ✓
  → drag card from "Todo" to "In Progress" → card moves ✓
  → sign up → session persists on reload ✓
  → screenshot evidence written to evidence/kanban-260408/

Rubric scores:
  product-depth:    0.82  PASS
  functionality:    0.91  PASS
  design-quality:   0.74  PASS
  originality:      0.68  PASS (no anti-patterns detected)
  Weighted overall: 0.81  → PASS

The skeptic persona is re-anchored before each rubric to prevent leniency drift.

Step 5 — Iterate or Ship

PASS on first round → proceed to ship. On FAIL, the loop re-runs up to 3 times (configurable via --max-iter). After 3 consecutive FAILs, the harness escalates to human via AskUserQuestion.

Step 6 — Artifacts

tasks/autobuild-runs/autobuild-260408-1423-kanban/run.md   ← full run report
tasks/contracts/260408-kanban-sprint-1.md                ← signed contract (FULL only)
tasks/reviews/260408-kanban-evalverdict.md               ← evaluator verdict + scores
evidence/kanban-260408/                                  ← screenshots + curl output

Flags

FlagDefaultDescription
--tier MINIMAL|FULL|LEANautoOverride scaffolding density
--max-iter N3Max generator ⇄ evaluator rounds before escalation
--budget N100Hard budget cap in USD
--no-bootoffSkip project scaffolding (use existing repo)
--teamsoffSpawn parallel developer agents (COMPLEX only)
--resume RUN_IDResume interrupted run from last checkpoint

Reading the Verdict

The verdict file (tasks/reviews/*-evalverdict.md) contains:

  • Per-rubric weighted scores — each rubric graded independently against its anchors
  • Overall weighted score — sum of (rubric weight × rubric score)
  • Hard-fail propagation — any single rubric FAIL sets overall verdict to FAIL regardless of score (harness-rules.md Rule 3)
  • Evidence directory — screenshots, curl output, CLI logs used for grading

A score ≥ 0.70 per rubric is PASS. Hard-fail threshold is configurable per rubric in .claude/rubrics/.

Troubleshooting

"Density auto-detected as FULL but I'm on Opus 4.6"
Claude Code does not export model env vars to hooks. Set export MEOWKIT_MODEL_HINT=opus-4-6 in your shell before running claude. Then restart Claude Code. Without the hint, Opus 4.6 silently gets FULL density.

"Budget breach at iteration 2"
Pass a lower cap: /mk:autobuild "..." --budget 30. Or re-scope the product spec to fewer features — the planner's product-level output drives total work. /mk:autobuild --no-boot on an existing scaffold also reduces Phase 3 cost.

"Evaluator returned FAIL but the build looks fine to me"
Open the evidence directory — the evaluator writes screenshots and curl output for every graded AC. Use /mk:elicit to re-examine a specific rubric with deeper elicitation methods. If the rubric anchors are miscalibrated for your project, see .claude/rubrics/calibration-guide.md.

"Run interrupted mid-build"
Autobuild checkpoints state to tasks/autobuild-runs/{run-id}/run.md. Resume with: /mk:autobuild --resume autobuild-260408-1423-kanban.

Released under the MIT License.