feat: enforce functional verification, mandatory quality pipeline, and requirement checks
Add three new sections to AGENTS.md addressing workflow gaps observed in a session where 6 features were implemented but none were functionally tested — only static analysis (type checks, linting) was used, resulting in broken features shipped as 'done'. New rules: - Functional Verification: features must be end-to-end tested before completion; static analysis alone is explicitly insufficient - Mandatory Quality Pipeline: every feature gets reviewer + tester passes; no batch validation; quality over quantity under time pressure - Requirement Understanding Verification: verify understanding before implementing features with implicit expectations or domain concepts Also simplifies tester bash permissions and adds plan write permission.
This commit is contained in:
@@ -113,3 +113,79 @@ When implementing features, the Lead must stop and request explicit user approva
|
||||
5. **Data model changes with migration impact**: Schema changes affecting existing production data.
|
||||
|
||||
The checkpoint must present the specific decision, 2-3 concrete options with tradeoffs, a recommendation, and a safe default. Implementation-level decisions (naming, file organization, code patterns) are NOT checkpoints — only user-visible behavior and architectural choices qualify.
|
||||
|
||||
## Functional Verification (Implement → Verify → Iterate)
|
||||
|
||||
**Static analysis is not verification.** Type checks (`bun run check`, `tsc`), linters (`eslint`, `ruff`), and framework system checks (`python manage.py check`) confirm code is syntactically and structurally valid. They do NOT confirm the feature works. A feature that type-checks perfectly can be completely non-functional.
|
||||
|
||||
**Every implemented feature MUST be functionally verified before being marked complete.** "Functionally verified" means demonstrating that the feature actually works end-to-end — not just that it compiles.
|
||||
|
||||
### What Counts as Functional Verification
|
||||
|
||||
Functional verification must exercise the **actual behavior path** a user would trigger:
|
||||
|
||||
- **API endpoints**: Make real HTTP requests (`curl`, `httpie`, or the app's test client) and verify response status, shape, and data correctness. Check both success and error paths.
|
||||
- **Frontend components**: Verify the component renders, interacts correctly, and communicates with the backend. Use the browser (Playwright) or run the app's frontend test suite.
|
||||
- **Database/model changes**: Verify migrations run, data can be created/read/updated/deleted through the ORM or API, and constraints are enforced.
|
||||
- **Integration points**: When a feature spans frontend ↔ backend, verify the full round-trip: UI action → API call → database → response → UI update.
|
||||
- **Configuration/settings**: Verify the setting is actually read and affects behavior — not just that the config key exists.
|
||||
|
||||
### What Does NOT Count as Functional Verification
|
||||
|
||||
These are useful but insufficient on their own:
|
||||
|
||||
- ❌ `bun run check` / `tsc --noEmit` (type checking)
|
||||
- ❌ `bun run lint` / `eslint` / `ruff` (linting)
|
||||
- ❌ `python manage.py check` (Django system checks)
|
||||
- ❌ `bun run build` succeeding (build pipeline)
|
||||
- ❌ Reading the code and concluding "this looks correct"
|
||||
- ❌ Verifying file existence or import structure
|
||||
|
||||
### The Iterate-Until-Working Cycle
|
||||
|
||||
When functional verification reveals a problem:
|
||||
|
||||
1. **Diagnose** the root cause (not just the symptom).
|
||||
2. **Fix** via coder dispatch with the specific failure context.
|
||||
3. **Re-verify** the same functional test that failed.
|
||||
4. **Repeat** until the feature demonstrably works.
|
||||
|
||||
A feature is "done" when it passes functional verification, not when the coder returns without errors. The lead agent must never mark a task complete based solely on a clean coder return — the verification step is mandatory.
|
||||
|
||||
### Verification Scope by Change Type
|
||||
|
||||
| Change type | Minimum verification |
|
||||
|---|---|
|
||||
| New API endpoint | HTTP request with expected response verified |
|
||||
| New UI feature | Browser-based or test-suite verification of render + interaction |
|
||||
| Full-stack feature | End-to-end: UI → API → DB → response → UI update |
|
||||
| Data model change | Migration runs + CRUD operations verified through API or ORM |
|
||||
| Bug fix | Reproduce the bug scenario, verify it no longer occurs |
|
||||
| Config/settings | Verify the setting changes observable behavior |
|
||||
| Refactor (no behavior change) | Existing tests pass + spot-check one behavior path |
|
||||
|
||||
## Mandatory Quality Pipeline
|
||||
|
||||
**The reviewer and tester agents exist to be used — not decoratively.** Every non-trivial feature must go through the quality pipeline. Skipping reviewers or testers to "save time" creates broken features that cost far more time to debug later.
|
||||
|
||||
### Minimum Quality Requirements
|
||||
|
||||
- **Every feature gets a reviewer pass.** No exceptions for "simple" features — the session transcript showed that even apparently simple features (like provider selection) had critical bugs that a reviewer would have caught.
|
||||
- **Every feature with user-facing behavior gets a tester pass.** The tester agent must be dispatched for any feature that a user would interact with. The tester validates functional behavior, not just code structure.
|
||||
- **Features cannot be batch-validated.** Each feature gets its own review → test cycle. "I'll review all 6 workstreams at the end" is not acceptable — bugs compound and become harder to diagnose.
|
||||
|
||||
### The Lead Must Not Skip the Pipeline Under Time Pressure
|
||||
|
||||
Even when there are many features to implement, the quality pipeline is non-negotiable. It is better to ship 3 working features than 6 broken ones. If scope must be reduced to maintain quality, reduce scope — do not reduce quality.
|
||||
|
||||
## Requirement Understanding Verification
|
||||
|
||||
Before implementing a feature, the lead must verify its understanding of what the user actually wants — especially for features involving:
|
||||
|
||||
- **User-facing behavior models** (e.g., "the app should learn from my data" vs. "the user manually inputs preferences")
|
||||
- **Implicit expectations** (e.g., "show available providers" implies showing which ones are *configured*, not just listing all possible providers)
|
||||
- **Domain-specific concepts** (e.g., in a travel app, "preferences" might mean auto-learned travel patterns, not a settings form)
|
||||
|
||||
When in doubt, ask. A 30-second clarification prevents hours of rework on a fundamentally misunderstood feature.
|
||||
|
||||
This complements the Clarification Rule above — that rule covers *ambiguous requirements*; this rule covers *requirements that seem clear but may be misunderstood*. The test: "If I'm wrong about what this means, would I build something completely different?" If yes, verify.
|
||||
|
||||
@@ -6,26 +6,15 @@ temperature: 0.1
|
||||
permission:
|
||||
edit: allow
|
||||
bash:
|
||||
"uv run pytest*": allow
|
||||
"uv run python -m pytest*": allow
|
||||
"pytest*": allow
|
||||
"python -m pytest*": allow
|
||||
"npm test*": allow
|
||||
"npm run test*": allow
|
||||
"pnpm test*": allow
|
||||
"pnpm run test*": allow
|
||||
"bun test*": allow
|
||||
"npm run dev*": allow
|
||||
"npm start*": allow
|
||||
"npx jest*": allow
|
||||
"npx vitest*": allow
|
||||
"npx playwright*": allow
|
||||
"*": deny
|
||||
"uv *": allow
|
||||
"bun *": allow
|
||||
"go test*": allow
|
||||
"docker *": allow
|
||||
"cargo test*": allow
|
||||
"make test*": allow
|
||||
"gh run*": allow
|
||||
"gh pr*": allow
|
||||
"*": deny
|
||||
---
|
||||
|
||||
You are the Tester subagent.
|
||||
|
||||
@@ -12,7 +12,8 @@
|
||||
},
|
||||
"explore": {
|
||||
"disable": true
|
||||
}
|
||||
},
|
||||
"plan": { "permission": { "write": "allow" } }
|
||||
},
|
||||
"permission": {
|
||||
"websearch": "allow",
|
||||
|
||||
Reference in New Issue
Block a user