Add three new sections to AGENTS.md addressing workflow gaps observed in a session where 6 features were implemented but none were functionally tested. Only static analysis (type checks, linting) was used, resulting in broken features shipped as 'done'.

New rules:

- Functional Verification: features must be end-to-end tested before completion; static analysis alone is explicitly insufficient
- Mandatory Quality Pipeline: every feature gets reviewer + tester passes; no batch validation; quality over quantity under time pressure
- Requirement Understanding Verification: verify understanding before implementing features with implicit expectations or domain concepts

Also simplifies tester bash permissions and adds plan write permission.

192 lines
13 KiB
Markdown
## Project Memory

Use markdown files in `.memory/` as the persistent project memory across sessions. This is the source of truth for architecture, decisions, plans, research, and implementation state.

**Directory structure:**

```text
.memory/
  knowledge.md          # Persistent project knowledge (architecture, patterns, key concepts)
  decisions.md          # Architecture decisions, SME guidance, design choices
  plans/                # One file per active plan/feature
    <feature>.md        # Plan with tasks, statuses, acceptance criteria
  research/             # Research findings
    <topic>.md          # Research on a specific topic
```

**Workflow: read files → work → update files**

1. **Session start:** List the `.memory/` directory contents and skim `.memory/knowledge.md`.
2. **Before each task:** Read the relevant `.memory/*.md` files before reading source files for project understanding.
3. **After each task:** Update the appropriate `.memory/*.md` files with what was built.

Be specific in summaries: include parameter names, defaults, file locations, and rationale. Keep concepts organized as markdown sections (`## Heading`) and keep hierarchy shallow.

**Recording discipline:** Only record outcomes, decisions, and discoveries; never record phase transitions, status changes, or ceremony checkpoints. If an entry would only say "we started phase X", don't add it. Memory files preserve *knowledge*, not *activity logs*.

**Read discipline:**

- Read only the `.memory/` files relevant to the current task; avoid broad re-reads that add no new signal.
- **Skip redundant reads** when you have already established this session that `.memory/` has no relevant content for that domain.
- **Do not immediately re-read content you just wrote.** You already have that context from the update.
- Treat `.memory/` as a **tool**, not a ritual. Every read should have a specific information need.

**Linking is required.** When recording related knowledge across files, add markdown cross-references (for example: `See [Decision: Auth](decisions.md#auth-approach)`). A section with no references becomes a dead end.

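The linking rule above can be checked mechanically. The following is a minimal sketch, not part of the original workflow: it assumes memory files use `## ` headings for sections and inline `[text](target)` links for cross-references. The function names are hypothetical, and the parser is deliberately naive (it does not skip fenced code blocks).

```python
import re
from pathlib import Path

# Inline markdown link of the form [text](target).
LINK = re.compile(r"\[[^\]]+\]\([^)]+\)")


def sections_without_links(markdown: str) -> list[str]:
    """Return titles of `## ` sections that contain no markdown link."""
    dead_ends = []
    title, body = None, []
    # The trailing "## " sentinel flushes the final section.
    for line in markdown.splitlines() + ["## "]:
        if line.startswith("## "):
            if title is not None and not LINK.search("\n".join(body)):
                dead_ends.append(title)
            title, body = line[3:].strip(), []
        elif title is not None:
            body.append(line)
    return dead_ends


def report(memory_dir: Path) -> dict[str, list[str]]:
    """Map each memory file to its sections that lack cross-references."""
    result = {}
    for path in sorted(memory_dir.rglob("*.md")):
        missing = sections_without_links(path.read_text(encoding="utf-8"))
        if missing:
            result[path.relative_to(memory_dir).as_posix()] = missing
    return result
```

Calling `report(Path(".memory"))` after an update would list sections that are candidates for a cross-reference; whether every flagged section truly needs one remains a judgment call.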
## Cross-Tool Instruction Files

Use symlinks to share ONE instruction file across all agentic coding tools:

```text
project/
├── .github/
│   └── copilot-instructions.md  # Real file (edit this one)
├── AGENTS.md -> .github/copilot-instructions.md
├── CLAUDE.md -> .github/copilot-instructions.md
└── .cursorrules -> .github/copilot-instructions.md
```

**Rules:**

- Edit `.github/copilot-instructions.md`; changes are visible through every symlink automatically
- Never edit the symlinked files directly (a tool that replaces the symlink with a regular file would silently break the link, and the copies would diverge)
- Symlinks are committed to git (git tracks them natively)

**Content of the instruction file:**

- Project overview and purpose
- Tech stack and architecture
- Coding conventions and patterns
- Build/test/lint commands
- Project structure overview

**Do NOT duplicate `.memory/` contents.** Instruction files describe how to work with the project, not active plans, research, or decisions.

**When initializing a project:**

1. Create `.github/copilot-instructions.md` with project basics
2. Create symlinks: `ln -s .github/copilot-instructions.md AGENTS.md` (etc.)
3. Commit the real file and symlinks to git

**When joining an existing project:**

- Read `.github/copilot-instructions.md` (or any of the symlinked files) to understand the project
- If the instruction file is missing, create it and the symlinks

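The initialization steps above can also be scripted. This is a sketch under assumptions: the function name and the placeholder overview text are hypothetical, and it uses Python's `os.symlink` as a stand-in for the `ln -s` command. It creates relative links so they stay valid when the repository is cloned elsewhere, and it leaves any existing file or link untouched.

```python
import os
from pathlib import Path

REAL_FILE = Path(".github") / "copilot-instructions.md"
LINK_NAMES = ("AGENTS.md", "CLAUDE.md", ".cursorrules")


def init_instruction_files(project: Path, overview: str = "# Project overview\n") -> list[Path]:
    """Create the real instruction file (if missing) and the symlinks to it."""
    real = project / REAL_FILE
    real.parent.mkdir(parents=True, exist_ok=True)
    if not real.exists():
        real.write_text(overview, encoding="utf-8")

    created = []
    for name in LINK_NAMES:
        link = project / name
        if link.is_symlink() or link.exists():
            continue  # never overwrite an existing file or link
        # Relative target, like: ln -s .github/copilot-instructions.md AGENTS.md
        os.symlink(REAL_FILE, link)
        created.append(link)
    return created
```

Running it a second time is a no-op, which makes it safe to use for the "joining an existing project" case as well. Committing the results to git is still a separate manual step.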
## Session Continuity

- Treat `.memory/` files as the persistent tracking system for work across sessions.
- At session start, identify prior in-progress work items and pending decisions before doing new implementation.
- After implementation, update `.memory/` files with what changed, why it changed, and what remains next.

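As an illustration of the session-start step, the sketch below lists the memory files, names the plan files under `plans/`, and skims the top of `knowledge.md`. The function name, the `skim_lines` parameter, and the returned dictionary shape are assumptions made for this example; identifying which plans are actually in progress still requires reading them, because the plan file format is not specified here.

```python
from pathlib import Path


def session_start(memory_dir: Path, skim_lines: int = 40) -> dict:
    """Collect a cheap overview of project memory before new work begins."""
    files = sorted(p.relative_to(memory_dir).as_posix() for p in memory_dir.rglob("*.md"))

    plans_dir = memory_dir / "plans"
    plans = sorted(p.name for p in plans_dir.glob("*.md")) if plans_dir.is_dir() else []

    knowledge = memory_dir / "knowledge.md"
    skim = []
    if knowledge.is_file():
        # Skim only: the first few lines, not a full read.
        skim = knowledge.read_text(encoding="utf-8").splitlines()[:skim_lines]

    return {"files": files, "plans": plans, "knowledge_skim": skim}
```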
## Clarification Rule

- If requirements are genuinely unclear, materially ambiguous, or have multiple valid interpretations that would lead to **materially different implementations**, use the `question` tool to clarify before committing to an implementation path.
- **Do not ask for clarification when the user's intent is obvious.** If the user explicitly states what they want (e.g., "update X and also update Y"), do not ask "should I do both?"; proceed with the stated request.
- Implementation-level decisions (naming, file organization, approach) are the agent's job, not the user's. Only escalate decisions that affect **user-visible behavior or scope**.

## Agent Roster

| Agent | Role | Model |
|---|---|---|
| `lead` | Primary orchestrator that decomposes work, delegates, and synthesizes outcomes. | `github-copilot/claude-opus-4` (global default) |
| `coder` | Implementation-focused coding agent for reliable code changes. | `github-copilot/gpt-5.3-codex` |
| `reviewer` | Read-only code/source review; writes `.memory/*` for verdict records. | `github-copilot/claude-opus-4.6` |
| `tester` | Validation agent for standard + adversarial testing; writes `.memory/*` for test outcomes. | `github-copilot/claude-sonnet-4.6` |
| `explorer` | Fast read-only codebase mapper; writes `.memory/*` for discovery records. | `github-copilot/claude-sonnet-4.6` |
| `researcher` | Deep technical investigator; writes `.memory/*` for research findings. | `github-copilot/claude-opus-4.6` |
| `librarian` | Documentation coverage and accuracy specialist. | `github-copilot/claude-opus-4.6` |
| `critic` | Pre-implementation gate and blocker sounding board; writes `.memory/*` for verdicts. | `github-copilot/claude-opus-4.6` |
| `sme` | Subject-matter expert for domain-specific consultation; writes `.memory/*` for guidance cache. | `github-copilot/claude-opus-4.6` |
| `designer` | UI/UX specialist for interaction and visual guidance; writes `.memory/*` for design decisions. | `github-copilot/claude-sonnet-4.6` |

All agents except `lead`, `coder`, and `librarian` are code/source read-only but have `permission.edit: allow` scoped to `.memory/*` writes for their recording duties. The `lead` and `librarian` have full edit access; `coder` has full edit access for implementation.

## Parallelization

- **Always parallelize independent work.** Any tool calls that do not depend on each other's output must be issued in the same message as parallel calls, never sequentially. This applies to bash commands, file reads, and subagent delegations alike.
- Before issuing a sequence of calls, ask: *"Does call B require the result of call A?"* If not, send them together.

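The rule concerns tool calls issued by an agent, but the same idea can be illustrated in ordinary code. The sketch below is an analogy only: `fake_read` is a hypothetical stand-in for a slow, independent call, and `run_parallel` uses the standard library's `ThreadPoolExecutor` to start all calls together instead of one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(calls):
    """Run independent zero-argument callables together; results keep input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(call) for call in calls]
        return [future.result() for future in futures]


def fake_read(name, delay=0.2):
    """Hypothetical slow operation, e.g. a file read or a subagent delegation."""
    time.sleep(delay)
    return f"contents of {name}"
```

Three `fake_read` calls passed to `run_parallel` finish in roughly the time of one, because none of them needs another's result. A call that does consume an earlier result has to wait for it and belongs in a later batch.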
## Human Checkpoint Triggers

When implementing features, the Lead must stop and request explicit user approval before dispatching coder work in these situations:

1. **Security-sensitive design**: Any feature involving encryption, auth flows, secret storage, token management, or permission model changes.
2. **Architectural ambiguity**: Multiple valid approaches with materially different tradeoffs that aren't resolvable from codebase conventions alone.
3. **Vision-dependent features**: Features where the user's intended UX or behavior model isn't fully specified by the request.
4. **New external dependencies**: Adding a service, SDK, or infrastructure component not already in the project.
5. **Data model changes with migration impact**: Schema changes affecting existing production data.

The checkpoint must present the specific decision, 2-3 concrete options with tradeoffs, a recommendation, and a safe default. Implementation-level decisions (naming, file organization, code patterns) are NOT checkpoints; only user-visible behavior and architectural choices qualify.

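The required contents of a checkpoint can be captured in a small data structure. This is a hypothetical sketch: the class and field names are invented for illustration, and it assumes that the recommendation and the safe default each name one of the listed options, which the rule above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    tradeoffs: str


@dataclass(frozen=True)
class Checkpoint:
    decision: str
    options: tuple[Option, ...]
    recommendation: str  # assumed to be the name of one option
    safe_default: str    # assumed to be the name of one option

    def __post_init__(self):
        names = [option.name for option in self.options]
        if not 2 <= len(names) <= 3:
            raise ValueError("a checkpoint presents 2-3 concrete options")
        if self.recommendation not in names or self.safe_default not in names:
            raise ValueError("recommendation and safe default must name an option")

    def render(self) -> str:
        """Format the checkpoint as the message shown to the user."""
        lines = [f"Decision needed: {self.decision}"]
        lines += [f"- {o.name}: {o.tradeoffs}" for o in self.options]
        lines.append(f"Recommendation: {self.recommendation}")
        lines.append(f"Safe default: {self.safe_default}")
        return "\n".join(lines)
```

Validating at construction time means an incomplete checkpoint (for example, a single option with no alternative) fails before it ever reaches the user.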
## Functional Verification (Implement → Verify → Iterate)

**Static analysis is not verification.** Type checks (`bun run check`, `tsc`), linters (`eslint`, `ruff`), and framework system checks (`python manage.py check`) confirm code is syntactically and structurally valid. They do NOT confirm the feature works. A feature that type-checks perfectly can be completely non-functional.

**Every implemented feature MUST be functionally verified before being marked complete.** "Functionally verified" means demonstrating that the feature actually works end-to-end, not just that it compiles.

### What Counts as Functional Verification

Functional verification must exercise the **actual behavior path** a user would trigger:

- **API endpoints**: Make real HTTP requests (`curl`, `httpie`, or the app's test client) and verify response status, shape, and data correctness. Check both success and error paths.
- **Frontend components**: Verify the component renders, interacts correctly, and communicates with the backend. Use the browser (Playwright) or run the app's frontend test suite.
- **Database/model changes**: Verify migrations run, data can be created/read/updated/deleted through the ORM or API, and constraints are enforced.
- **Integration points**: When a feature spans frontend ↔ backend, verify the full round-trip: UI action → API call → database → response → UI update.
- **Configuration/settings**: Verify the setting is actually read and affects behavior, not just that the config key exists.

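To make the API-endpoint case concrete, here is a self-contained sketch that sends real HTTP requests and checks status, shape, data, and the error path. Everything specific in it is an assumption for illustration: the `/api/providers` route, the response fields, and `DemoHandler`, which stands in for the application under test so the example can run on its own. A real check would target the running app instead.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in for the app under test (hypothetical endpoint and payload)."""

    def do_GET(self):
        if self.path == "/api/providers":
            self._send(200, {"providers": [{"name": "demo", "configured": True}]})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(host, port, path):
    """Issue a real GET request and return (status, parsed JSON body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()


def verify_providers_endpoint(host, port):
    ok_status, data = fetch(host, port, "/api/providers")
    assert ok_status == 200, ok_status                       # response status
    assert isinstance(data.get("providers"), list)           # response shape
    assert all("configured" in p for p in data["providers"])  # data correctness
    err_status, err = fetch(host, port, "/api/missing")
    assert err_status == 404 and "error" in err              # error path
    return ok_status, err_status


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return verify_providers_endpoint("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

The point of the sketch is the shape of `verify_providers_endpoint`: it fails loudly if the endpoint is unreachable or returns the wrong data, which a type check or lint run cannot detect.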
### What Does NOT Count as Functional Verification

These are useful but insufficient on their own:

- ❌ `bun run check` / `tsc --noEmit` (type checking)
- ❌ `bun run lint` / `eslint` / `ruff` (linting)
- ❌ `python manage.py check` (Django system checks)
- ❌ `bun run build` succeeding (build pipeline)
- ❌ Reading the code and concluding "this looks correct"
- ❌ Verifying file existence or import structure

### The Iterate-Until-Working Cycle

When functional verification reveals a problem:

1. **Diagnose** the root cause (not just the symptom).
2. **Fix** via coder dispatch with the specific failure context.
3. **Re-verify** the same functional test that failed.
4. **Repeat** until the feature demonstrably works.

A feature is "done" when it passes functional verification, not when the coder returns without errors. The lead agent must never mark a task complete based solely on a clean coder return; the verification step is mandatory.

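The cycle above can be sketched as a loop. The `verify` and `fix` callables are hypothetical placeholders for the functional test and the coder dispatch. The `max_fixes` cap is an assumption added so the sketch cannot loop forever; the rule itself only says to repeat until the feature works.

```python
def iterate_until_working(verify, fix, max_fixes=5):
    """Re-run the same functional check after every fix.

    verify() returns None on success or a failure description.
    fix(failure) dispatches a fix with that failure as context.
    Returns the list of failures seen before the check passed.
    """
    failures = []
    while True:
        failure = verify()  # the same functional test every round
        if failure is None:
            return failures  # done only because verification passed
        failures.append(failure)
        if len(failures) > max_fixes:
            raise RuntimeError(f"still failing after {max_fixes} fixes: {failure}")
        fix(failure)
```

Note that there is no path out of the loop that skips `verify()`: a clean return from `fix` never counts as success on its own.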
### Verification Scope by Change Type

| Change type | Minimum verification |
|---|---|
| New API endpoint | HTTP request with expected response verified |
| New UI feature | Browser-based or test-suite verification of render + interaction |
| Full-stack feature | End-to-end: UI → API → DB → response → UI update |
| Data model change | Migration runs + CRUD operations verified through API or ORM |
| Bug fix | Reproduce the bug scenario, verify it no longer occurs |
| Config/settings | Verify the setting changes observable behavior |
| Refactor (no behavior change) | Existing tests pass + spot-check one behavior path |

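For the config/settings row, a minimal sketch of "the setting changes observable behavior" might look like the following. The `APP_PAGE_SIZE` environment variable and the pagination functions are hypothetical; the pattern is what matters: change the setting, run the behavior, and compare results, rather than checking that the key exists.

```python
import os


def page_size() -> int:
    """Hypothetical feature code: reads the setting at call time, default 20."""
    return int(os.environ.get("APP_PAGE_SIZE", "20"))


def paginate(items):
    size = page_size()
    return [items[i:i + size] for i in range(0, len(items), size)]


def verify_setting_changes_behavior():
    """Return the page counts observed under two different setting values."""
    items = list(range(10))
    previous = os.environ.get("APP_PAGE_SIZE")
    try:
        os.environ["APP_PAGE_SIZE"] = "5"
        small_pages = paginate(items)
        os.environ["APP_PAGE_SIZE"] = "10"
        large_pages = paginate(items)
    finally:
        # Restore the environment so the check has no side effects.
        if previous is None:
            os.environ.pop("APP_PAGE_SIZE", None)
        else:
            os.environ["APP_PAGE_SIZE"] = previous
    return len(small_pages), len(large_pages)
```

If both values produced the same page count, the setting would be present but ignored, which is exactly the failure this row is meant to catch.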
## Mandatory Quality Pipeline

**The reviewer and tester agents exist to be used, not to serve as decoration.** Every non-trivial feature must go through the quality pipeline. Skipping reviewers or testers to "save time" creates broken features that cost far more time to debug later.

### Minimum Quality Requirements

- **Every feature gets a reviewer pass.** No exceptions for "simple" features: the session transcript showed that even apparently simple features (like provider selection) had critical bugs that a reviewer would have caught.
- **Every feature with user-facing behavior gets a tester pass.** The tester agent must be dispatched for any feature that a user would interact with. The tester validates functional behavior, not just code structure.
- **Features cannot be batch-validated.** Each feature gets its own review → test cycle. "I'll review all 6 workstreams at the end" is not acceptable, because bugs compound and become harder to diagnose.

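These requirements amount to a per-feature gate. The sketch below is illustrative only: the `Feature` record and its field names are hypothetical, and it combines the reviewer and tester rules with the functional verification requirement stated earlier in this file.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    user_facing: bool
    reviewed: bool = False
    tested: bool = False
    functionally_verified: bool = False


def missing_gates(feature: Feature) -> list[str]:
    """List the pipeline steps a feature still needs before it can be marked complete."""
    missing = []
    if not feature.reviewed:
        missing.append("reviewer pass")
    if feature.user_facing and not feature.tested:
        missing.append("tester pass")
    if not feature.functionally_verified:
        missing.append("functional verification")
    return missing


def can_mark_complete(feature: Feature) -> bool:
    return not missing_gates(feature)
```

The gate takes one feature at a time on purpose: there is no function here that accepts a batch, mirroring the rule that each feature gets its own review → test cycle.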
### The Lead Must Not Skip the Pipeline Under Time Pressure

Even when there are many features to implement, the quality pipeline is non-negotiable. It is better to ship 3 working features than 6 broken ones. If scope must be reduced to maintain quality, reduce scope; do not reduce quality.

## Requirement Understanding Verification

Before implementing a feature, the lead must verify its understanding of what the user actually wants, especially for features involving:

- **User-facing behavior models** (e.g., "the app should learn from my data" vs. "the user manually inputs preferences")
- **Implicit expectations** (e.g., "show available providers" implies showing which ones are *configured*, not just listing all possible providers)
- **Domain-specific concepts** (e.g., in a travel app, "preferences" might mean auto-learned travel patterns, not a settings form)

When in doubt, ask. A 30-second clarification prevents hours of rework on a fundamentally misunderstood feature.

This complements the Clarification Rule above: that rule covers *ambiguous requirements*; this rule covers *requirements that seem clear but may be misunderstood*. The test: "If I'm wrong about what this means, would I build something completely different?" If yes, verify.