docs: add presets/background agents spec

2026-04-12 10:58:44 +01:00
parent 86335c2971
commit 0438a7b384
2 changed files with 1738 additions and 0 deletions
--- a/docs/superpowers/plans/2026-04-12-subagent-presets-and-background-agents.md
+++ b/docs/superpowers/plans/2026-04-12-subagent-presets-and-background-agents.md
--- a/docs/superpowers/specs/2026-04-12-subagent-presets-and-background-agents-design.md
+++ b/docs/superpowers/specs/2026-04-12-subagent-presets-and-background-agents-design.md
@@ -0,0 +1,440 @@
 # Subagent presets and background agents design
 Date: 2026-04-12
 Status: approved for planning
 ## Summary
 Evolve `pi-subagents` from generic foreground-only child runs into a preset-driven delegation package with three tools:
 - `subagent` for single or parallel foreground runs
 - `background_agent` for detached process-backed runs
 - `background_agent_status` for polling detached-run status and counts
 Named customization comes back as markdown presets, but **without** bundled built-in roles. Presets live in:
 - global: `~/.pi/agent/subagents/*.md`
 - project: nearest `.pi/subagents/*.md` found by walking up from the current `cwd`
 Project presets override global presets by name. Calls must name a preset. Per-call overrides are limited to `model`; prompt and tool access come only from the preset.
 ## Current state
 Today the package has:
 - one `subagent` tool with `task | tasks | chain`
 - no background-run registry or polling tool
 - no preset discovery
 - no wrapper support for preset-owned prompt/tool restrictions
 - prompt templates that still assume `chain`
 The old role system was removed entirely, including markdown discovery, `--tools`, and `--append-system-prompt` wiring. This change brings back the useful customization pieces without reintroducing bundled roles or old role-specific behavior.
 ## Goals
 - Remove `chain` mode from `subagent`.
 - Keep foreground single-run and parallel-run delegation.
 - Add named preset discovery from global and project markdown directories.
 - Let presets define appended prompt text, built-in tool allowlist, and optional default model.
 - Add `background_agent` as a process-only detached launcher.
 - Add `background_agent_status` so the parent agent can poll one run or many runs.
 - Track background-run counts and surface completion through UI notification plus a visible session message.
 - Preserve existing runner behavior for foreground runs and keep runner-specific changes minimal.
 ## Non-goals
 - No bundled built-in presets (`scout`, `planner`, `reviewer`, `worker`, etc.).
 - No markdown discovery from `.pi/agents`.
 - No inline prompt or tool overrides on tool calls.
 - No background tmux mode. `background_agent` always uses the process runner.
 - No automatic follow-up turn when a background run finishes.
 - No attempt to restrict third-party extension tools beyond what Pi CLI already supports.
 ## Chosen approach
 Add a small preset layer and a small background-run state layer on top of the current runner/wrapper core.
 - foreground `subagent` becomes preset-aware and loses `chain`
 - wrapper/artifacts regain preset-owned prompt/tool support
 - background runs reuse process-launch mechanics but skip `monitorRun()` in the calling tool
 - extension-owned registry watches detached runs, persists state to session custom entries, updates footer status text, emits UI notifications, and injects visible completion messages into session history
 This keeps most existing code paths intact while restoring customization and adding detached orchestration.
 ## Public API design
 ### `subagent`
 Supported modes:
 #### Single mode
 Required:
 - `preset`
 - `task`
 Optional:
 - `model`
 - `cwd`
 #### Parallel mode
 Required:
 - `tasks: Array<{ preset: string; task: string; model?: string; cwd?: string }>`
 Notes:
 - each parallel item names its own preset
 - there is no top-level default preset
 - there is no top-level required model
 Removed:
 - `chain`
 Model resolution order per run:
 1. call-level `model`
 2. preset `model`
 3. error: no model resolved
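The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the package's implementation: `PresetLike` and `resolveModel` are hypothetical names, and normalizing against the available-model list is left out.

```typescript
// Hypothetical shape for illustration; the real preset type lives in the package source.
interface PresetLike {
  name: string;
  model?: string;
}

// Resolution order from the spec: call-level model, then preset model, then error.
function resolveModel(callModel: string | undefined, preset: PresetLike): string {
  if (callModel !== undefined && callModel !== "") return callModel;
  if (preset.model !== undefined && preset.model !== "") return preset.model;
  throw new Error(`no model resolved for preset "${preset.name}"`);
}
```

Treating an empty string as "not provided" is an assumption made here; the spec only distinguishes provided from omitted.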
 ### `background_agent`
 Single-run only.
 Required:
 - `preset`
 - `task`
 Optional:
 - `model`
 - `cwd`
 Behavior:
 - always launches with the process runner, ignoring tmux config
 - returns immediately after spawn request
 - returns run handle metadata plus counts snapshot
 - does not wait for completion
 ### `background_agent_status`
 Purpose:
 - let the main agent poll background runs and inspect counts
 Parameters:
 - `runId?`: inspect one run
 - `includeCompleted?`: default `false`. When `false` or omitted, only active (`running`) runs are listed; a run requested by `runId` is returned regardless of its status.
 Returns:
 - counts: `running`, `completed`, `failed`, `aborted`, `total`
 - per-run rows with preset, task, cwd, model info, timestamps, artifact paths, and status
 - final result fields when a run is terminal
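One way to read the parameter rules is the filter below. It is a sketch under assumptions: `RunRow`, `StatusQuery`, and `selectRuns` are hypothetical names, and `includeCompleted` is assumed to cover every terminal status (`completed`, `failed`, `aborted`), which the spec does not state explicitly.

```typescript
type RunStatus = "running" | "completed" | "failed" | "aborted";

// Hypothetical row and query shapes, reduced to the fields the filter needs.
interface RunRow {
  runId: string;
  preset: string;
  status: RunStatus;
}

interface StatusQuery {
  runId?: string;
  includeCompleted?: boolean;
}

function selectRuns(rows: RunRow[], query: StatusQuery): RunRow[] {
  // A specific runId is returned regardless of its status.
  if (query.runId !== undefined) return rows.filter((r) => r.runId === query.runId);
  // Assumption: includeCompleted lists every run, terminal or not.
  if (query.includeCompleted === true) return rows.slice();
  // Default: only active runs.
  return rows.filter((r) => r.status === "running");
}
```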
 ## Preset design
 ### Discovery
 Load presets from two sources:
 - global: `join(getAgentDir(), "subagents")`
 - project: nearest ancestor directory containing `.pi/subagents`
 Merge order:
 1. global presets
 2. project presets override global presets with the same `name`
 No confirmation gate for project presets.
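A minimal sketch of the ancestor walk and the merge, assuming POSIX-style absolute paths. `nearestProjectPresetDir`, `mergePresets`, and `PresetEntry` are hypothetical names, and the directory check is injected so the example stays free of filesystem calls.

```typescript
// Hypothetical entry shape for illustration.
interface PresetEntry {
  name: string;
  source: "global" | "project";
  path: string;
}

// Walk up from cwd and return the nearest ancestor's `.pi/subagents`, if any.
// Assumes a POSIX-style absolute path; `hasDir` stands in for a real directory check.
function nearestProjectPresetDir(cwd: string, hasDir: (path: string) => boolean): string | undefined {
  let dir = cwd.length > 1 && cwd.charAt(cwd.length - 1) === "/" ? cwd.slice(0, -1) : cwd;
  for (;;) {
    const candidate = (dir === "/" ? "" : dir) + "/.pi/subagents";
    if (hasDir(candidate)) return candidate;
    if (dir === "/" || dir === "") return undefined;
    const cut = dir.lastIndexOf("/");
    dir = cut <= 0 ? "/" : dir.slice(0, cut);
  }
}

// Global presets first; a project preset with the same name replaces the global one.
function mergePresets(globalPresets: PresetEntry[], projectPresets: PresetEntry[]): PresetEntry[] {
  const merged: PresetEntry[] = [];
  for (const preset of [...globalPresets, ...projectPresets]) {
    const index = merged.findIndex((m) => m.name === preset.name);
    if (index >= 0) merged[index] = preset;
    else merged.push(preset);
  }
  return merged;
}
```

Keeping the walk separate from the merge lets each be tested without touching the real home or project directories.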
 ### File format
 Each preset is one markdown file with frontmatter and body.
 Required frontmatter:
 - `name`
 - `description`
 Optional frontmatter:
 - `model`
 - `tools`
 Body:
 - appended system prompt text
 Example:
 ```md
 ---
 name: repo-scout
 description: Fast repo exploration
 model: github-copilot/gpt-4o
 tools: read,grep,find,ls
 ---
 You are a scout. Explore quickly, summarize clearly, and avoid implementation.
 ```
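A file in this format could be parsed roughly as follows. This is a simplified sketch, not a YAML parser: it handles only flat `key: value` frontmatter lines, and `parsePreset` and `ParsedPreset` are hypothetical names.

```typescript
// Hypothetical parsed shape; field names mirror the frontmatter keys in the spec.
interface ParsedPreset {
  name: string;
  description: string;
  model?: string;
  tools?: string[];
  prompt: string;
}

function parsePreset(markdown: string): ParsedPreset {
  const lines = markdown.replace(/\r\n/g, "\n").split("\n");
  if ((lines[0] ?? "").trim() !== "---") throw new Error("preset is missing frontmatter");
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) throw new Error("preset frontmatter is not closed");

  // Flat `key: value` pairs only; anything else in the frontmatter is ignored.
  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon > 0) fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }

  const name = fields["name"];
  const description = fields["description"];
  if (!name || !description) throw new Error("preset requires `name` and `description`");

  // The body after the closing delimiter is the appended system prompt text.
  const preset: ParsedPreset = { name, description, prompt: lines.slice(end + 1).join("\n").trim() };
  const model = fields["model"];
  const tools = fields["tools"];
  if (model) preset.model = model;
  if (tools) preset.tools = tools.split(",").map((t) => t.trim()).filter((t) => t !== "");
  return preset;
}
```

Leaving `tools` undefined when the key is absent keeps "no restriction" distinguishable from an explicit allowlist.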
 ### Preset semantics
 - `model` is the default model when the call does not provide one.
 - `tools` is optional.
 - omitted `tools` means normal child-tool behavior (no built-in tool restriction)
 - when `tools` is present, pass it through Pi CLI `--tools`, which limits built-in tools only
 - prompt text comes only from the markdown body; no inline prompt override
 ## Runtime design
 ### Foreground subagent execution
 `src/tool.ts` becomes preset-aware.
 For each run:
 1. discover presets
 2. resolve the named preset
 3. normalize explicit `model` override against available models if present
 4. normalize preset default model against available models if used
 5. compute effective model from call override or preset default
 6. pass runner metadata including preset, prompt text, built-in tool allowlist, and model selection
 Parallel behavior stays the same apart from:
 - no `chain`
 - each task resolving its own preset/model
 - summary lines identifying tasks by index and/or preset, not old role names
 ### Background execution
 `background_agent` launches the wrapper via process-runner primitives but returns immediately.
 Flow:
 1. resolve preset + effective model
 2. create run artifacts
 3. spawn wrapper process
 4. register run in extension background registry as `running`
 5. append persistent session entry for the new run
 6. start detached watcher on that run’s `result.json` / `events.jsonl`
 7. return handle metadata and counts snapshot
 Background runs are process-only even when normal `subagent` foreground runs are configured for tmux.
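The ordering of the flow can be illustrated with injected dependencies. Every name here (`LaunchDeps`, `launchBackground`, and the method names) is hypothetical and exists only to show the sequence; the real tool would call the package's process-runner and registry modules instead.

```typescript
// All names below are hypothetical; they only illustrate the ordering of the flow.
interface LaunchRequest {
  preset: string;
  task: string;
  model?: string;
  cwd?: string;
}

interface CountsSnapshot {
  running: number;
  total: number;
}

interface LaunchDeps {
  resolveModel(req: LaunchRequest): string; // step 1
  createArtifacts(runId: string): string; // step 2, returns the run directory
  spawnWrapper(runId: string, runDir: string): void; // step 3, must not block
  register(runId: string, req: LaunchRequest, resolvedModel: string): void; // step 4
  appendEntry(customType: string, data: unknown): void; // step 5
  watch(runId: string, runDir: string): void; // step 6
  counts(): CountsSnapshot;
}

function launchBackground(runId: string, req: LaunchRequest, deps: LaunchDeps) {
  const resolvedModel = deps.resolveModel(req);
  const runDir = deps.createArtifacts(runId);
  deps.spawnWrapper(runId, runDir);
  deps.register(runId, req, resolvedModel);
  deps.appendEntry("pi-subagents:bg-run", { runId, ...req, resolvedModel, runDir });
  deps.watch(runId, runDir);
  // Step 7: return a handle right away; completion is observed by the watcher.
  return { runId, status: "running" as const, resolvedModel, runDir, counts: deps.counts() };
}
```

Nothing in this function waits on the child, which is the property that distinguishes it from the foreground path.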
 ### Background registry
 The extension owns a session-scoped registry keyed by `runId`.
 Stored metadata per run:
 - `runId`
 - `preset`
 - `task`
 - `cwd`
 - `requestedModel`
 - `resolvedModel`
 - artifact paths
 - timestamps
 - terminal result fields when available
 - status: `running | completed | failed | aborted`
 The registry also computes counts:
 - `running`
 - `completed`
 - `failed`
 - `aborted`
 - `total`
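A reduced sketch of the registry's status tracking and count computation. `BackgroundRegistry` here stores only the status per `runId`; the other metadata fields are omitted, and ignoring updates to already-terminal runs is an assumption of this sketch rather than something the spec requires.

```typescript
type BgStatus = "running" | "completed" | "failed" | "aborted";

interface BgCounts {
  running: number;
  completed: number;
  failed: number;
  aborted: number;
  total: number;
}

class BackgroundRegistry {
  private runs = new Map<string, BgStatus>();

  register(runId: string): void {
    this.runs.set(runId, "running");
  }

  // Assumption: only a running run may move to a terminal status; later updates are ignored.
  finish(runId: string, status: Exclude<BgStatus, "running">): boolean {
    if (this.runs.get(runId) !== "running") return false;
    this.runs.set(runId, status);
    return true;
  }

  counts(): BgCounts {
    const counts: BgCounts = { running: 0, completed: 0, failed: 0, aborted: 0, total: 0 };
    this.runs.forEach((status) => {
      counts[status] += 1;
      counts.total += 1;
    });
    return counts;
  }
}
```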
 ### Persistence and reload behavior
 Persist background state with session custom entries.
 Custom entry types:
 - `pi-subagents:bg-run` — initial launch metadata
 - `pi-subagents:bg-update` — later status/result transitions
 On `session_start`, rebuild the in-memory registry by scanning `ctx.sessionManager.getEntries()`.
 For rebuilt runs that are still non-terminal:
 - if `result.json` already exists, ingest it immediately
 - otherwise reattach a watcher so completion still updates counts, notifications, and session messages after reload/resume
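The rebuild step amounts to folding the persisted entries and then splitting non-terminal runs by whether `result.json` exists. The sketch below assumes a simplified entry shape (`customType` plus a `data` payload); the actual shape returned by `ctx.sessionManager.getEntries()` may differ, and `planReload` is a hypothetical name.

```typescript
// Hypothetical entry shape; the real session entries may carry different fields.
type ReloadStatus = "running" | "completed" | "failed" | "aborted";

interface CustomEntry {
  customType: string;
  data: { runId: string; status?: ReloadStatus };
}

function planReload(entries: CustomEntry[], hasResult: (runId: string) => boolean) {
  const statusById = new Map<string, ReloadStatus>();
  for (const entry of entries) {
    if (entry.customType === "pi-subagents:bg-run") {
      // Launch entry: the run starts out as running.
      statusById.set(entry.data.runId, "running");
    } else if (entry.customType === "pi-subagents:bg-update" && entry.data.status !== undefined) {
      // Update entry: apply the transition only for runs we saw launched.
      if (statusById.has(entry.data.runId)) statusById.set(entry.data.runId, entry.data.status);
    }
  }

  const ingest: string[] = []; // result.json already exists, read it now
  const watch: string[] = []; // still pending, reattach a watcher
  statusById.forEach((status, runId) => {
    if (status !== "running") return;
    (hasResult(runId) ? ingest : watch).push(runId);
  });
  return { statusById, ingest, watch };
}
```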
 ## Notification and polling design
 ### Completion notification
 When a detached run becomes terminal:
 1. update registry and counts
 2. append `pi-subagents:bg-update`
 3. update footer status text, e.g. `bg: 2 running / 5 total`
 4. emit UI notification if UI is available
 5. inject a visible custom session message describing completion
 The completion message must **not** trigger a new agent turn automatically.
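Steps 3 to 5 could look roughly like this. `CompletionSink` is a hypothetical stand-in for whatever UI and session APIs the extension actually uses, and the `triggerTurn: false` option is an assumed way of expressing "do not start a new agent turn", not a confirmed Pi API.

```typescript
interface FooterCounts {
  running: number;
  total: number;
}

// Hypothetical sink; `notify` is optional because a UI may not be available.
interface CompletionSink {
  setFooter(text: string): void;
  notify?: (text: string) => void;
  appendMessage(text: string, options: { triggerTurn: false }): void;
}

// Matches the footer example in the spec, e.g. `bg: 2 running / 5 total`.
function footerText(counts: FooterCounts): string {
  return `bg: ${counts.running} running / ${counts.total} total`;
}

function announceCompletion(
  runId: string,
  status: "completed" | "failed" | "aborted",
  counts: FooterCounts,
  sink: CompletionSink,
): void {
  const text = `background run ${runId} ${status}`;
  sink.setFooter(footerText(counts));
  if (sink.notify) sink.notify(text);
  // The message is visible in session history but must not start a new turn.
  sink.appendMessage(text, { triggerTurn: false });
}
```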
 ### Polling
 The parent agent polls with `background_agent_status`.
 Typical use:
 - ask for current running count
 - list active background runs
 - inspect one `runId`
 - fetch terminal result summary after notification arrives
 ## Wrapper and artifact design
 ### Artifact layout
 Keep run directories under:
 - `.pi/subagents/runs/<runId>/`
 Keep existing files and restore the removed prompt artifact when needed:
 - `meta.json`
 - `events.jsonl`
 - `result.json`
 - `stdout.log`
 - `stderr.log`
 - `transcript.log`
 - `child-session.jsonl`
 - `system-prompt.md`
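For reference, the layout can be expressed as a path builder. `runArtifactPaths` is a hypothetical helper, and which base directory the `.pi/subagents/runs` tree hangs off is an assumption passed in as `baseDir`.

```typescript
// File names taken from the layout above; `system-prompt.md` is added only when needed.
const RUN_FILES = [
  "meta.json",
  "events.jsonl",
  "result.json",
  "stdout.log",
  "stderr.log",
  "transcript.log",
  "child-session.jsonl",
] as const;

// Assumption: paths are POSIX-style and rooted at a caller-supplied base directory.
function runArtifactPaths(baseDir: string, runId: string, withSystemPrompt: boolean): Record<string, string> {
  const root = baseDir.replace(/\/+$/, "");
  const runDir = `${root}/.pi/subagents/runs/${runId}`;
  const paths: Record<string, string> = { runDir };
  for (const file of RUN_FILES) paths[file] = `${runDir}/${file}`;
  if (withSystemPrompt) paths["system-prompt.md"] = `${runDir}/system-prompt.md`;
  return paths;
}
```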
 ### Metadata
 Keep existing generic bookkeeping and add preset-specific fields:
 - `preset`
 - `presetSource`
 - `tools`
 - `systemPrompt`
 - `systemPromptPath`
 Do not reintroduce old bundled-role concepts or role-only behavior.
 ### Child wrapper
 `src/wrapper/cli.mjs` should again support:
 - `--append-system-prompt <path>` when preset prompt text exists
 - `--tools <csv>` when preset `tools` exists
 Keep:
 - `PI_SUBAGENTS_CHILD=1`
 - github-copilot initiator behavior based on effective model
 - best-effort artifact appends that must never block writing `result.json`
 - semantic-completion exit handling
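The two restored flags could be assembled as below. `presetCliArgs` is a hypothetical helper; only `--append-system-prompt` and `--tools` come from the spec, and skipping `--tools` for an empty list is an assumption of this sketch.

```typescript
// Hypothetical options bag; both fields are optional because presets may omit them.
interface PresetCliOptions {
  systemPromptPath?: string;
  tools?: string[];
}

function presetCliArgs(options: PresetCliOptions): string[] {
  const args: string[] = [];
  if (options.systemPromptPath) {
    args.push("--append-system-prompt", options.systemPromptPath);
  }
  // Assumption: an empty allowlist is treated like an omitted one.
  if (options.tools && options.tools.length > 0) {
    args.push("--tools", options.tools.join(","));
  }
  return args;
}
```

Returning an argument array rather than a joined string avoids shell-quoting issues when the result is handed to a process spawn call.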
 ## Extension registration behavior
 Keep existing model-registration behavior for model-dependent tools:
 - preserve current available-model order for schema enums
 - do not mutate available-model arrays when deduping cache keys
 - re-register when model set changes
 - do not re-register when model set is the same in different order
 - if the first observed set is empty, a later non-empty set must still register
 - skip tool registration entirely when `PI_SUBAGENTS_CHILD=1`
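The gate rules above can be captured with an order-insensitive cache key. This is a sketch with hypothetical names (`modelSetKey`, `RegistrationGate`); the caller is assumed to pass in whether `PI_SUBAGENTS_CHILD=1` is set.

```typescript
// Build an order-insensitive key from a copy, so the caller's array is never mutated.
function modelSetKey(models: readonly string[]): string {
  return Array.from(new Set(models)).sort().join("\n");
}

class RegistrationGate {
  private lastKey: string | undefined;

  // Returns true when tools should be (re-)registered for this model list.
  shouldRegister(models: readonly string[], isChild: boolean): boolean {
    if (isChild) return false; // child runs never register subagent tools
    if (models.length === 0) return false; // nothing to register yet; keep waiting
    const key = modelSetKey(models);
    if (key === this.lastKey) return false; // same set, possibly in a different order
    this.lastKey = key;
    return true;
  }
}
```

Because an empty list never updates `lastKey`, a later non-empty list still registers, and the original array order remains available for schema enums.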
 Register:
 - `subagent`
 - `background_agent`
 - `background_agent_status`
 `background_agent_status` does not need model parameters, but registration still follows the package’s existing model-availability gate for consistency.
 ## Prompt and documentation design
 Rewrite shipped prompts so they no longer mention `chain` mode.
 They should instead describe repeated `subagent` calls or `subagent.tasks` parallel calls, for example:
 - inspect with a preset
 - turn findings into a plan with another preset
 - implement or review in separate follow-up calls
 README and docs should describe:
 - preset directories and markdown format
 - `background_agent`
 - `background_agent_status`
 - background completion notification behavior
 - background count tracking
 - process-only behavior for detached runs
 - built-in-tool-only semantics of preset `tools`
 They should continue to avoid claiming bundled built-in roles.
 ## File-level impact
 Expected new files:
 - `src/presets.ts`
 - `src/presets.test.ts`
 - `src/background-registry.ts`
 - `src/background-registry.test.ts`
 - `src/background-schema.ts`
 - `src/background-tool.ts`
 - `src/background-tool.test.ts`
 - `src/background-status-tool.ts`
 - `src/background-status-tool.test.ts`
 Expected modified files:
 - `index.ts`
 - `src/schema.ts`
 - `src/tool.ts`
 - `src/models.ts`
 - `src/runner.ts` and/or `src/process-runner.ts`
 - `src/artifacts.ts`
 - `src/process-runner.test.ts`
 - `src/extension.test.ts`
 - `src/artifacts.test.ts`
 - `src/wrapper/cli.mjs`
 - `src/wrapper/cli.test.ts`
 - `src/wrapper/render.mjs`
 - `src/wrapper/render.test.ts`
 - `src/prompts.test.ts`
 - `README.md`
 - `prompts/*.md`
 - `AGENTS.md`
 Expected removals:
 - `src/tool-chain.test.ts`
 ## Testing plan
 Implementation should follow TDD.
 ### New coverage
 Add tests for:
 - preset discovery from global and project directories
 - project preset override by name
 - required preset selection in single and parallel mode
 - model resolution from call override vs preset default
 - error when neither call nor preset supplies a valid model
 - chain removal from schema and runtime
 - detached background launch returning immediately
 - background registry counts
 - session-entry persistence and reload reconstruction
 - completion notification emitting UI notify + visible session message without auto-turn
 - polling one background run and many background runs
 ### Preserved coverage
 Keep regression coverage for:
 - child sessions skipping subagent tool registration when `PI_SUBAGENTS_CHILD=1`
 - no tool registration when no models are available
 - later registration when a non-empty model list appears
 - no re-registration for the same model set in different order
 - re-registration when the model set changes
 - github-copilot initiator behavior
 - best-effort artifact logging never preventing `result.json` writes
 - effective model using the resolved model when requested/resolved differ
 ## Risks and mitigations
 ### Risk: preset `tools` sounds broader than Pi can enforce
 Mitigation:
 - document that preset `tools` maps to Pi CLI `--tools`
 - treat it as a built-in tool allowlist only
 - keep `PI_SUBAGENTS_CHILD=1` so this package never re-registers its own subagent tools inside child runs
 ### Risk: detached watcher state lost on reload
 Mitigation:
 - persist launch/update events as session custom entries
 - rebuild registry on `session_start`
 - reattach watchers for non-terminal runs
 ### Risk: background notifications spam the user
 Mitigation:
 - emit one terminal notification per run
 - keep footer count compact
 - require explicit polling for detailed inspection
 ### Risk: larger extension entrypoint
 Mitigation:
 - keep preset discovery, registry/state, and tool definitions in separate focused modules
 - keep runner-specific logic in existing runner files with minimal changes