docs: add presets/background agents spec
@@ -0,0 +1,440 @@
# Subagent presets and background agents design

Date: 2026-04-12
Status: approved for planning

## Summary

Evolve `pi-subagents` from generic foreground-only child runs into a preset-driven delegation package with three tools:
- `subagent` for single or parallel foreground runs
- `background_agent` for detached process-backed runs
- `background_agent_status` for polling detached-run status and counts

Named customization comes back as markdown presets, but **without** bundled built-in roles. Presets live in:
- global: `~/.pi/agent/subagents/*.md`
- project: nearest `.pi/subagents/*.md` found by walking up from the current `cwd`

Project presets override global presets by name. Calls must name a preset. The only preset setting a call can override is `model`; prompt and tool access come only from the preset.

## Current state

Today the package has:
- one `subagent` tool with `task | tasks | chain`
- no background-run registry or polling tool
- no preset discovery
- no wrapper support for preset-owned prompt/tool restrictions
- prompt templates that still assume `chain`

The old role system was removed entirely, including markdown discovery, `--tools`, and `--append-system-prompt` wiring. This change brings back the useful customization pieces without reintroducing bundled roles or old role-specific behavior.

## Goals

- Remove `chain` mode from `subagent`.
- Keep foreground single-run and parallel-run delegation.
- Add named preset discovery from global and project markdown directories.
- Let presets define appended prompt text, built-in tool allowlist, and optional default model.
- Add `background_agent` as a process-only detached launcher.
- Add `background_agent_status` so the parent agent can poll one run or many runs.
- Track background-run counts and surface completion through UI notification plus a visible session message.
- Preserve existing runner behavior for foreground runs and keep runner-specific changes minimal.

## Non-goals

- No bundled built-in presets (`scout`, `planner`, `reviewer`, `worker`, etc.).
- No markdown discovery from `.pi/agents`.
- No inline prompt or tool overrides on tool calls.
- No background tmux mode. `background_agent` always uses the process runner.
- No automatic follow-up turn when a background run finishes.
- No attempt to restrict third-party extension tools beyond what Pi CLI already supports.

## Chosen approach

Add a small preset layer and a small background-run state layer on top of the current runner/wrapper core.

- foreground `subagent` becomes preset-aware and loses `chain`
- wrapper/artifacts regain preset-owned prompt/tool support
- background runs reuse process-launch mechanics but skip `monitorRun()` in the calling tool
- extension-owned registry watches detached runs, persists state to session custom entries, updates footer status text, emits UI notifications, and injects visible completion messages into session history

This keeps most existing code paths intact while restoring customization and adding detached orchestration.

## Public API design

### `subagent`

Supported modes:

#### Single mode

Required:
- `preset`
- `task`

Optional:
- `model`
- `cwd`

#### Parallel mode

Required:
- `tasks: Array<{ preset: string; task: string; model?: string; cwd?: string }>`

Notes:
- each parallel item names its own preset
- there is no top-level default preset
- there is no top-level required model

Removed:
- `chain`
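
The two supported modes can be read as one list of run specs. The following is a minimal sketch of that normalization, not the package's actual validation code: `RunSpec` and `normalizeSubagentParams` are hypothetical names, and only the field names (`preset`, `task`, `model`, `cwd`, `tasks`, `chain`) come from this spec.

```ts
// Hypothetical parameter shape; only the field names come from the spec.
interface RunSpec {
  preset: string;
  task: string;
  model?: string;
  cwd?: string;
}

// Flatten single-mode or parallel-mode input into a list of run specs.
function normalizeSubagentParams(input: Record<string, unknown>): RunSpec[] {
  if ("chain" in input) {
    throw new Error("chain mode was removed; use repeated calls or tasks");
  }
  const tasks = input["tasks"];
  const items: unknown[] = Array.isArray(tasks) ? tasks : [input];
  return items.map((item, index) => {
    const spec = item as Partial<RunSpec>;
    // Every run must name its own preset; there is no top-level default.
    if (typeof spec.preset !== "string" || spec.preset === "") {
      throw new Error("run " + index + ": preset is required");
    }
    if (typeof spec.task !== "string" || spec.task === "") {
      throw new Error("run " + index + ": task is required");
    }
    return { preset: spec.preset, task: spec.task, model: spec.model, cwd: spec.cwd };
  });
}
```

In practice the tool schema would probably reject `chain` before runtime code sees it; the runtime check here only illustrates the intended behavior.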

Model resolution order per run:
1. call-level `model`
2. preset `model`
3. error: no model resolved
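
The resolution order above can be sketched as a small pure function. `resolveModel` is a hypothetical name, and the sketch deliberately leaves out normalization against the available-model list, which the runtime design handles as a separate step.

```ts
// Resolve the effective model for one run: the call-level override wins,
// then the preset default; with neither, the run fails.
function resolveModel(callModel: string | undefined, presetModel: string | undefined): string {
  const model = callModel ?? presetModel;
  if (model === undefined) {
    throw new Error("no model resolved: pass `model` or set one in the preset");
  }
  return model;
}
```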

### `background_agent`

Single-run only.

Required:
- `preset`
- `task`

Optional:
- `model`
- `cwd`

Behavior:
- always launches with the process runner, ignoring tmux config
- returns immediately after requesting the spawn
- returns run handle metadata plus counts snapshot
- does not wait for completion

### `background_agent_status`

Purpose:
- let the main agent poll background runs and inspect counts

Parameters:
- `runId?`: inspect one run
- `includeCompleted?`: defaults to `false`; when omitted or `false`, only active runs are listed, unless `runId` is provided

Returns:
- counts: `running`, `completed`, `failed`, `aborted`, `total`
- per-run rows with preset, task, cwd, model info, timestamps, artifact paths, and status
- final result fields when a run is terminal
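
One way to read the parameter rules is as a filter over registry rows. This is a sketch under assumptions: `RunRow`, `StatusQuery`, and `selectRuns` are hypothetical names, and it assumes `includeCompleted` covers every terminal status (`completed`, `failed`, `aborted`), which the spec does not state explicitly.

```ts
type RunStatus = "running" | "completed" | "failed" | "aborted";

// Hypothetical row shape, trimmed to the fields the filter needs.
interface RunRow {
  runId: string;
  preset: string;
  status: RunStatus;
}

interface StatusQuery {
  runId?: string;
  includeCompleted?: boolean;
}

// Select the rows a status call would report.
function selectRuns(runs: RunRow[], query: StatusQuery): RunRow[] {
  if (query.runId !== undefined) {
    // A specific run is returned whatever its status.
    return runs.filter((run) => run.runId === query.runId);
  }
  if (query.includeCompleted === true) {
    return runs.slice();
  }
  // Default: active runs only.
  return runs.filter((run) => run.status === "running");
}
```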

## Preset design

### Discovery

Load presets from two sources:
- global: `join(getAgentDir(), "subagents")`
- project: nearest ancestor directory containing `.pi/subagents`

Merge order:
1. global presets
2. project presets override global presets with the same `name`

No confirmation gate for project presets.
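
The merge order can be illustrated with a short sketch. The `Preset` shape and `mergePresets` are hypothetical; the sketch assumes both directories have already been read and parsed, and that each preset records which source it came from.

```ts
// Hypothetical in-memory preset shape; `source` records where it was found.
interface Preset {
  name: string;
  description: string;
  model?: string;
  tools?: string[];
  prompt: string;
  source: "global" | "project";
}

// Merge by name: global presets load first, then project presets replace
// any global preset that shares the same name.
function mergePresets(globalPresets: Preset[], projectPresets: Preset[]): Record<string, Preset> {
  const merged: Record<string, Preset> = {};
  for (const preset of globalPresets) {
    merged[preset.name] = preset;
  }
  for (const preset of projectPresets) {
    merged[preset.name] = preset;
  }
  return merged;
}
```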

### File format

Each preset is one markdown file with frontmatter and body.

Required frontmatter:
- `name`
- `description`

Optional frontmatter:
- `model`
- `tools`

Body:
- appended system prompt text

Example:

```md
---
name: repo-scout
description: Fast repo exploration
model: github-copilot/gpt-4o
tools: read,grep,find,ls
---
You are a scout. Explore quickly, summarize clearly, and avoid implementation.
```

### Preset semantics

- `model` is the default model when the call does not provide one.
- `tools` is optional.
- omitted `tools` means normal child-tool behavior (no built-in tool restriction)
- when `tools` is present, pass it through Pi CLI `--tools`, which limits built-in tools only
- prompt text comes only from the markdown body; no inline prompt override
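
To make the file format and semantics concrete, here is a minimal parsing sketch. It is an assumption-laden illustration, not the planned `src/presets.ts`: `ParsedPreset` and `parsePreset` are hypothetical names, and it handles only flat `key: value` frontmatter lines rather than full YAML.

```ts
// Hypothetical parsed shape; `tools` stays undefined when the key is omitted,
// which corresponds to "no built-in tool restriction".
interface ParsedPreset {
  name: string;
  description: string;
  model?: string;
  tools?: string[];
  prompt: string;
}

function parsePreset(markdown: string): ParsedPreset {
  // Split the leading `---` fenced frontmatter from the markdown body.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (match === null) {
    throw new Error("preset is missing frontmatter");
  }
  const fields: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  if (!fields["name"] || !fields["description"]) {
    throw new Error("preset requires name and description");
  }
  const preset: ParsedPreset = {
    name: fields["name"],
    description: fields["description"],
    prompt: match[2].trim(),
  };
  if (fields["model"]) preset.model = fields["model"];
  if (fields["tools"]) {
    preset.tools = fields["tools"].split(",").map((t) => t.trim()).filter((t) => t !== "");
  }
  return preset;
}
```

A real implementation would likely use a proper frontmatter parser; the point here is only that the prompt comes from the body and that `tools` is either a list or absent.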

## Runtime design

### Foreground subagent execution

`src/tool.ts` becomes preset-aware.

For each run:
1. discover presets
2. resolve the named preset
3. normalize explicit `model` override against available models if present
4. normalize preset default model against available models if used
5. compute effective model from call override or preset default
6. pass runner metadata including preset, prompt text, built-in tool allowlist, and model selection

Parallel behavior stays the same apart from:
- no `chain`
- each task resolving its own preset/model
- summary lines identifying tasks by index and/or preset, not old role names

### Background execution

`background_agent` launches the wrapper via process-runner primitives but returns immediately.

Flow:
1. resolve preset + effective model
2. create run artifacts
3. spawn wrapper process
4. register run in extension background registry as `running`
5. append persistent session entry for the new run
6. start detached watcher on that run’s `result.json` / `events.jsonl`
7. return handle metadata and counts snapshot

Background runs are process-only even when normal `subagent` foreground runs are configured for tmux.

### Background registry

The extension owns a session-scoped registry keyed by `runId`.

Stored metadata per run:
- `runId`
- `preset`
- `task`
- `cwd`
- `requestedModel`
- `resolvedModel`
- artifact paths
- timestamps
- terminal result fields when available
- status: `running | completed | failed | aborted`

The registry also computes counts:
- `running`
- `completed`
- `failed`
- `aborted`
- `total`
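
The counts are a straightforward tally over run statuses. A minimal sketch, with `Counts` and `computeCounts` as hypothetical names:

```ts
type RunStatus = "running" | "completed" | "failed" | "aborted";

interface Counts {
  running: number;
  completed: number;
  failed: number;
  aborted: number;
  total: number;
}

// Tally one bucket per status plus an overall total.
function computeCounts(statuses: RunStatus[]): Counts {
  const counts: Counts = { running: 0, completed: 0, failed: 0, aborted: 0, total: 0 };
  for (const status of statuses) {
    counts[status] += 1;
    counts.total += 1;
  }
  return counts;
}
```

Deriving counts from the registry on demand, rather than storing them, avoids a second piece of state that could drift after a reload.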

### Persistence and reload behavior

Persist background state with session custom entries.

Custom entry types:
- `pi-subagents:bg-run`: initial launch metadata
- `pi-subagents:bg-update`: later status/result transitions

On `session_start`, rebuild the in-memory registry by scanning `ctx.sessionManager.getEntries()`.

For rebuilt runs that are still non-terminal:
- if `result.json` already exists, ingest it immediately
- otherwise reattach a watcher so completion still updates counts, notifications, and session messages after reload/resume
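
The rebuild step amounts to replaying the two entry types in order. The sketch below assumes a simplified entry shape (`customType` plus `data`) that is not confirmed by this spec, assumes unrelated session entries have already been filtered out, and uses hypothetical names throughout; only the two entry type strings come from the text above.

```ts
type RunStatus = "running" | "completed" | "failed" | "aborted";

interface RunRecord {
  runId: string;
  preset: string;
  status: RunStatus;
}

// Assumed entry shape; the real session entry format may differ.
type BackgroundEntry =
  | { customType: "pi-subagents:bg-run"; data: { runId: string; preset: string } }
  | { customType: "pi-subagents:bg-update"; data: { runId: string; status: RunStatus } };

// Replay launch and update entries in order to rebuild the registry.
function rebuildRegistry(entries: BackgroundEntry[]): Record<string, RunRecord> {
  const registry: Record<string, RunRecord> = {};
  for (const entry of entries) {
    if (entry.customType === "pi-subagents:bg-run") {
      registry[entry.data.runId] = { runId: entry.data.runId, preset: entry.data.preset, status: "running" };
    } else {
      const existing = registry[entry.data.runId];
      // Ignore updates for runs whose launch entry was never seen.
      if (existing !== undefined) existing.status = entry.data.status;
    }
  }
  return registry;
}

// Runs that still need a `result.json` check or a reattached watcher.
function nonTerminalRunIds(registry: Record<string, RunRecord>): string[] {
  return Object.keys(registry).filter((runId) => registry[runId].status === "running");
}
```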

## Notification and polling design

### Completion notification

When a detached run becomes terminal:
1. update registry and counts
2. append `pi-subagents:bg-update`
3. update footer status text, e.g. `bg: 2 running / 5 total`
4. emit UI notification if UI is available
5. inject a visible custom session message describing completion

The completion message must **not** trigger a new agent turn automatically.
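
The completion sequence can be sketched against an injected set of side-effect functions. Everything named here is hypothetical: `CompletionSinks`, its methods, and the `triggerTurn` option are stand-ins for whatever the extension API actually provides. The sketch assumes step 1 (registry and counts update) has already happened, so it receives the updated counts.

```ts
type TerminalStatus = "completed" | "failed" | "aborted";

// Hypothetical side-effect surface; not the real extension API.
interface CompletionSinks {
  appendEntry(type: string, data: { runId: string; status: TerminalStatus }): void;
  setFooter(text: string): void;
  notify?: (message: string) => void; // absent when no UI is available
  postMessage(text: string, options: { triggerTurn: boolean }): void;
}

function handleTerminalRun(
  runId: string,
  status: TerminalStatus,
  counts: { running: number; total: number },
  sinks: CompletionSinks,
): void {
  sinks.appendEntry("pi-subagents:bg-update", { runId, status });
  sinks.setFooter(`bg: ${counts.running} running / ${counts.total} total`);
  if (sinks.notify !== undefined) {
    sinks.notify(`Background run ${runId} ${status}`);
  }
  // Visible in session history, but must not start a new agent turn.
  sinks.postMessage(`Background run ${runId} ${status}.`, { triggerTurn: false });
}
```

Passing the sinks in keeps the ordering and the no-auto-turn rule testable without a live session.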

### Polling

The parent agent polls with `background_agent_status`.

Typical use:
- ask for current running count
- list active background runs
- inspect one `runId`
- fetch terminal result summary after notification arrives

## Wrapper and artifact design

### Artifact layout

Keep run directories under:
- `.pi/subagents/runs/<runId>/`

Keep existing files and restore the removed prompt artifact when needed:
- `meta.json`
- `events.jsonl`
- `result.json`
- `stdout.log`
- `stderr.log`
- `transcript.log`
- `child-session.jsonl`
- `system-prompt.md`

### Metadata

Keep existing generic bookkeeping and add preset-specific fields:
- `preset`
- `presetSource`
- `tools`
- `systemPrompt`
- `systemPromptPath`

Do not reintroduce old bundled-role concepts or role-only behavior.

### Child wrapper

`src/wrapper/cli.mjs` should again support:
- `--append-system-prompt <path>` when preset prompt text exists
- `--tools <csv>` when preset `tools` exists
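
As an illustration of how the two flags depend on preset content, here is a small argument-builder sketch. `PresetLaunchOptions` and `buildPresetArgs` are hypothetical names, and treating an empty `tools` list the same as an omitted one is an assumption rather than something this spec decides.

```ts
// Hypothetical launch options derived from the resolved preset.
interface PresetLaunchOptions {
  systemPromptPath?: string; // path to the written system-prompt.md, if any
  tools?: string[]; // built-in tool allowlist from the preset, if any
}

// Build only the preset-related CLI arguments for the child run.
function buildPresetArgs(options: PresetLaunchOptions): string[] {
  const args: string[] = [];
  if (options.systemPromptPath !== undefined) {
    args.push("--append-system-prompt", options.systemPromptPath);
  }
  // Assumption: an empty list is treated like an omitted `tools` key.
  if (options.tools !== undefined && options.tools.length > 0) {
    args.push("--tools", options.tools.join(","));
  }
  return args;
}
```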

Keep:
- `PI_SUBAGENTS_CHILD=1`
- github-copilot initiator behavior based on effective model
- best-effort artifact appends that must never block writing `result.json`
- semantic-completion exit handling

## Extension registration behavior

Keep existing model-registration behavior for model-dependent tools:
- preserve current available-model order for schema enums
- do not mutate available-model arrays when deduping cache keys
- re-register when model set changes
- do not re-register when model set is the same in different order
- if the first observed set is empty, a later non-empty set must still register
- skip tool registration entirely when `PI_SUBAGENTS_CHILD=1`
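
These rules can be expressed with an order-insensitive cache key. The sketch below is one possible shape, not the existing implementation: `modelSetKey` and `shouldRegister` are hypothetical names, and the `isChild` flag stands in for checking `PI_SUBAGENTS_CHILD=1`.

```ts
// Build an order-insensitive key without touching the caller's array,
// so the original order remains available for schema enums.
function modelSetKey(models: readonly string[]): string {
  const sorted = models.slice().sort();
  const unique = sorted.filter((model, index) => index === 0 || sorted[index - 1] !== model);
  return JSON.stringify(unique);
}

// Decide whether tools should be (re-)registered for the observed models.
// `previousKey` is undefined until a non-empty set has been registered.
function shouldRegister(previousKey: string | undefined, models: readonly string[], isChild: boolean): boolean {
  if (isChild) return false; // child runs never register these tools
  if (models.length === 0) return false; // wait for a non-empty set
  return modelSetKey(models) !== previousKey;
}
```

Because an empty set never updates `previousKey`, a later non-empty set still compares as different and triggers registration.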

Register:
- `subagent`
- `background_agent`
- `background_agent_status`

`background_agent_status` does not need model parameters, but registration still follows the package’s existing model-availability gate for consistency.

## Prompt and documentation design

Rewrite shipped prompts so they no longer mention `chain` mode.

They should instead describe repeated `subagent` calls or `subagent.tasks` parallel calls, for example:
- inspect with a preset
- turn findings into a plan with another preset
- implement or review in separate follow-up calls

README and docs should describe:
- preset directories and markdown format
- `background_agent`
- `background_agent_status`
- background completion notification behavior
- background count tracking
- process-only behavior for detached runs
- built-in-tool-only semantics of preset `tools`

They should continue to avoid claiming bundled built-in roles.

## File-level impact

Expected new files:
- `src/presets.ts`
- `src/presets.test.ts`
- `src/background-registry.ts`
- `src/background-registry.test.ts`
- `src/background-schema.ts`
- `src/background-tool.ts`
- `src/background-tool.test.ts`
- `src/background-status-tool.ts`
- `src/background-status-tool.test.ts`

Expected modified files:
- `index.ts`
- `src/schema.ts`
- `src/tool.ts`
- `src/models.ts`
- `src/runner.ts` and/or `src/process-runner.ts`
- `src/artifacts.ts`
- `src/process-runner.test.ts`
- `src/extension.test.ts`
- `src/artifacts.test.ts`
- `src/wrapper/cli.mjs`
- `src/wrapper/cli.test.ts`
- `src/wrapper/render.mjs`
- `src/wrapper/render.test.ts`
- `src/prompts.test.ts`
- `README.md`
- `prompts/*.md`
- `AGENTS.md`

Expected removals:
- `src/tool-chain.test.ts`

## Testing plan

Implementation should follow TDD.

### New coverage

Add tests for:
- preset discovery from global and project directories
- project preset override by name
- required preset selection in single and parallel mode
- model resolution from call override vs preset default
- error when neither call nor preset supplies a valid model
- chain removal from schema and runtime
- detached background launch returning immediately
- background registry counts
- session-entry persistence and reload reconstruction
- completion notification emitting UI notify + visible session message without auto-turn
- polling one background run and many background runs

### Preserved coverage

Keep regression coverage for:
- child sessions skipping subagent tool registration when `PI_SUBAGENTS_CHILD=1`
- no tool registration when no models are available
- later registration when a non-empty model list appears
- no re-registration for the same model set in different order
- re-registration when the model set changes
- github-copilot initiator behavior
- best-effort artifact logging never preventing `result.json` writes
- effective model using the resolved model when requested/resolved differ

## Risks and mitigations

### Risk: preset `tools` sounds broader than Pi can enforce

Mitigation:
- document that preset `tools` maps to Pi CLI `--tools`
- treat it as a built-in tool allowlist only
- keep `PI_SUBAGENTS_CHILD=1` so this package never re-registers its own subagent tools inside child runs

### Risk: detached watcher state lost on reload

Mitigation:
- persist launch/update events as session custom entries
- rebuild registry on `session_start`
- reattach watchers for non-terminal runs

### Risk: background notifications spam the user

Mitigation:
- emit one terminal notification per run
- keep footer count compact
- require explicit polling for detailed inspection

### Risk: larger extension entrypoint

Mitigation:
- keep preset discovery, registry/state, and tool definitions in separate focused modules
- keep runner-specific logic in existing runner files with minimal changes