docs: add presets/background agents spec
# Subagent presets and background agents design

Date: 2026-04-12
Status: approved for planning

## Summary

Evolve `pi-subagents` from generic foreground-only child runs into a preset-driven delegation package with three tools:
- `subagent` for single or parallel foreground runs
- `background_agent` for detached process-backed runs
- `background_agent_status` for polling detached-run status and counts

Named customization comes back as markdown presets, but **without** bundled built-in roles. Presets live in:
- global: `~/.pi/agent/subagents/*.md`
- project: nearest `.pi/subagents/*.md` found by walking up from the current `cwd`

Project presets override global presets by name. Calls must name a preset. Per-call overrides are limited to `model`; prompt and tool access come only from the preset.

## Current state

Today the package has:
- one `subagent` tool with `task | tasks | chain`
- no background-run registry or polling tool
- no preset discovery
- no wrapper support for preset-owned prompt/tool restrictions
- prompt templates that still assume `chain`

The old role system was removed entirely, including markdown discovery, `--tools`, and `--append-system-prompt` wiring. This change brings back the useful customization pieces without reintroducing bundled roles or old role-specific behavior.

## Goals

- Remove `chain` mode from `subagent`.
- Keep foreground single-run and parallel-run delegation.
- Add named preset discovery from global and project markdown directories.
- Let presets define appended prompt text, built-in tool allowlist, and optional default model.
- Add `background_agent` as a process-only detached launcher.
- Add `background_agent_status` so the parent agent can poll one run or many runs.
- Track background-run counts and surface completion through UI notification plus a visible session message.
- Preserve existing runner behavior for foreground runs and keep runner-specific changes minimal.

## Non-goals

- No bundled built-in presets (`scout`, `planner`, `reviewer`, `worker`, etc.).
- No markdown discovery from `.pi/agents`.
- No inline prompt or tool overrides on tool calls.
- No background tmux mode. `background_agent` always uses the process runner.
- No automatic follow-up turn when a background run finishes.
- No attempt to restrict third-party extension tools beyond what Pi CLI already supports.

## Chosen approach

Add a small preset layer and a small background-run state layer on top of the current runner/wrapper core.

- foreground `subagent` becomes preset-aware and loses `chain`
- wrapper/artifacts regain preset-owned prompt/tool support
- background runs reuse process-launch mechanics but skip `monitorRun()` in the calling tool
- extension-owned registry watches detached runs, persists state to session custom entries, updates footer status text, emits UI notifications, and injects visible completion messages into session history

This keeps most existing code paths intact while restoring customization and adding detached orchestration.

## Public API design

### `subagent`

Supported modes:

#### Single mode

Required:
- `preset`
- `task`

Optional:
- `model`
- `cwd`

#### Parallel mode

Required:
- `tasks: Array<{ preset: string; task: string; model?: string; cwd?: string }>`

Notes:
- each parallel item names its own preset
- there is no top-level default preset
- there is no top-level required model

Removed:
- `chain`

Model resolution order per run:
1. call-level `model`
2. preset `model`
3. error: no model resolved
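
The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the package's implementation: `PresetLike`, `resolveModel`, and the error text are hypothetical names, and normalization against the available-model list is left out.

```typescript
// Hypothetical shape: only the fields needed for model resolution.
interface PresetLike {
  name: string;
  model?: string;
}

// Returns the effective model for one run, following the documented order.
function resolveModel(preset: PresetLike, callModel?: string): string {
  // 1. A call-level `model` wins when it is provided and non-empty.
  if (callModel !== undefined && callModel.trim() !== "") {
    return callModel.trim();
  }
  // 2. Otherwise fall back to the preset's default model.
  if (preset.model !== undefined && preset.model.trim() !== "") {
    return preset.model.trim();
  }
  // 3. Neither source supplied a model, so the run fails.
  throw new Error(`No model resolved for preset "${preset.name}"`);
}
```

In parallel mode the same function would presumably be applied once per item in `tasks`, since there is no top-level default preset or model.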

### `background_agent`

Single-run only.

Required:
- `preset`
- `task`

Optional:
- `model`
- `cwd`

Behavior:
- always launches with the process runner, ignoring tmux config
- returns immediately after spawn request
- returns run handle metadata plus counts snapshot
- does not wait for completion

### `background_agent_status`

Purpose:
- let the main agent poll background runs and inspect counts

Parameters:
- `runId?`: inspect one run
- `includeCompleted?`: default `false`; when omitted, only active runs are listed unless `runId` is provided

Returns:
- counts: `running`, `completed`, `failed`, `aborted`, `total`
- per-run rows with preset, task, cwd, model info, timestamps, artifact paths, and status
- final result fields when a run is terminal
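
The parameter semantics above can be illustrated with a small filter over run rows. This is a sketch under assumptions: the row shape is trimmed to a few fields, `queryStatus` and `countRuns` are hypothetical names, counts are assumed to cover every known run regardless of filtering, and `includeCompleted` is assumed to include all terminal states (`completed`, `failed`, `aborted`).

```typescript
type RunStatus = "running" | "completed" | "failed" | "aborted";

// Trimmed, hypothetical row shape; the real rows carry more fields.
interface RunRow {
  runId: string;
  preset: string;
  task: string;
  status: RunStatus;
}

interface Counts {
  running: number;
  completed: number;
  failed: number;
  aborted: number;
  total: number;
}

interface StatusParams {
  runId?: string;
  includeCompleted?: boolean;
}

function countRuns(rows: RunRow[]): Counts {
  const counts: Counts = { running: 0, completed: 0, failed: 0, aborted: 0, total: rows.length };
  for (const row of rows) counts[row.status] += 1;
  return counts;
}

function queryStatus(rows: RunRow[], params: StatusParams = {}): { counts: Counts; runs: RunRow[] } {
  // Assumption: counts always describe every known run.
  const counts = countRuns(rows);
  if (params.runId !== undefined) {
    // A specific run is returned even when it is already terminal.
    return { counts, runs: rows.filter((r) => r.runId === params.runId) };
  }
  const runs = params.includeCompleted ? rows : rows.filter((r) => r.status === "running");
  return { counts, runs };
}
```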

## Preset design

### Discovery

Load presets from two sources:
- global: `join(getAgentDir(), "subagents")`
- project: nearest ancestor directory containing `.pi/subagents`
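
The "nearest ancestor" lookup for project presets can be sketched as a walk up the directory tree. The helper below is hypothetical and deliberately filesystem-free: `dirExists` is an injected predicate, and `cwd` is assumed to be an absolute POSIX-style path. A real implementation would more likely rely on Node's `path.dirname` and `fs.existsSync`.

```typescript
// Hypothetical helper: returns the nearest `.pi/subagents` directory at or above `cwd`.
function findProjectPresetDir(
  cwd: string,
  dirExists: (path: string) => boolean,
): string | undefined {
  // Strip trailing slashes so "/repo/" and "/repo" behave the same.
  let current = cwd.replace(/\/+$/, "");
  for (;;) {
    const candidate = `${current}/.pi/subagents`;
    if (dirExists(candidate)) return candidate;
    // An empty string stands for the filesystem root, which was just checked.
    if (current === "") return undefined;
    const idx = current.lastIndexOf("/");
    current = idx <= 0 ? "" : current.slice(0, idx);
  }
}
```

Because the walk stops at the first match, a nested package with its own `.pi/subagents` shadows a directory of the same name at the repository root.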

Merge order:
1. global presets
2. project presets override global presets with the same `name`

No confirmation gate for project presets.
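
The merge order amounts to a keyed overwrite. A minimal sketch, assuming a hypothetical `Preset` shape with a `source` tag (the spec's `presetSource` metadata field suggests something similar, but the exact shape is an assumption):

```typescript
// Hypothetical preset shape for illustration.
interface Preset {
  name: string;
  description: string;
  model?: string;
  tools?: string[];
  prompt: string;
  source: "global" | "project";
}

function mergePresets(globalPresets: Preset[], projectPresets: Preset[]): Map<string, Preset> {
  const merged = new Map<string, Preset>();
  // 1. Global presets go in first.
  for (const preset of globalPresets) merged.set(preset.name, preset);
  // 2. Project presets replace any global preset with the same name.
  for (const preset of projectPresets) merged.set(preset.name, preset);
  return merged;
}
```

Replacement is whole-preset rather than field-by-field in this sketch, so a project preset that omits `model` does not inherit the global preset's model.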

### File format

Each preset is one markdown file with frontmatter and body.

Required frontmatter:
- `name`
- `description`

Optional frontmatter:
- `model`
- `tools`

Body:
- appended system prompt text

Example:

```md
---
name: repo-scout
description: Fast repo exploration
model: github-copilot/gpt-4o
tools: read,grep,find,ls
---
You are a scout. Explore quickly, summarize clearly, and avoid implementation.
```
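
A preset file in this format could be parsed roughly as follows. This is a sketch, not the package's parser: it handles only flat `key: value` frontmatter lines rather than full YAML, and `ParsedPreset` and `parsePreset` are hypothetical names.

```typescript
interface ParsedPreset {
  name: string;
  description: string;
  model?: string;
  tools?: string[];
  prompt: string;
}

function parsePreset(markdown: string): ParsedPreset {
  // Frontmatter is the block between the first two `---` lines; the rest is the body.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("Preset is missing frontmatter");
  const header = match[1] ?? "";
  const body = match[2] ?? "";

  const fields: Record<string, string> = {};
  for (const line of header.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // ignore lines that are not `key: value`
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }

  const name = fields["name"];
  const description = fields["description"];
  if (!name || !description) {
    throw new Error("Preset frontmatter requires `name` and `description`");
  }

  // `tools` is a comma-separated list; omitted means no built-in tool restriction.
  const rawTools = fields["tools"];
  const tools = rawTools
    ? rawTools.split(",").map((t) => t.trim()).filter((t) => t !== "")
    : undefined;

  return { name, description, model: fields["model"] || undefined, tools, prompt: body.trim() };
}
```

Applied to the `repo-scout` example, this yields four tools and the scout sentence as the prompt text.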

### Preset semantics

- `model` is the default model when the call does not provide one.
- `tools` is optional.
- omitted `tools` means normal child-tool behavior (no built-in tool restriction)
- when `tools` is present, pass it through Pi CLI `--tools`, which limits built-in tools only
- prompt text comes only from the markdown body; no inline prompt override

## Runtime design

### Foreground subagent execution

`src/tool.ts` becomes preset-aware.

For each run:
1. discover presets
2. resolve the named preset
3. normalize explicit `model` override against available models if present
4. normalize preset default model against available models if used
5. compute effective model from call override or preset default
6. pass runner metadata including preset, prompt text, built-in tool allowlist, and model selection

Parallel behavior stays the same apart from:
- no `chain`
- each task resolving its own preset/model
- summary lines identifying tasks by index and/or preset, not old role names

### Background execution

`background_agent` launches the wrapper via process-runner primitives but returns immediately.

Flow:
1. resolve preset + effective model
2. create run artifacts
3. spawn wrapper process
4. register run in extension background registry as `running`
5. append persistent session entry for the new run
6. start detached watcher on that run’s `result.json` / `events.jsonl`
7. return handle metadata and counts snapshot

Background runs are process-only even when normal `subagent` foreground runs are configured for tmux.

### Background registry

The extension owns a session-scoped registry keyed by `runId`.

Stored metadata per run:
- `runId`
- `preset`
- `task`
- `cwd`
- `requestedModel`
- `resolvedModel`
- artifact paths
- timestamps
- terminal result fields when available
- status: `running | completed | failed | aborted`

The registry also computes counts:
- `running`
- `completed`
- `failed`
- `aborted`
- `total`
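
A registry along these lines could be a thin wrapper around a `Map`. The class below is a hypothetical sketch: it stores a reduced subset of the listed metadata (no artifact paths, timestamps, or result fields), and the method names are illustrative.

```typescript
type RunStatus = "running" | "completed" | "failed" | "aborted";
type TerminalStatus = Exclude<RunStatus, "running">;

// Reduced, hypothetical record; the spec lists more fields per run.
interface BackgroundRun {
  runId: string;
  preset: string;
  task: string;
  cwd: string;
  requestedModel?: string;
  resolvedModel: string;
  status: RunStatus;
}

class BackgroundRegistry {
  private readonly runs = new Map<string, BackgroundRun>();

  // New runs always start as `running`.
  register(run: Omit<BackgroundRun, "status">): BackgroundRun {
    const stored: BackgroundRun = { ...run, status: "running" };
    this.runs.set(stored.runId, stored);
    return stored;
  }

  // Returns true only for the first running -> terminal transition,
  // which lets a caller emit exactly one completion notification per run.
  markTerminal(runId: string, status: TerminalStatus): boolean {
    const run = this.runs.get(runId);
    if (run === undefined || run.status !== "running") return false;
    run.status = status;
    return true;
  }

  counts(): Record<RunStatus | "total", number> {
    const counts = { running: 0, completed: 0, failed: 0, aborted: 0, total: this.runs.size };
    this.runs.forEach((run) => {
      counts[run.status] += 1;
    });
    return counts;
  }
}
```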

### Persistence and reload behavior

Persist background state with session custom entries.

Custom entry types:
- `pi-subagents:bg-run`: initial launch metadata
- `pi-subagents:bg-update`: later status/result transitions

On `session_start`, rebuild the in-memory registry by scanning `ctx.sessionManager.getEntries()`.

For rebuilt runs that are still non-terminal:
- if `result.json` already exists, ingest it immediately
- otherwise reattach a watcher so completion still updates counts, notifications, and session messages after reload/resume
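
The reload path can be sketched as a fold over the session entries followed by a decision per non-terminal run. Everything about the entry shape here is an assumption: `SessionEntryLike` with `customType` and `data` fields is a hypothetical stand-in for whatever `ctx.sessionManager.getEntries()` actually returns, and `resultExists` is injected in place of a real check for `result.json`.

```typescript
type RunStatus = "running" | "completed" | "failed" | "aborted";

// Hypothetical entry shape; the real session entry type may differ.
interface SessionEntryLike {
  customType: string;
  data: Record<string, unknown>;
}

interface RebuiltRun {
  runId: string;
  preset: string;
  task: string;
  status: RunStatus;
}

function isRunStatus(value: unknown): value is RunStatus {
  return value === "running" || value === "completed" || value === "failed" || value === "aborted";
}

// Replays launch and update entries in order to reconstruct the registry.
function rebuildRegistry(entries: SessionEntryLike[]): Map<string, RebuiltRun> {
  const runs = new Map<string, RebuiltRun>();
  for (const entry of entries) {
    const runId = String(entry.data["runId"] ?? "");
    if (runId === "") continue;
    if (entry.customType === "pi-subagents:bg-run") {
      runs.set(runId, {
        runId,
        preset: String(entry.data["preset"] ?? ""),
        task: String(entry.data["task"] ?? ""),
        status: "running",
      });
    } else if (entry.customType === "pi-subagents:bg-update") {
      const run = runs.get(runId);
      const status = entry.data["status"];
      if (run !== undefined && isRunStatus(status)) run.status = status;
    }
  }
  return runs;
}

// Splits non-terminal runs into "ingest result.json now" and "reattach a watcher".
function planReattach(
  runs: Map<string, RebuiltRun>,
  resultExists: (runId: string) => boolean,
): { ingest: string[]; watch: string[] } {
  const plan = { ingest: [] as string[], watch: [] as string[] };
  runs.forEach((run) => {
    if (run.status !== "running") return;
    (resultExists(run.runId) ? plan.ingest : plan.watch).push(run.runId);
  });
  return plan;
}
```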

## Notification and polling design

### Completion notification

When a detached run becomes terminal:
1. update registry and counts
2. append `pi-subagents:bg-update`
3. update footer status text, e.g. `bg: 2 running / 5 total`
4. emit UI notification if UI is available
5. inject a visible custom session message describing completion

The completion message must **not** trigger a new agent turn automatically.
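
Steps 2 through 5 can be sketched as one handler that runs after the registry has been updated. The `CompletionEffects` interface below is entirely hypothetical; it stands in for the session and UI calls the extension would make, and the `triggerTurn: false` option is an illustrative way to express "no automatic follow-up turn", not a confirmed Pi API.

```typescript
// Hypothetical side-effect surface; real extension APIs may look different.
interface CompletionEffects {
  appendUpdateEntry(runId: string, status: string): void;
  setFooter(text: string): void;
  notify?(text: string): void; // absent when no UI is available
  postSessionMessage(text: string, options: { triggerTurn: false }): void;
}

// Footer text in the format shown in the spec, e.g. "bg: 2 running / 5 total".
function formatFooter(counts: { running: number; total: number }): string {
  return `bg: ${counts.running} running / ${counts.total} total`;
}

// Assumes step 1 (registry and counts update) already happened; `counts` is the fresh snapshot.
function handleTerminalRun(
  runId: string,
  status: "completed" | "failed" | "aborted",
  counts: { running: number; total: number },
  fx: CompletionEffects,
): void {
  fx.appendUpdateEntry(runId, status); // step 2: persist the transition
  fx.setFooter(formatFooter(counts)); // step 3: compact footer
  const text = `Background run ${runId} ${status}`;
  if (fx.notify) fx.notify(text); // step 4: only when UI exists
  fx.postSessionMessage(text, { triggerTurn: false }); // step 5: visible, no auto-turn
}
```

Typing `triggerTurn` as the literal `false` makes it a compile-time error for this handler to request a follow-up turn.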

### Polling

The parent agent polls with `background_agent_status`.

Typical use:
- ask for current running count
- list active background runs
- inspect one `runId`
- fetch terminal result summary after notification arrives

## Wrapper and artifact design

### Artifact layout

Keep run directories under:
- `.pi/subagents/runs/<runId>/`

Keep existing files and restore the removed prompt artifact when needed:
- `meta.json`
- `events.jsonl`
- `result.json`
- `stdout.log`
- `stderr.log`
- `transcript.log`
- `child-session.jsonl`
- `system-prompt.md`
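
The layout can be captured as a typed path helper so that callers cannot ask for a file outside the documented set. This is a sketch with assumptions: the helper names are hypothetical, paths are POSIX-style, and the base directory under which `.pi/subagents/runs/` lives is passed in because the spec does not pin it down here.

```typescript
// The artifact file names listed in the spec.
const ARTIFACT_FILES = [
  "meta.json",
  "events.jsonl",
  "result.json",
  "stdout.log",
  "stderr.log",
  "transcript.log",
  "child-session.jsonl",
  "system-prompt.md",
] as const;

type ArtifactFile = (typeof ARTIFACT_FILES)[number];

// Hypothetical helper: `baseDir` is whichever directory contains `.pi/`.
function runDir(baseDir: string, runId: string): string {
  return `${baseDir.replace(/\/+$/, "")}/.pi/subagents/runs/${runId}`;
}

function artifactPath(baseDir: string, runId: string, file: ArtifactFile): string {
  return `${runDir(baseDir, runId)}/${file}`;
}
```

A background watcher would presumably call `artifactPath(baseDir, runId, "result.json")` to know which file signals completion.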

### Metadata

Keep existing generic bookkeeping and add preset-specific fields:
- `preset`
- `presetSource`
- `tools`
- `systemPrompt`
- `systemPromptPath`

Do not reintroduce old bundled-role concepts or role-only behavior.

### Child wrapper

`src/wrapper/cli.mjs` should again support:
- `--append-system-prompt <path>` when preset prompt text exists
- `--tools <csv>` when preset `tools` exists

Keep:
- `PI_SUBAGENTS_CHILD=1`
- github-copilot initiator behavior based on effective model
- best-effort artifact appends that must never block writing `result.json`
- semantic-completion exit handling
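
The two preset-driven flags can be assembled conditionally, as in the sketch below. Only the flag names come from this spec; `WrapperPresetInput` and `presetCliArgs` are hypothetical, and treating an empty tool list as "no restriction" is an assumption.

```typescript
// Hypothetical input derived from run metadata.
interface WrapperPresetInput {
  systemPromptPath?: string; // path to the written `system-prompt.md`, if any
  tools?: string[]; // built-in tool allowlist from the preset, if any
}

function presetCliArgs(input: WrapperPresetInput): string[] {
  const args: string[] = [];
  // Only pass the prompt flag when the preset actually has prompt text on disk.
  if (input.systemPromptPath) {
    args.push("--append-system-prompt", input.systemPromptPath);
  }
  // Only pass `--tools` when the preset names tools; omitted means no restriction.
  if (input.tools && input.tools.length > 0) {
    args.push("--tools", input.tools.join(","));
  }
  return args;
}
```

Returning an argument array rather than a joined string avoids quoting problems when the prompt path contains spaces.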

## Extension registration behavior

Keep existing model-registration behavior for model-dependent tools:
- preserve current available-model order for schema enums
- do not mutate available-model arrays when deduping cache keys
- re-register when model set changes
- do not re-register when model set is the same in different order
- if the first observed set is empty, a later non-empty set must still register
- skip tool registration entirely when `PI_SUBAGENTS_CHILD=1`
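
These rules can be illustrated with an order-insensitive cache key and a small gate. The sketch is hypothetical (`modelSetKey` and `RegistrationGate` are not names from the package) and assumes that an empty model list is simply skipped without being remembered.

```typescript
// Builds an order-insensitive key without touching the caller's array,
// so the original order stays available for schema enums.
function modelSetKey(models: readonly string[]): string {
  return Array.from(new Set(models)).sort().join("\n");
}

class RegistrationGate {
  private lastKey: string | undefined;

  // Returns true when tools should be (re-)registered for this model list.
  shouldRegister(models: readonly string[], isChild: boolean): boolean {
    if (isChild) return false; // PI_SUBAGENTS_CHILD=1: never register
    if (models.length === 0) return false; // nothing to register; do not record the empty set
    const key = modelSetKey(models);
    if (key === this.lastKey) return false; // same set, possibly reordered
    this.lastKey = key;
    return true;
  }
}
```

Because the empty set is never stored as `lastKey`, a later non-empty list always produces a new key and registers.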

Register:
- `subagent`
- `background_agent`
- `background_agent_status`

`background_agent_status` does not need model parameters, but registration still follows the package’s existing model-availability gate for consistency.

## Prompt and documentation design

Rewrite shipped prompts so they no longer mention `chain` mode.

They should instead describe repeated `subagent` calls or `subagent.tasks` parallel calls, for example:
- inspect with a preset
- turn findings into a plan with another preset
- implement or review in separate follow-up calls

README and docs should describe:
- preset directories and markdown format
- `background_agent`
- `background_agent_status`
- background completion notification behavior
- background count tracking
- process-only behavior for detached runs
- built-in-tool-only semantics of preset `tools`

They should continue to avoid claiming bundled built-in roles.

## File-level impact

Expected new files:
- `src/presets.ts`
- `src/presets.test.ts`
- `src/background-registry.ts`
- `src/background-registry.test.ts`
- `src/background-schema.ts`
- `src/background-tool.ts`
- `src/background-tool.test.ts`
- `src/background-status-tool.ts`
- `src/background-status-tool.test.ts`

Expected modified files:
- `index.ts`
- `src/schema.ts`
- `src/tool.ts`
- `src/models.ts`
- `src/runner.ts` and/or `src/process-runner.ts`
- `src/artifacts.ts`
- `src/process-runner.test.ts`
- `src/extension.test.ts`
- `src/artifacts.test.ts`
- `src/wrapper/cli.mjs`
- `src/wrapper/cli.test.ts`
- `src/wrapper/render.mjs`
- `src/wrapper/render.test.ts`
- `src/prompts.test.ts`
- `README.md`
- `prompts/*.md`
- `AGENTS.md`

Expected removals:
- `src/tool-chain.test.ts`

## Testing plan

Implementation should follow TDD.

### New coverage

Add tests for:
- preset discovery from global and project directories
- project preset override by name
- required preset selection in single and parallel mode
- model resolution from call override vs preset default
- error when neither call nor preset supplies a valid model
- chain removal from schema and runtime
- detached background launch returning immediately
- background registry counts
- session-entry persistence and reload reconstruction
- completion notification emitting UI notify + visible session message without auto-turn
- polling one background run and many background runs

### Preserved coverage

Keep regression coverage for:
- child sessions skipping subagent tool registration when `PI_SUBAGENTS_CHILD=1`
- no tool registration when no models are available
- later registration when a non-empty model list appears
- no re-registration for the same model set in different order
- re-registration when the model set changes
- github-copilot initiator behavior
- best-effort artifact logging never preventing `result.json` writes
- effective model using the resolved model when requested/resolved differ

## Risks and mitigations

### Risk: preset `tools` sounds broader than Pi can enforce

Mitigation:
- document that preset `tools` maps to Pi CLI `--tools`
- treat it as a built-in tool allowlist only
- keep `PI_SUBAGENTS_CHILD=1` so this package never re-registers its own subagent tools inside child runs

### Risk: detached watcher state lost on reload

Mitigation:
- persist launch/update events as session custom entries
- rebuild registry on `session_start`
- reattach watchers for non-terminal runs

### Risk: background notifications spam the user

Mitigation:
- emit one terminal notification per run
- keep footer count compact
- require explicit polling for detailed inspection

### Risk: larger extension entrypoint

Mitigation:
- keep preset discovery, registry/state, and tool definitions in separate focused modules
- keep runner-specific logic in existing runner files with minimal changes