docs: add stop-reason normalization design spec
This commit is contained in:
@@ -0,0 +1,170 @@
|
||||
# Subagent Stop-Reason Normalization Design
|
||||
|
||||
**Date:** 2026-04-11
|
||||
**Package:** `pi-subagents`
|
||||
|
||||
## Goal
|
||||
|
||||
Fix regression where subagents now exit almost immediately after being called, while preserving the previous fix for lingering child processes that already reached real completion.
|
||||
|
||||
## Root cause
|
||||
|
||||
`src/wrapper/cli.mjs` currently starts semantic completion cleanup on any normalized assistant event with a `stopReason`.
|
||||
|
||||
That assumption is wrong.
|
||||
|
||||
Evidence gathered during debugging:
|
||||
|
||||
1. `pi` emits `message_end` for assistant messages in general, not only for final overall completion.
|
||||
2. Pi stop reasons include non-terminal tool-use states.
|
||||
3. The wrapper currently treats those non-terminal assistant messages as terminal.
|
||||
4. After the first assistant `message_end`, the wrapper waits ~250 ms and then sends `SIGTERM` / `SIGKILL`.
|
||||
5. Real subagent runs therefore get killed before their tool work or later assistant turns complete.
|
||||
|
||||
A local repro confirmed this with a fake `pi` binary that emitted:
|
||||
|
||||
- an early assistant `message_end` with `stopReason: "toolUse"`,
|
||||
- later tool/final events,
|
||||
- and no immediate process exit.
|
||||
|
||||
The wrapper exited early and wrote `result.json` with the first assistant text instead of the true final result.
|
||||
|
||||
## Chosen approach
|
||||
|
||||
Use **normalized semantic completion**.
|
||||
|
||||
Instead of treating every raw `message_end.stopReason` as terminal, introduce a canonical stop-reason layer and make wrapper completion decisions from that semantic category.
|
||||
|
||||
## Scope
|
||||
|
||||
### Modify
|
||||
|
||||
- `src/wrapper/normalize.mjs`
|
||||
- `src/wrapper/normalize.test.ts`
|
||||
- `src/wrapper/cli.mjs`
|
||||
- `src/wrapper/cli.test.ts`
|
||||
|
||||
### Do not modify
|
||||
|
||||
- `src/monitor.ts`
|
||||
- `src/process-runner.ts`
|
||||
- `src/tmux-runner.ts`
|
||||
- `src/tool.ts`
|
||||
- `src/schema.ts`
|
||||
|
||||
## Design
|
||||
|
||||
### 1. Canonical stop reasons
|
||||
|
||||
Normalize raw provider/session stop reasons into canonical values at the wrapper boundary.
|
||||
|
||||
Expected canonical values:
|
||||
|
||||
- `stop` — normal terminal completion
|
||||
- `length` — terminal length limit
|
||||
- `toolUse` — non-terminal assistant turn that is handing off to tools
|
||||
- `aborted` — terminal abort
|
||||
- `error` — terminal failure
|
||||
|
||||
Representative raw mappings:
|
||||
|
||||
- `toolUse`, `tool_use` -> `toolUse`
|
||||
- `stop`, `end_turn`, `endTurn` -> `stop`
|
||||
- `length` -> `length`
|
||||
- `aborted` -> `aborted`
|
||||
- `error` -> `error`
|
||||
|
||||
Unknown values should be preserved conservatively instead of guessed away. The wrapper should not treat an unknown stop reason as definitely terminal unless explicitly mapped that way.
|
||||
|
||||
### 2. Preserve raw reason for debugging
|
||||
|
||||
Normalized assistant events should keep both:
|
||||
|
||||
- canonical `stopReason`
|
||||
- auxiliary `rawStopReason`
|
||||
|
||||
This gives stable internal semantics without losing evidence when debugging provider-specific behavior.
|
||||
|
||||
### 3. Terminal vs non-terminal assistant messages
|
||||
|
||||
Wrapper semantic completion should start only for canonical **terminal** stop reasons:
|
||||
|
||||
- `stop`
|
||||
- `length`
|
||||
- `aborted`
|
||||
- `error`
|
||||
|
||||
Wrapper semantic completion must **not** start for:
|
||||
|
||||
- `toolUse`
|
||||
- missing stop reason
|
||||
- unknown unmapped stop reason
|
||||
|
||||
This preserves the previous lingering-child fix while no longer killing subagents during normal tool orchestration.
|
||||
|
||||
### 4. Result semantics
|
||||
|
||||
`result.json` should use canonical `stopReason` so downstream logic sees stable values.
|
||||
|
||||
`rawStopReason` should be included as auxiliary debug information when available.
|
||||
|
||||
If a true terminal assistant message was already captured and the child lingers, the wrapper may still force cleanup and write a successful result.
|
||||
|
||||
If the most recent assistant message was non-terminal (`toolUse`), forced cleanup must not begin.
|
||||
|
||||
### 5. File responsibilities
|
||||
|
||||
#### `src/wrapper/normalize.mjs`
|
||||
|
||||
Owns translation from raw Pi JSON events into wrapper events with canonical stop-reason semantics.
|
||||
|
||||
#### `src/wrapper/cli.mjs`
|
||||
|
||||
Owns semantic completion policy:
|
||||
|
||||
- consume normalized events,
|
||||
- track latest terminal assistant result,
|
||||
- trigger grace/cleanup only for terminal semantic stop reasons,
|
||||
- preserve best-effort artifact behavior.
|
||||
|
||||
#### `src/wrapper/*.test.ts`
|
||||
|
||||
Own regression coverage for normalization and wrapper lifecycle behavior.
|
||||
|
||||
## Testing strategy
|
||||
|
||||
Follow TDD.
|
||||
|
||||
### New failing tests
|
||||
|
||||
1. **Normalization test**
|
||||
- verifies raw stop reasons map to canonical values correctly
|
||||
- verifies raw value is preserved for debugging
|
||||
|
||||
2. **Early-exit regression test**
|
||||
- fake `pi` emits early assistant `message_end` with `toolUse` / `tool_use`
|
||||
- later emits tool/final events
|
||||
- wrapper must not exit early
|
||||
- final `result.json` must reflect the later true terminal completion
|
||||
|
||||
### Existing tests to preserve
|
||||
|
||||
- lingering final-message child still force-completes
|
||||
- spawn failure still writes `result.json`
|
||||
- artifact write failures still do not block `result.json`
|
||||
- initiator/child environment tests stay green
|
||||
|
||||
## Non-goals
|
||||
|
||||
- switching the wrapper from JSON mode to RPC mode
|
||||
- redesigning the monitor contract
|
||||
- changing runner selection behavior
|
||||
- broader tool/schema changes
|
||||
|
||||
## Expected outcome
|
||||
|
||||
After this change:
|
||||
|
||||
- subagents will no longer be killed immediately on non-terminal tool-use assistant messages,
|
||||
- lingering child processes that already reached real terminal completion will still be cleaned up,
|
||||
- result artifacts will carry stable canonical stop-reason semantics plus raw debugging data.
|
||||
Reference in New Issue
Block a user