docs: add stop-reason normalization design spec

This commit is contained in:
pi
2026-04-11 01:40:14 +01:00
parent 620fb66358
commit 1d426e82f2

View File

@@ -0,0 +1,170 @@
# Subagent Stop-Reason Normalization Design
**Date:** 2026-04-11
**Package:** `pi-subagents`
## Goal
Fix regression where subagents now exit almost immediately after being called, while preserving the previous fix for lingering child processes that already reached real completion.
## Root cause
`src/wrapper/cli.mjs` currently starts semantic completion cleanup on any normalized assistant event with a `stopReason`.
That assumption is wrong.
Evidence gathered during debugging:
1. `pi` emits `message_end` for assistant messages in general, not only for final overall completion.
2. Pi stop reasons include non-terminal tool-use states.
3. The wrapper currently treats those non-terminal assistant messages as terminal.
4. After the first assistant `message_end`, the wrapper waits ~250 ms and then sends `SIGTERM` / `SIGKILL`.
5. Real subagent runs therefore get killed before their tool work or later assistant turns complete.
A local repro confirmed this with a fake `pi` binary that emitted:
- an early assistant `message_end` with `stopReason: "toolUse"`,
- later tool/final events,
- and no immediate process exit.
The wrapper exited early and wrote `result.json` with the first assistant text instead of the true final result.
## Chosen approach
Use **normalized semantic completion**.
Instead of treating every raw `message_end.stopReason` as terminal, introduce a canonical stop-reason layer and make wrapper completion decisions from that semantic category.
## Scope
### Modify
- `src/wrapper/normalize.mjs`
- `src/wrapper/normalize.test.ts`
- `src/wrapper/cli.mjs`
- `src/wrapper/cli.test.ts`
### Do not modify
- `src/monitor.ts`
- `src/process-runner.ts`
- `src/tmux-runner.ts`
- `src/tool.ts`
- `src/schema.ts`
## Design
### 1. Canonical stop reasons
Normalize raw provider/session stop reasons into canonical values at the wrapper boundary.
Expected canonical values:
- `stop` — normal terminal completion
- `length` — terminal length limit
- `toolUse` — non-terminal assistant turn that is handing off to tools
- `aborted` — terminal abort
- `error` — terminal failure
Representative raw mappings:
- `toolUse`, `tool_use` -> `toolUse`
- `stop`, `end_turn`, `endTurn` -> `stop`
- `length` -> `length`
- `aborted` -> `aborted`
- `error` -> `error`
Unknown values should be preserved conservatively instead of guessed away. The wrapper should not treat an unknown stop reason as definitely terminal unless explicitly mapped that way.
### 2. Preserve raw reason for debugging
Normalized assistant events should keep both:
- canonical `stopReason`
- auxiliary `rawStopReason`
This gives stable internal semantics without losing evidence when debugging provider-specific behavior.
### 3. Terminal vs non-terminal assistant messages
Wrapper semantic completion should start only for canonical **terminal** stop reasons:
- `stop`
- `length`
- `aborted`
- `error`
Wrapper semantic completion must **not** start for:
- `toolUse`
- missing stop reason
- unknown unmapped stop reason
This preserves the previous lingering-child fix while no longer killing subagents during normal tool orchestration.
### 4. Result semantics
`result.json` should use canonical `stopReason` so downstream logic sees stable values.
`rawStopReason` should be included as auxiliary debug information when available.
If a true terminal assistant message was already captured and the child lingers, the wrapper may still force cleanup and write a successful result.
If the most recent assistant message was non-terminal (`toolUse`), forced cleanup must not begin.
### 5. File responsibilities
#### `src/wrapper/normalize.mjs`
Owns translation from raw Pi JSON events into wrapper events with canonical stop-reason semantics.
#### `src/wrapper/cli.mjs`
Owns semantic completion policy:
- consume normalized events,
- track latest terminal assistant result,
- trigger grace/cleanup only for terminal semantic stop reasons,
- preserve best-effort artifact behavior.
#### `src/wrapper/*.test.ts`
Own regression coverage for normalization and wrapper lifecycle behavior.
## Testing strategy
Follow TDD.
### New failing tests
1. **Normalization test**
- verifies raw stop reasons map to canonical values correctly
- verifies raw value is preserved for debugging
2. **Early-exit regression test**
- fake `pi` emits early assistant `message_end` with `toolUse` / `tool_use`
- later emits tool/final events
- wrapper must not exit early
- final `result.json` must reflect the later true terminal completion
### Existing tests to preserve
- lingering final-message child still force-completes
- spawn failure still writes `result.json`
- artifact write failures still do not block `result.json`
- initiator/child environment tests stay green
## Non-goals
- switching the wrapper from JSON mode to RPC mode
- redesigning the monitor contract
- changing runner selection behavior
- broader tool/schema changes
## Expected outcome
After this change:
- subagents will no longer be killed immediately on non-terminal tool-use assistant messages,
- lingering child processes that already reached real terminal completion will still be cleaned up,
- result artifacts will carry stable canonical stop-reason semantics plus raw debugging data.