170 lines
5.2 KiB
Markdown
170 lines
5.2 KiB
Markdown
# Subagent Stop-Reason Normalization Design
|
|
|
|
**Date:** 2026-04-11
|
|
**Package:** `pi-subagents`
|
|
|
|
## Goal
|
|
|
|
Fix regression where subagents now exit almost immediately after being called, while preserving the previous fix for lingering child processes that already reached real completion.
|
|
|
|
## Root cause
|
|
|
|
`src/wrapper/cli.mjs` currently starts semantic completion cleanup on any normalized assistant event with a `stopReason`.
|
|
|
|
That assumption is wrong.
|
|
|
|
Evidence gathered during debugging:
|
|
|
|
1. `pi` emits `message_end` for assistant messages in general, not only for final overall completion.
|
|
2. Pi stop reasons include non-terminal tool-use states.
|
|
3. The wrapper currently treats those non-terminal assistant messages as terminal.
|
|
4. After the first assistant `message_end`, the wrapper waits ~250 ms and then sends `SIGTERM` / `SIGKILL`.
|
|
5. Real subagent runs therefore get killed before their tool work or later assistant turns complete.
|
|
|
|
A local repro confirmed this with a fake `pi` binary that emitted:
|
|
|
|
- an early assistant `message_end` with `stopReason: "toolUse"`,
|
|
- later tool/final events,
|
|
- and no immediate process exit.
|
|
|
|
The wrapper exited early and wrote `result.json` with the first assistant text instead of the true final result.
|
|
|
|
## Chosen approach
|
|
|
|
Use **normalized semantic completion**.
|
|
|
|
Instead of treating every raw `message_end.stopReason` as terminal, introduce a canonical stop-reason layer and make wrapper completion decisions from that semantic category.
|
|
|
|
## Scope
|
|
|
|
### Modify
|
|
|
|
- `src/wrapper/normalize.mjs`
|
|
- `src/wrapper/normalize.test.ts`
|
|
- `src/wrapper/cli.mjs`
|
|
- `src/wrapper/cli.test.ts`
|
|
|
|
### Do not modify
|
|
|
|
- `src/monitor.ts`
|
|
- `src/process-runner.ts`
|
|
- `src/tmux-runner.ts`
|
|
- `src/tool.ts`
|
|
- `src/schema.ts`
|
|
|
|
## Design
|
|
|
|
### 1. Canonical stop reasons
|
|
|
|
Normalize raw provider/session stop reasons into canonical values at the wrapper boundary.
|
|
|
|
Expected canonical values:
|
|
|
|
- `stop` — normal terminal completion
|
|
- `length` — terminal length limit
|
|
- `toolUse` — non-terminal assistant turn that is handing off to tools
|
|
- `aborted` — terminal abort
|
|
- `error` — terminal failure
|
|
|
|
Representative raw mappings:
|
|
|
|
- `toolUse`, `tool_use` -> `toolUse`
|
|
- `stop`, `end_turn`, `endTurn` -> `stop`
|
|
- `length` -> `length`
|
|
- `aborted` -> `aborted`
|
|
- `error` -> `error`
|
|
|
|
Unknown values should be preserved conservatively instead of guessed away. The wrapper should not treat an unknown stop reason as definitely terminal unless explicitly mapped that way.
|
|
|
|
### 2. Preserve raw reason for debugging
|
|
|
|
Normalized assistant events should keep both:
|
|
|
|
- canonical `stopReason`
|
|
- auxiliary `rawStopReason`
|
|
|
|
This gives stable internal semantics without losing evidence when debugging provider-specific behavior.
|
|
|
|
### 3. Terminal vs non-terminal assistant messages
|
|
|
|
Wrapper semantic completion should start only for canonical **terminal** stop reasons:
|
|
|
|
- `stop`
|
|
- `length`
|
|
- `aborted`
|
|
- `error`
|
|
|
|
Wrapper semantic completion must **not** start for:
|
|
|
|
- `toolUse`
|
|
- missing stop reason
|
|
- unknown unmapped stop reason
|
|
|
|
This preserves the previous lingering-child fix while no longer killing subagents during normal tool orchestration.
|
|
|
|
### 4. Result semantics
|
|
|
|
`result.json` should use canonical `stopReason` so downstream logic sees stable values.
|
|
|
|
`rawStopReason` should be included as auxiliary debug information when available.
|
|
|
|
If a true terminal assistant message was already captured and the child lingers, the wrapper may still force cleanup and write a successful result.
|
|
|
|
If the most recent assistant message was non-terminal (`toolUse`), forced cleanup must not begin.
|
|
|
|
### 5. File responsibilities
|
|
|
|
#### `src/wrapper/normalize.mjs`
|
|
|
|
Owns translation from raw Pi JSON events into wrapper events with canonical stop-reason semantics.
|
|
|
|
#### `src/wrapper/cli.mjs`
|
|
|
|
Owns semantic completion policy:
|
|
|
|
- consume normalized events,
|
|
- track latest terminal assistant result,
|
|
- trigger grace/cleanup only for terminal semantic stop reasons,
|
|
- preserve best-effort artifact behavior.
|
|
|
|
#### `src/wrapper/*.test.ts`
|
|
|
|
Own regression coverage for normalization and wrapper lifecycle behavior.
|
|
|
|
## Testing strategy
|
|
|
|
Follow TDD.
|
|
|
|
### New failing tests
|
|
|
|
1. **Normalization test**
|
|
- verifies raw stop reasons map to canonical values correctly
|
|
- verifies raw value is preserved for debugging
|
|
|
|
2. **Early-exit regression test**
|
|
- fake `pi` emits early assistant `message_end` with `toolUse` / `tool_use`
|
|
- later emits tool/final events
|
|
- wrapper must not exit early
|
|
- final `result.json` must reflect the later true terminal completion
|
|
|
|
### Existing tests to preserve
|
|
|
|
- lingering final-message child still force-completes
|
|
- spawn failure still writes `result.json`
|
|
- artifact write failures still do not block `result.json`
|
|
- initiator/child environment tests stay green
|
|
|
|
## Non-goals
|
|
|
|
- switching the wrapper from JSON mode to RPC mode
|
|
- redesigning the monitor contract
|
|
- changing runner selection behavior
|
|
- broader tool/schema changes
|
|
|
|
## Expected outcome
|
|
|
|
After this change:
|
|
|
|
- subagents will no longer be killed immediately on non-terminal tool-use assistant messages,
|
|
- lingering child processes that already reached real terminal completion will still be cleaned up,
|
|
- result artifacts will carry stable canonical stop-reason semantics plus raw debugging data. |