Subagent Stop-Reason Normalization Design
Date: 2026-04-11
Package: pi-subagents
Goal
Fix a regression in which subagents now exit almost immediately after being called, while preserving the earlier fix for lingering child processes that have already reached real completion.
Root cause
src/wrapper/cli.mjs currently starts semantic completion cleanup on any normalized assistant event with a stopReason.
That assumption is wrong.
Evidence gathered during debugging:
- pi emits message_end for assistant messages in general, not only for the final overall completion.
- Pi stop reasons include non-terminal tool-use states.
- The wrapper currently treats those non-terminal assistant messages as terminal.
- After the first assistant message_end, the wrapper waits ~250 ms and then sends SIGTERM/SIGKILL.
- Real subagent runs therefore get killed before their tool work or later assistant turns complete.
A local repro confirmed this with a fake pi binary that:
- emitted an early assistant message_end with stopReason: "toolUse",
- later emitted tool/final events,
- and did not exit immediately afterward.
The wrapper exited early and wrote result.json with the first assistant text instead of the true final result.
Chosen approach
Use normalized semantic completion.
Instead of treating every raw message_end.stopReason as terminal, introduce a canonical stop-reason layer and make wrapper completion decisions from that semantic category.
Scope
Modify
- src/wrapper/normalize.mjs
- src/wrapper/normalize.test.ts
- src/wrapper/cli.mjs
- src/wrapper/cli.test.ts
Do not modify
- src/monitor.ts
- src/process-runner.ts
- src/tmux-runner.ts
- src/tool.ts
- src/schema.ts
Design
1. Canonical stop reasons
Normalize raw provider/session stop reasons into canonical values at the wrapper boundary.
Expected canonical values:
- stop: normal terminal completion
- length: terminal length limit
- toolUse: non-terminal assistant turn that is handing off to tools
- aborted: terminal abort
- error: terminal failure
Representative raw mappings:
- toolUse, tool_use -> toolUse
- stop, end_turn, endTurn -> stop
- length -> length
- aborted -> aborted
- error -> error
Unknown values should be preserved as-is rather than guessed into a canonical category. The wrapper must not treat an unknown stop reason as terminal unless it is explicitly mapped to a terminal value.
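A minimal sketch of this mapping in plain JavaScript (matching the .mjs wrapper), assuming a lookup table inside normalize.mjs. The name normalizeStopReason and the table layout are hypothetical; the entries are exactly the representative mappings listed above, and anything unmapped is passed through unchanged instead of being coerced into a canonical value.

```javascript
// Hypothetical canonical table for src/wrapper/normalize.mjs, built only from
// the representative mappings in this design; names are illustrative.
const CANONICAL_STOP_REASONS = new Map([
  ["toolUse", "toolUse"],
  ["tool_use", "toolUse"],
  ["stop", "stop"],
  ["end_turn", "stop"],
  ["endTurn", "stop"],
  ["length", "length"],
  ["aborted", "aborted"],
  ["error", "error"],
]);

// Map a raw provider/session stop reason to its canonical value.
// Unmapped strings are returned unchanged (preserved, not guessed), and a
// missing or non-string value becomes undefined.
function normalizeStopReason(raw) {
  if (typeof raw !== "string" || raw === "") return undefined;
  return CANONICAL_STOP_REASONS.get(raw) ?? raw;
}
```

With this table, normalizeStopReason("tool_use") returns "toolUse" and normalizeStopReason("endTurn") returns "stop", while an unmapped value such as "some_new_reason" comes back unchanged, so a terminal check that only recognizes canonical terminal values will refuse it.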
2. Preserve raw reason for debugging
Normalized assistant events should keep both:
- canonical stopReason
- auxiliary rawStopReason
This gives stable internal semantics without losing evidence when debugging provider-specific behavior.
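One way normalize.mjs could carry both fields, as a sketch only. It assumes raw Pi assistant events have the shape { type: "message_end", message: { role, stopReason, content } } with text parts inside content; that shape, the normalized type name "assistant_message", the text field, and the helper names are assumptions for illustration, not Pi's documented schema.

```javascript
// ASSUMPTION: raw Pi events look like
// { type: "message_end", message: { role, stopReason, content: [{ type: "text", text }] } }.
// "assistant_message" and the `text` field are hypothetical wrapper names.

// Only non-identity aliases are listed; stop, length, toolUse, aborted, error,
// and any unmapped value fall through unchanged.
const STOP_REASON_ALIASES = new Map([
  ["tool_use", "toolUse"],
  ["end_turn", "stop"],
  ["endTurn", "stop"],
]);

function canonicalStopReason(raw) {
  if (typeof raw !== "string" || raw === "") return undefined;
  return STOP_REASON_ALIASES.get(raw) ?? raw;
}

// Returns a normalized wrapper event for an assistant message_end, else null.
function normalizeAssistantEvent(rawEvent) {
  const message = rawEvent?.message;
  if (rawEvent?.type !== "message_end" || message?.role !== "assistant") return null;

  // Concatenate the text parts; non-array content yields an empty string.
  const text = Array.isArray(message.content)
    ? message.content
        .filter((part) => part?.type === "text" && typeof part.text === "string")
        .map((part) => part.text)
        .join("")
    : "";

  const event = {
    type: "assistant_message",
    stopReason: canonicalStopReason(message.stopReason), // stable internal semantics
    text,
  };
  // Keep the provider's original value as auxiliary debugging evidence.
  if (message.stopReason !== undefined) event.rawStopReason = message.stopReason;
  return event;
}
```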
3. Terminal vs non-terminal assistant messages
Wrapper semantic completion should start only for canonical terminal stop reasons:
- stop
- length
- aborted
- error
Wrapper semantic completion must not start for:
- toolUse
- missing stop reason
- unknown, unmapped stop reason
This preserves the previous lingering-child fix while no longer killing subagents during normal tool orchestration.
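The terminal/non-terminal split reduces to a set-membership check on the canonical value. isTerminalStopReason is a hypothetical name; the point of the sketch is that toolUse, a missing value, and any unmapped string all fall outside the set by construction, so none of them can start semantic completion.

```javascript
// Canonical terminal stop reasons from this design. toolUse, missing values,
// and unknown/unmapped strings are deliberately absent.
const TERMINAL_STOP_REASONS = new Set(["stop", "length", "aborted", "error"]);

// Hypothetical predicate: true only for a canonical terminal stop reason.
function isTerminalStopReason(canonical) {
  return typeof canonical === "string" && TERMINAL_STOP_REASONS.has(canonical);
}
```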
4. Result semantics
result.json should use the canonical stopReason so that downstream logic sees stable values.
rawStopReason should be included as auxiliary debug information when available.
If a true terminal assistant message was already captured and the child lingers, the wrapper may still force cleanup and write a successful result.
If the most recent assistant message was non-terminal (toolUse), forced cleanup must not begin.
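A sketch of how these result rules could be expressed, under stated assumptions: canForceCleanup, buildResult, and the text and forcedCleanup fields are hypothetical; only stopReason and rawStopReason are named by this design. The cleanup gate looks only at the most recent assistant message, so a toolUse turn after an earlier terminal one still blocks forced cleanup.

```javascript
// Hypothetical result.json helpers; only stopReason and rawStopReason come
// from the design, the other names are illustrative.
const TERMINAL_STOP_REASONS = new Set(["stop", "length", "aborted", "error"]);

// Forced cleanup of a lingering child is allowed only when the most recent
// assistant message carried a canonical terminal stop reason.
function canForceCleanup(mostRecentAssistant) {
  return (
    mostRecentAssistant != null &&
    TERMINAL_STOP_REASONS.has(mostRecentAssistant.stopReason)
  );
}

// Build the result.json payload from the captured terminal assistant message.
function buildResult(terminalMessage, forcedCleanup = false) {
  const result = {
    stopReason: terminalMessage.stopReason, // canonical, stable for downstream logic
    text: terminalMessage.text,
    forcedCleanup, // true when the wrapper had to terminate a lingering child
  };
  if (terminalMessage.rawStopReason !== undefined) {
    result.rawStopReason = terminalMessage.rawStopReason; // auxiliary debug info
  }
  return result;
}
```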
5. File responsibilities
src/wrapper/normalize.mjs
Owns translation from raw Pi JSON events into wrapper events with canonical stop-reason semantics.
src/wrapper/cli.mjs
Owns semantic completion policy:
- consume normalized events,
- track latest terminal assistant result,
- trigger grace/cleanup only for terminal semantic stop reasons,
- preserve best-effort artifact behavior.
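The cli.mjs responsibilities above could be sketched as a small tracker that consumes normalized events and arms the grace timer only on terminal stop reasons. Everything here is an assumption for illustration: the factory name, its options, the normalized event shape { type: "assistant_message", stopReason, text }, and the choice to cancel a pending cleanup when a later non-terminal assistant message arrives. graceMs defaults to the ~250 ms figure from the root-cause notes.

```javascript
// Sketch of the completion policy for src/wrapper/cli.mjs. ASSUMPTIONS:
// createCompletionTracker and its options are hypothetical; events use the
// normalized { type: "assistant_message", stopReason, text } shape.
const TERMINAL_STOP_REASONS = new Set(["stop", "length", "aborted", "error"]);

function createCompletionTracker({
  onGraceExpired, // called with the latest terminal message; caller sends SIGTERM/SIGKILL
  graceMs = 250,
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  let latestTerminal = null;
  let timer = null;

  function disarm() {
    if (timer !== null) {
      clearTimer(timer);
      timer = null;
    }
  }

  return {
    consume(event) {
      // Tool events and other non-assistant events do not change the policy.
      if (event?.type !== "assistant_message") return;
      if (TERMINAL_STOP_REASONS.has(event.stopReason)) {
        latestTerminal = event; // track the latest terminal assistant result
        if (timer === null) {
          timer = setTimer(() => {
            timer = null;
            onGraceExpired(latestTerminal);
          }, graceMs);
        }
      } else {
        // toolUse, missing, or unknown: the run is still in progress, so any
        // pending forced cleanup is cancelled (an assumption of this sketch).
        disarm();
      }
    },
    latestTerminal: () => latestTerminal,
    dispose: disarm, // call when the child exits on its own
  };
}
```

Injecting setTimer and clearTimer is aimed at cli.test.ts: a test can capture the scheduled callback and fire it by hand, so the lingering-child and early-exit cases run deterministically without real delays.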
src/wrapper/*.test.ts
Own regression coverage for normalization and wrapper lifecycle behavior.
Testing strategy
Follow TDD.
New failing tests
- Normalization test
  - verifies raw stop reasons map to canonical values correctly
  - verifies the raw value is preserved for debugging
- Early-exit regression test
  - fake pi emits an early assistant message_end with toolUse/tool_use
  - later emits tool/final events
  - wrapper must not exit early
  - final result.json must reflect the later true terminal completion
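The fake pi for the early-exit regression test could look roughly like this. It assumes raw events of the form { type: "message_end", message: { role, stopReason, content } }; the tool_execution_end event type and all function names are placeholders, not Pi's documented schema. The delays are chosen so the old behavior fails: the gap after the first message_end exceeds the ~250 ms grace period, and the process lingers after the final event instead of exiting.

```javascript
// Hypothetical fake `pi` for the early-exit regression test. ASSUMPTIONS: raw
// events look like { type: "message_end", message: { role, stopReason, content } },
// and "tool_execution_end" is a placeholder event type.
function assistantMessageEnd(stopReason, text) {
  return {
    type: "message_end",
    message: { role: "assistant", stopReason, content: [{ type: "text", text }] },
  };
}

function fakePiEvents() {
  return [
    assistantMessageEnd("tool_use", "Let me run a tool first."), // early, non-terminal
    { type: "tool_execution_end", toolName: "bash" }, // placeholder tool event
    assistantMessageEnd("end_turn", "Final answer."), // the true terminal completion
  ];
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Emit one JSON line per event. stepMs is longer than the ~250 ms grace period,
// so a wrapper that treated the first message_end as terminal would kill the
// process before the final answer; lingerMs keeps the process alive afterwards.
async function runFakePi({
  write = (line) => process.stdout.write(line),
  stepMs = 400,
  lingerMs = 1000,
} = {}) {
  for (const event of fakePiEvents()) {
    write(JSON.stringify(event) + "\n");
    await sleep(stepMs);
  }
  await sleep(lingerMs);
}
```

In a test, a small script that calls runFakePi() could be placed first on PATH as pi (a hypothetical setup); the test would then check that result.json carries stopReason "stop", rawStopReason "end_turn", and the final assistant text rather than the first one.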
Existing tests to preserve
- lingering final-message child still force-completes
- spawn failure still writes result.json
- artifact write failures still do not block result.json
- initiator/child environment tests stay green
Non-goals
- switching the wrapper from JSON mode to RPC mode
- redesigning the monitor contract
- changing runner selection behavior
- broader tool/schema changes
Expected outcome
After this change:
- subagents will no longer be killed immediately on non-terminal tool-use assistant messages,
- lingering child processes that already reached real terminal completion will still be cleaned up,
- result artifacts will carry stable canonical stop-reason semantics plus raw debugging data.