pi-context-manager/docs/specs/2026-04-12-context-manager-design.md


# Context manager: stronger pruning and earlier compaction
## Status
Approved for planning.
## Problem
`pi-context-manager` loads, but it does not materially cap live context growth.
Observed behavior:
- Context usage keeps climbing during long sessions.
- The extension mostly distills old bulky `toolResult` output, but it keeps most older user and assistant turns intact.
- Compaction still waits for Pi core's later reserve-token threshold.
- Raw compaction and branch-summary artifacts can remain in live context even though the extension already persists and replays the same information through its own ledger.
- The persistent footer status line adds noise and is unnecessary for this package.
## Goals
1. Keep live context flatter during long sessions.
2. Trigger compaction earlier than Pi core's default reserve-token threshold.
3. Use lean resume injection, not raw latest compaction or branch-summary blobs.
4. Remove the persistent footer status line.
5. Preserve deterministic memory merging behavior, including stable exact-timestamp tie resolution.
## Non-goals
- Adding new LLM-facing tools.
- Broad extractor or summarizer redesign unrelated to live-context pressure.
- Editing `docs/extensions.md`.
- Changing the existing deterministic same-slot tie-break contract in `src/ledger.ts`.
## Constraints
- Do not wait for Pi core's much later reserve-token threshold.
- Resume injection should use lean packet text, not raw latest compaction or branch-summary blobs.
- Exact-timestamp ties must resolve the same way regardless of candidate processing order.
- Keep changes package-local and minimal.
## Design
### 1. Pre-filter raw summary artifacts from live context
Before turn-aware pruning runs, the extension should drop raw `compactionSummary` and `branchSummary` messages from the `context` event payload.
Reason:
- These artifacts are already persisted.
- They are already replayed into the extension ledger.
- A separate hidden packet/resume message is already injected.
- Keeping the raw artifacts in live context duplicates information and allows context growth even when the extension is active.
Effect:
- The model sees one lean synthesized checkpoint, not both the raw Pi summary artifacts and the extension's packet.
- The symptom visible in the screenshot, noisy summary text leaking into context, should disappear.
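A minimal sketch of this pre-filter, assuming the raw artifacts can be identified by a `role` field. The `Message` shape and the `dropRawSummaries` name are hypothetical; the real `context` event payload may carry more fields.

```typescript
// Hypothetical message shape; the real Pi message types may differ.
type Role =
  | "user"
  | "assistant"
  | "toolResult"
  | "compactionSummary"
  | "branchSummary";

interface Message {
  role: Role;
  content: string;
}

// True only for the two raw summary artifact roles named in this spec.
function isRawSummary(role: Role): boolean {
  return role === "compactionSummary" || role === "branchSummary";
}

// Returns a new list without raw summary artifacts. Order is preserved
// and the input array is not mutated.
function dropRawSummaries(messages: readonly Message[]): Message[] {
  return messages.filter((m) => !isRawSummary(m.role));
}
```

Filtering by role alone keeps the step cheap and leaves every other message untouched, so it can run before turn-aware pruning without changing turn boundaries.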
### 2. Replace message-level pruning with turn-aware pruning
`src/prune.ts` should stop acting like a bulky-tool-result filter with a weak recent-turn suffix. It should prune by conversation turn.
#### Turn model
A turn is the contiguous slice that begins at a `user` message and includes every assistant and `toolResult` message that follows, up to but not including the next `user` message.
Pruning rules:
- Keep only a suffix of the most recent turns; policy and zone control the exact turn count.
- Drop entire older turns instead of keeping their user and assistant messages forever.
- Within kept but older turns, distill bulky `toolResult` messages to short summaries.
- Keep the newest active turn lossless.
- Never keep a `toolResult` whose surrounding turn was dropped.
Why this shape:
- It preserves tool ordering.
- It prevents stale planning chatter from accumulating.
- It makes packet injection the main mechanism for carrying older context.
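The turn model and pruning rules above can be sketched as follows. The `Message` shape, the `distill` placeholder, and the `keepTurns` and `bulkyChars` parameters are illustrative assumptions, not the actual `src/prune.ts` API. Messages that precede the first `user` message are grouped into a leading turn here, which is also an assumption.

```typescript
type Role = "user" | "assistant" | "toolResult";

interface Message {
  role: Role;
  content: string;
}

// Placeholder distiller (assumption): the real summary format is not
// defined in this spec.
function distill(content: string): string {
  return "[distilled toolResult: " + content.length + " chars]";
}

// Group messages into turns. A new turn starts at each `user` message.
function splitIntoTurns(messages: readonly Message[]): Message[][] {
  const turns: Message[][] = [];
  let current: Message[] | undefined;
  for (const m of messages) {
    if (m.role === "user" || current === undefined) {
      current = [m];
      turns.push(current);
    } else {
      current.push(m);
    }
  }
  return turns;
}

// Keep the last `keepTurns` turns (at least one). Older kept turns get
// bulky tool results distilled; the newest turn stays lossless.
function pruneByTurn(
  messages: readonly Message[],
  keepTurns: number,
  bulkyChars: number,
): Message[] {
  const turns = splitIntoTurns(messages);
  const kept = turns.slice(-Math.max(1, Math.floor(keepTurns)));
  const out: Message[] = [];
  kept.forEach((turn, index) => {
    const isNewest = index === kept.length - 1;
    for (const m of turn) {
      const bulky = m.role === "toolResult" && m.content.length > bulkyChars;
      out.push(
        !isNewest && bulky ? { role: m.role, content: distill(m.content) } : m,
      );
    }
  });
  return out;
}
```

Because whole turns are kept or dropped together, a `toolResult` can never survive without the assistant message that produced it.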
#### Policy shape
Current `Policy.recentUserTurns` remains the main knob, but zone adjustments become stronger.
Recommended practical behavior:
- Green: keep a small suffix of recent turns.
- Yellow: keep fewer turns and distill more aggressively.
- Red/compact: keep only the newest turn or the smallest safe suffix.
The exact numbers can be set during planning, but the direction is fixed: fewer full turns than today, with stronger tightening once pressure reaches yellow/red.
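One way the zone tightening could look, as a sketch only. The halving rule for yellow and the single-turn floor for red/compact are placeholder choices, since the exact numbers are left to planning; `turnsToKeep` is a hypothetical helper name.

```typescript
type Zone = "green" | "yellow" | "red" | "compact";

// Only the field this sketch needs; the real Policy type has more.
interface Policy {
  recentUserTurns: number;
}

// Map the base knob and the current zone to a turn count.
// Placeholder numbers: the spec fixes only the direction (fewer turns
// as pressure rises), not the values.
function turnsToKeep(policy: Policy, zone: Zone): number {
  const base = Math.max(1, Math.floor(policy.recentUserTurns));
  switch (zone) {
    case "green":
      return base;
    case "yellow":
      return Math.max(1, Math.ceil(base / 2));
    case "red":
    case "compact":
      return 1;
  }
}
```

Whatever values planning settles on, the result should never increase as the zone worsens and should never drop below one turn.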
### 3. Make resume injection lean
`src/runtime.ts` currently builds resume text by prepending raw `lastCompactionSummary` and `lastBranchSummary`, then appending the ledger-based resume packet.
That should change.
New behavior:
- `buildResumePacket()` returns only the lean ledger-rendered restart packet.
- Raw persisted summary blobs remain stored in the snapshot for inspection and recovery, but they are not injected into model context.
Why:
- The ledger already extracts active goal, constraints, decisions, tasks, and blockers from summaries.
- Re-injecting the full raw summaries duplicates content and defeats pruning.
- Lean packet text is easier to bound and test.
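A rough sketch of the lean behavior, assuming a ledger with the five fields listed above. The `Ledger` and `Snapshot` shapes, the argument to `buildResumePacket`, and the output layout are all hypothetical; the real signature in `src/runtime.ts` may differ.

```typescript
// Hypothetical ledger shape based on the fields named in this spec.
interface Ledger {
  goal?: string;
  constraints: string[];
  decisions: string[];
  tasks: string[];
  blockers: string[];
}

// Raw summaries stay in the snapshot for inspection and recovery.
interface Snapshot {
  ledger: Ledger;
  lastCompactionSummary?: string;
  lastBranchSummary?: string;
}

// Render only ledger-derived facts. The raw summary fields are
// deliberately never read here.
function buildResumePacket(snapshot: Snapshot): string {
  const { ledger } = snapshot;
  const lines: string[] = [];
  if (ledger.goal) lines.push("Goal: " + ledger.goal);
  const section = (title: string, items: readonly string[]): void => {
    if (items.length === 0) return;
    lines.push(title + ":");
    for (const item of items) lines.push("- " + item);
  };
  section("Constraints", ledger.constraints);
  section("Decisions", ledger.decisions);
  section("Tasks", ledger.tasks);
  section("Blockers", ledger.blockers);
  return lines.join("\n");
}
```

Since the packet is built from short ledger entries only, its size is bounded by the ledger rather than by the length of the last compaction or branch summary.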
### 4. Trigger compaction earlier from the extension
The extension should request compaction on its own once context pressure reaches the extension's red zone instead of waiting for Pi core's later `contextWindow - reserveTokens` check.
#### Trigger point
Use `turn_end` after usage is observed, because it provides the newest context measurement before the next model call.
#### Gate behavior
Compaction should trigger when:
- `ctx.getContextUsage()?.tokens` is known, and
- the runtime zone reaches `red` or worse under the extension's model-aware policy, and
- a local latch/cooldown shows that no compaction request is already in flight for the same pressure episode.
Compaction should not spam:
- only fire on threshold crossing or after a clear reset,
- clear the latch after successful compaction or after tokens fall back below the trigger zone.
Reason:
- This is the earliest safe extension-controlled point that uses real usage data.
- It directly satisfies the requirement to compact earlier than Pi core.
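The gate and latch described above could be sketched like this. `CompactionGate` and its method names are hypothetical, and treating `compact` as the only zone worse than `red` is an assumption based on the zone names used elsewhere in this spec.

```typescript
type Zone = "green" | "yellow" | "red" | "compact";

// Latch that allows one compaction request per pressure episode.
class CompactionGate {
  private latched = false;

  // Call from the `turn_end` handler with the observed usage and zone.
  // Returns true only on the first red-or-worse observation of an episode.
  shouldRequest(tokens: number | undefined, zone: Zone): boolean {
    if (tokens === undefined) return false; // usage unknown: do nothing
    const inTriggerZone = zone === "red" || zone === "compact";
    if (!inTriggerZone) {
      this.latched = false; // pressure dropped: clear the latch
      return false;
    }
    if (this.latched) return false; // request already in flight
    this.latched = true;
    return true;
  }

  // Call after a successful compaction to allow a future request.
  onCompacted(): void {
    this.latched = false;
  }
}
```

In use, the `turn_end` handler would pass `ctx.getContextUsage()?.tokens` and the current runtime zone, and request compaction only when `shouldRequest` returns `true`. A time-based cooldown is not modeled here.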
### 5. Remove footer status noise
Remove `ctx.ui.setStatus("context-manager", ...)` updates.
Keep:
- `/ctx-status` for explicit inspection.
- Snapshot persistence and internal pressure tracking.
Do not keep:
- automatic footer text like `ctx green/yellow/red/compact`.
## Data flow after the change
1. `turn_end`
   - Sync model context window.
   - Observe actual context tokens.
   - Persist snapshot.
   - If the extension's red-zone compaction gate trips, request compaction immediately.
2. `session_compact` / `session_tree`
   - Persist summary artifacts to snapshot state.
   - Re-ingest them into the ledger.
   - Arm one-shot resume injection.
3. `context`
   - Remove raw `compactionSummary` and `branchSummary` messages from the outgoing message list.
   - Prune remaining conversation by recent turn suffix.
   - Distill bulky kept tool results when allowed by zone.
   - Inject exactly one hidden lean packet or lean resume packet.
## Testing
### `src/prune.test.ts`
Add coverage for:
- dropping entire older turns instead of retaining their user and assistant messages,
- preserving the newest turn intact,
- distilling bulky tool results only inside older kept turns,
- stronger suffix tightening as zone worsens,
- preserving ordering safety for assistant/tool-result groups.
### `src/runtime.test.ts`
Add coverage for:
- `buildResumePacket()` no longer containing raw `## Latest compaction handoff` or `## Latest branch handoff` sections,
- lean resume output still containing current goal, task, constraints, decisions, and blockers extracted into the ledger.
### `src/extension.test.ts`
Add coverage for:
- filtering raw `compactionSummary` and `branchSummary` messages from the `context` payload,
- triggering extension-driven compaction once pressure reaches the red zone without waiting for Pi core's later threshold,
- not repeatedly triggering compaction while a latch/cooldown is active,
- no `context-manager` footer status writes,
- `/ctx-status` still reporting mode, zone, packet size, and summary presence.
## Acceptance criteria
- Live context no longer grows monotonically just because older user and assistant turns remain in place.
- The extension can compact before Pi core's reserve-token threshold.
- Hidden injected context no longer includes raw full compaction or branch-summary blobs.
- Raw summary artifact messages do not remain in live context after the extension has folded them into its own ledger.
- Footer status line from this package is gone.
- Existing deterministic ledger tie behavior remains unchanged.
## Risks and mitigations
### Risk: pruning drops too much recent context
Mitigation:
- keep the newest turn lossless,
- keep packet generation focused on active goal, constraints, decisions, tasks, blockers,
- cover turn-suffix behavior with targeted tests.
### Risk: early compaction triggers too often
Mitigation:
- use threshold-crossing plus latch/cooldown behavior,
- clear the latch only after compaction or a meaningful pressure drop,
- test repeated `turn_end` events around the boundary.
### Risk: summary filtering removes needed information
Mitigation:
- filter only messages whose role is `compactionSummary` or `branchSummary`,
- keep summary-derived facts in the ledger,
- keep persisted raw summaries in snapshots for debug and recovery.