pi-context-manager/docs/specs/2026-04-12-context-manager-design.md

# Context manager: stronger pruning and earlier compaction

## Status

Approved for planning.

## Problem

`pi-context-manager` loads, but it does not materially cap live context growth.

Observed behavior:
- Context usage keeps climbing during long sessions.
- The extension mostly distills old bulky `toolResult` output, but it keeps most older user and assistant turns intact.
- Compaction still waits for Pi core's later reserve-token threshold.
- Raw compaction and branch-summary artifacts can remain in live context even though the extension already persists and replays the same information through its own ledger.
- Footer status noise is unnecessary for this package.

## Goals

1. Keep live context flatter during long sessions.
2. Trigger compaction earlier than Pi core's default reserve-token threshold.
3. Use lean resume injection, not raw latest compaction or branch-summary blobs.
4. Remove the persistent footer status line.
5. Preserve deterministic memory merging behavior, including stable exact-timestamp tie resolution.

## Non-goals

- Adding new LLM-facing tools.
- Broad extractor or summarizer redesign unrelated to live-context pressure.
- Editing `docs/extensions.md`.
- Changing the existing deterministic same-slot tie-break contract in `src/ledger.ts`.

## Constraints

- Do not wait for Pi core's much later reserve-token threshold.
- Resume injection should use lean packet text, not raw latest compaction or branch-summary blobs.
- Exact-timestamp ties must resolve the same way regardless of candidate processing order.
- Keep changes package-local and minimal.
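
The order-independence constraint above can be illustrated with a small sketch. The real tie-break rule lives in `src/ledger.ts` and is not described in this document, so everything here is hypothetical: the `Candidate` shape, the "newer timestamp wins" rule, and the lexicographic comparison of `value` used to break exact-timestamp ties. The only point is that a tie-break based on a total order over candidate content gives the same winner for any processing order.

```typescript
// Hypothetical candidate shape; the real ledger entry type may differ.
interface Candidate {
  slot: string;
  timestamp: number;
  value: string;
}

// True when `a` should replace `b` in the same slot.
// Assumption: newer timestamps win; exact ties fall back to comparing `value`,
// which does not depend on the order in which candidates arrive.
function beats(a: Candidate, b: Candidate): boolean {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
  return a.value > b.value;
}

// Keeps one winner per slot.
function mergeCandidates(candidates: readonly Candidate[]): Record<string, Candidate> {
  const winners: Record<string, Candidate> = {};
  for (const c of candidates) {
    const current = winners[c.slot];
    if (current === undefined || beats(c, current)) winners[c.slot] = c;
  }
  return winners;
}
```

A test in this style would merge the same candidates in two different orders and assert that both runs pick the same winner.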

## Design

### 1. Pre-filter raw summary artifacts from live context

Before turn-aware pruning runs, the extension should drop raw `compactionSummary` and `branchSummary` messages from the `context` event payload.

Reason:
- These artifacts are already persisted.
- They are already replayed into the extension ledger.
- A separate hidden packet/resume message is already injected.
- Keeping the raw artifacts in live context duplicates information and allows context growth even when the extension is active.

Effect:
- The model sees one lean synthesized checkpoint, not both the raw Pi summary artifacts and the extension's packet.
- The reported symptom, noisy summary text leaking into context, should disappear.
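
A minimal sketch of the pre-filter, assuming messages carry a `role` field and that the two artifact kinds appear as roles named `compactionSummary` and `branchSummary`. The `Message` shape and the `dropRawSummaries` name are illustrative, not the extension's actual types.

```typescript
// Hypothetical message shape; the real Pi message types may differ.
type Role = "user" | "assistant" | "toolResult" | "compactionSummary" | "branchSummary";

interface Message {
  role: Role;
  content: string;
}

// Returns a new array without raw summary artifacts. The input is not mutated,
// and the relative order of the remaining messages is preserved.
function dropRawSummaries(messages: readonly Message[]): Message[] {
  return messages.filter(
    (m) => m.role !== "compactionSummary" && m.role !== "branchSummary",
  );
}
```

Running this step before turn-aware pruning means the pruner only ever sees `user`, `assistant`, and `toolResult` messages.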

### 2. Replace message-level pruning with turn-aware pruning

`src/prune.ts` should stop acting like a bulky-tool-result filter with a weak recent-turn suffix. It should prune by conversation turn.

#### Turn model

A turn is the contiguous slice that begins at a `user` message and includes every assistant message and `toolResult` message that follows, up to but not including the next `user` message.

Pruning rules:
- Keep only the most recent turn suffix, with the exact turn count controlled by policy and zone.
- Drop entire older turns instead of keeping their user and assistant messages forever.
- Within kept but older turns, distill bulky `toolResult` messages to short summaries.
- Keep the newest active turn lossless.
- Never keep a `toolResult` whose enclosing turn has been dropped.

Why this shape:
- It preserves tool ordering.
- It prevents stale planning chatter from accumulating.
- It makes packet injection the main mechanism for carrying older context.
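
The turn model and pruning rules above can be sketched as follows. This is an illustration under assumptions, not the planned contents of `src/prune.ts`: the `Message` shape, the function names, the `maxToolChars` parameter, and the distilled-text format are all hypothetical. Messages that appear before the first `user` message are treated as one leading group, which is also an assumption of the sketch.

```typescript
type Role = "user" | "assistant" | "toolResult";

interface Message {
  role: Role;
  content: string;
}

// Group messages into turns. A new turn starts at every `user` message.
function splitTurns(messages: readonly Message[]): Message[][] {
  const turns: Message[][] = [];
  let current: Message[] | undefined;
  for (const m of messages) {
    if (m.role === "user" || current === undefined) {
      current = [m];
      turns.push(current);
    } else {
      current.push(m);
    }
  }
  return turns;
}

// Replace a bulky tool result with a short summary; other messages pass through.
function distillToolResult(m: Message, maxToolChars: number): Message {
  if (m.role !== "toolResult" || m.content.length <= maxToolChars) return m;
  const head = m.content.slice(0, maxToolChars);
  return { ...m, content: `[distilled toolResult: ${m.content.length} chars] ${head}` };
}

// Keep the last `keepTurns` turns (at least one). Older turns are dropped whole,
// older kept turns get distilled tool results, and the newest turn stays lossless.
function pruneByTurn(
  messages: readonly Message[],
  keepTurns: number,
  maxToolChars: number,
): Message[] {
  const turns = splitTurns(messages);
  const kept = turns.slice(-Math.max(1, Math.floor(keepTurns)));
  const out: Message[] = [];
  kept.forEach((turn, i) => {
    const isNewest = i === kept.length - 1;
    for (const m of turn) out.push(isNewest ? m : distillToolResult(m, maxToolChars));
  });
  return out;
}
```

Because turns are kept or dropped as whole units, a `toolResult` can never survive without the `user` and assistant messages around it.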

#### Policy shape

The existing `Policy.recentUserTurns` setting remains the main knob, but zone adjustments become stronger.

Recommended practical behavior:
- Green: keep a small suffix of recent turns.
- Yellow: keep fewer turns and distill more aggressively.
- Red/compact: keep only the newest turn or the smallest safe suffix.

The exact numbers can be set during planning, but the direction is fixed: fewer full turns than today, with stronger tightening once pressure reaches yellow/red.
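
One possible shape for the zone adjustment is sketched below. Only `recentUserTurns` and the zone names come from this document; `PruneSettings`, `settingsForZone`, the halving rule for yellow, and every numeric threshold are placeholders to be replaced during planning.

```typescript
type Zone = "green" | "yellow" | "red" | "compact";

interface Policy {
  recentUserTurns: number;
}

// Hypothetical output of the zone adjustment.
interface PruneSettings {
  keepTurns: number;
  maxToolChars: number; // tool results longer than this get distilled
}

function settingsForZone(policy: Policy, zone: Zone): PruneSettings {
  const base = Math.max(1, Math.floor(policy.recentUserTurns));
  if (zone === "green") {
    // Small suffix of recent turns, light distillation.
    return { keepTurns: base, maxToolChars: 2000 };
  }
  if (zone === "yellow") {
    // Fewer turns (placeholder: half, rounded up) and tighter distillation.
    return { keepTurns: Math.max(1, Math.ceil(base / 2)), maxToolChars: 500 };
  }
  // Red and compact: newest turn only.
  return { keepTurns: 1, maxToolChars: 200 };
}
```

Keeping the mapping in one pure function makes the "stronger tightening as zone worsens" behavior easy to assert in `src/prune.test.ts`.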

### 3. Make resume injection lean

`src/runtime.ts` currently builds resume text by prepending raw `lastCompactionSummary` and `lastBranchSummary`, then appending the ledger-based resume packet.

That should change.

New behavior:
- `buildResumePacket()` returns only the lean ledger-rendered restart packet.
- Raw persisted summary blobs remain stored in the snapshot for inspection and recovery, but they are not injected into model context.

Why:
- The ledger already extracts active goal, constraints, decisions, tasks, and blockers from summaries.
- Re-injecting the full raw summaries duplicates content and defeats pruning.
- Lean packet text is easier to bound and test.
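
A rough sketch of the lean behavior, assuming a snapshot that holds both the ledger and the raw summary blobs. The `Ledger` and `Snapshot` shapes, the section headings, and the signature of `buildResumePacket()` shown here are hypothetical; only the field names `lastCompactionSummary` and `lastBranchSummary` come from the text above.

```typescript
// Hypothetical ledger shape covering the facts listed above.
interface Ledger {
  goal?: string;
  constraints: string[];
  decisions: string[];
  tasks: string[];
  blockers: string[];
}

interface Snapshot {
  ledger: Ledger;
  lastCompactionSummary?: string; // stored for inspection, never injected
  lastBranchSummary?: string; // stored for inspection, never injected
}

// Appends a titled bullet section only when it has entries.
function pushSection(lines: string[], title: string, items: readonly string[]): void {
  if (items.length === 0) return;
  lines.push(`${title}:`);
  for (const item of items) lines.push(`- ${item}`);
}

// Renders the packet from the ledger only; raw summary blobs are ignored.
function buildResumePacket(snapshot: Snapshot): string {
  const { ledger } = snapshot;
  const lines: string[] = ["## Resume"];
  if (ledger.goal) lines.push(`Goal: ${ledger.goal}`);
  pushSection(lines, "Constraints", ledger.constraints);
  pushSection(lines, "Decisions", ledger.decisions);
  pushSection(lines, "Tasks", ledger.tasks);
  pushSection(lines, "Blockers", ledger.blockers);
  return lines.join("\n");
}
```

Since the output depends only on ledger fields, its size can be bounded by bounding the ledger, independent of how large the raw summaries are.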

### 4. Trigger compaction earlier from the extension

The extension should request compaction on its own once context pressure reaches the extension's red zone instead of waiting for Pi core's later `contextWindow - reserveTokens` check.

#### Trigger point

Use the `turn_end` event, after usage has been observed, because it provides the newest context measurement before the next model call.

#### Gate behavior

Compaction should trigger when:
- `ctx.getContextUsage()?.tokens` is known, and
- the runtime zone reaches `red` or worse under the extension's model-aware policy, and
- a local latch/cooldown says a compaction request is not already in flight for the same pressure episode.

Compaction requests should not repeat within a single pressure episode:
- fire only on a threshold crossing or after a clear reset,
- clear the latch after a successful compaction or after tokens fall back below the trigger zone.

Reason:
- This is the earliest safe extension-controlled point that uses real usage data.
- It directly satisfies the requirement to compact earlier than Pi core.
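
The gate and latch described above can be sketched as a small state machine. The `CompactionGate` class and its method names are hypothetical, and zone computation is assumed to happen elsewhere; the caller would pass in `ctx.getContextUsage()?.tokens` and the current zone from its `turn_end` handler.

```typescript
type Zone = "green" | "yellow" | "red" | "compact";

class CompactionGate {
  private latched = false;

  // Returns true when the caller should request compaction now.
  shouldRequest(tokens: number | undefined, zone: Zone): boolean {
    // Unknown usage: do nothing and leave the latch as it is.
    if (tokens === undefined) return false;

    const inTriggerZone = zone === "red" || zone === "compact";
    if (!inTriggerZone) {
      // Pressure dropped below the trigger zone: the episode is over.
      this.latched = false;
      return false;
    }

    // Already requested for this pressure episode.
    if (this.latched) return false;

    this.latched = true;
    return true;
  }

  // Call after a successful compaction so a later episode can fire again.
  onCompactionSucceeded(): void {
    this.latched = false;
  }
}
```

A caller would invoke `shouldRequest` once per `turn_end` and call `onCompactionSucceeded` from whatever handler observes the completed compaction.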

### 5. Remove footer status noise

Remove `ctx.ui.setStatus("context-manager", ...)` updates.

Keep:
- `/ctx-status` for explicit inspection.
- Snapshot persistence and internal pressure tracking.

Do not keep:
- automatic footer text like `ctx green/yellow/red/compact`.

## Data flow after the change

1. `turn_end`
   - Sync model context window.
   - Observe actual context tokens.
   - Persist snapshot.
   - If the extension's red-zone compaction gate trips, request compaction immediately.

2. `session_compact` / `session_tree`
   - Persist summary artifacts to snapshot state.
   - Re-ingest them into the ledger.
   - Arm one-shot resume injection.

3. `context`
   - Remove raw `compactionSummary` and `branchSummary` messages from the outgoing message list.
   - Prune remaining conversation by recent turn suffix.
   - Distill bulky kept tool results when allowed by zone.
   - Inject exactly one hidden lean packet or lean resume packet.

## Testing

### `src/prune.test.ts`

Add coverage for:
- dropping entire older turns instead of retaining their user and assistant messages,
- preserving the newest turn intact,
- distilling bulky tool results only inside older kept turns,
- stronger suffix tightening as zone worsens,
- preserving ordering safety for assistant/tool-result groups.

### `src/runtime.test.ts`

Add coverage for:
- `buildResumePacket()` no longer containing raw `## Latest compaction handoff` or `## Latest branch handoff` sections,
- lean resume output still containing current goal, task, constraints, decisions, and blockers extracted into the ledger.

### `src/extension.test.ts`

Add coverage for:
- filtering raw `compactionSummary` and `branchSummary` messages from the `context` payload,
- triggering extension-driven compaction once pressure reaches the red zone without waiting for Pi core's later threshold,
- not repeatedly triggering compaction while a latch/cooldown is active,
- no `context-manager` footer status writes,
- `/ctx-status` still reporting mode, zone, packet size, and summary presence.

## Acceptance criteria

- Live context no longer grows monotonically just because older user and assistant turns remain in place.
- The extension can compact before Pi core's reserve-token threshold.
- Hidden injected context no longer includes raw full compaction or branch-summary blobs.
- Raw summary artifact messages do not remain in live context after the extension has folded them into its own ledger.
- Footer status line from this package is gone.
- Existing deterministic ledger tie behavior remains unchanged.

## Risks and mitigations

### Risk: pruning drops too much recent context

Mitigation:
- keep the newest turn lossless,
- keep packet generation focused on active goal, constraints, decisions, tasks, blockers,
- cover turn-suffix behavior with targeted tests.

### Risk: early compaction triggers too often

Mitigation:
- use threshold-crossing plus latch/cooldown behavior,
- clear the latch only after compaction or a meaningful pressure drop,
- test repeated `turn_end` events around the boundary.

### Risk: summary filtering removes needed information

Mitigation:
- filter only raw `compactionSummary` and `branchSummary` messages,
- keep summary-derived facts in the ledger,
- keep persisted raw summaries in snapshots for debug and recovery.