pi-context-manager/docs/specs/2026-04-12-context-manager-design.md

Context manager: stronger pruning and earlier compaction

Status

Approved for planning.

Problem

pi-context-manager loads, but it does not materially cap live context growth.

Observed behavior:

  • Context usage keeps climbing during long sessions.
  • The extension mostly distills old bulky toolResult output, but it keeps most older user and assistant turns intact.
  • Compaction still waits for Pi core's later reserve-token threshold.
  • Raw compaction and branch-summary artifacts can remain in live context even though the extension already persists and replays the same information through its own ledger.
  • The persistent footer status line adds noise and is unnecessary for this package.

Goals

  1. Keep live context flatter during long sessions.
  2. Trigger compaction earlier than Pi core's default reserve-token threshold.
  3. Use lean resume injection, not raw latest compaction or branch-summary blobs.
  4. Remove the persistent footer status line.
  5. Preserve deterministic memory merging behavior, including stable exact-timestamp tie resolution.

Non-goals

  • Adding new LLM-facing tools.
  • Broad extractor or summarizer redesign unrelated to live-context pressure.
  • Editing docs/extensions.md.
  • Changing the existing deterministic same-slot tie-break contract in src/ledger.ts.

Constraints

  • Compaction must not wait for Pi core's much later reserve-token threshold.
  • Resume injection should use lean packet text, not raw latest compaction or branch-summary blobs.
  • Exact-timestamp ties must resolve the same way regardless of candidate processing order.
  • Keep changes package-local and minimal.
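
To illustrate the order-independence constraint only: one common way to make exact-timestamp ties resolve identically regardless of processing order is to compare candidates with a total order that never consults arrival position. The sketch below is an assumption for illustration, not the contract in src/ledger.ts (which this spec leaves unchanged and does not reproduce). The Candidate shape and the value-based fallback are hypothetical.

```typescript
// Hypothetical candidate shape; the real ledger entry type is not shown in this spec.
interface Candidate {
  slot: string;
  timestamp: number;
  value: string;
}

// Returns true when `a` should replace `b`.
// Later timestamp wins. On an exact tie, fall back to comparing values,
// which depends only on the candidates themselves, never on arrival order.
// (Assumed fallback rule; the actual rule lives in src/ledger.ts.)
function wins(a: Candidate, b: Candidate): boolean {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp;
  }
  return a.value > b.value;
}

// Pick the winning candidate for one slot.
function mergeSlot(candidates: readonly Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  for (const c of candidates) {
    if (best === undefined || wins(c, best)) {
      best = c;
    }
  }
  return best;
}
```

Because the comparison uses only timestamp and value, feeding the same candidates in any order yields the same winner, which is the property the constraint asks tests to keep protecting.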

Design

1. Pre-filter raw summary artifacts from live context

Before turn-aware pruning runs, the extension should drop raw compactionSummary and branchSummary messages from the context event payload.

Reason:

  • These artifacts are already persisted.
  • They are already replayed into the extension ledger.
  • A separate hidden packet/resume message is already injected.
  • Keeping the raw artifacts in live context duplicates information and allows context growth even when the extension is active.

Effect:

  • The model sees one lean synthesized checkpoint, not both the raw Pi summary artifacts and the extension's packet.
  • The screenshot symptom of noisy summary text leaking into context should disappear.
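
A minimal sketch of the pre-filter, assuming messages in the context payload carry a role field that identifies compactionSummary and branchSummary entries. The ContextMessage type and the function name dropRawSummaryArtifacts are hypothetical; the real Pi payload types are not shown in this spec.

```typescript
// Hypothetical message shape for illustration only.
type ContextRole =
  | "user"
  | "assistant"
  | "toolResult"
  | "compactionSummary"
  | "branchSummary";

interface ContextMessage {
  role: ContextRole;
  content: string;
}

// True for the raw artifacts that the extension already persists and
// replays through its own ledger.
function isRawSummaryArtifact(m: ContextMessage): boolean {
  return m.role === "compactionSummary" || m.role === "branchSummary";
}

// Return a new list without raw summary artifacts.
// All other messages are kept, in their original order.
function dropRawSummaryArtifacts(
  messages: readonly ContextMessage[]
): ContextMessage[] {
  return messages.filter((m) => !isRawSummaryArtifact(m));
}
```

The filter is intentionally narrow: it matches on role only, so ordinary user, assistant, and toolResult messages pass through untouched and are left for turn-aware pruning.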

2. Replace message-level pruning with turn-aware pruning

src/prune.ts should stop acting like a bulky-tool-result filter with a weak recent-turn suffix. It should prune by conversation turn.

Turn model

A turn is the contiguous slice beginning at a user message and including the assistant reply plus any following toolResult messages until the next user message.

Pruning rules:

  • Keep only a suffix of the most recent turns; the exact turn count is controlled by policy and zone.
  • Drop entire older turns instead of keeping their user and assistant messages forever.
  • Within kept but older turns, distill bulky toolResult messages to short summaries.
  • Keep the newest active turn lossless.
  • Never keep a toolResult whose enclosing turn has been dropped.

Why this shape:

  • It preserves tool ordering.
  • It prevents stale planning chatter from accumulating.
  • It makes packet injection the main mechanism for carrying older context.
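
The turn model and pruning rules above can be sketched as follows. This is an illustration under stated assumptions, not the planned src/prune.ts: the TurnMessage type, the function names, the character-count test for "bulky", and the placeholder distillation text are all hypothetical.

```typescript
// Hypothetical message shape for illustration only.
type TurnRole = "user" | "assistant" | "toolResult";

interface TurnMessage {
  role: TurnRole;
  content: string;
}

type Turn = TurnMessage[];

// Split a flat message list into turns. A new turn starts at each user
// message; messages before the first user message form a leading turn.
function groupIntoTurns(messages: readonly TurnMessage[]): Turn[] {
  const turns: Turn[] = [];
  let current: Turn | undefined;
  for (const m of messages) {
    if (m.role === "user" || current === undefined) {
      current = [m];
      turns.push(current);
    } else {
      current.push(m);
    }
  }
  return turns;
}

// Placeholder distillation: replace a bulky tool result with a short marker.
// Real summarization is out of scope for this sketch.
function distillIfBulky(m: TurnMessage, bulkyChars: number): TurnMessage {
  if (m.role !== "toolResult" || m.content.length <= bulkyChars) {
    return m;
  }
  return {
    role: m.role,
    content: "[distilled tool result, " + m.content.length + " chars]",
  };
}

// Keep the last `keepTurns` turns (at least one). Older turns are dropped
// whole, so no toolResult survives without its turn. The newest turn is
// kept lossless; bulky tool results in older kept turns are distilled.
function pruneByTurns(
  messages: readonly TurnMessage[],
  keepTurns: number,
  bulkyChars: number
): TurnMessage[] {
  const turns = groupIntoTurns(messages);
  const kept = turns.slice(-Math.max(1, Math.floor(keepTurns)));
  const out: TurnMessage[] = [];
  kept.forEach((turn, i) => {
    const isNewest = i === kept.length - 1;
    for (const m of turn) {
      out.push(isNewest ? m : distillIfBulky(m, bulkyChars));
    }
  });
  return out;
}
```

Because whole turns are the unit of removal, relative ordering inside each kept turn is never disturbed, which is what keeps assistant and tool-result groups consistent.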

Policy shape

Current Policy.recentUserTurns remains the main knob, but zone adjustments become stronger.

Recommended practical behavior:

  • Green: keep a small suffix of recent turns.
  • Yellow: keep fewer turns and distill more aggressively.
  • Red/compact: keep only the newest turn or the smallest safe suffix.

The exact numbers can be set during planning, but the direction is fixed: fewer full turns than today, with stronger tightening once pressure reaches yellow/red.
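
As a sketch of the direction only, zone tightening could be expressed as a function from Policy.recentUserTurns and the current zone to a kept-turn count. The halving in yellow and the single turn in red/compact are placeholder numbers chosen for illustration; the real values are to be set during planning. The Zone type and the assumption that recentUserTurns is a number are also hypothetical.

```typescript
// Zone names follow the spec's green/yellow/red/compact wording.
type Zone = "green" | "yellow" | "red" | "compact";

// Only the knob used here; the real Policy type has more fields (assumption).
interface Policy {
  recentUserTurns: number;
}

// Map policy and zone to the number of recent turns to keep.
// Always returns at least 1 so the newest turn survives.
function turnsToKeep(policy: Policy, zone: Zone): number {
  const base = Math.max(1, Math.floor(policy.recentUserTurns));
  switch (zone) {
    case "green":
      return base; // small suffix, as configured
    case "yellow":
      return Math.max(1, Math.ceil(base / 2)); // placeholder: half the suffix
    case "red":
    case "compact":
      return 1; // placeholder: newest turn only
  }
}
```

For example, with recentUserTurns set to 6 this sketch keeps 6 turns in green, 3 in yellow, and 1 in red or compact.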

3. Make resume injection lean

src/runtime.ts currently builds resume text by prepending raw lastCompactionSummary and lastBranchSummary, then appending the ledger-based resume packet.

That should change.

New behavior:

  • buildResumePacket() returns only the lean ledger-rendered restart packet.
  • Raw persisted summary blobs remain stored in the snapshot for inspection and recovery, but they are not injected into model context.

Why:

  • The ledger already extracts active goal, constraints, decisions, tasks, and blockers from summaries.
  • Re-injecting the full raw summaries duplicates content and defeats pruning.
  • Lean packet text is easier to bound and test.
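
A hedged sketch of what the lean behavior could look like. Only buildResumePacket, lastCompactionSummary, and lastBranchSummary are names from this spec; the LedgerView and Snapshot shapes, the section titles, and the renderSection helper are assumptions made for illustration.

```typescript
// Hypothetical ledger view holding the facts the spec says are extracted.
interface LedgerView {
  goal: string;
  constraints: string[];
  decisions: string[];
  tasks: string[];
  blockers: string[];
}

// Hypothetical snapshot shape. Raw summaries stay stored for inspection
// and recovery but are never read by buildResumePacket below.
interface Snapshot {
  ledger: LedgerView;
  lastCompactionSummary?: string;
  lastBranchSummary?: string;
}

// Render one titled bullet section; empty sections are omitted entirely.
function renderSection(title: string, items: readonly string[]): string[] {
  if (items.length === 0) {
    return [];
  }
  const lines = ["## " + title];
  for (const item of items) {
    lines.push("- " + item);
  }
  return lines;
}

// Build the restart packet from ledger facts only.
function buildResumePacket(snapshot: Snapshot): string {
  const l = snapshot.ledger;
  const lines: string[] = [];
  return lines
    .concat(
      renderSection("Goal", l.goal ? [l.goal] : []),
      renderSection("Constraints", l.constraints),
      renderSection("Decisions", l.decisions),
      renderSection("Tasks", l.tasks),
      renderSection("Blockers", l.blockers)
    )
    .join("\n");
}
```

Since the packet is built from a fixed set of short lists, its size is bounded by the ledger rather than by whatever the latest raw summary happened to contain.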

4. Trigger compaction earlier from the extension

The extension should request compaction on its own once context pressure reaches the extension's red zone instead of waiting for Pi core's later contextWindow - reserveTokens check.

Trigger point

Use turn_end after usage is observed, because it provides the newest context measurement before the next model call.

Gate behavior

Compaction should trigger when:

  • ctx.getContextUsage()?.tokens is known, and
  • the runtime zone reaches red or worse under the extension's model-aware policy, and
  • a local latch/cooldown says a compaction request is not already in flight for the same pressure episode.

Compaction should not spam:

  • only fire on threshold crossing or after a clear reset,
  • clear the latch after successful compaction or after tokens fall back below the trigger zone.

Reason:

  • This is the earliest safe extension-controlled point that uses real usage data.
  • It directly satisfies the requirement to compact earlier than Pi core.
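
The gate and latch behavior can be sketched as a small state holder that a turn_end handler would consult. This is an illustration, not the planned implementation: CompactionGate and its method names are hypothetical, and treating "compact" as the zone beyond red is an assumption based on the green/yellow/red/compact wording in this spec.

```typescript
type Zone = "green" | "yellow" | "red" | "compact";

// Tracks whether a compaction request is already in flight for the
// current pressure episode.
class CompactionGate {
  private latched = false;

  // Call on each turn_end with the observed token count and current zone.
  // Returns true only when the caller should request compaction now.
  shouldRequest(tokens: number | undefined, zone: Zone): boolean {
    if (tokens === undefined) {
      return false; // usage unknown: never trigger
    }
    const inTriggerZone = zone === "red" || zone === "compact";
    if (!inTriggerZone) {
      this.latched = false; // pressure fell back: clear reset
      return false;
    }
    if (this.latched) {
      return false; // already requested for this episode
    }
    this.latched = true; // threshold crossing: fire once
    return true;
  }

  // Call after a successful compaction so a later episode can trigger again.
  onCompactionSucceeded(): void {
    this.latched = false;
  }
}
```

A handler would call shouldRequest with ctx.getContextUsage()?.tokens and the runtime zone, request compaction when it returns true, and call onCompactionSucceeded once compaction completes. Repeated turn_end events inside the red zone then produce a single request per episode.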

5. Remove the footer status line

Remove ctx.ui.setStatus("context-manager", ...) updates.

Keep:

  • /ctx-status for explicit inspection.
  • Snapshot persistence and internal pressure tracking.

Do not keep:

  • automatic footer text like ctx green/yellow/red/compact.

Data flow after the change

  1. turn_end

    • Sync model context window.
    • Observe actual context tokens.
    • Persist snapshot.
    • If the extension's red-zone compaction gate trips, request compaction immediately.
  2. session_compact / session_tree

    • Persist summary artifacts to snapshot state.
    • Re-ingest them into the ledger.
    • Arm one-shot resume injection.
  3. context

    • Remove raw compactionSummary and branchSummary messages from the outgoing message list.
    • Prune remaining conversation by recent turn suffix.
    • Distill bulky kept tool results when allowed by zone.
    • Inject exactly one hidden lean packet or lean resume packet.

Testing

src/prune.test.ts

Add coverage for:

  • dropping entire older turns instead of retaining their user and assistant messages,
  • preserving the newest turn intact,
  • distilling bulky tool results only inside older kept turns,
  • stronger suffix tightening as zone worsens,
  • preserving ordering safety for assistant/tool-result groups.

src/runtime.test.ts

Add coverage for:

  • buildResumePacket() no longer containing raw ## Latest compaction handoff or ## Latest branch handoff sections,
  • lean resume output still containing current goal, task, constraints, decisions, and blockers extracted into the ledger.

src/extension.test.ts

Add coverage for:

  • filtering raw compactionSummary and branchSummary messages from the context payload,
  • triggering extension-driven compaction once pressure reaches the red zone without waiting for Pi core's later threshold,
  • not repeatedly triggering compaction while a latch/cooldown is active,
  • no context-manager footer status writes,
  • /ctx-status still reporting mode, zone, packet size, and summary presence.

Acceptance criteria

  • Live context no longer grows monotonically just because older user and assistant turns remain in place.
  • The extension can compact before Pi core's reserve-token threshold.
  • Hidden injected context no longer includes raw full compaction or branch-summary blobs.
  • Raw summary artifact messages do not remain in live context after the extension has folded them into its own ledger.
  • Footer status line from this package is gone.
  • Existing deterministic ledger tie behavior remains unchanged.

Risks and mitigations

Risk: pruning drops too much recent context

Mitigation:

  • keep the newest turn lossless,
  • keep packet generation focused on active goal, constraints, decisions, tasks, blockers,
  • cover turn-suffix behavior with targeted tests.

Risk: early compaction triggers too often

Mitigation:

  • use threshold-crossing plus latch/cooldown behavior,
  • clear the latch only after compaction or a meaningful pressure drop,
  • test repeated turn_end events around the boundary.

Risk: summary filtering removes needed information

Mitigation:

  • filter only raw compactionSummary and branchSummary message roles,
  • keep summary-derived facts in the ledger,
  • keep persisted raw summaries in snapshots for debug and recovery.