diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 7bfd1416..9aada32e 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -16,7 +16,7 @@ Voyage is **pre-release** — not yet in production use. During pre-release: **Key architectural pattern — API Proxy**: The frontend never calls the Django backend directly. All API calls go to `src/routes/api/[...path]/+server.ts`, which proxies requests to the Django server (`http://server:8000`), injecting CSRF tokens and managing session cookies. This means frontend fetches use relative URLs like `/api/locations/`. -**AI Chat**: The AI travel chat assistant is embedded in Collections → Recommendations (component: `AITravelChat.svelte`). There is no standalone `/chat` route. Chat providers are loaded dynamically from `GET /api/chat/providers/` (backed by LiteLLM runtime list + custom entries like `opencode_zen`). Chat conversations stream via SSE through `/api/chat/conversations/`. Provider config lives in `backend/server/chat/llm_client.py` (`CHAT_PROVIDER_CONFIG`). Default AI provider/model saved via `UserAISettings` in DB (authoritative over browser localStorage). Chat composer supports per-provider model override via dropdown selector fed by `GET /api/chat/providers/{provider}/models/` (persisted in browser `localStorage` key `voyage_chat_model_prefs`). Collection chats inject multi-stop itinerary context and the system prompt guides `get_trip_details`-first reasoning. LiteLLM errors are mapped to sanitized user-safe messages via `_safe_error_payload()` (never exposes raw exception text). Invalid tool calls (missing required args) are detected and short-circuited with a user-visible error — not replayed into history. +**AI Chat**: The AI travel chat assistant is embedded in Collections → Recommendations (component: `AITravelChat.svelte`). There is no standalone `/chat` route. Chat providers are loaded dynamically from `GET /api/chat/providers/` (backed by LiteLLM runtime list + custom entries like `opencode_zen`). Chat conversations stream via SSE through `/api/chat/conversations/`. Provider config lives in `backend/server/chat/llm_client.py` (`CHAT_PROVIDER_CONFIG`). Default AI provider/model saved via `UserAISettings` in DB (authoritative over browser localStorage). Chat composer supports per-provider model override via dropdown selector fed by `GET /api/chat/providers/{provider}/models/` (persisted in browser `localStorage` key `voyage_chat_model_prefs`). Collection chats inject collection UUID + multi-stop itinerary context; system prompt guides `get_trip_details`-first reasoning and confirms only before first `add_to_itinerary`. LiteLLM errors are mapped to sanitized user-safe messages via `_safe_error_payload()` (never exposes raw exception text). Invalid tool calls (missing required args) are detected and short-circuited with a user-visible error — not replayed into history. Tool outputs render as concise summaries (not raw JSON); `role=tool` messages are hidden from display and reconstructed on reload via `rebuildConversationMessages()`. **Services** (docker-compose): - `web` → SvelteKit frontend at `:8015` diff --git a/.memory/decisions.md b/.memory/decisions.md index 7ba75a1c..bbf1425b 100644 --- a/.memory/decisions.md +++ b/.memory/decisions.md @@ -296,3 +296,79 @@ - **Prior findings**: hardcoded `gpt-4o-mini` WARNING (decisions.md:224) confirmed resolved. `_safe_error_payload` sanitization guardrail (decisions.md:120) confirmed satisfied. - **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#suggestion-add-flow) - **Date**: 2026-03-09 + +## Correctness Review: chat-tool-grounding-and-confirmation +- **Verdict**: APPROVED (score 3) +- **Lens**: Correctness +- **Scope**: UUID grounding in trip context, reduced re-confirmation behavior in system prompt, error wording alignment with required-arg short-circuit regex. +- **Files reviewed**: `backend/server/chat/views/__init__.py` (lines 255-296, 135-153), `backend/server/chat/llm_client.py` (lines 322-350), `backend/server/chat/agent_tools.py` (lines 319-406, 590-618) +- **Acceptance criteria verification**: + - AC1 (grounded UUID): ✅ — `views/__init__.py:256-259` injects validated `collection.id` into system prompt `## Trip Context` with explicit tool-usage instruction ("use this exact collection_id for get_trip_details and add_to_itinerary"). Collection validated for ownership/sharing at lines 242-253. + - AC2 (reduced re-confirmation): ✅ — `llm_client.py:340-341` provides two-phase instruction: confirm before first `add_to_itinerary`, then proceed directly after approval phrases. Prompt-level instruction is the correct approach (hard-coded confirmation state would be fragile). + - AC3 (error wording alignment): ✅ — All error strings traced through `_is_required_param_tool_error`: + - `"dates is required"` (agent_tools.py:603) → matches regex. **Closes prior known gap** (decisions.md:166, tester:183). + - `"collection_id is required"` (agent_tools.py:322) → matches regex. Correct. + - `"collection_id is required and must reference a trip you can access"` (agent_tools.py:402) → does NOT match `fullmatch` regex. Correct — this is an invalid-value error, not a missing-param error; should NOT trigger short-circuit. + - No false positives introduced. No successful tool flows degraded. +- **Findings**: + - WARNING: [agent_tools.py:401-403] Semantic ambiguity in `get_trip_details` DoesNotExist error: `"collection_id is required and must reference a trip you can access"` conflates missing-param and invalid-value failure modes. The prefix "collection_id is required" may mislead the LLM into thinking it omitted the parameter rather than supplied a wrong one, reducing chance it retries with the grounded UUID from context. Compare `add_to_itinerary` DoesNotExist which returns the clearer `"Trip not found"`. A better message: `"No accessible trip found for the given collection_id"`. (confidence: MEDIUM) +- **Suggestions**: (1) Reword `get_trip_details` DoesNotExist to `"No accessible trip found for the given collection_id"` for clearer LLM self-correction. (2) `get_trip_details` only filters `user=user` (not `shared_with`) — shared users will get DoesNotExist despite having `send_message` access. Pre-existing, now more visible with UUID grounding. (3) Malformed UUID strings fall to generic "unexpected error" handler — a `ValidationError` catch returning `"collection_id must be a valid UUID"` would improve LLM self-correction. Pre-existing. +- **No regressions**: `_build_llm_messages` orphan trimming intact. Streaming loop structure unchanged. `MAX_TOOL_ITERATIONS` guard intact. +- **Prior findings**: `get_weather` "dates must be a non-empty list" gap (decisions.md:166) now **RESOLVED** — changed to "dates is required". Multi-tool orphan fixes (decisions.md:272-281) confirmed intact. +- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#chat-tool-grounding-and-confirmation) +- **Date**: 2026-03-09 + +## Correctness Review: embedded-chat-ux-polish +- **Verdict**: CHANGES-REQUESTED (score 3) +- **Lens**: Correctness +- **Scope**: Embedded chat header de-crowding (settings dropdown), height constraints, sidebar accessibility, streaming indicator visibility, and visual language preservation. +- **File reviewed**: `frontend/src/lib/components/AITravelChat.svelte` +- **Acceptance criteria**: + - AC1 (header de-crowded): ✅ — Provider/model selectors moved into `
` gear-icon dropdown, leaving header with only toggle + title + ⚙️ button. + - AC2 (layout stability): ✅ — `h-[65vh]` with `min-h-[30rem]`/`max-h-[46rem]` bounds. Embedded uses `bg-base-100` + border (softer treatment). Quick-action chips use `btn-xs` + `overflow-x-auto` for embedded. + - AC3 (streaming indicator visible): ✅ — Indicator inside last assistant bubble, conditioned on `isStreaming && msg.id === lastVisibleMessageId`. Visible throughout entire generation, not just before first token. + - AC4 (existing features preserved): ✅ — All tool result rendering, conversation management, date selector modal, quick actions, send button states intact. +- **Findings**: + - WARNING: [AITravelChat.svelte:61,624] `sidebarOpen` defaults to `true`; sidebar uses fixed `w-60` inline layout. On narrow/mobile viewports (≤640px) in embedded mode, sidebar consumes 240px leaving ≈135px for chat content — functionally unusable. Fix: `let sidebarOpen = !embedded;` or media-aware init. (confidence: HIGH) +- **Suggestions**: (1) `aria-label` values at lines 678 and 706 are hardcoded English — should use `$t()` per project i18n convention. (2) `
` dropdown doesn't auto-close on outside click, unlike focus-based dropdowns elsewhere in codebase — consider tabindex-based pattern or click-outside handler for consistency. +- **Next**: Set `sidebarOpen` default to `false` for embedded mode (e.g., `let sidebarOpen = !embedded;`). +- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#embedded-chat-ux-polish) +- **Date**: 2026-03-09 + +## Re-Review: embedded-chat-ux-polish — sidebar default fix +- **Verdict**: APPROVED (score 0) +- **Lens**: Correctness +- **Scope**: Targeted re-review of `sidebarOpen` initialization fix only. +- **File reviewed**: `frontend/src/lib/components/AITravelChat.svelte` +- **Finding resolution**: Original WARNING (`sidebarOpen` defaulting `true` in embedded mode, line 61→63) is resolved. Line 63 now reads `let sidebarOpen = !embedded;`, which initializes to `false` when `embedded=true`. Sidebar CSS at line 688 applies `hidden` when `sidebarOpen=false`, overridden by `lg:flex` on desktop — correct responsive pattern. Non-embedded mode unaffected (`!false = true`). No new issues introduced. +- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#embedded-chat-ux-polish) +- **Date**: 2026-03-09 + +## Re-Review: chat-tool-output-cleanup — tool_results reconstruction on reload +- **Verdict**: APPROVED (score 0) +- **Lens**: Correctness (targeted re-review) +- **Scope**: Fix for CRITICAL finding (decisions.md:262-267) — tool summaries and rich cards lost on conversation reload because `tool_results` was ephemeral and never reconstructed from persisted `role=tool` messages. +- **File reviewed**: `frontend/src/lib/components/AITravelChat.svelte` (lines 31-39, 271-340, 598) +- **Original finding status**: **RESOLVED**. `selectConversation()` now pipes `data.messages` through `rebuildConversationMessages()` (line 276), which iterates persisted messages, parses `role=tool` rows via `parseStoredToolResult()`, and attaches them as `tool_results` on the preceding assistant message. `visibleMessages` filter (line 598) still hides raw tool rows. Both streaming and reload paths now produce identical `tool_results` data. +- **Verification of fix correctness**: + - `ChatMessage` type (lines 36-37) adds `tool_calls?: Array<{ id?: string }>` and `tool_call_id?: string` — matches backend serializer fields exactly (`ChatMessageSerializer` returns `tool_calls`, `tool_call_id`, `name`). + - `rebuildConversationMessages` (lines 298-340): creates shallow copies (no input mutation), tracks `activeAssistant` for messages with non-empty `tool_calls`, attaches parsed tool results to assistant, auto-detaches when all expected results collected (`tool_results.length >= toolCallIds.length`). Correctly handles: (a) legacy data without `tool_call_id` (positional attachment), (b) `tool_call_id`-based matching when IDs are present, (c) multi-tool-call assistant messages, (d) assistant messages without `tool_calls` (skipped). + - `parseStoredToolResult` (lines 280-296): guards on `role !== 'tool'`, uses `msg.name` from serializer, JSON.parse with graceful fallback on non-JSON content. No null dereference risks. + - Streaming path (lines 432-438) independently populates `tool_results` during live SSE — no interference with reload path. +- **No new issues introduced**: No async misuse, no null dereference, no off-by-one, no mutation of shared state, no contract mismatch with backend serializer. +- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#chat-tool-output-cleanup), original CRITICAL at decisions.md:262-267 +- **Date**: 2026-03-09 + +## Tester Validation: embedded-chat-ux-polish +- **Status**: PASS (Both Standard + Adversarial passes) +- **Scope**: Sidebar default closed for embedded mode, compact header with settings dropdown, bounded height, chip scroll behavior, streaming indicator visibility. +- **Key findings**: + - `sidebarOpen = !embedded` (line 63) correctly initializes to `false` in embedded mode; `lg:flex` on sidebar ensures always-visible on desktop as intended — correct responsive pattern. + - `lastVisibleMessageId` reactive (`$:`) — no stale-indicator risk during streaming. + - All i18n keys used in header/settings dropdown confirmed present in `en.json`. + - `
` dropdown does not auto-close on outside click — UX inconvenience, not a defect. + - `aria-label` at lines 743 and 771 are hardcoded English (i18n convention violation, low severity). +- **MUTATION_ESCAPES**: 0/4 +- **Residual**: Two low-priority follow-ups (aria-label i18n, dropdown outside-click behavior) — not blocking. +- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#tester-validation--embedded-chat-ux-polish) +- **Date**: 2026-03-09 diff --git a/.memory/knowledge/overview.md b/.memory/knowledge/overview.md index 0d68e7e3..51cb9053 100644 --- a/.memory/knowledge/overview.md +++ b/.memory/knowledge/overview.md @@ -8,10 +8,12 @@ Frontend never calls Django directly. All API calls go through `src/routes/api/[ - Provider selector loads dynamically from `GET /api/chat/providers/` (backed by `litellm.provider_list` + `CHAT_PROVIDER_CONFIG` in `backend/server/chat/llm_client.py`). - Supported configured providers: OpenAI, Anthropic, Gemini, Ollama, Groq, Mistral, GitHub Models, OpenRouter, OpenCode Zen (`opencode_zen`, `api_base=https://opencode.ai/zen/v1`, default model `openai/gpt-5-nano`). - Chat conversations stream via SSE through `/api/chat/conversations/`. -- `ChatViewSet.send_message()` accepts optional context fields (`collection_id`, `collection_name`, `start_date`, `end_date`, `destination`) and appends a `## Trip Context` section to the system prompt when provided. When a `collection_id` is present, also injects `Itinerary stops:` from `collection.locations` (up to 8 unique stops). See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#multi-stop-context-derivation). +- `ChatViewSet.send_message()` accepts optional context fields (`collection_id`, `collection_name`, `start_date`, `end_date`, `destination`) and appends a `## Trip Context` section to the system prompt when provided. When a `collection_id` is present, also injects `Itinerary stops:` from `collection.locations` (up to 8 unique stops) and the collection UUID with explicit `get_trip_details`/`add_to_itinerary` grounding. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#trip-context-uuid-grounding) and [patterns/chat-and-llm.md](patterns/chat-and-llm.md#multi-stop-context-derivation). - Chat composer supports per-provider model override (persisted in browser `localStorage` key `voyage_chat_model_prefs`). DB-saved default provider/model (`UserAISettings`) is authoritative on initialization; localStorage is write-only sync target. Backend `send_message` accepts optional `model` param; falls back to DB defaults → instance defaults → `"openai"`. - Invalid required-argument tool calls are detected and short-circuited: stream terminates with `tool_validation_error` SSE event + `[DONE]` and invalid tool results are not replayed into conversation history. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#tool-call-error-handling-chat-loop-hardening). - LiteLLM errors mapped to sanitized user-safe messages via `_safe_error_payload()` (never exposes raw exception text). See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#sanitized-llm-error-mapping). +- Tool outputs display as concise summaries (not raw JSON) via `getToolSummary()`. Persisted `role=tool` messages are hidden from display; on conversation reload, `rebuildConversationMessages()` reconstructs `tool_results` on assistant messages. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#tool-output-rendering). +- Embedded chat uses compact header (provider/model selectors in settings dropdown), bounded height, sidebar-closed-by-default, and visible streaming indicator. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#embedded-chat-ux). - Frontend type: `ChatProviderCatalogEntry` in `src/lib/types.ts`. - Reference: [Plan: AI travel agent](../plans/ai-travel-agent-collections-integration.md), [Plan: AI travel agent redesign — WS4](../plans/ai-travel-agent-redesign.md#ws4-collection-level-chat-improvements) diff --git a/.memory/knowledge/patterns/chat-and-llm.md b/.memory/knowledge/patterns/chat-and-llm.md index 4bf33fe8..4f7859dc 100644 --- a/.memory/knowledge/patterns/chat-and-llm.md +++ b/.memory/knowledge/patterns/chat-and-llm.md @@ -33,8 +33,36 @@ - **Persistence skip**: Invalid tool call results (and the tool_call entry itself) are NOT persisted to the database, preventing replay into future conversation turns. - **Historical cleanup**: `_build_llm_messages()` filters persisted tool-role messages containing required-param errors AND trims the corresponding assistant `tool_calls` array to only IDs that have non-filtered tool messages. Empty `tool_calls` arrays are omitted entirely. - **Multi-tool partial success**: When model returns N tool calls and call K fails, calls 1..K-1 (the successful prefix) are persisted normally. Only the failed call and subsequent calls are dropped. -- **Tool iteration guard**: `MAX_TOOL_ITERATIONS = 10` with correctly-incremented counter prevents unbounded loops from other error classes (e.g. `"dates must be a non-empty list"` from `get_weather` does NOT match the required-arg regex but is bounded by iteration limit). -- **Known gap**: `get_weather` error `"dates must be a non-empty list"` does not trigger the short-circuit — mitigated by `MAX_TOOL_ITERATIONS`. +- **Tool iteration guard**: `MAX_TOOL_ITERATIONS = 10` with correctly-incremented counter prevents unbounded loops from non-required-arg error classes that don't match the regex. +- **Resolved gap**: `get_weather` error was changed from `"dates must be a non-empty list"` to `"dates is required"` — now matches the regex and triggers the short-circuit. Resolved 2026-03-09. + +## Trip Context UUID Grounding +- `send_message()` injects the active collection UUID into the system prompt `## Trip Context` section with explicit instruction: `"use this exact collection_id for get_trip_details and add_to_itinerary"`. +- UUID injection only occurs when collection lookup succeeds AND user is owner or `shared_with` member (authorization gate). +- System prompt includes two-phase confirmation guidance: confirm only before the **first** `add_to_itinerary` action; after explicit user approval phrases ("yes", "go ahead", "add them"), proceed directly without re-confirming. +- `get_trip_details` DoesNotExist returns `"collection_id is required and must reference a trip you can access"` (does NOT match short-circuit regex due to `fullmatch` — correct, this is an invalid-value error, not missing-param). +- Known pre-existing: `get_trip_details` filters `user=user` only — shared-collection members get UUID context but tool returns DoesNotExist. Low severity. + +## Tool Output Rendering +- Frontend `AITravelChat.svelte` hides raw `role=tool` messages via `visibleMessages` filter (`messages.filter(msg => msg.role !== 'tool')`). +- Tool results render as concise user-facing summaries via `getToolSummary()`: + - `get_trip_details` → "Loaded details for {name} ({N} itinerary items)." + - `list_trips` → "Found {N} trip(s)." + - `add_to_itinerary` → "Added {name} to itinerary." + - `get_weather` → "Retrieved weather data." + - `search_places` / `web_search` → existing rich cards (place cards, linked cards). + - Error payloads → "{name} could not be completed." (no raw JSON). + - Unknown tools → generic fallback. +- **Reload reconstruction**: `rebuildConversationMessages()` scans persisted messages after conversation load, parses `role=tool` rows via `parseStoredToolResult()`, and attaches them as `tool_results` on the preceding assistant message (matched by `tool_call_id`). Both streaming and reload paths produce identical `tool_results` data. +- Text rendered via Svelte text interpolation (not `{@html}`), so LLM-sourced names are auto-escaped (no XSS vector). + +## Embedded Chat UX +- Provider/model selectors moved into a compact `
` gear-icon dropdown in the header — header contains only hamburger toggle + title + settings gear. +- Embedded mode uses bounded height: `h-[65vh]` with `min-h-[30rem]` / `max-h-[46rem]`; softened card treatment (`bg-base-100` + border). +- Sidebar defaults to closed in embedded mode (`let sidebarOpen = !embedded;`); `lg:flex` ensures always-visible on desktop. +- Quick-action chips use `btn-xs` + `overflow-x-auto` for compact embedded fit. +- Streaming indicator visible inside last assistant bubble throughout entire generation (conditioned on `isStreaming && msg.id === lastVisibleMessageId`). +- Known low-priority: `aria-label` values on sidebar toggle and settings button are hardcoded English (should use `$t()`). `
` dropdown does not auto-close on outside click. ## OpenCode Zen Provider - Provider ID: `opencode_zen` diff --git a/.memory/manifest.yaml b/.memory/manifest.yaml index 384dbd0d..b35db099 100644 --- a/.memory/manifest.yaml +++ b/.memory/manifest.yaml @@ -74,7 +74,7 @@ categories: group: sessions - path: plans/chat-provider-fixes.md - description: "Chat provider fixes plan (COMPLETE) — chat-loop-hardening, default-ai-settings, suggestion-add-flow workstreams with full review/test records" + description: "Chat provider fixes plan (COMPLETE) — chat-loop-hardening, default-ai-settings, suggestion-add-flow, chat-tool-grounding-and-confirmation, chat-tool-output-cleanup, embedded-chat-ux-polish workstreams with full review/test records" group: plans # Deprecated (content migrated) diff --git a/.memory/plans/chat-provider-fixes.md b/.memory/plans/chat-provider-fixes.md index 23049b8d..3a2a280f 100644 --- a/.memory/plans/chat-provider-fixes.md +++ b/.memory/plans/chat-provider-fixes.md @@ -246,3 +246,151 @@ Alternative (Vercel AI SDK): **Cleanup required:** Two test artifact files left on host (not git-tracked, safe to delete): - `/home/alex/projects/voyage/test_suggestion_flow.py` - `/home/alex/projects/voyage/suggestion-modal-error-state.png` + +## Completion Note — `embedded-chat-ux-polish` (2026-03-09) + +- Updated `frontend/src/lib/components/AITravelChat.svelte` embedded UX only: moved provider/model selectors into a compact header settings dropdown, reduced embedded sidebar width, and added sidebar toggle accessibility attributes (`aria-label`, `aria-expanded`, `aria-controls`). +- Replaced rigid embedded height (`h-[70vh]`) with a bounded strategy (`h-[65vh]` + min/max constraints) and softened embedded card treatment for better fit in recommendations layouts across desktop/mobile. +- Kept streaming status visible throughout generation (not only before first token) and tightened embedded quick-action/input alignment with compact chip sizing + scrollable chip row behavior. + +## Completion Note — `chat-tool-output-cleanup` (2026-03-09) + +- Updated `frontend/src/lib/components/AITravelChat.svelte` to suppress standalone rendering of persisted `role=tool` messages, so reloaded conversations no longer surface raw tool payload rows. +- Replaced inline raw-JSON fallback rendering with concise user-facing summaries for `get_trip_details`, `list_trips`, `add_to_itinerary`, and `get_weather`, while keeping existing rich cards for `search_places` and `web_search`. +- Added safe error summarization for inline tool results so tool error payloads no longer display raw JSON in the normal chat UI. + +## Review Verdict — `chat-tool-output-cleanup` (2026-03-09) + +### STATUS: CHANGES-REQUESTED (score 13) + +**CRITICAL: Tool summaries and rich cards lost on conversation reload** (`AITravelChat.svelte:534,782`). `tool_results` is a frontend-only ephemeral field populated exclusively during SSE streaming (line 373). When a conversation is reloaded via `selectConversation()`, the backend serializer returns `role=tool` messages with raw payloads, but the new `visibleMessages` filter (line 534) hides them. No reconstruction step maps persisted `role=tool` messages back onto their preceding assistant message's `tool_results` array. Result: after page refresh or conversation switch, all tool activity indicators (summaries, search_places cards, web_search links) silently vanish. The user sees only the assistant's text with no tool context. (confidence: HIGH) + +**WARNING: Acceptance criterion partially unmet — "reloaded conversations do not expose raw tool payloads"** is satisfied (filter works), but the related user expectation that tool activity "remains understandable" on reload is violated because no tool indicators appear at all on reloaded conversations. + +**What was checked and confirmed safe:** +- `visibleMessages` filter correctly excludes `role=tool` messages from display (line 534). No raw JSON blobs shown during streaming or on reload. +- `getToolSummary()` logic is safe: uses Svelte text interpolation (not `{@html}`), so LLM-sourced names (trip names, location names) are auto-escaped. No XSS vector. +- Error tool results render a generic "could not be completed" message rather than raw error JSON. Correct and safe. +- Streaming state management is correct: `streamingContent` reset on each send (line 302), `isStreaming` cleared in `finally` (line 387). No stale state. +- `lastVisibleMessageId` correctly tracks the last visible (non-tool) message for the streaming indicator. +- `asRecord()` null guard is correct — rejects null, arrays, and non-objects. +- Fallback summary for unknown tool names (line 599-602) is generic and safe. + +**NEXT (fix actions):** +In `selectConversation()`, after loading `data.messages`, reconstruct `tool_results` on each assistant message by scanning the immediately following `role=tool` messages (which share `tool_call_id` with the assistant's `tool_calls` entries). For each tool message, parse its `content` (JSON string from `serialize_tool_result`), extract the tool `name` from the message's `name` field, and push a `ToolResultEntry` onto the preceding assistant message's `tool_results`. This ensures summaries and rich cards appear on reload. The `visibleMessages` filter continues to hide the raw tool rows. + +## Tester Validation — `chat-tool-output-cleanup` (2026-03-09) + +### STATUS: PASS + +**Evidence from lead (runtime):** Page reload of seeded conversation with persisted `get_trip_details` assistant tool call + `role=tool` result showed `🗺️ Loaded details for test (0 itinerary items).` — no raw JSON. Sidebar remained functional. Reviewer-APPROVED follow-up fix confirmed implemented and working. + +**Standard pass findings:** + +- `visibleMessages` filter (`messages.filter(msg => msg.role !== 'tool')`) correctly suppresses raw `role=tool` rows from display. Live DOM scan of 10 chat bubbles across two conversations found zero raw JSON blobs (`"itinerary":`, `"tool_call_id":` patterns absent). +- `rebuildConversationMessages()` scans messages in one pass: sets `activeAssistant` on each assistant-with-tool-calls message; attaches subsequent `role=tool` rows as `ToolResultEntry` objects matched via `tool_call_id`. `activeAssistant` overridden on each new assistant message, preventing cross-turn leakage. +- `parseStoredToolResult()` JSON-parses tool content; falls back to raw string on failure. Both paths produce a valid `ToolResultEntry` — no crash. +- `getToolSummary()` produces human-readable summaries for `get_trip_details`, `list_trips`, `add_to_itinerary`, `get_weather`; generic fallback for unknown tools. Error payloads render `" could not be completed."` — no raw JSON. +- Backend `ChatMessageSerializer` confirmed to include `name`, `tool_call_id`, and `tool_calls` fields required for reconstruction. +- Multi-turn live conversation validated: `⚠️ get trip details could not be completed.` + `🧳 Found 1 trip.` + `🗺️ Loaded details for test (6 itinerary items).` — all clean summaries, no raw JSON. +- Text-only conversation (no tool calls) unaffected — loads correctly with zero tool artifacts. +- Frontend build: `bun run lint`, `bun run check`, `bun run build` all passed (per lead). + +**Adversarial pass findings (7 hypotheses, all safe):** + +1. **Hypothesis: `assistant.tool_calls` with null IDs causes cross-turn leakage.** When `toolCallIds=[]`, the guard `msg.tool_call_id && toolCallIds.length > 0 && !includes` short-circuits at `length=0` → tool IS attached (permissive loose match). But the next `assistant` message overrides `activeAssistant` before its own tool rows, so no cross-turn pollution occurs. **Acceptable; null IDs cannot arise from correctly persisted backend rows.** +2. **Hypothesis: orphaned `role=tool` after non-tool-call assistant attaches to wrong message.** `activeAssistant=null` when `tool_calls` absent/empty. Tool row skipped. **Not vulnerable.** +3. **Hypothesis: malformed JSON in tool content crashes reconstruction.** Try/catch fallback returns `result: rawString`. `asRecord(string)` → `null`; `getToolSummary` hits generic branch. **Safe; no crash, no raw JSON exposed.** +4. **Hypothesis: `name=null` on tool message causes downstream crash.** `msg.name || 'tool'` guard → `'tool'`. Generic fallback renders `"tool completed."` **Safe.** +5. **Hypothesis: multi-tool assistant reconstructs both in correct order.** Both `call_A` and `call_B` rows attach to same assistant; `activeAssistant` cleared after count reaches `toolCallIds.length`. **Verified: 2 results attached in correct order.** +6. **Hypothesis: empty `messages` array crashes.** Returns `[]` immediately. **Safe.** +7. **Hypothesis: `role=tool` before any assistant crashes or attaches to user message.** `activeAssistant=null` at start; tool row skipped. **Safe.** + +**MUTATION_ESCAPES: 1/7** — The `toolCallIds.length > 0` guard in the clear condition means an assistant with all-null tool_call IDs never has `activeAssistant` cleared post-attachment. A second stray tool row would attach to the same assistant. Extremely low practical likelihood (backend always persists real IDs from LiteLLM); no production scenario produces this DB state. + +**FLAKY: 0** + +**COVERAGE: N/A** — No automated frontend test suite for `AITravelChat.svelte`. All validation via in-browser function evaluation (7 unit-level cases) + visual browser confirmation. Recommended follow-up: Playwright e2e test seeding a conversation with `role=tool` rows and verifying summary cards render on reload. + +**Screenshot evidence:** Captured `tool-summary-reload-verification.png` — showed `Tool summary reload test` conversation with assistant text + `🗺️ Loaded details for test (0 itinerary items).` summary card, no raw JSON. Screenshot deleted post-verification (artifact not git-tracked). + +## Tester Validation — `embedded-chat-ux-polish` (2026-03-09) + +### STATUS: PASS + +**Lead evidence accepted:** +- `bun run lint`, `bun run check` (0 errors, 6 pre-existing warnings), and `bun run build` all passed. +- Browser-validated: embedded chat opens with sidebar closed, compact header (`Show conversations` toggle + title + ⚙️ gear), recommendations area remains visible. Sidebar toggle opens conversation list correctly. +- Reviewer APPROVED after sidebar-default follow-up fix (`let sidebarOpen = !embedded;` at line 63 confirmed in code). + +**Standard pass findings (code inspection):** + +- AC1 (header de-crowded): ✅ Provider/model selectors moved into `