fix(chat): clean up tool output and embedded UX

2026-03-09 21:12:46 +00:00
parent bb54503235
commit d8c8ecf2bd
13 changed files with 588 additions and 198 deletions


@@ -16,7 +16,7 @@ Voyage is **pre-release** — not yet in production use. During pre-release:
**Key architectural pattern — API Proxy**: The frontend never calls the Django backend directly. All API calls go to `src/routes/api/[...path]/+server.ts`, which proxies requests to the Django server (`http://server:8000`), injecting CSRF tokens and managing session cookies. This means frontend fetches use relative URLs like `/api/locations/`.
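The proxy pattern above can be sketched as a URL-rewriting step. This is an illustrative stand-in only — `BACKEND_ORIGIN` and `toBackendUrl` are hypothetical names, not the actual contents of `src/routes/api/[...path]/+server.ts`, and the real handler additionally injects CSRF tokens and forwards session cookies before fetching.

```typescript
// Hypothetical sketch of the rewrite step behind the API proxy route:
// a relative /api/* URL is mapped onto the internal Django origin.
const BACKEND_ORIGIN = "http://server:8000";

function toBackendUrl(apiPath: string): string {
  // "/api/locations/" -> "http://server:8000/locations/"
  if (!apiPath.startsWith("/api/")) {
    throw new Error(`expected a relative /api/ URL, got ${apiPath}`);
  }
  return BACKEND_ORIGIN + apiPath.slice("/api".length);
}
```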
-**AI Chat**: The AI travel chat assistant is embedded in Collections → Recommendations (component: `AITravelChat.svelte`). There is no standalone `/chat` route. Chat providers are loaded dynamically from `GET /api/chat/providers/` (backed by LiteLLM runtime list + custom entries like `opencode_zen`). Chat conversations stream via SSE through `/api/chat/conversations/`. Provider config lives in `backend/server/chat/llm_client.py` (`CHAT_PROVIDER_CONFIG`). Default AI provider/model saved via `UserAISettings` in DB (authoritative over browser localStorage). Chat composer supports per-provider model override via dropdown selector fed by `GET /api/chat/providers/{provider}/models/` (persisted in browser `localStorage` key `voyage_chat_model_prefs`). Collection chats inject multi-stop itinerary context and the system prompt guides `get_trip_details`-first reasoning. LiteLLM errors are mapped to sanitized user-safe messages via `_safe_error_payload()` (never exposes raw exception text). Invalid tool calls (missing required args) are detected and short-circuited with a user-visible error — not replayed into history.
+**AI Chat**: The AI travel chat assistant is embedded in Collections → Recommendations (component: `AITravelChat.svelte`). There is no standalone `/chat` route. Chat providers are loaded dynamically from `GET /api/chat/providers/` (backed by LiteLLM runtime list + custom entries like `opencode_zen`). Chat conversations stream via SSE through `/api/chat/conversations/`. Provider config lives in `backend/server/chat/llm_client.py` (`CHAT_PROVIDER_CONFIG`). Default AI provider/model saved via `UserAISettings` in DB (authoritative over browser localStorage). Chat composer supports per-provider model override via dropdown selector fed by `GET /api/chat/providers/{provider}/models/` (persisted in browser `localStorage` key `voyage_chat_model_prefs`). Collection chats inject collection UUID + multi-stop itinerary context; system prompt guides `get_trip_details`-first reasoning and confirms only before first `add_to_itinerary`. LiteLLM errors are mapped to sanitized user-safe messages via `_safe_error_payload()` (never exposes raw exception text). Invalid tool calls (missing required args) are detected and short-circuited with a user-visible error — not replayed into history. Tool outputs render as concise summaries (not raw JSON); `role=tool` messages are hidden from display and reconstructed on reload via `rebuildConversationMessages()`.
**Services** (docker-compose):
- `web` → SvelteKit frontend at `:8015`


@@ -296,3 +296,79 @@
- **Prior findings**: hardcoded `gpt-4o-mini` WARNING (decisions.md:224) confirmed resolved. `_safe_error_payload` sanitization guardrail (decisions.md:120) confirmed satisfied.
- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#suggestion-add-flow)
- **Date**: 2026-03-09
## Correctness Review: chat-tool-grounding-and-confirmation
- **Verdict**: APPROVED (score 3)
- **Lens**: Correctness
- **Scope**: UUID grounding in trip context, reduced re-confirmation behavior in system prompt, error wording alignment with required-arg short-circuit regex.
- **Files reviewed**: `backend/server/chat/views/__init__.py` (lines 255-296, 135-153), `backend/server/chat/llm_client.py` (lines 322-350), `backend/server/chat/agent_tools.py` (lines 319-406, 590-618)
- **Acceptance criteria verification**:
- AC1 (grounded UUID): ✅ — `views/__init__.py:256-259` injects validated `collection.id` into system prompt `## Trip Context` with explicit tool-usage instruction ("use this exact collection_id for get_trip_details and add_to_itinerary"). Collection validated for ownership/sharing at lines 242-253.
- AC2 (reduced re-confirmation): ✅ — `llm_client.py:340-341` provides two-phase instruction: confirm before first `add_to_itinerary`, then proceed directly after approval phrases. Prompt-level instruction is the correct approach (hard-coded confirmation state would be fragile).
- AC3 (error wording alignment): ✅ — All error strings traced through `_is_required_param_tool_error`:
- `"dates is required"` (agent_tools.py:603) → matches regex. **Closes prior known gap** (decisions.md:166, tester:183).
- `"collection_id is required"` (agent_tools.py:322) → matches regex. Correct.
- `"collection_id is required and must reference a trip you can access"` (agent_tools.py:402) → does NOT match `fullmatch` regex. Correct — this is an invalid-value error, not a missing-param error; should NOT trigger short-circuit.
- No false positives introduced. No successful tool flows degraded.
- **Findings**:
- WARNING: [agent_tools.py:401-403] Semantic ambiguity in `get_trip_details` DoesNotExist error: `"collection_id is required and must reference a trip you can access"` conflates missing-param and invalid-value failure modes. The prefix "collection_id is required" may mislead the LLM into thinking it omitted the parameter rather than supplied a wrong one, reducing chance it retries with the grounded UUID from context. Compare `add_to_itinerary` DoesNotExist which returns the clearer `"Trip not found"`. A better message: `"No accessible trip found for the given collection_id"`. (confidence: MEDIUM)
- **Suggestions**: (1) Reword `get_trip_details` DoesNotExist to `"No accessible trip found for the given collection_id"` for clearer LLM self-correction. (2) `get_trip_details` only filters `user=user` (not `shared_with`) — shared users will get DoesNotExist despite having `send_message` access. Pre-existing, now more visible with UUID grounding. (3) Malformed UUID strings fall to generic "unexpected error" handler — a `ValidationError` catch returning `"collection_id must be a valid UUID"` would improve LLM self-correction. Pre-existing.
- **No regressions**: `_build_llm_messages` orphan trimming intact. Streaming loop structure unchanged. `MAX_TOOL_ITERATIONS` guard intact.
- **Prior findings**: `get_weather` "dates must be a non-empty list" gap (decisions.md:166) now **RESOLVED** — changed to "dates is required". Multi-tool orphan fixes (decisions.md:272-281) confirmed intact.
- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#chat-tool-grounding-and-confirmation)
- **Date**: 2026-03-09
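The AC3 error-wording alignment above hinges on full-string anchoring: a bare `"<param> is required"` message triggers the short-circuit, while the same prefix followed by extra text does not. A TypeScript stand-in for the backend's Python `fullmatch` check (the exact pattern below is an assumption; only the anchoring behavior is the point):

```typescript
// Illustrative stand-in for _is_required_param_tool_error (the real check is
// Python re.fullmatch). The ^...$ anchors mean invalid-value messages with
// trailing text do NOT match, so they do not trigger the short-circuit.
const REQUIRED_PARAM_RE = /^[a-zA-Z_]+ is required$/;

function isRequiredParamToolError(message: string): boolean {
  return REQUIRED_PARAM_RE.test(message);
}
```

Under this sketch, `"dates is required"` and `"collection_id is required"` match, while `"collection_id is required and must reference a trip you can access"` does not — mirroring the traced behavior above.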
## Correctness Review: embedded-chat-ux-polish
- **Verdict**: CHANGES-REQUESTED (score 3)
- **Lens**: Correctness
- **Scope**: Embedded chat header de-crowding (settings dropdown), height constraints, sidebar accessibility, streaming indicator visibility, and visual language preservation.
- **File reviewed**: `frontend/src/lib/components/AITravelChat.svelte`
- **Acceptance criteria**:
- AC1 (header de-crowded): ✅ — Provider/model selectors moved into `<details>` gear-icon dropdown, leaving header with only toggle + title + ⚙️ button.
- AC2 (layout stability): ✅ — `h-[65vh]` with `min-h-[30rem]`/`max-h-[46rem]` bounds. Embedded uses `bg-base-100` + border (softer treatment). Quick-action chips use `btn-xs` + `overflow-x-auto` for embedded.
- AC3 (streaming indicator visible): ✅ — Indicator inside last assistant bubble, conditioned on `isStreaming && msg.id === lastVisibleMessageId`. Visible throughout entire generation, not just before first token.
- AC4 (existing features preserved): ✅ — All tool result rendering, conversation management, date selector modal, quick actions, send button states intact.
- **Findings**:
- WARNING: [AITravelChat.svelte:61,624] `sidebarOpen` defaults to `true`; sidebar uses fixed `w-60` inline layout. On narrow/mobile viewports (≤640px) in embedded mode, sidebar consumes 240px leaving ≈135px for chat content — functionally unusable. Fix: `let sidebarOpen = !embedded;` or media-aware init. (confidence: HIGH)
- **Suggestions**: (1) `aria-label` values at lines 678 and 706 are hardcoded English — should use `$t()` per project i18n convention. (2) `<details>` dropdown doesn't auto-close on outside click, unlike focus-based dropdowns elsewhere in codebase — consider tabindex-based pattern or click-outside handler for consistency.
- **Next**: Set `sidebarOpen` default to `false` for embedded mode (e.g., `let sidebarOpen = !embedded;`).
- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#embedded-chat-ux-polish)
- **Date**: 2026-03-09
## Re-Review: embedded-chat-ux-polish — sidebar default fix
- **Verdict**: APPROVED (score 0)
- **Lens**: Correctness
- **Scope**: Targeted re-review of `sidebarOpen` initialization fix only.
- **File reviewed**: `frontend/src/lib/components/AITravelChat.svelte`
- **Finding resolution**: Original WARNING (`sidebarOpen` defaulting `true` in embedded mode, line 61→63) is resolved. Line 63 now reads `let sidebarOpen = !embedded;`, which initializes to `false` when `embedded=true`. Sidebar CSS at line 688 applies `hidden` when `sidebarOpen=false`, overridden by `lg:flex` on desktop — correct responsive pattern. Non-embedded mode unaffected (`!false = true`). No new issues introduced.
- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#embedded-chat-ux-polish)
- **Date**: 2026-03-09
## Re-Review: chat-tool-output-cleanup — tool_results reconstruction on reload
- **Verdict**: APPROVED (score 0)
- **Lens**: Correctness (targeted re-review)
- **Scope**: Fix for CRITICAL finding (decisions.md:262-267) — tool summaries and rich cards lost on conversation reload because `tool_results` was ephemeral and never reconstructed from persisted `role=tool` messages.
- **File reviewed**: `frontend/src/lib/components/AITravelChat.svelte` (lines 31-39, 271-340, 598)
- **Original finding status**: **RESOLVED**. `selectConversation()` now pipes `data.messages` through `rebuildConversationMessages()` (line 276), which iterates persisted messages, parses `role=tool` rows via `parseStoredToolResult()`, and attaches them as `tool_results` on the preceding assistant message. `visibleMessages` filter (line 598) still hides raw tool rows. Both streaming and reload paths now produce identical `tool_results` data.
- **Verification of fix correctness**:
- `ChatMessage` type (lines 36-37) adds `tool_calls?: Array<{ id?: string }>` and `tool_call_id?: string` — matches backend serializer fields exactly (`ChatMessageSerializer` returns `tool_calls`, `tool_call_id`, `name`).
- `rebuildConversationMessages` (lines 298-340): creates shallow copies (no input mutation), tracks `activeAssistant` for messages with non-empty `tool_calls`, attaches parsed tool results to assistant, auto-detaches when all expected results collected (`tool_results.length >= toolCallIds.length`). Correctly handles: (a) legacy data without `tool_call_id` (positional attachment), (b) `tool_call_id`-based matching when IDs are present, (c) multi-tool-call assistant messages, (d) assistant messages without `tool_calls` (skipped).
- `parseStoredToolResult` (lines 280-296): guards on `role !== 'tool'`, uses `msg.name` from serializer, JSON.parse with graceful fallback on non-JSON content. No null dereference risks.
- Streaming path (lines 432-438) independently populates `tool_results` during live SSE — no interference with reload path.
- **No new issues introduced**: No async misuse, no null dereference, no off-by-one, no mutation of shared state, no contract mismatch with backend serializer.
- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#chat-tool-output-cleanup), original CRITICAL at decisions.md:262-267
- **Date**: 2026-03-09
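The reconstruction pass verified above can be sketched as follows. This is a simplified approximation of the component's internals, not the actual Svelte code; the `Msg` shape and copy semantics are condensed from the review notes.

```typescript
// Simplified sketch of the reload-reconstruction pass: parse persisted
// role=tool rows and attach them to the preceding assistant's tool_results.
type ToolResultEntry = { name: string; result: unknown };
type Msg = {
  role: string;
  content: string;
  name?: string;
  tool_call_id?: string;
  tool_calls?: Array<{ id?: string }>;
  tool_results?: ToolResultEntry[];
};

function parseStoredToolResult(msg: Msg): ToolResultEntry | null {
  if (msg.role !== "tool") return null;
  let result: unknown;
  try {
    result = JSON.parse(msg.content);
  } catch {
    result = msg.content; // graceful fallback for non-JSON content
  }
  return { name: msg.name || "tool", result };
}

function rebuildConversationMessages(messages: Msg[]): Msg[] {
  const out: Msg[] = [];
  let activeAssistant: Msg | null = null;
  let toolCallIds: string[] = [];
  for (const msg of messages) {
    if (msg.role === "assistant") {
      const copy = { ...msg }; // shallow copy — no input mutation
      out.push(copy);
      // Track only assistants that actually issued tool calls.
      activeAssistant = msg.tool_calls?.length ? copy : null;
      toolCallIds = (msg.tool_calls ?? [])
        .map((c) => c.id)
        .filter((id): id is string => Boolean(id));
      continue;
    }
    if (msg.role === "tool" && activeAssistant) {
      // tool_call_id-based matching when IDs are present; positional otherwise.
      if (msg.tool_call_id && toolCallIds.length > 0 && !toolCallIds.includes(msg.tool_call_id)) {
        out.push({ ...msg });
        continue;
      }
      const parsed = parseStoredToolResult(msg);
      if (parsed) {
        activeAssistant.tool_results = [...(activeAssistant.tool_results ?? []), parsed];
        // Auto-detach once all expected results are collected.
        if (toolCallIds.length > 0 && activeAssistant.tool_results.length >= toolCallIds.length) {
          activeAssistant = null;
        }
      }
      out.push({ ...msg }); // still present; hidden later by the role filter
      continue;
    }
    out.push({ ...msg });
  }
  return out;
}
```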
## Tester Validation: embedded-chat-ux-polish
- **Status**: PASS (Both Standard + Adversarial passes)
- **Scope**: Sidebar default closed for embedded mode, compact header with settings dropdown, bounded height, chip scroll behavior, streaming indicator visibility.
- **Key findings**:
- `sidebarOpen = !embedded` (line 63) correctly initializes to `false` in embedded mode; `lg:flex` on sidebar ensures always-visible on desktop as intended — correct responsive pattern.
- `lastVisibleMessageId` reactive (`$:`) — no stale-indicator risk during streaming.
- All i18n keys used in header/settings dropdown confirmed present in `en.json`.
- `<details>` dropdown does not auto-close on outside click — UX inconvenience, not a defect.
- `aria-label` at lines 743 and 771 are hardcoded English (i18n convention violation, low severity).
- **MUTATION_ESCAPES**: 0/4
- **Residual**: Two low-priority follow-ups (aria-label i18n, dropdown outside-click behavior) — not blocking.
- **Reference**: See [Plan: Chat provider fixes](plans/chat-provider-fixes.md#tester-validation--embedded-chat-ux-polish)
- **Date**: 2026-03-09


@@ -8,10 +8,12 @@ Frontend never calls Django directly. All API calls go through `src/routes/api/[
- Provider selector loads dynamically from `GET /api/chat/providers/` (backed by `litellm.provider_list` + `CHAT_PROVIDER_CONFIG` in `backend/server/chat/llm_client.py`).
- Supported configured providers: OpenAI, Anthropic, Gemini, Ollama, Groq, Mistral, GitHub Models, OpenRouter, OpenCode Zen (`opencode_zen`, `api_base=https://opencode.ai/zen/v1`, default model `openai/gpt-5-nano`).
- Chat conversations stream via SSE through `/api/chat/conversations/`.
-- `ChatViewSet.send_message()` accepts optional context fields (`collection_id`, `collection_name`, `start_date`, `end_date`, `destination`) and appends a `## Trip Context` section to the system prompt when provided. When a `collection_id` is present, also injects `Itinerary stops:` from `collection.locations` (up to 8 unique stops). See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#multi-stop-context-derivation).
+- `ChatViewSet.send_message()` accepts optional context fields (`collection_id`, `collection_name`, `start_date`, `end_date`, `destination`) and appends a `## Trip Context` section to the system prompt when provided. When a `collection_id` is present, also injects `Itinerary stops:` from `collection.locations` (up to 8 unique stops) and the collection UUID with explicit `get_trip_details`/`add_to_itinerary` grounding. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#trip-context-uuid-grounding) and [patterns/chat-and-llm.md](patterns/chat-and-llm.md#multi-stop-context-derivation).
- Chat composer supports per-provider model override (persisted in browser `localStorage` key `voyage_chat_model_prefs`). DB-saved default provider/model (`UserAISettings`) is authoritative on initialization; localStorage is write-only sync target. Backend `send_message` accepts optional `model` param; falls back to DB defaults → instance defaults → `"openai"`.
- Invalid required-argument tool calls are detected and short-circuited: stream terminates with `tool_validation_error` SSE event + `[DONE]` and invalid tool results are not replayed into conversation history. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#tool-call-error-handling-chat-loop-hardening).
- LiteLLM errors mapped to sanitized user-safe messages via `_safe_error_payload()` (never exposes raw exception text). See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#sanitized-llm-error-mapping).
- Tool outputs display as concise summaries (not raw JSON) via `getToolSummary()`. Persisted `role=tool` messages are hidden from display; on conversation reload, `rebuildConversationMessages()` reconstructs `tool_results` on assistant messages. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#tool-output-rendering).
- Embedded chat uses compact header (provider/model selectors in settings dropdown), bounded height, sidebar-closed-by-default, and visible streaming indicator. See [patterns/chat-and-llm.md](patterns/chat-and-llm.md#embedded-chat-ux).
- Frontend type: `ChatProviderCatalogEntry` in `src/lib/types.ts`.
- Reference: [Plan: AI travel agent](../plans/ai-travel-agent-collections-integration.md), [Plan: AI travel agent redesign — WS4](../plans/ai-travel-agent-redesign.md#ws4-collection-level-chat-improvements)
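The provider/model fallback order described above (request `model` param → DB defaults → instance defaults → `"openai"`) can be sketched as a simple coalescing chain. The backend logic is Python; this TypeScript stand-in uses hypothetical names (`ResolveInput`, `resolveProvider`):

```typescript
// Illustrative sketch of the resolution order: explicit request param first,
// then DB-saved UserAISettings (authoritative over localStorage), then the
// instance default, then the hard fallback "openai".
type ResolveInput = {
  requestModel?: string;    // optional `model` param on send_message
  dbDefault?: string;       // UserAISettings row
  instanceDefault?: string; // deployment-level default
};

function resolveProvider(input: ResolveInput): string {
  return input.requestModel ?? input.dbDefault ?? input.instanceDefault ?? "openai";
}
```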


@@ -33,8 +33,36 @@
- **Persistence skip**: Invalid tool call results (and the tool_call entry itself) are NOT persisted to the database, preventing replay into future conversation turns.
- **Historical cleanup**: `_build_llm_messages()` filters persisted tool-role messages containing required-param errors AND trims the corresponding assistant `tool_calls` array to only IDs that have non-filtered tool messages. Empty `tool_calls` arrays are omitted entirely.
- **Multi-tool partial success**: When model returns N tool calls and call K fails, calls 1..K-1 (the successful prefix) are persisted normally. Only the failed call and subsequent calls are dropped.
-- **Tool iteration guard**: `MAX_TOOL_ITERATIONS = 10` with correctly-incremented counter prevents unbounded loops from other error classes (e.g. `"dates must be a non-empty list"` from `get_weather` does NOT match the required-arg regex but is bounded by iteration limit).
-- **Known gap**: `get_weather` error `"dates must be a non-empty list"` does not trigger the short-circuit — mitigated by `MAX_TOOL_ITERATIONS`.
+- **Tool iteration guard**: `MAX_TOOL_ITERATIONS = 10` with correctly-incremented counter prevents unbounded loops from non-required-arg error classes that don't match the regex.
+- **Resolved gap**: `get_weather` error was changed from `"dates must be a non-empty list"` to `"dates is required"` — now matches the regex and triggers the short-circuit. Resolved 2026-03-09.
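The historical-cleanup behavior above (filter errored tool rows, then trim assistant `tool_calls` to surviving IDs, omitting empty arrays) can be sketched as follows. The real logic is Python in `_build_llm_messages()`; this is a hedged TypeScript approximation with hypothetical names.

```typescript
// Approximate sketch of the orphan-trimming step: drop persisted tool rows
// carrying required-param errors, then keep only assistant tool_calls IDs that
// still have a matching tool row; empty tool_calls arrays are omitted entirely.
type Row = {
  role: string;
  content: string;
  tool_call_id?: string;
  tool_calls?: Array<{ id: string }>;
};

function trimOrphanedToolCalls(rows: Row[], isRequiredParamError: (c: string) => boolean): Row[] {
  const kept = rows.filter((r) => !(r.role === "tool" && isRequiredParamError(r.content)));
  const surviving = new Set(kept.filter((r) => r.role === "tool").map((r) => r.tool_call_id));
  return kept.map((r) => {
    if (r.role !== "assistant" || !r.tool_calls) return r;
    const tool_calls = r.tool_calls.filter((c) => surviving.has(c.id));
    if (tool_calls.length === 0) {
      const { tool_calls: _omit, ...rest } = r; // omit empty arrays entirely
      return rest;
    }
    return { ...r, tool_calls };
  });
}
```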
## Trip Context UUID Grounding
- `send_message()` injects the active collection UUID into the system prompt `## Trip Context` section with explicit instruction: `"use this exact collection_id for get_trip_details and add_to_itinerary"`.
- UUID injection only occurs when collection lookup succeeds AND user is owner or `shared_with` member (authorization gate).
- System prompt includes two-phase confirmation guidance: confirm only before the **first** `add_to_itinerary` action; after explicit user approval phrases ("yes", "go ahead", "add them"), proceed directly without re-confirming.
- `get_trip_details` DoesNotExist returns `"collection_id is required and must reference a trip you can access"` (does NOT match short-circuit regex due to `fullmatch` — correct, this is an invalid-value error, not missing-param).
- Known pre-existing: `get_trip_details` filters `user=user` only — shared-collection members get UUID context but tool returns DoesNotExist. Low severity.
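The `## Trip Context` assembly above can be sketched as string construction. The real implementation is Python inside `ChatViewSet.send_message()`; the `TripContext` shape and `buildTripContextSection` name below are hypothetical, and the exact wording is an assumption beyond the quoted grounding instruction.

```typescript
// Hypothetical sketch of assembling the Trip Context prompt section with
// UUID grounding; the collection is assumed already validated for
// ownership/sharing before this runs (the authorization gate noted above).
type TripContext = {
  collectionId: string;
  collectionName: string;
  stops: string[]; // up to 8 unique itinerary stops
};

function buildTripContextSection(ctx: TripContext): string {
  const lines = [
    "## Trip Context",
    `Collection: ${ctx.collectionName} (collection_id: ${ctx.collectionId})`,
    "Use this exact collection_id for get_trip_details and add_to_itinerary.",
  ];
  if (ctx.stops.length > 0) {
    lines.push(`Itinerary stops: ${ctx.stops.slice(0, 8).join(", ")}`);
  }
  return lines.join("\n");
}
```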
## Tool Output Rendering
- Frontend `AITravelChat.svelte` hides raw `role=tool` messages via `visibleMessages` filter (`messages.filter(msg => msg.role !== 'tool')`).
- Tool results render as concise user-facing summaries via `getToolSummary()`:
- `get_trip_details` → "Loaded details for {name} ({N} itinerary items)."
- `list_trips` → "Found {N} trip(s)."
- `add_to_itinerary` → "Added {name} to itinerary."
- `get_weather` → "Retrieved weather data."
- `search_places` / `web_search` → existing rich cards (place cards, linked cards).
- Error payloads → "{name} could not be completed." (no raw JSON).
- Unknown tools → generic fallback.
- **Reload reconstruction**: `rebuildConversationMessages()` scans persisted messages after conversation load, parses `role=tool` rows via `parseStoredToolResult()`, and attaches them as `tool_results` on the preceding assistant message (matched by `tool_call_id`). Both streaming and reload paths produce identical `tool_results` data.
- Text rendered via Svelte text interpolation (not `{@html}`), so LLM-sourced names are auto-escaped (no XSS vector).
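The summary mapping above can be sketched as a switch over tool names. The real `getToolSummary()` lives in `AITravelChat.svelte`; the result-payload shapes (`itinerary`, `trips`, `error` keys) are assumptions condensed from the review notes.

```typescript
// Sketch of the concise-summary mapping; error payloads and unknown tools
// fall through to safe generic messages, never raw JSON.
type ToolResultEntry = { name: string; result: unknown };

function asRecord(v: unknown): Record<string, unknown> | null {
  // Rejects null, arrays, and non-objects (the null guard noted above).
  return v !== null && typeof v === "object" && !Array.isArray(v)
    ? (v as Record<string, unknown>)
    : null;
}

function getToolSummary(entry: ToolResultEntry): string {
  const rec = asRecord(entry.result);
  if (rec && "error" in rec) return `${entry.name} could not be completed.`;
  switch (entry.name) {
    case "get_trip_details": {
      const itinerary = rec?.itinerary;
      const items = Array.isArray(itinerary) ? itinerary.length : 0;
      return `Loaded details for ${String(rec?.name ?? "trip")} (${items} itinerary items).`;
    }
    case "list_trips": {
      const trips = rec?.trips;
      const n = Array.isArray(trips) ? trips.length : 0;
      return `Found ${n} trip(s).`;
    }
    case "add_to_itinerary":
      return `Added ${String(rec?.name ?? "item")} to itinerary.`;
    case "get_weather":
      return "Retrieved weather data.";
    default:
      return `${entry.name} completed.`; // generic fallback for unknown tools
  }
}
```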
## Embedded Chat UX
- Provider/model selectors moved into a compact `<details>` gear-icon dropdown in the header — header contains only hamburger toggle + title + settings gear.
- Embedded mode uses bounded height: `h-[65vh]` with `min-h-[30rem]` / `max-h-[46rem]`; softened card treatment (`bg-base-100` + border).
- Sidebar defaults to closed in embedded mode (`let sidebarOpen = !embedded;`); `lg:flex` ensures always-visible on desktop.
- Quick-action chips use `btn-xs` + `overflow-x-auto` for compact embedded fit.
- Streaming indicator visible inside last assistant bubble throughout entire generation (conditioned on `isStreaming && msg.id === lastVisibleMessageId`).
- Known low-priority: `aria-label` values on sidebar toggle and settings button are hardcoded English (should use `$t()`). `<details>` dropdown does not auto-close on outside click.
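The sidebar default and responsive class behavior above reduce to two tiny expressions — a minimal sketch, with the class string assembled as plain helpers rather than the component's inline template:

```typescript
// Minimal sketch of the embedded-mode sidebar behavior: closed by default
// when embedded, hidden on small screens unless toggled, always shown at
// the lg breakpoint via lg:flex.
function initialSidebarOpen(embedded: boolean): boolean {
  return !embedded; // mirrors `let sidebarOpen = !embedded;`
}

function sidebarClasses(sidebarOpen: boolean): string {
  return `${sidebarOpen ? "" : "hidden"} lg:flex w-60`.trim();
}
```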
## OpenCode Zen Provider
- Provider ID: `opencode_zen`


@@ -74,7 +74,7 @@ categories:
group: sessions
- path: plans/chat-provider-fixes.md
-description: "Chat provider fixes plan (COMPLETE) — chat-loop-hardening, default-ai-settings, suggestion-add-flow workstreams with full review/test records"
+description: "Chat provider fixes plan (COMPLETE) — chat-loop-hardening, default-ai-settings, suggestion-add-flow, chat-tool-grounding-and-confirmation, chat-tool-output-cleanup, embedded-chat-ux-polish workstreams with full review/test records"
group: plans
# Deprecated (content migrated)


@@ -246,3 +246,151 @@ Alternative (Vercel AI SDK):
**Cleanup required:** Two test artifact files left on host (not git-tracked, safe to delete):
- `/home/alex/projects/voyage/test_suggestion_flow.py`
- `/home/alex/projects/voyage/suggestion-modal-error-state.png`
## Completion Note — `embedded-chat-ux-polish` (2026-03-09)
- Updated `frontend/src/lib/components/AITravelChat.svelte` embedded UX only: moved provider/model selectors into a compact header settings dropdown, reduced embedded sidebar width, and added sidebar toggle accessibility attributes (`aria-label`, `aria-expanded`, `aria-controls`).
- Replaced rigid embedded height (`h-[70vh]`) with a bounded strategy (`h-[65vh]` + min/max constraints) and softened embedded card treatment for better fit in recommendations layouts across desktop/mobile.
- Kept streaming status visible throughout generation (not only before first token) and tightened embedded quick-action/input alignment with compact chip sizing + scrollable chip row behavior.
## Completion Note — `chat-tool-output-cleanup` (2026-03-09)
- Updated `frontend/src/lib/components/AITravelChat.svelte` to suppress standalone rendering of persisted `role=tool` messages, so reloaded conversations no longer surface raw tool payload rows.
- Replaced inline raw-JSON fallback rendering with concise user-facing summaries for `get_trip_details`, `list_trips`, `add_to_itinerary`, and `get_weather`, while keeping existing rich cards for `search_places` and `web_search`.
- Added safe error summarization for inline tool results so tool error payloads no longer display raw JSON in the normal chat UI.
## Review Verdict — `chat-tool-output-cleanup` (2026-03-09)
### STATUS: CHANGES-REQUESTED (score 13)
**CRITICAL: Tool summaries and rich cards lost on conversation reload** (`AITravelChat.svelte:534,782`). `tool_results` is a frontend-only ephemeral field populated exclusively during SSE streaming (line 373). When a conversation is reloaded via `selectConversation()`, the backend serializer returns `role=tool` messages with raw payloads, but the new `visibleMessages` filter (line 534) hides them. No reconstruction step maps persisted `role=tool` messages back onto their preceding assistant message's `tool_results` array. Result: after page refresh or conversation switch, all tool activity indicators (summaries, search_places cards, web_search links) silently vanish. The user sees only the assistant's text with no tool context. (confidence: HIGH)
**WARNING: Acceptance criterion partially unmet — "reloaded conversations do not expose raw tool payloads"** is satisfied (filter works), but the related user expectation that tool activity "remains understandable" on reload is violated because no tool indicators appear at all on reloaded conversations.
**What was checked and confirmed safe:**
- `visibleMessages` filter correctly excludes `role=tool` messages from display (line 534). No raw JSON blobs shown during streaming or on reload.
- `getToolSummary()` logic is safe: uses Svelte text interpolation (not `{@html}`), so LLM-sourced names (trip names, location names) are auto-escaped. No XSS vector.
- Error tool results render a generic "could not be completed" message rather than raw error JSON. Correct and safe.
- Streaming state management is correct: `streamingContent` reset on each send (line 302), `isStreaming` cleared in `finally` (line 387). No stale state.
- `lastVisibleMessageId` correctly tracks the last visible (non-tool) message for the streaming indicator.
- `asRecord()` null guard is correct — rejects null, arrays, and non-objects.
- Fallback summary for unknown tool names (line 599-602) is generic and safe.
**NEXT (fix actions):**
In `selectConversation()`, after loading `data.messages`, reconstruct `tool_results` on each assistant message by scanning the immediately following `role=tool` messages (which share `tool_call_id` with the assistant's `tool_calls` entries). For each tool message, parse its `content` (JSON string from `serialize_tool_result`), extract the tool `name` from the message's `name` field, and push a `ToolResultEntry` onto the preceding assistant message's `tool_results`. This ensures summaries and rich cards appear on reload. The `visibleMessages` filter continues to hide the raw tool rows.
## Tester Validation — `chat-tool-output-cleanup` (2026-03-09)
### STATUS: PASS
**Evidence from lead (runtime):** Page reload of seeded conversation with persisted `get_trip_details` assistant tool call + `role=tool` result showed `🗺️ Loaded details for test (0 itinerary items).` — no raw JSON. Sidebar remained functional. Reviewer-APPROVED follow-up fix confirmed implemented and working.
**Standard pass findings:**
- `visibleMessages` filter (`messages.filter(msg => msg.role !== 'tool')`) correctly suppresses raw `role=tool` rows from display. Live DOM scan of 10 chat bubbles across two conversations found zero raw JSON blobs (`"itinerary":`, `"tool_call_id":` patterns absent).
- `rebuildConversationMessages()` scans messages in one pass: sets `activeAssistant` on each assistant-with-tool-calls message; attaches subsequent `role=tool` rows as `ToolResultEntry` objects matched via `tool_call_id`. `activeAssistant` overridden on each new assistant message, preventing cross-turn leakage.
- `parseStoredToolResult()` JSON-parses tool content; falls back to raw string on failure. Both paths produce a valid `ToolResultEntry` — no crash.
- `getToolSummary()` produces human-readable summaries for `get_trip_details`, `list_trips`, `add_to_itinerary`, `get_weather`; generic fallback for unknown tools. Error payloads render `"<name> could not be completed."` — no raw JSON.
- Backend `ChatMessageSerializer` confirmed to include `name`, `tool_call_id`, and `tool_calls` fields required for reconstruction.
- Multi-turn live conversation validated: `⚠️ get trip details could not be completed.` + `🧳 Found 1 trip.` + `🗺️ Loaded details for test (6 itinerary items).` — all clean summaries, no raw JSON.
- Text-only conversation (no tool calls) unaffected — loads correctly with zero tool artifacts.
- Frontend build: `bun run lint`, `bun run check`, `bun run build` all passed (per lead).
**Adversarial pass findings (7 hypotheses, all safe):**
1. **Hypothesis: `assistant.tool_calls` with null IDs causes cross-turn leakage.** When `toolCallIds=[]`, the guard `msg.tool_call_id && toolCallIds.length > 0 && !includes` short-circuits at `length=0` → tool IS attached (permissive loose match). But the next `assistant` message overrides `activeAssistant` before its own tool rows, so no cross-turn pollution occurs. **Acceptable; null IDs cannot arise from correctly persisted backend rows.**
2. **Hypothesis: orphaned `role=tool` after non-tool-call assistant attaches to wrong message.** `activeAssistant=null` when `tool_calls` absent/empty. Tool row skipped. **Not vulnerable.**
3. **Hypothesis: malformed JSON in tool content crashes reconstruction.** Try/catch fallback returns `result: rawString`. `asRecord(string)` → `null`; `getToolSummary` hits generic branch. **Safe; no crash, no raw JSON exposed.**
4. **Hypothesis: `name=null` on tool message causes downstream crash.** `msg.name || 'tool'` guard → `'tool'`. Generic fallback renders `"tool completed."` **Safe.**
5. **Hypothesis: multi-tool assistant reconstructs both in correct order.** Both `call_A` and `call_B` rows attach to same assistant; `activeAssistant` cleared after count reaches `toolCallIds.length`. **Verified: 2 results attached in correct order.**
6. **Hypothesis: empty `messages` array crashes.** Returns `[]` immediately. **Safe.**
7. **Hypothesis: `role=tool` before any assistant crashes or attaches to user message.** `activeAssistant=null` at start; tool row skipped. **Safe.**
**MUTATION_ESCAPES: 1/7** — The `toolCallIds.length > 0` guard in the clear condition means an assistant with all-null tool_call IDs never has `activeAssistant` cleared post-attachment. A second stray tool row would attach to the same assistant. Extremely low practical likelihood (backend always persists real IDs from LiteLLM); no production scenario produces this DB state.
**FLAKY: 0**
**COVERAGE: N/A** — No automated frontend test suite for `AITravelChat.svelte`. All validation via in-browser function evaluation (7 unit-level cases) + visual browser confirmation. Recommended follow-up: Playwright e2e test seeding a conversation with `role=tool` rows and verifying summary cards render on reload.
**Screenshot evidence:** Captured `tool-summary-reload-verification.png` — showed `Tool summary reload test` conversation with assistant text + `🗺️ Loaded details for test (0 itinerary items).` summary card, no raw JSON. Screenshot deleted post-verification (artifact not git-tracked).
## Tester Validation — `embedded-chat-ux-polish` (2026-03-09)
### STATUS: PASS
**Lead evidence accepted:**
- `bun run lint`, `bun run check` (0 errors, 6 pre-existing warnings), and `bun run build` all passed.
- Browser-validated: embedded chat opens with sidebar closed, compact header (`Show conversations` toggle + title + ⚙️ gear), recommendations area remains visible. Sidebar toggle opens conversation list correctly.
- Reviewer APPROVED after sidebar-default follow-up fix (`let sidebarOpen = !embedded;` at line 63 confirmed in code).
**Standard pass findings (code inspection):**
- AC1 (header de-crowded): ✅ Provider/model selectors moved into `<details class="dropdown dropdown-end">` at line 768. Header contains only: hamburger toggle (mobile) + ✈️ title + ⚙️ gear summary button.
- AC2 (layout stability): ✅ `h-[65vh]` + `min-h-[30rem]` + `max-h-[46rem]` on embedded mode (lines 683-685). `bg-base-100` + border treatment for embedded card (lines 674-677). Quick-action chips use `btn-xs` + `overflow-x-auto` + `pb-1` for embedded (lines 922-927).
- AC3 (streaming indicator): ✅ `isStreaming && msg.id === lastVisibleMessageId` condition (line 903) inside last assistant bubble. `lastVisibleMessageId` is a reactive derived value from `visibleMessages` (line 599) — stays current throughout stream.
- AC4 (sidebar default): ✅ `let sidebarOpen = !embedded;` (line 63). Sidebar CSS `{sidebarOpen ? '' : 'hidden'} lg:flex` (line 691) — starts hidden in embedded mode on mobile/tablet, always visible on lg+ (correct responsive pattern). Toggle button is `lg:hidden` (line 739).
- AC5 (existing features preserved): ✅ Tool result rendering, conversation management, date selector modal, quick actions, send button states unchanged.
**Adversarial pass findings:**
1. **Hypothesis: desktop (lg+) embedded layout still crushes content because sidebar is always visible via `lg:flex`.** Expected: content area unusable. Observed: `lg:flex` overrides `hidden` on lg+ — this is the intended responsive pattern. On lg+ screens there is enough horizontal space for sidebar (`w-60`) + chat content. `min-w-0` on chat panel prevents overflow. **Not a defect; designed behavior confirmed by reviewer.**
2. **Hypothesis: `<details>` settings dropdown doesn't close on outside click — user trapped.** Expected: frustration UX. Observed: DaisyUI `<details>` requires another click on summary to close. `settingsOpen = false` init confirmed (line 80). **Non-blocking UX inconvenience; pre-existing SUGGESTION from reviewer, not a blocking defect.**
3. **Hypothesis: `lastVisibleMessageId` becomes stale during streaming, causing indicator to appear on wrong message.** Expected: indicator shows on previous message. Observed: `lastVisibleMessageId` is reactive (`$:` at line 599) — updates synchronously when `visibleMessages` changes. No stale-closure risk. **Not vulnerable.**
4. **Hypothesis: `visibleMessages` filter excludes only `role=tool` — if all messages are tool messages, `lastVisibleMessageId` is `undefined` and streaming indicator never shows.** Expected: silent stream with no indicator. Observed: In practice, every send appends a `user` message and then an `assistant` streaming message — there will always be a non-tool message for the indicator to attach to. **Acceptable; degenerate case impossible in normal flow.**
5. **Hypothesis: `aria-label` hardcoded English strings at lines 743 and 771 violate i18n convention.** Expected: non-English users see English screen-reader labels. Observed: lines 743 (`'Hide conversations'`/`'Show conversations'`) and 771 (`"AI settings"`) are hardcoded. **Low-severity SUGGESTION from reviewer — non-blocking, accessibility-only impact.**
**MUTATION_ESCAPES: 0/4** — all critical logic paths for this UX-only feature are covered by the responsive CSS (no off-by-one possible) and the reactive `lastVisibleMessageId` derivation.
**FLAKY: 0**
**COVERAGE: N/A** — No automated test suite for frontend component; all validation via code inspection + lead browser evidence.
**Residual low-priority items (follow-up suggested, not blocking):**
- `aria-label` values at lines 743 and 771 should use `$t()` per i18n convention.
- `<details>` dropdown does not auto-close on outside click (SUGGESTION from reviewer).
## Completion Note — `chat-tool-grounding-and-confirmation` (2026-03-09)
- `send_message()` trip context now injects the active collection UUID with explicit instruction that it is the `collection_id` for `get_trip_details` and `add_to_itinerary`, reducing wrong-trip-id hallucinations.
- System prompt itinerary guidance now requires confirmation only before the first `add_to_itinerary` action; after explicit user approval phrases (e.g., "yes", "go ahead", "add them", "just add things there"), the assistant is instructed to stop re-confirming and call tools directly.
- Tool error wording was tightened to align with required-arg short-circuit behavior: `get_trip_details` inaccessible/missing trips now return a required-arg-style `collection_id` error string, and `get_weather` empty dates now return `"dates is required"`.
- Review verdict (2026-03-09): **APPROVED** (score 3). One WARNING: `get_trip_details` DoesNotExist error `"collection_id is required and must reference a trip you can access"` conflates missing-param and invalid-value semantics — may mislead LLM into thinking param was omitted rather than wrong. Does NOT create false-positive short-circuit (regex `fullmatch` correctly rejects the trailing clause). Closes prior known gap: `get_weather` "dates must be a non-empty list" now "dates is required" (matches regex). See [decisions.md](../decisions.md#correctness-review-chat-tool-grounding-and-confirmation).
## Tester Validation — `chat-tool-grounding-and-confirmation` (2026-03-09)
### STATUS: PASS
**Test run:** `docker compose exec server python3 manage.py test chat integrations --keepdb` — 5/5 PASS. Full Django baseline 24/30 (6 pre-existing failures unchanged; zero new regressions).
**Standard pass findings:**
- UUID context injection confirmed: `send_message()` lines 255–258 append `"Collection UUID (use this exact collection_id for get_trip_details and add_to_itinerary): {collection.id}"` into `context_parts`, embedded in `system_prompt` (lines 295–296). UUID appears in the `role=system` message on every conversation turn.
- Authorization gate confirmed: UUID injection block is inside `if collection:` (line 255); `collection` is only assigned when lookup succeeds AND user is owner or `shared_with` member (lines 244–251). Unauthorized collection_id → `collection = None` → block skipped.
- System prompt confirmation guidance verified (`llm_client.py:340–341`): confirms only before first `add_to_itinerary` action; after user approval phrases ("yes", "go ahead", "add them", "just add things there"), stops re-confirming.
- Regex validation — 11 test cases all pass:
- `"collection_id is required"`**True** (short-circuits)
- `"collection_id is required and must reference a trip you can access"`**False** (DoesNotExist; `fullmatch` rejects trailing clause — no false short-circuit)
- `"dates is required"`**True** (prior `chat-loop-hardening` gap now **RESOLVED**)
- All legacy required-arg strings continue matching; non-matching strings correctly return False.
- `get_weather` empty dates: string changed from `"dates must be a non-empty list"` to `"dates is required"` — now matches regex and short-circuits. Prior known gap closed.
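The `fullmatch` distinction above can be sketched as follows. The pattern shape is an assumption for illustration only; the real regex lives in the backend chat streaming loop and may differ in detail:

```python
import re

# Assumed shape of the required-arg pattern (illustrative, not the real one).
REQUIRED_ARG_RE = re.compile(r"[a-z_]+ is required")

def should_short_circuit(error: str) -> bool:
    # fullmatch means the whole string must be exactly "<arg> is required";
    # any trailing clause defeats the match, so DoesNotExist-style errors
    # cannot trigger a false short-circuit.
    return REQUIRED_ARG_RE.fullmatch(error) is not None

print(should_short_circuit("dates is required"))          # True
print(should_short_circuit("collection_id is required"))  # True
print(should_short_circuit(
    "collection_id is required and must reference a trip you can access"
))  # False
```

This is why the reviewer's WARNING about the DoesNotExist wording is benign: the trailing `"and must reference..."` clause makes `fullmatch` return `None`.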
**Adversarial pass findings:**
1. **Unauthorized collection_id leaks UUID?** `if collection:` gate prevents injection when lookup fails/unauthorized. **NOT VULNERABLE.**
2. **DoesNotExist error creates false-positive short-circuit?** `fullmatch` returns `False` for trailing text. **NOT VULNERABLE.**
3. **UUID grounding lost between turns?** UUID is in `system_prompt` (role=system), rebuilt fresh on every `send_message`. **Grounding persists for entire conversation.**
4. **Null collection_id crashes injection block?** `if collection_id:` at line 242 gates the lookup; null → block skipped. **NOT VULNERABLE.**
5. **Shared member gets UUID in context but `get_trip_details` fails (filter excludes shared_with)?** Confirmed pre-existing bug: `get_trip_details` filters `user=user` only. Shared members get UUID context but tool returns DoesNotExist. Does not short-circuit (trailing text); falls to `MAX_TOOL_ITERATIONS`. **PRE-EXISTING, LOW severity, not introduced here.**
6. **`get_weather` short-circuit gap (prior MUTATION_ESCAPE) resolved?** Confirmed resolved — new `"dates is required"` string matches regex.
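The gating order verified in findings 1 and 4 can be sketched as follows. Function and lookup names here are hypothetical stand-ins; the real logic lives in the backend `send_message` view:

```python
# Stand-in sketch of the context-injection gate (names are hypothetical).
def build_context_parts(collection_id, lookup_collection):
    context_parts = []
    collection = None
    if collection_id:  # null/empty id: lookup is skipped entirely
        # lookup returns None when the collection is missing or unauthorized
        collection = lookup_collection(collection_id)
    if collection:  # unauthorized or missing -> UUID never enters the prompt
        context_parts.append(
            "Collection UUID (use this exact collection_id for "
            f"get_trip_details and add_to_itinerary): {collection['id']}"
        )
    return context_parts

authorized = {"abc-123": {"id": "abc-123"}}
lookup = authorized.get

print(build_context_parts("abc-123", lookup))   # one UUID context line
print(build_context_parts("evil-999", lookup))  # []
print(build_context_parts(None, lookup))        # []
```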
**MUTATION_ESCAPES: 0/5** — all mutation checks detected. DoesNotExist false-positive (reviewer WARNING) confirmed benign.
**FLAKY: 0**
**COVERAGE: N/A** — No automated test suite for `chat` app. All validation via in-container regex checks + lead's live-run evidence. Recommended follow-up: add Django TestCase for (a) UUID context injection with authorized vs unauthorized collection_id, (b) DoesNotExist path does not trigger short-circuit, (c) empty dates triggers short-circuit.
**Non-blocking known issues (accepted, pre-existing):** `get_trip_details` DoesNotExist wording semantically ambiguous (reviewer WARNING); `get_trip_details` excludes shared-collection members from `filter(user=user)` — both pre-existing, not introduced by this feature.
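The shared-member gap in that last item can be expressed as plain-Python predicates — hypothetical stand-ins for the Django queryset checks, not the real code; the eventual fix would widen `get_trip_details` to also accept `shared_with` members:

```python
# Stand-ins for the two access checks; the real code uses Django querysets.
def current_tool_access(trip, user):
    # get_trip_details today: owner-only, i.e. filter(user=user)
    return trip["owner"] == user

def context_injection_access(trip, user):
    # send_message context gate: owner OR shared_with member
    return trip["owner"] == user or user in trip["shared_with"]

trip = {"owner": "alice", "shared_with": ["bob"]}
# bob gets the collection UUID injected into chat context...
print(context_injection_access(trip, "bob"))  # True
# ...but the tool denies him, yielding the DoesNotExist error path
print(current_tool_access(trip, "bob"))       # False
```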


@@ -1,12 +1,12 @@
# Session Continuity
## Last Session (2026-03-09)
- Completed `chat-provider-fixes` change set with three workstreams:
- `chat-loop-hardening`: invalid required-arg tool calls now terminate cleanly and are not replayed into history; assistant tool_call history is trimmed consistently
- `default-ai-settings`: Settings page saves default provider/model via `UserAISettings`; DB defaults authoritative over localStorage; backend fallback uses saved defaults
- `suggestion-add-flow`: day suggestions use resolved provider/model (not hardcoded OpenAI); modal normalizes suggestion payloads for add-to-itinerary
- Completed `chat-provider-fixes` follow-up round with three additional workstreams:
- `chat-tool-grounding-and-confirmation`: trip context now injects collection UUID for `get_trip_details`/`add_to_itinerary`; system prompt confirms only before first add action; tool error wording aligned with short-circuit regex (`get_weather` gap resolved)
- `chat-tool-output-cleanup`: `role=tool` messages hidden from display; tool outputs render as concise summaries; persisted tool rows reconstructed into `tool_results` on reload
- `embedded-chat-ux-polish`: provider/model selectors in compact settings dropdown; sidebar closed by default in embedded mode; bounded height; visible streaming indicator
- All three workstreams passed reviewer + tester validation
- Documentation updated for all three workstreams
- Prior session completed `chat-loop-hardening`, `default-ai-settings`, `suggestion-add-flow` — all reviewed and tested
## Active Work
- `chat-provider-fixes` plan complete — all workstreams implemented, reviewed, tested, documented
@@ -16,6 +16,9 @@
## Known Follow-up Items (from tester findings)
- No automated test coverage for `UserAISettings` CRUD + precedence logic
- No automated test coverage for `send_message` streaming loop (tool error short-circuit, multi-tool partial success, `MAX_TOOL_ITERATIONS`)
- No automated test coverage for `DaySuggestionsView.post()`
- `get_weather` error `"dates must be a non-empty list"` does not trigger tool-error short-circuit (mitigated by `MAX_TOOL_ITERATIONS`)
- No Playwright e2e test for tool summary reconstruction on conversation reload
- LLM-generated name/location fields not truncated to `max_length=200` before `LocationSerializer` (low risk)
- `aria-label` values in `AITravelChat.svelte` sidebar toggle and settings button are hardcoded English (should use `$t()`)
- `<details>` settings dropdown in embedded chat does not auto-close on outside click
- `get_trip_details` excludes `shared_with` members from `filter(user=user)` — shared users get UUID context but tool returns DoesNotExist (pre-existing, low severity)


@@ -67,7 +67,8 @@ Run in this order:
- Security: handle CSRF tokens via `/auth/csrf/` and `X-CSRFToken`
- Chat providers: dynamic catalog from `GET /api/chat/providers/`; configured in `CHAT_PROVIDER_CONFIG`
- Chat model override: dropdown selector fed by `GET /api/chat/providers/{provider}/models/`; persisted in `localStorage` key `voyage_chat_model_prefs`; backend accepts optional `model` param in `send_message`
-- Chat context: collection chats inject multi-stop itinerary context; system prompt guides `get_trip_details`-first reasoning
+- Chat context: collection chats inject collection UUID + multi-stop itinerary context; system prompt guides `get_trip_details`-first reasoning and confirms only before first `add_to_itinerary`
+- Chat tool output: `role=tool` messages hidden from display; tool outputs render as concise summaries; persisted tool rows reconstructed on reload via `rebuildConversationMessages()`
- Chat error surfacing: `_safe_error_payload()` maps LiteLLM exceptions to sanitized user-safe categories (never forwards raw `exc.message`)
- Invalid tool calls (missing required args) are detected and short-circuited with a user-visible error — not replayed into history


@@ -75,7 +75,8 @@ Run in this exact order:
- CSRF handling: use `/auth/csrf/` + `X-CSRFToken`
- Chat providers: dynamic catalog from `GET /api/chat/providers/`; configured in `CHAT_PROVIDER_CONFIG`
- Chat model override: dropdown selector fed by `GET /api/chat/providers/{provider}/models/`; persisted in `localStorage` key `voyage_chat_model_prefs`; backend accepts optional `model` param in `send_message`
-- Chat context: collection chats inject multi-stop itinerary context; system prompt guides `get_trip_details`-first reasoning
+- Chat context: collection chats inject collection UUID + multi-stop itinerary context; system prompt guides `get_trip_details`-first reasoning and confirms only before first `add_to_itinerary`
+- Chat tool output: `role=tool` messages hidden from display; tool outputs render as concise summaries; persisted tool rows reconstructed on reload via `rebuildConversationMessages()`
- Chat error surfacing: `_safe_error_payload()` maps LiteLLM exceptions to sanitized user-safe categories (never forwards raw `exc.message`)
- Invalid tool calls (missing required args) are detected and short-circuited with a user-visible error — not replayed into history


@@ -398,7 +398,9 @@ def get_trip_details(user, collection_id: str | None = None):
}
}
except Collection.DoesNotExist:
-return {"error": "Trip not found"}
+return {
+"error": "collection_id is required and must reference a trip you can access"
+}
except Exception:
logger.exception("get_trip_details failed")
return {"error": "An unexpected error occurred while fetching trip details"}
@@ -598,7 +600,7 @@ def get_weather(user, latitude=None, longitude=None, dates=None):
dates = dates or []
if not isinstance(dates, list) or not dates:
-return {"error": "dates must be a non-empty list"}
+return {"error": "dates is required"}
results = [
_fetch_temperature_for_date(latitude, longitude, date_value)


@@ -337,7 +337,8 @@ When suggesting places:
- Group suggestions logically (by area, by type, by day)
When modifying itineraries:
-- Always confirm with the user before adding items
+- Confirm with the user before the first add_to_itinerary action in a conversation
+- After the user clearly approves adding items (for example: "yes", "go ahead", "add them", "just add things there"), stop re-confirming and call add_to_itinerary directly for subsequent additions in that conversation
- Suggest logical ordering based on geography
- Consider travel time between locations


@@ -253,6 +253,10 @@ class ChatViewSet(viewsets.ModelViewSet):
pass
if collection:
context_parts.append(
"Collection UUID (use this exact collection_id for get_trip_details and add_to_itinerary): "
f"{collection.id}"
)
itinerary_stops = []
seen_stops = set()
for location in collection.locations.select_related(


@@ -10,6 +10,11 @@
result: unknown;
};
type ToolSummary = {
icon: string;
text: string;
};
type PlaceResult = {
name: string;
address?: string;
@@ -28,6 +33,8 @@
role: 'user' | 'assistant' | 'tool';
content: string;
name?: string;
tool_calls?: Array<{ id?: string }>;
tool_call_id?: string;
tool_results?: ToolResultEntry[];
};
@@ -53,7 +60,7 @@
let messages: ChatMessage[] = [];
let inputMessage = '';
let isStreaming = false;
-let sidebarOpen = true;
+let sidebarOpen = !embedded;
let streamingContent = '';
let selectedProvider = '';
@@ -70,6 +77,7 @@
let showDateSelector = false;
let selectedPlaceToAdd: PlaceResult | null = null;
let selectedDate = '';
let settingsOpen = false;
const dispatch = createEventDispatcher<{
close: void;
@@ -265,10 +273,72 @@
const res = await fetch(`/api/chat/conversations/${conv.id}/`);
if (res.ok) {
const data = await res.json();
-messages = data.messages || [];
+messages = rebuildConversationMessages(data.messages || []);
}
}
function parseStoredToolResult(msg: ChatMessage): ToolResultEntry | null {
if (msg.role !== 'tool') {
return null;
}
try {
return {
name: msg.name || 'tool',
result: JSON.parse(msg.content)
};
} catch {
return {
name: msg.name || 'tool',
result: msg.content
};
}
}
function rebuildConversationMessages(rawMessages: ChatMessage[]): ChatMessage[] {
const rebuilt = rawMessages.map((msg) => ({
...msg,
tool_results: msg.tool_results ? [...msg.tool_results] : undefined
}));
let activeAssistant: ChatMessage | null = null;
for (const msg of rebuilt) {
if (msg.role === 'assistant') {
activeAssistant = Array.isArray(msg.tool_calls) && msg.tool_calls.length > 0 ? msg : null;
continue;
}
if (msg.role !== 'tool' || !activeAssistant) {
continue;
}
const toolCallIds = (activeAssistant.tool_calls || [])
.map((toolCall) => toolCall?.id)
.filter((toolCallId): toolCallId is string => !!toolCallId);
if (msg.tool_call_id && toolCallIds.length > 0 && !toolCallIds.includes(msg.tool_call_id)) {
continue;
}
const parsedResult = parseStoredToolResult(msg);
if (!parsedResult) {
continue;
}
activeAssistant.tool_results = [...(activeAssistant.tool_results || []), parsedResult];
if (
toolCallIds.length > 0 &&
(activeAssistant.tool_results?.length || 0) >= toolCallIds.length
) {
activeAssistant = null;
}
}
return rebuilt;
}
async function deleteConversation(conv: Conversation) {
await fetch(`/api/chat/conversations/${conv.id}/`, { method: 'DELETE' });
conversations = conversations.filter((conversation) => conversation.id !== conv.id);
@@ -398,22 +468,6 @@
}
}
function parseToolResults(msg: ChatMessage): ToolResultEntry[] {
if (msg.tool_results?.length) {
return msg.tool_results;
}
if (msg.role !== 'tool') {
return [];
}
try {
return [{ name: msg.name || 'tool', result: JSON.parse(msg.content) }];
} catch {
return [{ name: msg.name || 'tool', result: msg.content }];
}
}
function hasPlaceResults(result: ToolResultEntry): boolean {
return (
result.name === 'search_places' &&
@@ -541,20 +595,100 @@
}
let messagesContainer: HTMLElement;
$: visibleMessages = messages.filter((msg) => msg.role !== 'tool');
$: lastVisibleMessageId = visibleMessages[visibleMessages.length - 1]?.id;
$: if (messages && messagesContainer) {
setTimeout(() => {
messagesContainer?.scrollTo({ top: messagesContainer.scrollHeight, behavior: 'smooth' });
}, 50);
}
function asRecord(value: unknown): Record<string, unknown> | null {
if (!value || typeof value !== 'object' || Array.isArray(value)) {
return null;
}
return value as Record<string, unknown>;
}
function getToolSummary(result: ToolResultEntry): ToolSummary {
const payload = asRecord(result.result);
const hasError = !!(payload && typeof payload.error === 'string' && payload.error.trim());
if (hasError) {
return {
icon: '⚠️',
text: `${result.name.replaceAll('_', ' ')} could not be completed.`
};
}
if (result.name === 'list_trips') {
const tripCount = Array.isArray(payload?.trips) ? payload.trips.length : 0;
return {
icon: '🧳',
text:
tripCount > 0
? `Found ${tripCount} trip${tripCount === 1 ? '' : 's'}.`
: 'No trips found.'
};
}
if (result.name === 'get_trip_details') {
const trip = asRecord(payload?.trip);
const tripName = typeof trip?.name === 'string' ? trip.name : 'trip';
const itineraryCount = Array.isArray(trip?.itinerary) ? trip.itinerary.length : 0;
return {
icon: '🗺️',
text: `Loaded details for ${tripName} (${itineraryCount} itinerary item${itineraryCount === 1 ? '' : 's'}).`
};
}
if (result.name === 'add_to_itinerary') {
const location = asRecord(payload?.location);
const locationName = typeof location?.name === 'string' ? location.name : 'location';
return {
icon: '📌',
text: `Added ${locationName} to the itinerary.`
};
}
if (result.name === 'get_weather') {
const entries = Array.isArray(payload?.results) ? payload.results : [];
const availableCount = entries.filter((entry) => asRecord(entry)?.available === true).length;
return {
icon: '🌤️',
text: `Checked weather for ${entries.length} date${entries.length === 1 ? '' : 's'} (${availableCount} available).`
};
}
return {
icon: '🛠️',
text: `${result.name.replaceAll('_', ' ')} completed.`
};
}
</script>
<div class="card bg-base-200 shadow-xl">
<div
class="card"
class:bg-base-200={!embedded}
class:bg-base-100={embedded}
class:shadow-xl={!embedded}
class:border={embedded}
class:border-base-300={embedded}
>
<div class="card-body p-0">
<div class="flex" class:h-[calc(100vh-64px)]={!embedded} class:h-[70vh]={embedded}>
<div
class="flex"
class:h-[calc(100vh-64px)]={!embedded}
class:h-[65vh]={embedded}
class:min-h-[30rem]={embedded}
class:max-h-[46rem]={embedded}
>
<div
class="w-72 bg-base-200 flex flex-col border-r border-base-300 {sidebarOpen
? ''
: 'hidden'} lg:flex"
id="chat-conversations-sidebar"
class="bg-base-200 flex flex-col border-r border-base-300 {embedded
? 'w-60'
: 'w-72'} {sidebarOpen ? '' : 'hidden'} lg:flex"
>
<div class="p-3 flex items-center justify-between border-b border-base-300">
<h2 class="text-lg font-semibold">{$t('chat.conversations')}</h2>
@@ -604,6 +738,9 @@
<button
class="btn btn-sm btn-ghost lg:hidden"
on:click={() => (sidebarOpen = !sidebarOpen)}
aria-controls="chat-conversations-sidebar"
aria-expanded={sidebarOpen}
aria-label={sidebarOpen ? 'Hide conversations' : 'Show conversations'}
>
{#if sidebarOpen}
<svg class="w-5 h-5" viewBox="0 0 24 24" fill="currentColor" aria-hidden="true">
@@ -628,34 +765,57 @@
</div>
</div>
<div class="ml-auto flex items-center gap-2">
<select
class="select select-bordered select-sm"
bind:value={selectedProvider}
disabled={chatProviders.length === 0}
>
{#each chatProviders as provider}
<option value={provider.id}>
{provider.label}
{#if provider.user_configured}
{/if}
</option>
{/each}
</select>
<select
class="select select-bordered select-sm"
bind:value={selectedModel}
disabled={chatProviders.length === 0}
>
{#if modelsLoading}
<option value="">Loading...</option>
{:else if availableModels.length === 0}
<option value="">Default</option>
{:else}
{#each availableModels as model}
<option value={model}>{model}</option>
{/each}
{/if}
</select>
<details class="dropdown dropdown-end" bind:open={settingsOpen}>
<summary
class="btn btn-sm btn-ghost"
aria-label="AI settings"
aria-expanded={settingsOpen}
>
⚙️
</summary>
<div
class="dropdown-content z-20 mt-2 w-72 rounded-box border border-base-300 bg-base-100 p-3 shadow"
>
<div class="space-y-2">
<label class="label py-0" for="chat-provider-select">
<span class="label-text text-xs opacity-70">{$t('settings.provider')}</span>
</label>
<select
id="chat-provider-select"
class="select select-bordered select-sm w-full"
bind:value={selectedProvider}
disabled={chatProviders.length === 0}
>
{#each chatProviders as provider}
<option value={provider.id}>
{provider.label}
{#if provider.user_configured}
{/if}
</option>
{/each}
</select>
<label class="label py-0" for="chat-model-select">
<span class="label-text text-xs opacity-70">{$t('chat.model_label')}</span>
</label>
<select
id="chat-model-select"
class="select select-bordered select-sm w-full"
bind:value={selectedModel}
disabled={chatProviders.length === 0}
>
{#if modelsLoading}
<option value="">Loading...</option>
{:else if availableModels.length === 0}
<option value="">{$t('chat.model_placeholder')}</option>
{:else}
{#each availableModels as model}
<option value={model}>{model}</option>
{/each}
{/if}
</select>
</div>
</div>
</details>
</div>
</div>
@@ -677,141 +837,96 @@
<p class="text-base-content/60 max-w-md">{$t('chat.welcome_message')}</p>
</div>
{:else}
{#each messages as msg}
{#each visibleMessages as msg}
<div class="flex {msg.role === 'user' ? 'justify-end' : 'justify-start'}">
{#if msg.role === 'tool'}
<div class="max-w-2xl w-full">
<div class="bg-base-200 rounded-lg p-3 text-xs space-y-2">
<div class="font-semibold mb-1 text-primary">🗺️ {msg.name}</div>
{#each parseToolResults(msg) as result}
{#if hasPlaceResults(result)}
<div class="grid gap-2">
{#each getPlaceResults(result) as place}
<div class="card card-compact bg-base-100 p-3">
<h4 class="font-semibold">{place.name}</h4>
{#if place.address}
<p class="text-sm text-base-content/70">{place.address}</p>
{/if}
{#if place.rating}
<div class="flex items-center gap-1 text-sm">
<span></span>
<span>{place.rating}</span>
</div>
{/if}
{#if collectionId}
<button
class="btn btn-xs btn-primary btn-outline mt-2"
on:click={() => openDateSelector(place)}
disabled={!hasPlaceCoordinates(place)}
>
{$t('add_to_itinerary')}
</button>
{/if}
</div>
{/each}
</div>
{:else if hasWebSearchResults(result)}
<div class="grid gap-2">
{#each getWebSearchResults(result) as item}
<a
href={item.url}
target="_blank"
rel="noopener noreferrer"
class="card card-compact bg-base-100 p-3 hover:bg-base-300 transition-colors block"
>
<h4 class="font-semibold link">{item.title}</h4>
<p class="text-sm text-base-content/70 line-clamp-2">
{item.snippet}
</p>
</a>
{/each}
</div>
{:else}
<div class="bg-base-100 rounded p-2 text-sm">
<pre>{JSON.stringify(result.result, null, 2)}</pre>
</div>
{/if}
{/each}
</div>
</div>
{:else}
<div class="chat {msg.role === 'user' ? 'chat-end' : 'chat-start'}">
<div
class="chat-bubble {msg.role === 'user'
? 'chat-bubble-primary'
: 'chat-bubble-neutral'}"
>
<div class="whitespace-pre-wrap">{msg.content}</div>
{#if msg.role === 'assistant' && msg.tool_results}
<div class="mt-2 space-y-2">
{#each msg.tool_results as result}
{#if hasPlaceResults(result)}
<div class="grid gap-2">
{#each getPlaceResults(result) as place}
<div class="card card-compact bg-base-200 p-3">
<h4 class="font-semibold">{place.name}</h4>
{#if place.address}
<p class="text-sm text-base-content/70">{place.address}</p>
{/if}
{#if place.rating}
<div class="flex items-center gap-1 text-sm">
<span></span>
<span>{place.rating}</span>
</div>
{/if}
{#if collectionId}
<button
class="btn btn-xs btn-primary btn-outline mt-2"
on:click={() => openDateSelector(place)}
disabled={!hasPlaceCoordinates(place)}
>
{$t('add_to_itinerary')}
</button>
{/if}
</div>
{/each}
</div>
{:else if hasWebSearchResults(result)}
<div class="grid gap-2">
{#each getWebSearchResults(result) as item}
<a
href={item.url}
target="_blank"
rel="noopener noreferrer"
class="card card-compact bg-base-200 p-3 hover:bg-base-300 transition-colors block"
>
<h4 class="font-semibold link">{item.title}</h4>
<p class="text-sm text-base-content/70 line-clamp-2">
{item.snippet}
</p>
</a>
{/each}
</div>
{:else}
<div class="bg-base-200 rounded p-2 text-sm">
<pre>{JSON.stringify(result.result, null, 2)}</pre>
</div>
{/if}
{/each}
</div>
{/if}
{#if msg.role === 'assistant' && isStreaming && msg.id === messages[messages.length - 1]?.id && !msg.content}
<div class="chat {msg.role === 'user' ? 'chat-end' : 'chat-start'}">
<div
class="chat-bubble {msg.role === 'user'
? 'chat-bubble-primary'
: 'chat-bubble-neutral'}"
>
<div class="whitespace-pre-wrap">{msg.content}</div>
{#if msg.role === 'assistant' && msg.tool_results}
<div class="mt-2 space-y-2">
{#each msg.tool_results as result}
{#if hasPlaceResults(result)}
<div class="grid gap-2">
{#each getPlaceResults(result) as place}
<div class="card card-compact bg-base-200 p-3">
<h4 class="font-semibold">{place.name}</h4>
{#if place.address}
<p class="text-sm text-base-content/70">{place.address}</p>
{/if}
{#if place.rating}
<div class="flex items-center gap-1 text-sm">
<span></span>
<span>{place.rating}</span>
</div>
{/if}
{#if collectionId}
<button
class="btn btn-xs btn-primary btn-outline mt-2"
on:click={() => openDateSelector(place)}
disabled={!hasPlaceCoordinates(place)}
>
{$t('add_to_itinerary')}
</button>
{/if}
</div>
{/each}
</div>
{:else if hasWebSearchResults(result)}
<div class="grid gap-2">
{#each getWebSearchResults(result) as item}
<a
href={item.url}
target="_blank"
rel="noopener noreferrer"
class="card card-compact bg-base-200 p-3 hover:bg-base-300 transition-colors block"
>
<h4 class="font-semibold link">{item.title}</h4>
<p class="text-sm text-base-content/70 line-clamp-2">
{item.snippet}
</p>
</a>
{/each}
</div>
{:else}
<div class="bg-base-200 rounded p-2 text-sm flex items-center gap-2">
<span>{getToolSummary(result).icon}</span>
<span>{getToolSummary(result).text}</span>
</div>
{/if}
{/each}
</div>
{/if}
{#if msg.role === 'assistant' && isStreaming && msg.id === lastVisibleMessageId}
<div class="mt-2 inline-flex items-center gap-2 text-xs opacity-70">
<span class="loading loading-dots loading-sm"></span>
{/if}
</div>
<span>{$t('processing')}</span>
</div>
{/if}
</div>
{/if}
</div>
</div>
{/each}
{/if}
</div>
<div class="p-4 border-t border-base-300">
<div class="max-w-4xl mx-auto">
<div class="flex flex-wrap gap-2 mb-3">
<div class="border-t border-base-300 p-3 sm:p-4">
<div class:mx-auto={!embedded} class:max-w-4xl={!embedded}>
<div
class="mb-3 flex gap-2"
class:flex-wrap={!embedded}
class:overflow-x-auto={embedded}
class:pb-1={embedded}
>
{#if promptTripContext}
<button
class="btn btn-sm btn-ghost"
class="btn btn-ghost"
class:btn-xs={embedded}
class:btn-sm={!embedded}
class:whitespace-nowrap={embedded}
on:click={() =>
sendPresetMessage(
`What are the best restaurants to include across my ${promptTripContext} itinerary?`
@@ -821,7 +936,10 @@
🍽️ Restaurants
</button>
<button
-class="btn btn-sm btn-ghost"
+class="btn btn-ghost"
+class:btn-xs={embedded}
+class:btn-sm={!embedded}
+class:whitespace-nowrap={embedded}
on:click={() =>
sendPresetMessage(
`What activities should I plan across my ${promptTripContext} itinerary?`
@@ -833,7 +951,10 @@
{/if}
{#if startDate && endDate}
<button
-class="btn btn-sm btn-ghost"
+class="btn btn-ghost"
+class:btn-xs={embedded}
+class:btn-sm={!embedded}
+class:whitespace-nowrap={embedded}
on:click={() =>
sendPresetMessage(
`What should I pack for my trip from ${startDate} to ${endDate}?`
@@ -844,7 +965,10 @@
</button>
{/if}
<button
-class="btn btn-sm btn-ghost"
+class="btn btn-ghost"
+class:btn-xs={embedded}
+class:btn-sm={!embedded}
+class:whitespace-nowrap={embedded}
on:click={() =>
sendPresetMessage('Can you help me plan a day-by-day itinerary for this trip?')}
disabled={isStreaming || chatProviders.length === 0}
@@ -853,7 +977,7 @@
</button>
</div>
</div>
-<div class="flex gap-2 max-w-4xl mx-auto">
+<div class="flex items-end gap-2" class:mx-auto={!embedded} class:max-w-4xl={!embedded}>
<textarea
class="textarea textarea-bordered flex-1 resize-none"
placeholder={$t('chat.input_placeholder')}
@@ -863,7 +987,7 @@
disabled={isStreaming}
></textarea>
<button
-class="btn btn-primary"
+class="btn btn-primary self-end"
on:click={sendMessage}
disabled={isStreaming || !inputMessage.trim() || chatProviders.length === 0}
title={$t('chat.send')}