chore: remove accidental memory files from repo root

These were stashed basic-memory files that got staged. Memory files
belong in .memory/ (managed by basic-memory MCP), not the repo root.
This commit is contained in:
2026-03-10 18:40:45 +00:00
parent c4b8f291f2
commit 78d2b3774e
5 changed files with 0 additions and 452 deletions

View File

@@ -1,19 +0,0 @@
---
title: Retrospective - Chat Tool Failure Handling
type: note
permalink: voyage/decisions/retrospective-chat-tool-failure-handling
tags:
- pattern
- lesson
- chat
- retrospective
---
# Retrospective - Chat Tool Failure Handling
## Retrospective: chat tool failure handling
- [pattern] Separating successful-tool iterations from bounded all-failure rounds prevents external tool outages from exhausting the main tool-call budget. #pattern
- [pattern] Classify chat tool failures into required-parameter errors, retryable execution failures, and permanent execution failures so each path can use the right UX and retry behavior. #pattern
- [risk] Mixed batches with both successful and failed tool calls still rely on the global success iteration cap as a backstop; acceptable pre-release, but worth revisiting if multi-tool itineraries remain noisy. #risk
confidence=high; last_validated=2026-03-10; volatility=medium; review_after_days=60; validation_count=1; contradiction_count=0

View File

@@ -1,65 +0,0 @@
---
title: chat-tool-loop-fix tester gate
type: note
permalink: voyage/gates/chat-tool-loop-fix-tester-gate
tags:
- tester
- gate
- chat
- loop-fix
- adversarial
---
## Gate: chat-tool-loop-fix adversarial validation (2026-03-10)
**Verdict: PASS**
**Plan ref:** [[voyage/plans/assistant-add-flow-fixes]]
**Branch:** fix/chat-tool-loop-fix
### Test execution
- All 32 chat tests pass (docker compose exec server python3 manage.py test chat --keepdb --verbosity=2)
- 0 failures, 0 errors, 0 flaky
### Adversarial checks and findings
#### AC1 — all-failure rounds stop cleanly without falling into tool_loop_limit
PASS. `all_failure_rounds` counter increments only when `not successful_tool_calls and first_execution_failure` (line 749). On reaching `MAX_ALL_FAILURE_ROUNDS=3`, the code issues `tool_execution_error` and `return`s (line 777). `tool_iterations` is never incremented for all-failure rounds (it increments only at line 782, after `all_failure_rounds = 0`), so 10-iteration cap is never hit for pure-failure scenarios. Test `test_all_failure_rounds_stop_with_execution_error_before_tool_cap` confirms exactly 3 `stream_chat_completion` calls and `tool_execution_error` rather than `tool_loop_limit`.
#### AC2 — permanent failures stop in one round
PASS. When `retryable=False`, `encountered_permanent_failure = True` is set (line 717), then at post-iteration check (line 750-751) `all_failure_rounds` is set directly to `MAX_ALL_FAILURE_ROUNDS` (bypassing the increment), triggering immediate stop. Test `test_permanent_execution_failure_stops_immediately` confirms 1 `stream_chat_completion` call.
#### AC3 — required-parameter clarification flow still intact
PASS. The `_is_required_param_tool_error` path (lines 647-711) short-circuits before the execution-failure path (line 713). `search_places` missing-location triggers `clarification_content` SSE emit + DB persist + `return`. Other required-param errors still emit `tool_validation_error` and `return`. Tests `test_missing_search_place_location_streams_clarifying_content` and `test_collection_access_error_does_not_short_circuit_required_param_regex` confirm correctness.
#### AC4 — execution failures sanitized and NOT emitted as tool_result SSE events
PASS. The execution-failure branch at line 713 does `continue` — it never reaches the `yield tool_event` at line 747. Failed tool results are recorded in `first_execution_failure` but not in `successful_tool_calls` / `successful_tool_messages` / `successful_tool_chat_entries`, so they are never persisted to DB as role=tool rows and never streamed to the frontend. `_build_tool_execution_error_event` uses only the sanitized error text (line 265-271), not the raw exception. Test confirms no `tool_result` events in failure scenarios.
#### AC5 — mixed-batch partial failure: one tool succeeds, another fails
OBSERVED FINDING (LOW): When a batch has e.g. 2 tool calls where call_1 succeeds and call_2 fails with an execution error:
- `successful_tool_calls` will contain call_1's result; call_2's failure is silently dropped from this iteration's context.
- The assistant message persisted to DB will reference only call_1's tool_calls.
- `all_failure_rounds` stays at 0 (successful_tool_calls is non-empty, so the `not successful_tool_calls` condition at line 749 is False).
- Loop continues as if call_2 never happened — the LLM won't know call_2 failed.
- Acceptable for now (pre-release), but the LLM may loop trying to re-execute call_2 if it expects an answer. Not a regression from the old behavior (where ALL failures were fed back to LLM), and the `MAX_TOOL_ITERATIONS` cap remains as a backstop.
#### AC6 — `_build_llm_messages` orphan tool_call filtering on reload
PASS. `valid_tool_call_ids` (line 59) is built only from non-required-param-error and non-execution-failure tool messages. Assistant messages with `tool_calls` have their `tool_calls` filtered to only the valid set (line 95-101). If ALL tool_calls for an assistant message are failures, `filtered_tool_calls` is empty and `payload["tool_calls"]` is not set (line 100 guard) — the assistant message is sent to LLM without tool_calls, avoiding orphaned tool references.
#### AC7 — frontend `rebuildConversationMessages` dedupe
PASS. Line 374 forces `tool_results: undefined` on all messages before the sibling-walk, discarding any server-side `tool_results` field. `appendToolResultDedup` guards on `tool_call_id` uniqueness during the walk. `uniqueToolResultsByCallId` is applied again at render time (line 939) as a second safety net. Combined: no double-accumulation on reload.
#### AC8 — `uniqueToolResultsByCallId` silent drop for null/undefined tool_call_id
LOW RISK: `uniqueToolResultsByCallId` (line 351-368) has a special case: if `result.tool_call_id` is falsy (null/undefined), the result is always appended without dedup check (line 359 guard). This means multiple results with no `tool_call_id` will all be included. Given that tool_call_id is always populated from the backend for proper tool calls, this is a safe fallback for edge cases.
#### AC9 — `_is_execution_failure_tool_error` catches ANY non-empty error that is not required-param error
PASS. This is intentionally broad — any `{error: "..."}` dict that isn't a missing-param pattern is treated as an execution failure. This correctly captures geocode errors, network errors, unexpected exceptions, etc.
#### AC10 — retry fallback after geocode failure still uses `_is_execution_failure_tool_error` guard
PASS (line 631-634): The retry succeeds only if the retry result is not a required-param error AND not an execution failure. If all retries fail, the `result` remains the original geocode error, which is then caught by `_is_execution_failure_tool_error` at line 713 and handled as a failure (not emitted as tool result).
### Mutation escape analysis
- MUTATION_ESCAPES: 1/8
- The unchecked mutation: flipping `not successful_tool_calls``successful_tool_calls` in the all-failure-round check (line 749) would cause failures to be silently dropped without incrementing the counter. No existing test with partial success + partial failure to detect this mutation. Low risk given the mixed-batch scenario is acceptable behavior.
### Summary
Implementation is sound for the primary stated goals. All-failure rounds stop cleanly; permanent failures stop in 1 round; clarification flow is intact; execution failures are not leaked to frontend. Frontend dedupe is correct via `tool_results: undefined` reset + ID-based dedup. One low-severity gap: mixed-batch partial failure silently drops the failing tool call (acceptable pre-release).

View File

@@ -1,59 +0,0 @@
---
title: Chat Tool Error Handling Architecture
type: note
permalink: voyage/knowledge/chat-tool-error-handling-architecture
tags:
- chat
- tools
- error-handling
- architecture
- pattern
---
# Chat Tool Error Handling Architecture
## Overview
The chat agent tool loop classifies tool call outcomes into three distinct categories, each with different retry and surfacing behavior.
## Error Classification
### 1. Required-parameter validation errors
- [pattern] Detected by `_is_required_param_tool_error()` regex matching `"... is required"` patterns in tool result `error` field
- [convention] Short-circuited immediately with a user-visible error — never replayed into LLM history
- [pattern] `search_places` missing `location` has a special path: `_is_search_places_location_retry_candidate_error()` triggers deterministic context-retry (trip destination → first itinerary stop → user clarification) before surfacing
### 2. Execution failures (new in chat-tool-loop-fix)
- [pattern] Any `error`-bearing tool result dict that does NOT match the required-param pattern is classified as an execution failure by `_is_execution_failure_tool_error()`
- [convention] Execution failures are NEVER replayed into LLM context — they are excluded from `successful_tool_calls`, `successful_tool_messages`, and `successful_tool_chat_entries`
- [pattern] `tool_iterations` increments only after at least one successful tool call in a round
- [pattern] All-failure rounds (every tool in a round fails) increment `all_failure_rounds`, capped at `MAX_ALL_FAILURE_ROUNDS` (3)
- [pattern] Permanent failures (`retryable: false` in tool result, e.g. `web_search` ImportError) set `all_failure_rounds = MAX_ALL_FAILURE_ROUNDS` for immediate stop
- [convention] Execution failures emit a `tool_execution_error` SSE event with sanitized text via `_build_tool_execution_error_event()`
### 3. Geocoding failures in search_places
- [pattern] `Could not geocode location: ...` errors are detected by `_is_search_places_location_retry_candidate_error()` (same path as missing-location)
- [convention] Eligible for the existing context-retry fallback before being treated as a terminal failure
## Error Sanitization
- [convention] `_safe_error_payload()` maps LiteLLM exceptions to sanitized user-safe categories — never forwards raw `exc.message`
- [convention] `execute_tool()` catch-all returns `{"error": "Tool execution failed"}` (hardcoded) — never raw `str(exc)`
- [decision] The `_build_tool_execution_error_event()` wraps sanitized tool error text in a user-safe sentence for SSE emission and DB persistence
## Frontend Tool-Result Deduplication
- [pattern] Three-layer dedup by `tool_call_id`:
1. `rebuildConversationMessages()` sets `tool_results: undefined` on all assistant messages, then re-derives exclusively from persisted `role=tool` sibling rows — discards any server-side pre-populated `tool_results`
2. `appendToolResultDedup()` deduplicates during both rebuild walk and live SSE ingestion
3. `uniqueToolResultsByCallId()` at render time provides a final safety net
## Key Files
- Backend classification/loop: `backend/server/chat/views/__init__.py`
- Tool execution + sanitization: `backend/server/chat/agent_tools.py`
- Frontend dedup: `frontend/src/lib/components/AITravelChat.svelte`
- Tests: `backend/server/chat/tests.py` (32 total chat tests)
## Relations
- related_to [[assistant-add-flow-fixes]]

View File

@@ -1,101 +0,0 @@
---
title: assistant-add-flow-fixes
type: note
permalink: voyage/plans/assistant-add-flow-fixes-1
---
## Feature A hotfix implementation note
- Added a targeted chat hotfix in `ChatViewSet` so required-arg validation for `search_places` with missing `location` now emits a natural assistant clarification as SSE `content` (and persists it as an assistant message) instead of streaming a `tool_validation_error` payload.
- Kept existing generic required-param short-circuit behavior unchanged for all other tools/errors.
- Added tests covering the specific detection helper and end-to-end streaming behavior for the missing-location clarify path.
## Follow-up fix: itinerary-aware search fallback
- Feature: Use collection itinerary context to satisfy missing `search_places.location` when available, and preserve dining intent when category is omitted.
- Acceptance:
- Restaurant requests across a collection itinerary should not ask for location when itinerary context can provide one.
- `search_places` should not silently return tourism attractions for restaurant requests when category is omitted.
- Existing behavior for non-dining searches remains intact.
- Planned worktree: `.worktrees/chat-search-intent-fix`
- Planned branch: `fix/chat-search-intent-fix`
## Implementation note: chat search intent + context retry
- Implemented deterministic `search_places` retry fallback in `ChatViewSet`: when tool call misses required `location`, retry first with trip context (`destination`, then first itinerary stop label), then retain existing user-location-reply retry behavior.
- Added deterministic dining-intent heuristic in `ChatViewSet` to inject `category="food"` when `search_places.category` is omitted and current/recent user messages indicate restaurant/dining intent.
- Updated `search_places` tool schema text to explicitly describe `food` (restaurants/dining), `tourism` (attractions), and `lodging` (hotels/stays).
- Added targeted tests for category inference and a collection-context integration flow proving restaurant requests can complete via destination-backed retry without location clarification prompt.
## Regression Exploration: Restaurant Recommendations Across Test Itinerary (2026-03-10)
### Observed failure modes
1. **"too many tool calls"** — `tool_loop_limit` error after 10 iterations
2. **Repeated "search places could not be completed"** — LLM retries `search_places` each iteration, each producing a non-location-error that bypasses the clarification/retry gate
3. **Repeated "web search could not be completed"** — LLM retries `web_search` each iteration; non-fatal errors (rate-limit, ImportError, generic failure) are treated as successful tool calls and fed back to the LLM, which re-calls the tool
### Root cause analysis
#### RC1 — `web_search` non-fatal errors are NOT short-circuited (primary cause of web_search loop)
**File**: `backend/server/chat/views/__init__.py`, `event_stream()` loop (line 474+)
- `web_search` returns `{"error": "...", "results": []}` on ImportError, rate-limit, and generic failures.
- `_is_required_param_tool_error()` only matches "location is required" / "query is required" / `"X is required"` pattern. It does **not** match "Web search failed. Please try again." or "Search rate limit reached."
- So the error payload passes through `_is_required_param_tool_error``False`, and the tool result (including the error text) is added to `successful_tool_calls` and fed back into the LLM context as a successful tool message.
- The LLM sees a tool result with `error` text but no real data, and calls `web_search` again — repeating until `MAX_TOOL_ITERATIONS` (10) is hit.
#### RC2 — `search_places` geocoding/Overpass failures are NOT short-circuited (cause of search_places loop)
**File**: `backend/server/chat/agent_tools.py` `search_places()` (line 149, 199205)
- `search_places` returns `{"error": "Could not geocode location: X"}` when Nominatim returns empty results, `{"error": "Places API request failed: ..."}` on network errors, or `{"error": "An unexpected error occurred..."}` on generic failures.
- **None of these match** `_is_required_param_tool_error()` (which only detects missing-param patterns). They are not `location is required`.
- The context-retry fallback (`_is_search_places_missing_location_required_error`, lines 522565) only activates on `location is required`. Geocoding/network failures silently pass through as "successful" tool results.
- LLM receives `{"error": "Could not geocode location: Paris"}` in tool context, tries again → same failure → 10 times → loop limit.
#### RC3 — `MAX_TOOL_ITERATIONS = 10` counter increments even when ALL tool calls in an iteration are failures
**File**: `backend/server/chat/views/__init__.py`, line 475
- `tool_iterations += 1` happens **before** tool results are checked. A single iteration with 2 tool calls each producing non-fatal errors still costs 1 iteration count and adds 2 failed results to LLM context.
- For a restaurant request across a multi-stop itinerary, the LLM may call `search_places` for multiple stops per iteration, burning through 10 iterations rapidly.
#### RC4 — `successful_tool_calls` naming is misleading — it accumulates ALL non-required-param-error results
**File**: `backend/server/chat/views/__init__.py`, lines 476, 635
- The variable `successful_tool_calls` accumulates tool calls whose results are not `_is_required_param_tool_error`. This includes tool calls that returned geocoding failures, network errors, or `web_search` unavailability errors. These are appended to `current_messages` and persisted to DB, causing the LLM to see repeated failure payloads.
### Edit points and recommended fix approaches
#### EP1 — Add non-fatal tool error detection and short-circuit
**File**: `backend/server/chat/views/__init__.py`
**Symbol**: New static method `_is_nonfatal_tool_error(result)` + use in `event_stream()`
- Detect results that have `{"error": ..., "results": []}` (web_search style) or `{"error": "Could not geocode..."}` / `{"error": "Places API request failed..."}` (search_places failures).
- Recommended approach: check `result.get("error")` is non-empty AND the result is not a successful payload (no `results` data, no places data). If a tool returns an error result that is not a required-param error, log a warning, do NOT add to `successful_tool_calls`, and instead emit a summary to the LLM context as a tool message explaining the failure — then break the loop or emit a user-visible message after N consecutive non-fatal failures.
#### EP2 — Distinguish `web_search` persistent unavailability (ImportError) from retryable errors
**File**: `backend/server/chat/agent_tools.py`
**Symbol**: `web_search()` (lines 294307)
- Currently ImportError (`duckduckgo-search` not installed) and rate-limit both return the same shape: `{"error": "...", "results": []}`.
- Recommended approach: add a sentinel field `"permanent": True` for ImportError, so the view layer can detect non-retryable failures and short-circuit immediately on the first iteration rather than retrying 10 times. Alternatively add a distinct error key like `"error_code": "unavailable"` vs `"error_code": "rate_limited"`.
#### EP3 — Add non-fatal tool error counting per-tool to break loops early
**File**: `backend/server/chat/views/__init__.py`
**Symbol**: `event_stream()` (line 424+)
- Track `consecutive_nonfatal_errors` per tool (or globally). After 2 consecutive failures of the same tool name, stop calling it for this turn and emit a user-facing message: "Search places could not be completed for this location."
- This prevents the 10-iteration burn for a single unreachable external API.
#### EP4 — Only increment `tool_iterations` when at least one tool call succeeds
**File**: `backend/server/chat/views/__init__.py`
**Symbol**: `event_stream()` line 475 (`tool_iterations += 1`)
- Move `tool_iterations += 1` to after the tool result loop, and only increment if `len(successful_tool_calls) > 0`. Failed-only iterations should not count toward the limit (or should count less). Alternatively, add a separate `failed_tool_iterations` counter with a lower limit (e.g. 3) specifically for all-failure iterations.
#### EP5 — `search_places` geocoding failure should surface earlier in retry logic
**File**: `backend/server/chat/views/__init__.py`
**Symbol**: `event_stream()` lines 522565 (the `_is_search_places_missing_location_required_error` retry block)
- Current retry block only fires on `location is required`. Extend retry logic to also retry with fallback location when result contains `{"error": "Could not geocode location: ..."}` (Nominatim empty result) using the same `retry_locations` fallback list. This gives context-location a second chance when Nominatim fails on the first geocode.
### Files / symbols map
| File | Symbol | Role |
|------|--------|------|
| `backend/server/chat/views/__init__.py` | `MAX_TOOL_ITERATIONS = 10` (line 422) | Hard loop cap |
| `backend/server/chat/views/__init__.py` | `tool_iterations += 1` (line 475) | Iteration counter — always increments, even on all-failure iterations |
| `backend/server/chat/views/__init__.py` | `_is_required_param_tool_error()` (line 136) | Only detects missing-param errors; does NOT catch tool execution failures |
| `backend/server/chat/views/__init__.py` | `_is_search_places_missing_location_required_error()` (line 180) | Only triggers retry for `location is required`; geocoding/network failures bypass it |
| `backend/server/chat/views/__init__.py` | `successful_tool_calls` (line 476) | Misnamed — accumulates all non-required-param-error results including failures |
| `backend/server/chat/views/__init__.py` | `event_stream()` lines 480660 | Full tool execution loop |
| `backend/server/chat/agent_tools.py` | `search_places()` (line 125) | Returns `{"error": "..."}` on geocode/network failure — not a required-param error |
| `backend/server/chat/agent_tools.py` | `web_search()` (line 257) | Returns `{"error": "...", "results": []}` on import/rate-limit/generic failure |
| `backend/server/chat/llm_client.py` | `stream_chat_completion()` (line 380) | No tool-level retry logic — passes through all tool results |

View File

@@ -1,208 +0,0 @@
---
title: assistant-add-flow-fixes
type: note
permalink: voyage/plans/assistant-add-flow-fixes
---
## Feature A hotfix implementation note
- Added a targeted chat hotfix in `ChatViewSet` so required-arg validation for `search_places` with missing `location` now emits a natural assistant clarification as SSE `content` (and persists it as an assistant message) instead of streaming a `tool_validation_error` payload.
- Kept existing generic required-param short-circuit behavior unchanged for all other tools/errors.
- Added tests covering the specific detection helper and end-to-end streaming behavior for the missing-location clarify path.
## Follow-up fix: itinerary-aware search fallback
- Feature: Use collection itinerary context to satisfy missing `search_places.location` when available, and preserve dining intent when category is omitted.
- Acceptance:
- Restaurant requests across a collection itinerary should not ask for location when itinerary context can provide one.
- `search_places` should not silently return tourism attractions for restaurant requests when category is omitted.
- Existing behavior for non-dining searches remains intact.
- Planned worktree: `.worktrees/chat-search-intent-fix`
- Planned branch: `fix/chat-search-intent-fix`
## Implementation note: chat search intent + context retry
- Implemented deterministic `search_places` retry fallback in `ChatViewSet`: when tool call misses required `location`, retry first with trip context (`destination`, then first itinerary stop label), then retain existing user-location-reply retry behavior.
- Added deterministic dining-intent heuristic in `ChatViewSet` to inject `category="food"` when `search_places.category` is omitted and current/recent user messages indicate restaurant/dining intent.
- Updated `search_places` tool schema text to explicitly describe `food` (restaurants/dining), `tourism` (attractions), and `lodging` (hotels/stays).
- Added targeted tests for category inference and a collection-context integration flow proving restaurant requests can complete via destination-backed retry without location clarification prompt.
## Regression: Duplicate Tool Summaries / Repeated Items — Frontend Analysis (2026-03-10)
### Context
Exploring why restaurant recommendations across a test itinerary produce duplicated itinerary-item summaries and repeated warning toasts in the frontend chat renderer.
### Root Cause 1 — `rebuildConversationMessages` double-accumulates `tool_results` on reload
**File:** `frontend/src/lib/components/AITravelChat.svelte`, `rebuildConversationMessages()` (lines 333375)
**Mechanism:** When `selectConversation()` reloads a conversation from the API, `rebuildConversationMessages()` walks the raw message list and pushes each `role=tool` message into its parent assistant message's `tool_results` array. The backend serializer may return a pre-populated `tool_results` field on the assistant message object. Because the `rebuilt` map at line 336 preserves any server-side `tool_results` via spread (`msg.tool_results ? [...msg.tool_results] : undefined`), and then the walk loop **appends again** from the raw `role=tool` sibling messages, any tool result present on both the server-side `assistant.tool_results` field *and* as a sibling `role=tool` message will be pushed **twice**.
**Second trigger path:** A multi-stop itinerary causes the backend tool loop to fire `search_places` once per stop via the context-retry fallback. Each call emits a separate `tool_result` SSE event. The frontend appends each into `assistantMsg.tool_results` during streaming (line 472). On conversation reload, `rebuildConversationMessages` re-appends them again from the persisted `role=tool` DB rows → **2× the place-result cards**.
**Fix approach:** Change line 336 so the initial `rebuilt` map always sets `tool_results: undefined`. This discards any server-side `tool_results` field on assistant messages and lets the sibling-walk be the sole source of truth:
```ts
// line 334337 — change to:
const rebuilt = rawMessages.map((msg) => ({
...msg,
tool_results: undefined // always re-derive from role=tool siblings below
}));
```
### Root Cause 2 — Multi-iteration tool loop creates multiple assistant messages; `rebuildConversationMessages` may pair results to wrong parent
**File:** `AITravelChat.svelte`, `rebuildConversationMessages()` lines 342344 and backend `event_stream` lines 662688
**Mechanism:** For a multi-restaurant flow the backend iterates: `search_places` (iteration 1) → saves assistant+tool messages → then `add_to_itinerary × N` (iterations 2N). Each iteration saves a separate assistant message with distinct `tool_calls`. On reload, the loop in `rebuildConversationMessages` sets `activeAssistant` to each assistant message with `tool_calls`, then nulls it when `tool_results.length >= toolCallIds.length` (lines 366371). If the DB `created_at` timestamps for assistant and tool messages are identical (same DB transaction), the `order_by("created_at")` in `_build_llm_messages` is non-deterministic, and the sibling walk may misalign tool results to wrong parent assistant messages → wrong or duplicated summary cards.
**Fix approach (backend):** After saving the assistant message at line 670 with `await sync_to_async(ChatMessage.objects.create)`, save each tool message **in a separate awaited call** (already the case at lines 678682) and add a `time.sleep(0)` / tick between saves to guarantee distinct `created_at`. Alternatively, add an explicit `order` integer field to `ChatMessage` and sort by it instead of `created_at`.
### Root Cause 3 — Repeated "success" toast from duplicate place-result cards
**File:** `AITravelChat.svelte`, `addPlaceToItinerary()` lines 569632
**Mechanism:** `addToast('success', $t('added_successfully'))` fires unconditionally at line 626 on every user-triggered add. When Root Cause 1 causes duplicate place-result cards to be rendered, the user sees two "Add to itinerary" buttons for the same place and clicking either fires a toast. Fixing Root Cause 1 (dedup in `rebuildConversationMessages`) eliminates the duplicate cards and therefore this repeat toast.
**No action required here unless RC1 fix is insufficient.** As a defense-in-depth measure, consider tracking which place names have been added within the session and disabling the button.
### Concrete Edit Points
| File | Function/Section | Change |
|---|---|---|
| `AITravelChat.svelte` L334336 | `rebuilt` map init in `rebuildConversationMessages` | Change `tool_results: msg.tool_results ? [...msg.tool_results] : undefined``tool_results: undefined` (always re-derive from sibling walk) |
| `AITravelChat.svelte` L467474 | SSE `parsed.tool_result` handler in `sendMessage` | Add dedup guard: skip push if `tool_call_id` already in `assistantMsg.tool_results` |
| `AITravelChat.svelte` L894944 | `{#each msg.tool_results as result}` template | Add JS-side dedup by `tool_call_id` or a `{#key}` block before rendering |
| `backend/server/chat/views/__init__.py` L670682 | assistant + tool message save in `event_stream` | Verify saved order is deterministic; add explicit sequence/order field if `created_at` collisions are possible |
### Recommended Fix Sequence
1. **Priority 1:** `AITravelChat.svelte` line 336 — change to `tool_results: undefined`. Single-line fix, eliminates Root Cause 1 and RC3 transitively.
2. **Priority 2:** `AITravelChat.svelte` SSE handler (line 472) — add `tool_call_id` dedup guard for defense against backend emitting duplicate events.
3. **Priority 3:** Template dedup (lines 894944) as render-layer safety net.
4. **Verify:** Reload a multi-restaurant conversation; each `add_to_itinerary` summary and each place card must appear exactly once.
### Related Notes
- [[assistant-add-flow-fixes]] — backend search intent + retry logic that triggers multi-iteration tool loops
## Follow-on fix: restaurant recommendation tool loop
- [ ] Workstream `fix/chat-tool-loop-fix` in `.worktrees/chat-tool-loop-fix`: stop chat tool loops from replaying execution failures into the LLM context; only count successful tool iterations, cap repeated all-failure rounds, and hard-stop permanent `web_search` failures.
- Acceptance: restaurant recommendation requests do not end with "too many tool calls" when tool executions fail; `search_places`/`web_search` failures surface once and stop retry spirals.
- [ ] Keep frontend tool-result rendering deduplicated in `AITravelChat.svelte` during both SSE streaming and conversation rebuild.
- Acceptance: itinerary/tool summaries do not render duplicate cards or duplicate downstream add-to-itinerary affordances after reload.
- [ ] Validate the fix in worktree `fix/chat-tool-loop-fix` with targeted backend chat tests plus frontend `bun run check`/`bun run build`, then perform a functional restaurant-request verification.
- Acceptance: validation demonstrates the assistant either returns restaurant guidance or surfaces a single actionable failure instead of looping.
## Implementation outcome: chat tool loop + duplicate summary fix
- Added execution-failure classification in `backend/server/chat/views/__init__.py` that separates required-parameter validation errors from execution failures, and excludes execution-failure tool rows from LLM replay/history filtering.
- Updated tool loop control so `tool_iterations` increments only after at least one successful tool call; added bounded `MAX_ALL_FAILURE_ROUNDS = 3` for all-failure rounds, with immediate stop for permanent failures (`retryable: false`).
- Extended `search_places` location retry candidate detection to include geocoding failures (`Could not geocode location: ...`) so destination/itinerary/user-location fallback can retry before terminating.
- Marked `web_search` dependency import failure as permanent in `backend/server/chat/agent_tools.py` (`retryable: false`) so it is not retried like transient outages.
- Fixed frontend duplication in `frontend/src/lib/components/AITravelChat.svelte` by rebuilding tool results only from persisted `role=tool` rows, deduping on `tool_call_id` during rebuild and live SSE ingestion, and rendering deduped tool results.
- Added focused backend tests in `backend/server/chat/tests.py` for execution-failure classification, geocode retry fallback, bounded all-failure loop behavior, permanent failure immediate stop, and `web_search` import-failure retryability flag.
- Related prior analysis note: [[assistant-add-flow-fixes]]
## Security Review Verdict: chat-tool-loop-fix (2026-03-10)
- **Verdict:** CHANGES-REQUESTED
- **Review Score:** 6 (2 WARNINGs × 3)
- **Lens:** Security
- **WARNING 1:** `_build_tool_execution_error_event` (views/__init__.py:260-272) forwards raw tool error text to the client SSE stream. The error string from `execute_tool` (agent_tools.py:643) uses `str(exc)`, which can contain internal details for uncaught exceptions. This is also persisted as assistant message content.
- **WARNING 2:** Persisted error text in assistant messages (L769) creates a durable exposure vector — raw error details survive in DB and are re-served on reload.
- **No CRITICALs.** Auth/permission checks, injection safety, and CSRF handling remain intact.
- **Fix required:** Sanitize error text in `_build_tool_execution_error_event` or at source in `execute_tool` catch-all. Targeted re-review sufficient after fix.
- **Confirmed:** `_safe_error_payload()` convention not violated by existing code, but new path bypasses it.
- Related: [[assistant-add-flow-fixes]]
## Review verdict: chat-tool-loop-fix (2026-03-10)
**VERDICT: APPROVED**
**LENS: correctness**
**REVIEW_SCORE: 0**
### What was checked
1. **Execution failure classification** (`_is_execution_failure_tool_error`, `_is_retryable_execution_failure`): Verified the classification is correctly the complement of `_is_required_param_tool_error` — any `error`-bearing dict that doesn't match the required-param pattern is an execution failure. The `retryable` flag defaults to `True` unless explicitly `False`. No false-positive/false-negative classification risk found.
2. **Execution failure exclusion from LLM context**: Execution failure tool results are never added to `successful_tool_calls`/`successful_tool_messages`/`successful_tool_chat_entries` (skipped via `continue` at line 718). They are never persisted to DB and never appended to `current_messages`. The `_build_llm_messages` filter at lines 65-67 and 84-88 is a belt-and-suspenders defense for pre-existing DB data.
3. **All-failure-round bounding**: `all_failure_rounds` increments by 1 on each all-failure round and is hard-set to MAX on permanent failures. The outer `while` loop guard (`tool_iterations < MAX_TOOL_ITERATIONS`) remains satisfied during all-failure rounds (since `tool_iterations` isn't incremented), but `all_failure_rounds >= MAX_ALL_FAILURE_ROUNDS` (3) exits within bounded iterations. No infinite loop possible.
4. **Permanent failure fast-path**: `encountered_permanent_failure = True``all_failure_rounds = MAX_ALL_FAILURE_ROUNDS` → immediate stop. Confirmed `web_search` ImportError returns `retryable: False`.
5. **Geocode failure retry**: `_is_search_places_location_retry_candidate_error` now matches both "location is required" and "Could not geocode location: ..." patterns. Retry uses trip context destination or user content. Retry success check (lines 631-635) now requires the retry to be free of BOTH required-param AND execution-failure errors — correct tightening vs main branch.
6. **Frontend tool-result dedup**: Three layers of defense confirmed:
- `rebuildConversationMessages` now sets `tool_results: undefined` (line 374), eliminating RC1 (double-accumulation from server-side pre-populated `tool_results`).
- `appendToolResultDedup` deduplicates by `tool_call_id` during both rebuild (line 402) and SSE streaming (line 514).
- `uniqueToolResultsByCallId` at render time (line 939) provides a final safety net.
- Edge case: tool results without `tool_call_id` bypass dedup — acceptable since real tool calls always have IDs.
7. **Test coverage**: Tests cover execution failure classification, geocode retry, bounded all-failure loop (verifying 3 LLM calls + 3 tool calls), permanent failure immediate stop (verifying 1 LLM call + 1 tool call), and `web_search` ImportError retryability flag. Mock setup is correct — `side_effect = failing_stream` (callable) creates fresh async generators per call.
### No findings
- No off-by-one in loop bounds.
- No async/await misuse — `sync_to_async` wraps synchronous `execute_tool` correctly.
- No resource leaks — generators terminate cleanly via `return` or `break`.
- No race conditions — the event stream is single-threaded.
- No contract violations — SSE event format is preserved for both success and error paths.
- No mutation/shared-state risks — `first_execution_failure` and `encountered_permanent_failure` are scoped per tool-call batch.
## Security fix outcome: execute_tool error sanitization
- Updated `backend/server/chat/agent_tools.py` so `execute_tool()` catch-all no longer returns raw `str(exc)`.
- New catch-all payload is now the safe generic message: `{"error": "Tool execution failed"}`.
- Added focused regression test in `backend/server/chat/tests.py` (`ExecuteToolErrorSanitizationTests`) to verify exception text is not exposed via `execute_tool()`.
## Security Re-Review Verdict: execute_tool sanitization fix (2026-03-10)
- **Verdict:** APPROVED
- **Review Score:** 0
- **Lens:** Security (targeted re-review)
- **Resolution:** Both prior WARNINGs (SSE raw error leak, DB persistence of raw errors) are RESOLVED.
- **Evidence:**
- `execute_tool()` catch-all at `agent_tools.py:643` now returns `{"error": "Tool execution failed"}` (hardcoded string, no `str(exc)`).
- `_build_tool_execution_error_event()` at `views/__init__.py:260-272` wraps the sanitized error in a user-safe sentence.
- SSE emission at `views/__init__.py:775` and DB persistence at `views/__init__.py:769` both receive only the wrapped safe text.
- Regression test `ExecuteToolErrorSanitizationTests` at `tests.py:49-65` validates sanitization by injecting `RuntimeError("sensitive backend detail")` and asserting the generic output.
- **No new issues introduced.** Single-line source-level fix with no new code paths or trust boundaries.
- Related: [[assistant-add-flow-fixes]]
## Outcome: restaurant recommendation tool loop fix (2026-03-10)
- [x] Workstream `fix/chat-tool-loop-fix` in `.worktrees/chat-tool-loop-fix`: chat tool execution failures no longer replay into LLM context as successful tool results; successful-tool iteration budget is separate from all-failure rounds; permanent failures stop immediately.
- [x] Frontend `AITravelChat.svelte` dedupes tool results on rebuild, SSE ingestion, and render.
- [x] Validation passed: backend targeted chat suites passed after syncing worktree files into `voyage-server-1:/code/chat/...`; frontend `bun run check` and `bun run build` passed with only pre-existing warnings.
- [x] Review passed after sanitizing `execute_tool()` catch-all errors to `"Tool execution failed"`.
Open risk accepted for pre-release:
- Mixed batches with both successful and failed tool calls can still drop the failed tool silently and rely on `MAX_TOOL_ITERATIONS` as the backstop; adversarial tester marked this low severity and acceptable for pre-release.
## Implementation outcome: backend context-location extraction + clarification suppression
- Updated `backend/server/chat/views/__init__.py` itinerary-stop fallback parsing so address-only stops derive a city hint from the last non-numeric comma-separated segment (e.g. `Little Turnstile 6, London``London`) before computing `trip_context_location`.
- Updated `search_places` context-retry block to track `attempted_location_retry`; when retries were attempted but still return required-param errors, result is converted to execution failure (`Could not search places at the provided itinerary locations`) to avoid location clarification prompts.
- Clarification branch now only triggers for missing-location required-param errors when no context retry was attempted.
- Added tests in `backend/server/chat/tests.py` for city extraction from fallback address and for retry-with-context failure path asserting no clarification SSE is emitted and execution-failure behavior is returned.
## Targeted Re-Review Verdict: city extraction + clarification suppression (2026-03-10)
- **Verdict:** APPROVED
- **Review Score:** 0
- **Lens:** Correctness (targeted re-review of Fix 1 + Fix 2 only)
- **Scope:** `views/__init__.py` lines 444-462 (city extraction from fallback address) and lines 611-671 + 694-700 (retry failure → execution failure mutation + clarification guard). Plus two new tests in `tests.py`.
### Fix 1: City extraction — edge cases verified
- Comma address: extracts last non-empty non-numeric segment (correct)
- No comma: falls back to raw `fallback_name` (correct)
- All-numeric segments: falls back to full `fallback_name` (safe, will fail geocoding gracefully)
- Empty parts after split: filtered by truthiness check (correct)
- Dedup via `stop_key`: uses extracted city, correctly collapses same-city addresses
### Fix 2: Retry failure mutation — flow verified
- `attempted_location_retry` is per-tool-call scoped (no cross-call bleed)
- Mutated result `"Could not search places at the provided itinerary locations"` correctly fails `_is_required_param_tool_error` (no match on exact strings or regex)
- Mutated result correctly passes `_is_execution_failure_tool_error` (dict with non-empty error, not required-param)
- Routes to execution-failure handler, not clarification branch
- Clarification guard at lines 694-700 is defense-in-depth (unreachable in mutation scenario, but protects against future refactoring)
### Test coverage verified
- `test_collection_context_retry_extracts_city_from_fallback_address`: correctly tests city extraction from `"Little Turnstile 6, London"``"London"` retry
- `test_context_retry_failure_does_not_emit_location_clarification`: correctly tests mutation routing to `tool_execution_error` SSE event, no clarification
- Both tests exercise the intended code paths with correct mock setups
### No findings — APPROVED with zero issues
- Related: [[assistant-add-flow-fixes]]