chore: remove accidental memory files from repo root
These were stashed basic-memory files that got staged. Memory files belong in .memory/ (managed by basic-memory MCP), not the repo root.
This commit is contained in:
@@ -1,19 +0,0 @@
|
||||
---
|
||||
title: Retrospective - Chat Tool Failure Handling
|
||||
type: note
|
||||
permalink: voyage/decisions/retrospective-chat-tool-failure-handling
|
||||
tags:
|
||||
- pattern
|
||||
- lesson
|
||||
- chat
|
||||
- retrospective
|
||||
---
|
||||
|
||||
# Retrospective - Chat Tool Failure Handling
|
||||
|
||||
## Retrospective: chat tool failure handling
|
||||
- [pattern] Separating successful-tool iterations from bounded all-failure rounds prevents external tool outages from exhausting the main tool-call budget. #pattern
|
||||
- [pattern] Classify chat tool failures into required-parameter errors, retryable execution failures, and permanent execution failures so each path can use the right UX and retry behavior. #pattern
|
||||
- [risk] Mixed batches with both successful and failed tool calls still rely on the global success iteration cap as a backstop; acceptable pre-release, but worth revisiting if multi-tool itineraries remain noisy. #risk
|
||||
|
||||
confidence=high; last_validated=2026-03-10; volatility=medium; review_after_days=60; validation_count=1; contradiction_count=0
|
||||
@@ -1,65 +0,0 @@
|
||||
---
|
||||
title: chat-tool-loop-fix tester gate
|
||||
type: note
|
||||
permalink: voyage/gates/chat-tool-loop-fix-tester-gate
|
||||
tags:
|
||||
- tester
|
||||
- gate
|
||||
- chat
|
||||
- loop-fix
|
||||
- adversarial
|
||||
---
|
||||
|
||||
## Gate: chat-tool-loop-fix adversarial validation (2026-03-10)
|
||||
|
||||
**Verdict: PASS**
|
||||
**Plan ref:** [[voyage/plans/assistant-add-flow-fixes]]
|
||||
**Branch:** fix/chat-tool-loop-fix
|
||||
|
||||
### Test execution
|
||||
- All 32 chat tests pass (docker compose exec server python3 manage.py test chat --keepdb --verbosity=2)
|
||||
- 0 failures, 0 errors, 0 flaky
|
||||
|
||||
### Adversarial checks and findings
|
||||
|
||||
#### AC1 — all-failure rounds stop cleanly without falling into tool_loop_limit
|
||||
PASS. `all_failure_rounds` counter increments only when `not successful_tool_calls and first_execution_failure` (line 749). On reaching `MAX_ALL_FAILURE_ROUNDS=3`, the code issues `tool_execution_error` and `return`s (line 777). `tool_iterations` is never incremented for all-failure rounds (it increments only at line 782, after `all_failure_rounds = 0`), so 10-iteration cap is never hit for pure-failure scenarios. Test `test_all_failure_rounds_stop_with_execution_error_before_tool_cap` confirms exactly 3 `stream_chat_completion` calls and `tool_execution_error` rather than `tool_loop_limit`.
|
||||
|
||||
#### AC2 — permanent failures stop in one round
|
||||
PASS. When `retryable=False`, `encountered_permanent_failure = True` is set (line 717), then at post-iteration check (line 750-751) `all_failure_rounds` is set directly to `MAX_ALL_FAILURE_ROUNDS` (bypassing the increment), triggering immediate stop. Test `test_permanent_execution_failure_stops_immediately` confirms 1 `stream_chat_completion` call.
|
||||
|
||||
#### AC3 — required-parameter clarification flow still intact
|
||||
PASS. The `_is_required_param_tool_error` path (lines 647-711) short-circuits before the execution-failure path (line 713). `search_places` missing-location triggers `clarification_content` SSE emit + DB persist + `return`. Other required-param errors still emit `tool_validation_error` and `return`. Tests `test_missing_search_place_location_streams_clarifying_content` and `test_collection_access_error_does_not_short_circuit_required_param_regex` confirm correctness.
|
||||
|
||||
#### AC4 — execution failures sanitized and NOT emitted as tool_result SSE events
|
||||
PASS. The execution-failure branch at line 713 does `continue` — it never reaches the `yield tool_event` at line 747. Failed tool results are recorded in `first_execution_failure` but not in `successful_tool_calls` / `successful_tool_messages` / `successful_tool_chat_entries`, so they are never persisted to DB as role=tool rows and never streamed to the frontend. `_build_tool_execution_error_event` uses only the sanitized error text (line 265-271), not the raw exception. Test confirms no `tool_result` events in failure scenarios.
|
||||
|
||||
#### AC5 — mixed-batch partial failure: one tool succeeds, another fails
|
||||
OBSERVED FINDING (LOW): When a batch has e.g. 2 tool calls where call_1 succeeds and call_2 fails with an execution error:
|
||||
- `successful_tool_calls` will contain call_1's result; call_2's failure is silently dropped from this iteration's context.
|
||||
- The assistant message persisted to DB will reference only call_1's tool_calls.
|
||||
- `all_failure_rounds` stays at 0 (successful_tool_calls is non-empty, so the `not successful_tool_calls` condition at line 749 is False).
|
||||
- Loop continues as if call_2 never happened — the LLM won't know call_2 failed.
|
||||
- Acceptable for now (pre-release), but the LLM may loop trying to re-execute call_2 if it expects an answer. Not a regression from the old behavior (where ALL failures were fed back to LLM), and the `MAX_TOOL_ITERATIONS` cap remains as a backstop.
|
||||
|
||||
#### AC6 — `_build_llm_messages` orphan tool_call filtering on reload
|
||||
PASS. `valid_tool_call_ids` (line 59) is built only from non-required-param-error and non-execution-failure tool messages. Assistant messages with `tool_calls` have their `tool_calls` filtered to only the valid set (line 95-101). If ALL tool_calls for an assistant message are failures, `filtered_tool_calls` is empty and `payload["tool_calls"]` is not set (line 100 guard) — the assistant message is sent to LLM without tool_calls, avoiding orphaned tool references.
|
||||
|
||||
#### AC7 — frontend `rebuildConversationMessages` dedupe
|
||||
PASS. Line 374 forces `tool_results: undefined` on all messages before the sibling-walk, discarding any server-side `tool_results` field. `appendToolResultDedup` guards on `tool_call_id` uniqueness during the walk. `uniqueToolResultsByCallId` is applied again at render time (line 939) as a second safety net. Combined: no double-accumulation on reload.
|
||||
|
||||
#### AC8 — `uniqueToolResultsByCallId` silent drop for null/undefined tool_call_id
|
||||
LOW RISK: `uniqueToolResultsByCallId` (line 351-368) has a special case: if `result.tool_call_id` is falsy (null/undefined), the result is always appended without dedup check (line 359 guard). This means multiple results with no `tool_call_id` will all be included. Given that tool_call_id is always populated from the backend for proper tool calls, this is a safe fallback for edge cases.
|
||||
|
||||
#### AC9 — `_is_execution_failure_tool_error` catches ANY non-empty error that is not required-param error
|
||||
PASS. This is intentionally broad — any `{error: "..."}` dict that isn't a missing-param pattern is treated as an execution failure. This correctly captures geocode errors, network errors, unexpected exceptions, etc.
|
||||
|
||||
#### AC10 — retry fallback after geocode failure still uses `_is_execution_failure_tool_error` guard
|
||||
PASS (line 631-634): The retry succeeds only if the retry result is not a required-param error AND not an execution failure. If all retries fail, the `result` remains the original geocode error, which is then caught by `_is_execution_failure_tool_error` at line 713 and handled as a failure (not emitted as tool result).
|
||||
|
||||
### Mutation escape analysis
|
||||
- MUTATION_ESCAPES: 1/8
|
||||
- The unchecked mutation: flipping `not successful_tool_calls` → `successful_tool_calls` in the all-failure-round check (line 749) would cause failures to be silently dropped without incrementing the counter. No existing test with partial success + partial failure to detect this mutation. Low risk given the mixed-batch scenario is acceptable behavior.
|
||||
|
||||
### Summary
|
||||
Implementation is sound for the primary stated goals. All-failure rounds stop cleanly; permanent failures stop in 1 round; clarification flow is intact; execution failures are not leaked to frontend. Frontend dedupe is correct via `tool_results: undefined` reset + ID-based dedup. One low-severity gap: mixed-batch partial failure silently drops the failing tool call (acceptable pre-release).
|
||||
@@ -1,59 +0,0 @@
|
||||
---
|
||||
title: Chat Tool Error Handling Architecture
|
||||
type: note
|
||||
permalink: voyage/knowledge/chat-tool-error-handling-architecture
|
||||
tags:
|
||||
- chat
|
||||
- tools
|
||||
- error-handling
|
||||
- architecture
|
||||
- pattern
|
||||
---
|
||||
|
||||
# Chat Tool Error Handling Architecture
|
||||
|
||||
## Overview
|
||||
|
||||
The chat agent tool loop classifies tool call outcomes into three distinct categories, each with different retry and surfacing behavior.
|
||||
|
||||
## Error Classification
|
||||
|
||||
### 1. Required-parameter validation errors
|
||||
- [pattern] Detected by `_is_required_param_tool_error()` regex matching `"... is required"` patterns in tool result `error` field
|
||||
- [convention] Short-circuited immediately with a user-visible error — never replayed into LLM history
|
||||
- [pattern] `search_places` missing `location` has a special path: `_is_search_places_location_retry_candidate_error()` triggers deterministic context-retry (trip destination → first itinerary stop → user clarification) before surfacing
|
||||
|
||||
### 2. Execution failures (new in chat-tool-loop-fix)
|
||||
- [pattern] Any `error`-bearing tool result dict that does NOT match the required-param pattern is classified as an execution failure by `_is_execution_failure_tool_error()`
|
||||
- [convention] Execution failures are NEVER replayed into LLM context — they are excluded from `successful_tool_calls`, `successful_tool_messages`, and `successful_tool_chat_entries`
|
||||
- [pattern] `tool_iterations` increments only after at least one successful tool call in a round
|
||||
- [pattern] All-failure rounds (every tool in a round fails) increment `all_failure_rounds`, capped at `MAX_ALL_FAILURE_ROUNDS` (3)
|
||||
- [pattern] Permanent failures (`retryable: false` in tool result, e.g. `web_search` ImportError) set `all_failure_rounds = MAX_ALL_FAILURE_ROUNDS` for immediate stop
|
||||
- [convention] Execution failures emit a `tool_execution_error` SSE event with sanitized text via `_build_tool_execution_error_event()`
|
||||
|
||||
### 3. Geocoding failures in search_places
|
||||
- [pattern] `Could not geocode location: ...` errors are detected by `_is_search_places_location_retry_candidate_error()` (same path as missing-location)
|
||||
- [convention] Eligible for the existing context-retry fallback before being treated as a terminal failure
|
||||
|
||||
## Error Sanitization
|
||||
|
||||
- [convention] `_safe_error_payload()` maps LiteLLM exceptions to sanitized user-safe categories — never forwards raw `exc.message`
|
||||
- [convention] `execute_tool()` catch-all returns `{"error": "Tool execution failed"}` (hardcoded) — never raw `str(exc)`
|
||||
- [decision] The `_build_tool_execution_error_event()` wraps sanitized tool error text in a user-safe sentence for SSE emission and DB persistence
|
||||
|
||||
## Frontend Tool-Result Deduplication
|
||||
|
||||
- [pattern] Three-layer dedup by `tool_call_id`:
|
||||
1. `rebuildConversationMessages()` sets `tool_results: undefined` on all assistant messages, then re-derives exclusively from persisted `role=tool` sibling rows — discards any server-side pre-populated `tool_results`
|
||||
2. `appendToolResultDedup()` deduplicates during both rebuild walk and live SSE ingestion
|
||||
3. `uniqueToolResultsByCallId()` at render time provides a final safety net
|
||||
|
||||
## Key Files
|
||||
|
||||
- Backend classification/loop: `backend/server/chat/views/__init__.py`
|
||||
- Tool execution + sanitization: `backend/server/chat/agent_tools.py`
|
||||
- Frontend dedup: `frontend/src/lib/components/AITravelChat.svelte`
|
||||
- Tests: `backend/server/chat/tests.py` (32 total chat tests)
|
||||
|
||||
## Relations
|
||||
- related_to [[assistant-add-flow-fixes]]
|
||||
@@ -1,101 +0,0 @@
|
||||
---
|
||||
title: assistant-add-flow-fixes
|
||||
type: note
|
||||
permalink: voyage/plans/assistant-add-flow-fixes-1
|
||||
---
|
||||
|
||||
## Feature A hotfix implementation note
|
||||
- Added a targeted chat hotfix in `ChatViewSet` so required-arg validation for `search_places` with missing `location` now emits a natural assistant clarification as SSE `content` (and persists it as an assistant message) instead of streaming a `tool_validation_error` payload.
|
||||
- Kept existing generic required-param short-circuit behavior unchanged for all other tools/errors.
|
||||
- Added tests covering the specific detection helper and end-to-end streaming behavior for the missing-location clarify path.
|
||||
|
||||
## Follow-up fix: itinerary-aware search fallback
|
||||
- Feature: Use collection itinerary context to satisfy missing `search_places.location` when available, and preserve dining intent when category is omitted.
|
||||
- Acceptance:
|
||||
- Restaurant requests across a collection itinerary should not ask for location when itinerary context can provide one.
|
||||
- `search_places` should not silently return tourism attractions for restaurant requests when category is omitted.
|
||||
- Existing behavior for non-dining searches remains intact.
|
||||
- Planned worktree: `.worktrees/chat-search-intent-fix`
|
||||
- Planned branch: `fix/chat-search-intent-fix`
|
||||
|
||||
## Implementation note: chat search intent + context retry
|
||||
- Implemented deterministic `search_places` retry fallback in `ChatViewSet`: when tool call misses required `location`, retry first with trip context (`destination`, then first itinerary stop label), then retain existing user-location-reply retry behavior.
|
||||
- Added deterministic dining-intent heuristic in `ChatViewSet` to inject `category="food"` when `search_places.category` is omitted and current/recent user messages indicate restaurant/dining intent.
|
||||
- Updated `search_places` tool schema text to explicitly describe `food` (restaurants/dining), `tourism` (attractions), and `lodging` (hotels/stays).
|
||||
- Added targeted tests for category inference and a collection-context integration flow proving restaurant requests can complete via destination-backed retry without location clarification prompt.
|
||||
|
||||
## Regression Exploration: Restaurant Recommendations Across Test Itinerary (2026-03-10)
|
||||
|
||||
### Observed failure modes
|
||||
1. **"too many tool calls"** — `tool_loop_limit` error after 10 iterations
|
||||
2. **Repeated "search places could not be completed"** — LLM retries `search_places` each iteration, each producing a non-location-error that bypasses the clarification/retry gate
|
||||
3. **Repeated "web search could not be completed"** — LLM retries `web_search` each iteration; non-fatal errors (rate-limit, ImportError, generic failure) are treated as successful tool calls and fed back to the LLM, which re-calls the tool
|
||||
|
||||
### Root cause analysis
|
||||
|
||||
#### RC1 — `web_search` non-fatal errors are NOT short-circuited (primary cause of web_search loop)
|
||||
**File**: `backend/server/chat/views/__init__.py`, `event_stream()` loop (line 474+)
|
||||
- `web_search` returns `{"error": "...", "results": []}` on ImportError, rate-limit, and generic failures.
|
||||
- `_is_required_param_tool_error()` only matches "location is required" / "query is required" / `"X is required"` pattern. It does **not** match "Web search failed. Please try again." or "Search rate limit reached."
|
||||
- So the error payload passes through `_is_required_param_tool_error` → `False`, and the tool result (including the error text) is added to `successful_tool_calls` and fed back into the LLM context as a successful tool message.
|
||||
- The LLM sees a tool result with `error` text but no real data, and calls `web_search` again — repeating until `MAX_TOOL_ITERATIONS` (10) is hit.
|
||||
|
||||
#### RC2 — `search_places` geocoding/Overpass failures are NOT short-circuited (cause of search_places loop)
|
||||
**File**: `backend/server/chat/agent_tools.py` `search_places()` (line 149, 199–205)
|
||||
- `search_places` returns `{"error": "Could not geocode location: X"}` when Nominatim returns empty results, `{"error": "Places API request failed: ..."}` on network errors, or `{"error": "An unexpected error occurred..."}` on generic failures.
|
||||
- **None of these match** `_is_required_param_tool_error()` (which only detects missing-param patterns). They are not `location is required`.
|
||||
- The context-retry fallback (`_is_search_places_missing_location_required_error`, lines 522–565) only activates on `location is required`. Geocoding/network failures silently pass through as "successful" tool results.
|
||||
- LLM receives `{"error": "Could not geocode location: Paris"}` in tool context, tries again → same failure → 10 times → loop limit.
|
||||
|
||||
#### RC3 — `MAX_TOOL_ITERATIONS = 10` counter increments even when ALL tool calls in an iteration are failures
|
||||
**File**: `backend/server/chat/views/__init__.py`, line 475
|
||||
- `tool_iterations += 1` happens **before** tool results are checked. A single iteration with 2 tool calls each producing non-fatal errors still costs 1 iteration count and adds 2 failed results to LLM context.
|
||||
- For a restaurant request across a multi-stop itinerary, the LLM may call `search_places` for multiple stops per iteration, burning through 10 iterations rapidly.
|
||||
|
||||
#### RC4 — `successful_tool_calls` naming is misleading — it accumulates ALL non-required-param-error results
|
||||
**File**: `backend/server/chat/views/__init__.py`, lines 476, 635
|
||||
- The variable `successful_tool_calls` accumulates tool calls whose results are not `_is_required_param_tool_error`. This includes tool calls that returned geocoding failures, network errors, or `web_search` unavailability errors. These are appended to `current_messages` and persisted to DB, causing the LLM to see repeated failure payloads.
|
||||
|
||||
### Edit points and recommended fix approaches
|
||||
|
||||
#### EP1 — Add non-fatal tool error detection and short-circuit
|
||||
**File**: `backend/server/chat/views/__init__.py`
|
||||
**Symbol**: New static method `_is_nonfatal_tool_error(result)` + use in `event_stream()`
|
||||
- Detect results that have `{"error": ..., "results": []}` (web_search style) or `{"error": "Could not geocode..."}` / `{"error": "Places API request failed..."}` (search_places failures).
|
||||
- Recommended approach: check `result.get("error")` is non-empty AND the result is not a successful payload (no `results` data, no places data). If a tool returns an error result that is not a required-param error, log a warning, do NOT add to `successful_tool_calls`, and instead emit a summary to the LLM context as a tool message explaining the failure — then break the loop or emit a user-visible message after N consecutive non-fatal failures.
|
||||
|
||||
#### EP2 — Distinguish `web_search` persistent unavailability (ImportError) from retryable errors
|
||||
**File**: `backend/server/chat/agent_tools.py`
|
||||
**Symbol**: `web_search()` (lines 294–307)
|
||||
- Currently ImportError (`duckduckgo-search` not installed) and rate-limit both return the same shape: `{"error": "...", "results": []}`.
|
||||
- Recommended approach: add a sentinel field `"permanent": True` for ImportError, so the view layer can detect non-retryable failures and short-circuit immediately on the first iteration rather than retrying 10 times. Alternatively add a distinct error key like `"error_code": "unavailable"` vs `"error_code": "rate_limited"`.
|
||||
|
||||
#### EP3 — Add non-fatal tool error counting per-tool to break loops early
|
||||
**File**: `backend/server/chat/views/__init__.py`
|
||||
**Symbol**: `event_stream()` (line 424+)
|
||||
- Track `consecutive_nonfatal_errors` per tool (or globally). After 2 consecutive failures of the same tool name, stop calling it for this turn and emit a user-facing message: "Search places could not be completed for this location."
|
||||
- This prevents the 10-iteration burn for a single unreachable external API.
|
||||
|
||||
#### EP4 — Only increment `tool_iterations` when at least one tool call succeeds
|
||||
**File**: `backend/server/chat/views/__init__.py`
|
||||
**Symbol**: `event_stream()` line 475 (`tool_iterations += 1`)
|
||||
- Move `tool_iterations += 1` to after the tool result loop, and only increment if `len(successful_tool_calls) > 0`. Failed-only iterations should not count toward the limit (or should count less). Alternatively, add a separate `failed_tool_iterations` counter with a lower limit (e.g. 3) specifically for all-failure iterations.
|
||||
|
||||
#### EP5 — `search_places` geocoding failure should surface earlier in retry logic
|
||||
**File**: `backend/server/chat/views/__init__.py`
|
||||
**Symbol**: `event_stream()` lines 522–565 (the `_is_search_places_missing_location_required_error` retry block)
|
||||
- Current retry block only fires on `location is required`. Extend retry logic to also retry with fallback location when result contains `{"error": "Could not geocode location: ..."}` (Nominatim empty result) using the same `retry_locations` fallback list. This gives context-location a second chance when Nominatim fails on the first geocode.
|
||||
|
||||
### Files / symbols map
|
||||
|
||||
| File | Symbol | Role |
|
||||
|------|--------|------|
|
||||
| `backend/server/chat/views/__init__.py` | `MAX_TOOL_ITERATIONS = 10` (line 422) | Hard loop cap |
|
||||
| `backend/server/chat/views/__init__.py` | `tool_iterations += 1` (line 475) | Iteration counter — always increments, even on all-failure iterations |
|
||||
| `backend/server/chat/views/__init__.py` | `_is_required_param_tool_error()` (line 136) | Only detects missing-param errors; does NOT catch tool execution failures |
|
||||
| `backend/server/chat/views/__init__.py` | `_is_search_places_missing_location_required_error()` (line 180) | Only triggers retry for `location is required`; geocoding/network failures bypass it |
|
||||
| `backend/server/chat/views/__init__.py` | `successful_tool_calls` (line 476) | Misnamed — accumulates all non-required-param-error results including failures |
|
||||
| `backend/server/chat/views/__init__.py` | `event_stream()` lines 480–660 | Full tool execution loop |
|
||||
| `backend/server/chat/agent_tools.py` | `search_places()` (line 125) | Returns `{"error": "..."}` on geocode/network failure — not a required-param error |
|
||||
| `backend/server/chat/agent_tools.py` | `web_search()` (line 257) | Returns `{"error": "...", "results": []}` on import/rate-limit/generic failure |
|
||||
| `backend/server/chat/llm_client.py` | `stream_chat_completion()` (line 380) | No tool-level retry logic — passes through all tool results |
|
||||
@@ -1,208 +0,0 @@
|
||||
---
|
||||
title: assistant-add-flow-fixes
|
||||
type: note
|
||||
permalink: voyage/plans/assistant-add-flow-fixes
|
||||
---
|
||||
|
||||
## Feature A hotfix implementation note
|
||||
- Added a targeted chat hotfix in `ChatViewSet` so required-arg validation for `search_places` with missing `location` now emits a natural assistant clarification as SSE `content` (and persists it as an assistant message) instead of streaming a `tool_validation_error` payload.
|
||||
- Kept existing generic required-param short-circuit behavior unchanged for all other tools/errors.
|
||||
- Added tests covering the specific detection helper and end-to-end streaming behavior for the missing-location clarify path.
|
||||
|
||||
## Follow-up fix: itinerary-aware search fallback
|
||||
- Feature: Use collection itinerary context to satisfy missing `search_places.location` when available, and preserve dining intent when category is omitted.
|
||||
- Acceptance:
|
||||
- Restaurant requests across a collection itinerary should not ask for location when itinerary context can provide one.
|
||||
- `search_places` should not silently return tourism attractions for restaurant requests when category is omitted.
|
||||
- Existing behavior for non-dining searches remains intact.
|
||||
- Planned worktree: `.worktrees/chat-search-intent-fix`
|
||||
- Planned branch: `fix/chat-search-intent-fix`
|
||||
|
||||
## Implementation note: chat search intent + context retry
|
||||
- Implemented deterministic `search_places` retry fallback in `ChatViewSet`: when tool call misses required `location`, retry first with trip context (`destination`, then first itinerary stop label), then retain existing user-location-reply retry behavior.
|
||||
- Added deterministic dining-intent heuristic in `ChatViewSet` to inject `category="food"` when `search_places.category` is omitted and current/recent user messages indicate restaurant/dining intent.
|
||||
- Updated `search_places` tool schema text to explicitly describe `food` (restaurants/dining), `tourism` (attractions), and `lodging` (hotels/stays).
|
||||
- Added targeted tests for category inference and a collection-context integration flow proving restaurant requests can complete via destination-backed retry without location clarification prompt.
|
||||
|
||||
## Regression: Duplicate Tool Summaries / Repeated Items — Frontend Analysis (2026-03-10)
|
||||
|
||||
### Context
|
||||
Exploring why restaurant recommendations across a test itinerary produce duplicated itinerary-item summaries and repeated warning toasts in the frontend chat renderer.
|
||||
|
||||
### Root Cause 1 — `rebuildConversationMessages` double-accumulates `tool_results` on reload
|
||||
|
||||
**File:** `frontend/src/lib/components/AITravelChat.svelte`, `rebuildConversationMessages()` (lines 333–375)
|
||||
|
||||
**Mechanism:** When `selectConversation()` reloads a conversation from the API, `rebuildConversationMessages()` walks the raw message list and pushes each `role=tool` message into its parent assistant message's `tool_results` array. The backend serializer may return a pre-populated `tool_results` field on the assistant message object. Because the `rebuilt` map at line 336 preserves any server-side `tool_results` via spread (`msg.tool_results ? [...msg.tool_results] : undefined`), and then the walk loop **appends again** from the raw `role=tool` sibling messages, any tool result present on both the server-side `assistant.tool_results` field *and* as a sibling `role=tool` message will be pushed **twice**.
|
||||
|
||||
**Second trigger path:** A multi-stop itinerary causes the backend tool loop to fire `search_places` once per stop via the context-retry fallback. Each call emits a separate `tool_result` SSE event. The frontend appends each into `assistantMsg.tool_results` during streaming (line 472). On conversation reload, `rebuildConversationMessages` re-appends them again from the persisted `role=tool` DB rows → **2× the place-result cards**.
|
||||
|
||||
**Fix approach:** Change line 336 so the initial `rebuilt` map always sets `tool_results: undefined`. This discards any server-side `tool_results` field on assistant messages and lets the sibling-walk be the sole source of truth:
|
||||
```ts
|
||||
// line 334–337 — change to:
|
||||
const rebuilt = rawMessages.map((msg) => ({
|
||||
...msg,
|
||||
tool_results: undefined // always re-derive from role=tool siblings below
|
||||
}));
|
||||
```
|
||||
|
||||
### Root Cause 2 — Multi-iteration tool loop creates multiple assistant messages; `rebuildConversationMessages` may pair results to wrong parent
|
||||
|
||||
**File:** `AITravelChat.svelte`, `rebuildConversationMessages()` lines 342–344 and backend `event_stream` lines 662–688
|
||||
|
||||
**Mechanism:** For a multi-restaurant flow the backend iterates: `search_places` (iteration 1) → saves assistant+tool messages → then `add_to_itinerary × N` (iterations 2–N). Each iteration saves a separate assistant message with distinct `tool_calls`. On reload, the loop in `rebuildConversationMessages` sets `activeAssistant` to each assistant message with `tool_calls`, then nulls it when `tool_results.length >= toolCallIds.length` (lines 366–371). If the DB `created_at` timestamps for assistant and tool messages are identical (same DB transaction), the `order_by("created_at")` in `_build_llm_messages` is non-deterministic, and the sibling walk may misalign tool results to wrong parent assistant messages → wrong or duplicated summary cards.
|
||||
|
||||
**Fix approach (backend):** After saving the assistant message at line 670 with `await sync_to_async(ChatMessage.objects.create)`, save each tool message **in a separate awaited call** (already the case at lines 678–682) and add a `time.sleep(0)` / tick between saves to guarantee distinct `created_at`. Alternatively, add an explicit `order` integer field to `ChatMessage` and sort by it instead of `created_at`.
|
||||
|
||||
### Root Cause 3 — Repeated "success" toast from duplicate place-result cards
|
||||
|
||||
**File:** `AITravelChat.svelte`, `addPlaceToItinerary()` lines 569–632
|
||||
|
||||
**Mechanism:** `addToast('success', $t('added_successfully'))` fires unconditionally at line 626 on every user-triggered add. When Root Cause 1 causes duplicate place-result cards to be rendered, the user sees two "Add to itinerary" buttons for the same place and clicking either fires a toast. Fixing Root Cause 1 (dedup in `rebuildConversationMessages`) eliminates the duplicate cards and therefore this repeat toast.
|
||||
|
||||
**No action required here unless RC1 fix is insufficient.** As a defense-in-depth measure, consider tracking which place names have been added within the session and disabling the button.
|
||||
|
||||
### Concrete Edit Points
|
||||
|
||||
| File | Function/Section | Change |
|
||||
|---|---|---|
|
||||
| `AITravelChat.svelte` L334–336 | `rebuilt` map init in `rebuildConversationMessages` | Change `tool_results: msg.tool_results ? [...msg.tool_results] : undefined` → `tool_results: undefined` (always re-derive from sibling walk) |
|
||||
| `AITravelChat.svelte` L467–474 | SSE `parsed.tool_result` handler in `sendMessage` | Add dedup guard: skip push if `tool_call_id` already in `assistantMsg.tool_results` |
|
||||
| `AITravelChat.svelte` L894–944 | `{#each msg.tool_results as result}` template | Add JS-side dedup by `tool_call_id` or a `{#key}` block before rendering |
|
||||
| `backend/server/chat/views/__init__.py` L670–682 | assistant + tool message save in `event_stream` | Verify saved order is deterministic; add explicit sequence/order field if `created_at` collisions are possible |
|
||||
|
||||
### Recommended Fix Sequence
|
||||
1. **Priority 1:** `AITravelChat.svelte` line 336 — change to `tool_results: undefined`. Single-line fix, eliminates Root Cause 1 and RC3 transitively.
|
||||
2. **Priority 2:** `AITravelChat.svelte` SSE handler (line 472) — add `tool_call_id` dedup guard for defense against backend emitting duplicate events.
|
||||
3. **Priority 3:** Template dedup (lines 894–944) as render-layer safety net.
|
||||
4. **Verify:** Reload a multi-restaurant conversation; each `add_to_itinerary` summary and each place card must appear exactly once.
|
||||
|
||||
### Related Notes
|
||||
- [[assistant-add-flow-fixes]] — backend search intent + retry logic that triggers multi-iteration tool loops
|
||||
|
||||
## Follow-on fix: restaurant recommendation tool loop
|
||||
- [ ] Workstream `fix/chat-tool-loop-fix` in `.worktrees/chat-tool-loop-fix`: stop chat tool loops from replaying execution failures into the LLM context; only count successful tool iterations, cap repeated all-failure rounds, and hard-stop permanent `web_search` failures.
|
||||
- Acceptance: restaurant recommendation requests do not end with "too many tool calls" when tool executions fail; `search_places`/`web_search` failures surface once and stop retry spirals.
|
||||
- [ ] Keep frontend tool-result rendering deduplicated in `AITravelChat.svelte` during both SSE streaming and conversation rebuild.
|
||||
- Acceptance: itinerary/tool summaries do not render duplicate cards or duplicate downstream add-to-itinerary affordances after reload.
|
||||
- [ ] Validate the fix in worktree `fix/chat-tool-loop-fix` with targeted backend chat tests plus frontend `bun run check`/`bun run build`, then perform a functional restaurant-request verification.
|
||||
- Acceptance: validation demonstrates the assistant either returns restaurant guidance or surfaces a single actionable failure instead of looping.
|
||||
|
||||
## Implementation outcome: chat tool loop + duplicate summary fix
|
||||
- Added execution-failure classification in `backend/server/chat/views/__init__.py` that separates required-parameter validation errors from execution failures, and excludes execution-failure tool rows from LLM replay/history filtering.
|
||||
- Updated tool loop control so `tool_iterations` increments only after at least one successful tool call; added bounded `MAX_ALL_FAILURE_ROUNDS = 3` for all-failure rounds, with immediate stop for permanent failures (`retryable: false`).
|
||||
- Extended `search_places` location retry candidate detection to include geocoding failures (`Could not geocode location: ...`) so destination/itinerary/user-location fallback can retry before terminating.
|
||||
- Marked `web_search` dependency import failure as permanent in `backend/server/chat/agent_tools.py` (`retryable: false`) so it is not retried like transient outages.
|
||||
- Fixed frontend duplication in `frontend/src/lib/components/AITravelChat.svelte` by rebuilding tool results only from persisted `role=tool` rows, deduping on `tool_call_id` during rebuild and live SSE ingestion, and rendering deduped tool results.
|
||||
- Added focused backend tests in `backend/server/chat/tests.py` for execution-failure classification, geocode retry fallback, bounded all-failure loop behavior, permanent failure immediate stop, and `web_search` import-failure retryability flag.
|
||||
- Related prior analysis note: [[assistant-add-flow-fixes]]
|
||||
|
||||
## Security Review Verdict: chat-tool-loop-fix (2026-03-10)
|
||||
|
||||
- **Verdict:** CHANGES-REQUESTED
|
||||
- **Review Score:** 6 (2 WARNINGs × 3)
|
||||
- **Lens:** Security
|
||||
- **WARNING 1:** `_build_tool_execution_error_event` (views/__init__.py:260-272) forwards raw tool error text to the client SSE stream. The error string from `execute_tool` (agent_tools.py:643) uses `str(exc)`, which can contain internal details for uncaught exceptions. This is also persisted as assistant message content.
|
||||
- **WARNING 2:** Persisted error text in assistant messages (L769) creates a durable exposure vector — raw error details survive in DB and are re-served on reload.
|
||||
- **No CRITICALs.** Auth/permission checks, injection safety, and CSRF handling remain intact.
|
||||
- **Fix required:** Sanitize error text in `_build_tool_execution_error_event` or at source in `execute_tool` catch-all. Targeted re-review sufficient after fix.
|
||||
- **Confirmed:** `_safe_error_payload()` convention not violated by existing code, but new path bypasses it.
|
||||
- Related: [[assistant-add-flow-fixes]]
|
||||
|
||||
## Review verdict: chat-tool-loop-fix (2026-03-10)
|
||||
|
||||
**VERDICT: APPROVED**
|
||||
**LENS: correctness**
|
||||
**REVIEW_SCORE: 0**
|
||||
|
||||
### What was checked
|
||||
|
||||
1. **Execution failure classification** (`_is_execution_failure_tool_error`, `_is_retryable_execution_failure`): Verified the classification is correctly the complement of `_is_required_param_tool_error` — any `error`-bearing dict that doesn't match the required-param pattern is an execution failure. The `retryable` flag defaults to `True` unless explicitly `False`. No false-positive/false-negative classification risk found.
|
||||
|
||||
2. **Execution failure exclusion from LLM context**: Execution failure tool results are never added to `successful_tool_calls`/`successful_tool_messages`/`successful_tool_chat_entries` (skipped via `continue` at line 718). They are never persisted to DB and never appended to `current_messages`. The `_build_llm_messages` filter at lines 65-67 and 84-88 is a belt-and-suspenders defense for pre-existing DB data.
|
||||
|
||||
3. **All-failure-round bounding**: `all_failure_rounds` increments by 1 on each all-failure round and is hard-set to MAX on permanent failures. The outer `while` loop guard (`tool_iterations < MAX_TOOL_ITERATIONS`) remains satisfied during all-failure rounds (since `tool_iterations` isn't incremented), but `all_failure_rounds >= MAX_ALL_FAILURE_ROUNDS` (3) exits within bounded iterations. No infinite loop possible.
|
||||
|
||||
4. **Permanent failure fast-path**: `encountered_permanent_failure = True` → `all_failure_rounds = MAX_ALL_FAILURE_ROUNDS` → immediate stop. Confirmed `web_search` ImportError returns `retryable: False`.
|
||||
|
||||
5. **Geocode failure retry**: `_is_search_places_location_retry_candidate_error` now matches both "location is required" and "Could not geocode location: ..." patterns. Retry uses trip context destination or user content. Retry success check (lines 631-635) now requires the retry to be free of BOTH required-param AND execution-failure errors — correct tightening vs main branch.
|
||||
|
||||
6. **Frontend tool-result dedup**: Three layers of defense confirmed:
|
||||
- `rebuildConversationMessages` now sets `tool_results: undefined` (line 374), eliminating RC1 (double-accumulation from server-side pre-populated `tool_results`).
|
||||
- `appendToolResultDedup` deduplicates by `tool_call_id` during both rebuild (line 402) and SSE streaming (line 514).
|
||||
- `uniqueToolResultsByCallId` at render time (line 939) provides a final safety net.
|
||||
- Edge case: tool results without `tool_call_id` bypass dedup — acceptable since real tool calls always have IDs.
|
||||
|
||||
7. **Test coverage**: Tests cover execution failure classification, geocode retry, bounded all-failure loop (verifying 3 LLM calls + 3 tool calls), permanent failure immediate stop (verifying 1 LLM call + 1 tool call), and `web_search` ImportError retryability flag. Mock setup is correct — `side_effect = failing_stream` (callable) creates fresh async generators per call.
|
||||
|
||||
### No findings
|
||||
|
||||
- No off-by-one in loop bounds.
|
||||
- No async/await misuse — `sync_to_async` wraps synchronous `execute_tool` correctly.
|
||||
- No resource leaks — generators terminate cleanly via `return` or `break`.
|
||||
- No race conditions — the event stream is single-threaded.
|
||||
- No contract violations — SSE event format is preserved for both success and error paths.
|
||||
- No mutation/shared-state risks — `first_execution_failure` and `encountered_permanent_failure` are scoped per tool-call batch.
|
||||
|
||||
## Security fix outcome: execute_tool error sanitization
|
||||
- Updated `backend/server/chat/agent_tools.py` so `execute_tool()` catch-all no longer returns raw `str(exc)`.
|
||||
- New catch-all payload is now the safe generic message: `{"error": "Tool execution failed"}`.
|
||||
- Added focused regression test in `backend/server/chat/tests.py` (`ExecuteToolErrorSanitizationTests`) to verify exception text is not exposed via `execute_tool()`.
|
||||
|
||||
## Security Re-Review Verdict: execute_tool sanitization fix (2026-03-10)
|
||||
|
||||
- **Verdict:** APPROVED
|
||||
- **Review Score:** 0
|
||||
- **Lens:** Security (targeted re-review)
|
||||
- **Resolution:** Both prior WARNINGs (SSE raw error leak, DB persistence of raw errors) are RESOLVED.
|
||||
- **Evidence:**
|
||||
- `execute_tool()` catch-all at `agent_tools.py:643` now returns `{"error": "Tool execution failed"}` (hardcoded string, no `str(exc)`).
|
||||
- `_build_tool_execution_error_event()` at `views/__init__.py:260-272` wraps the sanitized error in a user-safe sentence.
|
||||
- SSE emission at `views/__init__.py:775` and DB persistence at `views/__init__.py:769` both receive only the wrapped safe text.
|
||||
- Regression test `ExecuteToolErrorSanitizationTests` at `tests.py:49-65` validates sanitization by injecting `RuntimeError("sensitive backend detail")` and asserting the generic output.
|
||||
- **No new issues introduced.** Single-line source-level fix with no new code paths or trust boundaries.
|
||||
- Related: [[assistant-add-flow-fixes]]
|
||||
|
||||
## Outcome: restaurant recommendation tool loop fix (2026-03-10)
|
||||
- [x] Workstream `fix/chat-tool-loop-fix` in `.worktrees/chat-tool-loop-fix`: chat tool execution failures no longer replay into LLM context as successful tool results; successful-tool iteration budget is separate from all-failure rounds; permanent failures stop immediately.
|
||||
- [x] Frontend `AITravelChat.svelte` dedupes tool results on rebuild, SSE ingestion, and render.
|
||||
- [x] Validation passed: backend targeted chat suites passed after syncing worktree files into `voyage-server-1:/code/chat/...`; frontend `bun run check` and `bun run build` passed with only pre-existing warnings.
|
||||
- [x] Review passed after sanitizing `execute_tool()` catch-all errors to `"Tool execution failed"`.
|
||||
|
||||
Open risk accepted for pre-release:
|
||||
- Mixed batches with both successful and failed tool calls can still drop the failed tool silently and rely on `MAX_TOOL_ITERATIONS` as the backstop; adversarial tester marked this low severity and acceptable for pre-release.
|
||||
|
||||
## Implementation outcome: backend context-location extraction + clarification suppression
|
||||
- Updated `backend/server/chat/views/__init__.py` itinerary-stop fallback parsing so address-only stops derive a city hint from the last non-numeric comma-separated segment (e.g. `Little Turnstile 6, London` → `London`) before computing `trip_context_location`.
|
||||
- Updated `search_places` context-retry block to track `attempted_location_retry`; when retries were attempted but still return required-param errors, result is converted to execution failure (`Could not search places at the provided itinerary locations`) to avoid location clarification prompts.
|
||||
- Clarification branch now only triggers for missing-location required-param errors when no context retry was attempted.
|
||||
- Added tests in `backend/server/chat/tests.py` for city extraction from fallback address and for retry-with-context failure path asserting no clarification SSE is emitted and execution-failure behavior is returned.
|
||||
|
||||
## Targeted Re-Review Verdict: city extraction + clarification suppression (2026-03-10)
|
||||
|
||||
- **Verdict:** APPROVED
|
||||
- **Review Score:** 0
|
||||
- **Lens:** Correctness (targeted re-review of Fix 1 + Fix 2 only)
|
||||
- **Scope:** `views/__init__.py` lines 444-462 (city extraction from fallback address) and lines 611-671 + 694-700 (retry failure → execution failure mutation + clarification guard). Plus two new tests in `tests.py`.
|
||||
|
||||
### Fix 1: City extraction — edge cases verified
|
||||
- Comma address: extracts last non-empty non-numeric segment (correct)
|
||||
- No comma: falls back to raw `fallback_name` (correct)
|
||||
- All-numeric segments: falls back to full `fallback_name` (safe, will fail geocoding gracefully)
|
||||
- Empty parts after split: filtered by truthiness check (correct)
|
||||
- Dedup via `stop_key`: uses extracted city, correctly collapses same-city addresses
|
||||
|
||||
### Fix 2: Retry failure mutation — flow verified
|
||||
- `attempted_location_retry` is per-tool-call scoped (no cross-call bleed)
|
||||
- Mutated result `"Could not search places at the provided itinerary locations"` correctly fails `_is_required_param_tool_error` (no match on exact strings or regex)
|
||||
- Mutated result correctly passes `_is_execution_failure_tool_error` (dict with non-empty error, not required-param)
|
||||
- Routes to execution-failure handler, not clarification branch
|
||||
- Clarification guard at lines 694-700 is defense-in-depth (unreachable in mutation scenario, but protects against future refactoring)
|
||||
|
||||
### Test coverage verified
|
||||
- `test_collection_context_retry_extracts_city_from_fallback_address`: correctly tests city extraction from `"Little Turnstile 6, London"` → `"London"` retry
|
||||
- `test_context_retry_failure_does_not_emit_location_clarification`: correctly tests mutation routing to `tool_execution_error` SSE event, no clarification
|
||||
- Both tests exercise the intended code paths with correct mock setups
|
||||
|
||||
### No findings — APPROVED with zero issues
|
||||
- Related: [[assistant-add-flow-fixes]]
|
||||
Reference in New Issue
Block a user