docs: update docs and AGENTS.md with chat tool loop fix patterns

2026-03-10 18:40:29 +00:00
parent 0ca73a417d
commit c4b8f291f2
53 changed files with 801 additions and 73 deletions

@@ -0,0 +1,59 @@
---
title: Chat Tool Error Handling Architecture
type: note
permalink: voyage/knowledge/chat-tool-error-handling-architecture
tags:
- chat
- tools
- error-handling
- architecture
- pattern
---
# Chat Tool Error Handling Architecture
## Overview
The chat agent's tool loop sorts tool-call errors into three categories, each with its own retry and surfacing behavior.
## Error Classification
### 1. Required-parameter validation errors
- [pattern] Detected by `_is_required_param_tool_error()`, a regex that matches `"... is required"` patterns in the tool result's `error` field
- [convention] Short-circuited immediately with a user-visible error — never replayed into LLM history
- [pattern] `search_places` missing `location` has a special path: `_is_search_places_location_retry_candidate_error()` triggers a deterministic context retry (trip destination → first itinerary stop → user clarification) before the error is surfaced
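
A minimal sketch of the required-parameter check described above. The regex and the function body are assumptions: the note only says the helper matches `"... is required"` in the `error` field, so the real pattern in `backend/server/chat/views/__init__.py` may differ.

```python
import re

# Hypothetical pattern; the note only states that "... is required" is matched.
_REQUIRED_PARAM_RE = re.compile(r"\bis required\b", re.IGNORECASE)


def is_required_param_tool_error(result):
    """Return True when a tool result dict carries a '... is required' error."""
    if not isinstance(result, dict):
        return False
    error = result.get("error")
    # Only string errors can match; other shapes are not validation errors here.
    return isinstance(error, str) and bool(_REQUIRED_PARAM_RE.search(error))
```

With this sketch, `{"error": "location is required"}` is classified as a validation error, while `{"error": "Tool execution failed"}` is not and would fall through to the execution-failure path.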
### 2. Execution failures (new in chat-tool-loop-fix)
- [pattern] Any `error`-bearing tool result dict that does NOT match the required-param pattern is classified as an execution failure by `_is_execution_failure_tool_error()`
- [convention] Execution failures are NEVER replayed into LLM context — they are excluded from `successful_tool_calls`, `successful_tool_messages`, and `successful_tool_chat_entries`
- [pattern] `tool_iterations` increments only after at least one successful tool call in a round
- [pattern] All-failure rounds (every tool in a round fails) increment `all_failure_rounds`, capped at `MAX_ALL_FAILURE_ROUNDS` (3)
- [pattern] Permanent failures (`retryable: false` in tool result, e.g. `web_search` ImportError) set `all_failure_rounds = MAX_ALL_FAILURE_ROUNDS` for immediate stop
- [convention] Execution failures emit a `tool_execution_error` SSE event with sanitized text via `_build_tool_execution_error_event()`
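
The counter rules above can be sketched as a single bookkeeping step per round. Only `MAX_ALL_FAILURE_ROUNDS`, `tool_iterations`, `all_failure_rounds`, and the `retryable` field come from the note. The function names and the decision to check `retryable` only in all-failure rounds are assumptions made for illustration; the note does not say how mixed rounds treat permanent failures or whether a success resets the counter.

```python
MAX_ALL_FAILURE_ROUNDS = 3  # value stated in the note


def is_failure(result):
    """Assumption: a result dict with a truthy 'error' field counts as failed."""
    return isinstance(result, dict) and bool(result.get("error"))


def account_for_round(results, tool_iterations, all_failure_rounds):
    """Update both counters after one round of tool calls.

    Returns (tool_iterations, all_failure_rounds, should_stop).
    """
    failures = [r for r in results if is_failure(r)]
    successes = [r for r in results if not is_failure(r)]

    if successes:
        # Only rounds with at least one successful call count as an iteration.
        tool_iterations += 1
    elif failures:
        if any(r.get("retryable") is False for r in failures):
            # Permanent failure: jump straight to the cap for an immediate stop.
            all_failure_rounds = MAX_ALL_FAILURE_ROUNDS
        else:
            all_failure_rounds += 1

    should_stop = all_failure_rounds >= MAX_ALL_FAILURE_ROUNDS
    return tool_iterations, all_failure_rounds, should_stop
```

Keeping the two counters separate means a run of failing rounds cannot consume the normal iteration budget, and a permanent failure ends the loop without waiting for two more identical errors.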
### 3. Geocoding failures in search_places
- [pattern] `Could not geocode location: ...` errors are detected by `_is_search_places_location_retry_candidate_error()` (same path as missing-location)
- [convention] Eligible for the existing context-retry fallback before being treated as a terminal failure
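
A rough sketch of the context-retry fallback, assuming the order given earlier in this note (trip destination, then first itinerary stop, then user clarification). Both function names, the argument shapes, and the exact missing-location message are hypothetical. Only the `Could not geocode location:` prefix is quoted from the note.

```python
def is_location_retry_candidate(error):
    """Assumed check: missing-location or geocoding errors qualify for retry."""
    if not isinstance(error, str):
        return False
    text = error.strip().lower()
    # "location is required" is a guessed wording for the missing-location case.
    return "location is required" in text or text.startswith(
        "could not geocode location"
    )


def pick_retry_location(trip_destination, itinerary_stops):
    """Return the first usable fallback location, or None to ask the user."""
    candidates = [trip_destination]
    if itinerary_stops:
        # Only the first itinerary stop is considered, per the note.
        candidates.append(itinerary_stops[0])
    for candidate in candidates:
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    return None
```

A `None` return stands in for the final step of the chain: no context is available, so the loop would ask the user to clarify the location instead of retrying.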
## Error Sanitization
- [convention] `_safe_error_payload()` maps LiteLLM exceptions to sanitized user-safe categories — never forwards raw `exc.message`
- [convention] `execute_tool()` catch-all returns `{"error": "Tool execution failed"}` (hardcoded) — never raw `str(exc)`
- [decision] `_build_tool_execution_error_event()` wraps sanitized tool error text in a user-safe sentence for SSE emission and DB persistence
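
The catch-all convention can be illustrated with a small wrapper. This is a sketch under assumptions, not the project's `execute_tool()`: the wrapper name and its signature are hypothetical, and only the hardcoded `{"error": "Tool execution failed"}` payload is taken from the note.

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_TOOL_ERROR = "Tool execution failed"  # hardcoded text quoted in the note


def execute_tool_safely(tool_fn, **kwargs):
    """Run a tool callable and never return raw exception text to the caller."""
    try:
        return tool_fn(**kwargs)
    except Exception:
        # The traceback goes to the server log only; the caller sees a fixed string.
        logger.exception("tool %s failed", getattr(tool_fn, "__name__", "tool"))
        return {"error": GENERIC_TOOL_ERROR}
```

Because the returned string is constant, nothing from `str(exc)` (file paths, API keys in URLs, stack details) can reach the SSE stream or the persisted chat history.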
## Frontend Tool-Result Deduplication
- [pattern] Three-layer dedup by `tool_call_id`:
1. `rebuildConversationMessages()` sets `tool_results: undefined` on all assistant messages, then re-derives exclusively from persisted `role=tool` sibling rows — discards any server-side pre-populated `tool_results`
2. `appendToolResultDedup()` deduplicates during both rebuild walk and live SSE ingestion
3. `uniqueToolResultsByCallId()` at render time provides a final safety net
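
The render-time safety net can be sketched as an order-preserving filter keyed on `tool_call_id`. The real helper lives in the Svelte frontend; this Python version only illustrates the idea. Keeping the first occurrence and passing through entries without an id are both assumptions.

```python
def unique_tool_results_by_call_id(results):
    """Keep one result per tool_call_id, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        call_id = result.get("tool_call_id")
        if call_id is not None:
            if call_id in seen:
                continue  # duplicate of an earlier result; drop it
            seen.add(call_id)
        # Assumption: results without an id cannot be deduplicated, so keep them.
        unique.append(result)
    return unique
```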
## Key Files
- Backend classification/loop: `backend/server/chat/views/__init__.py`
- Tool execution + sanitization: `backend/server/chat/agent_tools.py`
- Frontend dedup: `frontend/src/lib/components/AITravelChat.svelte`
- Tests: `backend/server/chat/tests.py` (32 total chat tests)
## Relations
- related_to [[assistant-add-flow-fixes]]