voyage/knowledge/Chat Tool Error Handling Architecture.md


title: Chat Tool Error Handling Architecture
type: note
permalink: voyage/knowledge/chat-tool-error-handling-architecture
tags: chat, tools, error-handling, architecture, pattern

Chat Tool Error Handling Architecture

Overview

The chat agent tool loop classifies tool-call errors into three categories, each with its own retry and surfacing behavior.

Error Classification

1. Required-parameter validation errors

  • [pattern] Detected by _is_required_param_tool_error(), which regex-matches "... is required" patterns in the tool result's error field
  • [convention] Short-circuited immediately with a user-visible error — never replayed into LLM history
  • [pattern] A search_places call with a missing location takes a special path: _is_search_places_location_retry_candidate_error() triggers a deterministic context retry (trip destination → first itinerary stop → user clarification) before the error is surfaced
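
A minimal sketch of the required-parameter check described above. The regex and the function body are assumptions for illustration (the note only says the helper matches "... is required" in the error field); the real helper lives in backend/server/chat/views/__init__.py and may differ.

```python
import re

# Assumed pattern: the note only states that "... is required" is matched.
REQUIRED_PARAM_RE = re.compile(r"\bis required\b", re.IGNORECASE)


def is_required_param_tool_error(result):
    """Return True when a tool result dict carries a required-parameter error.

    Hypothetical stand-in for _is_required_param_tool_error().
    """
    if not isinstance(result, dict):
        return False
    error = result.get("error")
    return isinstance(error, str) and bool(REQUIRED_PARAM_RE.search(error))
```

Under these assumptions, {"error": "location is required"} is classified as a validation error, while {"error": "Tool execution failed"} is not and falls through to the execution-failure path.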

2. Execution failures (new in chat-tool-loop-fix)

  • [pattern] Any error-bearing tool result dict that does NOT match the required-param pattern is classified as an execution failure by _is_execution_failure_tool_error()
  • [convention] Execution failures are NEVER replayed into LLM context — they are excluded from successful_tool_calls, successful_tool_messages, and successful_tool_chat_entries
  • [pattern] tool_iterations increments only after at least one successful tool call in a round
  • [pattern] All-failure rounds (every tool call in a round fails) increment all_failure_rounds, which is capped at MAX_ALL_FAILURE_ROUNDS (3)
  • [pattern] Permanent failures (retryable: false in the tool result, e.g. web_search ImportError) set all_failure_rounds = MAX_ALL_FAILURE_ROUNDS to stop the loop immediately
  • [convention] Execution failures emit a tool_execution_error SSE event with sanitized text via _build_tool_execution_error_event()
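
The counter rules above can be sketched as a single per-round accounting step. This is an illustrative simplification, not the loop in backend/server/chat/views/__init__.py: account_for_round is a hypothetical helper, any dict with an "error" key is treated as an execution failure, and the sketch assumes that reaching the cap is what stops the loop and that a permanent failure stops it even in a mixed round.

```python
MAX_ALL_FAILURE_ROUNDS = 3  # value stated in the note


def account_for_round(results, tool_iterations, all_failure_rounds):
    """Update loop counters after one round of tool calls.

    Returns (tool_iterations, all_failure_rounds, should_stop).
    """
    failures = [r for r in results if "error" in r]
    successes = [r for r in results if "error" not in r]

    if successes:
        # Only rounds with at least one success count as an iteration.
        tool_iterations += 1
    elif failures:
        # Every tool call in this round failed.
        all_failure_rounds += 1

    # Assumption: a permanent failure jumps straight to the cap.
    if any(r.get("retryable") is False for r in failures):
        all_failure_rounds = MAX_ALL_FAILURE_ROUNDS

    all_failure_rounds = min(all_failure_rounds, MAX_ALL_FAILURE_ROUNDS)
    should_stop = all_failure_rounds >= MAX_ALL_FAILURE_ROUNDS
    return tool_iterations, all_failure_rounds, should_stop
```

In this sketch, failed results are also the ones a caller would leave out of successful_tool_calls and the related lists, so they never reach the LLM context.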

3. Geocoding failures in search_places

  • [pattern] "Could not geocode location: ..." errors are detected by _is_search_places_location_retry_candidate_error() (the same path as a missing location)
  • [convention] Eligible for the existing context-retry fallback before being treated as a terminal failure
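
A rough sketch of the retry-candidate check and the fallback order (trip destination, then first itinerary stop, then user clarification). Both functions are hypothetical stand-ins; the exact "location is required" wording and the argument shapes are assumptions, not taken from the codebase.

```python
def is_location_retry_candidate(error):
    """Return True for errors eligible for the search_places context retry."""
    text = error.strip().lower()
    return (
        "location is required" in text  # assumed missing-location wording
        or text.startswith("could not geocode location:")
    )


def resolve_retry_location(trip_destination, itinerary_stops):
    """Pick a location for the retry, or None to ask the user to clarify."""
    if trip_destination:
        return trip_destination
    if itinerary_stops:
        return itinerary_stops[0]
    return None
```

Because the order is fixed, the same trip context always produces the same retry location; a None result corresponds to the final step, asking the user.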

Error Sanitization

  • [convention] _safe_error_payload() maps LiteLLM exceptions to sanitized user-safe categories — never forwards raw exc.message
  • [convention] execute_tool() catch-all returns {"error": "Tool execution failed"} (hardcoded) — never raw str(exc)
  • [decision] _build_tool_execution_error_event() wraps sanitized tool error text in a user-safe sentence for SSE emission and DB persistence
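
The sanitization conventions above might look roughly like this. execute_tool_safely and build_tool_execution_error_event are hypothetical simplifications of execute_tool() and _build_tool_execution_error_event(); the event's dict shape and sentence wording are assumptions, and only the hardcoded "Tool execution failed" string and the tool_execution_error event name come from the note.

```python
import logging

logger = logging.getLogger(__name__)


def execute_tool_safely(tool, **kwargs):
    """Run a tool callable without leaking raw exception text to the caller."""
    try:
        return tool(**kwargs)
    except Exception:
        # Full details stay in the server log only.
        logger.exception("tool execution failed")
        return {"error": "Tool execution failed"}


def build_tool_execution_error_event(tool_name, sanitized_error):
    """Wrap already-sanitized error text in a user-safe sentence."""
    return {
        "type": "tool_execution_error",  # event name from the note
        "content": f"The {tool_name} tool could not complete: {sanitized_error}",
    }
```

The point of the hardcoded string is that nothing derived from str(exc), such as file paths or credentials embedded in a message, can reach the SSE stream or the database.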

Frontend Tool-Result Deduplication

  • [pattern] Three-layer dedup by tool_call_id:
    1. rebuildConversationMessages() sets tool_results: undefined on all assistant messages, then re-derives exclusively from persisted role=tool sibling rows — discards any server-side pre-populated tool_results
    2. appendToolResultDedup() deduplicates during both the rebuild walk and live SSE ingestion
    3. uniqueToolResultsByCallId() at render time provides a final safety net
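
The real dedup code is frontend Svelte/TypeScript; the core idea of the render-time safety net is shown here in Python only to stay consistent with the backend examples. This is a sketch under assumptions: it keeps the first result per tool_call_id and passes through results without an id, neither of which the note confirms.

```python
def unique_tool_results_by_call_id(tool_results):
    """Drop later results that repeat an already-seen tool_call_id."""
    seen = set()
    unique = []
    for result in tool_results:
        call_id = result.get("tool_call_id")
        if call_id is not None:
            if call_id in seen:
                continue  # duplicate of an earlier result
            seen.add(call_id)
        unique.append(result)
    return unique
```

Applying the same key at rebuild, ingestion, and render time means a duplicate that slips past one layer is still caught by the next.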

Key Files

  • Backend classification/loop: backend/server/chat/views/__init__.py
  • Tool execution + sanitization: backend/server/chat/agent_tools.py
  • Frontend dedup: frontend/src/lib/components/AITravelChat.svelte
  • Tests: backend/server/chat/tests.py (32 total chat tests)

Relations