| title | type | permalink | tags |
|---|---|---|---|
| Chat Tool Error Handling Architecture | note | voyage/knowledge/chat-tool-error-handling-architecture | |
Chat Tool Error Handling Architecture
Overview
The chat agent tool loop classifies tool call outcomes into three distinct categories, each with different retry and surfacing behavior.
Error Classification
1. Required-parameter validation errors
- [pattern] Detected by `_is_required_param_tool_error()`, which regex-matches `"... is required"` patterns in the tool result's `error` field
- [convention] Short-circuited immediately with a user-visible error; never replayed into LLM history
- [pattern] `search_places` missing `location` has a special path: `_is_search_places_location_retry_candidate_error()` triggers a deterministic context-retry (trip destination → first itinerary stop → user clarification) before surfacing
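
The required-parameter check described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the regex and the function name without the leading underscore are assumptions, and the real pattern in `_is_required_param_tool_error()` may be stricter.

```python
import re

# Hypothetical pattern: the real regex in the codebase may differ.
_REQUIRED_PARAM_RE = re.compile(r"\bis required\b", re.IGNORECASE)


def is_required_param_tool_error(result):
    """Return True when a tool result's error text looks like '<param> is required'."""
    if not isinstance(result, dict):
        return False
    error = result.get("error")
    # Only string errors can match; anything else is not a validation error.
    return isinstance(error, str) and bool(_REQUIRED_PARAM_RE.search(error))
```

A result such as `{"error": "location is required"}` matches, while `{"error": "Tool execution failed"}` does not and would fall through to the other categories.
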
2. Execution failures (new in chat-tool-loop-fix)
- [pattern] Any `error`-bearing tool result dict that does NOT match the required-param pattern is classified as an execution failure by `_is_execution_failure_tool_error()`
- [convention] Execution failures are NEVER replayed into LLM context; they are excluded from `successful_tool_calls`, `successful_tool_messages`, and `successful_tool_chat_entries`
- [pattern] `tool_iterations` increments only after at least one successful tool call in a round
- [pattern] All-failure rounds (every tool in a round fails) increment `all_failure_rounds`, capped at `MAX_ALL_FAILURE_ROUNDS` (3)
- [pattern] Permanent failures (`retryable: false` in the tool result, e.g. `web_search` `ImportError`) set `all_failure_rounds = MAX_ALL_FAILURE_ROUNDS` for an immediate stop
- [convention] Execution failures emit a `tool_execution_error` SSE event with sanitized text via `_build_tool_execution_error_event()`
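
The counter rules above can be illustrated with a small sketch of per-round accounting. The function `account_for_round` and its return shape are hypothetical. The sketch assumes required-parameter errors were already handled upstream, and it assumes a permanent failure forces a stop even when another tool in the same round succeeded, which the note does not spell out.

```python
MAX_ALL_FAILURE_ROUNDS = 3  # value stated in the note


def account_for_round(results, tool_iterations, all_failure_rounds):
    """Update loop counters for one round of tool result dicts.

    Returns (successes, tool_iterations, all_failure_rounds, should_stop).
    Only `successes` would be replayed into LLM context.
    """
    successes = [r for r in results if not r.get("error")]
    failures = [r for r in results if r.get("error")]

    # A round counts as an iteration only if at least one tool succeeded.
    if successes:
        tool_iterations += 1

    if any(r.get("retryable") is False for r in failures):
        # Permanent failure: jump straight to the cap (assumed to apply
        # regardless of other results in the round).
        all_failure_rounds = MAX_ALL_FAILURE_ROUNDS
    elif failures and not successes:
        # Every tool in the round failed.
        all_failure_rounds = min(all_failure_rounds + 1, MAX_ALL_FAILURE_ROUNDS)

    should_stop = all_failure_rounds >= MAX_ALL_FAILURE_ROUNDS
    return successes, tool_iterations, all_failure_rounds, should_stop
```

Keeping the two counters separate means a run of failing rounds cannot consume the iteration budget, while the cap still bounds how long the loop retries.
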
3. Geocoding failures in search_places
- [pattern] `Could not geocode location: ...` errors are detected by `_is_search_places_location_retry_candidate_error()` (same path as missing-location)
- [convention] Eligible for the existing context-retry fallback before being treated as a terminal failure
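
As a rough sketch of the shared retry path, the detection and the fallback order (trip destination, then first itinerary stop, then user clarification) might look like the following. Both function names, the string matching, and the argument shapes are assumptions made for illustration; the real `_is_search_places_location_retry_candidate_error()` may inspect results differently.

```python
def is_location_retry_candidate(tool_name, result):
    """Return True for search_places errors that the context-retry can handle."""
    if tool_name != "search_places" or not isinstance(result, dict):
        return False
    error = result.get("error")
    if not isinstance(error, str):
        return False
    lowered = error.lower()
    missing_location = "location" in lowered and "is required" in lowered
    geocode_failed = lowered.startswith("could not geocode location")
    return missing_location or geocode_failed


def resolve_retry_location(trip_destination, itinerary_stops):
    """Pick a fallback location, or None when the user must be asked."""
    if trip_destination and trip_destination.strip():
        return trip_destination.strip()
    if itinerary_stops:
        first_stop = (itinerary_stops[0] or "").strip()
        if first_stop:
            return first_stop
    # No usable context: the caller should ask the user for clarification.
    return None
```

Because the fallback order is fixed, the retry is deterministic: the same trip context always yields the same retry location.
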
Error Sanitization
- [convention] `_safe_error_payload()` maps LiteLLM exceptions to sanitized user-safe categories; it never forwards raw `exc.message`
- [convention] The `execute_tool()` catch-all returns `{"error": "Tool execution failed"}` (hardcoded), never raw `str(exc)`
- [decision] `_build_tool_execution_error_event()` wraps sanitized tool error text in a user-safe sentence for SSE emission and DB persistence
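
A minimal sketch of the two sanitization points, assuming a simplified `execute_tool` signature and a made-up event shape. Only the hardcoded error string and the `tool_execution_error` event name come from the note; the `tool` and `message` fields and the sentence wording are hypothetical.

```python
def execute_tool(tool_fn, **kwargs):
    """Run a tool callable; on any exception return a fixed, generic error."""
    try:
        return tool_fn(**kwargs)
    except Exception:
        # Deliberately ignore str(exc): it may contain paths, keys, or
        # other internals that must not reach the user or the LLM.
        return {"error": "Tool execution failed"}


def build_tool_execution_error_event(tool_name, error_text):
    """Wrap already-sanitized error text in a user-safe sentence."""
    # Collapse whitespace and bound the length (illustrative limit).
    safe_text = " ".join(str(error_text).split())[:200]
    return {
        "type": "tool_execution_error",
        "tool": tool_name,  # hypothetical field
        "message": f"The {tool_name} tool could not complete: {safe_text}",  # hypothetical wording
    }
```

For example, a tool that raises `ImportError` yields `{"error": "Tool execution failed"}`, and the exception message never appears in the event.
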
Frontend Tool-Result Deduplication
- [pattern] Three-layer dedup by `tool_call_id`:
  1. `rebuildConversationMessages()` sets `tool_results: undefined` on all assistant messages, then re-derives them exclusively from persisted `role=tool` sibling rows, discarding any server-side pre-populated `tool_results`
  2. `appendToolResultDedup()` deduplicates during both the rebuild walk and live SSE ingestion
  3. `uniqueToolResultsByCallId()` at render time provides a final safety net
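
The first two layers can be sketched as below. The actual code lives in a Svelte component; this is a language-neutral illustration written in Python to match the backend examples, not a port of it. The row shape and the rule that a tool row attaches to the most recent preceding assistant message are assumptions.

```python
def append_tool_result_dedup(results, new_result):
    """Return results plus new_result unless its tool_call_id is already present."""
    call_id = new_result.get("tool_call_id")
    if any(r.get("tool_call_id") == call_id for r in results):
        return results
    return results + [new_result]


def rebuild_conversation_messages(rows):
    """Rebuild display messages, deriving tool results only from role=tool rows."""
    rebuilt = []
    last_assistant = None
    for row in rows:
        if row["role"] == "tool":
            # Assumed: a tool row belongs to the nearest preceding assistant message.
            if last_assistant is not None:
                last_assistant["tool_results"] = append_tool_result_dedup(
                    last_assistant["tool_results"], row
                )
            continue
        message = dict(row)
        if message["role"] == "assistant":
            # Discard any pre-populated results before re-deriving them.
            message["tool_results"] = []
            last_assistant = message
        rebuilt.append(message)
    return rebuilt
```

Treating the persisted `role=tool` rows as the single source of truth is what makes the rebuild idempotent: replaying the same rows always produces the same result list.
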
Key Files
- Backend classification/loop: `backend/server/chat/views/__init__.py`
- Tool execution + sanitization: `backend/server/chat/agent_tools.py`
- Frontend dedup: `frontend/src/lib/components/AITravelChat.svelte`
- Tests: `backend/server/chat/tests.py` (32 total chat tests)
Relations
- related_to assistant-add-flow-fixes