Chat Tool Error Handling Architecture

Overview

The chat agent tool loop classifies tool call outcomes into three distinct categories, each with different retry and surfacing behavior.

[pattern] Required-parameter errors are detected by _is_required_param_tool_error(), which regex-matches "... is required" patterns in the tool result's error field
[convention] Short-circuited immediately with a user-visible error — never replayed into LLM history
[pattern] A search_places call with a missing location takes a special path: _is_search_places_location_retry_candidate_error() triggers a deterministic context retry (trip destination → first itinerary stop → user clarification) before the error is surfaced
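
The detection step can be sketched as follows. This is a minimal illustration, not the project's actual code: the regex and the function body are assumptions, since the notes only state that the check matches "... is required" in the error field.

```python
import re

# Hypothetical pattern: the notes only say the check matches "... is required".
_REQUIRED_PARAM_RE = re.compile(r"\bis required\b", re.IGNORECASE)


def is_required_param_tool_error(result):
    """Return True when a tool result dict carries a '... is required' error."""
    if not isinstance(result, dict):
        return False
    error = result.get("error")
    # Only string errors can match; anything else is not a required-param error.
    return isinstance(error, str) and bool(_REQUIRED_PARAM_RE.search(error))
```

A caller would check this before anything else, so that a matching result is turned into a user-visible error and never appended to the LLM history.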

[pattern] Any error-bearing tool result dict that does NOT match the required-param pattern is classified as an execution failure by _is_execution_failure_tool_error()
[convention] Execution failures are NEVER replayed into LLM context — they are excluded from successful_tool_calls, successful_tool_messages, and successful_tool_chat_entries
[pattern] tool_iterations increments only after at least one successful tool call in a round
[pattern] All-failure rounds (every tool in a round fails) increment all_failure_rounds, capped at MAX_ALL_FAILURE_ROUNDS (3)
[pattern] Permanent failures (retryable: false in the tool result, e.g. a web_search ImportError) set all_failure_rounds = MAX_ALL_FAILURE_ROUNDS so the loop stops immediately
[convention] Execution failures emit a tool_execution_error SSE event with sanitized text via _build_tool_execution_error_event()
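
The per-round accounting above might look roughly like this. It is a sketch under stated assumptions: the state dict, account_for_round() and should_stop() are hypothetical names, required-parameter errors are assumed to have been short-circuited already, and a permanent failure is assumed to stop the loop even if another tool in the same round succeeded. Whether all_failure_rounds resets after a successful round is not stated in the notes and is not modeled here.

```python
MAX_ALL_FAILURE_ROUNDS = 3  # value stated in the notes


def account_for_round(results, state):
    """Update loop counters for one round and return the replayable results.

    `results` is a list of tool result dicts; `state` holds the counters
    "tool_iterations" and "all_failure_rounds" (hypothetical structure).
    """
    successes = [r for r in results if not r.get("error")]
    failures = [r for r in results if r.get("error")]

    if any(r.get("retryable") is False for r in failures):
        # Permanent failure: jump straight to the cap so the loop stops.
        state["all_failure_rounds"] = MAX_ALL_FAILURE_ROUNDS
    elif failures and not successes:
        # Every tool in this round failed: count it, never past the cap.
        state["all_failure_rounds"] = min(
            state["all_failure_rounds"] + 1, MAX_ALL_FAILURE_ROUNDS
        )

    if successes:
        # Only rounds with at least one success count as a tool iteration.
        state["tool_iterations"] += 1

    # Failures are dropped here, so they never reach LLM context.
    return successes


def should_stop(state):
    """True once the all-failure cap has been reached."""
    return state["all_failure_rounds"] >= MAX_ALL_FAILURE_ROUNDS
```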

[pattern] "Could not geocode location: ..." errors are detected by _is_search_places_location_retry_candidate_error() (same path as missing-location)
[convention] Eligible for the existing context-retry fallback before being treated as a terminal failure
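
One way to picture the shared retry path is the sketch below. The matching rules, the "location is required" error text, the trip_context keys, and the representation of stops as plain strings are all assumptions for illustration; only the fallback order (trip destination, then first itinerary stop, then user clarification) comes from the notes.

```python
def is_location_retry_candidate(result):
    """True for missing-location or geocoding errors (hypothetical matching)."""
    error = result.get("error") if isinstance(result, dict) else None
    if not isinstance(error, str):
        return False
    lowered = error.lower()
    return (
        "location is required" in lowered
        or lowered.startswith("could not geocode location")
    )


def resolve_fallback_location(trip_context):
    """Pick a retry location: trip destination, then first itinerary stop.

    Returns None when neither is available, in which case the caller
    would ask the user to clarify instead of retrying.
    """
    destination = trip_context.get("destination")
    if destination:
        return destination
    stops = trip_context.get("itinerary_stops") or []
    if stops:
        return stops[0]
    return None
```

Because the order is fixed, the same trip context always produces the same retry location, which is what makes the retry deterministic.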

[convention] _safe_error_payload() maps LiteLLM exceptions to sanitized user-safe categories — never forwards raw exc.message
[convention] execute_tool() catch-all returns {"error": "Tool execution failed"} (hardcoded) — never raw str(exc)
[decision] _build_tool_execution_error_event() wraps sanitized tool error text in a user-safe sentence for SSE emission and DB persistence
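
The sanitization conventions can be illustrated with a small sketch. The hardcoded "Tool execution failed" payload is taken from the notes; everything else is assumed, including the execute_tool() signature, the server-side logging call, and the event field names and wording.

```python
import logging

logger = logging.getLogger(__name__)


def execute_tool(tool_fn, **kwargs):
    """Run a tool; on any exception return a fixed, user-safe payload."""
    try:
        return tool_fn(**kwargs)
    except Exception:
        # Assumption: raw details are logged server-side only.
        logger.exception("tool call failed")
        # Hardcoded text: never str(exc), which could leak internals.
        return {"error": "Tool execution failed"}


def build_tool_execution_error_event(tool_name, error_text):
    """Wrap already-sanitized error text in a user-safe sentence.

    The event shape is hypothetical; only the event name
    tool_execution_error appears in the notes.
    """
    return {
        "type": "tool_execution_error",
        "tool": tool_name,
        "message": f"The {tool_name} tool could not complete: {error_text}",
    }
```

Since the same event text is both emitted over SSE and persisted, sanitizing once at this boundary keeps the live and reloaded views consistent.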

[pattern] Three-layer dedup by tool_call_id:
1. rebuildConversationMessages() sets tool_results: undefined on all assistant messages, then re-derives exclusively from persisted role=tool sibling rows — discards any server-side pre-populated tool_results
2. appendToolResultDedup() deduplicates during both rebuild walk and live SSE ingestion
3. uniqueToolResultsByCallId() at render time provides a final safety net
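
The three layers can be sketched as below. The real helpers have camelCase names and presumably live in the frontend; Python is used here only to stay consistent with the other sketches, and the snake_case functions are hypothetical stand-ins. The row shape (role, tool_call_id) and the rule that a tool row attaches to the most recent assistant message are assumptions.

```python
def append_tool_result_dedup(results, new_result):
    """Layer 2: append unless a result with the same tool_call_id exists."""
    call_id = new_result.get("tool_call_id")
    if any(r.get("tool_call_id") == call_id for r in results):
        return results
    return results + [new_result]


def rebuild_conversation_messages(rows):
    """Layer 1: drop pre-populated tool_results, re-derive from tool rows."""
    rebuilt = []
    last_assistant = None
    for row in rows:
        if row["role"] == "assistant":
            # Discard whatever the server pre-populated.
            last_assistant = {**row, "tool_results": None}
            rebuilt.append(last_assistant)
        elif row["role"] == "tool":
            if last_assistant is not None:
                current = last_assistant["tool_results"] or []
                last_assistant["tool_results"] = append_tool_result_dedup(
                    current, row
                )
        else:
            last_assistant = None
            rebuilt.append(dict(row))
    return rebuilt


def unique_tool_results_by_call_id(results):
    """Layer 3: render-time safety net, keeping the first of each id."""
    seen = set()
    unique = []
    for result in results:
        call_id = result.get("tool_call_id")
        if call_id not in seen:
            seen.add(call_id)
            unique.append(result)
    return unique
```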