fix(chat): add saved AI defaults and harden suggestions
This commit adds:

.memory/research/.gitkeep (new file, 0 lines)

.memory/research/auto-learn-preference-signals.md (new file, 130 lines)
@@ -0,0 +1,130 @@
# Research: Auto-Learn User Preference Signals

## Purpose

Map all existing user data that could be aggregated into an automatic preference profile, without requiring manual input.

## Signal Inventory

### 1. Location.category (FK → Category)

- **Model**: `adventures/models.py:Category` — per-user custom categories (name, display_name, icon)
- **Signal**: Top categories by count → dominant interest type (e.g. "hiking", "dining", "cultural")
- **Query**: `Location.objects.filter(user=user).values('category__name').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: HIGH — user-created categories are deliberate choices

### 2. Location.tags (ArrayField)

- **Model**: `adventures/models.py:Location.tags` — `ArrayField(CharField(max_length=100))`
- **Signal**: Most frequent tags across all user locations → interest keywords
- **Query**: `Location.objects.filter(user=user).values_list('tags', flat=True).distinct()` (used in `tags_view.py`)
- **Strength**: MEDIUM-HIGH — tags are free-text user input

### 3. Location.rating (FloatField)

- **Model**: `adventures/models.py:Location.rating`
- **Signal**: Average rating plus high-rated locations → positive sentiment for place types; filtering for visited + high-rated → strong preferences
- **Query**: `Location.objects.filter(user=user).aggregate(avg_rating=Avg('rating'))`, or a breakdown by category
- **Strength**: HIGH for positive signals (≥4.0); weak if rarely filled in

### 4. Location.description / Visit.notes (TextField)

- **Model**: `adventures/models.py:Location.description`, `Visit.notes`
- **Signal**: Free-text content for NLP keyword extraction (budget, adventure, luxury, cuisine words)
- **Query**: `Location.objects.filter(user=user).values_list('description', flat=True)`
- **Strength**: LOW (requires NLP to extract structured signals; many fields blank)

### 5. Lodging.type (LODGING_TYPES enum)

- **Model**: `adventures/models.py:Lodging.type` — choices: hotel, hostel, resort, bnb, campground, cabin, apartment, house, villa, motel
- **Signal**: Most frequently used lodging type → travel style indicator (e.g. "hostel" → budget; "resort"/"villa" → luxury; "campground"/"cabin" → outdoor)
- **Query**: `Lodging.objects.filter(user=user).values('type').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: HIGH — directly maps to the trip_style field

### 6. Lodging.rating (FloatField)

- **Signal**: Combined with lodging type, identifies preferred accommodation standards
- **Strength**: MEDIUM

### 7. Transportation.type (TRANSPORTATION_TYPES enum)

- **Model**: `adventures/models.py:Transportation.type` — choices: car, plane, train, bus, boat, bike, walking
- **Signal**: Primary transport mode → mobility preference (e.g. mostly walking/bike → slow travel; lots of planes → frequent flyer)
- **Query**: `Transportation.objects.filter(user=user).values('type').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: MEDIUM

### 8. Activity.sport_type (SPORT_TYPE_CHOICES)

- **Model**: `adventures/models.py:Activity.sport_type` — 60+ choices mapped to 10 SPORT_CATEGORIES in `utils/sports_types.py`
- **Signal**: Activity categories the user is active in → physical/adventure interests
- **Categories**: running, walking_hiking, cycling, water_sports, winter_sports, fitness_gym, racket_sports, climbing_adventure, team_sports
- **Query**: Already aggregated in `stats_view.py:_get_activity_stats_by_category()` — uses `Activity.objects.filter(user=user).values('sport_type').annotate(count=Count('id'))`
- **Strength**: HIGH — objective behavioral data from Strava/Wanderer imports

### 9. VisitedRegion / VisitedCity (worldtravel)

- **Model**: `worldtravel/models.py` — `VisitedRegion(user, region)` and `VisitedCity(user, city)` with country/subregion
- **Signal**: Countries/regions visited → geographic preferences (beach vs. mountain vs. city; EU vs. Asia, etc.)
- **Query**: `VisitedRegion.objects.filter(user=user).select_related('region__country')` → country distribution
- **Strength**: MEDIUM-HIGH — "where has this user historically traveled?" informs destination type

### 10. Collection metadata

- **Model**: `adventures/models.py:Collection` — name, description, start/end dates
- **Signal**: Collection names/descriptions may contain destination/theme hints; trip duration (end_date − start_date) → travel pace; trip frequency (count, spacing) → travel cadence
- **Query**: `Collection.objects.filter(user=user).values('name', 'description', 'start_date', 'end_date')`
- **Strength**: LOW-MEDIUM (descriptions often blank; names are free-text)

### 11. Location.price / Lodging.price (MoneyField)

- **Signal**: Average spend across locations/lodging → budget tier
- **Query**: `Location.objects.filter(user=user).aggregate(avg_price=Avg('price'))` (requires the djmoney amount field)
- **Strength**: MEDIUM — but many records may have no price set

### 12. Location geographic clustering (lat/lon)

- **Signal**: Country/region distribution of visited locations → geographic affinity
- **Already tracked**: `Location.country`, `Location.region`, `Location.city` (FK, auto-geocoded)
- **Query**: `Location.objects.filter(user=user).values('country__name').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: HIGH

### 13. UserAchievement types

- **Model**: `achievements/models.py:UserAchievement` — types: `adventure_count`, `country_count`
- **Signal**: Milestone count → engagement level (casual vs. power user); high `country_count` → variety-seeker
- **Strength**: LOW-MEDIUM (only 2 types currently)

### 14. ChatMessage content (user role)

- **Model**: `chat/models.py:ChatMessage` — `role`, `content`
- **Signal**: User messages in travel conversations → intent signals ("I love hiking", "looking for cheap food", "family-friendly")
- **Query**: `ChatMessage.objects.filter(conversation__user=user, role='user').values_list('content', flat=True)`
- **Strength**: MEDIUM — requires NLP; could be rich but noisy

## Aggregation Patterns Already in Codebase

| Pattern | Location | Reusability |
|---|---|---|
| Activity stats by category | `stats_view.py:_get_activity_stats_by_category()` | Direct reuse |
| All-tags union | `tags_view.py:ActivityTypesView.types()` | Direct reuse |
| VisitedRegion/City counts | `stats_view.py:counts()` | Direct reuse |
| Multi-user preference merge | `llm_client.py:get_aggregated_preferences()` | Partial reuse |
| Category-filtered location count | `serializers.py:location_count` | Pattern reference |
| Location queryset scoping | `location_view.py:get_queryset()` | Standard pattern |

## Proposed Auto-Profile Fields from Signals

| Target Field | Primary Signals | Secondary Signals |
|---|---|---|
| `cuisines` | Location.tags (cuisine words), Location.category (dining) | Location.description NLP |
| `interests` | Activity.sport_type categories, Location.category top-N | Location.tags frequency, VisitedRegion types |
| `trip_style` | Lodging.type top (luxury/budget/outdoor), Transportation.type, Activity sport categories | Location.rating Avg, price signals |
| `notes` | (not auto-derived — keep manual only) | — |

## Where to Implement

**New function target**: `integrations/views/recommendation_profile_view.py` or a new `integrations/utils/auto_profile.py`

**Suggested function signature**:

```python
def build_auto_preference_profile(user) -> dict:
    """
    Returns {cuisines, interests, trip_style} inferred from the user's travel history.
    Fields are non-destructive suggestions, not overrides of manual input.
    """
```

**New API endpoint target**: `POST /api/integrations/recommendation-preferences/auto-learn/`

**ViewSet action**: `@action(detail=False, methods=['post'], url_path='auto-learn')` on `UserRecommendationPreferenceProfileViewSet`

## Integration Point

`get_system_prompt()` in `chat/llm_client.py` already consumes `UserRecommendationPreferenceProfile` — auto-learned values flow directly into the AI context with zero additional changes needed there.

See: [knowledge.md — User Recommendation Preference Profile](../knowledge.md#user-recommendation-preference-profile)
See: [plans/ai-travel-agent-redesign.md — WS2](../plans/ai-travel-agent-redesign.md#ws2-user-preference-learning)

.memory/research/litellm-zen-provider-catalog.md (new file, 35 lines)
@@ -0,0 +1,35 @@
# Research: LiteLLM provider catalog and OpenCode Zen support

Date: 2026-03-08
Related plan: [AI travel agent in Collections Recommendations](../plans/ai-travel-agent-collections-integration.md)

## LiteLLM provider enumeration

- The runtime provider list is available via `litellm.provider_list` and currently returns 128 provider IDs in this environment.
- The enum source `LlmProviders` can be used for canonical provider identifiers.

## OpenCode Zen compatibility

- OpenCode Zen is **not** a native LiteLLM provider alias.
- Zen can be supported via LiteLLM's OpenAI-compatible routing using:
  - provider id in app: `opencode_zen`
  - model namespace: `openai/<zen-model>`
  - `api_base`: `https://opencode.ai/zen/v1`
- No new SDK dependency required.

## Recommended backend contract

- Add a backend source-of-truth endpoint: `GET /api/chat/providers/`.
- Response fields:
  - `id`
  - `label`
  - `available_for_chat`
  - `needs_api_key`
  - `default_model`
  - `api_base`
- Return all LiteLLM runtime providers; mark non-mapped providers `available_for_chat=false` for display-only compliance.
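
The catalog endpoint described above can be sketched as a pure function over the runtime provider list and a `CHAT_PROVIDER_CONFIG`-style mapping; the config shape and sample data below are assumptions for illustration:

```python
def build_provider_catalog(runtime_providers: list[str], chat_config: dict) -> list[dict]:
    """Return one catalog entry per runtime provider, flagged for chat availability."""
    catalog = []
    for provider_id in runtime_providers:
        cfg = chat_config.get(provider_id)
        catalog.append({
            "id": provider_id,
            "label": cfg["label"] if cfg else provider_id,
            # Only providers present in the chat config are usable for chat;
            # everything else is returned for display-only compliance.
            "available_for_chat": cfg is not None,
            "needs_api_key": cfg.get("needs_api_key", True) if cfg else False,
            "default_model": cfg.get("default_model") if cfg else None,
            "api_base": cfg.get("api_base") if cfg else None,
        })
    return catalog
```

In the view, `runtime_providers` would come from `litellm.provider_list` and `chat_config` from the existing `CHAT_PROVIDER_CONFIG` dict.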

## Data/storage compatibility notes

- The existing `UserAPIKey(provider)` model supports adding `opencode_zen` without a migration.
- Consistent provider ID usage is required across serializer validation, key lookup, and the chat request payload.

## Risks

- Zen model names may evolve; keep the default model configurable in the backend mapping.
- The full provider list is large; the UI should clearly communicate which providers are unavailable for chat.

.memory/research/opencode-zen-connection-debug.md (new file, 303 lines)
@@ -0,0 +1,303 @@
# OpenCode Zen Connection Debug — Research Findings

**Date**: 2026-03-08
**Researchers**: researcher agent (root cause), explorer agent (code path trace)
**Status**: Complete — root causes identified, fix proposed

## Summary

The OpenCode Zen provider configuration in `backend/server/chat/llm_client.py` has **two critical mismatches** that cause connection/API errors:

1. **Invalid model ID**: `gpt-4o-mini` does not exist on OpenCode Zen
2. **Wrong endpoint for GPT models**: GPT models on Zen use the `/responses` endpoint, not `/chat/completions`

An additional structural risk: the backend runs under **Gunicorn WSGI** (not ASGI/uvicorn), while `stream_chat_completion` is an `async def` generator driven via `_async_to_sync_generator`, which creates a new event loop per call. This works, but every tool iteration opens and closes an event loop — inefficient and fragile under load.

## End-to-End Request Path

### 1. Frontend: `AITravelChat.svelte` → `sendMessage()`

- **File**: `frontend/src/lib/components/AITravelChat.svelte`, line 97
- POST body: `{ message: <text>, provider: selectedProvider }` (e.g. `"opencode_zen"`)
- Sends to: `POST /api/chat/conversations/<id>/send_message/`
- On `fetch` network failure: shows `$t('chat.connection_error')` = `"Connection error. Please try again."` (line 191)
- On HTTP error: tries `res.json()` → uses `err.error || $t('chat.connection_error')` (line 126)
- On SSE `parsed.error`: shows `parsed.error` inline in the chat (line 158)
- **Any exception from `litellm` is therefore masked as `"An error occurred while processing your request."` or `"Connection error. Please try again."`**

### 2. Proxy: `frontend/src/routes/api/[...path]/+server.ts` → `handleRequest()`

- Strips and re-generates the CSRF token (lines 57-60)
- POSTs to `http://server:8000/api/chat/conversations/<id>/send_message/`
- Detects `content-type: text/event-stream` and streams the body directly through (lines 94-98) — **no buffering**
- On any fetch error: returns `{ error: 'Internal Server Error' }` (line 109)

### 3. Backend: `chat/views.py` → `ChatViewSet.send_message()`

- Validates the provider via `is_chat_provider_available()` (line 114) — passes for `opencode_zen`
- Saves the user message to the DB (line 120)
- Builds the LLM messages list (line 131)
- Wraps the `async event_stream()` in `_async_to_sync_generator()` (line 269)
- Returns `StreamingHttpResponse` with `text/event-stream` content type (line 268)

### 4. Backend: `chat/llm_client.py` → `stream_chat_completion()`

- Normalizes the provider (line 208)
- Looks up `CHAT_PROVIDER_CONFIG["opencode_zen"]` (line 209)
- Fetches the API key from `UserAPIKey.objects.get(user=user, provider="opencode_zen")` (line 154)
- Decrypts it via Fernet using `FIELD_ENCRYPTION_KEY` (line 102)
- Calls `litellm.acompletion(model="openai/gpt-4o-mini", api_key=<key>, api_base="https://opencode.ai/zen/v1", stream=True, tools=AGENT_TOOLS, tool_choice="auto")` (line 237)
- On **any exception**: logs and yields `data: {"error": "An error occurred..."}` (lines 274-276)

## Root Cause Analysis

### #1 CRITICAL: Invalid default model `gpt-4o-mini`

- **Location**: `backend/server/chat/llm_client.py:62`
- `CHAT_PROVIDER_CONFIG["opencode_zen"]["default_model"] = "openai/gpt-4o-mini"`
- `gpt-4o-mini` is an OpenAI-hosted model. The OpenCode Zen gateway at `https://opencode.ai/zen/v1` does not offer `gpt-4o-mini`.
- LiteLLM sends: `POST https://opencode.ai/zen/v1/chat/completions` with `model: gpt-4o-mini`
- The Zen API returns HTTP 4xx (model not found or not available)
- The exception is caught generically at line 274 → yields a masked error SSE → frontend shows a generic message

### #2 SIGNIFICANT: Generic exception handler masks real errors

- **Location**: `backend/server/chat/llm_client.py:274-276`
- Bare `except Exception:` with `logger.exception` and a generic user message
- LiteLLM exceptions carry structured information: `litellm.exceptions.NotFoundError`, `AuthenticationError`, `BadRequestError`, etc.
- All of these reach the user as `"An error occurred while processing your request. Please try again."`
- Prevents diagnosis without checking Docker logs

### #3 SIGNIFICANT: WSGI + async event loop per request

- **Location**: `backend/server/chat/views.py:66-76` (`_async_to_sync_generator`)
- Backend runs **Gunicorn WSGI** (from `supervisord.conf:11`); there is **no ASGI entry point** (`asgi.py` doesn't exist)
- `stream_chat_completion` is `async def` using `litellm.acompletion` (awaited)
- `_async_to_sync_generator` creates a fresh event loop via `asyncio.new_event_loop()` for each request
- For multi-tool-iteration responses this loop drives multiple sequential `await` calls
- This works but is fragile: if `litellm.acompletion` internally uses a singleton HTTP client bound to a different event loop, it can raise `RuntimeError: This event loop is already running` or fail with connection errors on subsequent calls
- **httpx/aiohttp sessions in LiteLLM may not be compatible with per-call new event loops**

### #4 MINOR: `tool_choice: "auto"` sent unconditionally with tools

- **Location**: `backend/server/chat/llm_client.py:229`
- `"tool_choice": "auto" if tools else None` — None values in kwargs are passed through to litellm
- Some OpenAI-compatible endpoints (potentially including Zen models) reject `tool_choice: null` or unsupported parameters
- Fix: remove the key entirely instead of setting it to None

### #5 MINOR: API key lookup is synchronous in async context

- **Location**: `backend/server/chat/llm_client.py:217` and `views.py:144`
- `get_llm_api_key` calls `UserAPIKey.objects.get(...)` synchronously
- Called from within `async for chunk in stream_chat_completion(...)` in the async `event_stream()` generator
- Django ORM operations must use `sync_to_async` in async contexts; direct sync ORM calls can cause `SynchronousOnlyOperation` errors or deadlocks under ASGI
- Under the WSGI + new-event-loop approach this is less likely to fail, but it is technically incorrect

## Recommended Fixes (Ranked by Impact)

### Fix #1 (Primary): Correct the default model

```python
# backend/server/chat/llm_client.py:59-64
"opencode_zen": {
    "label": "OpenCode Zen",
    "needs_api_key": True,
    "default_model": "openai/gpt-5-nano",  # Free; confirmed to work via /chat/completions
    "api_base": "https://opencode.ai/zen/v1",
},
```

Confirmed working models (use `/chat/completions`, OpenAI-compat):

- `openai/gpt-5-nano` (free)
- `openai/kimi-k2.5` (confirmed by GitHub usage)
- `openai/glm-5` (GLM family)
- `openai/big-pickle` (free)

GPT family models route through the `/responses` endpoint on Zen, which LiteLLM's openai-compat mode does NOT use — only the "OpenAI-compatible" models above reliably work on Zen with LiteLLM's `openai/` prefix and `/chat/completions`.

### Fix #2 (Secondary): Structured error surfacing

```python
# backend/server/chat/llm_client.py:274-276
except Exception as exc:
    logger.exception("LLM streaming error")
    # Extract structured detail if available
    status_code = getattr(exc, 'status_code', None)
    detail = getattr(exc, 'message', None) or str(exc)
    user_msg = (
        f"Provider error ({status_code}): {detail}"
        if status_code
        else "An error occurred while processing your request. Please try again."
    )
    yield f"data: {json.dumps({'error': user_msg})}\n\n"
```

### Fix #3 (Minor): Remove None from the tool_choice kwarg

```python
# backend/server/chat/llm_client.py:225-234
completion_kwargs = {
    "model": provider_config["default_model"],
    "messages": messages,
    "stream": True,
    "api_key": api_key,
}
if tools:
    completion_kwargs["tools"] = tools
    completion_kwargs["tool_choice"] = "auto"
if provider_config["api_base"]:
    completion_kwargs["api_base"] = provider_config["api_base"]
```

## Error Flow Diagram

```
User sends message (opencode_zen)
  → AITravelChat.svelte:sendMessage()
    → POST /api/chat/conversations/<id>/send_message/
      → +server.ts:handleRequest() [proxy, no mutation]
        → POST http://server:8000/api/chat/conversations/<id>/send_message/
          → views.py:ChatViewSet.send_message()
            → llm_client.py:stream_chat_completion()
              → litellm.acompletion(model="openai/gpt-4o-mini",   ← FAILS HERE
                                    api_base="https://opencode.ai/zen/v1")
              → except Exception → yield data:{"error":"An error occurred..."}
            ← SSE: data:{"error":"An error occurred..."}
          ← StreamingHttpResponse(text/event-stream)
        ← streamed through
      ← streamed through
    ← reader.read() → parsed.error set
  ← assistantMsg.content = "An error occurred..."   ← shown to user
```

If the network/DNS fails entirely (e.g. `https://opencode.ai` unreachable):

```
→ litellm.acompletion raises immediately
→ except Exception → yield data:{"error":"An error occurred..."}
— OR —
→ +server.ts fetch fails → json({error:"Internal Server Error"}, 500)
→ AITravelChat.svelte res.ok is false → res.json() → err.error || $t('chat.connection_error')
→ shows "Connection error. Please try again."
```

## File References

| File | Line(s) | Relevance |
|---|---|---|
| `backend/server/chat/llm_client.py` | 59-64 | `CHAT_PROVIDER_CONFIG["opencode_zen"]` — primary fix |
| `backend/server/chat/llm_client.py` | 150-157 | `get_llm_api_key()` — DB lookup for stored key |
| `backend/server/chat/llm_client.py` | 203-276 | `stream_chat_completion()` — full LiteLLM call + error handler |
| `backend/server/chat/llm_client.py` | 225-234 | `completion_kwargs` construction |
| `backend/server/chat/llm_client.py` | 274-276 | Generic `except Exception` (swallows all errors) |
| `backend/server/chat/views.py` | 103-274 | `send_message()` — SSE pipeline orchestration |
| `backend/server/chat/views.py` | 66-76 | `_async_to_sync_generator()` — WSGI/async bridge |
| `backend/server/integrations/models.py` | 78-112 | `UserAPIKey` — encrypted key storage |
| `frontend/src/lib/components/AITravelChat.svelte` | 97-195 | `sendMessage()` — SSE consumer + error display |
| `frontend/src/lib/components/AITravelChat.svelte` | 124-129 | HTTP error → `$t('chat.connection_error')` |
| `frontend/src/lib/components/AITravelChat.svelte` | 157-160 | SSE `parsed.error` → inline display |
| `frontend/src/lib/components/AITravelChat.svelte` | 190-192 | Outer catch → `$t('chat.connection_error')` |
| `frontend/src/routes/api/[...path]/+server.ts` | 34-110 | `handleRequest()` — proxy |
| `frontend/src/routes/api/[...path]/+server.ts` | 94-98 | SSE passthrough (no mutation) |
| `frontend/src/locales/en.json` | 46 | `chat.connection_error` = "Connection error. Please try again." |
| `backend/supervisord.conf` | 11 | Gunicorn WSGI startup (no ASGI) |

---

## Model Selection Implementation Map

**Date**: 2026-03-08

### Frontend Provider/Model Selection State (Current)

In `AITravelChat.svelte`:

- `selectedProvider` (line 29): `let selectedProvider = 'openai'` — bare string, no model tracking
- `providerCatalog` (line 30): `ChatProviderCatalogEntry[]` — already contains `default_model: string | null` per entry
- `chatProviders` (line 31): reactive filtered view of `providerCatalog` (available only)
- `loadProviderCatalog()` (line 37): populates the catalog from `GET /api/chat/providers/`
- `sendMessage()` (line 97): POST body at line 121 is `{ message: msgText, provider: selectedProvider }` — **no model field**
- Provider `<select>` (lines 290-298): in the top toolbar of the chat panel

### Request Payload Build Point

`AITravelChat.svelte`, lines 118-122:

```ts
const res = await fetch(`/api/chat/conversations/${conversation.id}/send_message/`, {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ message: msgText, provider: selectedProvider }) // ← ADD model here
});
```

### Backend Request Intake Point

`chat/views.py`, `send_message()` (line 104):

- Line 113: `provider = (request.data.get("provider") or "openai").strip().lower()`
- Line 144: `stream_chat_completion(request.user, current_messages, provider, tools=AGENT_TOOLS)`
- **No model extraction**; the model comes only from `CHAT_PROVIDER_CONFIG[provider]["default_model"]`

### Backend Model Usage Point

`chat/llm_client.py`, `stream_chat_completion()` (line 203):

- Lines 225-226: `completion_kwargs = { "model": provider_config["default_model"], ... }`
- This is the **sole place the model is resolved** — no override capability exists yet

### Persistence Options Analysis

| Option | Files changed | Migration? | Risk |
|---|---|---|---|
| **`localStorage` (recommended)** | `AITravelChat.svelte` only | No | Lowest: no backend, no schema |
| `CustomUser` field (`chat_model_prefs` JSONField) | `users/models.py`, `users/serializers.py`, `users/views.py`, migration | **Yes** | Medium: schema change, serializer exposure |
| `UserAPIKey`-style new model prefs table | new `chat/models.py` + serializer + view + urls + migration | **Yes** | High: new endpoint, multi-file |
| `UserRecommendationPreferenceProfile` JSONField addition | `integrations/models.py`, serializer, migration | **Yes** | Medium: migration on integrations app |

**Selected**: `localStorage` — key `voyage_chat_model_prefs`, value `Record<provider_id, model_string>`.

### File-by-File Edit Plan

#### 1. `backend/server/chat/llm_client.py`

| Symbol | Change |
|---|---|
| `stream_chat_completion(user, messages, provider, tools=None)` | Add `model: str \| None = None` parameter |
| `completion_kwargs["model"]` (line 226) | Change to `model or provider_config["default_model"]` |
| (new) validation | If `model` is provided: assert it starts with the expected LiteLLM prefix or raise an SSE error |

#### 2. `backend/server/chat/views.py`

| Symbol | Change |
|---|---|
| `send_message()` (line 104) | Extract `model = (request.data.get("model") or "").strip() or None` |
| `stream_chat_completion(...)` call (line 144) | Pass `model=model` |
| (optional validation) | Return 400 if the model prefix doesn't match the provider |

#### 3. `frontend/src/lib/components/AITravelChat.svelte`

| Symbol | Change |
|---|---|
| (new) `let selectedModel: string` | Initialize from `loadModelPref(selectedProvider)` or `default_model` |
| (new) `$: selectedProviderEntry` | Reactive lookup of the current provider's catalog entry |
| (new) `$: selectedModel` reset | Reset on provider change; persist with `saveModelPref` |
| `sendMessage()` body (line 121) | Add `model: selectedModel || undefined` to the JSON body |
| (new) model `<input>` in toolbar | Placed after the provider `<select>`, `bind:value={selectedModel}`, placeholder = `default_model` |
| (new) `loadModelPref(provider)` | Read from `localStorage.getItem('voyage_chat_model_prefs')` |
| (new) `saveModelPref(provider, model)` | Write to `localStorage.setItem('voyage_chat_model_prefs', ...)` |

#### 4. `frontend/src/locales/en.json`

| Key | Value |
|---|---|
| `chat.model_label` | `"Model"` |
| `chat.model_placeholder` | `"Default model"` |

### Provider-Model Compatibility Validation

The critical constraint is **LiteLLM model-string routing**. LiteLLM uses the `provider/model-name` prefix to determine which SDK client to use:

- `openai/gpt-5-nano` → OpenAI client (with custom `api_base` for Zen)
- `anthropic/claude-sonnet-4-20250514` → Anthropic client
- `groq/llama-3.3-70b-versatile` → Groq client

If a user types `anthropic/claude-opus` for the `openai` provider, LiteLLM uses the Anthropic SDK with OpenAI credentials → guaranteed failure.

**Recommended backend guard** in `send_message()`:

```python
if model:
    expected_prefix = provider_config["default_model"].split("/")[0]
    if not model.startswith(expected_prefix + "/"):
        return Response(
            {"error": f"Model must use '{expected_prefix}/' prefix for provider '{provider}'."},
            status=status.HTTP_400_BAD_REQUEST,
        )
```

Exception: `opencode_zen` and `openrouter` accept any prefix (they are routing gateways). The guard should skip the prefix check when `api_base` is set (custom gateway).

### Migration Requirement

**No migration required** for the recommended localStorage approach.

---

## Cross-references

- See [Plan: OpenCode Zen connection error](../plans/opencode-zen-connection-error.md)
- See [Research: LiteLLM provider catalog](litellm-zen-provider-catalog.md)
- See [Knowledge: AI Chat](../knowledge.md#ai-chat-collections--recommendations)

.memory/research/provider-strategy.md (new file, 198 lines)
@@ -0,0 +1,198 @@
# Research: Multi-Provider Strategy for Voyage AI Chat
|
||||
|
||||
**Date**: 2026-03-09
|
||||
**Researcher**: researcher agent
|
||||
**Status**: Complete
|
||||
|
||||
## Summary
|
||||
|
||||
Investigated how OpenCode, OpenClaw-like projects, and LiteLLM-based production systems handle multi-provider model discovery, auth, rate-limit resilience, and tool-calling compatibility. Assessed whether replacing LiteLLM is warranted for Voyage.
|
||||
|
||||
**Bottom line**: Keep LiteLLM, harden it. Replacing LiteLLM would be a multi-week migration with negligible user-facing benefit. LiteLLM already solves the hard problems (100+ provider SDKs, streaming, tool-call translation). Voyage's issues are in the **integration layer**, not in LiteLLM itself.
|
||||
|
||||
---
|
||||
|
||||
## 1. Pattern Analysis: How Projects Handle Multi-Provider
|
||||
|
||||
### 1a. Dynamic Model Discovery
|
||||
|
||||
| Project | Approach | Notes |
|
||||
|---|---|---|
|
||||
| **OpenCode** | Static registry from `models.dev` (JSON database), merged with user config, filtered by env/auth presence | No runtime API calls to providers for discovery; curated model metadata (capabilities, cost, limits) baked in |
|
||||
| **Ragflow** | Hardcoded `SupportedLiteLLMProvider` enum + per-provider model lists | Similar to Voyage's current approach |
|
||||
| **daily_stock_analysis** | `litellm.Router` model_list config + `fallback_models` list from config file | Runtime fallback, not runtime discovery |
|
||||
| **Onyx** | `LLMProvider` DB model + admin UI for model configuration | DB-backed, admin-managed |
|
||||
| **LiteLLM Proxy** | YAML config `model_list` with deployment-level params | Static config, hot-reloadable |
|
||||
| **Voyage (current)** | `CHAT_PROVIDER_CONFIG` dict + hardcoded `models()` per provider + OpenAI API `client.models.list()` for OpenAI only | Mixed: one provider does live discovery, rest are hardcoded |
|
||||
|
||||
**Key insight**: No production project does universal runtime model discovery across all providers. OpenCode — the most sophisticated — uses a curated static database (`models.dev`) with provider/model metadata including capability flags (`toolcall`, `reasoning`, `streaming`). This is the right pattern for Voyage.
### 1b. Provider Auth Handling

| Project | Approach |
|---|---|
| **OpenCode** | Multi-source: env vars → `Auth.get()` (stored credentials) → config file → plugin loaders; per-provider custom auth (AWS chains, Google ADC, OAuth) |
| **LiteLLM Router** | `api_key` per deployment in model_list; env var fallback |
| **Cognee** | Rate limiter context manager wrapping LiteLLM calls |
| **Voyage (current)** | Per-user encrypted `UserAPIKey` DB model + instance-level `VOYAGE_AI_API_KEY` env fallback; key fetched per-request |

**Voyage's approach is sound.** Per-user DB-stored keys with instance fallback matches the self-hosted deployment model. No change needed.
### 1c. Rate-Limit Fallback / Retry

| Project | Approach |
|---|---|
| **LiteLLM Router** | Built-in: `num_retries`, `fallbacks` (cross-model), `allowed_fails` + `cooldown_time`, `RetryPolicy` (per-exception-type retry counts), `AllowedFailsPolicy` |
| **daily_stock_analysis** | `litellm.Router` with `fallback_models` list + multi-key support (rotate API keys on rate limit) |
| **Cognee** | `tenacity` retry decorator with `wait_exponential_jitter` + LiteLLM rate limiter |
| **Suna** | LiteLLM exception mapping → structured error processor |
| **Voyage (current)** | Zero retries. Single attempt. `_safe_error_payload()` maps exceptions to user messages but does not retry. |

**This is Voyage's biggest gap.** Every other production system has retry logic. LiteLLM has this built in — Voyage just isn't using it.
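For reference, the tenacity-style pattern Cognee relies on reduces to exponential backoff with jitter and a retry cap. A minimal hand-rolled sketch (illustrative only; for Voyage the built-in `num_retries` on `acompletion()` is the better fit):

```python
import random
import time
from functools import wraps


def retry_with_backoff(max_retries=2, base_delay=0.5, retry_on=(Exception,)):
    """Minimal exponential-backoff-with-jitter retry decorator, in the
    spirit of tenacity's wait_exponential_jitter. Sketch only; not a
    substitute for LiteLLM's built-in retry support."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise  # budget exhausted: surface the error
                    # Backoff grows 0.5s, 1s, 2s, ... plus a small jitter
                    time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
        return wrapper
    return decorator
```

The point of the inventory above stands either way: every listed system has *some* version of this loop; Voyage has none.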
### 1d. Tool-Calling Compatibility

| Project | Approach |
|---|---|
| **OpenCode** | `capabilities.toolcall` boolean per model in `models.dev` database; models without tool support are filtered from agentic use |
| **LiteLLM** | `litellm.supports_function_calling(model=)` runtime check; `get_supported_openai_params(model=)` for param filtering |
| **PraisonAI** | `litellm.supports_function_calling()` guard before tool dispatch |
| **open-interpreter** | Same `litellm.supports_function_calling()` guard |
| **Voyage (current)** | No tool-call capability check. `AGENT_TOOLS` always passed. Reasoning models excluded from `opencode_zen` list by critic gate (manual). |

**Actionable gap.** `litellm.supports_function_calling(model=)` exists and should be called before passing the `tools` kwarg.
---

## 2. Architecture Options Comparison

| Option | Description | Effort | Risk | Benefit |
|---|---|---|---|---|
| **A. Keep LiteLLM, harden** | Add Router for retry/fallback, add `supports_function_calling` guard, curate model lists with capability metadata | **Low** (1-2 sessions) | **Low** — incremental changes to existing working code | Retry resilience, tool-call safety, zero migration |
| **B. Hybrid: direct SDK for some** | Use `@ai-sdk/*` packages (like OpenCode) for primary providers, LiteLLM for others | **High** (1-2 weeks) | **High** — new TS→Python SDK mismatch, dual streaming paths, test surface explosion | Finer control per provider; no real benefit for Django backend |
| **C. Replace LiteLLM entirely** | Build custom provider abstraction or adopt Vercel AI SDK (TypeScript-only) | **Very High** (3-4 weeks) | **Very High** — rewrite streaming, tool-call translation, error mapping for each provider | Only makes sense if moving to full-stack TypeScript |
| **D. LiteLLM Proxy (sidecar)** | Run LiteLLM as a separate proxy service, call it via OpenAI-compatible API | **Medium** (2-3 days) | **Medium** — new Docker service, config management, latency overhead | Centralized config, built-in admin UI, but overkill for single-user self-hosted |

---

## 3. Recommendation

### Immediate (this session / next session): Option A — Harden LiteLLM

**Specific code-level adaptations:**

#### 3a. Add `litellm.Router` for retry + fallback (highest impact)

One option is to replace bare `litellm.acompletion()` with `litellm.Router.acompletion()`:
```python
# llm_client.py — new module-level router
import litellm
from litellm.router import RetryPolicy

_router = None


def _get_router():
    global _router
    if _router is None:
        _router = litellm.Router(
            model_list=[],  # empty — we use the router for retry/timeout only
            num_retries=2,
            timeout=60,
            retry_policy=RetryPolicy(
                AuthenticationErrorRetries=0,  # bad keys never become valid on retry
                RateLimitErrorRetries=2,
                TimeoutErrorRetries=1,
                BadRequestErrorRetries=0,
            ),
        )
    return _router
```
**However**: LiteLLM Router requires models pre-registered in `model_list`. For Voyage's dynamic per-user-key model, the simpler approach is:
```python
# In stream_chat_completion, add retry params to acompletion:
response = await litellm.acompletion(
    **completion_kwargs,
    num_retries=2,
    request_timeout=60,
)
```
LiteLLM's `acompletion()` accepts `num_retries` directly — no Router needed.

**Files**: `backend/server/chat/llm_client.py` line 418 (add `num_retries=2, request_timeout=60`)
#### 3b. Add tool-call capability guard
```python
# In stream_chat_completion, before building completion_kwargs:
effective_model = model or provider_config["default_model"]
if tools and not litellm.supports_function_calling(model=effective_model):
    # Strip tools — the model doesn't support them
    tools = None
    logger.warning("Model %s does not support function calling; tools stripped", effective_model)
```
**Files**: `backend/server/chat/llm_client.py` around line 397
#### 3c. Curate model lists with tool-call metadata in `models()` endpoint

Instead of returning bare string lists, return objects with capability info:
```python
# In ChatProviderCatalogViewSet.models():
if provider in ["opencode_zen"]:
    return Response({"models": [
        {"id": "openai/gpt-5-nano", "supports_tools": True},
        {"id": "openai/gpt-4o-mini", "supports_tools": True},
        {"id": "openai/gpt-4o", "supports_tools": True},
        {"id": "anthropic/claude-sonnet-4-20250514", "supports_tools": True},
        {"id": "anthropic/claude-3-5-haiku-20241022", "supports_tools": True},
    ]})
```
**Files**: `backend/server/chat/views/__init__.py` — `models()` action. Frontend `loadModelsForProvider()` would need a minor update to handle objects instead of strings.
#### 3d. Fix `day_suggestions.py` hardcoded model

Line 194 uses `model="gpt-4o-mini"`, which ignores both the provider config and the user's model selection:
```python
# day_suggestions.py lines 193-194
response = litellm.completion(
    model="gpt-4o-mini",  # BUG: ignores provider config
```
It should use the provider_config default or the user-selected model instead.

**Files**: `backend/server/chat/views/day_suggestions.py` line 194
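A hedged sketch of the fix, assuming `CHAT_PROVIDER_CONFIG` exposes a `default_model` per provider as described above. `resolve_model` is a hypothetical helper, not existing Voyage code, and the exact lookup in `day_suggestions.py` may differ:

```python
def resolve_model(provider: str, user_selected_model, provider_config: dict) -> str:
    """Hypothetical resolution order: explicit user choice, then the
    provider's configured default, then a hardcoded last resort."""
    config = provider_config.get(provider, {})
    return user_selected_model or config.get("default_model") or "gpt-4o-mini"
```

The hardcoded string survives only as the final fallback, so configured providers and user selections win.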
### Long-term (future sessions)

1. **Adopt a `models.dev`-style curated database**: OpenCode's approach of maintaining a JSON/YAML model registry with capabilities, costs, and limits is superior to hardcoded lists. Could be a YAML file in `backend/server/chat/models.yaml` loaded at startup.
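A minimal sketch of such a startup-loaded registry. JSON is shown to keep the sketch dependency-free; a `models.yaml` via pyyaml would work identically. The field names are assumptions, not a settled schema:

```python
import json
from functools import lru_cache

# Example registry file contents (one entry per model):
# {
#   "openai/gpt-4o-mini": {"toolcall": true, "reasoning": false,
#                          "cost_per_1m_in": 0.15, "max_tokens": 128000}
# }


@lru_cache(maxsize=1)
def load_model_registry(path: str) -> dict:
    """Parse the registry file; cached so it is read once per process."""
    with open(path) as f:
        return json.load(f)


def supports_tools(registry: dict, model_id: str) -> bool:
    """Capability lookup; unknown models default to no tool support."""
    return registry.get(model_id, {}).get("toolcall", False)
```

Defaulting unknown models to `toolcall: False` makes the guard from 3b fail safe when the registry lags behind a provider's catalog.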
2. **LiteLLM Proxy sidecar**: If Voyage gains multi-user production deployment, running LiteLLM as a proxy sidecar gives centralized rate limiting, key management, and an admin dashboard. Not warranted for the current self-hosted single/few-user deployment.

3. **WSGI→ASGI migration**: Already documented as out-of-scope, but remains the long-term fix for event-loop fragility (see [opencode-zen-connection-debug.md](opencode-zen-connection-debug.md#3-significant-wsgi--async-event-loop-per-request)).

---
## 4. Why NOT Replace LiteLLM

| Concern | Reality |
|---|---|
| "LiteLLM is too heavy" | It's a pip dependency (~40MB installed). No runtime sidecar. Same weight class as Django itself. |
| "We could use provider SDKs directly" | Each provider has different streaming formats, tool-call schemas, and error types. LiteLLM normalizes all of this. Reimplementing costs weeks per provider. |
| "OpenCode doesn't use LiteLLM" | OpenCode is TypeScript + Vercel AI SDK, with ~20 bundled `@ai-sdk/*` provider packages. The Python equivalent IS LiteLLM. |
| "LiteLLM has bugs" | All of Voyage's issues are in our integration layer (no retries, no capability checks, hardcoded models), not in LiteLLM itself. |
---

## Cross-references

- See [Research: LiteLLM provider catalog](litellm-zen-provider-catalog.md)
- See [Research: OpenCode Zen connection debug](opencode-zen-connection-debug.md)
- See [Plan: Travel agent context + models](../plans/travel-agent-context-and-models.md)
- See [Decisions: Critic Gate](../decisions.md#critic-gate-travel-agent-context--models-follow-up)