Files

alex wiesner c4d39f2812 changes

2026-03-13 20:15:22 +00:00

10 KiB

Raw Blame History

title, type, permalink

title	type	permalink
provider-strategy	note	voyage/research/provider-strategy

Research: Multi-Provider Strategy for Voyage AI Chat

Date: 2026-03-09 Researcher: researcher agent Status: Complete

Summary

Investigated how OpenCode, OpenClaw-like projects, and LiteLLM-based production systems handle multi-provider model discovery, auth, rate-limit resilience, and tool-calling compatibility. Assessed whether replacing LiteLLM is warranted for Voyage.

Bottom line: Keep LiteLLM, harden it. Replacing LiteLLM would be a multi-week migration with negligible user-facing benefit. LiteLLM already solves the hard problems (100+ provider SDKs, streaming, tool-call translation). Voyage's issues are in the integration layer, not in LiteLLM itself.

1. Pattern Analysis: How Projects Handle Multi-Provider

1a. Dynamic Model Discovery

Project	Approach	Notes
OpenCode	Static registry from `models.dev` (JSON database), merged with user config, filtered by env/auth presence	No runtime API calls to providers for discovery; curated model metadata (capabilities, cost, limits) baked in
Ragflow	Hardcoded `SupportedLiteLLMProvider` enum + per-provider model lists	Similar to Voyage's current approach
daily_stock_analysis	`litellm.Router` model_list config + `fallback_models` list from config file	Runtime fallback, not runtime discovery
Onyx	`LLMProvider` DB model + admin UI for model configuration	DB-backed, admin-managed
LiteLLM Proxy	YAML config `model_list` with deployment-level params	Static config, hot-reloadable
Voyage (current)	`CHAT_PROVIDER_CONFIG` dict + hardcoded `models()` per provider + OpenAI API `client.models.list()` for OpenAI only	Mixed: one provider does live discovery, rest are hardcoded

Key insight: No production project does universal runtime model discovery across all providers. OpenCode — the most sophisticated — uses a curated static database (models.dev) with provider/model metadata including capability flags (toolcall, reasoning, streaming). This is the right pattern for Voyage.

1b. Provider Auth Handling

Project	Approach
OpenCode	Multi-source: env vars → `Auth.get()` (stored credentials) → config file → plugin loaders; per-provider custom auth (AWS chains, Google ADC, OAuth)
LiteLLM Router	`api_key` per deployment in model_list; env var fallback
Cognee	Rate limiter context manager wrapping LiteLLM calls
Voyage (current)	Per-user encrypted `UserAPIKey` DB model + instance-level `VOYAGE_AI_API_KEY` env fallback; key fetched per-request

Voyage's approach is sound. Per-user DB-stored keys with instance fallback matches the self-hosted deployment model. No change needed.

1c. Rate-Limit Fallback / Retry

Project	Approach
LiteLLM Router	Built-in: `num_retries`, `fallbacks` (cross-model), `allowed_fails` + `cooldown_time`, `RetryPolicy` (per-exception-type retry counts), `AllowedFailsPolicy`
daily_stock_analysis	`litellm.Router` with `fallback_models` list + multi-key support (rotate API keys on rate limit)
Cognee	`tenacity` retry decorator with `wait_exponential_jitter` + LiteLLM rate limiter
Suna	LiteLLM exception mapping → structured error processor
Voyage (current)	Zero retries. Single attempt. `_safe_error_payload()` maps exceptions to user messages but does not retry.

This is Voyage's biggest gap. Every other production system has retry logic. LiteLLM has this built in — Voyage just isn't using it.

1d. Tool-Calling Compatibility

Project	Approach
OpenCode	`capabilities.toolcall` boolean per model in `models.dev` database; models without tool support are filtered from agentic use
LiteLLM	`litellm.supports_function_calling(model=)` runtime check; `get_supported_openai_params(model=)` for param filtering
PraisonAI	`litellm.supports_function_calling()` guard before tool dispatch
open-interpreter	Same `litellm.supports_function_calling()` guard
Voyage (current)	No tool-call capability check. `AGENT_TOOLS` always passed. Reasoning models excluded from `opencode_zen` list by critic gate (manual).

Actionable gap. litellm.supports_function_calling(model=) exists and should be used before passing tools kwarg.

2. Architecture Options Comparison

Option	Description	Effort	Risk	Benefit
A. Keep LiteLLM, harden	Add Router for retry/fallback, add `supports_function_calling` guard, curate model lists with capability metadata	Low (1-2 sessions)	Low — incremental changes to existing working code	Retry resilience, tool-call safety, zero migration
B. Hybrid: direct SDK for some	Use `@ai-sdk/*` packages (like OpenCode) for primary providers, LiteLLM for others	High (1-2 weeks)	High — new TS→Python SDK mismatch, dual streaming paths, test surface explosion	Finer control per provider; no real benefit for Django backend
C. Replace LiteLLM entirely	Build custom provider abstraction or adopt Vercel AI SDK (TypeScript-only)	Very High (3-4 weeks)	Very High — rewrite streaming, tool-call translation, error mapping for each provider	Only makes sense if moving to full-stack TypeScript
D. LiteLLM Proxy (sidecar)	Run LiteLLM as a separate proxy service, call it via OpenAI-compatible API	Medium (2-3 days)	Medium — new Docker service, config management, latency overhead	Centralized config, built-in admin UI, but overkill for single-user self-hosted

3. Recommendation

Immediate (this session / next session): Option A — Harden LiteLLM

Specific code-level adaptations:

3a. Add `litellm.Router` for retry + fallback (highest impact)

Replace bare litellm.acompletion() with litellm.Router.acompletion():

# llm_client.py — new module-level router
import litellm
from litellm.router import RetryPolicy

_router = None

def _get_router():
    global _router
    if _router is None:
        _router = litellm.Router(
            model_list=[],  # empty — we use router for retry/timeout only
            num_retries=2,
            timeout=60,
            retry_policy=RetryPolicy(
                AuthenticationErrorRetries=0,
                RateLimitErrorRetries=2,
                TimeoutErrorRetries=1,
                BadRequestErrorRetries=0,
            ),
        )
    return _router

However: LiteLLM Router requires models pre-registered in model_list. For Voyage's dynamic per-user-key model, the simpler approach is:

# In stream_chat_completion, add retry params to acompletion:
response = await litellm.acompletion(
    **completion_kwargs,
    num_retries=2,
    request_timeout=60,
)

LiteLLM's acompletion() accepts num_retries directly — no Router needed.

Files: backend/server/chat/llm_client.py line 418 (add num_retries=2, request_timeout=60)

3b. Add tool-call capability guard

# In stream_chat_completion, before building completion_kwargs:
effective_model = model or provider_config["default_model"]
if tools and not litellm.supports_function_calling(model=effective_model):
    # Strip tools — model doesn't support them
    tools = None
    logger.warning("Model %s does not support function calling; tools stripped", effective_model)

Files: backend/server/chat/llm_client.py around line 397

3c. Curate model lists with tool-call metadata in `models()` endpoint

Instead of returning bare string lists, return objects with capability info:

# In ChatProviderCatalogViewSet.models():
if provider in ["opencode_zen"]:
    return Response({"models": [
        {"id": "openai/gpt-5-nano", "supports_tools": True},
        {"id": "openai/gpt-4o-mini", "supports_tools": True},
        {"id": "openai/gpt-4o", "supports_tools": True},
        {"id": "anthropic/claude-sonnet-4-20250514", "supports_tools": True},
        {"id": "anthropic/claude-3-5-haiku-20241022", "supports_tools": True},
    ]})

Files: backend/server/chat/views/__init__.py — models() action. Frontend loadModelsForProvider() would need minor update to handle objects.

3d. Fix `day_suggestions.py` hardcoded model

Line 194 uses model="gpt-4o-mini" — doesn't respect provider config or user selection:

# day_suggestions.py line 193-194
response = litellm.completion(
    model="gpt-4o-mini",  # BUG: ignores provider config

Should use provider_config default or user-selected model.

Files: backend/server/chat/views/day_suggestions.py line 194

Long-term (future sessions)

Adopt models.dev-style curated database: OpenCode's approach of maintaining a JSON/YAML model registry with capabilities, costs, and limits is superior to hardcoded lists. Could be a YAML file in backend/server/chat/models.yaml loaded at startup.
LiteLLM Proxy sidecar: If Voyage gains multi-user production deployment, running LiteLLM as a proxy sidecar gives centralized rate limiting, key management, and an admin dashboard. Not warranted for current self-hosted single/few-user deployment.
WSGI→ASGI migration: Already documented as out-of-scope, but remains the long-term fix for event loop fragility (see opencode-zen-connection-debug.md).

4. Why NOT Replace LiteLLM

Concern	Reality
"LiteLLM is too heavy"	It's a pip dependency (~40MB installed). No runtime sidecar. Same weight as Django itself.
"We could use provider SDKs directly"	Each provider has different streaming formats, tool-call schemas, and error types. LiteLLM normalizes all of this. Reimplementing costs weeks per provider.
"OpenCode doesn't use LiteLLM"	OpenCode is TypeScript + Vercel AI SDK. It has ~20 bundled `@ai-sdk/*` provider packages. The Python equivalent IS LiteLLM.
"LiteLLM has bugs"	All Voyage's issues are in our integration layer (no retries, no capability checks, hardcoded models), not in LiteLLM itself.

10 KiB Raw Blame History