Files
voyage/.memory/research/auto-learn-preference-signals.md
alex wiesner c4d39f2812 changes
2026-03-13 20:15:22 +00:00

136 lines
8.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: auto-learn-preference-signals
type: note
permalink: voyage/research/auto-learn-preference-signals
---
# Research: Auto-Learn User Preference Signals
## Purpose
Map all existing user data that could be aggregated into an automatic preference profile, without requiring manual input.
## Signal Inventory
### 1. Location.category (FK → Category)
- **Model**: `adventures/models.py:Category` — per-user custom categories (name, display_name, icon)
- **Signal**: Top categories by count → dominant interest type (e.g. "hiking", "dining", "cultural")
- **Query**: `Location.objects.filter(user=user).values('category__name').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: HIGH — user-created categories are deliberate choices
### 2. Location.tags (ArrayField)
- **Model**: `adventures/models.py:Location.tags``ArrayField(CharField(max_length=100))`
- **Signal**: Most frequent tags across all user locations → interest keywords
- **Query**: `Location.objects.filter(user=user).values_list('tags', flat=True).distinct()` (used in `tags_view.py`)
- **Strength**: MEDIUM-HIGH — tags are free-text user input
### 3. Location.rating (FloatField)
- **Model**: `adventures/models.py:Location.rating`
- **Signal**: Average rating + high-rated locations → positive sentiment for place types; filtering for visited + high-rated → strong preferences
- **Query**: `Location.objects.filter(user=user).aggregate(avg_rating=Avg('rating'))` or breakdown by category
- **Strength**: HIGH for positive signals (≥4.0); weak if rarely filled in
### 4. Location.description / Visit.notes (TextField)
- **Model**: `adventures/models.py:Location.description`, `Visit.notes`
- **Signal**: Free-text content for NLP keyword extraction (budget, adventure, luxury, cuisine words)
- **Query**: `Location.objects.filter(user=user).values_list('description', flat=True)`
- **Strength**: LOW (requires NLP to extract structured signals; many fields blank)
### 5. Lodging.type (LODGING_TYPES enum)
- **Model**: `adventures/models.py:Lodging.type` — choices: hotel, hostel, resort, bnb, campground, cabin, apartment, house, villa, motel
- **Signal**: Most frequently used lodging type → travel style indicator (e.g. "hostel" → budget; "resort/villa" → luxury; "campground/cabin" → outdoor)
- **Query**: `Lodging.objects.filter(user=user).values('type').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: HIGH — directly maps to trip_style field
### 6. Lodging.rating (FloatField)
- **Signal**: Combined with lodging type, identifies preferred accommodation standards
- **Strength**: MEDIUM
### 7. Transportation.type (TRANSPORTATION_TYPES enum)
- **Model**: `adventures/models.py:Transportation.type` — choices: car, plane, train, bus, boat, bike, walking
- **Signal**: Primary transport mode → mobility preference (e.g. mostly walking/bike → slow travel; lots of planes → frequent flyer)
- **Query**: `Transportation.objects.filter(user=user).values('type').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: MEDIUM
### 8. Activity.sport_type (SPORT_TYPE_CHOICES)
- **Model**: `adventures/models.py:Activity.sport_type` — 60+ choices mapped to 10 SPORT_CATEGORIES in `utils/sports_types.py`
- **Signal**: Activity categories user is active in → physical/adventure interests
- **Categories**: running, walking_hiking, cycling, water_sports, winter_sports, fitness_gym, racket_sports, climbing_adventure, team_sports
- **Query**: Already aggregated in `stats_view.py:_get_activity_stats_by_category()` — uses `Activity.objects.filter(user=user).values('sport_type').annotate(count=Count('id'))`
- **Strength**: HIGH — objective behavioral data from Strava/Wanderer imports
### 9. VisitedRegion / VisitedCity (worldtravel)
- **Model**: `worldtravel/models.py``VisitedRegion(user, region)` and `VisitedCity(user, city)` with country/subregion
- **Signal**: Countries/regions visited → geographic preferences (beach vs. mountain vs. city; EU vs. Asia etc.)
- **Query**: `VisitedRegion.objects.filter(user=user).select_related('region__country')` → country distribution
- **Strength**: MEDIUM-HIGH — "where has this user historically traveled?" informs destination type
### 10. Collection metadata
- **Model**: `adventures/models.py:Collection` — name, description, start/end dates
- **Signal**: Collection names/descriptions may contain destination/theme hints; trip duration (end_date start_date) → travel pace; trip frequency (count, spacing) → travel cadence
- **Query**: `Collection.objects.filter(user=user).values('name', 'description', 'start_date', 'end_date')`
- **Strength**: LOW-MEDIUM (descriptions often blank; names are free-text)
### 11. Location.price / Lodging.price (MoneyField)
- **Signal**: Average spend across locations/lodging → budget tier
- **Query**: `Location.objects.filter(user=user).aggregate(avg_price=Avg('price'))` (requires djmoney amount field)
- **Strength**: MEDIUM — but many records may have no price set
### 12. Location geographic clustering (lat/lon)
- **Signal**: Country/region distribution of visited locations → geographic affinity
- **Already tracked**: `Location.country`, `Location.region`, `Location.city` (FK, auto-geocoded)
- **Query**: `Location.objects.filter(user=user).values('country__name').annotate(cnt=Count('id')).order_by('-cnt')`
- **Strength**: HIGH
### 13. UserAchievement types
- **Model**: `achievements/models.py:UserAchievement` — types: `adventure_count`, `country_count`
- **Signal**: Milestone count → engagement level (casual vs. power user); high `country_count` → variety-seeker
- **Strength**: LOW-MEDIUM (only 2 types currently)
### 14. ChatMessage content (user role)
- **Model**: `chat/models.py:ChatMessage``role`, `content`
- **Signal**: User messages in travel conversations → intent signals ("I love hiking", "looking for cheap food", "family-friendly")
- **Query**: `ChatMessage.objects.filter(conversation__user=user, role='user').values_list('content', flat=True)`
- **Strength**: MEDIUM — requires NLP; could be rich but noisy
## Aggregation Patterns Already in Codebase
| Pattern | Location | Reusability |
|---|---|---|
| Activity stats by category | `stats_view.py:_get_activity_stats_by_category()` | Direct reuse |
| All-tags union | `tags_view.py:ActivityTypesView.types()` | Direct reuse |
| VisitedRegion/City counts | `stats_view.py:counts()` | Direct reuse |
| Multi-user preference merge | `llm_client.py:get_aggregated_preferences()` | Partial reuse |
| Category-filtered location count | `serializers.py:location_count` | Pattern reference |
| Location queryset scoping | `location_view.py:get_queryset()` | Standard pattern |
## Proposed Auto-Profile Fields from Signals
| Target Field | Primary Signals | Secondary Signals |
|---|---|---|
| `cuisines` | Location.tags (cuisine words), Location.category (dining) | Location.description NLP |
| `interests` | Activity.sport_type categories, Location.category top-N | Location.tags frequency, VisitedRegion types |
| `trip_style` | Lodging.type top (luxury/budget/outdoor), Transportation.type, Activity sport categories | Location.rating Avg, price signals |
| `notes` | (not auto-derived — keep manual only) | — |
## Where to Implement
**New function target**: `integrations/views/recommendation_profile_view.py` or a new `integrations/utils/auto_profile.py`
**Suggested function signature**:
```python
def build_auto_preference_profile(user) -> dict:
"""
Returns {cuisines, interests, trip_style} inferred from user's travel history.
Fields are non-destructive suggestions, not overrides of manual input.
"""
```
**New API endpoint target**: `POST /api/integrations/recommendation-preferences/auto-learn/`
**ViewSet action**: `@action(detail=False, methods=['post'], url_path='auto-learn')` on `UserRecommendationPreferenceProfileViewSet`
## Integration Point
`get_system_prompt()` in `chat/llm_client.py` already consumes `UserRecommendationPreferenceProfile` — auto-learned values
flow directly into AI context with zero additional changes needed there.
See: [knowledge.md — User Recommendation Preference Profile](../knowledge.md#user-recommendation-preference-profile)
See: [plans/ai-travel-agent-redesign.md — WS2](../plans/ai-travel-agent-redesign.md#ws2-user-preference-learning)