Files
voyage/.memory/research/auto-learn-preference-signals.md
alex wiesner c4d39f2812 changes
2026-03-13 20:15:22 +00:00

8.2 KiB
Raw Blame History

title, type, permalink
title type permalink
auto-learn-preference-signals note voyage/research/auto-learn-preference-signals

Research: Auto-Learn User Preference Signals

Purpose

Map all existing user data that could be aggregated into an automatic preference profile, without requiring manual input.

Signal Inventory

1. Location.category (FK → Category)

  • Model: adventures/models.py:Category — per-user custom categories (name, display_name, icon)
  • Signal: Top categories by count → dominant interest type (e.g. "hiking", "dining", "cultural")
  • Query: Location.objects.filter(user=user).values('category__name').annotate(cnt=Count('id')).order_by('-cnt')
  • Strength: HIGH — user-created categories are deliberate choices

2. Location.tags (ArrayField)

  • Model: adventures/models.py:Location.tagsArrayField(CharField(max_length=100))
  • Signal: Most frequent tags across all user locations → interest keywords
  • Query: Location.objects.filter(user=user).values_list('tags', flat=True).distinct() (used in tags_view.py)
  • Strength: MEDIUM-HIGH — tags are free-text user input

3. Location.rating (FloatField)

  • Model: adventures/models.py:Location.rating
  • Signal: Average rating + high-rated locations → positive sentiment for place types; filtering for visited + high-rated → strong preferences
  • Query: Location.objects.filter(user=user).aggregate(avg_rating=Avg('rating')) or breakdown by category
  • Strength: HIGH for positive signals (≥4.0); weak if rarely filled in

4. Location.description / Visit.notes (TextField)

  • Model: adventures/models.py:Location.description, Visit.notes
  • Signal: Free-text content for NLP keyword extraction (budget, adventure, luxury, cuisine words)
  • Query: Location.objects.filter(user=user).values_list('description', flat=True)
  • Strength: LOW (requires NLP to extract structured signals; many fields blank)

5. Lodging.type (LODGING_TYPES enum)

  • Model: adventures/models.py:Lodging.type — choices: hotel, hostel, resort, bnb, campground, cabin, apartment, house, villa, motel
  • Signal: Most frequently used lodging type → travel style indicator (e.g. "hostel" → budget; "resort/villa" → luxury; "campground/cabin" → outdoor)
  • Query: Lodging.objects.filter(user=user).values('type').annotate(cnt=Count('id')).order_by('-cnt')
  • Strength: HIGH — directly maps to trip_style field

6. Lodging.rating (FloatField)

  • Signal: Combined with lodging type, identifies preferred accommodation standards
  • Strength: MEDIUM

7. Transportation.type (TRANSPORTATION_TYPES enum)

  • Model: adventures/models.py:Transportation.type — choices: car, plane, train, bus, boat, bike, walking
  • Signal: Primary transport mode → mobility preference (e.g. mostly walking/bike → slow travel; lots of planes → frequent flyer)
  • Query: Transportation.objects.filter(user=user).values('type').annotate(cnt=Count('id')).order_by('-cnt')
  • Strength: MEDIUM

8. Activity.sport_type (SPORT_TYPE_CHOICES)

  • Model: adventures/models.py:Activity.sport_type — 60+ choices mapped to 10 SPORT_CATEGORIES in utils/sports_types.py
  • Signal: Activity categories user is active in → physical/adventure interests
  • Categories: running, walking_hiking, cycling, water_sports, winter_sports, fitness_gym, racket_sports, climbing_adventure, team_sports
  • Query: Already aggregated in stats_view.py:_get_activity_stats_by_category() — uses Activity.objects.filter(user=user).values('sport_type').annotate(count=Count('id'))
  • Strength: HIGH — objective behavioral data from Strava/Wanderer imports

9. VisitedRegion / VisitedCity (worldtravel)

  • Model: worldtravel/models.pyVisitedRegion(user, region) and VisitedCity(user, city) with country/subregion
  • Signal: Countries/regions visited → geographic preferences (beach vs. mountain vs. city; EU vs. Asia etc.)
  • Query: VisitedRegion.objects.filter(user=user).select_related('region__country') → country distribution
  • Strength: MEDIUM-HIGH — "where has this user historically traveled?" informs destination type

10. Collection metadata

  • Model: adventures/models.py:Collection — name, description, start/end dates
  • Signal: Collection names/descriptions may contain destination/theme hints; trip duration (end_date start_date) → travel pace; trip frequency (count, spacing) → travel cadence
  • Query: Collection.objects.filter(user=user).values('name', 'description', 'start_date', 'end_date')
  • Strength: LOW-MEDIUM (descriptions often blank; names are free-text)

11. Location.price / Lodging.price (MoneyField)

  • Signal: Average spend across locations/lodging → budget tier
  • Query: Location.objects.filter(user=user).aggregate(avg_price=Avg('price')) (requires djmoney amount field)
  • Strength: MEDIUM — but many records may have no price set

12. Location geographic clustering (lat/lon)

  • Signal: Country/region distribution of visited locations → geographic affinity
  • Already tracked: Location.country, Location.region, Location.city (FK, auto-geocoded)
  • Query: Location.objects.filter(user=user).values('country__name').annotate(cnt=Count('id')).order_by('-cnt')
  • Strength: HIGH

13. UserAchievement types

  • Model: achievements/models.py:UserAchievement — types: adventure_count, country_count
  • Signal: Milestone count → engagement level (casual vs. power user); high country_count → variety-seeker
  • Strength: LOW-MEDIUM (only 2 types currently)

14. ChatMessage content (user role)

  • Model: chat/models.py:ChatMessagerole, content
  • Signal: User messages in travel conversations → intent signals ("I love hiking", "looking for cheap food", "family-friendly")
  • Query: ChatMessage.objects.filter(conversation__user=user, role='user').values_list('content', flat=True)
  • Strength: MEDIUM — requires NLP; could be rich but noisy

Aggregation Patterns Already in Codebase

Pattern Location Reusability
Activity stats by category stats_view.py:_get_activity_stats_by_category() Direct reuse
All-tags union tags_view.py:ActivityTypesView.types() Direct reuse
VisitedRegion/City counts stats_view.py:counts() Direct reuse
Multi-user preference merge llm_client.py:get_aggregated_preferences() Partial reuse
Category-filtered location count serializers.py:location_count Pattern reference
Location queryset scoping location_view.py:get_queryset() Standard pattern

Proposed Auto-Profile Fields from Signals

Target Field Primary Signals Secondary Signals
cuisines Location.tags (cuisine words), Location.category (dining) Location.description NLP
interests Activity.sport_type categories, Location.category top-N Location.tags frequency, VisitedRegion types
trip_style Lodging.type top (luxury/budget/outdoor), Transportation.type, Activity sport categories Location.rating Avg, price signals
notes (not auto-derived — keep manual only)

Where to Implement

New function target: integrations/views/recommendation_profile_view.py or a new integrations/utils/auto_profile.py

Suggested function signature:

def build_auto_preference_profile(user) -> dict:
    """
    Returns {cuisines, interests, trip_style} inferred from user's travel history.
    Fields are non-destructive suggestions, not overrides of manual input.
    """

New API endpoint target: POST /api/integrations/recommendation-preferences/auto-learn/
ViewSet action: @action(detail=False, methods=['post'], url_path='auto-learn') on UserRecommendationPreferenceProfileViewSet

Integration Point

get_system_prompt() in chat/llm_client.py already consumes UserRecommendationPreferenceProfile — auto-learned values flow directly into AI context with zero additional changes needed there.

See: knowledge.md — User Recommendation Preference Profile See: plans/ai-travel-agent-redesign.md — WS2