Product Requirements Document
FanCode AI Recommendation Engine
Overview
Build a simple, scalable, AI-assisted recommendation engine that personalizes FanCode's home/feed using:
- AI metadata auto-extracted per video (sport, teams, players, tags + confidences) and stored in one row (videos) with JSON fields
- Lightweight user affinity scores for sports, teams, players (0–1)
- A transparent scoring formula with fixed weights, freshness & popularity boosts, and a small exploration quota
Problem
Generic Feed Experience
Users with different interests (single-sport loyalists vs multi-sport samplers) currently see generic or poorly mixed feeds.
Manual Tagging Limitations
Manual tagging of videos is infeasible → low recall/precision for player/team-level personalization.
System Limitations
Existing systems often overfit to one sport/team, under-explore, or fail with cold/returning users.
Objectives
Priority 1 (P1)
Ship a maintainable, SQL-friendly recommendation engine in ≤ 8 weeks
Achieve +15% CTR on the Recommended row and +10% avg. session watch time
Priority 2 (P2)
Handle cold start & returning users gracefully
Keep infra simple: micro-batch (10–15 min) affinity recomputes; hourly Top-N generation
Constraints
SQL-First Architecture
Primary datastore = SQL (PostgreSQL / Snowflake / BigQuery). We'll use JSON columns inside SQL.
Real-time Limitations
No heavy real-time model serving in v1; real-time = only tiny re-ranks/overrides (e.g., live match).
Performance Target
Serve-time latency target: <200 ms for fetching and assembling the slate (Top-N is precomputed).
Personas
Rohan Sharma
Casual Cricket Fan • 27, Mumbai, SWE
Watches: Only cricket, prefers highlights & interviews
Goal: Quick, relevant cricket snippets
Pain: Non-cricket clutter
Ananya Verma
Multi-Sport Enthusiast • 32, Bangalore, Marketing Manager
Watches: Cricket, football, F1
Goal: Balanced feed across sports
Pain: One sport dominating
Karan Singh
New User • 22, Delhi, Student
Watches: No history; explores trending clips
Goal: Get hooked fast with exciting content
Pain: Irrelevant feed initially
Meera Iyer
Team Loyalist • 29, Chennai, HR
Watches: Hardcore CSK & Dhoni fan
Goal: All CSK/Dhoni content surfaced
Pain: Cross-sport noise during IPL
Use Cases
Personalized Home Feed
Show my top sport/team/player content first.
Cold Start Feed
Show trending + fresh content until the system has learned my preferences.
Cross-Sport Balance
If I watch cricket & football, mix them proportionally.
Live Override
If my fav team is playing live, put it #1.
Exploration
Give me some fresh/trending/new-player stuff (~20%) to keep it interesting.
User Journey
Video Upload → AI Metadata
AI extracts sport, teams, players, and tags, each with a confidence; writes them to videos.
User Browses & Watches
Every interaction is logged in user_events.
Micro-batch Affinity Refresh
Every 10–15 mins, scores in user_affinity_* are updated.
Hourly Top-N Build
Join affinities + video metadata, compute FinalScore, persist top 100–200 in user_topn_recos.
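The last part of this step, ranking scored candidates and keeping the best N per user, can be sketched as below. This is a minimal Python illustration under assumptions, not the production job (which this PRD expects to run in SQL): `build_topn` and its input shape are hypothetical names, and N = 200 is simply the upper end of the 100–200 range stated above.

```python
import heapq


def build_topn(scored, n=200):
    """Keep the n highest-scoring videos for one user.

    scored: iterable of (video_id, final_score) pairs (hypothetical shape).
    Returns row dicts shaped like user_topn_recos, with a 1-based rank.
    """
    top = heapq.nlargest(n, scored, key=lambda pair: pair[1])
    return [
        {"video_id": video_id, "final_score": score, "rank": rank}
        for rank, (video_id, score) in enumerate(top, start=1)
    ]
```

`heapq.nlargest` avoids sorting the full candidate list when only a small N is kept.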
Serve
On app open, read user_topn_recos, apply tiny real-time tweaks (e.g., live boost, device/time-of-day), interleave 20% exploration, render.
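One way to interleave the 20% exploration share at serve time is to give every fifth tile to the exploration pool. The sketch below assumes that fixed placement (the PRD fixes only the share, not the positions); `interleave` and its parameters are hypothetical, and de-duplication between the two pools is left out.

```python
def interleave(personalized, exploration, slate_size=10, explore_every=5):
    """Fill a slate where every `explore_every`-th tile comes from exploration.

    With explore_every=5, 1 in 5 tiles (20%) is an exploration item.
    If one pool runs dry, the other pool fills the slot.
    """
    slate = []
    p_iter, e_iter = iter(personalized), iter(exploration)
    for position in range(1, slate_size + 1):
        primary, fallback = (
            (e_iter, p_iter) if position % explore_every == 0 else (p_iter, e_iter)
        )
        item = next(primary, None)
        if item is None:
            item = next(fallback, None)
        if item is None:
            break  # both pools exhausted
        slate.append(item)
    return slate
```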
Release Plan
Phase 0 – Foundations
Week 0–2
- Define schemas, constants, and jobs
- Implement AI metadata writer → videos JSON fields
Phase 1 – MVP
Week 3–6
- Implement affinity updater (micro-batch)
- Implement Top-N scorer & persister
- Ship Home feed using Top-N + exploration mix
- Metrics: CTR, watch time
Phase 2 – Hardening
Week 7–10
- Add decay, skip suppression, diversity caps
- Live override logic
- A/B test weight tweaks
Phase 3 – Growth
Week 11+
- Add merch recos with same scores
- Consider RL/bandits or learned weights
Features
9.1 AI Metadata Extraction (Single Object)
What: sport_id, format, teams[], players[], tags[] (each with confidence 0–1), popularity_score
Where: videos table (JSON arrays)
When: On video ingestion
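To make the single-object shape concrete, here is a minimal sketch of how an ingestion step might package the extracted metadata into one videos row. `VideoMetadata` and `to_video_row` are hypothetical names, and serializing the JSON columns as strings is an assumption; the actual writer and SQL driver are not specified in this document.

```python
import json
from dataclasses import dataclass, field


@dataclass
class VideoMetadata:
    """AI-extracted metadata for one video (hypothetical container)."""
    sport_id: int
    format: str  # 'highlight' | 'interview' | 'fullmatch'
    teams: list = field(default_factory=list)    # [{"team_id": 11, "confidence": 0.95}]
    players: list = field(default_factory=list)  # [{"player_id": 101, "confidence": 0.92}]
    tags: list = field(default_factory=list)     # [{"tag": "Last Over", "confidence": 0.8}]
    popularity_score: float = 0.0


def to_video_row(video_id, meta):
    """Build column values for one videos row; JSON columns become strings."""
    for group in (meta.teams, meta.players, meta.tags):
        for item in group:
            # Confidences are defined as 0-1; reject anything outside that range.
            if not 0.0 <= item["confidence"] <= 1.0:
                raise ValueError(f"confidence out of range: {item}")
    return {
        "video_id": video_id,
        "sport_id": meta.sport_id,
        "format": meta.format,
        "teams": json.dumps(meta.teams),
        "players": json.dumps(meta.players),
        "tags": json.dumps(meta.tags),
        "popularity_score": meta.popularity_score,
    }
```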
9.2 Affinity Scores (Sports/Teams/Players)
Stored in: user_affinity_sport, user_affinity_team, user_affinity_player
Updated: Every 10–15 mins from user_events
Capped storage: Top 5 teams & top 5 players per user
9.3 Scoring Engine
Formula: FinalScore = (weighted sum of user affinities × AI confidences) × FreshnessBoost × TrendingBoost
Exploration quota: 20% of visible tiles
9.4 Diversity & Suppression
- Max 3 items per team in a row
- Items skipped twice in 72h → suppressed from top 20 for 7 days
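Both rules can be sketched as small filters over an already-ranked list. The sketch rests on two labeled assumptions: "per team in a row" is read as at most 3 items per team within one feed row, and the 7-day suppression window is taken to start at the second skip. `cap_per_team` and `is_suppressed` are hypothetical helpers.

```python
from collections import Counter
from datetime import timedelta


def cap_per_team(ranked, max_per_team=3):
    """ranked: list of (video_id, [team_ids]) in score order.

    Drops an item once any of its teams already has max_per_team items kept.
    """
    counts = Counter()
    kept = []
    for video_id, team_ids in ranked:
        if any(counts[t] >= max_per_team for t in team_ids):
            continue
        kept.append(video_id)
        counts.update(team_ids)
    return kept


def is_suppressed(skip_times, now):
    """True if two skips fell within 72h of each other and the second of
    them happened less than 7 days before `now` (assumed window start)."""
    skips = sorted(skip_times)
    for earlier, later in zip(skips, skips[1:]):
        within_72h = later - earlier <= timedelta(hours=72)
        still_active = now - later < timedelta(days=7)
        if within_72h and still_active:
            return True
    return False
```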
9.5 Live & Contextual Overrides
- If a favourite team/player is live → tile #1
- (Optional) slight time-of-day/device boosts in app layer
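The live override is a small serve-time re-rank rather than a rescoring step. A minimal sketch, assuming the app layer already knows which videos are live and which teams they involve; `apply_live_override` and its inputs are hypothetical, and only teams (not players) are handled here for brevity.

```python
def apply_live_override(slate, live_video_teams, favourite_teams):
    """Move the first live video involving a favourite team to tile #1.

    slate: list of video_ids in rank order.
    live_video_teams: dict of live video_id -> set of team_ids.
    favourite_teams: set of the user's favourite team_ids.
    """
    for video_id, teams in live_video_teams.items():
        if teams & favourite_teams:
            # Put the live item first and drop any duplicate further down.
            return [video_id] + [v for v in slate if v != video_id]
    return list(slate)
```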
Success Metrics
Primary Metrics
+15% CTR on the Recommended row
+10% avg. session watch time (increased engagement)
+10% 7-day return rate (user retention)
Secondary Metrics
≥70% interaction rate (top 10 items watched ≥30%)
Exploration CTR ≥ 60% of personalized CTR
Appendix A — Scoring Logic
A1) Storage Schema (SQL, single-object videos)
-- One row per video, with AI metadata in JSON columns
CREATE TABLE videos (
  video_id BIGINT PRIMARY KEY,
  title VARCHAR,
  description TEXT,
  sport_id INT,
  format TEXT, -- 'highlight' | 'interview' | 'fullmatch'
  teams JSON, -- [{"team_id":11,"confidence":0.95}, ...]
  players JSON, -- [{"player_id":101,"confidence":0.92}, ...]
  tags JSON, -- [{"tag":"Last Over","confidence":0.8}, ...]
  published_at TIMESTAMP,
  popularity_score FLOAT, -- 0.0–1.0
  duration INT
);
-- One row per user interaction
CREATE TABLE user_events (
  event_id BIGINT PRIMARY KEY,
  user_id BIGINT,
  video_id BIGINT,
  event_type TEXT, -- 'watch','like','skip','search','follow'
  watch_pct FLOAT, -- 0–1
  event_ts TIMESTAMP
);
-- Affinity scores (0–1) per user and entity
CREATE TABLE user_affinity_sport (user_id BIGINT, sport_id INT, score FLOAT, PRIMARY KEY(user_id, sport_id));
CREATE TABLE user_affinity_team (user_id BIGINT, team_id INT, score FLOAT, PRIMARY KEY(user_id, team_id));
CREATE TABLE user_affinity_player (user_id BIGINT, player_id INT, score FLOAT, PRIMARY KEY(user_id, player_id));
-- Precomputed hourly Top-N per user
CREATE TABLE user_topn_recos (user_id BIGINT, video_id BIGINT, final_score FLOAT, rank INT, batch_ts TIMESTAMP,
  PRIMARY KEY(user_id, video_id));
A2) Score Ranges & Initialization
| Thing | Range | Default (cold) |
|---|---|---|
| Sport affinity | 0–1 | 0.20 |
| Team affinity (top 5) | 0–1 | 0.00 |
| Player affinity (top 5) | 0–1 | 0.00 |
A3) Event → Delta Table
Clamp to [0,1] after update.
| Event | Sport Δ | Team Δ | Player Δ |
|---|---|---|---|
| Watch ≥70% | +0.05 | +0.05 | +0.03 |
| Watch 30–70% | +0.02 | +0.03 | +0.015 |
| Watch <30% | −0.02 | −0.02 | −0.01 |
| Like/Favorite | +0.05 | +0.05 | +0.03 |
| Search/Follow | +0.05 | +0.05 | +0.05 |
| Skip twice / 72h | −0.05 | −0.05 | −0.03 |
| Rewatch (≥60% again) | +0.02 | +0.03 | +0.02 |
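The delta table translates directly into a lookup plus a clamp. The sketch below is a minimal Python illustration: the event keys (`watch_high`, `skip_twice`, and so on) are hypothetical labels for the table rows, and detecting "skip twice / 72h" or a rewatch from raw user_events is assumed to happen upstream.

```python
# (sport delta, team delta, player delta) per event row in the table above
DELTAS = {
    "watch_high": (0.05, 0.05, 0.03),      # Watch >= 70%
    "watch_mid": (0.02, 0.03, 0.015),      # Watch 30-70%
    "watch_low": (-0.02, -0.02, -0.01),    # Watch < 30%
    "like": (0.05, 0.05, 0.03),            # Like/Favorite
    "search_follow": (0.05, 0.05, 0.05),   # Search/Follow
    "skip_twice": (-0.05, -0.05, -0.03),   # Skip twice / 72h
    "rewatch": (0.02, 0.03, 0.02),         # Rewatch (>= 60% again)
}


def watch_bucket(watch_pct):
    """Map a watch fraction (0-1) to its row in the delta table."""
    if watch_pct >= 0.70:
        return "watch_high"
    if watch_pct >= 0.30:
        return "watch_mid"
    return "watch_low"


def apply_event(scores, event_key):
    """scores: {'sport': s, 'team': t, 'player': p}; returns updated copy,
    clamped to [0, 1] after the delta is applied."""
    deltas = DELTAS[event_key]
    return {
        level: min(1.0, max(0.0, scores[level] + delta))
        for level, delta in zip(("sport", "team", "player"), deltas)
    }
```

For a cold user (0.20 / 0.00 / 0.00), one watch at ≥70% moves the scores to 0.25 / 0.05 / 0.03.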
Decay: every 15 days without a positive event for that entity:
score = max(0.02, score * 0.9)
Cap storage: keep top 5 teams / top 5 players per user. If a new entity's score surpasses the lowest, replace it.
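Decay and the top-5 cap can each be expressed in a line or two. This is a sketch with hypothetical helper names (`decay`, `keep_top_k`); scheduling the decay every 15 days without a positive event is left to the batch job and not shown.

```python
import heapq


def decay(score, factor=0.9, floor=0.02):
    """One decay step: score = max(0.02, score * 0.9)."""
    return max(floor, score * factor)


def keep_top_k(scores, k=5):
    """scores: dict of entity_id -> score. Keeps only the k highest,
    so a new entity displaces the lowest one once it scores higher."""
    return dict(heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]))
```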
A4) FinalScore Formula
For user u, video v:
Find components (note JSON parsing):
sport_aff = S_sport[u, sport_id(v)]
team_aff = max over teams(v): S_team[u, team_id] * team_confidence
player_aff = max over players(v): S_player[u, player_id] * player_confidence
BaseScore:
BaseScore = 0.50 * sport_aff
+ 0.30 * team_aff
+ 0.20 * player_aff
Multipliers:
FreshnessBoost (by video age)
≤ 2 days → 1.20
> 2 to 14 days → 1.00
> 14 days → 0.90
TrendingBoost
popularity_score ≥ 0.90 → ×1.10 multiplicative (default)
Alternative: +0.10 additive on BaseScore; use one or the other, not both
FinalScore:
FinalScore = BaseScore * FreshnessBoost * (popularity_score >= 0.90 ? 1.10 : 1.00)
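The formula can be written as a short Python function for checking values by hand. This is a sketch, not the production scorer: function and parameter names are hypothetical, unknown teams or players are assumed to have affinity 0, and the freshness boundaries are read as ≤2 days, ≤14 days, and beyond.

```python
def freshness_boost(age_days):
    """Freshness multiplier by video age in days (boundaries assumed inclusive)."""
    if age_days <= 2:
        return 1.20
    if age_days <= 14:
        return 1.00
    return 0.90


def best_match(entities, id_key, affinities):
    """Max over entities of user affinity x AI confidence (0 if none match)."""
    return max(
        (affinities.get(e[id_key], 0.0) * e["confidence"] for e in entities),
        default=0.0,
    )


def final_score(sport_aff, teams, players, team_affs, player_affs,
                age_days, popularity_score):
    """FinalScore = BaseScore * FreshnessBoost * TrendingBoost."""
    team_aff = best_match(teams, "team_id", team_affs)
    player_aff = best_match(players, "player_id", player_affs)
    base = 0.50 * sport_aff + 0.30 * team_aff + 0.20 * player_aff
    trending = 1.10 if popularity_score >= 0.90 else 1.00
    return base * freshness_boost(age_days) * trending
```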
Exploration uplift (only for the 20% exploration slots):
FinalScore_explore = FinalScore * 1.05
A6) Worked Example
User u affinities:
- S_sport[Cricket] = 0.80
- S_team[CSK] = 0.70
- S_player[Dhoni] = 0.60
Video v (AI metadata):
- sport_id = Cricket
- teams = [{"team_id": 11, "confidence": 0.95}, {"team_id": 12, "confidence": 0.90}]
- players = [{"player_id": 101, "confidence": 0.92}, {"player_id": 102, "confidence": 0.85}]
- published_at = 1 day ago
- popularity_score = 0.92
Compute:
sport_aff = 0.80
team_aff = max( 0.70 * 0.95 , 0.10 * 0.90 ) = 0.665
player_aff = max( 0.60 * 0.92 , 0.20 * 0.85 ) = 0.552
BaseScore = 0.5*0.80 + 0.3*0.665 + 0.2*0.552
= 0.40 + 0.1995 + 0.1104
= 0.7099
FinalScore = 0.7099 * 1.20 (fresh) * 1.10 (trending)
≈ 0.937
Appendix B — Logic by User State
B1) First-time user (no history)
- Init: S_sport = 0.20 for all major sports, teams/players = 0
- Feed mix: 50% trending, 50% fresh multi-sport
- Switch to personalization after 3 videos watched ≥70% (or 5 total events)
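The cold-start rules above amount to a default initialization plus a simple threshold check. A minimal sketch, assuming events arrive as dicts with the user_events fields; `cold_start_affinities` and `is_cold` are hypothetical helpers, and which sports count as "major" is not defined in this document.

```python
def cold_start_affinities(major_sport_ids, default=0.20):
    """Initial sport affinities for a new user; teams/players start empty (0)."""
    return {sport_id: default for sport_id in major_sport_ids}


def is_cold(events):
    """True until the user has 3 videos watched >= 70% or 5 events in total.

    events: list of dicts with 'event_type' and optional 'watch_pct' (0-1).
    """
    strong_watches = sum(
        1 for e in events
        if e["event_type"] == "watch" and (e.get("watch_pct") or 0.0) >= 0.70
    )
    return not (strong_watches >= 3 or len(events) >= 5)
```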
B2) Single-sport loyalist
- High S_sport[Cricket] (e.g., ≥0.7)
- 80%+ of feed = that sport
- Exploration stays within that sport first, then occasionally cross-sport
B3) Multi-sport
- If Cricket=0.7, Football=0.6, split rows roughly 55/45
- Within each sport, use top teams/players
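A proportional split can be derived by normalizing the sport affinities. The sketch below assumes simple linear normalization, which the PRD does not specify; `sport_shares` is a hypothetical helper. For Cricket=0.7 and Football=0.6 it gives about 54/46, close to the rough 55/45 above.

```python
def sport_shares(sport_affinities):
    """Normalize sport affinities into feed shares that sum to 1.

    sport_affinities: dict of sport -> affinity (0-1).
    Returns an empty dict when all affinities are zero.
    """
    total = sum(sport_affinities.values())
    if total == 0:
        return {}
    return {sport: aff / total for sport, aff in sport_affinities.items()}
```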
B4) Team/Player loyalist
- If a team/player score ≥0.75, ensure at least 1 row (or 2 tiles in top 10) locked to that entity
- Live content for that entity → tile #1
B5) Returning after a long gap
- Scores decayed to ~0.4–0.5
- Blend trending + old interests; add temporary +0.1 boost on first few positive signals to relearn quickly
B6) Low-engagement user
- <3 watches in last 30 days: default to fresh + trending, with a light sport bias if any sport affinity exists
- After each event, affinity is recalculated and personalization ramps up
End of PRD