Product Requirements Document

FanCode AI Recommendation Engine

Aditya Shukla • July 2025 • FanCode


Overview

Build a simple, scalable, AI-assisted recommendation engine that personalizes FanCode's home/feed using:

AI metadata auto-extracted per video (sport, teams, players, and tags, each with a confidence) and stored as a single row in the videos table, with JSON fields
Lightweight user affinity scores for sports, teams, players (0–1)
A transparent scoring formula with fixed weights, freshness & popularity boosts, and a small exploration quota

Problem

Generic Feed Experience

Users with different interests (single-sport loyalists vs multi-sport samplers) currently see generic or poorly mixed feeds.

Manual Tagging Limitations

Manually tagging every video is infeasible, which leads to low recall and precision for player/team-level personalization.

System Limitations

Existing systems often overfit to one sport/team, under-explore, or fail with cold/returning users.

Objectives

Priority 1 (P1)

Ship a maintainable, SQL-friendly recommendation engine in ≤ 8 weeks

Achieve +15% CTR on the Recommended row and +10% avg. session watch time

Priority 2 (P2)

Handle cold start & returning users gracefully

Keep infra simple: micro-batch (10–15 min) affinity recomputes; hourly Top-N generation

Constraints

SQL-First Architecture

Primary datastore = SQL (PostgreSQL / Snowflake / BigQuery). We'll use JSON columns inside SQL.

Real-time Limitations

No heavy real-time model serving in v1; real-time = only tiny re-ranks/overrides (e.g., live match).

Performance Target

Serve-time latency target: <200 ms for fetching and assembling the slate (Top-N is precomputed).

Personas

Rohan Sharma

Casual Cricket Fan • 27, Mumbai, SWE

Watches: Only cricket, prefers highlights & interviews

Goal: Quick, relevant cricket snippets

Pain: Non-cricket clutter

Ananya Verma

Multi-Sport Enthusiast • 32, Bangalore, Marketing Manager

Watches: Cricket, football, F1

Goal: Balanced feed across sports

Pain: One sport dominating

Karan Singh

New User • 22, Delhi, Student

Watches: No history; explores trending clips

Goal: Get hooked fast with exciting content

Pain: Irrelevant feed initially

Meera Iyer

Team Loyalist • 29, Chennai, HR

Watches: Hardcore CSK & Dhoni fan

Goal: All CSK/Dhoni content surfaced

Pain: Cross-sport noise during IPL

Use Cases

Personalized Home Feed

Show my top sport/team/player content first.

Cold Start Feed

Show trending + fresh until you learn me.

Cross-Sport Balance

If I watch cricket & football, mix them proportionally.

Live Override

If my fav team is playing live, put it #1.

Exploration

Give me some fresh/trending/new-player stuff (~20%) to keep it interesting.

User Journey

Video Upload → AI Metadata

The AI extracts the sport, teams, players, and tags (each with a confidence) and writes them to the videos table.

User Browses & Watches

Every interaction is logged in user_events.

Micro-batch Affinity Refresh

Every 10–15 mins, scores in user_affinity_* are updated.

Hourly Top-N Build

Join affinities + video metadata, compute FinalScore, persist top 100–200 in user_topn_recos.
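A minimal Python sketch of the hourly Top-N step, assuming FinalScore has already been computed for each (user, video) pair. The function name `build_topn` and the default `n=200` (the upper end of the 100–200 range) are illustrative choices, not part of the spec; the returned dicts simply mirror the `user_topn_recos` columns.

```python
import heapq
from datetime import datetime, timezone


def build_topn(user_id, scored_videos, n=200, batch_ts=None):
    """Return user_topn_recos-style rows for the n highest-scoring videos.

    scored_videos: iterable of (video_id, final_score) pairs for one user.
    """
    batch_ts = batch_ts or datetime.now(timezone.utc)
    # heapq.nlargest avoids fully sorting the candidate pool.
    top = heapq.nlargest(n, scored_videos, key=lambda pair: pair[1])
    return [
        {"user_id": user_id, "video_id": video_id, "final_score": score,
         "rank": rank, "batch_ts": batch_ts}
        for rank, (video_id, score) in enumerate(top, start=1)
    ]


rows = build_topn(42, [(10, 0.51), (11, 0.94), (12, 0.70)], n=2)
print([(r["rank"], r["video_id"]) for r in rows])  # [(1, 11), (2, 12)]
```

In production the same step would more likely run as a SQL window function (rank over final_score per user); the Python form just makes the ordering and ranking explicit.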

Serve

On app open, read user_topn_recos, apply tiny real-time tweaks (e.g., live boost, device/time-of-day), interleave 20% exploration, render.

Release Plan

Phase 0 – Foundations

Week 0–2

• Define schemas, constants, and jobs
• Implement AI metadata writer → videos JSON fields

Phase 1 – MVP

Week 3–6

• Implement affinity updater (micro-batch)
• Implement Top-N scorer & persister
• Ship Home feed using Top-N + exploration mix
• Metrics: CTR, watch time

Phase 2 – Hardening

Week 7–10

• Add decay, skip suppression, diversity caps
• Live override logic
• A/B test weight tweaks

Phase 3 – Growth

Week 11+

• Add merch recos with same scores
• Consider RL/bandits or learned weights

Features

9.1 AI Metadata Extraction (Single Object)

What: sport_id, format, teams[], players[], tags[] (each with confidence 0–1), popularity_score

Where: videos table (JSON arrays)

When: On video ingestion

9.2 Affinity Scores (Sports/Teams/Players)

Stored in: user_affinity_sport, user_affinity_team, user_affinity_player

Updated: Every 10–15 mins from user_events

Capped storage: Top 5 teams & top 5 players per user

9.3 Scoring Engine

Formula: FinalScore = (weighted sum of user affinities × AI confidences) × FreshnessBoost × TrendingBoost

Exploration quota: 20% of visible tiles
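One way to realize the 20% quota is a fixed-position interleave. The sketch below, using a hypothetical helper `interleave`, reserves every 5th slot for exploration; the spec only fixes the ratio, so the slot pattern and the de-duplication behaviour are assumptions.

```python
from collections import deque


def interleave(personalized, exploration, explore_every=5):
    """Merge two ranked lists so every `explore_every`-th slot is exploration.

    explore_every=5 gives the 20% quota. If one list runs out, the other
    fills the remaining slots. Video IDs already placed are skipped.
    """
    queues = {"p": deque(personalized), "e": deque(exploration)}
    slate, seen = [], set()

    def take(key):
        q = queues[key]
        while q:
            video_id = q.popleft()
            if video_id not in seen:
                seen.add(video_id)
                return video_id
        return None

    while queues["p"] or queues["e"]:
        explore_slot = (len(slate) + 1) % explore_every == 0
        first, second = ("e", "p") if explore_slot else ("p", "e")
        video_id = take(first)
        if video_id is None:
            video_id = take(second)
        if video_id is None:
            break
        slate.append(video_id)
    return slate


print(interleave([1, 2, 3, 4, 5, 6, 7, 8], [101, 102]))
# [1, 2, 3, 4, 101, 5, 6, 7, 8, 102]
```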

9.4 Diversity & Suppression

• Max 3 items per team in a row

• Items skipped twice in 72h → suppressed from top 20 for 7 days
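The suppression rule can be sketched as two small Python helpers. Assumptions not stated in the spec: the 7-day window starts at the second skip, and suppressed items are re-inserted directly after position 20 rather than dropped. `is_suppressed` and `apply_suppression` are hypothetical names.

```python
from datetime import datetime, timedelta

SKIP_WINDOW = timedelta(hours=72)   # two skips inside this window trigger suppression
SUPPRESS_FOR = timedelta(days=7)    # assumption: counted from the second skip
PROTECTED_SLOTS = 20                # suppressed items may not appear in the top 20


def is_suppressed(skip_times, now):
    """True if two skips fall within 72h and the later one is less than 7 days old."""
    ts = sorted(t for t in skip_times if t <= now)
    # Adjacent pairs suffice: any qualifying pair implies an adjacent one
    # that ends on the same (later) skip.
    for earlier, later in zip(ts, ts[1:]):
        if later - earlier <= SKIP_WINDOW and now - later < SUPPRESS_FOR:
            return True
    return False


def apply_suppression(ranked_ids, suppressed, protected=PROTECTED_SLOTS):
    """Move suppressed IDs just below the protected top slots, keeping relative order."""
    kept = [v for v in ranked_ids if v not in suppressed]
    demoted = [v for v in ranked_ids if v in suppressed]
    return kept[:protected] + demoted + kept[protected:]


now = datetime(2025, 7, 10, 12, 0)
print(is_suppressed([now - timedelta(days=2), now - timedelta(days=1)], now))  # True
```

If fewer than 20 unsuppressed items exist, demoted items still land after all of them, which is the closest this sketch gets to honouring the rule.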

9.5 Live & Contextual Overrides

• If a favourite team/player is live → tile #1

• (Optional) slight time-of-day/device boosts in app layer
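A sketch of the live override applied at serve time. The `videos` schema has no live-status column, so the `is_live` flag here is hypothetical, and "favourite" is assumed to mean an affinity ≥ 0.75 (the threshold Appendix B4 uses for loyalists).

```python
FAVOURITE_THRESHOLD = 0.75  # assumption: reuses the B4 loyalist threshold


def apply_live_override(slate, videos, team_aff, player_aff,
                        threshold=FAVOURITE_THRESHOLD):
    """Move the highest-ranked live video featuring a favourite team/player to tile #1.

    videos: {video_id: metadata dict}; `is_live` is a hypothetical flag.
    team_aff / player_aff: the user's {entity_id: affinity score} dicts.
    """
    for i, video_id in enumerate(slate):
        meta = videos.get(video_id, {})
        if not meta.get("is_live"):
            continue
        fav_team = any(team_aff.get(t["team_id"], 0.0) >= threshold
                       for t in meta.get("teams", []))
        fav_player = any(player_aff.get(p["player_id"], 0.0) >= threshold
                         for p in meta.get("players", []))
        if fav_team or fav_player:
            return [video_id] + slate[:i] + slate[i + 1:]
    return list(slate)  # no qualifying live video: order unchanged
```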

Success Metrics

Primary Metrics

• +15% CTR on the Recommended row
• +10% avg. session watch time (engagement)
• +10% 7-day return rate (retention)

Secondary Metrics

• ≥70% interaction rate (top 10 items watched ≥30%)
• Exploration CTR ≥ 60% of personalized CTR

Appendix A — Scoring Logic

A1) Storage Schema (SQL, single-object videos)

-- One row per video + JSON metadata
-- PostgreSQL dialect; other warehouses may need type changes (e.g. VARIANT in Snowflake)
CREATE TABLE videos (
   video_id BIGINT PRIMARY KEY,
   title VARCHAR,
   description TEXT,
   sport_id INT,
   format TEXT,                         -- 'highlight' | 'interview' | 'fullmatch'
   teams JSON,                          -- [{"team_id":11,"confidence":0.95}, ...]
   players JSON,                        -- [{"player_id":101,"confidence":0.92}, ...]
   tags JSON,                           -- [{"tag":"Last Over","confidence":0.8}, ...]
   published_at TIMESTAMP,
   popularity_score FLOAT,              -- 0.0–1.0
   duration INT
);

CREATE TABLE user_events (
   event_id BIGINT PRIMARY KEY,
   user_id BIGINT,
   video_id BIGINT,
   event_type TEXT,                     -- 'watch','like','skip','search','follow'
   watch_pct FLOAT,                     -- 0–1
   event_ts TIMESTAMP
);

CREATE TABLE user_affinity_sport  (user_id BIGINT, sport_id INT,  score FLOAT, PRIMARY KEY (user_id, sport_id));
CREATE TABLE user_affinity_team   (user_id BIGINT, team_id  INT,  score FLOAT, PRIMARY KEY (user_id, team_id));
CREATE TABLE user_affinity_player (user_id BIGINT, player_id INT, score FLOAT, PRIMARY KEY (user_id, player_id));

CREATE TABLE user_topn_recos (user_id BIGINT, video_id BIGINT, final_score FLOAT, rank INT, batch_ts TIMESTAMP,
                              PRIMARY KEY (user_id, video_id));

A2) Score Ranges & Initialization

Score	Range	Default (cold start)
Sport affinity	0–1	0.20
Team affinity (top 5)	0–1	0.00
Player affinity (top 5)	0–1	0.00

A3) Event → Delta Table

Clamp to [0,1] after update.

Event	Sport Δ	Team Δ	Player Δ
Watch ≥70%	+0.05	+0.05	+0.03
Watch 30–70%	+0.02	+0.03	+0.015
Watch <30%	−0.02	−0.02	−0.01
Like/Favorite	+0.05	+0.05	+0.03
Search/Follow	+0.05	+0.05	+0.05
Skip twice / 72h	−0.05	−0.05	−0.03
Rewatch (≥60% again)	+0.02	+0.03	+0.02
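The delta table translates directly into a lookup plus a clamp. This sketch assumes each delta is applied uniformly to every team and player tagged on the video (the spec does not say whether deltas are confidence-weighted); bucket names such as `watch_high` are hypothetical labels for the table rows.

```python
# (sport Δ, team Δ, player Δ) per event row of the A3 table.
DELTAS = {
    "watch_high": (0.05, 0.05, 0.03),       # watch ≥70%
    "watch_mid": (0.02, 0.03, 0.015),       # watch 30–70%
    "watch_low": (-0.02, -0.02, -0.01),     # watch <30%
    "like": (0.05, 0.05, 0.03),             # like / favorite
    "search_follow": (0.05, 0.05, 0.05),    # search / follow
    "skip_twice_72h": (-0.05, -0.05, -0.03),
    "rewatch": (0.02, 0.03, 0.02),          # rewatch ≥60% again
}


def watch_bucket(watch_pct):
    """Map a watch_pct in [0, 1] to one of the three watch rows."""
    if watch_pct >= 0.70:
        return "watch_high"
    if watch_pct >= 0.30:
        return "watch_mid"
    return "watch_low"


def clamp(x):
    return min(1.0, max(0.0, x))


def apply_event(sport_score, team_scores, player_scores, bucket):
    """Apply one event's deltas and clamp every score to [0, 1].

    team_scores / player_scores map each team/player tagged on the video to the
    user's current score (use 0.0 when the user has none yet).
    """
    d_sport, d_team, d_player = DELTAS[bucket]
    return (
        clamp(sport_score + d_sport),
        {k: clamp(v + d_team) for k, v in team_scores.items()},
        {k: clamp(v + d_player) for k, v in player_scores.items()},
    )
```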

Decay: every 15 days without a positive event for that entity:

score = max(0.02, score * 0.9)
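Because 0.9 × 0.02 < 0.02, applying max(0.02, score * 0.9) once per period for n periods gives the same result as max(0.02, score * 0.9^n) in one step. A minimal sketch, assuming the caller tracks the number of days since the entity's last positive event; `decayed_score` is a hypothetical name.

```python
DECAY_PERIOD_DAYS = 15
DECAY_FACTOR = 0.9
DECAY_FLOOR = 0.02


def decayed_score(score, days_since_positive):
    """Apply one decay step per full 15-day period without a positive event."""
    periods = int(days_since_positive // DECAY_PERIOD_DAYS)
    if periods <= 0:
        return score
    # Equivalent to applying max(0.02, score * 0.9) `periods` times in a row.
    return max(DECAY_FLOOR, score * DECAY_FACTOR ** periods)


print(round(decayed_score(0.80, 75), 4))  # 0.4724
```

With these rules an affinity of 0.80 falls to about 0.47 after five idle periods (75 days), in line with the ~0.4–0.5 range Appendix B5 expects for returning users.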

Cap storage: keep top 5 teams / top 5 players per user. If a new entity's score surpasses the lowest, replace it.
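The top-5 cap reduces to an upsert that evicts the current minimum. A sketch with a hypothetical `upsert_capped` helper operating on an in-memory dict of one user's team (or player) scores:

```python
def upsert_capped(scores, entity_id, new_score, cap=5):
    """Return a copy of `scores` with `entity_id` updated, keeping at most `cap` entries.

    A new entity is admitted only if its score is strictly greater than the
    current lowest stored score, which it then replaces.
    """
    updated = dict(scores)
    if entity_id in updated or len(updated) < cap:
        updated[entity_id] = new_score
        return updated
    lowest_id = min(updated, key=updated.get)
    if new_score > updated[lowest_id]:
        del updated[lowest_id]
        updated[entity_id] = new_score
    return updated
```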

A4) FinalScore Formula

For user u, video v:

Find components (teams and players are parsed from the JSON arrays in videos):
sport_aff  = S_sport[u, sport_id(v)]
team_aff   = max over (team_id, confidence) in teams(v): S_team[u, team_id] * confidence
player_aff = max over (player_id, confidence) in players(v): S_player[u, player_id] * confidence
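Parsing the JSON arrays and taking the confidence-weighted maximum might look like the following. Assumptions: the JSON columns arrive as strings, missing affinity rows fall back to the A2 cold defaults (0.20 for sport, 0.00 for teams/players), and an empty teams or players array contributes 0. `components` is a hypothetical name.

```python
import json

COLD_SPORT = 0.20   # assumption: A2 default sport affinity used for missing rows
COLD_ENTITY = 0.00  # assumption: A2 default team/player affinity used for missing rows


def components(sport_id, teams_json, players_json, s_sport, s_team, s_player):
    """Return (sport_aff, team_aff, player_aff) for one user/video pair.

    teams_json / players_json are the raw JSON strings from the videos row;
    s_sport, s_team, s_player are the user's {entity_id: score} dicts.
    """
    teams = json.loads(teams_json or "[]")
    players = json.loads(players_json or "[]")
    sport_aff = s_sport.get(sport_id, COLD_SPORT)
    team_aff = max(
        (s_team.get(t["team_id"], COLD_ENTITY) * t["confidence"] for t in teams),
        default=0.0,
    )
    player_aff = max(
        (s_player.get(p["player_id"], COLD_ENTITY) * p["confidence"] for p in players),
        default=0.0,
    )
    return sport_aff, team_aff, player_aff
```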

BaseScore:
BaseScore = 0.50 * sport_aff
          + 0.30 * team_aff
          + 0.20 * player_aff

Multipliers:
FreshnessBoost (age = whole days since published_at)
0–2 days → 1.20
3–14 days → 1.00
15+ days → 0.90

TrendingBoost
popularity_score ≥ 0.90 → ×1.10 (default; an alternative variant adds +0.10 to BaseScore instead)
popularity_score < 0.90 → ×1.00

FinalScore:
FinalScore = BaseScore * FreshnessBoost * (popularity_score >= 0.90 ? 1.10 : 1.00)

Exploration uplift (only for the 20% exploration slots):
FinalScore_explore = FinalScore * 1.05
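The BaseScore, multipliers, and exploration uplift combine into one small function. This sketch assumes age is counted in whole days since `published_at`, so the freshness buckets become 0–2, 3–14, and 15+ days, and it uses the default ×1.10 trending variant. Function names are hypothetical.

```python
def freshness_boost(age_days):
    """Freshness multiplier; age_days is whole days since published_at (assumption)."""
    if age_days <= 2:
        return 1.20
    if age_days <= 14:
        return 1.00
    return 0.90


def final_score(sport_aff, team_aff, player_aff, age_days, popularity_score,
                exploration_slot=False):
    base = 0.50 * sport_aff + 0.30 * team_aff + 0.20 * player_aff
    trending = 1.10 if popularity_score >= 0.90 else 1.00  # default ×1.10 variant
    score = base * freshness_boost(age_days) * trending
    return score * 1.05 if exploration_slot else score  # exploration uplift


print(round(final_score(0.80, 0.665, 0.552, age_days=1, popularity_score=0.92), 3))  # 0.937
```

The printed call uses the worked-example components (0.80, 0.665, 0.552) for a 1-day-old video with popularity 0.92.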

A5) Worked Example

User u affinities:

S_sport[Cricket] = 0.80
S_team[CSK (team_id 11)] = 0.70
S_team[team_id 12] = 0.10
S_player[Dhoni (player_id 101)] = 0.60
S_player[player_id 102] = 0.20

Video v (AI metadata):

sport_id = Cricket
teams = [{"team_id": 11, "confidence": 0.95}, {"team_id": 12, "confidence": 0.90}]
players = [{"player_id": 101, "confidence": 0.92}, {"player_id": 102, "confidence": 0.85}]
published_at = 1 day ago
popularity_score = 0.92

Compute:

sport_aff   = 0.80
team_aff    = max( 0.70 * 0.95 , 0.10 * 0.90 ) = 0.665
player_aff  = max( 0.60 * 0.92 , 0.20 * 0.85 ) = 0.552

BaseScore   = 0.5*0.80 + 0.3*0.665 + 0.2*0.552
            = 0.40     + 0.1995    + 0.1104
            = 0.7099

FinalScore  = 0.7099 * 1.20 (fresh) * 1.10 (trending)
            = 0.937

Appendix B — Logic by User State

B1) First-time user (no history)

• Init: S_sport = 0.20 for all major sports, teams/players = 0
• Feed mix: 50% trending, 50% fresh multi-sport
• Switch to personalization after 3 videos watched ≥70% (or 5 total events)
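The B1 switch rule can be expressed as a predicate over one user's event history. `ready_for_personalization` is a hypothetical name, and events are assumed to arrive as `(event_type, watch_pct)` tuples matching the `user_events` columns.

```python
def ready_for_personalization(events):
    """B1 switch: 3 watches at ≥70% completion, or 5 events of any type.

    events: list of (event_type, watch_pct) tuples for one user; watch_pct may be
    None for non-watch events.
    """
    strong_watches = sum(
        1 for event_type, pct in events
        if event_type == "watch" and pct is not None and pct >= 0.70
    )
    return strong_watches >= 3 or len(events) >= 5
```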

B2) Single-sport loyalist

• High S_sport[Cricket] (e.g., ≥0.7)
• 80%+ of feed = that sport
• Exploration stays within that sport first, then occasionally cross-sport

B3) Multi-sport

• If Cricket=0.7, Football=0.6, split rows roughly 55/45
• Within each sport, use top teams/players
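A proportional split of feed slots by sport affinity can be sketched as below. Largest-remainder rounding is our assumption for turning fractional shares into whole slots; with Cricket 0.7 and Football 0.6 over 20 slots it yields 11 and 9, i.e. the 55/45 split above. `allocate_slots` is a hypothetical name.

```python
import math


def allocate_slots(sport_affinities, total_slots):
    """Split total_slots across sports in proportion to affinity.

    Uses largest-remainder rounding so the counts always sum to total_slots.
    """
    total = sum(sport_affinities.values())
    if total <= 0:
        raise ValueError("need at least one positive sport affinity")
    raw = {s: a / total * total_slots for s, a in sport_affinities.items()}
    alloc = {s: math.floor(r) for s, r in raw.items()}
    leftover = total_slots - sum(alloc.values())
    # Hand remaining slots to the sports with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


print(allocate_slots({"cricket": 0.7, "football": 0.6}, 20))
# {'cricket': 11, 'football': 9}
```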

B4) Team/Player loyalist

• If a team/player score ≥0.75, ensure at least 1 row (or 2 tiles in top 10) locked to that entity
• Live content for that entity → tile #1

B5) Returning after a long gap

• Scores decayed to ~0.4–0.5
• Blend trending + old interests; add temporary +0.1 boost on first few positive signals to relearn quickly

B6) Low-engagement user

• <3 watches in last 30 days: default to fresh + trending, with a light sport bias if any sport affinities exist
• After each event, affinity recalculates and personalization ramps

End of PRD