Product Requirements Document

FanCode AI Recommendation Engine

Aditya Shukla • July 2025 • FanCode

Overview

Build a simple, scalable, AI-assisted recommendation engine that personalizes FanCode's home/feed using:

  • AI metadata auto-extracted per video (sport, teams, players, tags + confidences) and stored as one row per video in the videos table, using JSON fields
  • Lightweight user affinity scores for sports, teams, players (0–1)
  • A transparent scoring formula with fixed weights, freshness & popularity boosts, and a small exploration quota

Problem

Generic Feed Experience

Users with different interests (single-sport loyalists vs multi-sport samplers) currently see generic or poorly mixed feeds.

Manual Tagging Limitations

Manual tagging of videos is infeasible → low recall/precision for player/team-level personalization.

System Limitations

Existing systems often overfit to one sport/team, under-explore, or fail with cold/returning users.

Objectives

Priority 1 (P1)

  • Ship a maintainable, SQL-friendly recommendation engine in ≤ 8 weeks
  • Achieve +15% CTR on the Recommended row and +10% avg. session watch time

Priority 2 (P2)

  • Handle cold start & returning users gracefully
  • Keep infra simple: micro-batch (10–15 min) affinity recomputes; hourly Top-N generation

Constraints

SQL-First Architecture

Primary datastore = SQL (PostgreSQL / Snowflake / BigQuery). We'll use JSON columns inside SQL.

Real-time Limitations

No heavy real-time model serving in v1; real-time = only tiny re-ranks/overrides (e.g., live match).

Performance Target

Serve-time latency target: <200 ms for fetching and assembling the slate (Top-N is precomputed).

Personas

Rohan Sharma

Casual Cricket Fan • 27, Mumbai, SWE

Watches: Only cricket, prefers highlights & interviews

Goal: Quick, relevant cricket snippets

Pain: Non-cricket clutter

Ananya Verma

Multi-Sport Enthusiast • 32, Bangalore, Marketing Manager

Watches: Cricket, football, F1

Goal: Balanced feed across sports

Pain: One sport dominating

Karan Singh

New User • 22, Delhi, Student

Watches: No history; explores trending clips

Goal: Get hooked fast with exciting content

Pain: Irrelevant feed initially

Meera Iyer

Team Loyalist • 29, Chennai, HR

Watches: Hardcore CSK & Dhoni fan

Goal: All CSK/Dhoni content surfaced

Pain: Cross-sport noise during IPL

Use Cases

Personalized Home Feed

Show my top sport/team/player content first.

Cold Start Feed

Show trending + fresh content until you learn my preferences.

Cross-Sport Balance

If I watch cricket & football, mix them proportionally.

Live Override

If my fav team is playing live, put it #1.

Exploration

Give me some fresh/trending/new-player stuff (~20%) to keep it interesting.

User Journey

1. Video Upload → AI Metadata

AI tags sport, teams, players, tags with confidences; writes to videos.

2. User Browses & Watches

Every interaction is logged in user_events.

3. Micro-batch Affinity Refresh

Every 10–15 mins, scores in user_affinity_* are updated.

4. Hourly Top-N Build

Join affinities + video metadata, compute FinalScore, persist top 100–200 in user_topn_recos.

5. Serve

On app open, read user_topn_recos, apply tiny real-time tweaks (e.g., live boost, device/time-of-day), interleave 20% exploration, render.
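
The interleaving in step 5 could be sketched as follows. This is a minimal illustration, not the production serving code: the function name `interleave_exploration`, the list-of-IDs inputs, and the choice to place an exploration tile at every 5th position (which yields 20% of the slate) are all assumptions.

```python
from typing import Iterator, List, Optional, Set


def interleave_exploration(personalized: List[int], exploration: List[int],
                           slate_size: int = 10, explore_every: int = 5) -> List[int]:
    """Fill every `explore_every`-th slot from the exploration pool (5 -> 20%)."""
    slate: List[int] = []
    seen: Set[int] = set()
    p_iter, e_iter = iter(personalized), iter(exploration)

    def next_unseen(primary: Iterator[int], fallback: Iterator[int]) -> Optional[int]:
        # Take the next video not already on the slate, preferring `primary`
        # and falling back to the other pool if `primary` is exhausted.
        for source in (primary, fallback):
            for video_id in source:
                if video_id not in seen:
                    return video_id
        return None

    for position in range(1, slate_size + 1):
        if position % explore_every == 0:
            video_id = next_unseen(e_iter, p_iter)
        else:
            video_id = next_unseen(p_iter, e_iter)
        if video_id is None:  # both pools exhausted
            break
        seen.add(video_id)
        slate.append(video_id)
    return slate
```

With a 10-tile slate, positions 5 and 10 come from the exploration pool and the rest from the precomputed Top-N. If one pool runs dry, the other fills the gap, so the slate is only short when both are empty.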

Release Plan

Phase 0 – Foundations (Week 0–2)

  • Define schemas, constants, and jobs
  • Implement AI metadata writer → videos JSON fields

Phase 1 – MVP (Week 3–6)

  • Implement affinity updater (micro-batch)
  • Implement Top-N scorer & persister
  • Ship Home feed using Top-N + exploration mix
  • Metrics: CTR, watch time

Phase 2 – Hardening (Week 7–10)

  • Add decay, skip suppression, diversity caps
  • Live override logic
  • A/B test weight tweaks

Phase 3 – Growth (Week 11+)

  • Add merch recos with same scores
  • Consider RL/bandits or learned weights

Features

9.1 AI Metadata Extraction (Single Object)

What: sport_id, format, teams[], players[], tags[] (each with confidence 0–1), popularity_score

Where: videos table (JSON arrays)

When: On video ingestion

9.2 Affinity Scores (Sports/Teams/Players)

Stored in: user_affinity_sport, user_affinity_team, user_affinity_player

Updated: Every 10–15 mins from user_events

Capped storage: Top 5 teams & top 5 players per user

9.3 Scoring Engine

Formula: FinalScore = (weighted sum of user affinities × AI confidences) × FreshnessBoost × TrendingBoost

Exploration quota: 20% of visible tiles

9.4 Diversity & Suppression

• Max 3 items per team in a row

• Items skipped twice in 72h → suppressed from top 20 for 7 days
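
A minimal sketch of both rules, under stated assumptions: "row" is read as a single feed row, each video is reduced to one primary team (for example its highest-confidence team), and the 7-day suppression is counted from the second skip. The names `apply_team_cap` and `is_suppressed` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def apply_team_cap(ranked: List[Tuple[int, Optional[int]]],
                   max_per_team: int = 3) -> List[int]:
    """ranked: (video_id, primary_team_id) pairs in descending score order.

    Keeps at most `max_per_team` videos per team; videos without a team
    are never capped.
    """
    counts: Counter = Counter()
    kept: List[int] = []
    for video_id, team_id in ranked:
        if team_id is not None:
            if counts[team_id] >= max_per_team:
                continue  # team already has its quota in this row
            counts[team_id] += 1
        kept.append(video_id)
    return kept


def is_suppressed(skip_times: List[datetime], now: datetime,
                  window: timedelta = timedelta(hours=72),
                  hold: timedelta = timedelta(days=7)) -> bool:
    """True if two skips fall within `window` and the later one is under `hold` old."""
    times = sorted(skip_times)
    for earlier, later in zip(times, times[1:]):
        if later - earlier <= window and now - later < hold:
            return True
    return False
```

A caller would run `is_suppressed` per (user, video) and push suppressed items below position 20 before applying the team cap.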

9.5 Live & Contextual Overrides

• If a favourite team/player is live → tile #1

• (Optional) slight time-of-day/device boosts in app layer
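
The live override could look like the sketch below. It assumes the app layer already knows which video is the live stream for each team (`live_by_team`, a hypothetical mapping) and that the user's favourite teams arrive ordered by affinity. Which affinity level counts as "favourite" is not defined here and is left to the caller.

```python
from typing import Dict, List


def apply_live_override(slate: List[int], live_by_team: Dict[int, int],
                        favourite_teams: List[int]) -> List[int]:
    """Move the live video of the highest-ranked favourite team to tile #1.

    slate: video ids in display order.
    live_by_team: team_id -> video_id of that team's current live stream.
    favourite_teams: the user's favourite team ids, strongest affinity first.
    """
    for team_id in favourite_teams:
        live_id = live_by_team.get(team_id)
        if live_id is not None:
            # Put the live tile first and drop any duplicate further down.
            return [live_id] + [v for v in slate if v != live_id]
    return list(slate)  # nothing live for this user: slate unchanged
```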

Success Metrics

Primary Metrics

  • +15% CTR on Recommended row
  • +10% avg. session watch time (increased engagement)
  • +10% 7-day return rate (user retention)

Secondary Metrics

  • ≥70% interaction rate (Top 10 items watched ≥30%)
  • Exploration CTR ≥ 60% of personalized CTR

Appendix A — Scoring Logic

A1) Storage Schema (SQL, single-object videos)

-- One row per video, with AI metadata in JSON columns.
-- Written in PostgreSQL syntax; type names may need adjusting for Snowflake or BigQuery.
CREATE TABLE videos (
   video_id BIGINT PRIMARY KEY,
   title VARCHAR,
   description TEXT,
   sport_id INT,
   format TEXT,                         -- 'highlight' | 'interview' | 'fullmatch'
   teams JSON,                          -- [{"team_id":11,"confidence":0.95}, ...]
   players JSON,                        -- [{"player_id":101,"confidence":0.92}, ...]
   tags JSON,                           -- [{"tag":"Last Over","confidence":0.8}, ...]
   published_at TIMESTAMP,
   popularity_score FLOAT,              -- 0.0–1.0
   duration INT
);

-- Raw interaction log; source for the micro-batch affinity refresh
CREATE TABLE user_events (
   event_id BIGINT PRIMARY KEY,
   user_id BIGINT,
   video_id BIGINT,
   event_type TEXT,                     -- 'watch','like','skip','search','follow'
   watch_pct FLOAT,                     -- 0–1
   event_ts TIMESTAMP
);

-- Per-user affinity scores (0–1), one table per entity type
CREATE TABLE user_affinity_sport   (user_id BIGINT, sport_id INT,  score FLOAT, PRIMARY KEY(user_id, sport_id));
CREATE TABLE user_affinity_team    (user_id BIGINT, team_id  INT,  score FLOAT, PRIMARY KEY(user_id, team_id));
CREATE TABLE user_affinity_player  (user_id BIGINT, player_id INT, score FLOAT, PRIMARY KEY(user_id, player_id));

-- Precomputed hourly Top-N per user
CREATE TABLE user_topn_recos (user_id BIGINT, video_id BIGINT, final_score FLOAT, rank INT, batch_ts TIMESTAMP,
                 PRIMARY KEY(user_id, video_id));

A2) Score Ranges & Initialization

Thing                     Range   Default (cold)
Sport affinity            0–1     0.20
Team affinity (top 5)     0–1     0.00
Player affinity (top 5)   0–1     0.00

A3) Event → Delta Table

Clamp to [0,1] after update.

Event                   Sport Δ   Team Δ   Player Δ
Watch ≥70%              +0.05     +0.05    +0.03
Watch 30–70%            +0.02     +0.03    +0.015
Watch <30%              −0.02     −0.02    −0.01
Like/Favorite           +0.05     +0.05    +0.03
Search/Follow           +0.05     +0.05    +0.05
Skip twice / 72h        −0.05     −0.05    −0.03
Rewatch (≥60% again)    +0.02     +0.03    +0.02
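
Applying the deltas above with the clamp could be sketched like this. The event keys (`watch_high`, `skip_twice`, and so on) and the function names are hypothetical labels chosen for illustration. The boundary handling (exactly 70% counts as the top bucket, exactly 30% as the middle bucket) is an assumption.

```python
from typing import Dict

# (sport delta, team delta, player delta) per event, copied from the table above.
EVENT_DELTAS = {
    "watch_high": (0.05, 0.05, 0.03),     # watch >= 70%
    "watch_mid": (0.02, 0.03, 0.015),     # watch 30-70%
    "watch_low": (-0.02, -0.02, -0.01),   # watch < 30%
    "like": (0.05, 0.05, 0.03),
    "search_follow": (0.05, 0.05, 0.05),
    "skip_twice": (-0.05, -0.05, -0.03),  # skipped twice within 72h
    "rewatch": (0.02, 0.03, 0.02),        # watched >= 60% again
}


def watch_bucket(watch_pct: float) -> str:
    """Map a watch fraction (0-1) to the matching event key."""
    if watch_pct >= 0.70:
        return "watch_high"
    if watch_pct >= 0.30:
        return "watch_mid"
    return "watch_low"


def clamp01(value: float) -> float:
    """Clamp a score to the [0, 1] range."""
    return max(0.0, min(1.0, value))


def apply_event(scores: Dict[str, float], event_key: str) -> Dict[str, float]:
    """Return new sport/team/player scores after one event, clamped to [0, 1]."""
    d_sport, d_team, d_player = EVENT_DELTAS[event_key]
    return {
        "sport": clamp01(scores["sport"] + d_sport),
        "team": clamp01(scores["team"] + d_team),
        "player": clamp01(scores["player"] + d_player),
    }
```

For example, `apply_event({"sport": 0.98, "team": 0.0, "player": 0.5}, watch_bucket(0.8))` caps the sport score at 1.0 instead of letting it reach 1.03.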

Decay: every 15 days without a positive event for that entity:

score = max(0.02, score * 0.9)
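
A small sketch of the decay rule, assuming it is applied once per full 15-day period elapsed since the last positive event (the name `decay_score` and its parameters are illustrative):

```python
def decay_score(score: float, days_since_positive: int,
                period_days: int = 15, factor: float = 0.9,
                floor: float = 0.02) -> float:
    """Apply score = max(floor, score * factor) once per elapsed full period."""
    periods = days_since_positive // period_days
    for _ in range(periods):
        score = max(floor, score * factor)
    return score
```

Under this reading, a score of 0.80 falls to about 0.65 after 30 idle days. As written, the formula also lifts any score below 0.02 up to the floor, so a job built on it may want to skip entities that are already below the floor.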

Cap storage: keep top 5 teams / top 5 players per user. If a new entity's score surpasses the lowest, replace it.
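
The top-5 cap can be sketched as an upsert followed by a trim. This assumes the stored affinities for one user and one entity type fit in a dict; `upsert_capped` is a hypothetical helper, and ties are resolved in favour of the entity already stored, matching "surpasses the lowest".

```python
from typing import Dict


def upsert_capped(top: Dict[int, float], entity_id: int, score: float,
                  cap: int = 5) -> Dict[int, float]:
    """Insert or update one entity, then keep only the `cap` highest scores."""
    updated = dict(top)
    updated[entity_id] = score
    if len(updated) > cap:
        # sorted() is stable, so on equal scores the earlier (existing) entry wins.
        ranked = sorted(updated.items(), key=lambda item: item[1], reverse=True)
        updated = dict(ranked[:cap])
    return updated
```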

A4) FinalScore Formula

For user u, video v:

Find components (note JSON parsing):
sport_aff = S_sport[u, sport_id(v)]
team_aff = max over teams(v): S_team[u, team_id] * team_confidence
player_aff = max over players(v): S_player[u, player_id] * player_confidence

BaseScore:
BaseScore = 0.50 * sport_aff
          + 0.30 * team_aff
          + 0.20 * player_aff

Multipliers:
FreshnessBoost
0–2 days → 1.20
3–14 days → 1.00
More than 14 days → 0.90

TrendingBoost
popularity_score ≥ 0.90 → ×1.10 multiplicative (default)
Alternative: +0.10 additive on BaseScore

FinalScore:
FinalScore = BaseScore * FreshnessBoost * (popularity_score >= 0.90 ? 1.10 : 1.00)

Exploration uplift (only for the 20% exploration slots):
FinalScore_explore = FinalScore * 1.05
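
The formula above could be implemented roughly as follows. This is a sketch under stated assumptions, not the definitive scorer: the function names are illustrative, the JSON arrays are assumed to follow the `videos` schema in A1, a missing affinity is treated as 0.0, and video age is measured in days with anything over 2 and up to 14 days getting the neutral 1.00 boost.

```python
import json
from typing import Dict


def best_match(affinities: Dict[int, float], entities_json: str, id_key: str) -> float:
    """Max over a video's entities of user affinity x AI confidence."""
    entities = json.loads(entities_json)
    return max(
        (affinities.get(e[id_key], 0.0) * e["confidence"] for e in entities),
        default=0.0,  # video has no teams/players tagged
    )


def freshness_boost(age_days: float) -> float:
    """Freshness multiplier from the video's age in days."""
    if age_days <= 2:
        return 1.20
    if age_days <= 14:
        return 1.00
    return 0.90


def final_score(sport_aff: float, team_aff: float, player_aff: float,
                age_days: float, popularity_score: float,
                is_exploration: bool = False) -> float:
    """BaseScore x FreshnessBoost x TrendingBoost, with optional exploration uplift."""
    base = 0.50 * sport_aff + 0.30 * team_aff + 0.20 * player_aff
    trending = 1.10 if popularity_score >= 0.90 else 1.00
    score = base * freshness_boost(age_days) * trending
    return score * 1.05 if is_exploration else score
```

A typical call first derives `team_aff` and `player_aff` with `best_match(user_team_scores, video_teams_json, "team_id")` and the player equivalent, then passes them to `final_score`. Because the boosts are multiplicative, a fresh, trending video can score above 1.0 for a user with very high affinities, so downstream code should not assume a 0–1 range.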

A6) Worked Example

User u affinities:

  • S_sport[Cricket] = 0.80
  • S_team[CSK] = 0.70
  • S_player[Dhoni] = 0.60

Video v (AI metadata):

  • sport_id = Cricket
  • teams = [{"team_id": 11, "confidence": 0.95}, {"team_id": 12, "confidence": 0.90}]
  • players = [{"player_id": 101, "confidence": 0.92}, {"player_id": 102, "confidence": 0.85}]
  • published_at = 1 day ago
  • popularity_score = 0.92

Compute:

sport_aff   = 0.80
team_aff    = max( 0.70 * 0.95 , 0.10 * 0.90 ) = 0.665   (0.10 = affinity for team 12, not listed above)
player_aff  = max( 0.60 * 0.92 , 0.20 * 0.85 ) = 0.552   (0.20 = affinity for player 102, not listed above)

BaseScore   = 0.5*0.80 + 0.3*0.665 + 0.2*0.552
            = 0.40     + 0.1995    + 0.1104
            = 0.7099

FinalScore  = 0.7099 * 1.20 (fresh) * 1.10 (trending)
            = 0.937

Appendix B — Logic by User State

B1) First-time user (no history)

  • Init: S_sport = 0.20 for all major sports, teams/players = 0
  • Feed mix: 50% trending, 50% fresh multi-sport
  • Switch to personalization after 3 videos watched ≥70% (or 5 total events)
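
The switch condition in B1 could be checked as in this sketch, which assumes events arrive as (event_type, watch_pct) pairs shaped like rows of `user_events`; `ready_for_personalization` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def ready_for_personalization(events: List[Tuple[str, Optional[float]]]) -> bool:
    """True after 3 videos watched >= 70%, or after 5 events of any kind."""
    strong_watches = sum(
        1 for event_type, watch_pct in events
        if event_type == "watch" and watch_pct is not None and watch_pct >= 0.70
    )
    return strong_watches >= 3 or len(events) >= 5
```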

B2) Single-sport loyalist

  • High S_sport[Cricket] (e.g., ≥0.7)
  • 80%+ of feed = that sport
  • Exploration stays within that sport first, then occasionally cross-sport

B3) Multi-sport

  • If Cricket=0.7, Football=0.6, split rows roughly 55/45
  • Within each sport, use top teams/players
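
One way to turn sport affinities into a proportional split is largest-remainder rounding, sketched below. The approach and the name `sport_split` are assumptions; the PRD only says the mix should be roughly proportional.

```python
import math
from typing import Dict


def sport_split(sport_scores: Dict[str, float], slots: int) -> Dict[str, int]:
    """Allocate `slots` across sports in proportion to their affinity scores."""
    total = sum(sport_scores.values())
    if total <= 0:
        return {sport: 0 for sport in sport_scores}
    raw = {sport: slots * score / total for sport, score in sport_scores.items()}
    alloc = {sport: math.floor(share) for sport, share in raw.items()}
    leftover = slots - sum(alloc.values())
    # Hand the remaining slots to the sports with the largest fractional parts.
    by_fraction = sorted(raw, key=lambda sport: raw[sport] - alloc[sport], reverse=True)
    for sport in by_fraction[:leftover]:
        alloc[sport] += 1
    return alloc
```

With Cricket=0.7 and Football=0.6 over 20 slots this gives 11 cricket and 9 football, which is the 55/45 split mentioned above.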

B4) Team/Player loyalist

  • If a team/player score ≥0.75, ensure at least 1 row (or 2 tiles in top 10) locked to that entity
  • Live content for that entity → tile #1

B5) Returning after a long gap

  • Scores decayed to ~0.4–0.5
  • Blend trending + old interests; add temporary +0.1 boost on first few positive signals to relearn quickly

B6) Low-engagement user

  • <3 watches in last 30 days: default to fresh + trending, with a light bias toward any sports the user already has affinity for
  • After each event, affinity recalculates and personalization ramps

End of PRD
