MovieLens 100K Recommendation System Comparison

By Fouzil Ali · Published June 29, 2026

Head-to-head evaluation of five recommendation methods (popularity baseline, user-based CF, item-based CF, SVD, TF-IDF content-based) on MovieLens 100K with metrics and per-user recommendations.

  • collaborative-filtering
  • recommendation-system
  • matrix-factorization
  • eda
  • movielens

Inside this notebook

# 🎬 Recommendation System: MovieLens 100K

A **five-method comparison** on the [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/) dataset (~100K ratings, 943 users, 1682 movies):

| Method | Idea | Type |
|---|---|---|
| **Popularity Baseline** | Recommend what everyone watches | Non-personalised |
| **User-based CF** | Find users like you; recommend what they liked | Memory-based |
| **Item-based CF** | Find movies like those you liked | Memory-based |
| **SVD (Matrix Factorisation)** | Learn latent taste factors | Model-based |
| **Content-based (TF-IDF)** | Match your taste profile to genre descriptions | Content-based |

**Key outputs:** head-to-head metrics (RMSE, Precision@10, Recall@10) and concrete per-user recommendations from every method.
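
The two ranking metrics named above can be pinned down with a small sketch. The notebook's own evaluation code is not shown in this preview, so the function name `precision_recall_at_k` is hypothetical, and what counts as a "relevant" item (for instance, held-out ratings of 4 stars or more) is an assumption made for illustration.

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Return (precision@k, recall@k) for a single user.

    recommended: ranked list of item ids, best first.
    relevant:    set of held-out item ids the user actually liked
                 (how "liked" is defined is an assumption, e.g. rating >= 4).
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: 10 recommendations, 4 relevant held-out items, 2 of them hit.
recs = [50, 181, 100, 258, 174, 127, 286, 1, 98, 288]
liked = {100, 98, 300, 313}
p, r = precision_recall_at_k(recs, liked, k=10)
print(f"Precision@10 = {p:.2f}, Recall@10 = {r:.2f}")
```

Averaging these per-user values over all test users would give one headline number per method; whether the notebook averages this way is not visible in the preview.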

# ── Imports ──────────────────────────────────────────────────────────────────
import io, zipfile, urllib.request, warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
…
Downloading MovieLens 100K … done (4.9 MB)

Ratings : 100,000 rows  |  Users: 943  |  Items: 1682
Rating scale: 1 – 5

## 1 · Exploratory Data Analysis

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# ── Rating distribution ───────────────────────────────────────────────────────
counts = ratings["rating"].value_counts().sort_index()
axes[0].bar(counts.index, counts.values, color="#4C72B0", edgecolor="white", width=0.6)
axes[0].set_title("Rating Distribution", fontsize=13, fontweight="bold")
axes[0].set_xlabel("Star Rating"); axes[0].set_ylabel("Count")
for x, y in zip(counts.index, counts.values):
    axes[0].text(x, y + 200, f"{y:,}", ha="center", fontsize=9)

# ── Ratings per user (log scale) ─────────────────────────────────────────────
ucount = ratings.groupby("user_id").size()
axes[1].hist(ucount, bins=40, color="#55A868", edgecolor="white")
axes[1].axvline(ucount.median(), color="crimson", ls="--", lw=1.5,
                label=f"Median = {ucount.median():.0f}")
axes[1].set_title("Ratings per User", fontsize=13, fontweight="bold")
axes[1].set_xlabel("# Ratings"); axes[1].set_ylabel("# Users")
axes[1].legend(fontsize=9)
…
Matrix sparsity: 93.70%  (943 users × 1682 items)
Most-rated movie : Star Wars (1977)  (583 ratings)
Most active user : user #405  (737 ratings)
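
The sparsity figure is the share of user-item cells that hold no rating. As a quick check, it can be reproduced from the counts printed earlier (100,000 ratings, 943 users, 1682 items); the variable names here are illustrative only.

```python
# Counts as reported in the notebook output above.
n_ratings, n_users, n_items = 100_000, 943, 1682

density = n_ratings / (n_users * n_items)   # fraction of cells that are filled
sparsity = 1.0 - density                    # fraction of cells that are empty
print(f"Matrix sparsity: {sparsity:.2%}")
```
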
# ── Train / Test split  ──────────────────────────────────────────────────────
# Hold out 20 % of each user's ratings so every user has test items
def user_stratified_split(df, test_frac=0.2, seed=42):
    train_rows, test_rows = [], []
    for _, grp in df.groupby("user_id"):
        if len(grp) < 5:
            train_rows.append(grp)
            continue
        tr, te = train_test_split(grp, test_size=test_frac, random_state=seed)
        train_rows.append(tr)
        test_rows.append(te)
    return pd.concat(train_rows), pd.concat(test_rows)

train_df, test_df = user_stratified_split(ratings)
print(f"Train: {len(train_df):,} rows  |  Test: {len(test_df):,} rows")

# ── Build full user-item matrix (train only) ──────────────────────────────────
ALL_USERS = sorted(ratings.user_id.unique())
…
Train: 79,619 rows  |  Test: 20,381 rows
Sparse matrix shape: (943, 1682)  nnz=79,619

## 2 · Popularity Baseline

# ── Popularity Baseline ───────────────────────────────────────────────────────
# Bayesian average to balance high-rating + low-count items
C = train_df.groupby("item_id")["rating"].count()
M = train_df.groupby("item_id")["rating"].mean()
m = C.quantile(0.60)          # min-vote threshold
global_mean = train_df["rating"].mean()

pop_score = (C / (C + m)) * M + (m / (C + m)) * global_mean
pop_df = (movies.merge(pd.DataFrame({"item_id": pop_score.index,
                                      "score": pop_score.values,
                                      "n_ratings": C.values,
                                      "avg_rating": M.values}),
                       on="item_id")
          .sort_values("score", ascending=False)
          .reset_index(drop=True))

print("Top-10 most popular movies (Bayesian-average score):\n")
print(pop_df[["title","genres","avg_rating","n_ratings","score"]].head(10).to_string(index=False))
…
Top-10 most popular movies (Bayesian-average score):

                           title                              genres  avg_rating  n_ratings    score
         Schindler's List (1993)                           Drama War    4.465306        245 4.348391
                Star Wars (1977) Action Adventure Romance Sci_Fi War    4.404348        460 4.342524
               Casablanca (1942)                   Drama Romance War    4.471795        195 4.328476
Shawshank Redemption, The (1994)…
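
To see why the Bayesian average is used instead of the raw mean, it helps to apply the same weighting as `pop_score` to two contrasting items. The sketch below uses made-up values for `m`, `global_mean`, and both movies (none are taken from the dataset), and the helper name `bayesian_average` is hypothetical.

```python
def bayesian_average(count, mean, m, global_mean):
    """Shrink an item's mean rating toward the global mean.

    Items with few ratings (count << m) stay close to global_mean,
    while heavily rated items keep a score near their own mean.
    """
    weight = count / (count + m)
    return weight * mean + (1.0 - weight) * global_mean


# Illustrative values only (not taken from MovieLens).
m, global_mean = 30, 3.5
niche = bayesian_average(count=2, mean=5.0, m=m, global_mean=global_mean)
classic = bayesian_average(count=400, mean=4.4, m=m, global_mean=global_mean)
print(f"niche   (2 ratings,   mean 5.0): {niche:.3f}")
print(f"classic (400 ratings, mean 4.4): {classic:.3f}")
```

The two-rating item has a perfect raw mean but is pulled almost all the way back to the global mean, so the widely rated item ranks higher.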

## 3 · User-Based Collaborative Filtering

Find the K most similar users (cosine similarity on mean-centered ratings) and aggregate their ratings into a predicted score.
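
A minimal sketch of this idea on a tiny dense matrix follows. The notebook's actual implementation is not shown in this preview, so several things here are assumptions: the function name `predict_user_based`, the use of 0 to mean "not rated", and the aggregation rule (a similarity-weighted average of neighbours' mean-centered ratings added to the target user's mean).

```python
import numpy as np


def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated `item`.

    R is a dense (users x items) array where 0 means "not rated".
    """
    rated = R > 0
    # Per-user mean over rated items only.
    counts = np.maximum(rated.sum(axis=1), 1)
    means = R.sum(axis=1) / counts
    # Mean-center observed ratings; unrated cells stay at 0.
    centered = np.where(rated, R - means[:, None], 0.0)

    # Cosine similarity between the target user and every user.
    norms = np.linalg.norm(centered, axis=1)
    denom = norms * norms[user]
    sims = np.divide(centered @ centered[user], denom,
                     out=np.zeros_like(denom), where=denom > 0)
    sims[user] = -np.inf  # never use the target user as a neighbour

    # Keep only users who rated the item, then take the k most similar.
    candidates = np.where(rated[:, item])[0]
    candidates = candidates[np.isfinite(sims[candidates])]
    neighbours = candidates[np.argsort(sims[candidates])[::-1][:k]]
    weights = sims[neighbours]
    if neighbours.size == 0 or np.abs(weights).sum() == 0:
        return float(means[user])  # fall back to the user's own mean

    # Similarity-weighted average of neighbours' mean-centered ratings.
    offset = (weights @ centered[neighbours, item]) / np.abs(weights).sum()
    return float(np.clip(means[user] + offset, 1, 5))


# Toy ratings: 4 users x 4 items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 4, 1],
    [1, 2, 1, 5],
    [5, 5, 5, 0],
], dtype=float)
pred = predict_user_based(R, user=0, item=2, k=2)
print(f"Predicted rating for user 0 on item 2: {pred:.2f}")
```

Mean-centering matters in the toy data: user 3 rates everything 5, so their centered vector is all zeros and they contribute no signal, whereas user 1's tastes track user 0's and drive the prediction. On the real 943 × 1682 matrix a sparse representation would be preferable to this dense one.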
