MovieLens 100K Recommendation System Comparison
By Fouzil Ali · Published June 29, 2026
Head-to-head evaluation of five recommendation methods (popularity baseline, user-based CF, item-based CF, SVD, TF-IDF content-based) on MovieLens 100K with metrics and per-user recommendations.
- collaborative-filtering
- recommendation-system
- matrix-factorization
- eda
- movielens
Inside this notebook
# 🎬 Recommendation System: MovieLens 100K

A **five-method comparison** on the [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/) dataset (~100K ratings, 943 users, 1682 movies):

| Method | Idea | Type |
|---|---|---|
| **Popularity Baseline** | Recommend what everyone watches | Non-personalised |
| **User-based CF** | Find users like you; recommend what they liked | Memory-based |
| **Item-based CF** | Find movies like those you liked | Memory-based |
| **SVD (Matrix Factorisation)** | Learn latent taste factors | Model-based |
| **Content-based (TF-IDF)** | Match your taste profile to genre descriptions | Content-based |

**Key outputs:** head-to-head metrics (RMSE, Precision@10, Recall@10) and concrete per-user recommendations from every method.
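The three metrics named above can be sketched in a few lines. This is a minimal illustration, not the notebook's evaluation code: the function names `rmse` and `precision_recall_at_k` are hypothetical, the toy ids are made up, and how the notebook decides which held-out items count as "relevant" (for example a rating threshold) is not shown in this preview, so here it is simply passed in as a set.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def precision_recall_at_k(recommended, relevant, k=10):
    """Precision@k and Recall@k for a single user.

    recommended: ranked list of item ids, best first
    relevant:    set of held-out item ids the user actually liked (assumed given)
    """
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy data, not taken from the notebook
recs = [50, 181, 100, 258, 7, 9, 12, 98, 56, 1]
liked = {181, 7, 300, 56}
p, r = precision_recall_at_k(recs, liked, k=10)
err = rmse([4, 3, 5], [3.5, 3, 4])
print(f"RMSE = {err:.3f} | Precision@10 = {p:.2f} | Recall@10 = {r:.2f}")
```

RMSE scores rating prediction, while Precision@10 and Recall@10 score the ranked list, which is why a method without rating predictions (such as a pure popularity ranking) may only be comparable on the latter two.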
# ── Imports ──────────────────────────────────────────────────────────────────
import io, zipfile, urllib.request, warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
…
Downloading MovieLens 100K … done (4.9 MB)
Ratings : 100,000 rows | Users: 943 | Items: 1682
Rating scale: 1 – 5
## 1 · Exploratory Data Analysis
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# ── Rating distribution ───────────────────────────────────────────────────────
counts = ratings["rating"].value_counts().sort_index()
axes[0].bar(counts.index, counts.values, color="#4C72B0", edgecolor="white", width=0.6)
axes[0].set_title("Rating Distribution", fontsize=13, fontweight="bold")
axes[0].set_xlabel("Star Rating"); axes[0].set_ylabel("Count")
for x, y in zip(counts.index, counts.values):
    axes[0].text(x, y + 200, f"{y:,}", ha="center", fontsize=9)
# ── Ratings per user (log scale) ─────────────────────────────────────────────
ucount = ratings.groupby("user_id").size()
axes[1].hist(ucount, bins=40, color="#55A868", edgecolor="white")
axes[1].axvline(ucount.median(), color="crimson", ls="--", lw=1.5,
                label=f"Median = {ucount.median():.0f}")
axes[1].set_title("Ratings per User", fontsize=13, fontweight="bold")
axes[1].set_xlabel("# Ratings"); axes[1].set_ylabel("# Users")
axes[1].legend(fontsize=9)
…
Matrix sparsity: 93.70% (943 users × 1682 items)
Most-rated movie : Star Wars (1977) (583 ratings)
Most active user : user #405 (737 ratings)
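The sparsity figure follows directly from the counts reported in the outputs above: it is the share of user-item cells that hold no rating. A short check, using only those reported numbers:

```python
# Figures reported in the notebook outputs
n_users, n_items, n_ratings = 943, 1682, 100_000

n_cells = n_users * n_items      # every possible (user, item) pair
density = n_ratings / n_cells    # fraction of cells that hold a rating
sparsity = 1.0 - density         # fraction of cells that are empty

print(f"Cells: {n_cells:,} | Density: {density:.2%} | Sparsity: {sparsity:.2%}")
```

Roughly 6 in every 100 cells are filled, which is why the later sections store the ratings in a sparse matrix.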
# ── Train / Test split ──────────────────────────────────────────────────────
# Hold out 20% of each user's ratings so every user with at least 5 ratings has test items
def user_stratified_split(df, test_frac=0.2, seed=42):
    train_rows, test_rows = [], []
    for _, grp in df.groupby("user_id"):
        # Users with fewer than 5 ratings stay entirely in the training set
        if len(grp) < 5:
            train_rows.append(grp)
            continue
        tr, te = train_test_split(grp, test_size=test_frac, random_state=seed)
        train_rows.append(tr)
        test_rows.append(te)
    return pd.concat(train_rows), pd.concat(test_rows)
train_df, test_df = user_stratified_split(ratings)
print(f"Train: {len(train_df):,} rows | Test: {len(test_df):,} rows")
# ── Build full user-item matrix (train only) ──────────────────────────────────
ALL_USERS = sorted(ratings.user_id.unique())
…
Train: 79,619 rows | Test: 20,381 rows
Sparse matrix shape: (943, 1682) nnz=79,619
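The preview cuts off the matrix-building cell, so the notebook's own code is not visible here. One common way to get a matrix of that shape is sketched below, assuming the `user_id`, `item_id` and `rating` columns used elsewhere in the notebook; the helper name `build_user_item_matrix` and the toy ratings are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

def build_user_item_matrix(df, all_users, all_items):
    """Return a CSR matrix with one row per user and one column per item."""
    # Map raw ids to contiguous 0-based row / column positions
    u_idx = {u: i for i, u in enumerate(all_users)}
    i_idx = {m: j for j, m in enumerate(all_items)}
    rows = df["user_id"].map(u_idx).to_numpy()
    cols = df["item_id"].map(i_idx).to_numpy()
    vals = df["rating"].to_numpy(dtype=float)
    return csr_matrix((vals, (rows, cols)), shape=(len(all_users), len(all_items)))

# Hypothetical toy ratings, not the MovieLens data
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "item_id": [10, 30, 20, 10],
    "rating":  [5, 3, 4, 2],
})
users = sorted(toy.user_id.unique())
items = sorted(toy.item_id.unique())
R = build_user_item_matrix(toy, users, items)
print(R.shape, R.nnz)
print(R.toarray())
```

Taking the id lists from the full ratings table while filling values only from the training rows keeps the shape fixed at every user and item, so items that appear only in the test set still have a column.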
## 2 · Popularity Baseline
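The baseline in this section ranks items by a Bayesian average, which pulls each item's mean rating toward the global mean in proportion to how few ratings it has. The toy example here uses made-up numbers (the helper name `bayesian_average`, the threshold `m=50` and the global mean of 3.5 are assumptions for illustration only) to show why a film with two perfect ratings should not outrank a widely rated favourite.

```python
import pandas as pd

def bayesian_average(count, mean, m, global_mean):
    """Shrink each item's mean rating toward the global mean.

    Items with few ratings (count << m) land near global_mean;
    items with many ratings (count >> m) keep roughly their own mean.
    """
    w = count / (count + m)              # weight on the item's own mean
    return w * mean + (1 - w) * global_mean

# Hypothetical items, not taken from MovieLens
stats = pd.DataFrame(
    {"n_ratings": [2, 400], "avg_rating": [5.0, 4.4]},
    index=["niche film", "crowd favourite"],
)
stats["score"] = bayesian_average(
    stats["n_ratings"], stats["avg_rating"], m=50, global_mean=3.5
)
print(stats.sort_values("score", ascending=False))
```

Here `1 - w` equals `m / (count + m)`, so the function is the same weighting as the `pop_score` expression in this section, just written with a single weight.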
# ── Popularity Baseline ───────────────────────────────────────────────────────
# Bayesian average to balance high-rating + low-count items
C = train_df.groupby("item_id")["rating"].count()
M = train_df.groupby("item_id")["rating"].mean()
m = C.quantile(0.60) # min-vote threshold
global_mean = train_df["rating"].mean()
pop_score = (C / (C + m)) * M + (m / (C + m)) * global_mean
pop_df = (movies.merge(pd.DataFrame({"item_id": pop_score.index,
                                     "score": pop_score.values,
                                     "n_ratings": C.values,
                                     "avg_rating": M.values}),
                       on="item_id")
          .sort_values("score", ascending=False)
          .reset_index(drop=True))
print("Top-10 most popular movies (Bayesian-average score):\n")
print(pop_df[["title","genres","avg_rating","n_ratings","score"]].head(10).to_string(index=False))
…
Top-10 most popular movies (Bayesian-average score):

title                             genres                               avg_rating  n_ratings     score
Schindler's List (1993)           Drama War                              4.465306        245  4.348391
Star Wars (1977)                  Action Adventure Romance Sci_Fi War    4.404348        460  4.342524
Casablanca (1942)                 Drama Romance War                      4.471795        195  4.328476
Shawshank Redemption, The (1994)…

## 3 · User-Based Collaborative Filtering

Find the K most similar users (cosine similarity on mean-centered ratings) and aggregate their ratings as a predicted score.
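The description above can be sketched on a tiny dense matrix. This is one plausible reading, not the notebook's implementation: the aggregation used here (the user's own mean plus a similarity-weighted average of the neighbours' mean-centered ratings) is an assumption, as are the function name `predict_user_based`, the choice `k=2` and the toy ratings.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated `item`.

    R is a dense (users x items) array where 0 means "not rated".
    """
    rated = R > 0
    # Per-user mean over rated items only
    means = R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1)
    # Mean-center the observed ratings; unrated cells stay at 0
    centered = np.where(rated, R - means[:, None], 0.0)
    sim = cosine_similarity(centered)

    # Candidate neighbours: other users who rated the target item
    candidates = [v for v in range(R.shape[0]) if v != user and rated[v, item]]
    neighbours = sorted(candidates, key=lambda v: sim[user, v], reverse=True)[:k]
    weights = np.array([sim[user, v] for v in neighbours])
    if len(neighbours) == 0 or np.abs(weights).sum() == 0:
        return float(means[user])  # fall back to the user's own mean
    offsets = np.array([centered[v, item] for v in neighbours])
    return float(means[user] + weights @ offsets / np.abs(weights).sum())

# Hypothetical 4-user x 4-item rating matrix (0 = missing)
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 1],
    [1, 2, 1, 5],
    [5, 5, 4, 0],
], dtype=float)
pred = predict_user_based(R, user=0, item=2, k=2)
print(f"Predicted rating for user 0 on item 2: {pred:.2f}")
```

Mean-centering matters because it removes each user's personal rating scale, so a generous rater and a harsh rater with the same preferences still come out as similar. On the full dataset the same idea would typically be applied to the sparse training matrix rather than a dense array.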