ESG Climate Risk Analytics for Investors
By Fouzil Ali · Published June 29, 2026
Predict company ESG risk scores and map regional carbon exposure using Random Forest and XGBoost models on S&P 500 and emissions data.
- esg
- climate-risk
- machine-learning
- investor-analytics
- random-forest
- xgboost
Inside this notebook
# ESG & Climate-Risk Analytics: Investor Demo

**Datasets:** S&P 500 ESG Risk Ratings (Kaggle · CC0) • CO₂ & GHG Emissions by Country/Sector (Kaggle · CC0)

**Goal:** Predict company-level ESG Total Risk score, map regional carbon exposure, and surface three evidence-backed insights for an investor audience.

**Key outputs:** ML model (Random Forest + XGBoost) · Sector/country visualisations · Investor insight cards
# ── Setup & Imports ────────────────────────────────────────────────────────────
import warnings
warnings.filterwarnings("ignore")
import os, io, zipfile, requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.ticker as mticker
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
…
Note: you may need to restart the kernel to use updated packages.
All imports OK
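The imports above include `StratifiedKFold` next to two regressors. `StratifiedKFold` expects discrete class labels, so if the target is the continuous Total ESG Risk score, a plain `KFold` is the usual splitter. The sketch below shows that pattern on synthetic data. It is not the notebook's modelling code: the feature matrix, target formula, and hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for three pillar-style features and a continuous risk score
# (assumed data, not the Kaggle dataset).
rng = np.random.default_rng(42)
n_samples = 200
X = rng.normal(size=(n_samples, 3))
y = 20 + 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=n_samples)

# KFold handles continuous targets; shuffle so folds are not order-dependent.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42)

# One R^2 value per fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"R^2 per fold: {np.round(scores, 3)}")
print(f"Mean R^2: {scores.mean():.3f}")
```

If stratification matters, for example to keep sector proportions similar across folds, one option is to stratify on a categorical column such as `ESG Risk Level` rather than on the numeric score.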
# ── Load Dataset 1: S&P 500 ESG Risk Ratings ─────────────────────────────────
print("Downloading S&P 500 ESG Risk Ratings …")
path_esg = kagglehub.dataset_download("pritish509/s-and-p-500-esg-risk-ratings")
esg_file = [f for f in os.listdir(path_esg) if f.endswith(".csv")][0]
esg = pd.read_csv(os.path.join(path_esg, esg_file))
print(f"ESG dataset: {esg.shape[0]:,} rows × {esg.shape[1]} cols")
print("\nColumns:", esg.columns.tolist())
print("\nSample:\n", esg.head(3).to_string())
Downloading S&P 500 ESG Risk Ratings …
Downloading to /root/.cache/kagglehub/datasets/pritish509/s-and-p-500-esg-risk-ratings/2.archive...
Extracting files...
ESG dataset: 503 rows × 15 cols
Columns: ['Symbol', 'Name', 'Address', 'Sector', 'Industry', 'Full Time Employees', 'Description', 'Total ESG Risk score', 'Environment Risk Score', 'Governance Risk Score', 'Social Risk Score', 'Controversy Level', 'Controversy Score', 'ESG Risk Percentile', 'ESG Risk Level']
Sample:
Symbol…
# ── Load Dataset 2: CO₂ & GHG Emissions by Country / Sector ──────────────────
print("Downloading CO₂ & GHG Emissions …")
path_co2 = kagglehub.dataset_download("imtkaggleteam/co-and-greenhouse-gas-emissions")
co2_files = os.listdir(path_co2)
print("Files:", co2_files)
Downloading CO₂ & GHG Emissions …
Downloading to /root/.cache/kagglehub/datasets/imtkaggleteam/co-and-greenhouse-gas-emissions/1.archive...
Extracting files...
Files: ['1- temperature-anomaly.csv', '2- annual-co-emissions-by-region.csv', '3- co-emissions-per-capita.csv']
# ── Load and inspect all three CO₂ files ─────────────────────────────────────
temp_anom = pd.read_csv(os.path.join(path_co2, "1- temperature-anomaly.csv"))
co2_region = pd.read_csv(os.path.join(path_co2, "2- annual-co-emissions-by-region.csv"))
co2_cap = pd.read_csv(os.path.join(path_co2, "3- co-emissions-per-capita.csv"))
for label, df in [("Temperature Anomaly", temp_anom),
                  ("Annual CO₂ by Region", co2_region),
                  ("CO₂ per Capita", co2_cap)]:
    print(f"\n── {label} ── {df.shape}")
    print(" Cols:", df.columns.tolist())
    print(df.head(2).to_string())
── Temperature Anomaly ── (522, 6)
Cols: ['Entity', 'Code', 'Year', 'Global average temperature anomaly relative to 1961-1990', 'Upper bound of the annual temperature anomaly (95% confidence interval)', 'Lower bound of the annual temperature anomaly (95% confidence interval)']
Entity Code Year Global average temperature anomaly relative to 1961-1990 Upper bound of the annual temperature anomaly (95% confidence interval) Lower bound of the annual temperature anomaly (95% confidence int…
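The cell above reads the three files by their exact names. As an alternative, the sketch below loads every CSV found in a directory into a dictionary keyed by file name, which keeps working if the dataset's file names change between versions. The helper `load_csvs` and the two small demo files are hypothetical and are not part of the notebook.

```python
import os
import tempfile

import pandas as pd


def load_csvs(directory):
    """Read every .csv file in `directory` into a dict keyed by file name."""
    frames = {}
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(".csv"):
            frames[name] = pd.read_csv(os.path.join(directory, name))
    return frames


# Demo on a throwaway directory with two tiny made-up files.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"Entity": ["World"], "Year": [2020]}).to_csv(
        os.path.join(tmp, "a.csv"), index=False
    )
    pd.DataFrame({"Entity": ["World", "Asia"], "Year": [2019, 2020]}).to_csv(
        os.path.join(tmp, "b.csv"), index=False
    )
    frames = load_csvs(tmp)

# Same style of inspection as the loop above: name, shape, column list.
for name, df in frames.items():
    print(name, df.shape, df.columns.tolist())
```

In the notebook's setting, the call would presumably be `load_csvs(path_co2)`, with the frames then looked up by file name.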
## 1 · Data Profiling & Cleaning
# ── Clean ESG Dataset ─────────────────────────────────────────────────────────
# Standardise numeric columns
num_cols = ['Total ESG Risk score', 'Environment Risk Score',
            'Governance Risk Score', 'Social Risk Score',
            'Controversy Score', 'Full Time Employees']
esg_clean = esg.copy()
for c in num_cols:
    esg_clean[c] = pd.to_numeric(esg_clean[c].astype(str).str.replace(",", ""), errors='coerce')
# Employee counts such as "3,157" were already parsed in the loop above; copy them to a shorter column name
esg_clean['Employees'] = esg_clean['Full Time Employees']
esg_clean = esg_clean.dropna(subset=['Total ESG Risk score', 'Environment Risk Score',
                                     'Social Risk Score', 'Governance Risk Score'])
# Derive a simple US geo-region from the Address field
def addr_to_region(addr):
    if not isinstance(addr, str): return "Unknown"
    …
ESG clean shape: (430, 17)
Sector distribution:
Sector
Financial Services 63
Technology 61
Industrials 60
Healthcare 53
Consumer Cyclical 51
Consumer Defensive 33
Real Estate 28
Utilities 28
Energy 20
Basic Materials 19
Communication Services 14
US Region distribution:
USRegion
Northeast 99
Other US 87
West Coast 72
Midwest 60
Southeast…
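The drop from 503 rows to 430 comes from the coerce-then-drop pattern in the cleaning cell: values that cannot be parsed as numbers become `NaN`, and rows missing any of the four risk scores are removed. A minimal sketch of that pattern on a made-up three-row frame follows. The symbols and values are invented for illustration, and only two of the notebook's columns are used.

```python
import pandas as pd

# Made-up rows: one clean, one with a missing score, one with an unparseable count.
raw = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC"],
    "Total ESG Risk score": ["21.5", None, "30"],
    "Full Time Employees": ["3,157", "12,000", "n/a"],
})

clean = raw.copy()
for col in ["Total ESG Risk score", "Full Time Employees"]:
    # Strip thousands separators, then coerce anything non-numeric to NaN.
    as_text = clean[col].astype(str).str.replace(",", "", regex=False)
    clean[col] = pd.to_numeric(as_text, errors="coerce")

# Rows without a target score cannot be used for training, so drop them.
# A missing employee count is tolerated here, mirroring the notebook's subset.
clean = clean.dropna(subset=["Total ESG Risk score"])
print(clean)
```

Row `BBB` is dropped because its score is missing, while `CCC` is kept with a `NaN` employee count, which a later step would need to impute or exclude.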
This is a preview. Open the live notebook to see all 24 cells with their charts and full outputs, or fork it into your own Clusy workspace.