ESG Climate Risk Analytics for Investors

By Fouzil Ali · Published June 29, 2026

Predict company ESG risk scores and map regional carbon exposure using Random Forest and XGBoost models on S&P 500 and emissions data.

esg
climate-risk
machine-learning
investor-analytics
random-forest
xgboost



Inside this notebook

# 🌍 ESG & Climate-Risk Analytics — Investor Demo

**Datasets:** S&P 500 ESG Risk Ratings (Kaggle · CC0) • CO₂ & GHG Emissions by Country/Sector (Kaggle · CC0)

**Goal:** Predict company-level ESG Total Risk score, map regional carbon exposure, and surface three evidence-backed insights for an investor audience.

**Key outputs:** ML model (Random Forest + XGBoost) · Sector/country visualisations · Investor insight cards
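Because the target is the Total ESG Risk score and the same file also carries Environment, Social and Governance risk scores, it is worth checking how closely the total tracks the sum of the three pillars before using them as features; if the gap is near zero, a model given the pillars mostly learns an addition. A minimal sketch, assuming the column names listed when the ESG file is loaded; the helper `pillar_sum_gap` and the toy rows are hypothetical, not taken from the dataset.

```python
import pandas as pd

TARGET = "Total ESG Risk score"
PILLARS = ["Environment Risk Score", "Social Risk Score", "Governance Risk Score"]

def pillar_sum_gap(df: pd.DataFrame) -> pd.Series:
    """Absolute difference between the total score and the sum of the three pillar scores."""
    return (df[TARGET] - df[PILLARS].sum(axis=1)).abs()

# Hypothetical toy rows (illustration only, not real company scores)
toy = pd.DataFrame({
    TARGET: [20.0, 31.5, 12.0],
    "Environment Risk Score": [5.0, 12.0, 2.0],
    "Social Risk Score": [9.0, 11.5, 6.0],
    "Governance Risk Score": [6.0, 8.0, 3.0],
})

gap = pillar_sum_gap(toy)
print(gap.tolist())        # per-row gap between total and pillar sum
print("max gap:", gap.max())
```

On the real data the same call would be `pillar_sum_gap(esg_clean).describe()`; a distribution concentrated near zero suggests building features that exclude the pillar scores.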

# ── Setup & Imports ────────────────────────────────────────────────────────────
import warnings
warnings.filterwarnings("ignore")

import os, io, zipfile, requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.ticker as mticker
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats

from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
…

Note: you may need to restart the kernel to use updated packages.
All imports OK ✓
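The import list pairs `StratifiedKFold` with regressors. `StratifiedKFold` stratifies on class labels and raises a `ValueError` when given a continuous target such as the Total ESG Risk score, so if it is meant for the regression models, one common workaround is to stratify on quantile bins of the target and pass the resulting index pairs to `cross_val_score`. A minimal sketch on synthetic data; the data, the five bins, and the hyperparameters are assumptions, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption): 200 rows, 4 numeric features, continuous target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 20 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=200)

# StratifiedKFold needs discrete labels, so stratify on quintiles of the target
y_bins = pd.qcut(y, q=5, labels=False)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
splits = list(skf.split(X, y_bins))  # (train_idx, test_idx) pairs balanced across bins

model = RandomForestRegressor(n_estimators=100, random_state=42)
# cv accepts an iterable of (train, test) index pairs; scoring still uses the continuous y
scores = cross_val_score(model, X, y, cv=splits, scoring="r2")
print("R² per fold:", np.round(scores, 3))
print("mean R²:", scores.mean().round(3))
```

Plain `KFold(shuffle=True)` is the simpler alternative; binned stratification mainly helps on small samples like the 430-row cleaned ESG table, where a random fold can miss the high-risk tail.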

# ── Load Dataset 1: S&P 500 ESG Risk Ratings ─────────────────────────────────
print("Downloading S&P 500 ESG Risk Ratings …")
path_esg = kagglehub.dataset_download("pritish509/s-and-p-500-esg-risk-ratings")
esg_file = [f for f in os.listdir(path_esg) if f.endswith(".csv")][0]
esg = pd.read_csv(os.path.join(path_esg, esg_file))
print(f"ESG dataset: {esg.shape[0]:,} rows × {esg.shape[1]} cols")
print("\nColumns:", esg.columns.tolist())
print("\nSample:\n", esg.head(3).to_string())

Downloading S&P 500 ESG Risk Ratings …
Downloading to /root/.cache/kagglehub/datasets/pritish509/s-and-p-500-esg-risk-ratings/2.archive...
Extracting files...
ESG dataset: 503 rows × 15 cols

Columns: ['Symbol', 'Name', 'Address', 'Sector', 'Industry', 'Full Time Employees', 'Description', 'Total ESG Risk score', 'Environment Risk Score', 'Governance Risk Score', 'Social Risk Score', 'Controversy Level', 'Controversy Score', 'ESG Risk Percentile', 'ESG Risk Level']

Sample:
   Symbol…
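`os.listdir` returns entries in arbitrary order, so `[...][0]` silently picks one file if the download ever contains more than one CSV. A slightly stricter variant, sketched with `pathlib`; the helper name `find_single_csv` is hypothetical, and the demo uses a temporary folder instead of the real kagglehub download path.

```python
import tempfile
from pathlib import Path

def find_single_csv(folder) -> Path:
    """Return the only CSV file in `folder`, failing loudly if there are zero or several."""
    csvs = sorted(Path(folder).glob("*.csv"))
    if not csvs:
        raise FileNotFoundError(f"no CSV files in {folder}")
    if len(csvs) > 1:
        raise ValueError(f"several CSV files in {folder}: {[p.name for p in csvs]}")
    return csvs[0]

# Demo: a temporary folder standing in for the downloaded dataset directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sp500_esg.csv").write_text("Symbol,Name\nAAA,Example Co\n")
    (Path(tmp) / "README.txt").write_text("not a csv")  # ignored by the *.csv glob
    found = find_single_csv(tmp)
    print(found.name)
```

In the notebook this would read `esg = pd.read_csv(find_single_csv(path_esg))`.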

# ── Load Dataset 2: CO₂ & GHG Emissions by Country / Sector ──────────────────
print("Downloading CO₂ & GHG Emissions …")
path_co2 = kagglehub.dataset_download("imtkaggleteam/co-and-greenhouse-gas-emissions")
co2_files = os.listdir(path_co2)
print("Files:", co2_files)

Downloading CO₂ & GHG Emissions …
Downloading to /root/.cache/kagglehub/datasets/imtkaggleteam/co-and-greenhouse-gas-emissions/1.archive...
Extracting files...
Files: ['1- temperature-anomaly.csv', '2- annual-co-emissions-by-region.csv', '3- co-emissions-per-capita.csv']

# ── Load and inspect the three files (temperature anomaly + two CO₂ tables) ──
temp_anom  = pd.read_csv(os.path.join(path_co2, "1- temperature-anomaly.csv"))
co2_region = pd.read_csv(os.path.join(path_co2, "2- annual-co-emissions-by-region.csv"))
co2_cap    = pd.read_csv(os.path.join(path_co2, "3- co-emissions-per-capita.csv"))

for label, df in [("Temperature Anomaly", temp_anom),
                  ("Annual CO₂ by Region", co2_region),
                  ("CO₂ per Capita",      co2_cap)]:
    print(f"\n── {label} ──  {df.shape}")
    print("  Cols:", df.columns.tolist())
    print(df.head(2).to_string())

── Temperature Anomaly ──  (522, 6)
  Cols: ['Entity', 'Code', 'Year', 'Global average temperature anomaly relative to 1961-1990', 'Upper bound of the annual temperature anomaly (95% confidence interval)', 'Lower bound of the annual temperature anomaly (95% confidence interval)']
   Entity  Code  Year  Global average temperature anomaly relative to 1961-1990  Upper bound of the annual temperature anomaly (95% confidence interval)  Lower bound of the annual temperature anomaly (95% confidence int…
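To turn a long-format Entity/Year table into a per-region snapshot for an exposure map, one approach is to keep each entity's most recent year and express it as a share of the snapshot total. A minimal sketch, assuming `co2_region` follows the same Entity/Code/Year layout as the temperature file; its value column is not shown in this preview, so `VALUE_COL`, the helper `latest_by_entity`, and the toy figures are all hypothetical.

```python
import pandas as pd

VALUE_COL = "co2_emissions"  # hypothetical column name

# Hypothetical toy data in long format (one row per entity per year)
toy = pd.DataFrame({
    "Entity": ["Asia", "Asia", "Europe", "Europe", "Africa"],
    "Year": [2021, 2022, 2021, 2022, 2022],
    VALUE_COL: [20.0, 21.0, 5.5, 5.2, 1.4],
})

def latest_by_entity(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Keep each entity's most recent row and add its share of the snapshot total."""
    idx = df.groupby("Entity")["Year"].idxmax()          # row label of the latest year per entity
    latest = df.loc[idx, ["Entity", "Year", value_col]]
    latest = latest.sort_values(value_col, ascending=False).reset_index(drop=True)
    latest["share"] = latest[value_col] / latest[value_col].sum()
    return latest

snapshot = latest_by_entity(toy, VALUE_COL)
print(snapshot)
```

If the file mixes aggregates (for example a world total) with the regions that make it up, those rows would need filtering first, otherwise the shares double-count.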

## 1 · Data Profiling & Cleaning

# ── Clean ESG Dataset ─────────────────────────────────────────────────────────
# Standardise numeric columns: strip thousands separators (e.g. "3,157" -> 3157)
# and coerce anything unparseable to NaN
num_cols = ['Total ESG Risk score', 'Environment Risk Score',
            'Governance Risk Score', 'Social Risk Score',
            'Controversy Score', 'Full Time Employees']

esg_clean = esg.copy()
for c in num_cols:
    esg_clean[c] = pd.to_numeric(esg_clean[c].astype(str).str.replace(",", "", regex=False),
                                 errors='coerce')

# 'Full Time Employees' is already numeric after the loop above; keep a short alias
esg_clean['Employees'] = esg_clean['Full Time Employees']
esg_clean = esg_clean.dropna(subset=['Total ESG Risk score', 'Environment Risk Score',
                                       'Social Risk Score', 'Governance Risk Score'])

# Derive a simple US geo-region from the Address field
def addr_to_region(addr):
    if not isinstance(addr, str): return "Unknown"
…

ESG clean shape: (430, 17)

Sector distribution:
Sector
Financial Services        63
Technology                61
Industrials               60
Healthcare                53
Consumer Cyclical         51
Consumer Defensive        33
Real Estate               28
Utilities                 28
Energy                    20
Basic Materials           19
Communication Services    14

US Region distribution:
USRegion
Northeast        99
Other US         87
West Coast       72
Midwest          60
Southeast…
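Two cleaning steps in this cell are easy to check in isolation: stripping thousands separators before `pd.to_numeric`, and deriving a region from the free-text Address. The body of `addr_to_region` is cut off in this preview, so the mapping below is a hypothetical stand-in: `addr_to_region_sketch`, the partial `STATE_TO_REGION` table, and the assumed ", ST 12345" address pattern are all assumptions, not the notebook's actual rules.

```python
import re
import pandas as pd

# 1) Thousands separators: "3,157" -> 3157.0; missing or unparseable values become NaN
raw = pd.Series(["3,157", "154,000", None, "n/a"])
employees = pd.to_numeric(raw.astype(str).str.replace(",", "", regex=False), errors="coerce")
print(employees.tolist())

# 2) Hypothetical, deliberately partial state -> region table (NOT the notebook's mapping)
STATE_TO_REGION = {
    "CA": "West Coast", "OR": "West Coast", "WA": "West Coast",
    "NY": "Northeast", "MA": "Northeast", "NJ": "Northeast", "CT": "Northeast",
    "IL": "Midwest", "OH": "Midwest", "MN": "Midwest", "MI": "Midwest",
    "FL": "Southeast", "GA": "Southeast", "NC": "Southeast",
}
# Assumes addresses contain ", <two-letter state> <5-digit ZIP>"
STATE_ZIP = re.compile(r",\s*([A-Z]{2})\s+\d{5}")

def addr_to_region_sketch(addr):
    """Map an address string to a coarse US region; unmapped states fall back to 'Other US'."""
    if not isinstance(addr, str):
        return "Unknown"
    m = STATE_ZIP.search(addr)
    if m is None:
        return "Unknown"  # no state/ZIP found (e.g. a non-US address)
    return STATE_TO_REGION.get(m.group(1), "Other US")

# Made-up addresses for illustration
addresses = [
    "1 Example Way, Cupertino, CA 95014",
    "100 Main St, Columbus, OH 43215",
    "500 Elm St, Austin, TX 78701",
    None,
]
regions = [addr_to_region_sketch(a) for a in addresses]
print(regions)
```

Keeping an explicit "Other US" fallback separate from "Unknown" makes it visible how many rows fall outside the named regions versus how many addresses failed to parse.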

This is a preview. Open the live notebook to see all 24 cells with their charts and full outputs, or fork it into your own Clusy workspace.