Air Passengers Forecast with Confidence Intervals

By Ju Lin · Published June 28, 2026

Forecasts airline passenger volume 24 months ahead using SARIMA with 95% confidence intervals.

  • time-series
  • forecasting
  • sarima
  • confidence-intervals
  • eda
  • seasonal
8 cells1 experiment190 views0 forks

Inside this notebook

## Air Passengers — 24-Month Forecast with Confidence Intervals **What this notebook does:** Forecasts international airline passenger volume 24 months into the future using a SARIMA (Seasonal ARIMA) model, with quantified uncertainty. **Dataset:** Monthly totals of international airline passengers (Box & Jenkins, 1976), spanning January 1949 through December 1960 — 144 observations. This is the classic textbook time series for demonstrating seasonal forecasting. **Workflow:** 1. **Explore** — Plot the raw series, examine autocorrelation (ACF/PACF), test for stationarity 2. **Model** — Fit and compare several SARIMA specifications, pick the best by AIC 3. **Diagnose** — Check residuals for remaining patterns 4. **Forecast** — Project 24 months ahead (1961–01 to 1962–12) with 95% confidence intervals 5. **Visualise** — One polished figure showing history, forecast, and the widening uncertainty band **Key features of the data:** Clear upward trend + strong 12-month seasonal cycle (summer peaks, winter troughs). The seasonal amplitude grows with the trend, suggesting the model may benefit from a log transform — explored in the diagnostics.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
import warnings
warnings.filterwarnings("ignore")

# Load the classic Air Passengers dataset
data = sm.datasets.get_rdataset("AirPassengers", "datasets")
series = data.data
series.head()
# ── 1. Convert to proper time series ──────────────────────────────────
# The "time" column is year+fraction; convert to monthly period
series['month'] = pd.to_datetime(series['time'].apply(lambda x: f"{int(x)}-{int(round((x % 1) * 12)) + 1:02d}-01"))
ts = series.set_index('month')['value'].asfreq('MS')
ts.name = 'passengers'

print(f"Shape: {ts.shape}")
print(f"Date range: {ts.index.min()} → {ts.index.max()}")
print(f"Periods per year: {ts.resample('YE').count().unique()}")
print(f"First 5 rows:\n{ts.head()}")
Shape: (144,)
Date range: 1949-01-01 00:00:00 → 1960-12-01 00:00:00
Periods per year: [12]
First 5 rows:
month
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
1949-05-01    121
Freq: MS, Name: passengers, dtype: int64
# ── 2. EDA: plot the series, ACF/PACF, and stationarity test ──────────

fig, axes = plt.subplots(2, 2, figsize=(14, 8))
fig.suptitle("Air Passengers — Time Series Diagnostics", fontsize=14, y=1.02)

# Raw series
ax1 = axes[0, 0]
ax1.plot(ts.index, ts.values, color='#2c3e50', linewidth=1.5)
ax1.set_title("Monthly Air Passengers (1949–1960)")
ax1.set_ylabel("Passengers")
ax1.grid(alpha=0.3)

# Annual trend overlay
yearly = ts.resample('YE').mean()
ax1.plot(yearly.index, yearly.values, color='#e74c3c', linewidth=2, alpha=0.8, label='Annual avg')
ax1.legend()

# ACF
…
Seasonal decomposition (additive model):
# ── 3. Fit SARIMA model ────────────────────────────────────────────────
# Based on EDA, the series requires:
#   - d=1  (trend non-stationary)
#   - D=1  (seasonal non-stationary — amplitude grows with trend)
#   - s=12 (monthly seasonality)
#
# Try two candidate specifications and pick the lower AIC.

from statsmodels.tsa.statespace.sarimax import SARIMAX

candidates = [
    ("SARIMA(0,1,1)(0,1,1,12)",  (0, 1, 1),  (0, 1, 1, 12)),
    ("SARIMA(1,1,1)(1,1,1,12)",  (1, 1, 1),  (1, 1, 1, 12)),
    ("SARIMA(2,1,1)(1,1,1,12)",  (2, 1, 1),  (1, 1, 1, 12)),
    ("SARIMA(0,1,2)(0,1,1,12)",  (0, 1, 2),  (0, 1, 1, 12)),
]

results = {}
…
Fitting SARIMA(0,1,1)(0,1,1,12) … AIC = 920.63
Fitting SARIMA(1,1,1)(1,1,1,12) … AIC = 922.21
Fitting SARIMA(2,1,1)(1,1,1,12) … AIC = 924.22
Fitting SARIMA(0,1,2)(0,1,1,12) … AIC = 915.59

✓ Best model: SARIMA(0,1,2)(0,1,1,12)  (AIC = 915.59)
  Order: (0, 1, 2) × Seasonal: (0, 1, 1, 12)
  Log-Likelihood: -453.80
  Number of parameters: 4

Coefficient summary:
==============================================================================
                 coef    std err          z      P>|z|…
# ── 4. 24-month forecast with confidence intervals ────────────────────

# Forecast horizon
n_steps = 24
forecast_result = best_model.get_forecast(steps=n_steps)
pred = forecast_result.predicted_mean
ci = forecast_result.conf_int(alpha=0.05)
ci.columns = ['lower_95', 'upper_95']

# Future date index
last_date = ts.index[-1]
future_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=n_steps, freq='MS')
pred.index = future_dates

# In-sample fitted values (aligned)
fitted = best_model.fittedvalues
fitted.index = ts.index

…
Forecast horizon: 24 months (1961-01 → 1962-12)

         Forecast  Lower 95%  Upper 95%
1961-01     446.8      423.1      470.5
1961-02     421.6      392.8      450.4
1961-03     453.4      420.2      486.7
1961-04     489.5      452.4      526.7
1961-05     501.9      461.2      542.5
1961-06     563.9      519.9      607.8
1961-07     649.3      602.4      696.3
1961-08     636.4      586.6      686.2
1961-09     538.6      486.1      591.1
1961-10     490.7      435.7      545.8
1961-11…
# ── 5. Polished visualization ──────────────────────────────────────────

fig, ax = plt.subplots(figsize=(16, 7))

# ── In-sample period ──
ax.plot(ts.index, ts.values, color='#2c3e50', linewidth=1.8, 
        label='Historical passengers', zorder=3)

# ── Forecast ──
ax.plot(pred.index, pred.values, color='#e74c3c', linewidth=2.2,
        label='Forecast (24 months)', zorder=4)

# ── Confidence band ──
ax.fill_between(pred.index, ci['lower_95'], ci['upper_95'],
                color='#e74c3c', alpha=0.15, label='95% confidence interval', zorder=2)

# ── Separation line ──
ax.axvline(x=ts.index[-1] + pd.DateOffset(months=1), color='#95a5a6', 
…
✅ Forecast plotted — the band widens over time reflecting increasing uncertainty.
Air Passengers Forecast with Confidence Intervals | Clusy