Air Passengers Forecast with Confidence Intervals
By Ju Lin · Published June 28, 2026
Forecasts airline passenger volume 24 months ahead using SARIMA with 95% confidence intervals.
- time-series
- forecasting
- sarima
- confidence-intervals
- eda
- seasonal
Inside this notebook
## Air Passengers — 24-Month Forecast with Confidence Intervals **What this notebook does:** Forecasts international airline passenger volume 24 months into the future using a SARIMA (Seasonal ARIMA) model, with quantified uncertainty. **Dataset:** Monthly totals of international airline passengers (Box & Jenkins, 1976), spanning January 1949 through December 1960 — 144 observations. This is the classic textbook time series for demonstrating seasonal forecasting. **Workflow:** 1. **Explore** — Plot the raw series, examine autocorrelation (ACF/PACF), test for stationarity 2. **Model** — Fit and compare several SARIMA specifications, pick the best by AIC 3. **Diagnose** — Check residuals for remaining patterns 4. **Forecast** — Project 24 months ahead (1961–01 to 1962–12) with 95% confidence intervals 5. **Visualise** — One polished figure showing history, forecast, and the widening uncertainty band **Key features of the data:** Clear upward trend + strong 12-month seasonal cycle (summer peaks, winter troughs). The seasonal amplitude grows with the trend, suggesting the model may benefit from a log transform — explored in the diagnostics.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
import warnings
warnings.filterwarnings("ignore")
# Load the classic Air Passengers dataset
data = sm.datasets.get_rdataset("AirPassengers", "datasets")
series = data.data
series.head()# ── 1. Convert to proper time series ──────────────────────────────────
# The "time" column is year+fraction; convert to monthly period
series['month'] = pd.to_datetime(series['time'].apply(lambda x: f"{int(x)}-{int(round((x % 1) * 12)) + 1:02d}-01"))
ts = series.set_index('month')['value'].asfreq('MS')
ts.name = 'passengers'
print(f"Shape: {ts.shape}")
print(f"Date range: {ts.index.min()} → {ts.index.max()}")
print(f"Periods per year: {ts.resample('YE').count().unique()}")
print(f"First 5 rows:\n{ts.head()}")Shape: (144,) Date range: 1949-01-01 00:00:00 → 1960-12-01 00:00:00 Periods per year: [12] First 5 rows: month 1949-01-01 112 1949-02-01 118 1949-03-01 132 1949-04-01 129 1949-05-01 121 Freq: MS, Name: passengers, dtype: int64
# ── 2. EDA: plot the series, ACF/PACF, and stationarity test ──────────
fig, axes = plt.subplots(2, 2, figsize=(14, 8))
fig.suptitle("Air Passengers — Time Series Diagnostics", fontsize=14, y=1.02)
# Raw series
ax1 = axes[0, 0]
ax1.plot(ts.index, ts.values, color='#2c3e50', linewidth=1.5)
ax1.set_title("Monthly Air Passengers (1949–1960)")
ax1.set_ylabel("Passengers")
ax1.grid(alpha=0.3)
# Annual trend overlay
yearly = ts.resample('YE').mean()
ax1.plot(yearly.index, yearly.values, color='#e74c3c', linewidth=2, alpha=0.8, label='Annual avg')
ax1.legend()
# ACF
…Seasonal decomposition (additive model):
# ── 3. Fit SARIMA model ────────────────────────────────────────────────
# Based on EDA, the series requires:
# - d=1 (trend non-stationary)
# - D=1 (seasonal non-stationary — amplitude grows with trend)
# - s=12 (monthly seasonality)
#
# Try two candidate specifications and pick the lower AIC.
from statsmodels.tsa.statespace.sarimax import SARIMAX
candidates = [
("SARIMA(0,1,1)(0,1,1,12)", (0, 1, 1), (0, 1, 1, 12)),
("SARIMA(1,1,1)(1,1,1,12)", (1, 1, 1), (1, 1, 1, 12)),
("SARIMA(2,1,1)(1,1,1,12)", (2, 1, 1), (1, 1, 1, 12)),
("SARIMA(0,1,2)(0,1,1,12)", (0, 1, 2), (0, 1, 1, 12)),
]
results = {}
…Fitting SARIMA(0,1,1)(0,1,1,12) … AIC = 920.63
Fitting SARIMA(1,1,1)(1,1,1,12) … AIC = 922.21
Fitting SARIMA(2,1,1)(1,1,1,12) … AIC = 924.22
Fitting SARIMA(0,1,2)(0,1,1,12) … AIC = 915.59
✓ Best model: SARIMA(0,1,2)(0,1,1,12) (AIC = 915.59)
Order: (0, 1, 2) × Seasonal: (0, 1, 1, 12)
Log-Likelihood: -453.80
Number of parameters: 4
Coefficient summary:
==============================================================================
coef std err z P>|z|…# ── 4. 24-month forecast with confidence intervals ────────────────────
# Forecast horizon
n_steps = 24
forecast_result = best_model.get_forecast(steps=n_steps)
pred = forecast_result.predicted_mean
ci = forecast_result.conf_int(alpha=0.05)
ci.columns = ['lower_95', 'upper_95']
# Future date index
last_date = ts.index[-1]
future_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=n_steps, freq='MS')
pred.index = future_dates
# In-sample fitted values (aligned)
fitted = best_model.fittedvalues
fitted.index = ts.index
…Forecast horizon: 24 months (1961-01 → 1962-12)
Forecast Lower 95% Upper 95%
1961-01 446.8 423.1 470.5
1961-02 421.6 392.8 450.4
1961-03 453.4 420.2 486.7
1961-04 489.5 452.4 526.7
1961-05 501.9 461.2 542.5
1961-06 563.9 519.9 607.8
1961-07 649.3 602.4 696.3
1961-08 636.4 586.6 686.2
1961-09 538.6 486.1 591.1
1961-10 490.7 435.7 545.8
1961-11…# ── 5. Polished visualization ──────────────────────────────────────────
fig, ax = plt.subplots(figsize=(16, 7))
# ── In-sample period ──
ax.plot(ts.index, ts.values, color='#2c3e50', linewidth=1.8,
label='Historical passengers', zorder=3)
# ── Forecast ──
ax.plot(pred.index, pred.values, color='#e74c3c', linewidth=2.2,
label='Forecast (24 months)', zorder=4)
# ── Confidence band ──
ax.fill_between(pred.index, ci['lower_95'], ci['upper_95'],
color='#e74c3c', alpha=0.15, label='95% confidence interval', zorder=2)
# ── Separation line ──
ax.axvline(x=ts.index[-1] + pd.DateOffset(months=1), color='#95a5a6',
…✅ Forecast plotted — the band widens over time reflecting increasing uncertainty.