Results

Headline numbers, model performance, scenario design and where to access the full numerical outputs. Detailed visualisations (interactive globe, country choropleth, regional flow chord diagram, corridor fan-charts) live in the interactive dashboard.

Model performance

The stacking ensemble (GAM + Random Forest + XGBoost, combined by a Ridge meta-learner) achieves a pooled out-of-fold R² of 0.826 on the 1990–2015 historical panel, 99.9% of the temporal autocorrelation ceiling (r² = 0.827). Performance is validated by 5-fold expanding-window cross-validation. Fold 1 trains on a single period and is intentionally weak. From Fold 2 onward R² rises from 0.810 to 0.855, with only a marginal dip at Fold 4 (0.834 → 0.831), which is consistent with genuine learning rather than overfitting:

Fold	Training	Test	R²	RMSE	N (Test)	N Models
1	1990	1995–2015	0.373	0.229	253,770	2 (GAM + RF)
2	1990–1995	2000–2015	0.810	0.126	203,016	3
3	1990–2000	2005–2015	0.834	0.118	152,262	3
4	1990–2005	2010–2015	0.831	0.119	101,508	3
5	1990–2010	2015	0.855	0.111	50,754	3
Pooled (Folds 2–5)	All prior	All held-out	0.826	0.121	507,540	—
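The fold layout in the table can be reproduced with a short sketch. It assumes that the six 5-year periods 1990–2015 each contribute the same number of dyads, 50,754, a figure inferred from the Fold 5 row (one test period); the function names are hypothetical. The arithmetic also shows that the Pooled row's N equals the sum over Folds 2–5, which suggests Fold 1 is excluded from the pooled metrics.

```python
# Sketch: expanding-window folds over 5-year periods (hypothetical helper names).
PERIODS = [1990, 1995, 2000, 2005, 2010, 2015]
DYADS_PER_PERIOD = 50_754  # inferred from the Fold 5 row (one test period)


def expanding_window_folds(periods):
    """Yield (fold, train_periods, test_periods).

    Fold k trains on the first k periods and tests on all later periods,
    so the training window grows as the test window shrinks.
    """
    for k in range(1, len(periods)):
        yield k, periods[:k], periods[k:]


def span(periods):
    """Format a list of periods as '1990' or '1990-2005'."""
    return str(periods[0]) if len(periods) == 1 else f"{periods[0]}-{periods[-1]}"


folds = list(expanding_window_folds(PERIODS))
for k, train, test in folds:
    n_test = len(test) * DYADS_PER_PERIOD
    print(f"Fold {k}: train {span(train)}, test {span(test)}, N(test) = {n_test:,}")

# The pooled row's N equals the sum over Folds 2-5, so Fold 1 is left out.
pooled_n = sum(len(test) * DYADS_PER_PERIOD for k, _, test in folds if k >= 2)
print(f"Pooled N (Folds 2-5) = {pooled_n:,}")  # 507,540
```

Each period appears in several test windows, so the pooled N counts period-dyad observations once per fold in which they are held out.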

Per-model standalone OOF R²: GAM 0.795, Random Forest 0.804, XGBoost 0.813. Ensemble lift over the best individual model (XGBoost): +1.3 pp. See Methodology §5 for the full CV protocol.
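A Ridge meta-learner of this kind can be sketched as a penalised linear regression of the target on the base models' out-of-fold predictions. The sketch below is a dependency-free stand-in, not the project's code: it solves the ridge normal equations by hand (a production pipeline would more likely use a library such as scikit-learn), keeps the intercept unpenalised by centering, and runs on synthetic data with three hypothetical base models. Its R² is in-sample and only illustrates why combining partly independent errors helps; the +1.3 pp figure above comes from out-of-fold evaluation.

```python
import random


def solve(a, rhs):
    """Solve a @ x = rhs for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_ridge_stack(preds, y, alpha=1.0):
    """Ridge meta-learner: y ~ b + sum_j w_j * preds[j], L2 penalty on w only.

    preds holds one list of out-of-fold predictions per base model.
    Centering the columns leaves the intercept b unpenalised.
    """
    n, k = len(y), len(preds)
    x_mean = [sum(p) / n for p in preds]
    y_mean = sum(y) / n
    xc = [[preds[j][i] - x_mean[j] for j in range(k)] for i in range(n)]
    yc = [v - y_mean for v in y]
    gram = [[sum(row[p] * row[q] for row in xc) + (alpha if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    xty = [sum(xc[i][p] * yc[i] for i in range(n)) for p in range(k)]
    w = solve(gram, xty)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return b, w


def r2(y, yhat):
    """Coefficient of determination, 1 - SSE / SST."""
    y_mean = sum(y) / len(y)
    sse = sum((a - f) ** 2 for a, f in zip(y, yhat))
    sst = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - sse / sst


# Synthetic demo: three hypothetical base models with independent errors.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(500)]
base = [[v + random.gauss(0.0, s) for v in y] for s in (0.5, 0.45, 0.4)]
b, w = fit_ridge_stack(base, y, alpha=1.0)
ens = [b + sum(wj * p[i] for wj, p in zip(w, base)) for i in range(len(y))]
print("base R2:", [round(r2(y, p), 3) for p in base])
print("stacked R2:", round(r2(y, ens), 3), "weights:", [round(v, 3) for v in w])
```

The penalty `alpha` shrinks the weights towards zero, which keeps them stable when base-model predictions are highly correlated, as GAM, Random Forest and XGBoost predictions on the same panel usually are.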

Projection framework

The projection framework produces, for every country and every 5-year period 2020–2100:

bilateral migration flows for ~52,670 corridors,
net migration per country,
the Migration Pressure Index (MPI) and Trapped Population Index (TPI),
conformal prediction intervals at 50% and 90% coverage,

across 4 IPCC Shared Socioeconomic Pathways (SSPs) × 5 narrative scenarios (= 20 pathway × scenario combinations).

Scenario design

Scenario	Narrative	Climate × Conflict × Drought × Flood × Storm × Gov.	Displacement overlay
Baseline (ML only)	Reference run, no narrative perturbation.	1.0 × 1.0 × 1.0 × 1.0 × 1.0 × 1.0	—
Baseline+	Reference + structural displacement.	1.0 × 1.0 × 1.0 × 1.0 × 1.0 × 1.0	×1.0
Adaptation Success	Lower conflict, mildly higher income, mildly lower climate stress.	0.8 × 0.7 × 0.9 × 0.9 × 1.0 × 1.0	×0.8
Fragmentation	Higher conflict and governance stress, lower income.	1.0 × 1.5 × 1.0 × 1.0 × 1.0 × 1.3	×1.2
Climate Extreme	Tail-risk climate channel: hotter, drier, more severe storms and flooding.	1.5 × 1.0 × 1.5 × 1.4 × 1.5 × 1.0	×1.5

All scenarios are computed across SSP1 (Sustainability), SSP2 (Middle of the Road), SSP3 (Regional Rivalry) and SSP5 (Fossil-fueled Development). See Methodology §9 for the scenario engine and §11 for IPF calibration to UN WPP totals.
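Because the engine perturbs inputs rather than outputs, a scenario can be read as a set of per-channel multipliers applied before prediction. The sketch below transcribes the table's multipliers and enumerates the 4 × 5 pathway × scenario grid. It assumes, for illustration only, that each multiplier simply scales the features of its channel; the feature names are hypothetical, and the displacement-overlay factor is carried along but not applied.

```python
from itertools import product

# Multipliers transcribed from the scenario table, in its column order.
CHANNELS = ("climate", "conflict", "drought", "flood", "storm", "governance")
SCENARIOS = {
    # name: (channel multipliers, displacement-overlay factor or None)
    "Baseline (ML only)": ((1.0, 1.0, 1.0, 1.0, 1.0, 1.0), None),
    "Baseline+": ((1.0, 1.0, 1.0, 1.0, 1.0, 1.0), 1.0),
    "Adaptation Success": ((0.8, 0.7, 0.9, 0.9, 1.0, 1.0), 0.8),
    "Fragmentation": ((1.0, 1.5, 1.0, 1.0, 1.0, 1.3), 1.2),
    "Climate Extreme": ((1.5, 1.0, 1.5, 1.4, 1.5, 1.0), 1.5),
}
SSPS = ("SSP1", "SSP2", "SSP3", "SSP5")


def perturb(features, scenario):
    """Scale each input feature by its channel's multiplier.

    features maps a channel name to a feature value (hypothetical schema;
    the real engine works on 109 engineered predictors). The fitted
    ensemble is not modified; only its inputs change.
    """
    multipliers, _overlay = SCENARIOS[scenario]  # overlay applied elsewhere
    factor = dict(zip(CHANNELS, multipliers))
    return {ch: value * factor[ch] for ch, value in features.items()}


runs = list(product(SSPS, SCENARIOS))
print(len(runs))  # 20 pathway x scenario combinations

stressed = perturb({"climate": 2.0, "drought": 1.0, "conflict": 0.4}, "Climate Extreme")
print(stressed)  # {'climate': 3.0, 'drought': 1.5, 'conflict': 0.4}
```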

Headline qualitative findings

Network persistence dominates. The diaspora stock at \(t-1\) is the single strongest predictor in the panel (Pearson r ≈ 0.33). Established corridors are highly persistent; inactive corridors decline by ~13% per 5-year period, while active corridors keep growing through self-reinforcing dynamics — even before climate shocks are added.
Climate features make up over half of the predictor set. 54% of the 109 engineered predictors are climate-related (CRU TS 4.09 + CMIP6 anomalies and derived indices), reflecting the project's research focus on climate–migration coupling.
Cross-validation supports genuine learning. The rising R² curve from Fold 2 onward (0.810 → 0.855 across the expanding folds, with a marginal dip at Fold 4) and the +1.3 pp ensemble lift over the best individual model argue against an "overfitting on history" interpretation of the high pooled R².
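Adaptation matters more than climate alone. Because the scenario engine acts on input features (not on ensemble outputs), the same physical climate shock can play out very differently under high-governance and low-governance pathways. Adaptation Success and Fragmentation run on the same underlying SSP climate trajectories and differ only modestly in their climate-channel multipliers (0.8–1.0 vs 1.0 across climate, drought, flood and storm), yet they diverge sharply on conflict (×0.7 vs ×1.5) and governance (×1.0 vs ×1.3) and produce structurally different displacement footprints.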
Coastal exposure is enormous. NASA SEDAC LECZ v3 places 687 million people below 5 m of elevation and 1.056 billion below 10 m globally. This is an upper bound on long-run sea-level-driven displacement before adaptation, and it is distributed across corridors via baseline shares.
Calibration matters for policy use. IPF (iterative proportional fitting) calibration against UN WPP origin and destination totals (Willekens 1999; Abel & Cohen 2019) ensures that projected flows, once aggregated by origin and by destination, match the country-level WPP totals. This consistency is necessary for any downstream demographic, economic or policy modelling.
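IPF in its classic biproportional (RAS) form alternately rescales the rows and the columns of a seed flow matrix until both sets of margins match their targets. The sketch below assumes the targets are gross origin outflows and destination inflows sharing a common grand total, and uses made-up numbers for a toy three-country system; it illustrates the technique and is not the project's calibration code.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting (RAS) of a flow matrix to margins.

    seed[i][j] is the uncalibrated flow from origin i to destination j.
    Zero cells (e.g. the i == j diagonal) stay zero because IPF only rescales.
    Row and column targets must share the same grand total.
    """
    m = [list(row) for row in seed]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(max_iter):
        # Step 1: rescale each origin row to its target outflow.
        for i in range(n_rows):
            s = sum(m[i])
            if s > 0:
                m[i] = [v * row_targets[i] / s for v in m[i]]
        # Step 2: rescale each destination column to its target inflow.
        for j in range(n_cols):
            s = sum(m[i][j] for i in range(n_rows))
            if s > 0:
                for i in range(n_rows):
                    m[i][j] *= col_targets[j] / s
        # Columns now match exactly; stop once the rows match too.
        if max(abs(sum(m[i]) - row_targets[i]) for i in range(n_rows)) < tol:
            break
    return m


# Toy 3-country example with made-up numbers (not project data).
seed = [[0.0, 10.0, 5.0],
        [4.0, 0.0, 6.0],
        [3.0, 2.0, 0.0]]
out_totals = [20.0, 12.0, 8.0]  # origin (row) targets
in_totals = [9.0, 16.0, 15.0]   # destination (column) targets
fitted = ipf(seed, out_totals, in_totals)
for row in fitted:
    print([round(v, 2) for v in row])
```

The fitted matrix keeps the seed's relative corridor structure (each cell is the seed value times a row factor and a column factor) while its margins match the targets.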

Where to find the numbers

All numerical outputs (per-country, per-scenario, per-period) are downloadable from the interactive dashboard's Data tab. Free CSVs include country-level scenario projections for each of the 5 scenarios, scenario summary statistics, MPI / TPI indicator panels, OOF predictions, SHAP feature attributions, and per-fold cross-validation metrics. Dyad-level (corridor-level) data are available on request from rogalski.academic@pm.me.

Uncertainty & limitations

Conformal coverage: every projection is paired with 50% and 90% conformal prediction intervals (Vovk 2005; Romano 2019; Barber 2023). The intervals are distribution-free and Mondrian-binned by flow magnitude; see Methodology §12.
Historical training window: 1990–2015. Periods 2020+ are projections. Out-of-sample validation against 2020 IMS stocks is in preparation.
Scenario design is exploratory, not predictive: the V7 multipliers (climate ×1.5, conflict ×1.5, etc.) are stress-test-style narrative perturbations, not probabilistic scenarios. They span a plausible envelope; they do not assign likelihoods.
Displacement overlay: physics-based but coarse. It covers sea level, heat and drought, the three dominant climate-displacement mechanisms in the literature; second-order channels (riverine flooding within countries, vector-borne disease, glacier melt affecting downstream water users) are not yet modelled.
Single-author project: this is independent doctoral research, not a multi-institution consortium output. Replication code and model artefacts are being prepared for public release.
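Mondrian conformal prediction calibrates a separate interval width within each bin, so coverage holds bin by bin rather than only on average across all flows. The sketch below uses the simplest split-conformal variant: absolute-residual scores, with bins on predicted flow size. That score is an assumption; the methodology also cites Romano 2019, which likely refers to conformalized quantile regression, whose score may be the one actually used. Bin edges and data are synthetic and hypothetical.

```python
import math
import random
from bisect import bisect_right


def conformal_quantile(scores, coverage):
    """Split-conformal quantile: the ceil((n + 1) * coverage)-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * coverage)
    return math.inf if k > n else sorted(scores)[k - 1]


def mondrian_widths(cal_pred, cal_true, edges, coverages=(0.5, 0.9)):
    """Calibrate one half-width per (magnitude bin, coverage level).

    Bins are taken on the predicted flow size; edges is a sorted list of
    bin boundaries (a hypothetical choice). Nonconformity score = |y - y_hat|.
    """
    scores = {}
    for p, y in zip(cal_pred, cal_true):
        scores.setdefault(bisect_right(edges, p), []).append(abs(y - p))
    return {b: {c: conformal_quantile(s, c) for c in coverages}
            for b, s in scores.items()}


def interval(pred, widths, edges, coverage):
    """Symmetric interval around a point prediction, using its bin's width."""
    half = widths[bisect_right(edges, pred)][coverage]
    return pred - half, pred + half


def simulate(n):
    """Synthetic (prediction, truth) pairs whose error grows with flow size."""
    preds = [random.uniform(0.0, 10.0) for _ in range(n)]
    truths = [p + random.gauss(0.0, 0.1 + 0.1 * p) for p in preds]
    return preds, truths


random.seed(1)
edges = [2.5, 5.0, 7.5]  # four magnitude bins
cal_pred, cal_true = simulate(4000)
widths = mondrian_widths(cal_pred, cal_true, edges)
test_pred, test_true = simulate(4000)

empirical = {}
for c in (0.5, 0.9):
    hits = 0
    for p, y in zip(test_pred, test_true):
        lo, hi = interval(p, widths, edges, c)
        hits += lo <= y <= hi
    empirical[c] = hits / len(test_pred)
    print(f"nominal {c:.0%}: empirical {empirical[c]:.3f}")
```

In this synthetic setup, larger flows, whose errors are larger, receive wider intervals, while empirical coverage stays close to the nominal 50% and 90% levels.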

How to cite

Rogalski, C. (2026). Migration Scenario Engine: Global Bilateral
Migration Projections under Climate Scenarios, 2020–2100.
CERIFR Research. https://migrationengine.org

Machine-readable: citation.cff · License: CC BY-NC 4.0