Results
Headline numbers, model performance, scenario design and where to access the full numerical outputs. Detailed visualisations (interactive globe, country choropleth, regional flow chord diagram, corridor fan-charts) live in the interactive dashboard.
Model performance
The stacking ensemble (GAM + Random Forest + XGBoost via Ridge meta-learner) achieves a pooled out-of-fold R² of 0.826 on the 1990–2015 historical panel, which is 99.9% of the temporal autocorrelation ceiling (r² = 0.827). Performance is validated by 5-fold expanding-window cross-validation. Fold 1 trains on a single period and is intentionally weak; from Fold 2 onward R² rises from 0.810 to 0.855 with one marginal dip at Fold 4 (0.834 → 0.831), a pattern consistent with genuine learning rather than overfitting:
| Fold | Training | Test | R² | RMSE | N (Test) | N Models |
|---|---|---|---|---|---|---|
| 1 | 1990 | 1995–2015 | 0.373 | 0.229 | 253,770 | 2 (GAM + RF) |
| 2 | 1990–1995 | 2000–2015 | 0.810 | 0.126 | 203,016 | 3 |
| 3 | 1990–2000 | 2005–2015 | 0.834 | 0.118 | 152,262 | 3 |
| 4 | 1990–2005 | 2010–2015 | 0.831 | 0.119 | 101,508 | 3 |
| 5 | 1990–2010 | 2015 | 0.855 | 0.111 | 50,754 | 3 |
| Pooled | All prior | All held-out | 0.826 | 0.121 | 507,540 | — |
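The expanding-window splits in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the project's code: the function name `expanding_window_folds` is hypothetical, and the per-period row count is taken from Fold 5, which tests on 2015 alone, on the assumption that every period contributes the same number of rows (the table's N values are consistent with that).

```python
PERIODS = [1990, 1995, 2000, 2005, 2010, 2015]
ROWS_PER_PERIOD = 50_754  # assumption: Fold 5's test N, i.e. one period's rows


def expanding_window_folds(periods):
    """Yield (fold, train_periods, test_periods).

    Fold k trains on the first k periods and tests on all later ones,
    so the training window only ever grows.
    """
    for k in range(1, len(periods)):
        yield k, periods[:k], periods[k:]


folds = list(expanding_window_folds(PERIODS))

# Test-set size per fold, under the equal-rows-per-period assumption.
test_sizes = {fold: ROWS_PER_PERIOD * len(test) for fold, _, test in folds}

for fold, train, test in folds:
    print(
        f"Fold {fold}: train {train[0]}-{train[-1]}, "
        f"test {test[0]}-{test[-1]}, N = {test_sizes[fold]:,}"
    )
```

Under this reading, the test counts for Folds 2 to 5 sum to 507,540, which matches the N reported in the pooled row. That suggests the pooled metrics are computed without Fold 1, although this is an inference from the arithmetic rather than something the table states.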
Projection framework
The projection framework produces, for every country and every 5-year period 2020–2100:
- bilateral migration flows for ~52,670 corridors,
- net migration per country,
- the Migration Pressure Index (MPI) and Trapped Population Index (TPI),
- conformal prediction intervals at 50% and 90% coverage,
across 4 IPCC SSP pathways × 5 narrative scenarios (= 20 scenario × pathway combinations).
Scenario design
| Scenario | Narrative | Climate × Conflict × Drought × Flood × Storm × Gov. | Displacement overlay |
|---|---|---|---|
| Baseline (ML only) | Reference run, no narrative perturbation. | 1.0 × 1.0 × 1.0 × 1.0 × 1.0 × 1.0 | — |
| Baseline+ | Reference + structural displacement. | 1.0 × 1.0 × 1.0 × 1.0 × 1.0 × 1.0 | ×1.0 |
| Adaptation Success | Lower conflict, mildly higher income, mildly lower climate stress. | 0.8 × 0.7 × 0.9 × 0.9 × 1.0 × 1.0 | ×0.8 |
| Fragmentation | Higher conflict and governance stress, lower income. | 1.0 × 1.5 × 1.0 × 1.0 × 1.0 × 1.3 | ×1.2 |
| Climate Extreme | Tail-risk climate channel: hotter, drier, more severe storms and flooding. | 1.5 × 1.0 × 1.5 × 1.4 × 1.5 × 1.0 | ×1.5 |
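As an illustration of how the scenario table might be encoded, the sketch below stores the six multipliers per scenario and scales a feature dictionary accordingly. Several things here are assumptions: the pathway labels in `PATHWAYS`, the feature names, the helper `apply_scenario`, and the choice of simple multiplicative scaling per channel are all hypothetical, since the table gives only the multiplier values.

```python
from itertools import product

# Channel order follows the table's multiplier column.
CHANNELS = ("climate", "conflict", "drought", "flood", "storm", "governance")

# Multipliers copied from the scenario table, in CHANNELS order.
SCENARIOS = {
    "Baseline (ML only)": (1.0, 1.0, 1.0, 1.0, 1.0, 1.0),
    "Baseline+": (1.0, 1.0, 1.0, 1.0, 1.0, 1.0),
    "Adaptation Success": (0.8, 0.7, 0.9, 0.9, 1.0, 1.0),
    "Fragmentation": (1.0, 1.5, 1.0, 1.0, 1.0, 1.3),
    "Climate Extreme": (1.5, 1.0, 1.5, 1.4, 1.5, 1.0),
}

# Displacement overlay factor per scenario; None means no overlay.
DISPLACEMENT_OVERLAY = {
    "Baseline (ML only)": None,
    "Baseline+": 1.0,
    "Adaptation Success": 0.8,
    "Fragmentation": 1.2,
    "Climate Extreme": 1.5,
}

# Hypothetical placeholder labels: the four SSP pathways are not named here.
PATHWAYS = ("pathway_1", "pathway_2", "pathway_3", "pathway_4")


def apply_scenario(features, scenario):
    """Return a copy of `features` with each channel scaled by its multiplier.

    Features that are not one of the six channels pass through unchanged.
    """
    multipliers = dict(zip(CHANNELS, SCENARIOS[scenario]))
    return {name: value * multipliers.get(name, 1.0) for name, value in features.items()}


# Every scenario is run under every pathway.
runs = list(product(PATHWAYS, SCENARIOS))

# Hypothetical feature values for one corridor and period.
example = {"climate": 2.0, "conflict": 4.0, "income": 10.0}
perturbed = apply_scenario(example, "Climate Extreme")
```

Perturbing inputs rather than outputs means the fitted ensemble is reused unchanged for every scenario; only the feature vector it sees differs.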
Headline qualitative findings
- Network persistence dominates. The diaspora stock at \(t-1\) is the single strongest predictor in the panel (Pearson r ≈ 0.33). Established corridors are highly persistent; inactive corridors decline by ~13% per 5-year period, while active corridors keep growing through self-reinforcing dynamics — even before climate shocks are added.
- Climate features carry over half the predictor mass. 54% of the 109 engineered predictors are climate-related (CRU TS 4.09 + CMIP6 anomalies and derived indices), reflecting the project's research focus on climate–migration coupling.
- Cross-validation supports genuine learning. The rising R² curve (0.373 → 0.855 across 5 expanding folds, with a marginal dip at Fold 4) plus the +1.6 pp ensemble lift over the best individual model argue against an "overfitting on history" interpretation of the high pooled R².
- Adaptation matters more than climate alone. Because the scenario engine acts on input features (not on ensemble outputs), the same physical climate shock plays out very differently under high-governance vs low-governance pathways: Adaptation Success and Fragmentation differ only modestly in their climate-channel multipliers (0.8–0.9 vs 1.0) but produce structurally different displacement footprints.
- Coastal exposure is enormous. NASA SEDAC LECZ v3 places 687 million people below 5 m of elevation and 1.056 billion below 10 m globally — the upper bound on long-run sea-level-driven displacement before adaptation, distributed across corridors via baseline shares.
- Calibration matters for policy use. Iterative proportional fitting (IPF) against UN WPP origin and destination totals (Willekens 1999; Abel & Cohen 2019) ensures that aggregated projected flows are consistent with those totals at the country level, which is necessary for any downstream demographic, economic or policy modelling.
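The IPF calibration mentioned in the last bullet can be illustrated with a generic two-margin version of the algorithm. This is a minimal sketch under stated assumptions, not the project's calibration code: the function `ipf`, the 3 × 3 seed matrix and the target totals are all made up for illustration, and the real calibration may use additional constraints.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Scale `seed` so its row and column sums match the targets.

    Alternates a row-scaling step and a column-scaling step until the row
    sums are within `tol` of their targets. Zero cells (such as the
    diagonal of a migration matrix) stay zero throughout.
    """
    flows = [row[:] for row in seed]
    n_rows, n_cols = len(flows), len(flows[0])
    for _ in range(max_iter):
        # Row step: rescale each origin's outflows to its target total.
        for i in range(n_rows):
            total = sum(flows[i])
            if total > 0:
                factor = row_targets[i] / total
                flows[i] = [x * factor for x in flows[i]]
        # Column step: rescale each destination's inflows to its target total.
        for j in range(n_cols):
            total = sum(flows[i][j] for i in range(n_rows))
            if total > 0:
                factor = col_targets[j] / total
                for i in range(n_rows):
                    flows[i][j] *= factor
        # Columns now match exactly; stop once rows match as well.
        row_error = max(abs(sum(flows[i]) - row_targets[i]) for i in range(n_rows))
        if row_error < tol:
            break
    return flows


# Hypothetical model output for three countries (no within-country flows).
seed = [
    [0.0, 10.0, 20.0],
    [5.0, 0.0, 15.0],
    [8.0, 12.0, 0.0],
]
origin_totals = [40.0, 30.0, 30.0]       # hypothetical outflow totals
destination_totals = [25.0, 35.0, 40.0]  # hypothetical inflow totals

calibrated = ipf(seed, origin_totals, destination_totals)
```

The two target vectors must sum to the same grand total (100 in this example); otherwise no matrix can satisfy both margins and the loop simply runs to `max_iter`.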
Where to find the numbers
All numerical outputs (per-country, per-scenario, per-period) are downloadable from the interactive dashboard's Data tab. Free CSVs include country-level scenario projections for each of the 5 scenarios, scenario summary statistics, MPI / TPI indicator panels, OOF predictions, SHAP feature attributions, and per-fold cross-validation metrics. Dyad-level (corridor-level) data are available on request via rogalski.academic@pm.me.
Uncertainty & limitations
- Conformal coverage: every projection is paired with 50% and 90% conformal prediction intervals (Vovk 2005; Romano 2019; Barber 2023), distribution-free, Mondrian-binned by flow magnitude — see Methodology §12.
- Historical training window: 1990–2015. Periods 2020+ are projections. Out-of-sample validation against 2020 IMS stocks is in preparation.
- Scenario design is exploratory, not predictive: the V7 multipliers (climate ×1.5, conflict ×1.5, etc.) are stress-test-style narrative perturbations, not probabilistic scenarios. They span a plausible envelope; they do not assign likelihoods.
- Displacement overlay: physics-based but coarse. Sea level, heat and drought channels are the three dominant climate-displacement mechanisms in the literature, but second-order channels (riverine flooding inside countries, vector-borne disease, glacier melt for downstream users) are not yet modelled.
- Single-author project: this is independent doctoral research, not a multi-institution consortium output. Replication code and model artefacts are being prepared for public release.
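To make the conformal coverage bullet above more concrete, the sketch below shows one common way to build Mondrian (bin-conditional) split-conformal intervals. It rests on several assumptions not confirmed by the source: absolute residuals as the nonconformity score, bins defined on the predicted flow magnitude, and the hypothetical helper names `conformal_radius` and `mondrian_radii`. The calibration data are invented for illustration.

```python
import math
from bisect import bisect_right


def conformal_radius(residuals, coverage):
    """Split-conformal interval half-width from held-out residuals.

    Uses the finite-sample rank ceil((n + 1) * coverage) of the sorted
    absolute residuals; returns infinity when n is too small for that rank.
    """
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        return math.inf
    return scores[rank - 1]


def mondrian_radii(predictions, actuals, bin_edges, coverage):
    """One radius per magnitude bin, calibrated only on that bin's residuals."""
    residuals_by_bin = {}
    for pred, actual in zip(predictions, actuals):
        bin_id = bisect_right(bin_edges, pred)  # assumption: bin on the prediction
        residuals_by_bin.setdefault(bin_id, []).append(actual - pred)
    return {b: conformal_radius(r, coverage) for b, r in residuals_by_bin.items()}


# Hypothetical calibration set: small flows have small errors, large flows large ones.
small_errors = [0.01, -0.02, 0.03, -0.04, 0.05, -0.06, 0.07, -0.08, 0.09]
large_errors = [10 * e for e in small_errors]
predictions = [0.1] * 9 + [2.0] * 9
actuals = [p + e for p, e in zip(predictions, small_errors + large_errors)]

radii_50 = mondrian_radii(predictions, actuals, bin_edges=[1.0], coverage=0.5)
radii_90 = mondrian_radii(predictions, actuals, bin_edges=[1.0], coverage=0.9)

# Interval for a new prediction: point estimate plus or minus its bin's radius.
new_prediction = 2.5
radius = radii_90[bisect_right([1.0], new_prediction)]
interval_90 = (new_prediction - radius, new_prediction + radius)
```

Calibrating each bin separately lets large corridors receive wider intervals than small ones, instead of sharing a single global width.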
How to cite
Rogalski, C. (2026). Migration Scenario Engine: Global Bilateral Migration Projections under Climate Scenarios, 2020–2100. CERIFR Research. https://migrationengine.org