CERIFR Research — Migration Scenario Engine

Results

This page summarises the headline numbers, model performance and scenario design, and explains where to access the full numerical outputs. Detailed visualisations (interactive globe, country choropleth, regional flow chord diagram, corridor fan-charts) live in the interactive dashboard.

Model performance

The stacking ensemble (GAM + Random Forest + XGBoost, combined by a Ridge meta-learner) achieves a pooled out-of-fold R² of 0.826 on the 1990–2015 historical panel, which is 99.9% of the temporal autocorrelation ceiling (r² = 0.827). Performance is validated by 5-fold expanding-window cross-validation. Fold 1 trains on a single period and is intentionally weak; from Fold 2 onward R² rises from 0.810 to 0.855, with one marginal dip at Fold 4 (0.834 → 0.831), which is consistent with genuine learning rather than overfitting:

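| Fold | Training | Test | R² | RMSE | N (Test) | N Models |
|---|---|---|---|---|---|---|
| 1 | 1990 | 1995–2015 | 0.373 | 0.229 | 253,770 | 2 (GAM + RF) |
| 2 | 1990–1995 | 2000–2015 | 0.810 | 0.126 | 203,016 | 3 |
| 3 | 1990–2000 | 2005–2015 | 0.834 | 0.118 | 152,262 | 3 |
| 4 | 1990–2005 | 2010–2015 | 0.831 | 0.119 | 101,508 | 3 |
| 5 | 1990–2010 | 2015 | 0.855 | 0.111 | 50,754 | 3 |
| Pooled | All prior | All held-out | 0.826 | 0.121 | 507,540 | |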
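
The fold layout in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the project's actual CV code (Methodology §5 is the authoritative description); the function name is hypothetical, and `ROWS_PER_PERIOD` is simply the single-period test size reported for Fold 5.

```python
# Illustrative sketch of expanding-window folds over the 5-year periods of the
# 1990-2015 panel. Names are hypothetical, not taken from the project code.

PERIODS = [1990, 1995, 2000, 2005, 2010, 2015]
ROWS_PER_PERIOD = 50_754  # test rows reported for the single-period Fold 5


def expanding_window_folds(periods):
    """Return (train_periods, test_periods) pairs, one per fold.

    Fold k trains on the first k periods and tests on every later period,
    so the training window only ever grows forward in time.
    """
    return [(periods[:k], periods[k:]) for k in range(1, len(periods))]


FOLDS = expanding_window_folds(PERIODS)

for number, (train, test) in enumerate(FOLDS, start=1):
    n_test = len(test) * ROWS_PER_PERIOD
    print(f"Fold {number}: train {train[0]}-{train[-1]}, "
          f"test {test[0]}-{test[-1]}, N(test) = {n_test:,}")
```

Under the assumption that every period contributes the same number of rows, the printed test sizes match the N (Test) column for Folds 1 to 5.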

Per-model standalone OOF R²: GAM 0.795, Random Forest 0.804, XGBoost 0.813. Ensemble lift over best individual model: +1.3 pp. See Methodology §5 for full CV protocol.
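
The lift and ceiling figures follow directly from the reported R² values. The short check below only re-does that arithmetic with numbers copied from this page; the variable names are illustrative.

```python
# Arithmetic check on the reported figures; nothing here is re-estimated.
standalone_r2 = {"GAM": 0.795, "Random Forest": 0.804, "XGBoost": 0.813}
ensemble_r2 = 0.826
ceiling_r2 = 0.827  # temporal autocorrelation ceiling quoted above

best_model = max(standalone_r2, key=standalone_r2.get)

# Lift in percentage points over the best standalone model.
lift_pp = round((ensemble_r2 - standalone_r2[best_model]) * 100, 1)

# Share of the autocorrelation ceiling reached by the ensemble, in percent.
ceiling_share = round(ensemble_r2 / ceiling_r2 * 100, 1)

print(best_model, lift_pp, ceiling_share)  # XGBoost 1.3 99.9
```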

Projection framework

The projection framework produces, for every country and every 5-year period 2020–2100:

  • bilateral migration flows for ~52,670 corridors,
  • net migration per country,
  • the Migration Pressure Index (MPI) and Trapped Population Index (TPI),
  • conformal prediction intervals at 50% and 90% coverage,

across 4 IPCC SSP pathways × 5 narrative scenarios (= 20 scenario × pathway combinations).

Scenario design

| Scenario | Narrative | Climate × Conflict × Drought × Flood × Storm × Gov. | Displacement overlay |
|---|---|---|---|
| Baseline (ML only) | Reference run, no narrative perturbation. | 1.0 × 1.0 × 1.0 × 1.0 × 1.0 × 1.0 | |
| Baseline+ | Reference + structural displacement. | 1.0 × 1.0 × 1.0 × 1.0 × 1.0 × 1.0 | ×1.0 |
| Adaptation Success | Lower conflict, mildly higher income, mildly lower climate stress. | 0.8 × 0.7 × 0.9 × 0.9 × 1.0 × 1.0 | ×0.8 |
| Fragmentation | Higher conflict and governance stress, lower income. | 1.0 × 1.5 × 1.0 × 1.0 × 1.0 × 1.3 | ×1.2 |
| Climate Extreme | Tail-risk climate channel: hotter, drier, more severe storms and flooding. | 1.5 × 1.0 × 1.5 × 1.4 × 1.5 × 1.0 | ×1.5 |

All scenarios are computed across SSP1 (Sustainability), SSP2 (Middle of the Road), SSP3 (Rivalry), and SSP5 (Fossil-fueled). See Methodology §9 for the scenario engine and §11 for IPF calibration to UN WPP totals.
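
As a rough illustration of how such multipliers could act on input features and how the 20 runs are enumerated, consider the sketch below. The multipliers are copied from the scenario table; the channel keys, the `perturb` helper, the example feature values and the assumption that a multiplier scales a channel value directly are all hypothetical. The actual engine is described in Methodology §9.

```python
from itertools import product

# Channel order follows the table header:
# Climate x Conflict x Drought x Flood x Storm x Gov.
CHANNELS = ("climate", "conflict", "drought", "flood", "storm", "governance")

SCENARIOS = {
    "Baseline (ML only)": (1.0, 1.0, 1.0, 1.0, 1.0, 1.0),
    "Baseline+": (1.0, 1.0, 1.0, 1.0, 1.0, 1.0),
    "Adaptation Success": (0.8, 0.7, 0.9, 0.9, 1.0, 1.0),
    "Fragmentation": (1.0, 1.5, 1.0, 1.0, 1.0, 1.3),
    "Climate Extreme": (1.5, 1.0, 1.5, 1.4, 1.5, 1.0),
}
PATHWAYS = ("SSP1", "SSP2", "SSP3", "SSP5")


def perturb(features, scenario):
    """Scale each input channel by its scenario multiplier.

    Assumption: a multiplier scales the channel value directly. Features
    that are not one of CHANNELS pass through unchanged.
    """
    multipliers = dict(zip(CHANNELS, SCENARIOS[scenario]))
    return {name: value * multipliers.get(name, 1.0)
            for name, value in features.items()}


# Every pathway is paired with every narrative scenario: 4 x 5 = 20 runs.
RUNS = list(product(PATHWAYS, SCENARIOS))

# Made-up feature values for one country-period.
example = {"climate": 2.0, "conflict": 1.0, "income": 3.0}
perturbed = perturb(example, "Fragmentation")
print(len(RUNS), perturbed)
```

Because the perturbation happens before the ensemble is evaluated, the same trained model is reused for every run; only its inputs change.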

Headline qualitative findings

  • Network persistence dominates. The diaspora stock at \(t-1\) is the single strongest predictor in the panel (Pearson r ≈ 0.33). Established corridors are highly persistent; inactive corridors decline by ~13% per 5-year period, while active corridors keep growing through self-reinforcing dynamics — even before climate shocks are added.
  • Climate features carry over half the predictor mass. 54% of the 109 engineered predictors are climate-related (CRU TS 4.09 + CMIP6 anomalies and derived indices), reflecting the project's research focus on climate–migration coupling.
  • Cross-validation supports genuine learning. R² rises from 0.373 to 0.855 across the 5 expanding folds, and the ensemble adds +1.3 pp over the best individual model; together these argue against an "overfitting on history" interpretation of the high pooled R².
  • Adaptation matters more than climate alone. Because the scenario engine acts on input features (not on ensemble outputs), the same physical climate shock plays out very differently under high-governance vs low-governance pathways. Within a given SSP, Adaptation Success and Fragmentation start from the same climate inputs and differ chiefly in their conflict and governance multipliers (Adaptation Success also scales the climate channels down mildly, ×0.8 to ×0.9), yet they produce structurally different displacement footprints.
  • Coastal exposure is enormous. NASA SEDAC LECZ v3 places 687 million people below 5 m of elevation and 1.056 billion below 10 m globally — the upper bound on long-run sea-level-driven displacement before adaptation, distributed across corridors via baseline shares.
  • Calibration matters for policy use. IPF calibration against UN WPP origin and destination totals (Willekens 1999; Abel & Cohen 2019) ensures that aggregated projected flows are physically consistent at the country level — necessary for any downstream demographic, economic or policy modelling.
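
To make the calibration step in the last bullet concrete, the sketch below runs textbook iterative proportional fitting (IPF) on a made-up 3-country flow matrix. It is not the project's implementation (see Methodology §11): the function name, seed flows and target totals are all invented for illustration.

```python
def ipf(seed, origin_totals, dest_totals, max_iter=1000, tol=1e-9):
    """Rescale `seed` so row sums match origin totals and column sums match
    destination totals, alternating row and column scaling steps.

    Zero cells (here the diagonal, since a country sends no flow to itself)
    stay zero because they are only ever multiplied by a factor.
    """
    flows = [row[:] for row in seed]
    n_rows, n_cols = len(flows), len(flows[0])
    for _ in range(max_iter):
        # Row step: match each origin's total outflow.
        for i in range(n_rows):
            total = sum(flows[i])
            if total > 0:
                factor = origin_totals[i] / total
                flows[i] = [value * factor for value in flows[i]]
        # Column step: match each destination's total inflow.
        for j in range(n_cols):
            total = sum(flows[i][j] for i in range(n_rows))
            if total > 0:
                factor = dest_totals[j] / total
                for i in range(n_rows):
                    flows[i][j] *= factor
        # Columns now match exactly; stop once rows match as well.
        gap = max(abs(sum(flows[i]) - origin_totals[i]) for i in range(n_rows))
        if gap < tol:
            break
    return flows


# Made-up uncalibrated flows (rows = origins, columns = destinations).
SEED = [[0.0, 10.0, 5.0],
        [8.0, 0.0, 4.0],
        [6.0, 3.0, 0.0]]
ORIGIN_TOTALS = [30.0, 20.0, 10.0]  # hypothetical total outflows
DEST_TOTALS = [25.0, 20.0, 15.0]    # hypothetical total inflows (same grand total)

fitted = ipf(SEED, ORIGIN_TOTALS, DEST_TOTALS)
for row in fitted:
    print([round(value, 2) for value in row])
```

The fitted matrix keeps the relative structure of the seed flows while reproducing both sets of margins, which is what makes aggregated projections consistent at the country level.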

Where to find the numbers

All numerical outputs (per-country, per-scenario, per-period) are downloadable from the interactive dashboard's Data tab. Free CSVs include country-level scenario projections for each of the 5 scenarios, scenario summary statistics, MPI / TPI indicator panels, OOF predictions, SHAP feature attributions, and per-fold cross-validation metrics. Dyad-level (corridor-level) data are available on request via rogalski.academic@pm.me.

Uncertainty & limitations

  • Conformal coverage: every projection is paired with 50% and 90% conformal prediction intervals (Vovk et al. 2005; Romano et al. 2019; Barber et al. 2023) that are distribution-free and Mondrian-binned by flow magnitude; see Methodology §12.
  • Historical training window: 1990–2015. Periods 2020+ are projections. Out-of-sample validation against 2020 IMS stocks is in preparation.
  • Scenario design is exploratory, not predictive: the V7 multipliers (climate ×1.5, conflict ×1.5, etc.) are stress-test-style narrative perturbations, not probabilistic scenarios. They span a plausible envelope; they do not assign likelihoods.
  • Displacement overlay: physics-based but coarse. Sea level, heat and drought channels are the three dominant climate-displacement mechanisms in the literature, but second-order channels (riverine flooding inside countries, vector-borne disease, glacier melt for downstream users) are not yet modelled.
  • Single-author project: this is independent doctoral research, not a multi-institution consortium output. Replication code and model artefacts are being prepared for public release.
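
For readers unfamiliar with the Mondrian-binned conformal intervals mentioned in the first bullet, here is a minimal split-conformal sketch. It assumes absolute residuals as the nonconformity score and a single hypothetical bin edge on predicted flow size; the score, binning and calibration data actually used are those described in Methodology §12, and every name and number below is invented for illustration.

```python
import math
from collections import defaultdict


def magnitude_bin(prediction, edges):
    """Index of the flow-magnitude bin that `prediction` falls into."""
    return sum(prediction >= edge for edge in edges)


def calibrate(predictions, actuals, edges, coverage):
    """Per-bin conformal quantile of absolute residuals on a calibration set."""
    scores = defaultdict(list)
    for pred, actual in zip(predictions, actuals):
        scores[magnitude_bin(pred, edges)].append(abs(actual - pred))
    quantiles = {}
    for bin_index, residuals in scores.items():
        residuals.sort()
        n = len(residuals)
        # Finite-sample rank ceil((n + 1) * coverage); the small epsilon
        # guards against floating-point noise in the product.
        rank = math.ceil((n + 1) * coverage - 1e-12)
        quantiles[bin_index] = residuals[rank - 1] if rank <= n else math.inf
    return quantiles


def interval(prediction, quantiles, edges):
    """Symmetric interval using the quantile of the prediction's own bin."""
    q = quantiles.get(magnitude_bin(prediction, edges), math.inf)
    return prediction - q, prediction + q


# Made-up calibration data: small flows miss by 1..9, large flows by 10..90.
EDGES = [100]
cal_pred = [10] * 9 + [1000] * 9
cal_true = [10 + k for k in range(1, 10)] + [1000 + 10 * k for k in range(1, 10)]

q50 = calibrate(cal_pred, cal_true, EDGES, coverage=0.5)
q90 = calibrate(cal_pred, cal_true, EDGES, coverage=0.9)

print(interval(20, q50, EDGES))   # (15, 25)
print(interval(500, q90, EDGES))  # (410, 590)
```

Binning by magnitude lets large corridors carry wider intervals than small ones, rather than applying one global width to every flow.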

How to cite

Rogalski, C. (2026). Migration Scenario Engine: Global Bilateral
Migration Projections under Climate Scenarios, 2020–2100.
CERIFR Research. https://migrationengine.org

Machine-readable: citation.cff · License: CC BY-NC 4.0