CERIFR Research — Migration Scenario Engine

Frequently Asked Questions

Answers to the most common questions about the Migration Scenario Engine — what it projects, how it is calibrated, how to read the scenarios, and how to use and cite the outputs.

Does the Migration Scenario Engine project "climate refugees"?

No. The engine projects bilateral gross migration flows between countries under combined climate and socioeconomic scenarios. The term "refugee" has a specific legal definition under the 1951 Convention and its 1967 Protocol; it applies to persons fleeing persecution on grounds of race, religion, nationality, political opinion, or membership of a particular social group. International law does not currently recognise a legal category of "climate refugee". The MSE deliberately uses "migration flows" and, in the displacement overlay, "persons displaced by climate channels" — not "refugees".

What is the difference between migrant stock and migrant flow, and which one does the MSE report?

A migrant stock is the total number of foreign-born persons living in a country at a given point in time (approximately 281 million globally in 2020 according to UN DESA, IMS 2024). A migrant flow is the directional count of people moving from an origin country to a destination country during a time interval.

The MSE dashboard reports aggregated bilateral gross flows (directional origin→destination transitions) over 5-year periods, summed across approximately 52,670 dyads. It does not report stock totals. Where the dashboard shows an origin "losing" or a destination "gaining" population, these are flow-based differentials, not snapshots of who currently lives where.

This distinction matters for policy readers comparing MSE outputs to UN DESA stock tables or to IOM "migrants worldwide" numbers, which are stock concepts.
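To make the flow-based reading concrete, the following minimal sketch sums directed dyad flows into per-country outflow, inflow, and their differential. The country codes, the flow values, and the function name flow_differentials are hypothetical and chosen for illustration only; they are not MSE outputs or part of the MSE codebase.

```python
from collections import defaultdict

def flow_differentials(dyad_flows):
    """Aggregate directed (origin, destination) gross flows per country.

    Returns {country: (outflow, inflow, inflow - outflow)} for one period.
    """
    outflow = defaultdict(float)
    inflow = defaultdict(float)
    for (origin, destination), flow in dyad_flows.items():
        outflow[origin] += flow
        inflow[destination] += flow
    countries = sorted(set(outflow) | set(inflow))
    return {c: (outflow[c], inflow[c], inflow[c] - outflow[c]) for c in countries}

# Hypothetical 5-year gross flows in persons (illustrative, not real data).
flows = {
    ("AAA", "BBB"): 1200.0,
    ("BBB", "AAA"): 300.0,
    ("AAA", "CCC"): 500.0,
}
summary = flow_differentials(flows)
```

The differential for each country describes movement during the period only; it says nothing about how many foreign-born persons already live there, which is the stock concept.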

What does SSP × scenario mean, and why 4 × 5 combinations?

SSPs (Shared Socioeconomic Pathways) are the reference scenarios used in IPCC assessments for population, GDP, urbanisation and governance trajectories through 2100. The MSE uses four: SSP1 (Sustainability), SSP2 (Middle of the Road), SSP3 (Regional Rivalry), and SSP5 (Fossil-fuelled Development). SSP4 (Inequality) is omitted because its governance narrative is not well differentiated from SSP3 for migration purposes.

On top of each SSP the MSE runs five narrative scenarios — Baseline (ML only), Baseline+ (reference plus structural displacement), Adaptation Success, Fragmentation, and Climate Extreme — yielding 4 × 5 = 20 scenario × pathway combinations. Scenarios are exploratory stress tests, not probabilistic forecasts.
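The 4 × 5 grid can be enumerated as below. This is only an illustration of the combinatorics; the variable names are assumptions and do not reflect how the MSE labels its runs internally.

```python
from itertools import product

# Pathways and narrative scenarios as named in this FAQ.
SSPS = ["SSP1", "SSP2", "SSP3", "SSP5"]
SCENARIOS = [
    "Baseline",
    "Baseline+",
    "Adaptation Success",
    "Fragmentation",
    "Climate Extreme",
]

# Every (pathway, scenario) pair; SSP4 is absent by design.
combinations = list(product(SSPS, SCENARIOS))
```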

How accurate are the projections through 2100?

The ensemble achieves a pooled out-of-fold R² of 0.826 on the 1990–2015 historical panel, reaching 99.9% of the temporal autocorrelation ceiling (r² = 0.827). This measures how well the model reproduces past bilateral flows on held-out data, not how well it predicts the future.
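As a rough illustration of what "pooled out-of-fold" means, the sketch below concatenates held-out predictions from every fold and computes a single R² over them. It assumes R² is defined as 1 − SS_res / SS_tot; the fold data and the function name pooled_r2 are hypothetical, and the MSE's exact definition may differ. The last line reproduces the 99.9% figure from the two reported values.

```python
def pooled_r2(folds):
    """R² over held-out predictions pooled across all folds.

    folds: list of (observed, predicted) pairs of equal-length lists,
    where each prediction came from a model that did not see that fold.
    """
    observed = [y for obs, _ in folds for y in obs]
    predicted = [p for _, pred in folds for p in pred]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Two hypothetical folds (illustrative values only).
folds = [([1.0, 2.0], [1.0, 2.5]), ([3.0, 4.0], [3.0, 3.5])]
r2 = pooled_r2(folds)

# Share of the reported autocorrelation ceiling reached by the ensemble.
share_of_ceiling = 0.826 / 0.827
```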

For the projection horizon (2020–2100), every output is paired with distribution-free conformal prediction intervals at 50% and 90% coverage. The intervals widen with horizon and with flow magnitude. Long-horizon projections should be read as scenarios, not forecasts: they describe what would happen if the socioeconomic and climate trajectories play out as specified, not what will happen.

What are conformal prediction intervals?

Conformal prediction intervals (CPIs) are distribution-free uncertainty intervals with finite-sample coverage guarantees, introduced by Vovk et al. (2005) and extended by Romano et al. (2019) and Barber et al. (2023). The MSE uses Mondrian-binned CPIs: residuals are stratified by flow magnitude (zero, low, medium, high), and interval width is calibrated separately within each bin. A multiplicative bootstrap (N = 500) propagates the intervals from dyad-level predictions to corridor aggregates.

Unlike Gaussian confidence intervals, CPIs do not assume any particular distribution of residuals and remain valid even when the residual distribution is skewed or heavy-tailed.
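The binning idea can be sketched as a split-conformal procedure with one calibration quantile per flow-magnitude bin. This is a minimal sketch under stated assumptions, not the MSE implementation: it uses symmetric absolute-residual scores, hypothetical calibration data, and hypothetical function names, and it omits the bootstrap step that propagates intervals to corridor aggregates.

```python
import math

def conformal_quantile(scores, coverage):
    """Finite-sample conformal quantile of nonconformity scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)  # 1-based order statistic
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]

def mondrian_halfwidths(calibration, coverage):
    """Calibrate one interval half-width per bin.

    calibration: list of (bin_label, observed, predicted) held-out triples.
    """
    by_bin = {}
    for label, obs, pred in calibration:
        by_bin.setdefault(label, []).append(abs(obs - pred))
    return {label: conformal_quantile(s, coverage) for label, s in by_bin.items()}

def interval(prediction, label, halfwidths):
    """Symmetric interval around a new prediction, using its bin's half-width."""
    w = halfwidths[label]
    return (prediction - w, prediction + w)

# Hypothetical calibration set: nine "low" residuals, three "high" residuals.
calibration = [("low", float(k), 0.0) for k in range(1, 10)]
calibration += [("high", 10.0, 0.0), ("high", 20.0, 0.0), ("high", 30.0, 0.0)]

hw50 = mondrian_halfwidths(calibration, 0.5)
hw90 = mondrian_halfwidths(calibration, 0.9)
example = interval(100.0, "high", hw50)
```

Calibrating within bins lets high-magnitude flows receive wider intervals than small ones. When a bin holds too few calibration points for the requested coverage, the sketch returns an unbounded interval rather than an understated one.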

Is this project peer-reviewed?

The Migration Scenario Engine is an independent doctoral research project. The underlying dissertation is under review at Alexandru Ioan Cuza University of Iași. A preprint describing the methodology and benchmarking the ensemble against prior bilateral-migration literature is in preparation and will be deposited on arXiv and Zenodo, with a persistent DOI, after the dissertation's resubmission milestone. The dashboard itself is a research output, not a peer-reviewed publication; it is released under CC BY-NC 4.0 for transparency and scrutiny.

Who funded this work and is there a commercial interest?

The Migration Scenario Engine is self-funded by the author. There is no external funding, no corporate sponsorship, no institutional client, and no advocacy organisation behind the project. Hosting, data, and compute are paid for by the author personally. The outputs represent no lobby, no commercial interest, and no institutional agenda.

This funding posture is a deliberate choice to preserve analytical independence; its trade-off is limited infrastructure capacity, which is why the project is released as a single-author scientific platform rather than a commercial product.

Which data sources does the MSE use?

Seventeen primary sources span migration (Abel & Cohen 2019; UN DESA IMS 2024), climate (CRU TS 4.09; CMIP6 ScenarioMIP over a 5-GCM ensemble — ACCESS-CM2, GFDL-ESM4, IPSL-CM6A-LR, MIROC6, MPI-ESM1-2-LR), economics (World Bank WDI), governance (WGI), conflict (UCDP v25.1), disasters (EM-DAT), vulnerability (ND-GAIN), education (Barro-Lee v3), demography (UN WPP 2024), gravity (CEPII GeoDist), policy (DEMIG VISA), socioeconomic pathways (IIASA SSP Database v3.1), and climate displacement (NASA SEDAC LECZ v3; IPCC AR6 WG1 Table 9.9). The full catalogue, with licenses and DOIs, is at /data.html.

Can I download the data?

Yes. Country-level scenario projections, the indicator panel (MPI, TPI, OOF predictions), SHAP feature attributions, and per-fold cross-validation metrics are free to download from the dashboard's Data tab at migrationengine.org. The harmonised outputs are licensed CC BY-NC 4.0. Raw third-party inputs remain under their respective upstream licenses. Dyad-level (corridor-level) data, which is larger and requires context for correct interpretation, is available on request via rogalski.academic@pm.me.

How should I cite the Migration Scenario Engine?

Rogalski, C. (2026). Migration Scenario Engine: Global Bilateral
Migration Projections under Climate Scenarios, 2020–2100.
CERIFR Research. https://migrationengine.org

A machine-readable CITATION.cff file is provided. Once the preprint is deposited, a DOI will be added to the citation record; please check the CITATION.cff file for the most current citation at the time of use.

Is the "Migration Scenario Engine" related to database-migration tools or game-engine migration?

No. The Migration Scenario Engine (MSE) is a scientific platform for climate-driven international migration projections. It is unrelated to database schema migration tools (such as Flyway, Liquibase, Alembic), ORM migration frameworks (Django migrations, Rails migrations), or game-engine migration (for example porting projects between Unity and Unreal). In academic and policy contexts the project is consistently referred to as "Migration Scenario Engine" together with disambiguators such as "climate migration", "bilateral flows", "SSP scenarios", "Rogalski", or "CERIFR".

What are the Migration Pressure Index (MPI) and the Trapped Population Index (TPI)?

The Migration Pressure Index (MPI) is a composite country-level index on [0, 1] that aggregates five stress components — climate (25%), conflict (25%), disasters (20%), governance breakdown (15%), and economic crisis (15%) — and normalises within each SSP group via percentile rank. Higher MPI indicates greater structural pressure to migrate.
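One possible reading of this construction is sketched below: a weighted sum of the five components, followed by a percentile rank across the countries in one SSP group. The component scores, country labels, function names, and the order of operations (weight first, then rank) are assumptions for illustration; the Methodology page is the authoritative description.

```python
# Component weights as stated in this FAQ.
WEIGHTS = {
    "climate": 0.25,
    "conflict": 0.25,
    "disasters": 0.20,
    "governance": 0.15,
    "economic": 0.15,
}

def raw_pressure(components):
    """Weighted sum of component scores, each assumed to lie on [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def percentile_rank(values):
    """Share of values less than or equal to each value, on (0, 1]."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]

def flat(score):
    """Helper: the same score for every component (hypothetical data)."""
    return {name: score for name in WEIGHTS}

# Hypothetical countries within a single SSP group.
group = {
    "A": flat(0.2),
    "B": flat(0.8),
    "C": {**flat(0.0), "climate": 1.0},
    "D": flat(0.5),
}
names = list(group)
raw = [raw_pressure(group[name]) for name in names]
mpi = dict(zip(names, percentile_rank(raw)))
```

Because the final step is a rank, the MPI in this sketch is relative: it orders countries within a group rather than measuring pressure on an absolute scale.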

The Trapped Population Index (TPI) combines the MPI with a country's observed cross-border mobility: TPI = MPI × (1 − normalised mobility). High TPI flags populations that face high migration pressure but have limited ability to move — the "trapped populations" concept from the Foresight (2011) report. Both indices are intended as policy-relevant diagnostics, not as predictors of future conflict or displacement by themselves.
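The TPI formula translates directly into code. The sketch below assumes both inputs are already on [0, 1]; the function name and the example values are hypothetical.

```python
def trapped_population_index(mpi, normalised_mobility):
    """TPI = MPI × (1 − normalised mobility), with both inputs on [0, 1]."""
    if not (0.0 <= mpi <= 1.0 and 0.0 <= normalised_mobility <= 1.0):
        raise ValueError("both inputs are expected on [0, 1]")
    return mpi * (1.0 - normalised_mobility)

# Same pressure, different mobility (illustrative values).
high_pressure_low_mobility = trapped_population_index(1.0, 0.0)
high_pressure_high_mobility = trapped_population_index(1.0, 1.0)
mid_pressure_mid_mobility = trapped_population_index(0.5, 0.5)
```

The index is largest when pressure is high and mobility is low, and it falls to zero for a fully mobile population regardless of pressure.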

What models does the ensemble combine, and why stacking?

The stacking ensemble combines three base learners on log-flow-rate residuals:

  • A Generalised Additive Model (GAM, mgcv package, REML estimation, weight 31.7%) captures smooth nonlinear climate–income interactions.
  • A Random Forest (ranger, 500 trees, weight 33.3%) captures high-order interactions and robust tail behaviour.
  • An XGBoost model (500 rounds, η = 0.01, depth = 6, weight 35.0%) captures gradient-boosted residual patterns.

A Ridge meta-learner (glmnet, α = 0) is trained on the out-of-fold prediction matrix (N × 3) and produces the final weighted prediction. Stacking is preferred over simple voting because the Ridge layer learns each base learner's systematic biases from the out-of-fold data and weights the learners accordingly, with shrinkage stabilising the weights when their predictions are highly correlated. Pooled OOF R² improves by 1.3 percentage points over the best individual base learner.
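The meta-learning step can be sketched in a few lines. The project itself uses R packages (mgcv, ranger, xgboost, glmnet); the Python below is a stand-in that solves the ridge normal equations directly, without an intercept or the standardisation glmnet applies by default. The out-of-fold matrix, the penalty value, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ridge_meta(oof, y, penalty):
    """Weights w minimising ||y - oof w||^2 + penalty * ||w||^2 (no intercept)."""
    k = len(oof[0])
    gram = [
        [sum(row[i] * row[j] for row in oof) + (penalty if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(row[i] * target for row, target in zip(oof, y)) for i in range(k)]
    return solve_linear(gram, rhs)

def stack_predict(base_predictions, weights):
    """Final prediction as the weighted sum of base-learner predictions."""
    return sum(w * p for w, p in zip(weights, base_predictions))

# Hypothetical out-of-fold predictions: columns are GAM, Random Forest, XGBoost.
oof_matrix = [
    [1.0, 2.0, 4.0],
    [2.0, 1.0, 3.0],
    [3.0, 5.0, 1.0],
    [4.0, 3.0, 6.0],
    [5.0, 4.0, 2.0],
]
observed = [sum(row) / 3 for row in oof_matrix]  # toy target, illustration only

weights = fit_ridge_meta(oof_matrix, observed, penalty=1.0)
final = stack_predict([2.0, 2.5, 3.0], weights)
```

Fitting the weights on out-of-fold predictions, rather than in-sample fits, keeps the meta-learner from rewarding whichever base learner overfits the training data most.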

What does the project explicitly not model?

  • Internal (within-country) migration, which is a major focus of the climate-displacement literature but out of scope for a bilateral-flow framework.
  • Refugee status adjudication and asylum flows, which are governed by legal processes not captured by the socioeconomic covariates.
  • Irregular-migration dynamics, where underreporting in the training panel prevents reliable estimation.
  • Second-order climate channels such as riverine flooding, vector-borne disease, and the effects of glacier melt on downstream water users. These are acknowledged in the limitations section but not yet modelled.
  • Short-term circular and seasonal migration, which do not appear in the 5-year Abel & Cohen flow matrices.

The Methodology and Results pages list the full scope and limitations.