HTTPS://JOURNALS.UMS.AC.ID/INDEX.PHP/FG/
ISSN: 0852-0682 | E-ISSN: 2460-3945
Research article
Flood Risk Analysis Using Spatial Synthetic Population in the Upper Bengawan Solo Watershed, Indonesia
Muhammad Musiyam1,*, Jumadi Jumadi2,3, Vidya Nahdhiyatul Fikriyah2,4, Heni Masruroh5, Ema Dwi Septiyani2, Choirul Amin2, Hamim Zaky Hadibasyir2,6, Farha Sattar7, Muhammad Nawaz8
1 Department of Geography Education, Faculty of Education, Universitas Muhammadiyah Surakarta, Surakarta 57162
2 Department of Geography, Faculty of Geography, Universitas Muhammadiyah Surakarta, Indonesia
3 INTI International University, 71800 Nilai, Negeri Sembilan, Malaysia
4 Faculty of Geo-information Science and Earth Observation, University of Twente, Netherlands
5 Department of Geography, Universitas Negeri Malang, Indonesia
6 School of Built Environment, University of New South Wales, Sydney, New South Wales 2052, Australia
7 Faculty of Arts and Society, Charles Darwin University, Australia
8 Department of Geography, National University of Singapore, AS#3-03-11, 1 Arts Link, Kent Ridge, Singapore
Citation: Musiyam, Jumadi, Fikriyah, Masruroh, Septiyani, Amin, Hadibasyir, Sattar, & Nawaz. Flood Risk Analysis Using Spatial Synthetic Population in the Upper Bengawan Solo Watershed, Indonesia. Forum Geografi, 383-397.
Article history:
Received: 22 November 2025; Revised: 4 December 2025; Accepted: 4 December 2025; Published: 10 December 2025
Correspondence: mm102@ums.
Abstract This study develops a spatial synthetic population (SSP)-based computational model to produce realistic, high-resolution flood-risk maps for the Upper Bengawan Solo Watershed.
It combines Global Human Settlement data (spatial distribution) with local population statistics.
The SSP is created for flood risk mapping in the Upper Bengawan Solo Watershed (BSH) using a 100 m grid from the Global Human Settlement Layer (GHSL) GHS-POP R2023A.
Synthetic individuals are placed around the pixel centre (radius ≤ 100 m), and each is assigned demographic attributes (age, gender, education, occupation) validated against official county-level data.
Social vulnerability is calculated through weighted aggregation (AHP) across four attributes; individual scores are combined with flood hazard intensity at each location to produce a risk index for each person.
Validation covers three checks: (1) agreement of the SSP with reference statistics, where gender and age are nearly identical (MAE ≈ 0.01–0.02%) and occupation (MAE about 6.5%) and education (MAE 4.89%) deviate slightly; (2) the overall agreement between SSP and GHS counts at sampled pixels; and (3) location plausibility testing against ESRI Sentinel-2 Land Cover.
Results indicate that (1) the SSP aligns well for gender, moderately for education and occupation, but shows significant misalignment in age, and (2) 96% of SSP points are in built-up land, suggesting high spatial accuracy.
Medium- to high-risk patterns are mainly along the main river corridors and peri-urban areas, while rural non-built zones are mostly low- to medium-risk.
These findings suggest that this methodology is scalable, reproducible, and suitable for data-limited regions, enabling the production of detailed risk maps that can guide mitigation and preparedness efforts.
Keywords: Spatial synthetic population; Global Human Settlement Layer; Flood risk; Upper Bengawan Solo; AHP; Social vulnerability; Flood mitigation.
Introduction
Floods are the most common hydrometeorological disasters in Indonesia and have significant socio-economic impacts, especially in fast-growing urban and rural areas such as the Upper Bengawan Solo River Basin (Jumadi et al., 2024a; Jumadi et al., 2024b; Purwanto et al., 2023; Sahid; Tellman et al.). Accurate flood risk mapping requires integrating hazards and vulnerabilities; however, the quality of mapping is often limited by the lack of detailed population data. Population-based raster methods, such as medium-resolution global datasets, help estimate exposure but usually overlook the demographic structures and household traits that influence resilience and recovery after disasters. This is where the spatial synthetic population (SSP) becomes a breakthrough: it assigns realistic attributes and locations to individual and household units without jeopardizing privacy, thus connecting aggregate data with detailed demographic information for better risk assessment (Agriesti et al., 2022; Chapuis & Taillandier, 2019; Jiang et al., 2024; Jumadi et al.; Prédhumeau & Manley).
Copyright: © 2025 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Musiyam et al.
Recent research shows rapid progress in creating SSPs at scales from urban to national, emphasizing reproducibility, data openness, and computational efficiency.
Earlier studies have focused on building national synthetic populations, highlighting transparent pipelines and statistical alignment between synthetic data and census figures (Prédhumeau & Manley). Similar endeavours at a larger scale demonstrate that assembling a geographically explicit synthetic population is achievable by combining microdata, spatial constraints, and realistic housing placement strategies (Jiang et al.). In mobility planning and public services, assignment and placement techniques for SSPs underscore the importance of matching individual attributes, household structures, and spatial activity patterns to ensure population representations remain accurate and practical (Agriesti et al.).
Methodologically, a brief review of synthetic population creation methods identifies Iterative Proportional Fitting/Updating (IPF/IPU), microsimulation, and machine learning as the primary approaches, each suited to varying data availability and research goals (Balac & Hörl, 2021; Bigi et al., 2024; Burger et al., 2017; De Mooij et al., 2024; Jiang et al., 2022; Jumadi et al., 2018).
Page 383 Forum Geografi, 39, 2025. DOI: 10.23917/forgeo.
In the context of hydrometeorological disasters, the main challenge is not only in representing the number of exposed individuals but also in accounting for their demographic structure, employment, and socio-economic conditions, which influence vulnerability.
Many studies use social vulnerability indices (e.g., derivatives of SoVI) to summarize sensitivity and adaptive capacity at the community level; however, at the village level in developing countries, data gaps are common, requiring the synthesis of indicators from incomplete distributions (Chakraborty et al., 2023; Cutter et al., 2003; Cutter & Finch, 2008; Jumadi et al.).
At the same time, flood hazard can be characterized by combining indicators such as slope gradient, rainfall, drainage density, soil moisture, land use/land cover (LULC), elevation, distance to rivers, normalized difference vegetation index (NDVI), curvature, flow accumulation, and topographic wetness index (TWI); these indicators can be calibrated with historical event records and/or remotely sensed rainfall if available (Jumadi et al., 2024a; Jumadi et al., 2024b; Musiyam et al.).
Although global population grid maps support macro-level analysis, several studies show that flood exposure results are sensitive to the choice of population datasets and their distribution. These findings highlight the importance of methodological transparency in selecting and creating population representations, as minor differences in inputs can significantly affect risk estimates (Abdelkareem & Mansour, 2023; Arnell & Gosling, 2016; Chakraborty et al., 2023; Diriba et al., 2024; Jumadi et al.).
This methodological awareness is especially relevant for floods, as demographic structures and settlement patterns can determine who is exposed to flooding, where flooding occurs, and its severity.
In Thailand, for instance, flood risk assessments have become increasingly important as the population ages rapidly (Waiyasusri et al., 2021; Sawangnate et al., 2022; Waiyasusri et al.). In the Philippines, informal settlements and regional terrain variations pose a high risk of flooding (Usamah et al.). Similar challenges are also evident in other coastal regions, such as peninsular Malaysia (Maqtan et al.), highlighting the need to consider population context in flood risk mapping.
In Indonesia, including the Upper Bengawan Solo River Basin, there are three practical gaps. First, the lack of microdata at the village level limits the use of IPF/IPU; as a result, many studies depend on an aggregate population raster that may be biased in areas with mixed settlements and open land. Second, the spatial distribution of populations often ignores the actual settlement structure (building footprints and road networks), so exposure at the unit level (buildings and households) is not accurately reflected. Third, combining SSP with composite multi-indicator hazard and social vulnerability indices in a reproducible format is not yet common practice at the village level, making cross-regional and temporal comparison difficult (Agriesti et al., 2022; Chapuis & Taillandier, 2019; Jiang et al., 2024; Prédhumeau & Manley).
This research addresses the gaps identified above by developing an SSP model that uses GHS-POP R2023A (Pesaresi et al., 2024; Schiavina et al.) with lightweight computing and incorporates directed stochastic allocation around settlement centers (based on GHS-POP pixels) to match local spatial patterns.
Population attributes such as age, gender, education, occupation, and income are synthesized from official frequency distributions, with their margins enforced for consistency across districts.
Flood hazard is modelled using eleven measurable risk factors (slope, rainfall, drainage density, soil moisture, land use/land cover, elevation, distance to river, NDVI, curvature, flow accumulation, and TWI) through ordinal reclassification and a normalized weighted overlay. Meanwhile, social vulnerability is summarized into a composite index based on tested demographic, social, and economic indicators.
Risk is then calculated as a composite or multiplicative function of hazard and vulnerability at an operational resolution.
This framework is built on best practices for SSP generation, ensuring transparency and evaluating statistical congruence, while remaining adaptable to regions with data limitations.
This design produces an individual-based flood risk map that more accurately reflects local demographic and socio-economic conditions, making it ready for stakeholders in the Upper Bengawan Solo region to prioritize mitigation and risk communication.
Finally, the rest of the paper is organized as follows: the Research Methods section explains the construction of the SSP, the composite hazard and vulnerability, and the validation processes.
The Results and Discussion section presents the risk maps and validation results. The Conclusion section discusses contributions, limitations, and future research directions.
Research Methods
The study develops an SSP model that uses demographic data (age, gender, education, and occupation) with randomized spatial allocation centered on GHS-POP pixels to determine individual locations. The accuracy of the SSP is validated against official statistics (using MAE and RMSE metrics), spatial fit with population grids, and placement tests on built-up land based on ESRI Sentinel-2 Land Cover.
The social vulnerability index (V) is calculated as a linear weighted sum using AHP.
Flood risk (R) is derived by multiplying the standardized hazard (H) by V, then classified into five categories using natural breaks (Jenks).
The Getis-Ord Gi* hotspot analysis identifies significant risk clusters, which are then interpreted in relation to river corridors and urban areas.
All steps are performed in a geographic information system (GIS) environment with a reproducible workflow (Figure 1).
Figure 1. Research Framework.
Study Area
The study area is the Upper Bengawan Solo Watershed (BSH), the upstream part of the Bengawan Solo system, the largest river basin on Java Island (Figure 2). Geographically, this area is located in Central Java and covers the upstream region, supplied by the Merapi–Merbabu volcanic complex to the west and Mount Lawu to the east. The terrain is dominated by relatively flat plains; however, the northeastern and northwestern sections feature undulating to rolling terrain near the mountain foothills. The hydrological segmentation classifies BSH as one of the three main sub-watersheds within the Bengawan Solo system. The study boundaries are defined by hydrological boundaries based on the digital elevation model (DEM) and align with administrative borders (Absori et al., 2023; Anna et al., 2023; Anna & Priyana, 2015; Santhyami et al.).
Topographic conditions and river networks primarily influence flood dynamics in BSH. Past studies have shown varying vulnerability to flooding: areas around Surakarta (the urban core) tend to be more susceptible, while certain regions in Wonogiri, Karanganyar, and Boyolali are comparatively less at risk. This pattern aligns with the morphological gradient (flat versus undulating or steep terrain), drainage density, and land use/cover structure along the main river corridor. Variations in vulnerability among districts highlight the need for locally tailored management, especially during the rainy season when increased rainfall strains urban drainage systems (Jumadi et al.).
Administratively, BSH includes at least six districts/cities: Boyolali, Surakarta, Karanganyar, Sukoharjo, Klaten, and Wonogiri. The Surakarta urban area acts as a hub for population and economic activity, leading to higher exposure of residents and property in this region (Santhyami et al.).
Figure 2. Study Area.
Data and Sources
Population Centers (GHS-POP R2023A)
The population center dataset is derived from the latest Global Human Settlement Layer (GHSL) release (GHS global population grid multitemporal, GHS-POP R2023A) (Pesaresi et al., 2024; Schiavina et al.). The population pixel values from GHSL, at 100 m resolution, were converted into point centers using Raster to Point in ArcGIS Pro. Each point retains a center_id, pixel center coordinates, and population count (the number of residents in that pixel). These points are used as references to generate synthetic individuals within a 100 m radius.
Attribute Frequency Tables (Age, Gender, Education, Occupation)
Four tables give the proportions of demographic and social categories at the watershed level, which combines four districts or cities, for age, gender, education, and occupation. For age, the labels are given as age ranges; during synthesis, these are converted to specific ages by sampling integers within each range. For gender, education, and occupation, category selection is weighted by frequency. Substantive rules are applied to maintain social realism, such as automatically assigning individuals under 15 years old to occupation 0 (no occupation).
Flood Hazard Data Based on RS-GIS
The hazard index (Figure 3) is compiled from physical-environmental parameters derived from open-access spatial data (Jumadi et al., 2024a; Jumadi et al., 2024b). The core framework uses eleven parameters: elevation (El), slope (Sl), flow accumulation (FA), distance to river (DR), drainage density (DD), topographic wetness index (TWI), curvature, land cover/use (LULC), NDVI, rainfall, and soil moisture (SM). The primary data sources include SRTM 30 m (El, Sl, FA, DR, DD, TWI, and curvature), Sentinel-2 10 m (LULC, NDVI), GPM v6 (rainfall), and SMAP (SM).
All parameters are projected to the working CRS, standardized in spatial resolution, and then reclassified to an ordinal scale of 1–5 (very low to very high) with local thematic thresholds. Weights among parameters are explicitly assigned (e.g., El/Sl/FA/DR as major controlling factors), and the weighted overlay results in a hazard surface H.
Physically, low or flat locations and those near rivers tend to experience flow accumulation and low flow velocity; high NDVI values indicate vegetative cover that may delay runoff; TWI and SM capture soil saturation; and flat or concave curvature leads to pooling (see further explanations in Jumadi et al., 2024a; Jumadi et al., 2024b).
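The reclassify-then-overlay step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the layer values and weights are hypothetical (the paper does not list them here), only three of the eleven parameters are shown, and the rescaling of the 1–5 weighted mean to 0–1 is one simple way to obtain a standardized hazard, not necessarily the authors' procedure.

```python
def weighted_overlay(class_layers, weights):
    """Combine reclassified layers (ordinal classes 1-5) into a hazard score in [0, 1].

    class_layers: dict of parameter name -> list of class values, one per cell.
    weights: dict of parameter name -> weight (need not sum to 1).
    """
    total_weight = sum(weights.values())
    n_cells = len(next(iter(class_layers.values())))
    hazard = []
    for i in range(n_cells):
        # Weighted mean of the ordinal classes, still on the 1-5 scale.
        score = sum(weights[k] * class_layers[k][i] for k in weights) / total_weight
        # Assumed standardization: map 1-5 linearly onto 0-1.
        hazard.append((score - 1.0) / 4.0)
    return hazard


# Hypothetical reclassified values for three cells and three parameters.
layers = {"El": [5, 3, 1], "Sl": [5, 2, 1], "DR": [4, 3, 1]}
weights = {"El": 0.4, "Sl": 0.3, "DR": 0.3}  # hypothetical weights
H = weighted_overlay(layers, weights)
print([round(h, 3) for h in H])
```

In a GIS workflow the same arithmetic would be applied to raster arrays rather than Python lists.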
Figure 3. Flood hazard map (source: Jumadi et al., 2024a; Jumadi et al., 2024b).
Supporting Data
The research uses administrative boundaries, river networks, watershed boundaries, and ESRI Sentinel-2 Land Cover. Administrative boundaries are essential for reporting, especially when comparing results with different regions. The river network helps interpret the results. ESRI Sentinel-2 land cover is needed to validate the distribution of the SSP. The characteristics and sources of all datasets are summarised in Table 1.
Table 1. Data Description and Sources.
Data | Description | Source | Derived data
Population | Global Human Settlement Layer (GHS-POP R2023A), pixel at 100 m resolution | Earth Engine Data Catalogue | Population center_id, coordinates, and count (number of residents)
Demographic and social categories | Regency/city statistics data | Central Bureau of Statistics | Age, gender, education, occupation
Images | Sentinel-2 at 10 m resolution | ESA | LULC, NDVI
DEM | Shuttle Radar Topography Mission (SRTM) at 30 m | USGS | Elevation, slope, flow accumulation, distance to river, drainage density, TWI, and curvature
Rainfall | Global Precipitation Measurement (GPM) v6 at 10 km | NASA | Rainfall
Soil moisture | NASA-USDA Enhanced Soil Moisture Active Passive (SMAP) at 10 km resolution | NASA | Soil moisture
Land cover | Based on the ESA Sentinel-2 image at 10 m resolution | ESRI | Land cover
Data Pre-processing
Spatial Harmonization
Preprocessing starts with aligning the coordinate system and analysis grid so that all layers overlay correctly without distortion.
All data are projected to UTM 49S (EPSG:32…) as a standard CRS to ensure consistency in distance and area measurements.
The extent is defined by the mask of the Upper Solo River Basin (BSH), which is then used as a clipping boundary for all raster and vector data.
Extraction of GHSL Population Data
The GHSL population raster is converted into points using ArcGIS Pro (Raster to Point). These points are clipped by the watershed mask and assigned a unique ID (center_id). The population attribute (number of people per pixel) is cleaned by removing negative or empty values. Points with a population of zero are discarded to improve computational efficiency. If there is an off-by-one issue at the watershed boundary where GHSL cells cross it, a small adjustment is applied to the analysis grid. All points are stored in the working CRS and exported to CSV for use in the Python pipeline (SSP Algorithm).
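The cleaning rules above (drop empty, negative, and zero populations) could look roughly like the following. It is a sketch only: the column names `center_id`, `x`, `y`, and `population` and the sample rows are hypothetical, and rounding fractional GHS-POP values to whole persons is an assumption the paper does not state.

```python
import csv
import io
import math


def load_centers(csv_text):
    """Read population centers and keep only rows with a positive, finite population."""
    centers = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            pop = float(row["population"])
        except (TypeError, ValueError):
            continue  # empty or non-numeric value
        if not math.isfinite(pop) or pop <= 0:
            continue  # negative, zero, or NaN population
        centers.append({
            "center_id": int(row["center_id"]),
            "x": float(row["x"]),
            "y": float(row["y"]),
            "population": int(round(pop)),  # assumed rounding to whole persons
        })
    return centers


# Hypothetical export from the Raster to Point step.
sample = (
    "center_id,x,y,population\n"
    "1,480000,9160000,12.6\n"
    "2,480100,9160000,0\n"
    "3,480200,9160000,\n"
    "4,480300,9160000,-3\n"
)
centers = load_centers(sample)
print(centers)
```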
Spatial Synthetic Population (SSP) Development
Principles and Assumptions
The development of the SSP aims to represent the population at the individual level in a synthetic way while maintaining overall consistency with official statistics and spatial accuracy regarding settlement areas. The main principles are: (1) aggregate consistency: matching the number of individuals generated at each center point to the population count from GHS-POP; (2) microspatial realism: dispersing individuals around the pixel center within a fixed radius (100 m), with a denser radial distribution at the center; (3) attribute consistency: sampling age, gender, education, and occupation based on weighted standardized frequency tables; and (4) substantive rules to avoid unrealistic attribute combinations (e.g., children working), such as age < 15 ⇒ education = 0, occupation = 0.
Other key assumptions include: (1) category frequencies reflect the population makeup at the watershed level; (2) micro-spatial dispersion within a 100 m radius sufficiently represents the intra-pixel variability observed in GHS-POP; and (3) ages within intervals are assumed uniform unless additional information is available.
Spatial Algorithm (Individual Placement)
For each center point (cx, cy) with a population n, generate n individual positions using radial jitter, where the angle θ is uniformly distributed between 0 and 2π, and the radius r is drawn from a Beta distribution scaled by RADIUS_M. This creates a denser clustering around the center. Offsets are calculated as dx = r cos(θ) and dy = r sin(θ).
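A minimal sketch of the radial jitter described above, using only the Python standard library. The Beta shape parameters (`BETA_A`, `BETA_B`) are hypothetical placeholders, since the values used by the authors are not legible in this text; `RADIUS_M = 100` follows the 100 m radius stated in the paper, and the coordinates are illustrative.

```python
import math
import random

RADIUS_M = 100.0             # jitter radius in metres (from the text)
BETA_A, BETA_B = 2.0, 5.0    # hypothetical shape parameters; skew the radius toward 0


def place_individuals(cx, cy, n, rng):
    """Return n (x, y) positions jittered around the pixel center (cx, cy)."""
    points = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)             # uniform direction
        r = rng.betavariate(BETA_A, BETA_B) * RADIUS_M      # radius in [0, RADIUS_M]
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points


rng = random.Random(42)  # fixed seed for reproducibility
points = place_individuals(480000.0, 9160000.0, 25, rng)
print(len(points), points[0])
```

Because the Beta draw is bounded by 1, no point can fall more than `RADIUS_M` from its center, although it can still cross into a neighbouring 100 m pixel.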
Demographic-Social Attribute Sampling
Individual attributes are assigned through weighted random sampling based on frequency tables. For age, the label can indicate a closed range ("a–b"), a single value, or an open-ended range (a and above). A closed range "a–b" is converted into a specific age by selecting a random integer within [a, b]; open-ended labels are capped at a finite upper bound to avoid an infinite tail; single values are used directly. For gender, education, and occupation, categories are selected based on their relative probability (frequency divided by total). After age is assigned, the rule age < 15 is applied to disable education and occupation (set to "0").
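The sampling rules above can be illustrated as follows. Everything concrete in this sketch is an assumption made for demonstration: the frequency tables, the category names, the `"60+"` notation for open-ended labels, and the cap width `AGE_CAP_SPAN`.

```python
import random

AGE_CAP_SPAN = 20  # hypothetical cap: an open-ended label "a+" is sampled from [a, a + 20]

# Hypothetical frequency tables (counts or proportions both work).
TABLES = {
    "age": {"0-14": 22, "15-59": 65, "60+": 13},
    "gender": {"man": 50, "woman": 50},
    "education": {"elementary": 40, "junior_high": 25, "senior_high": 25, "university": 10},
    "occupation": {"informal": 45, "self_employed": 25, "employee": 30},
}


def sample_category(freq, rng):
    """Pick one category with probability frequency / total."""
    categories = list(freq)
    return rng.choices(categories, weights=[freq[c] for c in categories], k=1)[0]


def sample_age(label, rng):
    """Convert an age label into a single integer age."""
    label = label.strip()
    if label.endswith("+"):                       # open-ended range, capped
        low = int(label[:-1])
        return rng.randint(low, low + AGE_CAP_SPAN)
    if "-" in label:                              # closed range a-b, inclusive
        low, high = (int(part) for part in label.split("-"))
        return rng.randint(low, high)
    return int(label)                             # single value


def make_individual(tables, rng):
    age = sample_age(sample_category(tables["age"], rng), rng)
    person = {"age": age, "gender": sample_category(tables["gender"], rng)}
    if age < 15:                                  # substantive rule from the text
        person["education"] = "0"
        person["occupation"] = "0"
    else:
        person["education"] = sample_category(tables["education"], rng)
        person["occupation"] = sample_category(tables["occupation"], rng)
    return person


rng = random.Random(7)
print(make_individual(TABLES, rng))
```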
Computational Implementation
The implementation uses Python on Google Colab Pro (High-RAM). It reads population centers from DBF files with dbfread and converts data types to numeric. The source code for SSP generation is available at https://doi.org/10.17605/OSF.IO/2KEJQ.
Validation of Aggregate Consistency and Spatial Distribution
Validation involves three layers. (1) Aggregate count: the number of SSP individuals per point must match the GHSL count; a difference greater than 0 indicates anomalies (e.g., non-integer population values or casting issues). (2) Attribute distribution: the proportions of age, gender, education, and occupation in the SSP are compared with the frequency tables using MAE and RMSE (Equations 1 and 2), where $y_i$ is the observed proportion and $\hat{y}_i$ is the SSP proportion in category $i$, and $n$ is the number of categories. (3) Spatial distribution: the compatibility of SSP locations with built-up land according to ESRI Sentinel-2 Land Cover.
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \qquad (1) \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (2) \]
Vulnerability Analysis
Determination of Scores per Attribute Class
The method for calculating scores per class follows the schema set out in Table 2.
To ensure consistency and facilitate multicriteria aggregation, all scores are expressed within the range [0, 1]. If an initial score column falls outside [0, 1], linear normalization can be applied without altering the actual ordering: $s' = (s - \min s) / (\max s - \min s)$. Table 2 illustrates this basic schema.
Table 2. Criteria and Scoring of Vulnerability.
Criteria Age Education Occupation Gender Category Description Vulnerability Score extremely vulnerable strongly vulnerable less vulnerable strongly vulnerable >60 extremely vulnerable Elementary School extremely vulnerable Junior High School strongly vulnerable Senior High School University Graduate less vulnerable Informal Worker extremely vulnerable Self-employed/entrepreneur Employee strongly vulnerable Man less vulnerable Woman
Regarding age groups, the young (…–10 years) and elderly (>60 years) are assigned the highest scores due to mobility limitations, dependency, and increased support needs during emergencies. The middle working-age group (…–40 years) receives a lower score because of its relatively high responsiveness. This mapping also assumes that higher levels of education are linked to greater risk literacy, better access to information, and improved ability to use services, thereby reducing vulnerability scores. Jobs with low income stability or high physical exposure are assigned higher scores because of increased economic and operational risks during flood disruptions. From a gender perspective, women are considered more vulnerable than men.
Determination of Vulnerability Parameter Weights
The established core preferences are: Age (very important), Occupation, Education (somewhat important), and Gender (less important) (Table 3). These preferences are linked to the Saaty scale as an anchor vector s = (…, 5, 3, …). The pairwise comparison matrix A = [a_ij] is built from anchor ratios, where a_ij = s_i / s_j, which guarantees the matrix's transitivity and mathematical consistency (Table 4).
Table 3. Summary of Saaty Scale.
Value | Meaning
1 | Equally important
3 | A little more important
5 | More/strongly important
7 | Very strongly important
9 | Extreme importance
2, 4, 6, 8 | Intermediate value
Reciprocal | Opposite preference
Table 4. Pairwise comparison matrix A.
Rows and columns: Age, Occupation, Education, Gender.
The AHP weights w are derived from the normalized principal eigenvector of matrix A (Table 5). When the matrix is constructed from anchor ratios, the maximum eigenvalue becomes λ_max = n; as a result, the consistency index and ratio are CI = (λ_max − n) / (n − 1) = 0 and CR = CI/RI = 0 (using the RI value for n = 4).
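The claim that an anchor-ratio matrix is perfectly consistent can be checked numerically. The sketch below assumes an anchor vector of (7, 5, 3, 1); only the middle values 5 and 3 are legible in the text, so the end values are hypothetical. Power iteration is used here as a simple stand-in for a full eigenvalue solver.

```python
def ahp_weights(anchors, iterations=50):
    """Return (weights, lambda_max, CI) for the matrix a_ij = s_i / s_j."""
    n = len(anchors)
    A = [[si / sj for sj in anchors] for si in anchors]
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by A, then renormalize so the weights sum to 1.
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue from (A w)_i / w_i.
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return w, lambda_max, ci


# Hypothetical anchors for Age, Occupation, Education, Gender.
weights, lambda_max, ci = ahp_weights([7, 5, 3, 1])
print([round(x, 4) for x in weights], round(lambda_max, 6), round(abs(ci), 6))
```

For any anchor vector s the weights reduce to s_i divided by the sum of s, which is why λ_max = n and CI = 0 hold regardless of the specific anchor values.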
Table 5. AHP Weights Between Attributes.
Columns: Attribute (Age, Occupation, Education, Gender, Total); Weight (w_i).
Determination of Vulnerability Index
Vulnerability (V) is the tendency of a population to experience adverse impacts when exposed to hazards, influenced by demographic characteristics and socio-economic conditions. In this context, V does not measure the likelihood of floods (hazard domain), but rather the internal capacity of individuals to anticipate, absorb, and recover. The scope of indicators is limited to attributes directly available from the SSP, namely age, gender, education level, and employment status. V is computed as a weighted sum (Equation 3), where $S_i$ is the score for the i-th attribute category, $W_i$ is the weight for the i-th attribute, and $n$ is the number of attributes (four here).
\[ V = \sum_{i=1}^{n} S_i W_i \qquad (3) \]
Risk Analysis (R)
Risk analysis combines hazards (H) and vulnerabilities (V) to evaluate the potential impact of flooding on populations. We use a complementary approach based on a Relative Risk Index: R = H × V, with H and V scaled from 0 to 1.
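To make Equation 3 and the relative risk index concrete, the following sketch scores two synthetic individuals. The class scores and attribute weights are hypothetical placeholders (the numeric values of Tables 2 and 5 are not legible in this text); only the structure, a weighted sum followed by multiplication with the hazard value, follows the paper.

```python
# Hypothetical class scores in [0, 1] (higher = more vulnerable).
SCORES = {
    "age": {"child": 1.0, "adult": 0.2, "elderly": 1.0},
    "occupation": {"informal": 1.0, "employee": 0.5},
    "education": {"elementary": 1.0, "university": 0.2},
    "gender": {"woman": 1.0, "man": 0.5},
}
# Hypothetical weights that sum to 1.
WEIGHTS = {"age": 0.4375, "occupation": 0.3125, "education": 0.1875, "gender": 0.0625}


def vulnerability(person):
    """V = sum of S_i * W_i over the four attributes."""
    return sum(WEIGHTS[attr] * SCORES[attr][person[attr]] for attr in WEIGHTS)


def relative_risk(hazard, person):
    """R = H * V, with the hazard H already scaled to [0, 1]."""
    return hazard * vulnerability(person)


person_a = {"age": "elderly", "occupation": "informal", "education": "elementary", "gender": "woman"}
person_b = {"age": "adult", "occupation": "employee", "education": "university", "gender": "man"}
print(vulnerability(person_a), relative_risk(0.5, person_a))
print(vulnerability(person_b), relative_risk(0.5, person_b))
```

Because both factors lie in [0, 1], R can only be high where a vulnerable individual also sits in a high-hazard cell.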
Results and Discussion
Spatial Synthetic Population (SSP)
The SSP modeling result is shown in Figure 4. The resulting SSP includes 4,459,936 individual entities geolocated on a 100 m grid in the Upper Bengawan Solo watershed, derived from GHS-POP R2023A. Each individual is placed with directional jitter within a radius of ≤100 m of the pixel center and is assigned attributes (age, sex, education, and occupation) that are calibrated against official district- or city-level statistics via marginal alignment (IPF) until they are consistent.
Figure 4. Spatial Synthetic Population Map.
Attributes Validation
The results of attribute validation are shown in Table 6 and Figure 5. Based on these results, the errors for each attribute are as follows: education (MAE 4.89%, RMSE 5.19%), occupation (MAE about 6.5%, RMSE 6.93%), gender (RMSE 0.0103%), and age (RMSE 0.0163%), with MAE for gender and age of roughly 0.01–0.02%. Notably, gender and age are nearly identical to the observed values (differences of a few hundredths of a percent), highlighting the success of mapping age bins and maintaining consistent category definitions. This alignment is important because these attributes are often key factors in flood vulnerability indices (such as the proportion of children and the elderly as vulnerable groups); the stability of their values helps prevent errors from affecting the overall risk results.
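The two error metrics reported here are simple to compute from category proportions. The sketch below uses hypothetical gender proportions (in percent) purely to show the calculation; they are not the study's data.

```python
import math


def mae(observed, synthetic):
    """Mean absolute error between two equally long lists of proportions."""
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, synthetic)) / len(observed)


def rmse(observed, synthetic):
    """Root mean square error between two equally long lists of proportions."""
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, synthetic)) / len(observed))


# Hypothetical proportions (%) for two gender categories.
observed = [50.20, 49.80]
synthetic = [50.21, 49.79]
print(round(mae(observed, synthetic), 4), round(rmse(observed, synthetic), 4))
```

RMSE is never smaller than MAE; the two coincide, as in this toy case, when every category has the same absolute error.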
Table 6. Error Metrics.
Columns: Attribute, MAE (%), RMSE (%). Rows: Education, Occupation, Gender, Age.
Figure 5. Comparison of Population Distribution by: (a) Gender, (b) Occupation, (c) Education, (d) Age.
Regarding education and occupation, the MAE of 4.9–6.5% reflects different dynamics: the SSP distribution closely matches the observed distribution (the pattern of comparisons between categories is preserved), but there is an offset in some high-frequency categories. In practice, this often happens when category ontologies are not perfectly aligned, such as one source combining High School/Vocational School into a single category while another distinguishes High School from Vocational School. In these cases, a difference of a few percentage points in one category is usually offset by a comparable difference in a nearby category.
Spatial Error Analysis
The SSP validation results against GHS-POP for 30 sample pixels show a plausible pattern (Figure 6). The relationship between the two is strong, yet there is a systematic overcount with a bias of 10.4 people per pixel (MAE about 12, RMSE about 22) and 95% limits of agreement that remain wide (−28.74 to about 49), indicating significant residual uncertainty in some pixels. Structurally, the regression slope is less than 1, and the increasing error in higher-density bins points to heteroskedasticity and to potential spatial "bleeding" caused by jitter points extending beyond pixel limits and by non-mass-preserving allocations per pixel. The SSP has captured the density gradient but still needs scale calibration; jitter that is not restricted to pixel polygons and non-mass-preserving allocations hinder consistency at 100 m. With these adjustments, especially for aggregations ≥300 m, the accuracy is expected to improve, but expanding the sample will be necessary to stabilize the metrics and narrow the limits of agreement.
Figure 6. Comparison between GHS and SSP Pixel Values.
Additionally, the distribution of SSP locations compared to ESRI Sentinel-2 Land Cover shows high consistency: out of a total of 4,235,296 points, 91.95% (about 3.89 million) are on built-up land and 8.05% (about 0.34 million) on non-built-up areas. Treating this as an estimate of a binomial proportion, its statistical uncertainty is minimal (95% confidence interval approximately 91.92–91.98%), indicating a strong "location accuracy" signal for the SSP. Substantively, the roughly 8% off-built-up figure remains reasonable given (1) thematic errors in land cover maps (such as commission/omission of built classes, especially in rural residential areas mixed with vegetation), (2) the impact of the 10 m boundary, that is, misregistration and mixed pixels at settlement edges, and (3) jitter or bleeding of SSP points near class boundaries. These results strongly support the conclusion that the placement of the SSP aligns with built-up spaces; the remaining 8% likely results from class boundary issues and thematic uncertainties, which could be minimized through small buffering, datum/registration alignment, or the use of building footprints in the test zone.
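The quoted confidence interval can be reproduced with the normal approximation for a binomial proportion. This is a sketch assuming independent points, which is a simplification because SSP points are spatially clustered; the function name is illustrative.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Values from the text: 91.95% of 4,235,296 points fall on built-up land.
low, high = proportion_ci(0.9195, 4_235_296)
print(f"{low * 100:.2f}% to {high * 100:.2f}%")
```

With more than four million points the interval is only a few hundredths of a percentage point wide, so the dominant uncertainty is thematic (land cover errors), not sampling.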
Population Vulnerability Analysis based on SSP
The demographic vulnerability analysis indicates a very low profile (Figures 7 and 8). On the 0–1 scale, the population-weighted average vulnerability is low, with 99.63% of the population categorized as Very Low or Low, while only 0.37% falls into the Moderate category. The AHP weights are consistent (CR = 0), confirming that age is the primary factor, followed by occupation, education, and gender. Consequently, pockets of vulnerability mainly develop in areas with higher proportions of children and the elderly, or in areas dominated by low-adaptive-capacity jobs (informal/daily workers) and low education levels, even though their overall contribution is smaller than their share of the population. Spatially, vulnerability hotspots are limited but operationally significant: although they account for only 0.37% of the population, the intersections of moderate pockets with high hazard zones (H) are candidates for priority interventions such as increasing household preparedness, improving access to emergency services, and strengthening social networks. These results highlight that most regions possess reasonable demographic resilience reserves; risk management must nevertheless continue to focus on micro-locations with vulnerable age groups and job/education vulnerabilities to deploy mitigation resources effectively.
Figure 7.
Distribution of Population by Vulnerability Class.
Figure 8.
Spatial Distribution of Population by Vulnerability Class.
Risk
The distribution of the population's flood risk based on the SSP (4.23 million entities) is heavily concentrated in the Low Risk category (3,391,760; 80.21%), with a substantial portion in the Moderate Risk category (830,469; 19.64%). Meanwhile, the tails of the distribution are nearly empty, with very few entities in the Very Low (5,026; 0.12%) and High (1,602; 0.04%) categories, indicating that overall exposure is primarily in the low-risk class. However, about 832,071 (≈19.68%) of the population falls into the moderate-to-high risk categories, which should be the focus of priority interventions, such as improvements to early warning systems, micro-infrastructure protection, and spatial planning in hotspots (see Figure 9).
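The class shares can be checked with simple arithmetic. In the sketch below, the Low count (3,391,760) and the moderate-to-high subtotal (832,071) are taken from the text; the leading digits of the Very Low, Moderate, and High counts are reconstructed so that they agree with the reported percentages and subtotal, and should be treated as assumptions.

```python
# Class counts; leading digits of three classes are reconstructed (assumption).
counts = {
    "Very Low": 5_026,
    "Low": 3_391_760,
    "Moderate": 830_469,
    "High": 1_602,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Priority group: Moderate and High combined.
moderate_to_high = counts["Moderate"] + counts["High"]
moderate_to_high_share = round(100 * moderate_to_high / total, 2)

print(total, shares, moderate_to_high, moderate_to_high_share)
```

Under these assumptions the counts sum to about 4.23 million entities, and each share rounds to the percentage reported in the text.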
Figure 9.
Spatial Distribution of Population by Risk Index.
The Getis-Ord Gi analysis reveals that risk hotspots form significant clusters along river network corridors and in urban core areas (Figure 10). The directional pattern along water pathways indicates the dominance of river flooding in floodplains, while the clustering in urban zones reflects pluvial flooding caused by extensive impervious surfaces and limited drainage capacity; both are worsened by high exposure (population/asset density) and local vulnerabilities (informal settlements, fragile buildings, service inequality). Therefore, the H × V interaction explains the consistently positive Gi z-scores observed along riverbanks and in urban pockets. Operationally, this calls for setback planning and restoring floodplain connectivity, improving drainage systems and green-blue infrastructure in cities, and prioritizing early warning systems and the protection of critical facilities in hotspots.
To ensure the accuracy of these conclusions, sensitivity testing should be performed on spatial neighbor weighting schemes (including river-network-based methods), along with corrections for multiple testing and cross-validation using historical flood records.
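The hotspot statistic can be sketched as follows. This is a minimal NumPy illustration that assumes the Gi* variant (each location is included in its own neighbourhood) with binary distance-band weights; the grid, values, and threshold are toy inputs, and the dense weight matrix is only practical for small examples, not for millions of SSP points.

```python
import numpy as np

def distance_band_weights(coords, threshold):
    """Binary weights: 1 where points lie within `threshold` (self included)."""
    c = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    return (dist <= threshold).astype(float)

def getis_ord_gi_star(values, weights):
    """Gi* z-scores for each location given a dense weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)   # global standard deviation
    w_sum = w.sum(axis=1)
    w_sq = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Toy 6 x 6 grid with a block of high risk values in one corner.
coords = [(i, j) for i in range(6) for j in range(6)]
values = [10.0 if (i < 3 and j < 3) else 1.0 for i, j in coords]
gi = getis_ord_gi_star(values, distance_band_weights(coords, threshold=1.5))
print(gi.reshape(6, 6).round(2))
```

Changing `threshold`, or replacing the distance band with river-network-based neighbours, is one way to run the weighting sensitivity test mentioned above; the z-scores here are not corrected for multiple testing.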
Figure 10.
Risk Hotspots.
Discussion
The validation metrics show that the spatial synthetic population (SSP) has been accurately reconstructed in terms of both attribute and spatial distribution, although some residual errors remain in several categories. In high-resolution population mapping, harmonizing category definitions and maintaining consistent classification schemes are essential steps to reduce artifact-related residual bias (Rubinyi et al., 2022). Substantively, age and gender attributes exhibit practical errors of approximately 0.[…] (≈0.01%). This confirms the success of age-bin mapping, the consistency of category labels, and the normalization of proportions. High accuracy in these two attributes is essential because the proportions of children and the elderly often influence response capacity and evacuation needs, while demographic structure also affects risk projections that incorporate population dynamics and climate change (Shu et al., 2023). Therefore, the propagation of errors to the overall vulnerability index should be minimal.
Conversely, in education and employment, a mean absolute error (MAE) of around 5–7 percentage points was observed. While this is generally considered a good fit for the shape of the categorical distribution, some dominant classes still show offsets. From a policy standpoint, such deviations tend to affect thematic evaluations (such as post-flood job training programs or school-level disaster literacy initiatives) more than they affect spatial risk rankings between zones, provided these two attribute weights do not dominate the composite index.
From a utilization standpoint, current performance is sufficient for operational risk mapping at a resolution of ≥100 m (or village level). Research on composite-based flood risk assessment models indicates that aggregate outcomes are fairly resilient to minor variations in less sensitive components, as long as the most sensitive factors (in this case, age and gender) are accurately represented. However, it is also important to note that variations among population datasets alone have been shown to alter flood exposure estimates across different regions (Li et al., 2024; Zhang et al., 2025; Zhang et al., 2[…]). Consequently, documenting and monitoring residual patterns remain essential for quality control, especially in urban and peri-urban areas, which often display greater social and economic diversity.
Regarding sources of bias, three main factors should be noted. First, misalignment between category lexicons (observed vs. SSP) could cause discrepancies of several percentage points in large categories. Second, imperfections in conditioning employment attributes on demographic predictors and regional contexts (e.g., age, education, and urban-rural typologies) may lead to small but systematic distortions. Third, rounding in reference data can mathematically amplify differences in high-frequency classes without changing the overall shape of the distribution.
Two low-cost steps are recommended that do not alter the SSP architecture: (i) post-synthesis raking or Iterative Proportional Fitting (IPF) in each territorial unit to "attach" margins to the targets; this method is common and effectively reduces the mean absolute error (MAE) of categories to <3 percentage points (Nejad et al., 2021), and (ii) one-to-one category ontology harmonization (consistently merging or crossing "similar" classes), as recommended in high-resolution synthetic population mapping (Rubinyi et al., 2022).
Both steps improve accuracy without sacrificing spatial consistency or computational efficiency.
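The raking step can be illustrated with a two-dimensional IPF. The sketch below is a generic textbook version under stated assumptions, not the authors' implementation: the seed table (for example, education by employment counts from the SSP in one territorial unit) and the target margins are hypothetical numbers.

```python
import numpy as np

def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale `seed` so its row and column sums match the target margins."""
    table = np.asarray(seed, dtype=float).copy()
    r = np.asarray(row_targets, dtype=float)
    c = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Fit rows, leaving all-zero rows untouched.
        row_sums = table.sum(axis=1, keepdims=True)
        table *= np.divide(r[:, None], row_sums,
                           out=np.ones_like(row_sums), where=row_sums > 0)
        # Fit columns, leaving all-zero columns untouched.
        col_sums = table.sum(axis=0, keepdims=True)
        table *= np.divide(c[None, :], col_sums,
                           out=np.ones_like(col_sums), where=col_sums > 0)
        # Columns now match exactly; stop once rows also match.
        if np.abs(table.sum(axis=1) - r).max() < tol:
            break
    return table

# Hypothetical seed counts and census margins for one territorial unit.
seed = [[40.0, 30.0, 10.0],
        [20.0, 50.0, 30.0]]
fitted = ipf_2d(seed, row_targets=[90.0, 110.0], col_targets=[70.0, 80.0, 50.0])
print(fitted.round(2))
```

Because IPF only rescales existing cells, it preserves the association structure of the seed while matching the margins, which is why it can be applied after synthesis without changing the SSP architecture.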
As an additional measure, two more tests are recommended. First, a sensitivity analysis of the weights assigned in the vulnerability index: assessing the stability of spatial rankings across reasonable variations in the education and employment weights will confirm the robustness of the results. Second, constructing a residual map (observed − SSP, in %) for each main category can help identify spatial bias patterns, such as residual concentration in urban cores versus peri-urban areas, that may influence the next ranking stage. This can also help distinguish between methodological bias and actual demographic heterogeneity (Zhang et al., 2[…]).
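The weight sensitivity test can be sketched as a rank-stability check. The example below perturbs selected weights, renormalises them, and measures the Spearman rank correlation between the baseline and perturbed composite scores. The indicator values are random, the base weights are hypothetical, and the ±20% perturbation range is an assumption chosen for illustration.

```python
import numpy as np

def ranks(values):
    """Ordinal ranks (0 = smallest); ties are not averaged in this sketch."""
    order = np.argsort(values)
    out = np.empty_like(order)
    out[order] = np.arange(len(values))
    return out

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks."""
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

def weight_sensitivity(indicators, base_weights, perturb_idx,
                       rel_change=0.2, n_draws=200, seed=0):
    """Rank correlations between baseline and perturbed-weight scores."""
    rng = np.random.default_rng(seed)
    x = np.asarray(indicators, dtype=float)
    w0 = np.asarray(base_weights, dtype=float)
    base_score = x @ w0
    rhos = []
    for _ in range(n_draws):
        w = w0.copy()
        factors = 1 + rng.uniform(-rel_change, rel_change, size=len(perturb_idx))
        w[perturb_idx] *= factors
        w /= w.sum()                      # keep weights summing to 1
        rhos.append(spearman(base_score, x @ w))
    return np.array(rhos)

# Hypothetical setup: 500 zones, columns = age, occupation, education, gender.
zone_indicators = np.random.default_rng(42).uniform(size=(500, 4))
rhos = weight_sensitivity(zone_indicators, [0.50, 0.25, 0.15, 0.10],
                          perturb_idx=[1, 2])
print(round(rhos.min(), 4), round(rhos.mean(), 4))
```

Rank correlations that stay close to 1 across the draws would indicate that zone priorities are stable under the tested weight variations.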
In relation to medium- to long-term planning, combining demographic structure accuracy with these light balancing mechanisms is also important for a risk projection framework that considers future population dynamics and climate-related hazard scenarios (Shu et al., 2023).
Overall, these results support the conclusion that the SSP has adequately captured key demographic structures for area prioritization and operational-level mitigation planning, with room for improvement through IPF/raking and category harmonization. This approach aligns with current best practices and offers a way to increase precision without adding computational complexity or model architecture overhead (Nejad et al., 2021; Rubinyi et al., 2022; Shu et al., 2023; Zhang et al., 2025; Zhang et al., 2[…]).
Conclusion
This study presents the development of an SSP for flood risk analysis in the Upper Bengawan Solo Watershed. The developed SSP is valid in both a distributional and a spatial sense, with some suggestions for improvement.
For attributes, the fit to reference data is very high for sex and age (MAE = 0.[…], RMSE = 0.0103 and MAE = 0.[…], RMSE = 0.[…]). Education (MAE 4.[…], RMSE 5.[…]) and employment (MAE 6.[…], RMSE 6.[…]) preserve a distribution shape similar to the observed one but show an offset in some high-frequency categories.
In the spatial validation results, the SSP-GHS relationship is strong ([…] = 0.[…]), but there is an overcount of 10.4 people per pixel (MAE 12.[…], RMSE 22.[…]) and wide 95% Limits of Agreement (−28.74 to 49.[…]), indicating heteroskedasticity and possible issues from jitter and from allocations not being mass-preserving per 100 m pixel. Limiting jitter within pixels, using conservative allocations, and aggregating at ≥300 m are expected to improve accuracy. Positional validation against ESRI Sentinel-2 shows that about 92% of points are within built-up areas (CI95% ≈ 91.92–91.98%). The roughly 8% of points outside built-up areas is likely due to thematic or edge uncertainty, which can be reduced through buffering, improved registration, or more precise building footprint definitions.
The risk to residents falls mainly in the Low (80.21%) and Moderate (19.64%) categories. Getis-Ord Gi analysis confirms that risk hotspots tend to run along river corridors and cluster in urban centers, reflecting the combined effects of river flooding and pluvial processes. The operational implications are setback planning, floodplain restoration, improved drainage and green-blue infrastructure, stronger early warning systems, and protection of critical facilities, alongside spatial weight sensitivity tests, corrections for multiple testing, and cross-validation against historical floods.
In short, the research enables fine-scale flood risk assessment, identifying hotspots for targeted interventions such as urban planning, drainage improvements, and early warning systems. Validated and robust, the SSP framework offers policymakers a practical tool to enhance resilience, optimize resources, and mitigate the societal impacts of rising flood hazards.
Acknowledgements
The author would like to express sincere gratitude to all parties who contributed to the research and completion of this manuscript, including mentors, funding supporters, and those who provided essential data and information. Their assistance has been invaluable to this research.
References