https://journals.ums.ac.id/index.php/fg/
ISSN: 0852-0682 | E-ISSN: 2460-3945
Research article
Optimized Artificial Intelligence-Based Algorithm for Groundwater Potential Mapping in Trenggalek Regency, Indonesia
Muhammad Hidayatul Ummah1, Diyono Diyono2,*
1 Geomatics Engineering Master Program, Department of Geodetic Engineering, Universitas Gadjah Mada, Jln. Grafika No., Yogyakarta, Indonesia.
2 Department of Geodetic Engineering, Faculty of Engineering, Universitas Gadjah Mada, Jln. Grafika No., Yogyakarta, Indonesia.
Correspondence: diyono@ugm.
Citation:
Ummah, M. H., & Diyono, D. (2025). Optimized Artificial Intelligence-Based Algorithm for Groundwater Potential Mapping in Trenggalek Regency, Indonesia. Forum Geografi, 39, 398-419.
Abstract
Groundwater management is essential because more than one-third of the world's population relies on groundwater as a source of freshwater.
Exploring groundwater potential is a practical way to secure access to freshwater.
Therefore, this study developed machine learning (ML), deep learning (DL), and stacking learning (SL) models for groundwater potential mapping in Trenggalek Regency, Indonesia.
A total of 740 spring locations were used as training data, and 18 variables were considered in the modelling.
The eighteen parameters were classified into geological, topographic, land cover, climatological, hydrological, and geophysical factors.
We used seven models: gradient boosting decision trees (GBDT), random forest (RF), recurrent neural network (RNN), convolutional neural network (CNN), SL GBDT-RF, SL CNN-RNN, and SL GBDT-RF-CNN-RNN.
Each base learner was optimized through hyperparameter fine-tuning using the tree-structured Parzen estimator (TPE) method.
Models were evaluated using four metrics: accuracy (Acc), Cohen's kappa (CK), Matthews correlation coefficient (MCC), and receiver operating characteristic (ROC) area under the curve (AUC).
Of the seven models generated, SL GBDT-RF achieved the best performance on the test data, with Acc, MCC, CK, and AUC values of [...], 0.915, 0.915, and 0.990, respectively.
The geological unit parameter has the highest relative contribution rate in all prediction models.
Based on the best model, the study area is dominated by the low-potential class, accounting for 31.[...]% of the area.
This study provides a benchmark for the development of groundwater potential prediction using ML, DL, and SL algorithms across various case studies.
In addition, this study can be used by concerned stakeholders in sustainable water resource planning, drought disaster management, and the prevention of inappropriate groundwater exploration.
Article history:
Received: 11 September 2025
Revised: 2 December 2025
Accepted: 2 December 2025
Published: 10 December 2025
Keywords: groundwater potential mapping, machine learning, deep learning, stacking learning, hyperparameter optimization.
Introduction
Groundwater is one of the freshwater resources that meet everyday needs as well as those of agriculture and industry (Li et al.
, 2.
According to AQUASTAT data published by the Food and Agriculture Organization (FAO), the global withdrawal rate of freshwater from groundwater in 2022 places groundwater as the second-largest source of freshwater withdrawal in the world (FAO, 2.
Therefore, mapping groundwater potential is necessary for the sustainable management of water resources (Khosravi et al.
, 2.
, especially in Trenggalek Regency.
This regency, located in East Java Province.
Indonesia, is affected by recurrent annual drought (Regional Disaster Management Agency Of Trenggalek Regency, 2.
In addition, more than 60% of the Trenggalek population relies on groundwater as a freshwater resource (Statistical Centre Of East Java Province, 2.
Exploration of groundwater is a de facto solution for ASEAN countries such as Indonesia during emergencies such as droughts (Vrba & Salamat.
The groundwater potential map shows locations that sufficiently supply freshwater resources.
This information can be used to formulate strategies for emergencies such as drought.
Copyright: © 2025 by the authors.
Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Groundwater exploration using terrestrial methods such as drilling, hydrological surveys, and geoelectrical surveys is less effective for large areas, even though it is highly accurate (Tolche, 2021; Yariyan et al.
, 2.
Determining groundwater potential zoning using terrestrial survey techniques requires high-end instruments and skilled labor.
In addition, surveys are not viable in areas with limited accessibility (Abdelouhed et al.
, 2.
Consequently, terrestrial exploration of groundwater potential is unlikely to be feasible within a limited time for large areas such as a regency, especially given the highly diverse geomorphological conditions in the study area.
Time and cost remain a consideration in socio-environmental studies (Muthu & Sudalaimuthu, 2.
Geographical information system (GIS) and remote sensing (RS) data can be used for groundwater potential identification using specific approaches to evaluate groundwater influence parameters (Andualem & Demeke, 2.
Expert judgement approaches, such as the analytical hierarchy process (AHP), are limited by the subjectivity involved in computing the weights of the parameters that influence groundwater (Esen, 2.
Page 398 Forum Geografi, 39.
, 2025.
DOI: 10.
23917/forgeo.
The most recent approach is artificial intelligence (AI).
Both machine learning (ML) and deep learning (DL) have performed well in predicting groundwater potential at various locations by considering multiple factors such as geology, climatology, topography, hydrology, and land cover (Pokhrel et al.
, 2.
These approaches are more effective than expert-judgment-based approaches and linear statistics because they can capture the complex, non-linear relationship between groundwater influencing factors and groundwater presence (Nguyen et al.
, 2.
Various AI-based algorithms have been used for groundwater potential mapping.
Several ML algorithms applied in this context include artificial neural network (ANN), logistic model tree (LMT), logistic regression (LR), alternating decision tree (ADT), k-nearest neighbor (KNN), random forest (RF), AdaBoost (ADB), support vector machine (SVM), decision stump (DS), CatBoost (CB), extreme gradient boosting (XGBoost), and voting ensemble (VE).
These algorithms produced area under the curve (AUC) values of 0.720 to 0.993 (Chen et al., 2020; Halder et al., 2024; Nguyen et al., 2024; Parasar et al., 2025; Pham et al., 2019; Sarkar et al.
In addition, Arabameri et al.
conducted a study that integrates ML algorithms with multi-criteria decision making (MCDM).
Their work produced a groundwater potential model with an AUC of 0.925 by integrating the frequency ratio (FR) technique, Vise Kriterijumska Optimizacija I Kompromisno Resenje (VIKOR), and RF.
DL algorithms can recognize complex relationships between dependent and independent variables, making them suitable for modelling applications that require high complexity.
DL algorithms used for groundwater potential mapping in previous studies include convolutional neural network (CNN), long short-term memory (LSTM), deep learning neural network (DLNN), deep learning tree (DLT), deep boosting (DB), and recurrent neural network (RNN).
In these studies, the AUC values varied from 0.801 to 0.910 (Chen et al., 2022; Hakim et al., 2022; Moughani et al., 2023; Ragragui et al.
, 2.
There is no consensus on the best algorithm for mapping groundwater potential, whether DL or ML.
This arises because ML and DL-based modelling are highly dependent on the characteristics of the training data and on how the algorithm processes them.
Many previous studies have focused on comparing algorithms with pre-defined hyperparameters for groundwater potential mapping (Arabameri et al., 2020; Chen et al., 2020; Chen et al., 2022; Parasar et al., 2025; Pham et al.
, 2.
On the other hand, algorithm optimization through hyperparameter fine-tuning plays an important role in improving the generalization and robustness of models (Seifi & Niaki, 2.
Hyperparameter tuning is essential for controlling the behavior of algorithms during the training process, thereby affecting the performance of ML and DL models (Wu et al.
, 2.
The most common techniques used in the hyperparameter tuning process in spatial modelling applications are grid search (Kanwar et al.
, 2025.
Lei et al.
, 2025.
Li et al.
Nguyen et al.
, 2023.
Shams et al.
, 2.
These techniques are relatively simple, but the computational cost increases with the complexity of the search space and the number of hyperparameters examined.
Methods such as the tree-structured Parzen estimator (TPE) can reduce this computational cost by using a Bayesian approach, so the hyperparameter tuning process can proceed more efficiently (Lai, 2.
However, the use of TPE for hyperparameter optimization in spatial modelling applications, particularly for groundwater potential mapping, remains relatively limited, especially in Indonesia.
Each algorithm has its own weaknesses in performing the prediction process.
These weaknesses can be addressed by ensemble learning, which combines single algorithms (often referred to as basic learners) (Wang et al.
, 2.
Several techniques have been developed to integrate basic learning methods, namely boosting, bagging, and stacking.
Among these three ensemble techniques, stacking has outperformed boosting and bagging according to several studies in various case studies, including for predicting soil moisture, agribusiness product prices, landslide susceptibility, and high-frequency trading (Das et al.
, 2022.
Ferrouhi & Bouabdallaoui.
Ribeiro & Coelho, 2020.
Wu & Wang, 2.
The stacking technique is advantageous because it can identify the strengths and weaknesses of each basic learning model.
Stacking learning can combine heterogeneous basic learners, which bagging and boosting cannot.
Stacking learning (SL) is a technique that feeds the output of each base learner (tier 1 output) into tier 2 using meta-learning.
Previous research integrated multiple base learners using ensemble learning techniques such as voting, boosting, and bagging (Lv et al.
, 2.
Explicit and concurrent evaluation of ML, DL, and stacking learning for groundwater potential mapping remains limited, especially in Indonesian case studies.
Previous studies of groundwater potential mapping in Indonesia used RF, SVM, ANN, and XGBoost, with accuracy values ranging from 0.58 to 0.978 (Nugroho et al., 2024; Nugroho et al.
, 2.
DL and stacking of multiple algorithms have not yet been applied to groundwater potential mapping in Indonesia.
Previous studies using DL and ML algorithms have mainly considered topography, land cover, geology, hydrology, and hydrogeology as influence factors for groundwater potential mapping.
However, to the best of our knowledge, geophysical factors such as the mean shear-wave velocity in the upper 30 m (Vs30) and the complete Bouguer anomaly (CBA) have not been considered in studies mapping groundwater potential using ML and DL algorithms.
Both parameters have an indirect influence on the presence of groundwater.
There are even studies that interpret groundwater potential from gravity measurements (Alshehri & Mohamed, 2023.
Handayani et al.
, 2.
The CBA parameter is a product of the gravity anomaly.
According to Newton's law, gravitational force is a function of the mass distribution and spatial location within and on the surface of the Earth.
Changes in groundwater accumulation affect the water mass in a region.
Thus, the gravity value reflects the presence of groundwater in an area.
In addition, with the gravity force value, it is possible to interpret the porosity conditions of rock units that affect the groundwater transport system (Rustadi et al.
, 2.
Vs30 correlates with geological maturity and primary lithology (Meneisy et al.
, 2.
Vs30 values can be used as an indication of fluid saturation levels in a rock layer, particularly in early-stage, unconsolidated lithological units (Okay & Özacar, 2.
Shear wave velocity provides information related to aquifer characteristics.
It can provide information related to geological properties such as hydraulic conductivity, permeability, storability, and porosity (Khalilidermani & Knez, 2.
Although CBA and Vs30 affect groundwater potential only indirectly, these two geophysical parameters can enhance the performance of groundwater potential prediction.
These geophysical factors provide subsurface information that complements other parameters.
The model can identify groundwater potential characteristics in the study area by leveraging hidden geophysical controls, yielding a more reliable model.
Therefore, to fill gaps in previous research, this study evaluates the implementation and integration of ML and DL algorithms for mapping groundwater potential in Indonesia, with particular focus on the Trenggalek Regency area.
In addition, this study employs hyperparameter fine-tuning with TPE to optimize model performance.
In this modelling, geophysical factors such as CBA and Vs30 are considered modelling parameters.
The ML algorithms used are RF and GBDT.
Meanwhile, the DL algorithms used are 1-D CNN and RNN.
The ML and DL algorithms were chosen because, based on the literature review, they perform well and can identify complex relationships between groundwater influencing factors and the potential presence of groundwater.
There are three schemes for integrating the algorithms: integrating the ML algorithms, integrating the DL algorithms, and combining the ML and DL algorithms.
The integration technique used is stacking with logistic regression (LR) as the meta-learner.
This study therefore produces seven models, whose performance is evaluated using accuracy, Cohen's kappa, Matthews correlation coefficient, and receiver operating characteristic (ROC) AUC.
The performance of the seven models is then compared to obtain the optimal model for accurately identifying groundwater potential in Trenggalek Regency.
Research Methods
Study Area
Trenggalek Regency is one of the regions in East Java Province, Indonesia.
It is located at 111° 23' 31.2" to 111° 50' 48.95" E and 8° 23' 18.46" to 7° 53' 18.16" S.
Trenggalek Regency experiences hydrometeorological drought disasters that occur annually.
There were 123 drought-related disasters in Trenggalek Regency throughout 2023, highlighting the water-scarcity problem.
The population growth rate of Trenggalek Regency was 0.
51% in 2023, in line with the increasing dependence on groundwater as a freshwater source, which reached 135,293 m3.
Therefore, a groundwater potential map is urgently needed, as it can help in optimizing the fulfillment of clean water needs in Trenggalek Regency.
Trenggalek Regency has diverse geomorphological conditions.
Two-thirds of the regency is mountainous, with elevations ranging from 0 to 1,179 metres above sea level.
It is a tropical region with an average rainfall of 1,879 to 2,756 mm/year and an average land surface temperature of 24°C to 31°C.
The geology of Trenggalek Regency is dominated by the Mandalika formation, which is Neogene in age and consists of interbedded volcanic breccia, lava, tuff, and sandstone.
Figure 1.
Location of Trenggalek Regency and the Distribution of Training Points.
Conditioning Factors of Groundwater Potential
The selection of conditioning factors can affect the quality of groundwater potential modelling.
The conditioning factors were determined based on a literature review and data availability.
This study considered 18 parameters: topographic wetness index (TWI), river density (RD), stream power index (SPI), land cover (LC), normalized difference vegetation index (NDVI), annual rainfall (AR), evapotranspiration (ET), land surface temperature (LST), atmospheric pressure at the surface (SP), elevation, slope, slope aspect, plan curvature (PC), lineament density (LD), geological unit (GU), soil type (ST), complete Bouguer anomaly (CBA), and average shear-wave velocity to 30 m depth (Vs30).
The 18 parameters are classified into six factors: hydrology, land cover, climate, topography, geology, and geophysics.
Detailed conditioning factor information can be seen in Table 1.
Table 1. Groundwater potential conditioning factors used in the study and their role in groundwater potential.
Factor | Parameter | Role in Groundwater Potential | Source
Hydrology | Topographic wetness index (TWI) | Controls the groundwater infiltration rate in an area. | Senapati & Das, 2.
Hydrology | River density (RD) | Influences surface runoff and water infiltration. Hydraulic connections can occur between surface water and groundwater in areas with high RD, which affects the groundwater recharge process. | Abijith et al., 2.
Hydrology | Stream power index (SPI) | Areas with negative SPI values retain groundwater more easily. | Echogdali et al., 2.
Land Cover | Land cover (LC) | Controls soil moisture level, penetration, and surface runoff rate, and can therefore directly affect the level of groundwater recharge. | Senapati & Das, 2.
Land Cover | NDVI | The higher the vegetation density, the greater the groundwater potential of the area. | Senapati & Das, 2.
Climate | Annual rainfall (AR) | Main source of groundwater recharge; it affects surface runoff and depends on the duration, volume, and intensity of rainfall. | Ahmed et al., 2.
Climate | Evapotranspiration (ET) | Increased ET can shift the proportion of rainfall that becomes groundwater recharge. | Ghiat et al., 2.
Climate | Land surface temperature (LST) | Groundwater quantity decreases when LST increases because water evaporates faster from the soil. | Silver et al., 2.
Climate | Atmospheric pressure at the surface (SP) | SP compresses subsurface material, thereby narrowing rock pores, and can therefore influence the water absorption capacity of the aquifer. | Kramer et al., 2.
Topographic | Elevation | Influences local rainfall frequency and processes. | Berhanu & Hatiye,
Topographic | Slope | Affects the rate of water flow and infiltration in an area. | Maskooni et al., 2.
Topographic | Slope aspect | Influences the moisture level and vegetation density in an area. | Benjmel et al., 2.
Topographic | Plan curvature (PC) | Curvature affects water flow convergence and divergence, as well as flow acceleration and deceleration. | Al-Abadi et al., 2.
Geology | Geological type (GU) | Rock formation can influence the permeability and porosity characteristics of an aquifer. | Muavhi et al., 2.
Geology | Soil type (ST) | Determines permeability in an area. In addition, texture, granule size, and soil composition influence soil water. | Rehman et al., 2.
Geology | Lineament density (LD) | Represents secondary porosity and is crucial in regulating the hydrometeorological conditions of groundwater flow and storage. | Berhanu & Hatiye,
Geophysical | VS30 | VS30 can be used as an indicator of fluid saturation in a rock layer, especially in early unconsolidated lithological units. | Okay & Özacar, 2023; Khalilidermani & Knez.
Geophysical | CBA | CBA can be used to interpret the porosity conditions of rock units that affect the groundwater transport system. | Rustadi et al., 2.
A total of 1,480 samples were used to train the model, consisting of 740 potential points and 740 non-potential points.
The potential points are spring locations with discharge rates of 10-100 liters/second.
Meanwhile, non-potential points were random points located in GUs with low porosity (0.2%-8%).
The distribution of training points can be seen in Figure 1.
All parameters were resampled in raster format with a spatial resolution of 30m.
The details of the data used in this study can be seen in Table 3.
The data were processed with spatial data processing software to produce the modelling parameters.
The parameters are shown in Figures 2 and 3.
Table 3. Summary of data resources and derived parameters required in this study.
Data | Source | Spatial resolution/scale | Derivative parameter
Spring locations | Local government of Trenggalek Regency | 1:50,000 | Potential class training
Digital elevation model (DEM) | National Geospatial Information Agency of Indonesia (well known as BIG) | ... | Elevation, slope, slope aspect, TWI, LD, and SPI
River density | BIG | 1:25,000 | ...
Land cover | BIG | 1:25,000 | ...
Landsat-8 imagery | United States Geological Survey (USGS), obtained from the Google Earth Engine (GEE) platform | ... | NDVI
Annual rainfall | Climate Hazard Group Infrared Precipitation with Station Data (CHIRPS) from the GEE catalogue | 0.05 arc degree | ...
MODIS imagery | National Aeronautics and Space Administration (NASA) from the GEE catalogue | ... | ET and LST
Geological unit map | Volcanology and Geological Hazard Mitigation Center of Indonesia (well known as PVMBG) | 1:100,000 | GU and non-potential training
Soil type map | Local government of Trenggalek Regency | 1:50,000 | ...
Global Gravity Model Plus (GGMplus) free-air gravity model | Murray lab | ... | CBA
SRTM2Gravity terrain correction | Murray lab | ... | CBA
VS30 | USGS | 30 arc seconds | Vs30
ERA5-Land | European Center for Medium-Range Weather Forecast (ECMWF) | 11 km | ...
Figure 2. Conditioning Parameters 1 to 9 for Groundwater Potential Mapping: TWI; RD; SPI; LC (GL: grassland; LA: lake; FO: forest; FL: field land; PL: plantation; RE: residential; SW: swamp; PF: paddy field; SH: shrub; RI: river); NDVI; AR; ET; LST; SP.
Figure 3. Conditioning Parameters 10 to 18 for Groundwater Potential Mapping: Elevation; Slope; Aspect (F: Flat; N: North; NE: Northeast; E: East; SE: Southeast; S: South; SW: Southwest; W: West; N: North); PC; LD; ST (Al: Alluvium; Md: Mediterranean; Li: Litosol; La: Latosol); CBA; VS30; GU (AF: Arjosari formation, porosity ±25%-40%; Al: alluvium, ±25%-40%; Alc: Alluvium and coastal deposits, ±25%-55%; AM: Argohalangan morphocet, ±0.2%-8%; IR: Intrusive rock, ±0.1%-8%; JF: Janten formation, ±25%-40%; MF: Mandalika formation, ±7%-25%; NF: Nampol formation, ±45%-55%; OF: Oyo formation, ±25%-40%; SM: Sedudo morphonite, ±0.2%-8%; WF: Wonosari formation, ±0.27%4.; WuF: Wuni formation, ±48%-52%).
Pre-Processing
This study uses two types of data: discrete and continuous.
To ensure data reliability in the modelling process, pre-processing was carried out.
Discrete data are transformed into numerical form using the one-hot encoding (OHE) technique.
OHE transforms each class into a binary vector, with 1 if it appears in a particular row and 0 if it does not.
The OHE technique can be seen in equation 1 (Samuels, 2.
Meanwhile, for numerical data, the Yeo-Johnson transformer (YJT) and standard scaler are applied.
The YJT is a technique that aims to transform the data distribution to near normal (Yeo & Johnson, 2.
On the other hand, the standard scaler or z-score standardization technique is a method to ensure model independence from specific feature scales.
The standard scaler transforms data to a mean of 0 and a standard deviation of 1 (Demir & Sahin.
The YJT and standard scaler can be seen in Equations 2 and 3.
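\[ x_{ij} = \begin{cases} 1, & \text{if the value of parameter } i \text{ belongs to class } j \\ 0, & \text{otherwise} \end{cases} \tag{1} \]
\[ \psi(y, \lambda) = \begin{cases} \dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ y \geq 0 \\ \log(y + 1), & \lambda = 0,\ y \geq 0 \\ -\dfrac{(-y + 1)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ y < 0 \\ -\log(-y + 1), & \lambda = 2,\ y < 0 \end{cases} \tag{2} \]
\[ z = \frac{x - \mu}{\sigma} \tag{3} \]
Where x_{ij} is the OHE vector element for parameter i and class j; \psi(y, \lambda) is the result of the YJT transform; y is the original value of the parameter; \lambda is a constant estimated by maximum likelihood estimation (MLE); z is the standardized value; x is the original value before standardization; \mu is the mean value; and \sigma is the standard deviation (Demir & Sahin, 2024; Samuels, 2024; Yeo & Johnson, 2.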
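The three pre-processing steps referred to as Equations 1 to 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the sample values, and the fixed lambda of 0 are made up for the example, whereas the study estimates lambda by MLE, which is not reproduced here.

```python
import math

def one_hot(value, classes):
    """Binary vector with a 1 at the position of `value` (one-hot encoding)."""
    return [1 if value == c else 0 for c in classes]

def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a single value y for a given lambda."""
    if y >= 0:
        if lam != 0:
            return ((y + 1) ** lam - 1) / lam
        return math.log(y + 1)
    if lam != 2:
        return -(((-y + 1) ** (2 - lam)) - 1) / (2 - lam)
    return -math.log(-y + 1)

def standardize(values):
    """Z-score standardization using the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Hypothetical inputs: one land cover class and four slope values.
land_cover = one_hot("forest", ["forest", "paddy field", "residential"])
slopes = [0.0, 2.0, 8.0, 30.0]
transformed = [yeo_johnson(s, 0.0) for s in slopes]  # lambda = 0 is an assumption
scaled = standardize(transformed)
```

After these steps the categorical value becomes a binary vector, and the skewed numeric values have a mean of 0 and a standard deviation of 1.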
Modelling Process
This study evaluates the use of ML and DL for groundwater potential mapping in Trenggalek Regency, Indonesia.
In addition to using each algorithm individually, this study used the stacking technique to integrate ML and DL models.
The hyperparameters of each algorithm were tuned using a tree-structured Parzen estimator (TPE).
The two ML algorithms used are RF and GBDT.
RF is an ML algorithm that uses bootstrapping and bagging to build multiple decision trees (DTs).
Each DT is built from a randomly selected subset of the parameters and a randomly selected sample of the data.
The RF output is the majority vote over the results of the individual DTs (Breiman, 2.
GBDT is a straightforward combination of DT and gradient descent.
GBDT comprises two main processes: boosting and classification.
These two processes can improve model performance and minimize overfitting through the regularization process within sequential steps to iterate on the loss function (Chen et al.
, 2.
The loss function measures the difference between the predicted and actual values (Yang et al.
, 2.
Figure 4. The Architecture of the DL Models: 1D CNN and RNN.
CNN and RNN were used as DL algorithms in this study.
CNN can solve complex problems and consists of three main layers: convolutional layers (CL), pooling layers (PL), and fully connected layers (FCL) (Zhang et al.
, 2.
The CL applies a convolutional filter (also known as a kernel): the input features, expressed as a matrix of dimension N×1×P, are convolved with a filter of size n×1×p, where N>n and P>p, resulting in an output feature map.
PL is a subsampling technique applied to the feature map generated by CL.
After the PL layer, the feature map will be flattened and moved to the FCL layer.
In this layer, each neuron is connected to all neurons from the previous hidden layer, so it is called a fully connected (FC) approach (Fang et al.
, 2.
The overall architecture of the CNN used in this study is illustrated in Figure 4.
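The flow from convolution to pooling to a fully connected neuron can be illustrated with a minimal pure-Python sketch. The feature values, the kernel, and the dense weights below are made up for illustration and are not taken from the architecture in Figure 4.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as in most DL libraries)."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(n - k + 1)]

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def dense(values, weights, bias):
    """One fully connected neuron: weighted sum of all inputs plus a bias."""
    return sum(v * w for v, w in zip(values, weights)) + bias

features = [0.2, -1.0, 0.5, 1.5, -0.3, 0.8, 0.1, -0.6]  # hypothetical standardized inputs
kernel = [1.0, 0.0, -1.0]                                # hypothetical filter of size 3
feature_map = relu(conv1d(features, kernel))             # length 8 - 3 + 1 = 6
pooled = max_pool1d(feature_map, 2)                      # length 3
score = dense(pooled, [0.5, 0.5, 0.5], 0.0)              # hypothetical dense weights
```

The input of length 8 shrinks to a feature map of length 6 because the kernel is shorter than the input, which corresponds to the N>n condition described above.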
RNNs are designed to process data sequentially by considering hidden states that capture information about previous inputs.
RNNs have recurrent connections (Ji et al.
, 2.
, as shown in Figure 4.
Such connections allow information to loop within the network.
The network memorizes the input information of the previous state (t-1) and then applies it to the calculation of the output at state t.
The nodes between hidden layers are interconnected, and the input to the hidden layer includes not only the output of the input layer but also the output of the hidden layer from the previous state (Mienye et al.
, 2.
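The role of the recurrent connection can be shown with a single-unit recurrent network. This is a minimal sketch assuming a tanh activation and hand-picked weights; neither is taken from the model used in this study.

```python
import math

def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a single-unit recurrent network over a sequence.

    h_t = tanh(w_in * x_t + w_rec * h_(t-1) + bias), starting from h = 0.
    Returns the hidden state after every step.
    """
    hidden = 0.0
    states = []
    for x in inputs:
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states

sequence = [1.0, 0.0, 0.0]
with_memory = rnn_forward(sequence, w_in=1.0, w_rec=0.5, bias=0.0)
without_memory = rnn_forward(sequence, w_in=1.0, w_rec=0.0, bias=0.0)
```

With a non-zero recurrent weight, the first input still influences the hidden state at later steps; with the recurrent weight set to zero, the later states fall back to zero.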
This study uses two strategies to optimize the prediction model: hyperparameter optimization for each algorithm and stacking.
Stacking is a method of successfully integrating different ML or DL models (Sikora, 2.
In the stacking model, heterogeneous prediction models are used, where the output of all the algorithms is given as input to the meta-model to get a more stable model (Khoshkroodi et al.
, 2.
Logistic regression (LR) is the meta-learner applied in this study, which utilizes the maximum likelihood method to discover the 'best fit' association between input and output (Sun et al.
, 2.
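The two-tier idea can be sketched as follows: tier 1 produces class probabilities, and a logistic regression meta-learner is fitted on those probabilities. The base-learner outputs below are made-up numbers standing in for the predictions of, for example, RF and GBDT, and the plain gradient-descent fit is an illustrative substitute for a library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias) - y
            grad_w = [g + err * v for g, v in zip(grad_w, row)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_logistic(model, row):
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, row)) + bias)

# Tier 1: hypothetical probabilities from two base learners for six samples.
base_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
base_b = [0.8, 0.6, 0.9, 0.4, 0.1, 0.2]
labels = [1, 1, 1, 0, 0, 0]

# Tier 2: the meta-learner sees only the base-learner outputs.
meta_inputs = [list(pair) for pair in zip(base_a, base_b)]
meta_model = fit_logistic(meta_inputs, labels)
stacked = [predict_logistic(meta_model, row) for row in meta_inputs]
predicted = [1 if p >= 0.5 else 0 for p in stacked]
```

In practice the tier 1 probabilities passed to the meta-learner are usually out-of-fold predictions, so that the meta-learner is not fitted on outputs the base learners produced for their own training samples.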
TPE is the technique used to tune each algorithm's hyperparameters.
TPE is a Bayesian technique used for optimizing hyperparameters that utilizes a kernel density estimator to model the probability distribution of hyperparameters using an objective function (Ishii et al.
, 2.
TPE models the density distribution of the hyperparameters separately for the best-performing and the worst-performing trials.
An iterative approach is used to select hyperparameters for sampling, evaluate model performance, and update the probability density functions.
The conditional probability p(x|y) is used, where x is the hyperparameter and y is the loss from using that hyperparameter.
The first step in TPE optimisation is selecting the threshold loss (y*) given the available data, such as based on the median.
The probability density functions l(x) and g(x) are then formed on either side of the threshold loss, as seen in Equation 4 (Rong et al.
, 2.
Afterwards, other hyperparameter combinations are determined by maximizing the ratio between l(x) and g(x) using Equation 5 (Islam et al.
, 2.
\[ p(x \mid y) = \begin{cases} l(x), & \text{if } y < y^{*} \\ g(x), & \text{if } y \geq y^{*} \end{cases} \tag{4} \]
\[ x^{*} = \arg\max_{x} \frac{l(x)}{g(x)} \tag{5} \]
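The loop behind Equations 4 and 5 can be sketched for a single continuous hyperparameter. This is a simplified illustration, not the implementation used in the study: the Gaussian kernel density estimate, the fixed bandwidth, the median split, and the toy objective are all assumptions made for the example.

```python
import math
import random

def kde(points, bandwidth):
    """Gaussian kernel density estimate built from 1-D sample points."""
    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return density

def split_trials(trials, quantile=0.5):
    """Split (x, loss) trials into good (loss < y*) and bad (loss >= y*) values of x."""
    losses = sorted(loss for _, loss in trials)
    threshold = losses[int(len(losses) * quantile)]
    good = [x for x, loss in trials if loss < threshold]
    bad = [x for x, loss in trials if loss >= threshold]
    if not good:  # all losses tied: fall back to the single best trial
        good = [min(trials, key=lambda t: t[1])[0]]
    return good, bad

def tpe_minimize(objective, low, high, n_trials=40, n_startup=10, n_candidates=24, seed=0):
    rng = random.Random(seed)
    trials = [(x, objective(x)) for x in (rng.uniform(low, high) for _ in range(n_startup))]
    bandwidth = (high - low) / 10.0
    while len(trials) < n_trials:
        good, bad = split_trials(trials)
        l_density, g_density = kde(good, bandwidth), kde(bad, bandwidth)
        # Draw candidates around good points, then keep the one maximizing l(x) / g(x).
        candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                      for _ in range(n_candidates)]
        chosen = max(candidates, key=lambda x: l_density(x) / (g_density(x) + 1e-12))
        trials.append((chosen, objective(chosen)))
    return min(trials, key=lambda t: t[1]), trials

# Hypothetical loss as a function of one hyperparameter, minimal at x = -2.
best, trials = tpe_minimize(lambda x: (x + 2.0) ** 2, -5.0, 0.0)
```

Each new trial costs one evaluation of the objective, while the candidate ranking uses only the two cheap density estimates, which is where the saving over an exhaustive search comes from.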
Model Evaluation and Inspection
Model evaluation is a validation process for the accuracy and reliability of a prediction model.
Evaluation in this study was carried out on two datasets, namely training and testing, to determine whether each model suffered from overfitting.
Model evaluation was carried out using four metrics.
The evaluation metrics can be seen in Table 4.
Here, TP (true positive) is a positive class that is predicted positive, FP (false positive) is a negative class that is predicted positive, FN (false negative) is a positive class that is predicted negative, and TN (true negative) is a negative class that is predicted negative (Stern, 2021; Tharwat, 2.
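The four counts defined above are enough to compute accuracy, MCC, and Cohen's kappa, while ROC-AUC needs the predicted scores. The sketch below uses made-up labels and scores; the AUC is computed from pairwise comparisons of positive and negative scores, which is one common way to obtain it.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cohen_kappa(tp, tn, fp, fn):
    denom = (tp + fp) * (fp + tn) + (tp + fn) * (fn + tn)
    return 2 * (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """Share of positive-negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, scores, and thresholded predictions for eight points.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
counts = confusion_counts(y_true, y_pred)
```

For this example the counts are TP = 3, TN = 3, FP = 1, and FN = 1, giving an accuracy of 0.75, an MCC and kappa of 0.5, and an AUC of 0.9375.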
Table 4. Model Evaluation Metrics.
Metric | Equation | Range value | Best value
Accuracy (ACC) | \frac{TP + TN}{TP + TN + FP + FN} | 0 to 1 | 1
Matthews correlation coefficient (MCC) | \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} | -1 to 1 | 1
Cohen's kappa (CK) | \frac{2\,(TP \times TN - FP \times FN)}{(TP + FP)(FP + TN) + (TP + FN)(FN + TN)} | -1 to 1 | 1
ROC-AUC | \int_{0}^{1} TPR \, d(FPR) | 0 to 1 | 1
Model evaluation was performed using k-fold cross-validation (KFCV).
KFCV divides the dataset into n-fold.
1/n fold is used to evaluate the model, and the other n-1 fold is used to train the model.
This research uses five folds.
This means the training-to-testing ratio is 80:20.
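The 80:20 ratio follows directly from the five-fold split, as the short sketch below shows. The shuffling seed and the index-based interface are choices made for this example; the sample count of 1,480 corresponds to 740 potential plus 740 non-potential points.

```python
import random

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx

# 740 potential + 740 non-potential points.
splits = list(k_fold_indices(1480, n_folds=5))
ratios = [len(train) / (len(train) + len(test)) for train, test in splits]
```

Every sample appears in exactly one test fold, and each split uses 1,184 samples for training and 296 for testing.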
The KFCV technique is used to detect overfitting by using the t-test with n-1 degrees of freedom.
Model inspection improves our understanding of how to interpret the model.
Two additional approaches to model inspection in this study were permutation feature importance (PFI) and partial dependence plots (PDP).
PFI approximates the relative importance of parameters in a predictive model by permuting each parameter's values.
If accuracy changes significantly when a parameter is permuted, that parameter makes an important contribution to the model (Altmann et al.
, 2.
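The permutation idea can be sketched with a toy classifier. The model, the data, and the number of repeats below are hypothetical and chosen so that the effect is easy to see: the model looks only at the first feature, so shuffling the second one cannot change its accuracy.

```python
import random

def accuracy(model, rows, labels):
    return sum(1 for r, y in zip(rows, labels) if model(r) == y) / len(labels)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical model that only uses feature 0; feature 1 is ignored.
def model(row):
    return 1 if row[0] > 0.5 else 0

rows = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
        [0.4, 0.2], [0.3, 0.8], [0.2, 0.4], [0.1, 0.6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
importances = permutation_importance(model, rows, labels)
```

Averaging over several shuffles reduces the noise of a single permutation, which matters when the dataset is small.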
The PDP employs the Shapley additive explanation (SHAP) value, which computes the Shapley value for every sample in the dataset.
Shapley values offer a game-theoretic mechanism for fairly apportioning the prediction output to individual input features.
The concept was introduced by Shapley in game theory.
It considers every possible combination of parameters in a row of data to calculate the marginal contribution of each parameter.
This value is calculated for each parameter value.
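For a handful of features, the marginal contributions over all feature subsets can be enumerated exactly. The sketch below does this for a made-up value function with three features and one interaction; libraries that compute SHAP values for real models rely on approximations or model-specific shortcuts rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values for a set function value(frozenset_of_feature_indices)."""
    features = range(n_features)
    phi = []
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|F| - |S| - 1)! / |F|! for subsets S of this size.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical model output as a function of which features are present:
# additive effects plus a 0.10 interaction between features 0 and 1.
def value(subset):
    effects = {0: 0.30, 1: 0.10, 2: 0.05}
    out = sum(effects[f] for f in subset)
    if 0 in subset and 1 in subset:
        out += 0.10
    return out

phi = shapley_values(value, 3)
```

The interaction term is split equally between the two features involved, and the values add up to the difference between the output with all features and the output with none.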
The SHAP value can be calculated as follows.
In this equation, \phi_i(v) is the SHAP value for feature i with the prediction model v; S is a feature subset; F is the entire feature set; S \subseteq F \setminus \{i\} is a subset of F without feature i; v(S \cup \{i\}) is the prediction output with subset S including feature i; and v(S) is the prediction output with subset S without feature i (Lundberg & Lee, 2.
\[ \phi_i(v) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ v(S \cup \{i\}) - v(S) \right] \]
Results and Discussion
Hyperparameter Optimization Result
Hyperparameter tuning is one of the crucial steps in constructing an optimal prediction model (Jafari et al.
, 2.
TPE reduces computational cost by selecting the most appropriate pair using a Bayesian-based method, rather than evaluating all pairs of hyperparameters.
The optimum hyperparameter configurations for each algorithm are presented in Table 5.
In this study, the number of iterations is limited to 100 trials, with test-set accuracy as the objective value.
Although there is no universal judgment on the number of trials that ensures maximum accuracy or convergence, several studies have used a setting of 100 trials and achieved convergent results (Fan et al.
, 2022.
Kilic et al.
, 2024.
Zhao et al.
, 2.
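The trial loop described above can be sketched with a deliberately simplified, one-dimensional TPE-style search. This is an assumption-laden illustration rather than the optimizer used in the study: the objective `toy_accuracy`, the fixed kernel bandwidth, and all parameter names are hypothetical, and a full TPE implementation additionally handles several hyperparameters, categorical choices, and adaptive bandwidths.

```python
import math
import random

def kde(x, points, bandwidth=0.1):
    """Average of Gaussian kernels centred on the observed points."""
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / len(points)

def tpe_search(objective, low, high, n_trials=100, n_startup=10,
               gamma=0.25, n_candidates=24, seed=0):
    """Maximise `objective` over [low, high] with a simplified 1-D TPE loop."""
    rng = random.Random(seed)
    history = []  # (x, score) for every evaluated trial
    for trial in range(n_trials):
        if trial < n_startup:
            # The first trials use randomly selected configurations.
            x = rng.uniform(low, high)
        else:
            ranked = sorted(history, key=lambda t: t[1], reverse=True)
            n_good = max(1, int(gamma * len(ranked)))
            good = [t[0] for t in ranked[:n_good]]  # best-scoring trials
            bad = [t[0] for t in ranked[n_good:]]   # remaining trials
            # Draw candidates near good trials and keep the one that maximises
            # the density ratio l(x) / g(x) between good and bad trials.
            candidates = [min(high, max(low, rng.gauss(rng.choice(good), 0.1)))
                          for _ in range(n_candidates)]
            x = max(candidates, key=lambda c: kde(c, good) / (kde(c, bad) + 1e-12))
        history.append((x, objective(x)))
    return max(history, key=lambda t: t[1]), history

def toy_accuracy(x):
    """Hypothetical stand-in for validation accuracy as a function of one hyperparameter."""
    return 1.0 - (x - 0.3) ** 2

best, history = tpe_search(toy_accuracy, low=0.0, high=1.0, n_trials=100)
```

The design choice that matters here is that each new trial is proposed from a model of past trials instead of from an exhaustive grid, which is why a fixed budget of 100 trials can be sufficient.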
Table 5.
Optimization Result of the Hyperparameter for Each Algorithm Using TPE.
Algorithm | Hyperparameter | Search space | Distribution | Optimal
RF | Max depth of each DT | | Uniform |
RF | Max random features | ["sqrt", "log2", None] | Choice | "sqrt"
RF | Min samples in each leaf | | Uniform |
RF | Min samples to split | | Uniform |
RF | Number of DT | | Uniform |
GBDT | Number of boosting stages | | Uniform | 1,500
GBDT | Max features for best split consideration | ["sqrt", "log2", None] | Choice | None
GBDT | Max depth of each DT | | Uniform |
GBDT | Min samples to split an internal node | | Uniform |
GBDT | Min samples in each leaf | | Uniform |
GBDT | Learning rate | | Choice |
GBDT | Number of subsamples | | Choice |
RNN | Number of recurrent network layers (RNL) | | Uniform |
RNN | Number of neurons in each RNL | | Uniform |
RNN | Activation function in the output layer | ["sigmoid", "softmax"] | Choice | "softmax"
RNN | Optimizer | ["Adam", "Adamax"] | Choice | "Adam"
CNN | Activation function in the output layer | ["sigmoid", "softmax"] | Choice | "sigmoid"
CNN | Number of FCL | | Uniform |
CNN | Number of CL | | Uniform |
CNN | Number of neurons in each FCL | | Quniform | 1,024
CNN | Optimizer | ["Adam", "Adamax"] | Choice | "Adam"
Figure 5.
The TPE Optimization Objective Value in the Trial Process of Four Basic Learning Models.
The TPE optimization process can be visualized as a line diagram, as shown in Figure 5, which illustrates the relationship between the trial number and the objective value.
According to the figure, the RF, CNN, and GBDT algorithms have an accuracy of more than 0.9 at the initial state (trial 0).
Meanwhile, the RNN algorithm has an accuracy of 0.49 at the initial state.
This happens because, in the initial state, the hyperparameter configuration is randomly selected.
The best trials, or convergence conditions, for the RF, GBDT, CNN, and RNN algorithms are trials 4, 84, 25, and 86, respectively.
The highest objective values for the RF, GBDT, CNN, and RNN algorithms are 0.952, 0.962, 0.962, and 0.945, respectively.
Overall, GBDT and CNN have the highest objective values, but CNN converges earlier than GBDT.
Although it has a very low objective value at the initial state, the RNN algorithm improves drastically in the third trial.
Groundwater Potential Map: Each Model
Groundwater potential maps are generated by applying each prediction model to pre-processed parameter data; the pre-processing includes categorical data encoding, transformation, and standardization of numerical data.
Seven prediction models are generated in this study, namely RF, GBDT, RNN, 1D CNN, RF-GBDT stacking, 1D CNN-RNN stacking, and RF-GBDT-RNN-1D CNN stacking.
The output of the modelling process is the probability of groundwater potential.
The range of probability values is 0 to 1.
The closer the value is to 1, the higher the groundwater potential.
Meanwhile, the closer the value is to 0, the lower the groundwater potential.
The groundwater potential probability is classified into five classes: very low (0 - 0.2), low (0.2 - 0.4), medium (0.4 - 0.6), high (0.6 - 0.8), and very high (0.8 - 1).
Equal intervals were used to keep the groundwater potential maps, which are expressed as continuous probability values, consistent and interpretable; equal intervals allow an objective comparison of groundwater potential classes across prediction models.
Natural breaks and quantile classification techniques depend strongly on the distribution of each prediction model's results, which makes comparative analysis less objective.
A few spatial modelling studies have used the equal interval technique to facilitate comparisons among models (Bi et al.
Hossen et al.
, 2025.
Nurwatik et al.
, 2022.
Prasad et al.
, 2.
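The equal-interval scheme amounts to splitting the probability range 0 to 1 into five classes of equal width. A minimal sketch follows; the function name is hypothetical, and assigning a boundary value to the upper class is an assumption, since the text does not state how boundary values are handled.

```python
CLASS_LABELS = ["very low", "low", "medium", "high", "very high"]

def classify_equal_interval(probability, n_classes=5):
    """Map a probability in [0, 1] to one of n equal-width classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Class index from the interval width; a probability of 1.0 falls in the last class.
    index = min(int(probability * n_classes), n_classes - 1)
    return CLASS_LABELS[index]

# Example probabilities, one from the interior of each class.
examples = [0.05, 0.35, 0.50, 0.75, 0.95]
classes = [classify_equal_interval(p) for p in examples]
```

Because the class boundaries do not depend on the distribution of any single model's output, the same thresholds apply to all seven maps, which is the property that makes the class areas directly comparable.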
The groundwater potential map produced in this study can be seen in Figure 6.
The map is visualized with graduated colors ranging from light blue to dark blue.
Light blue represents very low potential, while dark blue represents very high potential.
Overall, the probability distribution follows almost the same pattern across prediction models, with high probabilities in the west of the study area.
Meanwhile, low probabilities are found in the north and in parts of the southwest.
Figure 7 presents the percentage area of each potential class and the percentage of spring locations within each class.
Each model indicates that the class with the largest area is the low class, with an area of 22.
32% to 31.
The very low class is the smallest area, except for the RF model.
The very low class has an area percentage ranging from 12.
57% to 17.
For the RF model, medium is the smallest-area potential class, with 16.
The percentage of spring locations is expected to increase with the groundwater potential class.
The very low class has the smallest percentage of spring locations, at 4.05%, and the very high class has the highest percentage.
Most models show an inverse trend between the low and medium classes, with more spring locations in the low class than in the medium class.
However, in general, all models showed that the very high potential class contained the largest proportion of spring locations.
Figure 6.
Groundwater Potential Maps Generated from Seven AI-based Algorithms: RF, GBDT, CNN, RNN, RF-GBDT, RNN-CNN, and RF-GBDT-RNN-CNN.
Figure 7.
Area Percentages (Shown by Bar Chart) and Spring-Location Percentages (Shown by Line Chart) by Potential Class.
Model Performance
The established predictive models were evaluated through the KFCV method using the ACC, MCC, CK, and ROC-AUC assessment metrics.
The evaluation results are represented through the average value and standard deviation.
Standard deviation can be used to quantify the robustness of each metric across the five cross-validation iterations.
The results are shown in Table 6.
A low standard deviation for each metric indicates stable performance across all folding schemes.
Overall, all models achieved high prediction accuracy (ACC > 0.
MCC and CK > 0.
The GBDT algorithm demonstrated better consistency and stability than the other single algorithms, as evidenced by its lower performance variance.
GBDT, which iteratively fits new trees to the errors of the current ensemble, can produce more stable prediction models with the best performance.
Although both are built on DT, GBDT outperforms RF here because it reduces bias iteratively while keeping the prediction variance low.
However, the performance of GBDT and RF is not significantly different, with only a 0.01 difference on several metrics, and GBDT is relatively more stable than RF.
In this study, the bagging technique with random feature selection made RF less stable than the boosting technique used in GBDT.
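The threshold-based metrics used in this evaluation can be computed from a binary confusion matrix and then summarized across folds by their mean and standard deviation. The sketch below uses hypothetical confusion matrices, not the study's results, and the function name is illustrative.

```python
import math
from statistics import mean, stdev

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Cohen's kappa, and MCC from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (acc - p_chance) / (1 - p_chance)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, kappa, mcc

# Hypothetical confusion matrices (tp, fp, fn, tn) for five cross-validation folds.
folds = [(70, 4, 4, 70), (72, 2, 5, 69), (69, 5, 3, 71), (71, 3, 4, 70), (70, 4, 2, 72)]
scores = [binary_metrics(*fold) for fold in folds]

acc_values = [s[0] for s in scores]
acc_mean = mean(acc_values)   # average accuracy over the folds
acc_std = stdev(acc_values)   # spread over the folds, used as a stability indicator
```

When the two classes are balanced and the errors are symmetric, as in the first hypothetical fold, CK and MCC coincide, which is consistent with the two metrics taking very similar values in a balanced binary problem.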
Table 6.
Metric Evaluations Using KFCV and P-Value (PV) Results of the T-test Between the Training and Testing Datasets for Each Model.
Model | Training Acc | Training MCC | Training CK | Testing Acc | Testing MCC | Testing CK | PV
RF | 0.950 ± 0. | 0.901 ± 0. | 0.900 ± 0. | 0.938 ± 0. | 0.877 ± 0. | 0.876 ± 0. |
GBDT | 0.950 ± 0. | 0.902 ± 0. | 0.901 ± 0. | 0.938 ± 0. | 0.878 ± 0. | 0.876 ± 0. |
CNN | 0.943 ± 0. | 0.888 ± 0. | 0.886 ± 0. | 0.930 ± 0. | 0.863 ± 0. | 0.859 ± 0. |
RNN | 0.935 ± 0. | 0.877 ± 0. | 0.871 ± 0. | 0.936 ± 0. | 0.878 ± 0. | 0.872 ± 0. |
RF-GBDT | 0.957 ± 0. | 0.915 ± 0. | 0.915 ± 0. | 0.957 ± 0. | 0.915 ± 0. | 0.915 ± 0. |
CNN-RNN | 0.939 ± 0. | 0.915 ± 0. | 0.913 ± 0. | 0.939 ± 0. | 0.883 ± 0. | 0.877 ± 0. |
RF-GBDT-CNN-RNN | 0.955 ± 0. | 0.910 ± 0. | 0.909 ± 0. | 0.953 ± 0. | 0.907 ± 0. | 0.906 ± 0. |
Stacking learning (SL-ML) models that integrate RF and GBDT achieve the best performance and generalization on the test data, which indicates that combining the two algorithms is beneficial.
The SL-ML technique outperformed the others because its base learners compensate for each other's weaknesses.
The SL technique reduced the bias and variance of the prediction values produced by each base learner.
SL-ML had advantages over SL-DL and the combined SL-ML-DL due to the nature of the data used in the groundwater potential prediction model.
DL algorithms tend to outperform ML when there is a highly complex relationship between dependent and independent variables.
This study suggests that the relationship between dependent and independent variables in mapping groundwater potential in the study area is not too complex.
DT-based algorithms such as RF and GBDT are better at capturing the relationship between dependent and independent variables in this case study.
In the DL group, RNN outperformed CNN.
Although RNNs are designed to solve sequential problems such as time-series prediction, research confirms that RNNs also excel at non-sequential problems.
RNN builds internal hierarchical relationships among independent variables, enabling it to better capture relationships with the dependent variable in the dataset used in this study.
This technique outperforms CNN-based convolutional feature extraction in this study.
Model performance can be visualized using the ROC curve and its AUC, which are shown in Figure 8.
Several models show an AUC value of 1.00 on training data, which decreases to 0.98–0. on testing data.
These models are RF, GBDT, and SL-ML.
An ROC-AUC value of 1.00 on training data is common in ensemble models based on decision trees such as RF, GBDT, and SL-ML.
The same phenomenon has been observed in several studies using DT-based ensemble models (Jin et al.
, 2024.
Ouali et al.
, 2023.
Prasad et al.
, 2.
However, it should be noted that the decreases are not significant, with the resulting gap ranging from 0.
02 to 0.
This phenomenon shows that the prediction models learn the relationship between dependent and independent variables well enough to generalize to new data.
The lowest AUC in the training dataset is for RNN, at 0.
The highest AUC value in the testing dataset is the stack ML model .
The other models have an AUC value of 0.
In general, for every model, including DL, DL-ML, GBDT, RNN, CNN, and Stack ML, the AUC values on the training and testing data are all greater than 0.95, which indicates that all the models have excellent performance.
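The AUC can be read as the probability that a randomly chosen positive sample (a spring location) receives a higher predicted probability than a randomly chosen negative sample. The following sketch computes it directly from that definition; the labels and scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    # Count positive/negative pairs that are ranked correctly.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = spring, 0 = non-spring) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 1.00 therefore means that every positive sample is ranked above every negative one, which tree ensembles can reach on the data they were trained on.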
A t-test is performed to assess whether there is a statistically significant difference between the evaluations on the training and testing data.
This was done to determine whether the model was overfitting.
The null hypothesis (H0) was that the two datasets were not statistically different at a 95% confidence level.
The t-test result is reported as the p-value (PV), which is the probability of observing a difference at least as large as the measured one if H0 is true.
H0 is not rejected if the PV is higher than 0.05.
The test results for each model are presented in Table 6 in the PV component.
Based on the PV values, it can be inferred that none of the models overfit because every PV is greater than 0.05.
In general, all the prediction models performed well and did not exhibit overfitting.
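The overfitting check can be sketched as a two-sample t-test on per-fold scores. The text does not state which t-test variant was used, so the pooled-variance Student's test below is an assumption, and the fold accuracies are hypothetical. The standard library has no t-distribution, so the sketch compares the statistic with the tabulated two-sided 5% critical value for 8 degrees of freedom instead of computing a p-value; a routine such as `scipy.stats.ttest_ind` returns the p-value directly.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical per-fold accuracies, not the values reported in Table 6.
train_acc = [0.95, 0.96, 0.94, 0.95, 0.95]
test_acc = [0.94, 0.96, 0.93, 0.95, 0.94]

T_CRITICAL = 2.306  # two-sided 5% critical value of Student's t for df = 5 + 5 - 2 = 8
t_stat = pooled_t_statistic(train_acc, test_acc)
reject_h0 = abs(t_stat) > T_CRITICAL  # False: no significant train/test difference
```

With these illustrative numbers the statistic stays below the critical value, so the hypothesis of equal training and testing performance is not rejected, which is the outcome interpreted as the absence of overfitting.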
Figure 8.
ROC-AUC Performance Evaluation of the Seven AI-based Models on the Training Dataset and the Testing Dataset.
Model Inspections
PFI analysis identifies the most influential parameters in each prediction model.
PFI analysis was conducted on all four basic learning models, and the results can be seen in Figure 9.
The PFI values are presented in a bar chart, with the x-axis ranging from 0 to 1 and the y-axis showing the variable names, sorted in descending order of PFI value.
According to these results,
GU is the most influential parameter in all models with PFI values ranging from 0.
242 to 0.
Other parameters that significantly influence the four prediction models are CBA, RD, TWI, and slope.
There are nine parameters with a PFI value of 0 in the RNN model: NDVI, ST, LC, AR, LST, slope aspect, VS30, LD, and PC.
Meanwhile, the other three models each have one parameter with a PFI value of 0.
The parameters with a PFI of 0 in the CNN, RF, and GBDT models are ET, SP, and AR, respectively.
Visually, other parameters also appear to have a PFI of 0 in the RF and RNN models.
However, these parameters have values greater than 0 but less than 0.
A value of 0 in PFI may be due to the limitations of PFI as a model-agnostic explanation method.
PFI only calculates the decrease in accuracy when the value of a variable in the test data is randomized to assess its importance.
A value of 0 in PFI does not mean that the variable has no theoretical effect on the prediction model.
Rather, this condition occurs when randomizing a particular variable does not decrease the accuracy of the initial model.
PFI is highly sensitive to variables that are correlated with each other (Kaneko, 2.
If two variables are highly correlated, randomizing one variable will not significantly affect the initial accuracy, because the other variable already captures the values required by the model.
For instance.
LST and NDVI have a correlation value of 0.
4 according to the analysis results.
This yields LST or NDVI PFI values of 0 or near 0.
A feature selection process is needed to ensure that the variables are not redundant with one another when creating a prediction model.
To identify the response characteristics of each parameter to the prediction model, this study uses PDP.
PDP describes the relationship between a parameter's value and its SHAP value.
PDP analysis is only performed on the best basic learning model, namely GBDT.
The analysis results can be seen in Figure 10.
Continuous parameters are represented by a line graph (dark blue line) with an uncertainty polygon (light blue color).
The uncertainty polygon represents a confidence interval of 95%.
Meanwhile, discrete parameters are represented by the average SHAP value for each class, shown in a bar chart.
Figure 9.
Feature Importance for Each Basic Learning Model.
The non-linear characteristics of each variable in relation to groundwater potential can be interpreted through the PDP.
Topographical features such as elevation and slope have a negative relationship with groundwater potential.
This indicates that the higher the surface elevation, the lower the groundwater accumulation.
The higher the surface, the faster the runoff process in the area, resulting in less infiltration.
It is supported by the condition that steeper slopes can increase lateral flow and reduce the infiltration process.
The south-east to north slope direction shows a positive SHAP value due to the shorter solar exposure compared to the north to east direction.
The duration of solar exposure can affect the level of evapotranspiration, resulting in a decrease in groundwater quantity in the area.
Concave terrain has a positive SHAP value due to its morphological ability to accumulate water properly.
Figure 10.
Partial Dependence Plots for GBDT Model Variables (continuous = line with confidence interval, categorical = mean SHAP by class).
Hydrological conditions such as SPI, TWI, and river density likely have a negative relationship with groundwater potential.
SPI and TWI are hydrological parameters derived from topographical data.
Drainage connectivity is represented by river density.
The higher the drainage connectivity and TWI, the greater the potential for surface flow accumulation, thereby increasing the probability of lateral infiltration phenomena.
SPI represents flow strength; theoretically, the higher the flow strength, the lower the lateral infiltration.
However, the PDP results suggest a different pattern: the higher the SPI value, the higher the groundwater potential.
This could occur when supported by permeable geological and soil conditions.
Climatological factors play a significant role in the capability of groundwater recharge and storage.
AR with a range of 2,200–2,400 mm/year has a high SHAP value, whereas above this range, the SHAP value decreases significantly.
This occurs because high rainfall typically occurs in areas with high topography and is often associated with steep slopes.
This condition results in low infiltration rates despite high recharge potential due to high runoff processes.
ET shows a negative correlation with SHAP values, meaning that higher ET is associated with greater loss of groundwater, surface water, and vegetation water.
LST correlates negatively with SHAP values overall, but the SHAP values are highest where LST is also high.
This may occur in high-temperature areas with high groundwater runoff (basin areas) and permeable geological conditions.
SP negatively correlates with SHAP values, because higher atmospheric pressure can reduce porosity by narrowing the inter-rock gaps beneath.
Land cover and vegetation conditions influence the characteristics of groundwater dynamics.
The NDVI parameter is positively associated with groundwater potential.
This supports the statement that areas with high vegetation can increase the infiltration process.
The type of land cover with the highest SHAP value is residential.
Residential areas may reduce infiltration capacity.
However, this can be supported by geological conditions with high permeability and recharge potential.
Subsurface conditions greatly influence groundwater transport systems.
The GU parameter has the widest range of SHAP values compared to other parameters.
Negative SHAP values in GU are obtained by the GU classes with low porosity, namely 0% - 8%.
Meanwhile, positive SHAP values are obtained by the GU classes with a high porosity of 25% to 55%.
The soil type parameter in all classes has a negative SHAP value.
The lowest negative value is Latosol, which has a low permeability value of less than 0.
13 cm/hour.
LD has a negative association with the SHAP value because sloping areas (where lineaments are rare) reduce the runoff rate, resulting in a positive effect on groundwater recharge.
Both CBA and VS30 geophysical properties are positively associated with groundwater potential.
This condition indicates that the higher the rock density, the higher the groundwater potential in the study area.
This may occur because volcanic influences dominate the study area.
The Mandalika geological formation dominates the study area.
This formation has rock types with medium to high density but is prone to fracturing.
The presence of these fractures allows surface water to be transported into groundwater.
Discussions
This study implemented AI-based algorithms for groundwater potential mapping.
The utilization of AI-based algorithms can improve the accuracy and objectivity of groundwater potential mapping.
RF and GBDT algorithms represent ML in this study, while RNN and CNN represent DL.
The ML, DL, and ML-DL algorithms are integrated through the SL approach, with LR as the meta-learner, to optimize the prediction model.
Using 18 parameters, all prediction models performed well with ACC>0.
CK>0.
and MCC>0.
An interesting finding on the performance of the prediction models was that applying SL to integrate the algorithms improved performance by 2.
04% to 4.
45% in ML stacking, 0.
44% to 2.
32% in DL stacking, and 18% to 5.
47% in ML-DL stacking compared to the performance of every single algorithm.
The application of stacking techniques has been demonstrated to decrease the bias and variance of prediction outcomes significantly (Lu et al.
, 2.
Another important finding is that the CBA parameter, which is rarely used in modelling groundwater potential, shows a significant influence on all models in this study.
Values of CBA variation represent lateral density characteristics of lithological units (Zaenudin et al.
, 2.
The most influential variable in each model is GU.
Rock characteristics such as porosity affect the absorption rate of surface water into groundwater.
The GU variable, as the variable with the highest relative contribution, aligns with research by Bai et al.
, which had a relative contribution value of 0.
In research by Nugroho et al., GU is in the second-highest position with a relative contribution rate of 0.
However, other studies produced the opposite condition, where the GU parameter had less effect on the prediction model, for example, in a study by Aslam et al.
, where GU was in the last position in the relative contribution level of the prediction model, with a value of 0.
The less influential parameters in this study are mainly climatological factors.
For instance, AR has the lowest PFI value in the GBDT model, SP has the lowest PFI in the RF model, and ET has the lowest PFI in the CNN model.
This condition aligns with research by Hasan et al., where climatological factors such as rainfall had the lowest relative contribution.
The rainfall variable in the study by Prasad et al. also lacked significant influence in predicting groundwater potential, with a contribution value of 10%.
However, studies by Aslam et al.
and Bai et al.
had different results, where rainfall had a significant influence on the prediction model.
This indicates that each case study has its own characteristics in indicating the potential presence of groundwater.
Compared to previous similar studies using a hybrid algorithm integration approach and AI for groundwater potential mapping (Arabameri et al.
, 2020.
Chen et al.
, 2020.
Chen et al.
, 2022.
Parasar et al.
, 2025.
Pham et al.
, 2.
, this study has advantages in performance optimization through hyperparameter tuning and stacking techniques.
Both techniques can optimize the performance of the prediction model, resulting in good and stable evaluation values during cross-validation.
In addition, this study also uses modelling parameters that are rarely used in similar studies, namely CBA and VS30.
Both parameters have been proven to contribute significantly to modelling using AI-based algorithms through PFI values.
Modelling parameters used in this study not only utilize surface phenomena but also subsurface phenomena through parameters on geological and geophysical factors.
Hence, the model is more reliable and applicable for decision-making purposes related to freshwater resource management.
The results of this study can help identify areas with limited groundwater potential in Trenggalek Regency.
All models indicate that the study area is dominated by low potential.
Some parts of the area, however, have high potential.
This information can be leveraged for decision-making in providing equitable access to freshwater resources in Trenggalek Regency during such emergency conditions as drought.
The primary contribution of this study is the exploration of ML, DL, and SL integration algorithms for groundwater potential mapping.
Moreover, this study addresses the black-box problem of the prediction model by applying PFI and PDP, which makes the model more interpretable.
A limitation of this study is that feature selection was not conducted; feature selection may improve the performance of the prediction model and reduce computational cost.
Further exploration of AI-based algorithms and other parameters is recommended, as the best prediction model in this study may not perform equally well when applied to other study areas due to differences in geological and geomorphological characteristics.
Conclusion
This study implemented and evaluated AI algorithm-based prediction models.
This study utilizes two ML algorithms, namely RF and GBDT, two DL algorithms, namely RNN and CNN, an SL ML algorithm, an SL DL algorithm, and an SL ML-DL algorithm.
Based on ACC, MCC, CK, and ROC-AUC, the SL ML model has the best performance.
The mean evaluation values of the SL-ML model in ACC, MCC, CK, and AUC are 0.957, 0.915, 0.915, and 0.99, respectively.
Nevertheless, all prediction models performed very well with ACC>0.
MCC>0.
CK>0.
85, and AUC>0.
According to the PFI values, GU was the parameter with the highest contribution to all prediction models.
GU PFI values for the RF.
GBDT.
CNN.
RNN, and GBDT models are 0.
300, 0.
242, and 0.
294, respectively.
All prediction models indicate that Trenggalek Regency was dominated by the low-potential class, ranging from 22.
32% to 31.
This information can serve as a basis for the government's sustainable management of clean water resources.
In addition, this information will help reduce the cost of locating well drilling sites to meet the supply demands of various sectors.
Future research should consider validating groundwater prediction models derived from AI-based algorithms using ground-based data, such as geoelectric surveys.
Geoelectric surveys can be conducted across multiple study areas, including low- and high-potential zones.
By validating these models, the resulting predictions become more reliable for subsequent policy application.
Additionally, feature selection can reduce redundancy in modelling parameters and provide an efficient process, leading to improved predictive model performance.
The use of higher-resolution data is highly recommended to produce models suitable for more detailed regional planning.
Exploration of variables affecting groundwater potential and various other algorithms is necessary.
Testing different algorithms is essential to obtain optimal groundwater potential prediction models for other case studies with different geological and topographical conditions.
Acknowledgements
We would like to express our gratitude to the Trenggalek Regency Government, the Indonesian Geospatial Information Agency, the Indonesian Volcanology and Geological Disaster Mitigation Center, the United States Geological Survey (USGS), the National Aeronautics and Space Administration (NASA), Murray Lab, and the European Centre for Medium-Range Weather Forecasts (ECMWF) for providing access to data to support this study.
Author Contributions
Conceptualization: Ummah, Diyono; methodology: Ummah; investigation: Ummah; writing (original draft preparation): Ummah; writing (review and editing): Diyono, Ummah.
All authors have read and agreed to the published version of the manuscript.
Conflict of interest
All authors declare that they have no conflicts of interest.
Data availability
Data is available upon request.
Funding
The research was funded by the Indonesia Endowment Fund for Education Agency (well known as LPDP or Lembaga Pengelola Dana Pendidikan) under the Ministry of Finance of Indonesia.
References