Journal of the Civil Engineering Forum, January 2026, 12(1):23-39. DOI: 10.22146/jcef. Available online at https://jurnal.ugm.ac.id/v3/jcef/issue/archive

Enhancing Soil Liquefaction Prediction: Overcoming Data Challenges in SPT-Based Machine Learning with Imputation Technique

Fandi Fadliansyah1, Fikri Faris1,4*, Wahyu Wilopo2,4, Ardiansyah3

1 Department of Civil and Environmental Engineering, Universitas Gadjah Mada, Yogyakarta, INDONESIA
2 Department of Geological Engineering, Universitas Gadjah Mada, Yogyakarta, INDONESIA
3 Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Lampung, INDONESIA
4 Center for Disaster Mitigation and Technological Innovation (GAMA-InaTEK), Universitas Gadjah Mada, Yogyakarta, INDONESIA
*Corresponding author: fikri.faris@ugm.ac.id

SUBMITTED 05 May 2025 REVISED 20 July 2025 ACCEPTED 23 July 2025

ABSTRACT In addition to the adverse effects of earthquakes, the loss of soil-bearing capacity during liquefaction can exacerbate damage to buildings. Liquefaction phenomena involve many parameters, which makes them complex to evaluate. Machine learning has been studied to deal with this complexity in recent decades. However, incomplete liquefaction data can result in missing information, complicating model development across various datasets. Therefore, this study aims to assess the capability of machine learning models to predict liquefaction by implementing a missing value imputation technique. Seismicity, soil properties, and soil condition parameters were utilized to develop the models. Random Forest (RF), k-Nearest Neighbor (k-NN), and eXtreme Gradient Boosting (XGBoost) were trained by applying feature selection and parameter optimization based on standard penetration test (SPT) data. The confusion matrix was used to assess model performance based on the metrics Overall Accuracy (OA), Precision (Prec), Recall (Rec), F1-Score (F1), and Area Under the Curve (AUC). In addition, the preprocessing stage included data normalization and outlier treatment to enhance the reliability of model predictions, ensuring consistent learning behavior across different variable scales. The results show that RF achieved the highest performance (OA = 90.71%), which is comparable to findings from previous studies. The AUC results indicate that the models deliver excellent classification performance. These findings suggest that the integration of imputation and preprocessing techniques can significantly improve data-driven approaches in geotechnical earthquake engineering. In conclusion, missing value imputation is quite effective in the predictive model. Finally, this study offers a new perspective on developing machine learning models using more user-friendly software and applying imputation techniques to handle missing data.

KEYWORDS Machine learning; Missing value imputation; Soil liquefaction; Earthquake; Standard penetration test.

© The Author(s). This article is distributed under a Creative Commons Attribution-ShareAlike 4.0 International license.

1 INTRODUCTION

1.1 Seismic-Induced Liquefaction and Advances in Identification Methods

The movement of tectonic plates can lead to vibrations known as earthquakes. The intensity of an earthquake can lead to devastating effects. In addition, earthquakes can trigger other natural disasters, such as soil liquefaction. Soil liquefaction can worsen the damage to building infrastructure.
This phenomenon has caused extensive damage, as happened in the 1964 Niigata earthquake in Japan, the 1964 Alaska earthquake, the 1999 Chi-Chi and Kocaeli earthquakes, and the 2018 Palu earthquake. Soil liquefaction can cause a building's foundation to crack and collapse, resulting in fatalities. As a result of liquefaction, formerly solid soil transforms into a liquid, causing structures above it to sink and damaging underground infrastructure such as pipes and cable networks. Therefore, identifying the vulnerability of soil to liquefaction is critical. A comprehensive knowledge of liquefaction empowers engineers, planners, and policymakers to effectively identify areas that may undergo liquefaction.

Liquefaction assessment is particularly challenging because many variables that affect liquefaction do not always correlate directly. In addition, identifying liquefaction potential becomes more complex because there is no single conclusive criterion. Experts assess liquefaction susceptibility by analyzing several factors, such as seismicity and soil conditions. This approach gained acceptance until the simplified procedure, an empirical method, emerged. The simplified procedure, introduced by Seed and Idriss, is frequently used to evaluate liquefaction potential. It involves determining the safety factor by comparing the cyclic resistance ratio (CRR) and the cyclic stress ratio (CSR); the factor of safety is FS = CRR/CSR, and a soil layer with FS less than or equal to 1 is considered likely to liquefy. This method has been continuously updated (Youd et al., 2001; Idriss and Boulanger, 2008; Boulanger and Idriss) through empirical evaluation, laboratory test results, and the availability of field test data. Soil investigation data obtained using standard penetration test (SPT), cone penetration test (CPT), and shear wave velocity (VS) techniques are examples of field test data commonly used for empirical or semi-empirical analysis. The SPT method is the most widely used of the three in estimating liquefaction.

1.2 Overview of Machine Learning Models in Liquefaction Assessment

Geological processes, composition, depth, and other factors influence soil characteristics. The simplified procedure occasionally cannot accommodate the variability of complex soil properties. Furthermore, the parameters used to assess liquefaction tend to have abstract or non-linear relationships with each other (Zhang and Wang, 2021; Demir and Sahin, 2022). Inappropriate handling of soil property variability may reduce the accuracy of liquefaction assessment. Thus, an alternative method is needed to deal with such complexity. Machine learning algorithms can handle complex problems by determining hidden patterns in data that humans may not notice (Xie et al.). This capability allows for a better understanding of the factors inducing liquefaction, including the interactions between various geotechnical variables. Machine learning is increasingly utilized across various fields to address complex issues and identify hidden information in large datasets. It has proven reliable in solving geotechnical problems (Puri et al., 2018; Tang and Na, 2021; Galupino and Dungca, 2022; Torres and Dungca).

In recent decades, various researchers have used machine learning to estimate liquefaction vulnerability. A previous study conducted by Gandomi et al. used 620 data points from past liquefaction occurrences to compare the performance of the decision tree (DT) approach and the logistic regression (LR) algorithm.
They concluded that DT could successfully predict liquefaction and outperform the LR algorithm. Using field observations from the 1976 Tangshan earthquake and the 1997 Sanshui earthquake, Xue et al. employed a hybrid probabilistic neural network (PNN) and particle swarm optimization (PSO) method to predict liquefaction. The result indicates that PSO-PNN is an effective instrument for analyzing the complex relationship between soil and seismic properties in liquefaction. Some probabilistic approaches were introduced by Zhao et al. (2021, 2022, 2024) to evaluate liquefaction potential. The models were developed using the Python programming language. The probabilistic method was found to be reliable in predicting liquefaction using CPT-based datasets, as well as combined VS- and CPT-based datasets. Kumar et al. developed machine learning models using soft computing techniques based on SPT data. The result implies that the XGBoost model is the most efficient of the four models. Three ensemble algorithms, AdaBoost, Gradient Boosting Machine, and XGBoost, were applied to 620 SPT-based data from the 1999 Chi-Chi and Kocaeli earthquakes by Demir and Sahin to estimate liquefaction susceptibility. The study was implemented in R with the "caret" library package. The findings imply that all the generated models perform well in predicting soil liquefaction. Khatti et al. compared conventional and hybrid models in predicting liquefaction using CPT-based data. The results suggest that hybrid models outperform conventional ones, while the k-NN model achieved an accuracy of over 90%.

1.3 Research Gap, Objectives, and Novelty of the Study

Although substantial research has focused on liquefaction prediction using machine learning, a thorough literature review reveals that no comparative study specifically addresses missing value imputation for liquefaction prediction with these techniques. Several previous studies, such as those conducted by Hu et al. and Hu and Wang, eliminated the data containing missing values, which may lead to the loss of important information. It can also impede further development of the models when combining data from different sources. In addition, some earlier studies developed machine learning models by using coding techniques (Demir and Sahin, 2022a; Zhao et al., 2024; Khatti et al.), which may be difficult for non-expert users or practitioners with limited computer programming expertise. Therefore, it is necessary to develop machine learning models using another relatively large set of worldwide liquefaction case data and to utilize a missing value imputation technique to deal with incomplete data through a method that is user-friendly for non-experts.

Considering the limitations of previous studies, this study aims to assess the reliability of machine learning in predicting soil liquefaction by utilizing a missing value imputation technique to address incomplete data. The research findings are expected to provide a new perspective for future studies in developing machine learning models to evaluate liquefaction phenomena by employing the missing value imputation technique, allowing larger datasets and a more diverse set of features or parameters, even with varying data completeness. By using a more user-friendly method, this study can hopefully provide new insight into developing machine learning models for practitioners or non-experts with no or limited experience in computer programming.
2 LIQUEFACTION HISTORICAL DATA

The liquefaction prediction model was developed using historical earthquake and liquefaction data. This study used 1116 SPT-based data compiled from various earthquakes around the world. The data utilized in this paper exceed those of other previous studies, such as those conducted by Gandomi et al., Xue et al., and Demir and Sahin. The data used mainly originate from three previous studies, grouped as Dataset A, Dataset B, and Dataset C. Dataset A consists of 288 liquefaction data from the Chi-Chi earthquake, as compiled by Hwang and Yang. The record contains 164 liquefaction and 124 non-liquefaction data from soil investigations conducted before and after the earthquake. Dataset B includes 620 sets of data, consisting of 256 liquefaction and 364 non-liquefaction data, compiled by Hanna et al. The dataset is a compilation of soil investigations studied at many sites in Taiwan and Turkey following the 1999 Chi-Chi and Kocaeli earthquakes, which caused liquefaction in various areas. Dataset C contains historical earthquake and liquefaction case data compiled by Cetin et al. It includes soil investigation data from major Japanese earthquakes, such as the Niigata, Tokachi-Oki, Miyagiken-Oki, Nihonkai-Chubu, and Hyogoken-Nambu earthquakes. It also covers seismic events in the Americas, such as the Argentina, Imperial Valley, and Loma Prieta earthquakes, and others, with moment magnitudes (Mw) ranging from 5.9 to over 8. Approximately 208 data points (113 liquefaction and 95 non-liquefaction) were used out of the entire set of 210, since two were identified as marginal liquefaction cases. Data selection considered reliable sources, complete parameters, and liquefaction potential. Proper data are essential for developing robust machine learning models that achieve high levels of accuracy.

Figure 1. The distribution of missing data for each parameter.

Ten parameters were utilized in this study: earthquake magnitude (Mw), peak ground acceleration (amax), SPT blow number ((N1)60), fine content (FC), median grain size (D50), depth of soil layer (z), groundwater table depth (dw), total vertical stress (σv), effective vertical stress (σ'v), and cyclic stress ratio (CSR). Among these, D50, σv, and σ'v contain missing values (Figure 1). Descriptive statistical and correlation analyses were conducted to determine the structure, distribution, and relationships between the variables, following the approach of several previous studies, such as Kumar, Samui and Burman, and Kumar, Samui, Burman, Wipulanusat and Keawsawasvong. The statistical description of the dataset is given in Table 1.

Table 1. Statistical descriptions of the dataset (mean, median, mode, standard deviation, minimum, and maximum of Mw, amax, (N1)60, FC (%), D50, z, dw, σv, σ'v, and CSR).
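As an illustration only (not part of the authors' RapidMiner workflow), the dataset profile summarized in Table 1 and the missing-value counts shown in Figure 1 could be reproduced with pandas roughly as sketched below; the file name and column labels are hypothetical.

```python
# A minimal profiling sketch; "spt_liquefaction_cases.csv" and the column
# names are assumptions standing in for the compiled SPT database.
import pandas as pd

df = pd.read_csv("spt_liquefaction_cases.csv")

params = ["Mw", "amax", "N1_60", "FC", "D50", "z", "dw",
          "sigma_v", "sigma_v_eff", "CSR"]

# Descriptive statistics (count, mean, std, min, max, quartiles), as in Table 1.
print(df[params].describe().T)

# Number of missing values per parameter, as visualized in Figure 1.
print(df[params].isna().sum())
```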
Mw and amax are the seismic parameters most commonly used in liquefaction susceptibility assessment; liquefaction susceptibility increases with increasing Mw and amax. The peak ground accelerations in the dataset range up to a maximum of 1 g (Table 1). Saturated, loose sandy soils with low fine-grain contents are prone to liquefaction. However, sandy soils with a high fine-grain content may also liquefy under certain circumstances. Consequently, liquefaction potential evaluation depends extensively on (N1)60 and FC. The descriptive statistical analysis of FC and D50 indicated that the historical liquefaction data used to develop the machine learning model represent both liquefaction in clean sand and liquefaction in soil layers with high fine content. The dataset contains liquefaction in soil layers with SPT values less than 29. A sandy soil layer will undergo liquefaction when located below the groundwater table because saturation increases the potential for excess pore water pressure. A study conducted by Zakariya et al. shows a correlation between liquefaction potential and excess pore water pressure. In addition, the results also show that liquefaction commonly occurred in shallow soil layers. Therefore, the soil layer depth and groundwater table depth are important in evaluating liquefaction potential.

Table 2. Correlation matrix of the parameters (Mw, amax, (N1)60, FC, D50, z, dw, σv, σ'v, and CSR).

Table 2 displays the correlation matrix values between the parameters used. Red represents a positive correlation, and green represents a negative correlation; a higher color intensity indicates a stronger connection between the factors. A weak correlation between parameters indicates that the parameters have no linear relationship. The correlation between parameters is expressed by the correlation coefficient (R-value). The R-value classification is as follows: R > 0.8 indicates a very strong correlation; 0.61–0.80 a strong correlation; 0.41–0.60 a moderate relationship; 0.21–0.40 a weak correlation; and R < 0.20 no correlation between the parameters (Khatti and Grover, 2024a; Samadi et al.). Generally, the parameters used have no or weak correlation, which may indicate that the relationships between the parameters are not linear but may be non-linear. A few parameter pairs, however, have a strong correlation, for example, the total vertical stress and the effective vertical stress with the depth of the soil layer, as well as the peak horizontal acceleration with the cyclic stress ratio. Based on Table 2, total vertical stress and effective vertical stress strongly correlate with the soil layer depth. Theoretically, the effective vertical and total vertical stress values increase with soil layer depth. The peak ground acceleration and the cyclic stress ratio have a strong positive correlation, whereby an increase in the peak ground acceleration value is correlated with an increase in the cyclic stress ratio value.
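The Pearson correlation matrix reported in Table 2 could be computed, for instance, with pandas as in the sketch below; it reuses the hypothetical DataFrame df and column list params from the previous sketch and is not the authors' actual workflow.

```python
# Pearson R values between the ten parameters (Table 2), interpreted with the
# thresholds cited above (> 0.8 very strong, 0.61-0.80 strong, and so on).
corr = df[params].corr(method="pearson")
print(corr.round(2))

# Example: the text reports a strong positive correlation between amax and CSR.
print(corr.loc["amax", "CSR"])
```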
3 METHODS

The study began with collecting historical liquefaction data from various sources. The data were then prepared before being used to develop the model, train it with different techniques, and evaluate its liquefaction prediction performance. The processes of this study are represented in Figure 2. The model-building processes were conducted using RapidMiner Studio 10.001 software, provided with various extra extensions. The computer used to run the models was powered by an 11th Gen Intel Core i7-1165G7 @ 2.80 GHz processor, 16 GB of RAM, and an Nvidia GeForce MX450 graphics card with 2 GB of VRAM, running on Windows 10 Home 64-bit.

Figure 2. The process chart of the methodology used in this study.

3.1 Preprocessing data

Preprocessing data is an early stage in machine learning that significantly impacts model performance. This stage involves cleaning, transforming, and preparing raw data for the effective construction of prediction models by machine learning algorithms. In general, data preprocessing can consist of multiple operations, such as missing value imputation, outlier identification, and normalization.

3.1.1 Missing value imputation

Missing value imputation approaches can be grouped into statistical and machine learning-based techniques (Aittokallio, 2010; García-Laencina et al.). Statistical approaches work by filling in missing values with the available data's mean, median, or mode for the same attribute. Machine learning approaches estimate missing values using algorithms, including decision trees, neural networks, and k-NN. Statistical methods are often simpler and faster than machine learning methods, while machine learning methods tend to provide more accurate results but are more computationally intensive. One of the most popular imputation techniques is k-NN machine learning-based imputation, built on the notion that data with comparable features will be close together in the feature space (Lin and Tsai). Several classification methods can perform better using the k-NN imputation approach (Pan et al.). In the context of missing value imputation, k-NN looks for the k nearest neighbors of a data point with a missing value and utilizes the values from these neighbors to replace the missing value. The number of closest data points considered affects the prediction of the missing value, so several trial-and-error experiments were done to obtain the ideal k value. The missing value imputation process used the "Impute Missing Values" operator in RapidMiner.

3.1.2 Outlier detection

Outlier identification is critical in datasets because it eliminates data that may affect model performance. It can be done using fundamental statistical methods, such as the z-score and interquartile range, or classification methods employing algorithms, one of which is a distance-based outlier detection approach (Aggarwal, 2017; Mandhare and Idate). The output of outlier detection can be an outlier score or a binary label. The k-NN concept is used for distance-based outlier detection. This method is suitable for high-dimensional data, offering low computational costs and good efficiency (Mandhare and Idate). The outlier detection and filtering process was done using the "Detect Outlier (Distances)" and "Filter Examples" operators in RapidMiner. The operator identifies n outliers in the given dataset based on the distance to their k nearest neighbors; several trial-and-error experiments were applied to obtain the optimal k value.
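The two preprocessing steps above could be reproduced outside RapidMiner, for example with scikit-learn, as in the minimal sketch below: k-NN missing-value imputation (analogous to the "Impute Missing Values" operator) followed by k-NN distance-based outlier removal (analogous to "Detect Outlier (Distances)" plus "Filter Examples"). The k values, the number of removed outliers, and the label column name are illustrative assumptions, not the settings used in the paper.

```python
# A minimal sketch of k-NN imputation and distance-based outlier filtering.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.neighbors import NearestNeighbors

# 1) k-NN imputation: each missing entry is replaced using the values of the
#    same feature in the k most similar complete cases.
imputer = KNNImputer(n_neighbors=10)            # k chosen by trial and error
X_imputed = imputer.fit_transform(df[params])

# 2) Distance-based outlier score: mean distance to the k nearest neighbours;
#    the n points with the largest scores are discarded.
k, n_outliers = 10, 20                          # illustrative settings
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_imputed)
dist, _ = nn.kneighbors(X_imputed)
score = dist[:, 1:].mean(axis=1)                # skip the zero self-distance
keep = np.argsort(score)[:-n_outliers]          # drop the n_outliers largest
X_clean = X_imputed[keep]
y_clean = df["liquefied"].values[keep]          # label column name is assumed
```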
3.1.3 Data normalization

The data collection process may lead to non-uniform value scales for each attribute. Therefore, data normalization is a necessary step in machine learning data preprocessing. Its objective is to rescale the input variables, resulting in a uniform or standardized distribution of the data. In previous studies, Kumar, Samui, Burman, Biswas and Vanapalli and Kumar, Samui, Burman and Kumar used datasets with different scales and standardized them into a 0–1 range to minimize the dimensional effect of the input parameters. Therefore, in this study, all input and output variables were normalized between 0 and 1 during preprocessing to prevent issues with the machine learning algorithms' learning behavior. The standard data normalization formula used in the current study is shown in Equation (1):

Xnorm = (X − Xmin) / (Xmax − Xmin)    (1)

where Xmin and Xmax stand for the minimum and maximum values of each given feature, respectively, X represents the actual feature value, and Xnorm is the normalized feature value.

3.2 Stratified Random Sampling

The sampling technique is a fundamental component of machine learning. It is carried out at the data processing stage before the data are used to train and evaluate the model. Sampling techniques divide the dataset into training and testing samples at a certain ratio, commonly known as the data split process. The two most common sampling techniques used in machine learning are Simple Random Sampling (SRS) and Stratified Random Sampling (StRS) (Demir and Sahin, 2022). StRS has various advantages over SRS, despite being more complex, because it requires a data division process and accurate information on the proportions of each stratum. This technique can generate representative and proportional sampling results from each stratum, making it appropriate for datasets with class imbalance. Based on the results of a study conducted by Ye et al., StRS can reduce test error and boost AUC for high-dimensional data. Furthermore, because each stratum is well represented (Acharya et al.), this technique can perform well on heterogeneous data and reduce bias. Due to the class imbalance in the data used, and to ensure a uniform distribution of the dataset, the sampling process was done using the StRS technique through the "Split Data" operator in RapidMiner.

3.3 Feature Selection Method

Feature selection is an important stage that can simplify the model-building process. Although feature selection does not always enhance a model's performance, this technique may help the model perform better depending on several factors, including the data and method used (Theng and Bhoyar). Feature selection can reduce the dataset's dimensionality and improve the model's performance by selecting the most relevant features or parameters to use in model development (Shi et al.). Reducing the dimensions of the input data can decrease the complexity of the algorithm's modeling, which may reduce the time required for the model to execute. One of the widely used feature selection approaches is the wrapper method, although wrapper feature selection is prone to overfitting (Dhal and Azad). Forward Selection (FS) and Backward Elimination (BE) are the most common wrapper methods used in feature selection (Sahin and Demir). FS begins with an empty model and iteratively adds variables one at a time, depending on specific criteria. On the other hand, BE begins with a model containing all variables and proceeds to remove, one by one, the variables that are considered unimportant according to certain criteria. The feature selection processes used the "Forward Selection" and "Backward Elimination" operators in RapidMiner software.
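A minimal sketch of these three steps (Equation (1) min-max scaling, a stratified train/test split, and wrapper forward selection with an RF estimator under 5-fold cross-validation) is given below, assuming scikit-learn. It mirrors the intent of the RapidMiner operators named above, but the exact search behavior and settings may differ; X_clean and y_clean come from the preprocessing sketch in Section 3.1.

```python
# Normalization, stratified 75:25 split, and greedy forward feature selection.
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier

# Equation (1): X_norm = (X - X_min) / (X_max - X_min), applied per feature.
X_norm = MinMaxScaler().fit_transform(X_clean)

# Stratified split keeps the liquefaction / non-liquefaction ratio identical
# in the training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X_norm, y_clean, test_size=0.25, stratify=y_clean, random_state=42)

# Forward selection: start from an empty set and greedily add the feature that
# most improves the 5-fold cross-validated accuracy of the RF model.
fs = SequentialFeatureSelector(
    RandomForestClassifier(random_state=42),
    direction="forward", scoring="accuracy", cv=5)
fs.fit(X_train, y_train)
print(fs.get_support())   # boolean mask of the selected parameters
```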
3.4 Hyperparameter Optimization

Hyperparameter optimization is one of the important processes in machine learning. Even though hyperparameter tuning does not always enhance model performance, this procedure should help the model perform better when predicting new data by finding the best hyperparameter combination. Each algorithm has its own hyperparameters that should be adjusted before training to achieve the best performance (Demir and Sahin). It is essential to avoid using the test data when tuning parameters (Probst et al.). The most common method used in this process is grid search. In addition to grid search, there are several other methods, such as random search and Bayesian optimization. Grid search is a systematic search technique that finds the optimal combination of hyperparameters by thoroughly searching over a predetermined range (Ranjan et al.). Grid search is an exhaustive optimization method: it performs hyperparameter tuning by training and evaluating the model for each specified combination of hyperparameters. Hence, this method incurs a high computational cost as compensation. In return, it ensures that all hyperparameter combinations are examined and analyzed, yielding the best and most optimal result. The grid search hyperparameter selection processes used the "Optimize Parameters (Grid)" operator in RapidMiner, enabling parallel execution.

3.5 Machine Learning Classifiers

3.5.1 k-nearest neighbour (k-NN)

The k-NN algorithm is a fundamental approach for handling classification and regression issues in machine learning that belongs to the supervised learning category. The core premise of k-NN is to make predictions based on the data most similar to the data to be predicted. Classification is performed by identifying an instance's label using the most frequent labels among its k nearest neighbors as a reference. The number of nearest neighbors and the distance measure used as references can be adjusted to produce the best results. The k-NN process involves two stages: selecting the nearest neighbors as a reference and determining the class based on these neighbors to identify a value or feature of the data (Cunningham and Delany). This algorithm is noise-resistant since it relies on the majority of the nearest neighbors and is known to handle non-linear data, making it adaptable to various sorts of data. According to Manzali et al., k-NN performance can be improved by assigning a greater weight to significant characteristics, which can be beneficial with a high-dimensional dataset. This study applies the k-NN algorithm through the "k-NN" operator in RapidMiner software. This operator provides four measure types: mixed measures, nominal measures, numerical measures, and Bregman divergences. Several parameters may be tuned to increase model performance, including the number of nearest neighbors (k), the distance metric (Euclidean, Manhattan, etc.), and the weighting method.
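A minimal sketch of such a k-NN classifier, assuming scikit-learn, is shown below with a Manhattan distance metric and distance-weighted voting, as described above. The settings are illustrative rather than the tuned values reported later in Table 4.

```python
# Illustrative k-NN classifier configuration.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=8, metric="manhattan", weights="distance")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # overall accuracy on the test split
```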
3.5.2 Random forest (RF)

The RF algorithm is an ensemble learning approach capable of handling data complexity and variability. Similar to how several kinds of trees make up a forest, RF comprises several decision trees built from various data subsets. RF combines the concept of "bagging" (bootstrap aggregating) with decision trees, forming groups of randomly generated decision trees. It was established by Breiman and has become one of the most used machine learning algorithms due to its excellent classification and regression capabilities. According to various studies, RF has several benefits over similar machine learning methods, including the capacity to avoid overfitting, consistently excellent performance across datasets, and insensitivity to outliers (Cutler et al.; Genuer et al., 2010; Roy and Larocque). Besides, since multiple decision trees are built, RF tends to be less sensitive to high variability and heterogeneous data. According to Gregorutti et al., RF performs well even with high-dimensional data and can handle multi-class output even when using imbalanced data. Some previous studies show that RF generally performs well in predicting liquefaction (Demir and Sahin, 2022a,b). Therefore, RF is utilized in this work and compared with other ensemble and basic algorithms. In this study, the default configuration of the RF algorithm used the "majority vote" rule with the "information_gain" criterion. RF capability can be improved by optimizing several hyperparameters, such as the number of trees, mtry, sample size, node size, and maximum depth (Probst et al., 2019; Demir and Sahin, 2022).

3.5.3 eXtreme gradient boosting (XGBoost)

XGBoost has recently emerged as one of the most powerful and widely used machine learning methods. This algorithm, which uses a decision tree as the base model, was introduced by Chen and Guestrin. It implements the Gradient Boosting algorithm with improvements in speed and performance, and since then XGBoost has been one of the most precise and accurate decision-tree-based algorithms for handling classification and regression issues. The basic concept of XGBoost's learning method is ensemble learning, which combines several weak learners to build a more powerful learner. XGBoost applies a gradient boosting approach, where each subsequent model focuses on reducing the previous model's prediction error. XGBoost's performance has been evaluated in multiple studies utilizing a variety of datasets, including public, medical, and geotechnical datasets, with great results (Can et al.; Paleczek et al.; Zhang et al.). This method can perform effectively with imbalanced data by adjusting the class weights. XGBoost provides many adjustable hyperparameters that users can customize to increase model performance and improve accuracy. Various prior studies have investigated the hyperparameters that have significant effects on XGBoost performance, such as the learning rate, subsample, max_leaves, max_depth, min_child_weight, and number of rounds (Wang and Sherry Ni, 2019; Demir and Sahin).

3.6 Performance Evaluation Criteria

The performance of a model must be assessed to determine its ability to make predictions. Machine learning models can be evaluated using various approaches. This study employs the Confusion Matrix (CM) approach to evaluate model performance. The CM is a table presented in matrix form, depicting the model's performance by comparing its prediction results to the actual values of the test data. It includes four parameters: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) (Figure 3).

Figure 3. Confusion matrix illustration.

The model performance metrics can be calculated from the CM parameters and include Overall Accuracy (OA), Precision (Prec), Recall (Rec), and F1-score (F1). OA indicates the rate at which the model correctly predicts the class (Equation (2)). At the same time, Prec represents the proportion of correct positive predictions among all positive predictions (Equation (3)). Rec determines how often the model correctly identifies the positive class out of all instances that should be considered positive (Equation (4)). F1 is the harmonic mean of Prec and Rec, providing a balanced assessment of the model's performance (Equation (5)).

OA = (TP + TN) / (TP + TN + FP + FN)    (2)
Prec = TP / (TP + FP)    (3)
Rec = TP / (TP + FN)    (4)
F1 = 2 × Prec × Rec / (Prec + Rec)    (5)

Receiver Operating Characteristic (ROC) and Area Under the Curve (AUC) are curve-based tools to measure model performance. Both are commonly used metrics in binary classification. The ROC curve shows how well the model performs at different classification thresholds. The curve is created by plotting the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis. TPR, also known as Recall or sensitivity, can be calculated using Equation (4), while FPR can be calculated using Equation (6):

FPR = FP / (FP + TN)    (6)

The AUC value can be calculated as the total area under the ROC curve (Gorunescu). Its value always lies between 0.0 and 1.0. An AUC value closer to 1.0 indicates better classification performance of the model; conversely, a value closer to 0 indicates worse performance.
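The metrics defined above could be computed, for example, with scikit-learn as in the sketch below. Here "model" stands for any fitted classifier from Section 3.5; the metric values are derived directly from the confusion matrix entries, matching Equations (2) to (6).

```python
# Confusion-matrix-based metrics for a fitted binary classifier "model".
from sklearn.metrics import confusion_matrix, roc_auc_score

y_pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

oa   = (tp + tn) / (tp + tn + fp + fn)    # Overall Accuracy
prec = tp / (tp + fp)                     # Precision
rec  = tp / (tp + fn)                     # Recall (TPR)
f1   = 2 * prec * rec / (prec + rec)      # F1-score
fpr  = fp / (fp + tn)                     # False positive rate

# AUC uses the predicted probability of the "liquefaction" class.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(oa, prec, rec, f1, fpr, auc)
```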
4 RESULTS AND DISCUSSION

4.1 Preprocessing Data

The data preparation stage is the initial and most important stage in developing a machine learning model. This process includes identifying and managing missing data, detecting and eliminating outliers, and normalizing the data before splitting the dataset into training and testing sets. The first step is to detect and manage any missing values in the liquefaction dataset. Missing values were identified for the parameters median grain size (D50), total vertical stress (σv), and effective vertical stress (σ'v). These missing values were addressed using the "Impute Missing Values" operator in the RapidMiner application, which employs an algorithmic approach to estimate missing values. Specifically, the k-NN method was used to estimate and fill in the missing values for these parameters. The number of neighbors was selected through several trial-and-error experiments using values ranging from 5 to 30 in multiples of 5, considering the size of the dataset. Some other missing value imputation techniques were also tested; these included replacing the missing data with the minimum, maximum, and mean values. However, the k-NN-based method still outperformed the others.

Once the missing data were processed, the "Detect Outlier (Distances)" operator was used to identify outliers using the k-NN approach. The identified outliers were then removed from the modeling process. Next, the data were normalized to ensure that large-scale values did not dominate the model development process. Finally, the processed dataset was split into training and testing data. Some previous studies showed that the data ratio affects the performance of the developed model (Pham et al., 2020; Nguyen et al.). A study conducted by Pham et al. showed an increase in model performance as the amount of training data increased from 30% to 80%, while model performance began to decrease when using 80% to 90% of the data for training. Therefore, considering those findings, several trial-and-error experiments were applied, and 75:25 was obtained as the optimum data ratio, meaning that 75% of the total processed data was used to train the model, while the remaining 25% was used to test it.
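The trial-and-error selection of the imputation neighborhood size described above (k = 5, 10, ..., 30, compared against simple statistical imputation) could be scripted roughly as follows. This is an assumed re-implementation using scikit-learn pipelines scored by the 5-fold cross-validated accuracy of a downstream RF classifier, not the RapidMiner process itself; the label column name is an assumption.

```python
# Trial-and-error choice of k for the KNN imputer, with a mean-imputation baseline.
from sklearn.pipeline import make_pipeline
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X_raw, y = df[params].values, df["liquefied"].values   # label column assumed

for k in range(5, 31, 5):
    pipe = make_pipeline(KNNImputer(n_neighbors=k), MinMaxScaler(),
                         RandomForestClassifier(random_state=42))
    score = cross_val_score(pipe, X_raw, y, cv=5, scoring="accuracy").mean()
    print(f"k = {k:2d}  CV accuracy = {score:.3f}")

# Simple statistical baseline (mean imputation) for comparison.
baseline = make_pipeline(SimpleImputer(strategy="mean"), MinMaxScaler(),
                         RandomForestClassifier(random_state=42))
print(cross_val_score(baseline, X_raw, y, cv=5, scoring="accuracy").mean())
```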
4.2 Feature Selection Using the Wrapper Method

The feature selection process aims to choose the most relevant and significant subset of features from the provided dataset. By reducing the amount of data, feature selection attempts to enhance the model's predictive accuracy. This study applies feature selection to a dataset with ten features (Mw, amax, (N1)60, FC, D50, z, dw, σv, σ'v, and CSR). FS and BE were employed to identify the most significant and relevant features to be included in the model's construction. Both methods were implemented using the RF algorithm and k-fold cross-validation. The k-fold cross-validation approach involves splitting the dataset into k subsets. Each subset is used as training data k−1 times and as testing data once. This process is repeated k times, ensuring each subset serves as test data once. The average assessment outcome over the iterations estimates the model's overall performance. This approach helps reduce overfitting and delivers consistent prediction performance. This study utilized a k-value of 5, meaning the dataset was uniformly divided into five subsets, alternately used as testing and training data.

Table 3. Selected parameters based on the result of feature selection (FS: forward selection; BE: backward elimination; RAW: standard algorithm without feature selection).

Table 3 displays the result of the feature selection using BE and FS. The parameters amax, (N1)60, FC, z, σ'v, and CSR were identified as playing the most important role in developing the liquefaction prediction model. In contrast, Mw was regarded as the least significant of the ten parameters, which runs contrary to the findings of the studies carried out by Hu and Hu et al. The FS method selected six parameters (amax, (N1)60, FC, z, σ'v, and CSR), while the BE method only eliminated the parameter Mw from the model. Nonetheless, the feature selection results are generally consistent with previous studies: Hu suggests that amax, FC, dw, and σ'v are relatively significant factors, while Hu et al. imply that amax, FC, D50, dw, and z are the key factors and that σ'v is relatively important. Furthermore, the model was also constructed with all parameters (RAW) to compare the implications of the feature selection approach.

4.3 Hyperparameter Optimization Using Grid Search

This study utilizes the "Optimize Parameters (Grid)" operator to optimize hyperparameters using 5-fold cross-validation. The hyperparameter combinations optimized in the RF algorithm include "number_of_trees", "maximal_depth", and "criterion". The XGBoost algorithm focuses on optimizing "max_depth", "min_child_weight", and "subsample", while the k-NN algorithm optimizes the hyperparameters k, "measure_type", "weighted_vote", and "numerical_measure". The hyperparameter optimization results are summarized in Table 4. Based on the evaluation of these results, the XGBoost_BE model's hyperparameter combination achieved the highest OA score, while the k-NN_BE model had the lowest score.

Table 4. Result of the best hyperparameter combination for each feature selection scheme:
- RF: number_of_trees = 61, maximal_depth = 8, criterion = information_gain; XGBoost: max_depth = 3, min_child_weight = 0.…, subsample = 0.…; k-NN: k = 8, measure_type = NumericalMeasures, weighted_vote = true, numerical_measure = ManhattanDistance
- RF: number_of_trees = 160, maximal_depth = 18, criterion = information_gain; XGBoost: max_depth = 3, min_child_weight = 1.…, subsample = 0.…; k-NN: k = 10, measure_type = NumericalMeasures, weighted_vote = true, numerical_measure = EuclideanDistance
- RF: number_of_trees = 41, maximal_depth = 24, criterion = information_gain; XGBoost: max_depth = 4, min_child_weight = 1.…, subsample = 0.…; k-NN: k = 4, measure_type = NumericalMeasures, weighted_vote = true, numerical_measure = EuclideanDistance
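A 5-fold grid search of this kind could be set up, for instance, with scikit-learn and the xgboost package as sketched below. The parameter names follow the scikit-learn and xgboost APIs (e.g., n_estimators, max_depth) rather than the RapidMiner operator labels quoted in Table 4, and the grids shown are illustrative assumptions.

```python
# Illustrative grid search for the RF and XGBoost hyperparameters.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

rf_grid = {"n_estimators": [20, 60, 100, 160],
           "max_depth": [4, 8, 16, 24],
           "criterion": ["gini", "entropy"]}      # "entropy" ~ information gain
xgb_grid = {"max_depth": [3, 4, 5],
            "min_child_weight": [0.5, 1.0, 2.0],
            "subsample": [0.6, 0.8, 1.0]}

rf_search = GridSearchCV(RandomForestClassifier(random_state=42),
                         rf_grid, cv=5, scoring="accuracy", n_jobs=-1)
rf_search.fit(X_train, y_train)
print(rf_search.best_params_, rf_search.best_score_)

xgb_search = GridSearchCV(XGBClassifier(random_state=42),
                          xgb_grid, cv=5, scoring="accuracy", n_jobs=-1)
xgb_search.fit(X_train, y_train)
print(xgb_search.best_params_, xgb_search.best_score_)
```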
4.4 Liquefaction Prediction and Model Performance Evaluation

The RF, XGBoost, and k-NN algorithms predict liquefaction based on historical earthquake and liquefaction data. The performance of each model is evaluated using three schemes: standard without feature selection (RAW) and with feature selection (FS and BE). According to previous studies (Demir and Sahin, 2022b, 2023; Hu and Wang), 5- and 10-fold cross-validation have frequently been used in liquefaction prediction. Therefore, after experimenting with both k = 5 and k = 10, it was determined that 5-fold cross-validation yielded better performance and lower computational cost. This approach was then utilized to assess the models. Each model's performance was measured by assessing its OA, Prec, Rec, and F1 values. The summary of the performance measurement results is shown in Table 5.

Table 5. Result of the models' performance evaluation (OA, Prec, Rec, F1, and AUC of the RF, XGBoost, and k-NN models under the FS, BE, and RAW schemes; the best results are shown in bold).

The confusion matrix evaluation results show that the models can predict liquefaction using the missing value imputation technique. In the FS scheme, the XGBoost model has the highest OA value compared to the other algorithms, indicating that the model can predict liquefaction with just six parameters. However, in the BE and RAW schemes, RF achieved the highest OA scores (90.71% for RF_RAW). The RF model generally has the best OA score in almost all schemes, with the RF_RAW model having the highest OA (Figure 4). Overall, the k-NN model without feature selection has the lowest OA performance of any model, with a score of about 81%. Regarding OA, feature selection improves the performance of k-NN and, under certain conditions, can improve XGBoost performance. The RF algorithm typically achieves a desirable OA value without feature selection. This result shows that feature selection does not always improve model performance. In certain cases, removing seemingly insignificant features may lead to the loss of important information. Therefore, it is highly recommended that a thorough model performance evaluation be conducted after feature selection is applied. The findings suggest that the missing values could still be handled properly, since the models' performance using missing value imputation remains reasonably comparable to the study conducted by Demir and Sahin, especially for the RF model. The RF model developed by Demir and Sahin achieved 93.24% accuracy using SMOTE and a comparable accuracy using stratified random sampling. The performance differences stay within a reasonably acceptable range due to differences in the data, software, and some methods, such as missing value imputation.

Figure 4. Overall accuracy performance of the models.

Based on the feature selection results, the FS and BE methods excluded the Mw parameter, and the FS method additionally excluded D50. The correlation analysis shows that these two parameters do not significantly correlate with each other, which may appear contrary to the technical perspective.
From a technical standpoint, excluding highly correlated parameters can be advantageous for various reasons, such as avoiding redundant information and multicollinearity in models. However, attention to various aspects and conditions, such as domain knowledge and model performance, is also appropriate, and the exclusion should be justified by evaluating the model's performance. Based on the performance analysis, the RF models tend to achieve the highest performance when using all the parameters, which might suggest that the RF model is less sensitive to correlated parameters. This study's findings imply that the RF algorithm has the best overall accuracy without feature selection, again indicating that feature selection does not always improve model performance; several factors affect its impact, including the data used (Theng and Bhoyar).

Precision is the ratio of true positive predictions to all data predicted to be positive. The higher the precision value, the fewer instances of "non-liquefaction" are classified as "liquefaction" by the model. The RF_BE model attained the highest Prec value of 91.41%, while the k-NN_RAW model had the lowest Prec value of 80.45% (Figure 5). Based on these results, the Prec performance of XGBoost and k-NN increased when using the FS method, while the BE method increased the Prec performance of RF.

Figure 5. Precision performance of the models.

In estimating liquefaction susceptibility, a false negative condition, in which the model identifies data as "non-liquefaction" when the actual condition is "liquefaction", is more dangerous than a false positive condition. Model errors in predicting conditions that should be liquefaction but are identified as non-liquefaction can be fatal in planning and construction. Thus, in the liquefaction context, it is generally preferable to minimize false negatives, making recall a good benchmark for evaluating model performance. Recall is the ratio of the true positive value to the total of the true positive and false negative values; therefore, the lower the false negative value, the higher the Rec value, in other words, the lower the proportion of liquefied conditions that the model fails to detect. In the current study, the RF_RAW model achieved a Rec score of 90.91%, the highest among the models (Figure 6). The RF_RAW model accurately identified up to 120 of the 132 actual liquefaction conditions that were evaluated (Figure 7).

There is an imbalance in the amount of "liquefaction" and "non-liquefaction" class data used in this study, with more "non-liquefaction" than "liquefaction" data. As a result, there may be an imbalance in the model's ability to forecast the data, with the model being stronger at predicting the "non-liquefaction" condition than the "liquefaction" condition. The F1-Score can be used to evaluate how well models perform when given unbalanced data. F1 is a performance metric that combines precision and recall into a single value;
in other words, it is the harmonic average of Prec and Rec. The test results showed that the RF_RAW model, with a value of 90.57%, had the highest F1 score (Figure 8). The RF_RAW model can predict liquefied and non-liquefied conditions in a generally balanced way, with FP and FN percentages that are generally balanced with respect to the actual conditions.

Figure 6. Recall performance of the models.
Figure 7. Confusion matrix of the RF_RAW model.
Figure 8. F1-Score performance of the models.

Figure 9 shows the AUC values of the models. The RF and k-NN models achieved their highest AUC values (0.952 for RF) when using backward elimination; on the other hand, XGBoost obtained its highest AUC value using the forward selection scheme.

Figure 9. AUC score of the models.

5 LIMITATIONS AND FUTURE WORKS

Although the findings indicate that the missing value imputation technique could handle missing data properly, this study still has some limitations that need further investigation. This study was done using only an SPT-based dataset. Therefore, further research using other data types, such as VS- and CPT-based data, is necessary to investigate the model's performance with the missing value imputation technique. The k-NN model seems more sensitive to highly correlated features; hence, exploring other algorithms as well as other hyperparameter techniques is necessary. Future studies should incorporate a larger variety of liquefaction data, and other methods should be used to assess the model's performance in more detail. A more diverse dataset may provide new, important information, but could impact the model's generalizability. Therefore, further investigation is needed by applying other feature selection and hyperparameter optimization techniques in combination with other robust ensemble algorithms to anticipate this possibility. In addition, it is important to explore various missing value imputation techniques to identify the most suitable technique for the liquefaction dataset.

6 CONCLUSIONS

Identifying soil liquefaction susceptibility is critical for managing seismic disaster risks. Various methods continue to be developed to accurately identify liquefaction vulnerability and reduce hazards. One extensively studied method for liquefaction prediction is machine learning. In this study, liquefaction prediction was conducted using a larger amount of historical liquefaction data, with a missing value imputation technique to handle missing data, allowing greater variety in the dataset. To identify the best-performing model, three algorithms, namely k-NN, RF, and XGBoost, were evaluated using different feature selection and parameter optimization techniques. The overall results indicate that all models are still effective in predicting liquefaction, especially the RF model. The RF_RAW model achieved the best performance (OA = 90.71%) and is still reasonably comparable to the previous study. This may suggest that missing value imputation using the nearest-neighbor approach can handle missing data properly. In general, the RF algorithm outperformed the others in nearly every modeling scheme tested. The RF model performed best when all parameters were considered, while feature selection generally improved performance for the XGBoost and k-NN models. This may indicate that RF is less sensitive to correlated data. In the context of liquefaction prediction,
Rec is an important metric because reducing false negatives helps prevent larger losses due to inaccuracies in building design and liquefaction mitigation planning. Additionally, the AUC results demonstrate that the models deliver excellent classification performance. Finally, despite the differences in accuracy reported in previous studies, this study hopefully provides a useful perspective for further research into managing missing data with imputation techniques when assessing liquefaction vulnerability. This approach allows data from many sources to be combined so that other significant information that might otherwise be missed can be accommodated. In addition, this study is expected to provide a new perspective for future studies in developing machine learning models to evaluate liquefaction phenomena by using a more user-friendly method that is accessible to non-expert users with no or limited computer programming experience.

DISCLAIMER

The authors declare no conflict of interest.

ACKNOWLEDGMENTS

The authors express their sincere gratitude to the Ministry of Public Works for funding this study.

REFERENCES