Indonesian Journal of Electrical Engineering and Informatics (IJEEI) Vol. No. September 2025, pp. 703-721 ISSN: 2089-3272, DOI: 10.52549/ijeei

Exploratory Analysis of the Impact of Data Balancing on the Classifier's Performance in Predicting Creditworthiness Reliability

Md. Mahedi Hassan1, Arif Hossen2, Yeasin Arafat3, Md Nurunnabi Sarker4, Md Hossain Jamil5, Ayesha Siddika6
1 Computer Science and Engineering, World University of Bangladesh, Dhaka-1230, Bangladesh
2 Business Analytics, International American University, Los Angeles, USA
3 IT Management, Westcliff University, California, USA
4 Data Analysis, Westcliff University, California, USA
5 Information Technology, Humphreys University, California, USA
6 Software Engineering, Daffodil International University, Dhaka-1216, Bangladesh

Article Info

ABSTRACT

This study examines the application of machine learning algorithms for creditworthiness prediction within the banking sector and addresses the issue of class imbalance through sampling methodologies. The research indicates that the Stacking Ensemble algorithm combined with random oversampling can predict creditworthiness with 93% accuracy. The method consistently achieves excellent precision, recall, and F1-score values, indicating that it produces accurate predictions while maintaining a balanced evaluation. Random oversampling helps models improve their predictive accuracy and reduce class-imbalance bias. The findings underscore the feasibility of this technique for financial institutions, facilitating informed lending decisions and improving credit risk assessment methodologies. This research enhances the field by identifying the most effective machine learning methods for accurate creditworthiness prediction. Using XAI tools such as Shapash provides financial organizations with valuable insights for assessing loan risks and enhancing their lending operations.
Received May 13, 2025; Revised Aug 30, 2025; Accepted Sep 13, 2025

Keywords: Creditworthiness Prediction, Loan Eligibility, Machine Learning, Algorithm Comparison, Imbalanced Data Handling, XAI

Copyright © 2025 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Md. Mahedi Hassan, Department of Computer Science and Engineering, World University of Bangladesh, Dhaka-1230, Bangladesh. Email: mahedi7171@gmail.com

Journal homepage: https://section.com/index.php/IJEEI/

INTRODUCTION

Loan eligibility is crucial in business and banking. Personal and commercial loan applicants must be carefully assessed by lenders. This examination formerly relied on subjective, time-consuming manual techniques. The emergence of machine learning algorithms and data analysis has driven interest in automated creditworthiness assessment. This research compares machine learning techniques for predicting creditworthiness. Loan defaults affect both established banks and the emerging Internet finance industry. Borrowers who default hurt banks and the economy, potentially contributing to an economic crisis. To determine loan eligibility, the loan eligibility prediction task evaluates income, credit history, employment, and other characteristics. Traditional rule-based techniques cannot capture complex relationships among variables. Logistic regression has been used to predict loan defaulters, and an ensemble model combining two or more classifiers has been proposed to improve loan acceptance predictions. In contrast, machine learning methods can use such data to make accurate predictions. By parsing historical loan transaction data, these algorithms can identify patterns and associations on which loan approval decisions can rely. The effectiveness of machine learning and deep learning models for stock market trend forecasting has led to their use in various other financial fields with the aim of automating and improving processes.
The success of these advanced models has enabled automation and innovation in financial operations, which could enhance efficiency and accuracy in manual jobs. This study aims to improve the accuracy of loan eligibility prediction, reduce decision time, and minimize workload. Lenders can use machine-learning algorithms to make better, more calculated decisions on who gets approved for loans, thereby reducing risk. Loan eligibility prediction is one way to give all individuals access to funds on a fair and consistent basis, a prerequisite for financial inclusion and economic development. Performance variations across datasets highlight that a robust model that performs well on both large and small samples is crucial. The goal is to develop a model that works in varied data settings: achieving high accuracy on larger datasets while remaining useful on small samples. Such a model overcomes dataset-size problems and predicts loan eligibility at different data scales, strengthening its robustness and generalisability for real-world settings with diverse and changing datasets. Building an advanced ensemble machine learning model that exceeds existing performance benchmarks is this study's goal. Advanced strategies such as hyperparameter tuning further optimise the model and boost its predictive power. A review with Shapash provides comprehensive and localized explanations of the most relevant and influential variables in the model's decisions, enabling a more accurate interpretation of loan eligibility predictions. To develop a loan qualification system and mitigate the problems identified above, the following objectives are set.
• To study and compare the performance of various machine learning algorithms for loan eligibility prediction, and to investigate the applicability of sampling methods to this problem.
• To construct an ensemble model capable of delivering superior performance across datasets of varying sizes by optimizing the hyperparameters.
• To offer insights and recommendations based on the experimental results that help financial institutions choose the most appropriate algorithm for predicting loan eligibility, with global and local explanations provided by XAI tools.

Numerous studies have been conducted to determine the likelihood of someone repaying a loan, employing various algorithms and methods. In the following paragraphs, we discuss several recent studies on this topic. The logistic regression model by Mohammad Ahmad Sheikh et al. is essential for predictive analytics in loan defaulter prediction. This technique enables loan default risk assessment, making it easier to choose the right customers for loan offers. If a bank has a solid model to predict which client loans to accept and which to reject, it can reduce loan default risk. The maximum accuracy was obtained on the original data set. Understanding the internal dependent and independent variables requires univariate, bivariate, and multivariate analyses. A significant amount of research has been conducted to predict creditworthiness using various algorithms and methods. Ashwini S. Kadam et al. proposed the Naive Bayes model, which outperformed other models in predicting loans. In a thorough investigation, Yong Shi et al. explored the complex domain of customer churn prediction in commercial banks. At the same time, Kwofie et al. examined the effectiveness of logistic regression in estimating the likelihood of default using data from a microfinance organization. Another study suggests using Random Forest and Decision Trees to predict whether someone is eligible for a loan based on certain traits.
The reported accuracies for these methods are around 73%. Singh et al. claimed that accurate prediction on the dataset can be obtained with three machine learning algorithms: Decision Tree, XGBoost, and Random Forest. Iain Brown et al. carried out a thorough comparison of several methods used to analyze credit score datasets, covering logistic regression, random forests, gradient boosting, neural networks, and least squares SVM. Another study demonstrates that the probability of an individual's loan approval can be forecast using four machine learning algorithms: Random Forest, SVM, Logistic Regression, and XGBoost. In another study, Odegua achieved a 79% accuracy rate by employing the XGBoost algorithm on a banking dataset. Kwofie et al. utilized logistic regression on microfinance company data to forecast loan defaults. The researchers used 90 sampled beneficiaries to construct a logistic regression model and 30 beneficiaries to predict loan defaults. The analysis employed age, marital status, gender, education, business experience, and initial capital as predictive variables. Based on the model, marital status, years in business, and starting capital were identified as statistically significant factors. The logistic regression model exhibited minimal explained variability in the response variable, suggesting its inadequacy in properly predicting defaults from the chosen predictors. Amruta S. Aphale et al. automated bank risk assessment by utilizing client creditworthiness. Their model predicts the likelihood of a client repaying a loan by examining the client's behavior. The experiments showed that all algorithms except Nearest Centroid and Gaussian Naive Bayes perform well in terms of accuracy and other performance measures, with accuracies of 76% to 80%.
They built a linear regression model that could estimate the likelihood of a person repaying a loan based on the most significant factors. Srinivasa Rao et al. proposed methodologies for assessing credit risk using customer datasets. The proposed approach considers all factors influencing an individual's loan status and delivers precise outcomes for credit extension or denial. They developed a loan risk analysis system that integrates the models from five techniques; of these, Naive Bayes achieved the highest accuracy rate. Despite numerous advances, it remains unclear how to properly handle class imbalance and apply an ensemble model effectively in credit scoring. By applying feature selection and oversampling techniques (SMOTE), it has also been demonstrated that the predictive score improves with a stacking ensemble model. For example, Rofik et al. combined stacking and SMOTE, achieving 83% accuracy on a well-known credit dataset. Other studies likewise show that ensemble methods such as Random Forest, AdaBoost, XGBoost, and two-stage models can help forecast creditworthiness; however, they struggle with uneven data distribution and make the models harder to interpret. For instance, Uddin et al. showed that ensemble models improve the prediction of loan acceptance, but the accuracy rates remain below 90% and could be improved further. Additionally, some newly developed techniques for class-imbalance learning, including an asymmetric adjusted activation function, demonstrate that the treatment of minority-class representation is of paramount importance in credit scoring tasks, and the statistics reveal an overfitting problem in which models tend to favor the majority of good customers. Several previous studies have shown good results with their proposed models.
However, an ensemble model that combines different classifiers is a better and more accurate option and could outperform individual models. Consequently, the principal aim of this research is to develop resilient models for assessing creditworthiness, utilizing eleven machine learning techniques. This contribution also features next-level explainability with XAI tools, which is not very common in existing research. The succeeding sections of this work are organized as follows. Section 2 describes how our proposed method operates and outlines the experimental setup, providing a comprehensive account of the methods, procedures, and tools employed in our research. Section 3 presents the findings from our experiments. Section 4 concludes with a summary of our findings and an examination of their implications; we also provide an overview of relevant topics for future research and development in this sector, highlighting various approaches to further research and improvement.

METHOD

This section covers the preprocessing strategies used to clean and prepare the data for analysis. It also looks at solutions for managing unbalanced data, specifically the issue of unevenly distributed classes in the target variable. Furthermore, the section presents several machine-learning techniques and details the hyperparameter tuning procedure and the feature-importance analysis technique.

Dataset Description

The Dream Housing Finance company collected the data in the dataset to automate the process of determining who is eligible for a loan. The idea is to identify groups of customers who are eligible for a loan based on the information provided in an online application form.
The dataset includes these variables:

Table 1. A short summary of the dataset
- Loan ID: a unique code assigned to each loan application
- Gender: whether the applicant identifies as male or female
- Married: indicates if the applicant is married or single
- Dependents: the number of people financially dependent on the applicant
- Education: the applicant's level of education, either graduate or undergraduate
- Self Employed: shows whether the applicant runs their own business
- ApplicantIncome: the applicant's yearly income
- CoapplicantIncome: the annual income of the co-applicant, if there is one
- LoanAmount: the amount of money the applicant is asking to borrow (in thousands)
- Loan Amount Term: how long the applicant has to repay the loan, in months
- Credit History: whether the applicant's credit record meets the lender's standards (1 for yes, 0 for no)
- Property Area: the type of area where the property is located: urban, semi-urban, or rural
- Loan Status: the final decision on the loan application, approved or not approved

These variables capture various attributes of the loan applicants, their financial situation, and the properties associated with the loan applications. The dataset contains a total of 614 entries; however, some columns have missing values (their non-null count is less than 614). The column data types include float64 (numerical values), int64 (integer values), and object (categorical values).

Overview of the Methodology

Various machine learning methods have been applied in the banking sector to assess creditworthiness. A relevant dataset was obtained from Kaggle and analyzed using Python within the Anaconda Jupyter Notebook environment. The methodology included data collection, preprocessing, data splitting, algorithm training and testing, performance evaluation, and prediction formulation. Performance metrics such as precision, recall, accuracy, and F1 score were used to address data bias.
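The preprocessing steps described in this methodology (missing-value handling, categorical encoding, and data splitting) can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the tiny inline frame merely stands in for the real 614-row Kaggle file, and only a few of the Table 1 columns are used.

```python
# Illustrative preprocessing for a loan dataset shaped like Table 1.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male", "Female", "Male"],
    "ApplicantIncome": [5849.0, 4583.0, 3000.0, None, 6000.0, 2583.0],
    "LoanAmount": [130.0, 128.0, 66.0, 120.0, None, 95.0],
    "Credit_History": [1.0, 1.0, 1.0, 0.0, 1.0, 1.0],
    "Loan_Status": ["Y", "N", "Y", "N", "Y", "Y"],
})

# Missing-value handling: mode for categoricals, median for numerics.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
for col in ["ApplicantIncome", "LoanAmount"]:
    df[col] = df[col].fillna(df[col].median())

# Categorical encoding: map binary categories to 0/1.
df["Gender"] = (df["Gender"] == "Male").astype(int)
df["Loan_Status"] = (df["Loan_Status"] == "Y").astype(int)

# Train/test split on features X and target y.
X = df.drop(columns="Loan_Status")
y = df["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

On the real data, a stratified split would additionally preserve the approved/rejected ratio in both partitions.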
The results were visualized to support more effective credit risk assessment by banks. A 10-fold cross-validation procedure was implemented to ensure model stability across multiple datasets. Grid Search and Random Search improved model performance by tuning hyperparameters. To improve input feature quality, feature-engineering methodologies and data-pretreatment methods such as missing-value handling and categorical-variable encoding were examined. Ensemble methods were compared to find the best ones. A Shapash analysis illuminated feature importance in model decision-making. Potential dataset discrepancies were reduced and ethical concerns addressed. This strategy organises our study and helps us predict banking creditworthiness. Metrics assess algorithm performance; these indicators reveal several facets of the model's creditworthiness prediction.

Figure 1. Workflow of the Proposed Methodology

Imbalanced Data Handling Techniques

Random Over Sampler

The Random Over Sampler is a machine learning technique that addresses class imbalance in datasets. It operates by randomly replicating examples from the minority class until the classes are balanced. This spreads instances more uniformly across the classes, which helps the classifier by giving it a better distribution of instances.

SMOTE

SMOTE (Synthetic Minority Oversampling Technique) is a method for addressing imbalanced data in classification tasks. It works by interpolating between existing minority-class examples to create new synthetic ones. This helps balance the number of examples in each class, making the classifier more effective on both classes.

NearMiss

Another way to cope with uneven data is NearMiss. It works by picking instances from the majority class that are "close" to examples from the minority class.
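As a minimal NumPy sketch of the random-oversampling idea described above, the snippet below replicates minority-class rows (sampled with replacement) until the classes balance. The toy arrays are illustrative only; in practice the imbalanced-learn package provides ready-made RandomOverSampler, SMOTE, NearMiss, and SMOTETomek implementations.

```python
# Minimal random oversampling: replicate minority-class rows until
# both classes have the same count.
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced toy data: 10 majority (label 1) vs. 3 minority (label 0),
# mimicking the approved/rejected skew in loan data.
X = rng.normal(size=(13, 4))
y = np.array([1] * 10 + [0] * 3)

minority = 0 if (y == 0).sum() < (y == 1).sum() else 1
n_needed = abs((y == 1).sum() - (y == 0).sum())

# Randomly replicate minority examples (sampling with replacement).
idx = rng.choice(np.where(y == minority)[0], size=n_needed, replace=True)
X_bal = np.vstack([X, X[idx]])
y_bal = np.concatenate([y, y[idx]])
```

Because the replicated rows are exact copies, random oversampling adds no new information, which is precisely the gap SMOTE's interpolation is designed to fill.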
It is crucial to choose examples from the majority class that are also fairly representative of the data points in the minority class, so as to even out the distribution ratio.

SMOTETomek

SMOTE-Tomek combines the SMOTE and Tomek-links techniques. Tomek links are pairs of examples that are close to one another yet belong to different classes. The SMOTE-Tomek method removes Tomek links to further clean the data and improve the classifier's performance.

Algorithms Description

Naive Bayes

Naive Bayes is a type of classifier that predicts outcomes using Bayes' theorem. It is called "naive" because it assumes that every feature is independent of the others, a simplification that makes calculations much faster. Despite its simple assumptions, Naive Bayes works surprisingly well for sorting information into categories, especially when dealing with large amounts of data or text. It is particularly popular for tasks like detecting spam emails, classifying documents, and analyzing the sentiment of text.

Decision Tree

Decision trees are commonly used to predict both categorical and numerical outcomes. They train much faster than neural networks, making them a practical choice for quick predictions and situations where speed matters. Decision trees are non-parametric, meaning they make no assumptions about how the data is distributed. They can handle data with many features, often leading to more accurate results, and they work especially well when paired with techniques like SMOTE. One challenge, however, is deciding which feature to split on at each step. Two popular ways to make this decision are Information Gain and the Gini Index.
As a tree splits the training data, the disorder, or entropy, changes, and Information Gain measures how much the entropy decreases:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

Random Forest

A random forest is an ensemble method made up of many decision trees working together. Each tree is built using slightly different data and random feature choices, so each learns something a little different. When it is time to make a prediction, all the trees "vote" and the most common answer wins. Generally, the more trees in the forest, the better the accuracy. Some forests are built with techniques like bootstrapping or boosting, which can make them even stronger or give them advantages over other approaches.

KNN

KNN, or k-nearest neighbours, is a straightforward method used for both classification and regression. To make a prediction for a new data point, KNN looks at the k closest labeled examples in the dataset and chooses the most common label or the average value. The basic idea is that similar things tend to be found near each other, so their outcomes are likely to be similar too. In real-world research, data often comes from many sources and is not always complete; missing values are common, and how these gaps are filled matters for model accuracy. In Python's scikit-learn, the KNN imputer fills in missing values by looking at the closest complete data points, using the Euclidean distance. When calculating this distance, it ignores missing values and compares only the available coordinates:

D_xy = sqrt(weight × squared distance over present coordinates), where weight = total number of coordinates / number of present coordinates.

Logistic Regression

Logistic regression is a classification method used in supervised learning to estimate the probability of a target variable.
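The weighted nan-aware distance used by the KNN imputer can be checked directly. The helper below and its example vectors are illustrative; it mirrors the behaviour of scikit-learn's nan_euclidean_distances, on which KNNImputer relies.

```python
# Nan-aware Euclidean distance: squared differences are summed over
# coordinates present in BOTH vectors, then rescaled by
# (total coordinates / present coordinates) before the square root.
import numpy as np

def nan_euclidean(x, y):
    present = ~(np.isnan(x) | np.isnan(y))
    weight = len(x) / present.sum()
    sq = np.sum((x[present] - y[present]) ** 2)
    return np.sqrt(weight * sq)

a = np.array([3.0, np.nan, 5.0])
b = np.array([1.0, 0.0, 0.0])
# Present coordinates: indices 0 and 2 -> squared distance = 4 + 25 = 29,
# weight = 3 / 2 -> distance = sqrt(1.5 * 29) = sqrt(43.5).
d = nan_euclidean(a, b)
```

The rescaling compensates for the coordinates that were skipped, so distances between vectors with different amounts of missingness remain comparable.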
LR is like linear regression in that it estimates the target's probability from input features. This approach has been used for classification jobs in water-quality studies, and it can also be adapted to multiclass classification problems. For the binary classification problem, Logistic Regression (LR) is a well-known method. The logistic equation, commonly called the sigmoid function, is among the reasons LR is so popular: with its signature S-curve, it maps any value to an output between 0 and 1:

sigmoid(value) = 1 / (1 + e^(-value))

Support Vector Machine (SVM)

A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression. It seeks the hyperplane that maximizes the margin between classes. For a given dataset, the hyperplane is defined as

w · x + b = 0,

where w is the weight vector, x is an input, and b is a scalar bias. The SVM aims to achieve the maximum margin 2/||w||, with all data points of each class on the correct side of the hyperplane. SVM can be generalized using kernel functions that map the input space into higher dimensions, where a linear separation becomes possible.

AdaBoost

AdaBoost is an intelligent method for creating more accurate predictions by combining weaker models, known as weak learners, into a stronger model. The idea is to give more emphasis to examples that are difficult to get right: whenever a weak learner makes an error, AdaBoost increases the importance of that example for the next round of learning. This enables the final model to focus on the problematic examples and make more accurate predictions; the aim is continual improvement by avoiding the same mistakes.
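As a rough illustration of the classifiers discussed in this section, a few of them can be trained and compared on a synthetic stand-in for the loan data. The dataset, split, and hyperparameters below are illustrative choices, not the paper's configuration.

```python
# Quick comparison of three of the classifiers discussed above on a
# synthetic, mildly imbalanced binary task.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 70/30 class weights mimic an approved/rejected skew.
X, y = make_classification(n_samples=400, n_features=8,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

scores = {}
for name, clf in [
    ("LogisticRegression", LogisticRegression(max_iter=1000)),
    ("SVM", SVC()),
    ("AdaBoost", AdaBoostClassifier(n_estimators=100, random_state=42)),
]:
    scores[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
```

On imbalanced data such as this, held-out accuracy alone flatters the majority class, which is exactly why the study pairs the classifiers with resampling and reports precision, recall, and F1 as well.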
Gradient Boosting

Gradient boosting is an alternative approach that learns a combination of many simple models (usually decision trees) to produce a much more accurate prediction. It can predict both categorical and numerical values. Each model in the series attempts to correct the errors of its predecessors; put them all together and you have a powerful prediction instrument. In other words, each step in the gradient boosting process tries to adjust for the errors of the step before it. The cycle repeats as each new mini-model learns from the mistakes of those before it, and once all the models are combined, the aggregate prediction is far more accurate than any individual model by itself.

XGBoost

The XGBoost classifier excels at classification. It forms a powerful team of decision trees, each focusing on regions the previous ones neglected. XGBoost (extreme gradient boosting) enhances decision trees with regularised learning features that refine the final weights and reduce overfitting, which motivates its use; this teamwork, accuracy, and efficiency yield outstanding results. Model performance is measured and improved using metrics including accuracy, precision, recall, and F1 score. The objective of the algorithm is

Obj = Σ_i l(y_i, ŷ_i) + Σ_k Ω(f_k),

where l is the training loss over the predictions ŷ_i and Ω is a regularisation term over the trees f_k.

Voting Ensemble

The Voting Ensemble algorithm improves performance by combining the predictions of several machine learning models. Hard and soft voting are the primary types. In hard voting, the majority vote of the classifiers determines the final prediction; in soft voting, the average of the predicted probabilities is used. For a given set of classifiers {h_1, h_2, …, h_n} and an input sample x, the hard-voting prediction ŷ is given by

ŷ = mode(h_1(x), h_2(x), …, h_n(x)).

In soft voting, the prediction ŷ is obtained by averaging the predicted probabilities P_i(y = c | x) for each class c and selecting the class with the highest average probability:

ŷ = arg max_c (1/n) Σ_{i=1}^{n} P_i(y = c | x).
Voting Ensemble algorithms are effective because they leverage the strengths of multiple models, leading to improved accuracy and robustness compared to individual models.

Stacking Ensemble

The Stacking Ensemble uses Random Forest (RF), XGBoost (XGB), AdaBoost (ADA), Gradient Boosting (GB), and a Decision Tree. The list of base models is passed as the estimators parameter of the StackingClassifier class to create this ensemble. These base models forecast independently, and their outputs are merged during stacking. The final creditworthiness prediction is made by training a Random Forest classifier on the aggregated forecasts.

Figure 2. Stacking ensemble structure.

XAI Tools

Shapash

Shapash is a Python package that helps users understand and explain models. Built on SHAP (SHapley Additive exPlanations), it provides automated, customizable, and interactive ML model explanations. With Shapash, people of all skill levels can understand model predictions and how features affect them. Its easy-to-use, interactive graphs of SHAP values help users examine how each feature affects the model's output. Force, summary, and dependence graphs illustrate how features are related, how they interact, and what they may indicate. Users can probe their models' behavior using Shapash's model comparison, sensitivity analysis, and global feature-relevance evaluation. It works with tree-based, linear, and ensemble ML models, so it can be applied in a wide range of fields. Shapash is used in data science projects to analyze, debug, and validate models thanks to its simple interface and excellent visualization capabilities.

Performance Metrics

Performance can be quantified with measures including accuracy, precision, recall, F1 score, specificity, AUC, Cohen's Kappa, and others.
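The stacking architecture described above can be sketched with scikit-learn. To keep the example self-contained, XGBoost and AdaBoost are left out of the base-model list and a synthetic dataset stands in for the loan data; the structure (tree-based base learners feeding a Random Forest meta-learner) follows the paper's description.

```python
# Sketch of a stacking ensemble: base learners' out-of-fold predictions
# train a Random Forest meta-learner, which issues the final prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=7)),
        ("gb", GradientBoostingClassifier(random_state=7)),
        ("dt", DecisionTreeClassifier(random_state=7)),
    ],
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=7),
    cv=5,  # out-of-fold predictions feed the meta-learner
)
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)

acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
kappa = cohen_kappa_score(y_te, pred)
```

The cv=5 setting matters: the meta-learner is trained on cross-validated base-model predictions rather than in-sample ones, which limits the leakage that would otherwise inflate the stack's apparent accuracy.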
Accuracy is the number of correctly categorized data points divided by the total number of observations. Precision indicates how accurately the model forecasts positive instances. Recall shows how many of the actual positive instances were identified as positive. The F1 score is the harmonic mean of precision and recall. Specificity indicates how accurately the model predicts negative instances. The Area Under the ROC Curve (AUC) indicates how effectively a classifier distinguishes between classes. Cohen's Kappa quantifies the degree of agreement between two raters, taking into account the possibility that they might agree by chance.

RESULTS AND DISCUSSION

Performance of the Models on the Dataset

Table 2. Performance of the different models using SMOTE (accuracy, precision, recall, F1 score, AUC, and Cohen's Kappa Score for Naive Bayes, Decision Tree, Random Forest, KNN, Logistic Regression, SVM, AdaBoost, Gradient Boosting, XGBoost, Voting Classifier, and Stacking Classifier)

Table 2 shows the performance metrics of machine learning models applied to dataset DS1 after using SMOTE to handle imbalanced data. The models include Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbours (KNN), Logistic Regression (LR), Support Vector Machine (SVM), AdaBoost, Gradient Boosting, XGBoost, Voting Classifier, and Stacking Classifier. Model performance is measured by accuracy, precision, recall, F1 score, AUC, and Cohen's Kappa Score. In particular, Random Forest, Gradient Boosting, XGBoost, AdaBoost, and ensemble approaches such as the Voting Classifier and Stacking Classifier perform well on this dataset and task.

Table 3.
Performance of the different models using NearMiss (accuracy, precision, recall, F1 score, AUC, and Cohen's Kappa Score for the same set of algorithms)

Table 3 shows the performance of the various machine learning algorithms under NearMiss sampling, with rows representing algorithms and columns reporting accuracy, precision, recall, and F1 score. Random Forest (RF) has the best accuracy and precision, indicating reliable classification and favourable predictions. By utilising model strengths, ensemble approaches including AdaBoost, Gradient Boosting, XGBoost, Voting Classifier, and Stacking Classifier achieve competitive accuracies of 0.75 and above. Support Vector Machine (SVM), however, performs worse across all metrics, suggesting classification issues. The NearMiss and SMOTE methods yield different performance measures: compared to SMOTE, NearMiss gives inferior accuracy, precision, recall, and F1 score. In this investigation, NearMiss may not improve algorithm performance as much as SMOTE. The different approaches of NearMiss (undersampling) and SMOTE (oversampling) to class imbalance lead to these variances. NearMiss reduces majority-class samples, which can affect the model's capacity to learn from the majority class. Increasing minority-class representation through oversampling appears to ameliorate class imbalance better, as seen in SMOTE's superior performance metrics.

Table 4. Performance of the different models using SMOTETomek (accuracy, precision, recall, F1 score, AUC, and Cohen's Kappa Score for the same set of algorithms)

SMOTETomek data-handling performance metrics for the machine learning algorithms are shown in Table 4. Random Forest (RF) predicts class labels best, and XGBoost follows with 0.85/0.86 accuracy and precision. SVM has the lowest accuracy and precision, showing classification difficulties.
Voting and Stacking Classifier ensemble techniques often outperform the individual models, showing that combined models are competitive. RF, XGBoost, and Naive Bayes have strong recall and F1 scores, indicating that they capture positive cases while balancing precision and recall; KNN and SVM have lower recall and F1 scores, suggesting that they may struggle to detect positive cases. The Voting Classifier reached an accuracy of 0.83 and the Stacking Classifier 0.81, confirming that the ensemble methods work. In conclusion, under SMOTETomek data handling, Random Forest, XGBoost, and Naive Bayes achieve good classification accuracy, precision, recall, and F1 scores; KNN and SVM perform poorly; and the Voting and Stacking Classifier ensembles consistently generate competitive results, showcasing the multi-model strengths available under SMOTETomek.

Table 5 summarises the RandomOverSampler performance metrics for the several methods on the dataset. The Stacking Classifier and Random Forest (RF) algorithms have the highest accuracy, precision, recall, and F1 scores, at 0.93 and 0.9 respectively. These algorithms accurately detect positive instances while minimising false positives. The Decision Tree algorithm has a lower F1 score of 0.42, indicating a poorer fit to the dataset. The SVM algorithm scores lower on all metrics, indicating weak performance compared to the other models. Ensemble approaches such as the Voting and Stacking Classifiers perform well: by integrating several algorithms to exploit each model's strengths, they increase accuracy, precision, recall, and F1 scores, showing how ensemble methods improve performance and predictions.

Table 5. Performance of the different models using the RandomOverSampler algorithm (accuracy, precision, recall, F1 score, AUC, and CKS for Naive Bayes, Decision Tree, KNN, SVM, AdaBoost, Gradient Boosting, XGBoost, Voting Classifier, and Stacking Classifier).

Table 6. Hyperparameter tuning of the algorithms (grids over n_estimators for Random Forest, XGBoost, AdaBoost, Gradient Boosting, and the final estimator, and over max_depth, e.g. [None, 10, ...]).

Figure 3. Comparison of model performance using RandomOverSampler data balancing.

10-Fold Cross-Validation
Table 7 reports the 10-fold cross-validation results for the Stacking Classifier on the dataset, where we observe consistent and commendable performance. The F1 scores range from 0.88 to 0.98, showcasing the model's robust ability to balance precision and recall across folds, and the corresponding accuracy scores vary from 0.89 to 0.96. This suggests a high level of accuracy in predicting the target variable across different subsets of the dataset.

Figure 4. Comparison of model outputs using RandomOverSampler.

Table 7. Results of 10-fold cross-validation of the Stacking Ensemble model (F1 score for each of the ten folds, F1-F10, and the average accuracy of the Stacking Classifier).

XAI Analysis
Global Explainability
Figure 5. Importance of global characteristics of the prediction.
Figure 5 illustrates the feature importance graph, which displays the relative importance of each feature in relation to the others. Here the response variable is one, and the graph shows how much each feature adds to the mean absolute value. CREDIT HISTORY has the greatest value, making it the most important factor in predicting the response variable: when the CREDIT HISTORY feature is set to one, it is strongly related to the response variable.
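A global ranking of this kind can be reproduced without any particular XAI library. The figures in this study come from Shapash; the sketch below instead uses scikit-learn's permutation_importance on synthetic data, with placeholder feature names borrowed from the paper's variables (their mapping to the synthetic columns is arbitrary), purely to show how such a feature-importance ranking is computed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder names mirroring the paper's variables; the assignment to
# the synthetic columns below is arbitrary, for illustration only.
names = ["CREDIT_HISTORY", "APPLICANTINCOME", "LOANAMOUNT",
         "COAPPLICANTINCOME", "PROPERTY_AREA", "LOAN_AMOUNT_TERM",
         "DEPENDENTS", "EDUCATION"]

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Global importance: the mean drop in held-out score when each feature
# column is shuffled, averaged over repeats.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranking = sorted(zip(names, imp.importances_mean), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The resulting list, sorted by mean importance, is the tabular analogue of the bar chart in Figure 5.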
Another essential feature for predicting the target variable is APPLICANTINCOME, whose importance value is close to that of CREDIT HISTORY. CREDIT HISTORY and APPLICANTINCOME, in turn, have higher values than LOANAMOUNT, COAPPLICANTINCOME, PROPERTY AREA, LOAN AMOUNT TERM, DEPENDENTS, and EDUCATION, meaning that the relationship between these latter features and the response variable is weaker: those factors do not contribute significantly to the response variable in the dataset examined.

Local Explainability
Figure 6. Contribution of the feature CREDIT HISTORY in the model.
Figure 6 presents a contribution plot, a graph that displays the importance or utility of a feature in a statistical or machine learning model. In this case the plot illustrates how CREDIT HISTORY can either help or hurt a prediction. In our methodology, a case where an elevated expected outcome correlates with an increase in the CREDIT HISTORY value is identified as a positive contribution; the presence of large contributions of either sign indicates that CREDIT HISTORY is a significant component of the model with a substantial impact on its decisions.

Figure 7. Contribution of the feature APPLICANTINCOME in the model.
Figure 7 shows how the APPLICANTINCOME feature contributes to the model over a subset of 2,000 cases. The plot shows that APPLICANTINCOME has a large positive effect on the model and a smaller negative one.

Figure 8. Contribution of the feature LOANAMOUNT in the model.
Figure 8 shows the contribution plot for LOANAMOUNT over the same subset of 2,000 cases. LOANAMOUNT has a significantly positive effect on the model that outweighs its negative effect, meaning that LOANAMOUNT is crucial in determining the model's behaviour.
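The per-feature "contribution" plotted in these figures can be made concrete with a tiny example. Shapash derives its values from the fitted model; as a simplified, model-specific illustration (not the paper's exact computation), for a linear model the contribution of feature i to one prediction can be written as coef_i × (x_i − mean_i), i.e. how far that feature pushes the log-odds away from those of the average applicant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the loan data: 5 anonymous features.
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# For a linear model, coef_i * (x_i - mean_i) is exactly how much
# feature i shifts this applicant's log-odds away from the log-odds
# of the "average" applicant (the feature-wise mean).
x = X[0]
mean = X.mean(axis=0)
contribs = clf.coef_[0] * (x - mean)

# Sanity check: the contributions sum to the total log-odds shift.
shift = (clf.decision_function(x.reshape(1, -1))[0]
         - clf.decision_function(mean.reshape(1, -1))[0])
print(np.allclose(contribs.sum(), shift))  # True
```

A contribution plot like Figure 6 simply scatters such per-instance contributions for one feature across many applicants, so positive and negative regions become visible.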
Figure 9. Contribution of the feature COAPPLICANTINCOME in the model.
With a subset of 2,000 cases, Figure 9 depicts the contribution of COAPPLICANTINCOME. As the plot shows, COAPPLICANTINCOME has a mixed effect on the model, contributing both positively and negatively. This indicates that although COAPPLICANTINCOME may be predictively relevant, its influence is less substantial than that of variables with either a greater positive contribution or a smaller negative one.

Figure 10. Contribution of the feature PROPERTY AREA in the model.
Figure 11. Contribution of the feature LOAN AMOUNT TERM in the model.
Figure 12. Contribution of the feature DEPENDENTS in the model.
Figure 13. Contribution of the feature EDUCATION in the model.
The contribution plots in Figures 10, 11, 12, and 13, again over a subset of 2,000 cases, show that the PROPERTY AREA, LOAN AMOUNT TERM, DEPENDENTS, and EDUCATION characteristics are likewise important in the model. Plot analysis indicates that these features exert a predominantly favourable influence, with a significantly smaller adverse impact.

Figure 14. Local explanation of a random id: 0.
Figure 14 displays a local explanation for a person with an ID of 900. CREDIT HISTORY, APPLICANTINCOME, LOANAMOUNT, COAPPLICANTINCOME, and PROPERTY AREA all have positive contribution values, with CREDIT HISTORY the greatest among them, while DEPENDENTS has a value below zero.

Figure 15. Local explanation of a random id: 1.
Figure 15 gives an individualised explanation for a person with ID 900 and a predicted probability of 0.29. In this scenario,
CREDIT HISTORY, APPLICANTINCOME, LOANAMOUNT, COAPPLICANTINCOME, and PROPERTY AREA push the probability higher, with CREDIT HISTORY contributing the most; a larger number of DEPENDENTS likewise increases the probability.

Figure 16. Comparison plot for contribution values of each feature.
The comparison plot in Figure 16, generated by randomly choosing 15 unique IDs, shows the contribution levels for each feature. The lines for CREDIT HISTORY, APPLICANTINCOME, LOANAMOUNT, and PROPERTY AREA are more spread out than the lines for the other attributes, whose shorter lines indicate that they are less important than the first group.

CONCLUSION
This research demonstrates the ability to predict creditworthiness with several machine learning algorithms. The goal was to find the best approach by comparing Decision Tree, Random Forest, Naive Bayes, KNN, SVM, Logistic Regression, AdaBoost, Gradient Boosting, XGBoost, Voting Ensemble, and Stacking Ensemble. After extensive experimentation, it was determined that the Stacking Ensemble method, when used with random oversampling, worked better than the other algorithms, with an impressive accuracy of 93%. Although this paper has offered useful insights into the prediction of creditworthiness using machine learning algorithms, there are various opportunities for further research and enhancement, with real-time implementation a key area of interest for future study. One could explore advanced sampling approaches such as Borderline-SMOTE or ADASYN to better handle class imbalance and potentially enhance algorithm performance. In addition, further optimisation and refinement of the ensemble models, such as fine-tuning hyperparameters and investigating other ensemble configurations, has the potential to improve their performance.
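The winning pipeline described above, random oversampling followed by a stacking ensemble scored with 10-fold cross-validation, can be sketched with scikit-learn alone. Everything below (the synthetic data, the particular base learners, and the estimator sizes) is an illustrative assumption rather than the paper's exact configuration; note also that for a rigorous evaluation the oversampling should be performed inside each training fold, since duplicating rows before splitting leaks information into the validation folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

# Synthetic stand-in for the loan dataset (roughly 80/20 imbalanced).
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.8, 0.2], random_state=42)

# Random oversampling: duplicate minority rows until classes are equal
# (the same idea as imbalanced-learn's RandomOverSampler).
X_min, y_min = X[y == 1], y[y == 1]
n_extra = (y == 0).sum() - (y == 1).sum()
X_extra, y_extra = resample(X_min, y_min, n_samples=n_extra, random_state=42)
X_bal = np.vstack([X, X_extra])
y_bal = np.concatenate([y, y_extra])

# Stacking ensemble: three base learners, logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=25, random_state=42)),
                ("gb", GradientBoostingClassifier(n_estimators=25, random_state=42)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
)

# 10-fold cross-validation on the balanced data, as in Table 7.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(stack, X_bal, y_bal, cv=cv, scoring="f1")
print("per-fold F1:", np.round(scores, 2), "mean:", round(scores.mean(), 3))
```

Averaging the per-fold scores, as Table 7 does, gives a more trustworthy estimate than a single train/test split.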
Deploying and integrating the developed models into real-time creditworthiness prediction systems would require ensuring scalability, efficiency, and accuracy in a production context. Ultimately, these endeavours can yield more precise and dependable forecasting models, empowering financial organisations to make more astute judgements and streamline their lending procedures with greater effectiveness.

REFERENCES