International Journal of Electrical and Computer Engineering (IJECE), October 2025, pp. 4803-4812, ISSN: 2088-8708, DOI: 10.11591/ijece.

Enhancing software fault prediction using wrapper-based metaheuristic feature selection methods

Ha Thi Minh Phuong, Dang Thi Kim Ngan, Dao Khanh Duy, Nguyen Thanh Binh
The University of Danang, Vietnam - Korea University of Information and Communication Technology, Danang, Vietnam

ABSTRACT
The application of software fault prediction (SFP) to predict faulty components at an early stage has been investigated in various studies. Reducing feature redundancy is key to enhancing the predictive accuracy of SFP models. Feature selection methods are utilized to select and retain the features that contribute the most information while eliminating irrelevant or redundant features from software fault datasets. However, feature selection (FS) in the field of SFP remains a broad and continuously evolving field, encompassing a diverse range of techniques and methodologies. In this work, we study and perform an empirical evaluation of ten wrapper FS methods, namely artificial butterfly optimization (ABO), atom search optimization (ASO), equilibrium optimizer (EO), Henry gas solubility optimization (HGSO), poor and rich optimization (PRO), generalized normal distribution optimization (GNDO), slime mold algorithm (SMA), Harris hawks optimization (HHO), pathfinder algorithm (PFA), and manta ray foraging optimization (MRFO), for resolving the data redundancy issue in SFP datasets. Experimental results on nine fault datasets from the PROMISE and AEEEM repositories show that EO achieves the best performance, with PRO and HGSO ranking next. The comparative analysis revealed that the ten wrapper-based FS methods demonstrated a substantial improvement in handling data redundancy issues for SFP.
Article history: Received Jun 6, 2025; Revised Jul 18, 2025; Accepted Aug 1, 2025.

Keywords: Datasets; Feature selection methods; Machine learning; Software fault prediction; Wrapper-based feature selection methods

This is an open access article under the CC BY-SA license.

Corresponding Author:
Nguyen Thanh Binh
The University of Danang, Vietnam - Korea University of Information and Communication Technology
470 Tran Dai Nghia, Ngu Hanh Son District, Danang 55000, Vietnam
Email: ntbinh@vku.

INTRODUCTION
The primary objective of feature selection is to extract a subset of relevant features from the original set, aiming to reduce dimensionality without compromising classification performance. Feature selection methods are typically categorized into three types: filter-based, wrapper-based, and hybrid approaches. Filter-based techniques rank features according to specific criteria and discard those that do not meet a predefined threshold. Wrapper-based feature selection (FS) techniques utilize classification models to evaluate the effectiveness of feature subsets, often resulting in superior performance compared to filter-based methods. Hybrid approaches combine the advantages of both filter and wrapper methods to achieve a balance between computational efficiency and predictive accuracy. Prior research has shown that wrapper-based approaches generally outperform filter-based techniques. Nevertheless, a large number of metaheuristic variants remain underexplored in the context of feature selection. Therefore, this study presents an empirical evaluation of metaheuristic algorithms within wrapper-based FS methods to reduce data redundancy in common software fault prediction (SFP) datasets, with the goal of improving model efficiency while preserving predictive performance. Specifically, we investigate a range of wrapper-based FS techniques
applied to software fault datasets, including artificial butterfly optimization (ABO), atom search optimization (ASO), equilibrium optimizer (EO), Henry gas solubility optimization (HGSO), poor and rich optimization (PRO), generalized normal distribution optimization (GNDO), slime mould algorithm (SMA), Harris hawks optimization (HHO), pathfinder algorithm (PFA), and manta ray foraging optimization (MRFO). Specifically, the proposed wrapper-based FS methods are evaluated against a baseline that applies learning algorithms directly to the original software fault datasets. Experiments were conducted on nine datasets derived from the PROMISE and AEEEM repositories. To assess classification performance, we employed three learning models: random forest, extra trees, and AdaBoost. Evaluation metrics included precision, recall, F1-score, and area under the curve (AUC). To determine the statistical significance of performance differences between the ten wrapper-based FS techniques and the baseline, the Wilcoxon signed-rank test was performed at a 0.05 significance level. Each experiment was repeated ten times to ensure reliability, producing ten unique test sets. The results indicate that the wrapper-based methods consistently outperform the baseline. Among them, EO achieved the best overall performance, followed by PRO and HGSO. This study specifically addresses the following research questions: How effective are the applied FS techniques in enhancing SFP by filtering out irrelevant or redundant software metrics? Which wrapper-based FS method performs best for selecting the optimal features for SFP? The structure of the paper is organized as follows. Section 2 reviews the related literature, while section 3 outlines the research methodology. Section 4 presents and discusses the experimental findings, and section 5 highlights the main conclusions and recommendations.

RELATED WORK
A hybrid FS method was introduced by Anju et al.
This study proposed a method that combines quantum particle swarm optimization (QPSO) and principal component analysis (PCA). The results demonstrate that the proposed model, employing an artificial neural network (ANN) classifier, achieved higher accuracy and precision compared to existing approaches. The authors further suggest that these findings hold significant implications for both academia and the software industry. According to Ali et al., metaheuristic approaches within wrapper-based FS methods outperformed traditional techniques such as best first search and greedy stepwise search. The whale optimization algorithm (WOA), genetic algorithm (GA), and particle swarm optimization (PSO) are examples of popular metaheuristic algorithms used for FS in SFP. Ali et al. emphasized that removing irrelevant or redundant features may bring better predictive performance. However, incorrect FS or omitting important features can lead to a decline in model performance. Therefore, analyzing and evaluating different FS methods is essential to identify the most effective approach for software defect prediction. Balogun et al. proposed a hybrid FS method that combines multiple filter techniques with a wrapper approach using rank aggregation. By combining filter and wrapper techniques, this approach aims to boost the efficacy of SFP models. Their experiments showed that the proposed methodology considerably enhanced the predictive performance of SFP models. Shah and Das investigated the effectiveness of PSO for FS in conjunction with k-nearest neighbors (k-NN), naive Bayes, and decision tree classifiers. Their experimental results demonstrate that integrating PSO with these classifiers enhances predictive performance across multiple datasets.

RESEARCH METHOD
Experimental design
Figure 1 indicates the main steps of our methodology.
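Common to all ten metaheuristics evaluated in this study is the wrapper loop itself: a candidate feature subset, encoded as a binary mask, is scored by the predictive performance of a classifier trained only on the selected columns. The following is a minimal generic sketch in Python with scikit-learn, not the API of the wrapper-feature-selection-toolbox used in the experiments; a random search stands in for the metaheuristic, and the accuracy/subset-size weighting alpha is our illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def subset_fitness(mask, X, y, alpha=0.99):
    """Wrapper fitness: cross-validated classifier accuracy on the selected
    columns, lightly rewarded for using fewer features."""
    if not mask.any():  # an empty subset is invalid
        return 0.0
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    acc = cross_val_score(clf, X[:, mask], y, cv=3).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / len(mask))

# Synthetic stand-in for a fault dataset: 10 metrics, 4 of them informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# A metaheuristic (EO, PRO, HGSO, ...) would evolve these masks;
# here a seeded random search plays that role.
rng = np.random.default_rng(0)
best_mask, best_fit = None, -1.0
for _ in range(5):
    mask = rng.random(10) > 0.5
    fit = subset_fitness(mask, X, y)
    if fit > best_fit:
        best_mask, best_fit = mask, fit
```

The metaheuristics differ only in how they propose the next batch of masks; the fitness evaluation above is shared by all of them.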
After splitting the data into training and testing parts, we filled in missing values and normalized the data. Subsequently, ten FS techniques based on wrapper strategies were employed to identify the most relevant set of features. Specifically, we utilized an open-source toolkit, wrapper-feature-selection-toolbox, which implements over 40 different wrapper methods. Finally, we utilized three ML classifiers, namely random forest, extra trees, and AdaBoost. Precision, recall, F1-score, and AUC are the metrics used to determine the performance of the selected wrapper-driven approaches.

Wrapper-based feature selection techniques
There are various wrapper-based FS methods that have been applied in software fault prediction, including binary genetic algorithm (BGA), binary particle swarm optimization (BPSO), and binary ant colony optimization (BACO). We consider the following wrapper-based FS approaches in this study. ABO, inspired by butterfly behavior, was proposed by Qi et al. This algorithm is based on the behavior of speckled wood butterflies, which seek out warm sunspots in wooded areas. ASO was introduced by Zhao et al. for solving optimization challenges. Zhao et al. demonstrated that ASO outperforms many classic and emerging algorithms in benchmark tests, and they have shown that ASO is a promising solution for real-world engineering problems. The EO, proposed by Faramarzi et al., is a physics-inspired algorithm that regulates the volume mass balance model to determine dynamic and equilibrium states. This mechanism enables EO to effectively balance exploration and exploitation during the search process. Similarly, the HGSO algorithm, grounded in Henry's law, solves complex optimization tasks by simulating gas molecule clustering, which helps maintain the exploration-exploitation balance and avoid premature convergence. The PRO algorithm, proposed by Moosavi and Bardsiri
, is inspired by the financial behavior of individuals within society, modeled through the interaction between wealthier and poorer groups striving to improve their status. The generalized normal distribution optimization (GNDO) algorithm, introduced by Zhang et al., operates in three stages: initial solution dispersion, convergence toward the optimal region, and refinement, using multiple normal distributions with gradually reduced variance. The slime mold algorithm (SMA), developed by Li et al., is based on the foraging behavior of Physarum polycephalum, which identifies optimal paths to food sources without a central nervous system. Harris hawks optimization (HHO), introduced by Heidari et al., simulates the cooperative hunting strategy of Harris hawks. The algorithm begins by randomly initializing hawk agents across the search space and guides them through exploration and exploitation phases. The PFA, proposed by Yapici and Cetinkaya, is a metaheuristic method that simulates the collective foraging behavior of animal groups, where individuals follow a leader while relying on their own perception to explore the search space. The manta ray foraging optimization (MRFO) algorithm, introduced by Zhao et al., is inspired by the unique foraging strategies of manta rays, which search large areas and dynamically adjust their positions toward regions with greater resource availability.

Figure 1. The main steps of the evaluation

Dataset
In this study, we obtained nine datasets from the PROMISE and AEEEM repositories, both of which are widely recognized and frequently utilized in SFP research. These datasets comprise independent variables, such as lines of code (LOCode) and lines of comments (LOComment), along with a dependent variable indicating the status of a software component, classifying it as either faulty or non-faulty. A notable characteristic of the datasets is the predominance of non-faulty samples over faulty ones, which introduces an imbalanced data problem.
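Before any FS method is applied, each of these datasets is split into training and testing parts and then cleaned, with missing values filled in and the metrics normalized. A minimal sketch of that preprocessing, assuming median imputation and min-max normalization (the paper does not name the exact strategies), with both transformers fit on the training split only to avoid test-set leakage:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Toy fault data: rows are modules, columns are metrics (e.g., LOCode,
# LOComment); NaN marks a missing measurement, y marks faulty (1) or not (0).
X = np.array([[10.0, 2.0], [np.nan, 4.0], [30.0, np.nan], [40.0, 8.0]])
y = np.array([0, 1, 0, 1])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Fit the imputer and scaler on the training part only, then reuse them
# unchanged on the test part.
imputer = SimpleImputer(strategy="median")
scaler = MinMaxScaler()
X_tr = scaler.fit_transform(imputer.fit_transform(X_tr))
X_te = scaler.transform(imputer.transform(X_te))
```

After this step every training metric lies in [0, 1] and no NaN values remain, which is what the wrapper methods downstream assume.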
The details are presented in Table 1.

Table 1. The datasets used in the study. The table lists, for each dataset, its source repository (CM1, KC1, KC2, and PC1 from PROMISE; EQ, JDT, Lucene, Mylyn, and PDE from AEEEM), together with the number of instances, the numbers of faulty and non-faulty instances, and the faulty ratio (%).

Evaluation measures
During the fault prediction model development process, performance evaluation metrics are applied to systematically assess the effectiveness of the models and identify the most appropriate one for a given task. In this study, we adopt precision (P), recall (R), F1-score (F1), and AUC as evaluation measures.

Machine learning techniques
Random forest represents a supervised learning approach primarily used for regression and classification tasks. It operates by combining multiple decision trees, with the final prediction based on the aggregated results of these trees. Random forest is known for its high classification accuracy and its resistance to overfitting. AdaBoost is a well-known boosting algorithm designed to enhance weak classifiers by iteratively creating new models and adjusting their weights. It operates by sequentially training multiple weak classifiers, with each new model prioritizing the correction of errors made by its predecessors. Extra trees (ET) is a variant of random forest that selects attributes randomly for classification rather than choosing the optimal one, as random forest does. This approach enhances training speed and reduces overfitting, although in some cases it may result in lower accuracy.

RESULTS AND DISCUSSION
This section presents experimental findings to systematically address the two aforementioned research questions.
Research question 1: How effective are the presented feature selection methods for reducing the irrelevant/redundant metrics for SFP?
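All comparisons that follow are reported in terms of the four measures defined above. They can be computed directly with scikit-learn; the labels and scores below are toy values for illustration only, with 1 denoting a faulty module:

```python
from sklearn.metrics import (precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                 # 1 = faulty module
y_prob = [0.2, 0.1, 0.9, 0.4, 0.8, 0.3, 0.7, 0.6]  # predicted fault probability
y_pred = [int(s >= 0.5) for s in y_prob]           # default 0.5 threshold

p = precision_score(y_true, y_pred)   # 0.75: 3 true positives, 1 false positive
r = recall_score(y_true, y_pred)      # 0.75: 3 of 4 faulty modules found
f1 = f1_score(y_true, y_pred)         # 0.75: harmonic mean of p and r
auc = roc_auc_score(y_true, y_prob)   # 0.9375: threshold-independent ranking
```

Note that AUC is computed from the raw probabilities rather than the thresholded predictions, which is why it can rank models even when their thresholded precision and recall coincide.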
To answer research question 1, we perform various experiments to compare the ten FS approaches using the three ML classifiers described in section 3. In Tables 2 and 3, the best experimental results are highlighted in bold.
Random forest results: Table 2 shows that, among all evaluated FS methods, the combination of EO and random forest achieved the highest average performance across all metrics, with PRO ranking second. Notably, EO achieved the highest average values across all evaluation metrics, including an AUC of 0.76. PRO achieved the second-highest performance metrics, with precision, recall, F1-score, and AUC values of 0.68, 0.84, 0.76, and 0.84, respectively. The analysis results demonstrate that EO and PRO achieved comparatively superior performance when used as FS methods for the random forest classifier, outperforming the other FS methods evaluated in this study.

Table 2. The performance of the wrapper-based FS methods when applied with the RF model and the AdaBoost model

AdaBoost results: The results summarized in Table 2 reveal that HGSO consistently yielded the highest precision and AUC values on the majority of the experimental datasets. Specifically, on the KC1, KC2, PC1, EQ, and Lucene datasets, the highest AUC values were recorded for HGSO, with 0.80, 0.82, 0.89, 0.92, and 0.88, respectively. In addition, EO obtained the greatest recall values on most of the datasets. Finally, EO achieved the greatest average recall and F1-score values of 0.82 and 0.73, whereas HGSO demonstrated superior average precision and AUC, with values of 0.64 and 0.81, respectively.
ET results: It can be observed in Table 3 that EO achieved superior recall, F1-score, and AUC compared to the other wrapper-based FS methods for the CM1, KC1, KC2, and PC1 datasets. ASO obtained the highest precision on the EQ,
JDT, and Lucene datasets, with values of 0.82, 0.73, and 0.74, respectively. For the EQ dataset, EO demonstrated better predictive performance than all other FS models, achieving the highest recall and AUC values of 0.96 and 0.86, respectively. Again, considering the average values, EO outperformed all other FS models in terms of recall, F1-score, and AUC, with 0.77, 0.70, and 0.84, while the highest average precision was recorded for the ASO method.

Table 3. The performance (precision, recall, F1-score, and AUC) of the wrapper-based FS methods (Baseline, ABO, ASO, EO, HGSO, PRO, GNDO, SMA, HHO, PFA, and MRFO) when applied with the ET model on the CM1, KC1, KC2, PC1, JDT, Lucene, Mylyn, and PDE datasets, together with the average values

Research question 2: Which wrapper-based feature selection method performs best for selecting the optimal features for SFP?
Figure 2, in the Appendix, presents the AUC performance of the proposed wrapper-based FS methods. The x-axis indicates the names of the FS algorithms, while the y-axis shows their corresponding AUC scores. According to the chart, EO and PRO consistently delivered superior results across several datasets. Specifically, EO attained top AUC scores of 0.78, 0.90, 0.84, and 0.75 on KC1, EQ, Lucene, and PDE, respectively. Similarly, PRO achieved notable AUC values of 0.88, 0.93, and 0.86 on PC1, EQ, and JDT. To further analyze overall performance, the average AUC across the different machine learning classifiers was computed. All evaluated FS methods achieved high mean AUC values. Among them, EO in combination with random forest yielded the highest average score of 0.87, followed closely by PRO with random forest. On the other hand,
SMA coupled with ET obtained the lowest average AUC. These outcomes suggest that EO offers the most reliable FS strategy for SFP, effectively enhancing classifier performance. PRO and HGSO also demonstrated strong and consistent performance across multiple scenarios.

Statistical analysis
To compare the performance of the ten wrapper-based FS methods (ABO, ASO, EO, HGSO, PRO, GNDO, SMA, HHO, PFA, and MRFO) with that of the baseline method, the Wilcoxon signed-rank test was applied with a significance level of 0.05. Table 4 presents the detailed statistical test results for all evaluated ML algorithms and wrapper-based FS methods with respect to precision, recall, F1-score, and AUC, including P-values, effect sizes (r), and mean differences. An asterisk (*) denotes a statistically significant difference, corresponding to a P-value less than 0.05. The effect size r quantifies the magnitude of the performance difference between the compared methods, while the mean difference indicates which method achieves a higher average score. According to Table 4, several FS methods demonstrated statistically significant improvements in F1-score compared to the baseline. For example, ASO achieved a statistically significant improvement with a P-value of 0.026 and a large effect size, along with a positive mean difference. Similarly, HGSO and PRO demonstrated meaningful gains, with P-values of 0.022 and 0.033, and moderate effect sizes of 0.520 and 0.561, respectively. In contrast, EO showed a moderate effect size but a non-significant P-value of 0.293, indicating that its performance difference from the baseline may not be statistically reliable. Likewise, SMA and HHO showed P-values of 1.000, indicating no statistical difference from the baseline, even though their effect sizes were moderate (0.534 for SMA). These findings suggest that not all numerical differences translate into meaningful statistical significance. Overall,
ASO, HGSO, PRO, PFA, and MRFO stood out as statistically and practically effective FS methods for random forest and should be prioritized for real-world deployment in SFP models.

Table 4. Statistical test results (P-value, effect size r, and mean difference for precision, recall, F1-score, and AUC) for the comparison between each of the ten wrapper-based FS methods (ABO, ASO, EO, HGSO, PRO, GNDO, SMA, HHO, PFA, and MRFO) and the baseline method for random forest, ET, and AdaBoost; an asterisk (*) marks P-values below 0.05

CONCLUSION
The objective of SFP models is to build high-quality software with minimal testing effort by identifying faulty modules early in the software development process. The performance of SFP models, however, is influenced by the quality of the software defect datasets, which commonly suffer from high dimensionality and feature redundancy. Therefore, the elimination of irrelevant and redundant features from software fault datasets is an important step in building a robust SFP model. In this study, we present an evaluation of these wrapper strategies to select optimal features.
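As a companion to the statistical analysis above, the Wilcoxon signed-rank comparison with an effect size r derived from the normal approximation can be reproduced as follows; the paired AUC arrays are illustrative placeholders, not the paper's measurements:

```python
import numpy as np
from scipy import stats

# Illustrative paired AUC scores over nine datasets for one FS method
# versus the baseline (not the values reported in the paper).
fs_auc   = np.array([0.78, 0.84, 0.80, 0.88, 0.90, 0.75, 0.83, 0.86, 0.79])
base_auc = np.array([0.70, 0.77, 0.74, 0.83, 0.86, 0.72, 0.81, 0.85, 0.70])

# Paired, two-sided test over the per-dataset differences.
stat, p = stats.wilcoxon(fs_auc, base_auc)

# Effect size r = |Z| / sqrt(N), recovering Z from the two-sided P-value
# via the normal approximation; one common convention for this test.
z = abs(stats.norm.ppf(p / 2))
r = z / np.sqrt(len(fs_auc))
```

With every difference in the FS method's favor, the test rejects the null at the 0.05 level, and r quantifies how large that advantage is independently of sample size.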
Subsequently, we applied various ML techniques to the datasets with the previously extracted optimal features. Moreover, this study conducted a comparative performance analysis between the proposed wrapper-based FS methods and the baseline approach. The experimental results showed that the ten presented wrapper-based FS methods achieved better performance compared to the baseline method, which applied ML techniques to the original datasets. Moreover, the results indicated that EO was the most effective, with PRO and HGSO following. However, this study still has certain limitations. First, the datasets used are static and relatively well-balanced, which may not fully capture the challenges posed by real-world scenarios involving imbalanced distributions or data that change over time. Second, evaluating the methods using only three classifiers may limit the generalizability of the findings across other machine learning models. In the future, we aim to explore the integration of the proposed wrapper-based FS methods with sampling techniques such as SMOTE or ADASYN to address the challenges of high-dimensional and imbalanced datasets, thereby enhancing the performance of SFP models.

FUNDING INFORMATION
Funding for this research was provided by the Science and Technology Development Fund of the University of Danang, project number B2022-DN07-02.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Authors: Ha Thi Minh Phuong, Dang Thi Kim Ngan, Dao Khanh Duy, Nguyen Thanh Binh
Roles: C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review & Editing; Vi: Visualization; Su: Supervision; P: Project administration; Fu: Funding acquisition

CONFLICT OF INTEREST STATEMENT
There are no conflicts of interest regarding the publication of this paper.

DATA AVAILABILITY
The datasets and source code used can be accessed at https://github.com/Duycan17/Wrapper-Based.

REFERENCES