MUST: Journal of Mathematics Education, Science and Technology, July 2025, pp. 58-67. DOI: http://doi.org/10.30651/must

ENSEMBLE AND VOTING APPROACHES FOR DEFECT PREDICTION ACROSS MULTIPLE SOFTWARE PROJECTS

Kirso1, Agus Subekti2
1, 2 Universitas Nusa Mandiri
14230033@nusamandiri.id1, agus@nusamandiri.id2

Received 31 May 2025; revised 24 June 2025; accepted 26 June 2025.

ABSTRAK

This study experimented with ensemble methods, hyperparameter tuning, and voting to improve software defect prediction across multiple projects using the Kamei dataset. Five machine learning models (LightGBM, XGBoost, Random Forest, Extra Trees, and Gradient Boosting) were applied to six projects: Bugzilla, Columba, JDT, Mozilla, Platform, and Postgres. In general, the models performed well when tested on project datasets with similar characteristics or strong relationships, such as the Mozilla, JDT, and Platform projects, with accuracy and F1 scores above 80%. This indicates that defect patterns learned from one project can be applied effectively to other, similar projects. However, model performance declined significantly when predicting the Bugzilla project from other projects, indicating substantial differences in defect patterns or feature mismatch. Differences in data distribution between projects are the main challenge in CPDP. Domain adaptation or feature transformation techniques are therefore needed to reduce inter-project differences so that models can better recognize defect patterns across projects. Despite some improvement, inter-project data differences and class imbalance still limit prediction performance; future research needs to address these challenges.

Kata kunci: Cross Project Defect Prediction (CPDP), Ensemble Learning, Hyperparameter Tuning, Kamei Dataset, Voting

ABSTRACT

This study conducted experiments using ensemble methods, hyperparameter tuning, and voting to improve software defect prediction across multiple projects using the Kamei dataset. Five machine learning models (LightGBM, XGBoost, Random Forest, Extra Trees, and Gradient Boosting) were applied to six projects: Bugzilla, Columba, JDT, Mozilla, Platform, and Postgres. Overall, the models demonstrated good performance when tested on datasets of projects with similar characteristics or strong relationships, such as Mozilla, JDT, and Platform, achieving accuracy and F1 scores above 80%. This indicates that defect patterns learned from one project can be effectively applied to similar projects. However, the models' performance dropped significantly when predicting defects in the Bugzilla project from other projects, indicating notable differences in defect patterns or feature incompatibility. Differences in data distribution across projects remain a major challenge in CPDP. Therefore, domain adaptation techniques or feature transformation methods are needed to reduce inter-project differences, enabling the models to better recognize defect patterns across projects. Despite some improvements, data differences and class imbalance still limit prediction performance. Future research should address these challenges.

Keywords: Cross Project Defect Prediction (CPDP), Ensemble Learning, Hyperparameter Tuning, Kamei Dataset, Voting

INTRODUCTION

Cross-project defect prediction (CPDP) has emerged as a critical area within software engineering, offering a promising approach for enhancing software quality and maintainability by leveraging data from multiple software projects. This methodology utilizes historical defect data from source projects to predict potential defects in new target projects, thereby supporting a more efficient software development lifecycle (Bala et al., 2023; Lei et al., 2024). While within-project defect prediction (WPDP) methods rely on historical data from a single project, they often suffer from data scarcity, particularly in the early stages of development. CPDP addresses this limitation by enabling the reuse of defect data across different projects (Goel et al., 2022; Li et al., 2025). However, despite its potential, CPDP still faces challenges in ensuring model generalizability and handling heterogeneity across projects. Compared to previous studies, which primarily focused on improving prediction accuracy within isolated project environments, this study emphasizes the development and evaluation of more robust cross-project learning strategies to bridge these gaps and enhance defect prediction performance in data-scarce target projects.

Innovative CPDP techniques include novel hybrid models that combine classical machine learning methods, such as Support Vector Machines and Random Forests, with deep learning approaches, leading to improved prediction capabilities (Kumar & Saxena, 2024). Additionally, the issue of class imbalance, a prevalent challenge in CPDP, has prompted researchers to devise specialized frameworks aimed at enhancing the learning process and optimizing feature selection (Sekaran & Lawrence, 2025; Tahir et al.). Focusing on comprehensive feature representation and managing distributional variations is critical for ensuring that CPDP methods are both robust and reliable (Bala et al., 2023; Gul et al., 2023).

Machine learning techniques play a crucial role in developing CPDP models. By leveraging complex algorithms, ranging from classical models to advanced hybrid frameworks that incorporate deep learning, researchers aim to improve prediction accuracy and address challenges such as data imbalance and feature heterogeneity between source and target projects (Bhat & Farooq, 2021; Haque et al., 2024; Zhao et al., 2022).
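The core CPDP setup described above, training on a source project's defect data and testing on a distinct target project with ensemble voting over several learners, can be sketched as follows. This is a minimal illustration on synthetic stand-in data (the Kamei projects themselves are not loaded here), not the paper's exact pipeline; the tuning grid and drift offset are illustrative choices.

```python
# A minimal CPDP sketch on synthetic stand-ins for two Kamei projects:
# train tree ensembles on a "source" project, tune one of them, combine
# the learners with soft voting, and evaluate on a shifted "target".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV

# One synthetic "phenomenon", split into two projects; the target copy is
# shifted to mimic the inter-project distribution drift seen in CPDP.
X, y = make_classification(
    n_samples=1600, n_features=14, n_informative=8,
    weights=[0.75, 0.25],  # imbalanced classes, like the Kamei bug ratios
    random_state=42,
)
X_src, y_src = X[:800], y[:800]          # source project (training)
X_tgt, y_tgt = X[800:] + 0.3, y[800:]    # target project (testing)

# Hyperparameter tuning for one base learner (illustrative grid only).
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 8]},
    scoring="accuracy", cv=3,
)
rf_search.fit(X_src, y_src)

# Soft voting averages predicted class probabilities across the learners.
voter = VotingClassifier(
    estimators=[
        ("rf", rf_search.best_estimator_),
        ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
voter.fit(X_src, y_src)

pred = voter.predict(X_tgt)
proba = voter.predict_proba(X_tgt)[:, 1]
print(f"accuracy={accuracy_score(y_tgt, pred):.2f}  "
      f"F1={f1_score(y_tgt, pred):.2f}  "
      f"AUC={roc_auc_score(y_tgt, proba):.2f}")
```

Soft voting is used here because all three tree ensembles expose calibratable class probabilities; hard (majority) voting would discard that information.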
This study employs several approaches, combining multiple machine learning classifiers with parameter tuning and ensemble learning, to perform Cross-Project Defect Prediction (CPDP). The ensemble learning method used in this study is voting. The objective of this research is to conduct CPDP by applying hyperparameter tuning and ensemble learning using the PyCaret framework.

RESEARCH METHOD

This research was conducted by analyzing previously published datasets, specifically the Kamei dataset, which includes several open-source software projects: Bugzilla, Columba, JDT, Mozilla, Platform, and Postgres. The dataset is referenced from the study by Chen et al. (2022) and is publicly available at https://github.com/Kirso098/CPDP-Kamei. The research methodology comprises three primary stages: data acquisition, model development, and model evaluation. In the data acquisition phase, historical data from the Kamei dataset are systematically gathered and prepared for analysis. During the model development phase, each project dataset is used to train and test various machine learning algorithms, including LightGBM, XGBoost, Random Forest, Gradient Boosting, Extra Trees, AdaBoost, Decision Tree, K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Ridge Classifier, Logistic Regression, Support Vector Machine (SVM), Naive Bayes, and Quadratic Discriminant Analysis (QDA). The final stage, model evaluation, is conducted under a cross-project defect prediction (CPDP) framework: the model is trained on one project and tested on different projects to assess generalizability. The performance of each model is evaluated using standard classification metrics: Accuracy, Precision, Recall, F1-Score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC).

Figure 1. Research Method

Next, the top three algorithms with the highest accuracy are selected for hyperparameter tuning. The best results from each algorithm, both before and after tuning, are then combined using the voting method (ensemble learning). Among all the algorithms, the one with the best accuracy is selected as the final model.

RESEARCH RESULTS AND DISCUSSIONS

Table 1. Bug Ratio per Dataset

Dataset     Total Data   Attributes   Non Bug   Bug      Bug Ratio
Bugzilla    -            15           -         -        36.71%
Columba     4,455        15           -         -        30.55%
JDT         -            15           -         -        14.38%
Mozilla     98,275       15           -         -        5.24%
Platform    -            15           -         9,452    14.71%
Postgres    -            15           -         -        25.06%
("-": cell values not recoverable from the extracted text)

The table above presents the details of the six datasets used in this study: Bugzilla, Columba, JDT, Mozilla, Platform, and Postgres, all part of the Kamei dataset. Each dataset contains 15 attributes, with instance counts ranging from 4,455 entries (Columba) to 98,275 entries (Mozilla). In terms of the distribution between the Bug and Non-Bug classes, several datasets show significant imbalance. For example, the Mozilla dataset has a very low bug ratio of only 5.24%, while Bugzilla has the highest bug ratio at 36.71%. This indicates that most datasets face challenges related to imbalanced classification, which must be addressed during predictive model training. The JDT and Platform datasets have relatively large absolute numbers of bug instances (…,089 and 9,452, respectively), but due to the overall size of the datasets, their bug ratios remain low. This highlights the heterogeneity among the datasets in both size and class distribution, making Cross-Project Defect Prediction (CPDP) experiments particularly relevant and necessary.

The following tables present the results of the experiment conducted in this study, which involved six training models evaluated with cross-dataset testing.

Table 2. CPDP With Bugzilla Model

Model - Test Data      Accuracy   Precision   Recall   F1-Score
Bugzilla - Columba
Bugzilla - JDT
Bugzilla - Mozilla
Bugzilla - Platform
Bugzilla - Postgres
(metric values not recoverable from the extracted text)

Based on Table 2, the Bugzilla model performs fairly well overall. It performs lowest on the Columba dataset, with an accuracy of 69%, indicating some difficulty in recognizing patterns within this data. Conversely, on the JDT, Mozilla, and Platform datasets, the model demonstrates excellent performance, with accuracy, precision, recall, and F1-score consistently around 80%. Notably, on the Mozilla dataset, precision reaches 93%, meaning the model is highly accurate in predicting positive cases and rarely makes errors. Meanwhile, the Postgres dataset shows decent performance, though not as strong as the three datasets mentioned earlier.

Table 3. CPDP With Columba Model

Model - Test Data      Accuracy   Precision   Recall   F1-Score
Columba - Bugzilla
Columba - JDT
Columba - Mozilla
Columba - Platform
Columba - Postgres
(metric values not recoverable from the extracted text)

Based on the results in Table 3, the evaluation of the Columba model on five test datasets shows generally good overall performance. The model achieved its lowest results on the Bugzilla dataset, with accuracy, precision, and recall at 69% and a correspondingly low F1-score, indicating that the model was not optimal in recognizing patterns within that dataset. However, the model performed very well on the JDT, Platform, and especially Mozilla datasets. On the Mozilla dataset, it achieved 85% accuracy, the highest F1-score of 0.88, and 93% precision, indicating a high level of accuracy in predicting positive instances. Meanwhile, on the Postgres dataset, the model still showed good performance, with 78% accuracy.

Table 4.
CPDP With JDT Model

Model - Test Data   Accuracy   Precision   Recall   F1-Score
JDT - Bugzilla
JDT - Columba
JDT - Mozilla
JDT - Platform
JDT - Postgres
(metric values not recoverable from the extracted text)

Based on the results in Table 4, the evaluation of the JDT model on five test datasets shows quite varied performance. The lowest performance was observed on the Bugzilla dataset, with only 67% accuracy and an F1-score of 0.61, indicating that the model still struggles to recognize patterns in this dataset. The model's performance improved when tested on the Columba dataset, achieving 73% accuracy. The best performance was achieved on the Mozilla dataset, where the model reached 87% accuracy, 93% precision, 87% recall, and the highest F1-score among these tests. Excellent results were also seen on the Platform and Postgres datasets, with accuracies of 84% and 80%. Overall, the JDT model performs very well, especially on the Mozilla and Platform datasets, but still requires improvement when applied to the Bugzilla dataset to ensure greater stability and generalization.

Table 5. CPDP With Mozilla Model

Model - Test Data   Accuracy   Precision   Recall   F1-Score
Mozilla - Bugzilla
Mozilla - Columba
Mozilla - JDT
Mozilla - Platform
Mozilla - Postgres
(metric values not recoverable from the extracted text)

Based on the results in Table 5, the evaluation of the Mozilla model on five test datasets produced varied outcomes. The lowest performance occurred on the Bugzilla dataset, with 63% accuracy, indicating that the model had difficulty recognizing patterns in this dataset. Performance improved on the Columba dataset, with 70% accuracy. The best results were obtained when the Mozilla model was tested on the JDT and Platform datasets, with accuracies of 86% and 85%, respectively, and nearly identical F1-scores.
This indicates that the Mozilla model was able to understand the patterns in the JDT and Platform datasets very well. Meanwhile, on the Postgres dataset, performance remained good, with 75% accuracy. Overall, the Mozilla model performed well, especially when tested on the JDT and Platform datasets, although improvements are still needed for the Bugzilla dataset.

Table 6. CPDP With Platform Model

Model - Test Data      Accuracy   Precision   Recall   F1-Score
Platform - Bugzilla
Platform - Columba
Platform - JDT
Platform - Mozilla
Platform - Postgres
(metric values not recoverable from the extracted text)

Based on the Platform model's testing on five test datasets, its performance is quite stable and robust. The lowest performance occurred on the Bugzilla data, with an accuracy of 69% and an F1-score of 0.62, indicating results that still need improvement. Performance improved on the Columba data, with an accuracy of 72%. The model showed very good results when tested with the JDT and Mozilla data, achieving accuracies of 86% and 90%, with F1-scores of 0.82 (JDT) and the highest value among all tests (Mozilla). Meanwhile, on the Postgres data, the model also demonstrated fairly good performance, with an accuracy of 80%. Overall, the Platform model performed consistently and excellently, especially when tested with the Mozilla and JDT data.

Table 7. CPDP With Postgres Model

Model - Test Data      Accuracy   Precision   Recall   F1-Score
Postgres - Bugzilla
Postgres - Columba
Postgres - JDT
Postgres - Mozilla
Postgres - Platform
(metric values not recoverable from the extracted text)

Based on the testing results of the Postgres model, its overall performance appears to be quite good. The lowest values occurred when tested with the Bugzilla data, with an accuracy of 66% and an F1-score of 0.58, indicating relatively low performance on that dataset. The results improved when tested with the Columba data.
The model demonstrated high and consistent performance when tested on the JDT and Platform data, with similar accuracy and F1-scores on both. The best performance was shown when tested on the Mozilla data, with an accuracy of 89% and the highest F1-score. Overall, the Postgres model exhibits very good performance, especially on the Mozilla data, and remains stable on the other datasets.

CONCLUSIONS

This study shows that the performance of models in Cross-Project Defect Prediction (CPDP) is highly influenced by the similarity between the source and target projects. The models perform well when applied to projects with similar characteristics, such as Mozilla, JDT, and Platform, achieving accuracy and F1-scores above 80%. This indicates that defect patterns learned from one project can be used effectively to predict defects in other, similar projects. On the other hand, performance drops significantly when predicting defects in the Bugzilla project, likely due to differences in defect patterns or feature incompatibility.

Theoretically, this finding highlights the importance of project similarity for improving model accuracy and generalization in CPDP. Practically, it suggests that prediction models should be trained on data from projects that are structurally similar to the target project to ensure more reliable predictions. This study still has several limitations, such as data distribution differences and class imbalance across projects, which ensemble methods alone could not fully resolve. It also does not include deep learning or transfer learning approaches, which may offer better feature abstraction. In addition, the dataset used was limited to several open-source projects and may not represent all real-world software environments.
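One lightweight feature-transformation remedy for the inter-project distribution gaps noted above is to standardize each project's features using that project's own statistics, a common z-score alignment heuristic in CPDP work. The sketch below is illustrative only: the toy arrays and parameter values are assumptions, not the paper's own procedure or data.

```python
# Sketch of a simple CPDP feature-alignment step: z-score each project
# with its own mean and standard deviation so source and target features
# sit on a comparable scale before training.
import numpy as np

def zscore_align(X_source, X_target):
    # Normalizing each project by its own statistics removes systematic
    # per-project offsets in metric scales (e.g. lines changed, churn).
    def z(X):
        mu, sigma = X.mean(axis=0), X.std(axis=0)
        return (X - mu) / np.where(sigma == 0, 1.0, sigma)  # guard /0
    return z(X_source), z(X_target)

# Two small toy "projects" whose features share shape but differ in scale.
rng = np.random.default_rng(0)
X_src = rng.normal(loc=10.0, scale=2.0, size=(100, 4))
X_tgt = rng.normal(loc=50.0, scale=9.0, size=(100, 4))

Xs, Xt = zscore_align(X_src, X_tgt)
print(Xs.mean(axis=0).round(2), Xt.mean(axis=0).round(2))  # both near 0
```

After alignment both projects have zero-mean, unit-variance features, so a model trained on the source no longer sees the target's raw scale shift; fuller domain-adaptation methods go further by also aligning feature correlations.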
For future research, it is recommended to apply domain adaptation techniques or feature transformation methods to align data distributions across projects. Exploring hybrid models that combine machine learning with deep learning or meta-learning may also improve CPDP performance. Additionally, expanding the dataset to include more diverse projects and incorporating temporal or semantic information from code may provide deeper insights into defect patterns across projects.

ACKNOWLEDGEMENTS

Thank you to Li-qiong Chen, Can Wang, and Shi-long Song for providing this valuable dataset.

REFERENCES

Bala, Samat, Sharif, & Manshor. (2023). Improving Cross-Project Software Defect Prediction Method Through Transformation and Feature Selection Approach. IEEE Access, 11, 2318-2326. https://doi.org/10.1109/ACCESS

Bhat, N. A., & Farooq, S. U. (2021). Local modeling approach for cross-project defect prediction. Intelligent Decision Technologies, 15, 623-637. https://doi.org/10.3233/IDT-210130

Chen, L., Wang, C., & Song, S. (2022). Software defect prediction based on nested-stacking and heterogeneous feature selection. Complex & Intelligent Systems, 8, 3333-3348. https://doi.org/10.1007/s40747-022-00676-y

Goel, Nandal, & Gupta. (2022). An optimized approach for class imbalance problem in heterogeneous cross project defect prediction [peer review: 1 approved with reservations]. F1000Research, 11. https://doi.org/10.12688/f1000research

Gul, Faiz, Bin, Aljaidi, Samara, Alsarhan, & al-Qerem. (2023). Impact Evaluation of Significant Feature Set in Cross Project for Defect Prediction through Hybrid Feature Selection in Multiclass. bioRxiv. https://doi.org/10.1101/2023

Haque, Ali, McClean, Cleland, & Noppen. (2024). Heterogeneous Cross-Project Defect Prediction Using Encoder Networks and Transfer Learning. IEEE Access, 409-419. https://doi.org/10.1109/ACCESS

Kumar, & Saxena. (2024). Software Defect Prediction Using Hybrid Machine Learning Techniques: A Comparative Study. Journal of Software Engineering and Applications, 17, 155-171. https://doi.org/10.4236/jsea

Lei, Xue, Man, Wang, Li, & Kong. (2024). SDP-MTF: A Composite Transfer Learning and Feature Fusion for Cross-Project Software Defect Prediction. Electronics, 13(13). https://doi.org/10.3390/electronics13132439

Li, Wang, & Shi. (2025). Within-project and cross-project defect prediction based on model averaging. Scientific Reports, 15, 6390. https://doi.org/10.1038/s41598-025-90832-4

Sekaran, & Lawrence. (2025). Leveraging Levy Flight and Greylag Goose Optimization for Enhanced Cross-Project Defect Prediction in Software Evolution. Journal of Software: Evolution and Process, 37, e70013. https://doi.org/10.1002/smr

Tahir, Gencel, Rasool, Umer, Rasheed, Yeo, & Cevik. Early Software Defects Density Prediction: Training the International Software Benchmarking Cross Projects Data Using Supervised Learning. IEEE Access, 141965-141986. https://doi.org/10.1109/ACCESS

Zhao, Zhu, Yu, & Chen. (2022). Cross-Project Defect Prediction Considering Multiple Data Distribution Simultaneously. Symmetry, 14(2), 401. https://doi.org/10.3390/sym14020401