JENTIK: Jurnal Pendidikan Teknologi Informasi dan Komunikasi
https://ejournal.com/index.php/jentik

Wrapper Feature Selection Method for Predicting Student Dropout in Higher Education

Anuradha Kumari Singh1*, Karthikeyan2
Department of Computer Science, Banaras Hindu University, India
anuradha@bhu.

Article Information: Received June 05, 2025; Revised July 09, 2025; Accepted July 22, 2025

Keywords: Ant Colony Optimization; Artificial Intelligence in Education; Machine Learning; Student Dropout; Wrapper Technique

Abstract

Background of Study: Student dropout in higher education is influenced by a variety of factors, including demographic, socioeconomic, macroeconomic, admission-related, and academic performance data. Accurately identifying students at risk of dropping out is a significant challenge within educational data mining (EDM), especially when working with large, complex datasets.

Aims and Scope of Paper: This study aims to identify an optimal subset of features that can improve the accuracy of student dropout prediction. The scope includes comparing the effectiveness of different machine learning algorithms combined with a heuristic-based feature selection method to find the best-performing model.

Methods: A Wrapper-based feature selection approach was employed using Ant Colony Optimization (ACO) as the search strategy. ACO was integrated with five classifiers, Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Neural Network (NN), to select the most relevant feature subsets. The performance of each combination was evaluated and compared.

Result: The study found that ACO combined with Random Forest (ACO-RF) outperformed the other combinations in feature selection effectiveness. The selected features were then validated using various machine learning algorithms and a neural network. Among them, the neural network achieved the highest accuracy of 93%.

Conclusion: The proposed ACO-RF wrapper method is an effective feature selection strategy for predicting student dropout in higher education. The method enhances model performance, especially when used with neural networks, and offers a promising approach for early identification of at-risk students.

How to Cite: Singh, A. K., & Karthikeyan. Wrapper Feature Selection Method for Predicting Student Dropout in Higher Education. JENTIK: Jurnal Pendidikan Teknologi Informasi dan Komunikasi, 4(1), 58-76. https://doi.org/10.58723/jentik.
ISSN: 2963-1963. Published by: CV Media Inti Teknologi.

Introduction

Educational Data Mining (EDM) is an important approach to learning analytics, with extensive applications in tasks such as knowledge tracing, recommendation systems, aiding data-driven decision-making, predicting performance and early dropout of students, and many more. Predicting students' dropout in higher education institutions is one of the most challenging problems in EDM. EDM applies machine learning, statistics, data mining, educational psychology, cognitive psychology, and related methodologies to examine educational data. According to Aleem and Gore, EDM methodology can be classified into six groups: data extraction, prediction, association mining, structure discovery, model-based recognition, and hybrid approaches. AI and machine learning technologies have the potential to enhance the learning experience of students, and machine learning approaches have been used in several studies to assess student performance (Aulck et al.; Li et al.).
In this work, undergraduate students' demographic, socioeconomic, macroeconomic, admission, and first- and second-semester academic performance data have been used to predict student dropout, and machine learning algorithms have been applied to address this prediction problem.

Student Dropout Prediction

In higher education research, building on Tinto's seminal paper, (Nicoletti) introduced a new method for modeling and analyzing student dropouts. Another study (Siri) examined 810 enrollees in a health care professions degree program at the University of Genoa during the 2008-09 academic year. Data collection included administrative records of student progress, statistical data from a custom survey, and information from telephone interviews with dropout students; the authors utilized an artificial neural network (ANN) for the classification task. In another study addressing university dropout (Kadar et al.), the focus is on predicting student dropout using a distinct student representation. This approach collects data through webcams, eye trackers, and similar devices within a smart classroom environment; by analyzing emotions and detecting patterns among students in the classroom, the study aimed to forecast potential dropouts. Another paper performs an integral systematic literature review of university dropout prediction through data mining, covering studies from 2006-2018 (Alban & Mauricio). In the research conducted by Del Bonifro et al., a tool assessed the likelihood of a student discontinuing an academic program; its application is flexible and suitable for use during the application process or within the first-year course credits. The study explored classification methods including Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and Random Forest (RF). Wan Yaacob et al. explored data mining techniques to predict dropout rates of Computer Science undergraduate students at University Technology MARA after three years of enrollment. The study compared various classifiers, including LR, RF, KNN, and a neural network, to determine the most effective approach; their analysis revealed that the LR model outperformed the others in predicting dropout among students and identifying potential subject-related causes. In another paper (Gardner & Brooks), the authors tried to find the best model for predicting student dropout rates in MOOCs using the Friedman and Nemenyi two-stage procedure. A great deal of research has been conducted to determine the optimal model and factors for predicting student dropout rates, both in MOOCs and in higher education institutions.

To build a dropout prediction model, we faced the challenge of identifying the proper features that influence early student dropout from a course. To overcome this challenge, we explored different feature selection methods. Feature selection (FS) is the process of selecting a relevant subset of features from the original set. Villa-Blanco et al. classified feature subsets into four kinds depending on their relevance and redundancy: (1) noisy and irrelevant, (2) redundant and weakly relevant, (3) weakly relevant and non-redundant, and (4) strongly relevant. Features that do not contribute to prediction accuracy are referred to as irrelevant features. FS methods are classified into three types based on their interaction with the learning model: Filter,
Wrapper, and Embedded methods. The Filter method selects features using statistical measures, operating independently of the learning algorithm and demanding less computational effort. Statistical measures such as information gain (Sypyrtyly et al.), the chi-square test, Fisher score, correlation coefficient, and variance threshold are employed to assess the significance of features. The Wrapper method's efficacy relies on the classifier employed, as it chooses the most suitable feature subset based on classifier performance. While Wrapper methods incur greater computational costs than Filter methods due to iterative learning and cross-validation, they yield higher accuracy. The Embedded method, on the other hand, utilizes ensemble learning and hybrid learning techniques to perform FS.

Feature Selection

Feature Selection (FS) is the process of identifying a subset of features from the original feature space. Many methods exist for feature selection, such as Sequential Backward Selection (SBS) and Sequential Forward Selection (SFS). The SFS method starts with the smallest number of features and incrementally adds features throughout the selection process. In contrast, the SBS method starts with all features and iteratively removes the least significant ones. After completing the process, the subset of features that persists in both methods is deemed the optimal subset. There are three types of FS methods: the Filter method, the Wrapper method, and the Embedded method; Figure 1 presents an overview of them. In our study, we employed the wrapper method; a brief sketch of the SFS/SBS idea is given below.
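To make the SFS/SBS contrast concrete, the following minimal sketch uses scikit-learn's SequentialFeatureSelector; the synthetic data, the classifier, and all parameter values are illustrative assumptions, not the configuration used in this study.

```python
# Minimal sketch of SFS vs. SBS with scikit-learn's SequentialFeatureSelector.
# The synthetic data and all parameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           random_state=42)
clf = LogisticRegression(max_iter=1000)

# SFS: start from an empty set and greedily add the most useful features.
sfs = SequentialFeatureSelector(clf, n_features_to_select=6,
                                direction="forward", cv=5).fit(X, y)

# SBS: start from the full set and greedily remove the least useful features.
sbs = SequentialFeatureSelector(clf, n_features_to_select=6,
                                direction="backward", cv=5).fit(X, y)

print("SFS selected:", sfs.get_support(indices=True))
print("SBS selected:", sbs.get_support(indices=True))
```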
Many studies have utilized evolutionary algorithms to reduce the feature set. In one work (Antony Gnana Singh et al.), the authors used a genetic algorithm (GA) for dimensionality reduction to enhance accuracy in the classification process. Another study (Mukhlif et al.) introduced an ant colony optimization (ACO) technique for FS and indicated that ACO exhibits reduced computational complexity compared to stochastic algorithms like GA. A further study (Zahedi et al.) introduced an automated bee-colony-based algorithm for FS aimed at addressing a classification problem. Xiao et al. proposed the RnkHEU model, which combines a ranking-based forward search with a heuristic search technique and showed potential for enhancing the accuracy of predicting student performance. Additionally, Turabieh et al. proposed a modified version of the Harris Hawks Optimization (HHO) algorithm as a feature selection method to identify the most valuable features for the student performance prediction problem.

Most studies on student dropout prediction use filter methods. For example, one study (Cheng et al.) applied genetic algorithms for FS in dropout prediction. Another compared correlation-based and entropy-based FS methods (Setiadi et al.). Febro evaluated various filter methods (i.e., correlation analysis, chi-square, and gain ratio) to identify retention factors. Nuanmeesri et al. used filter methods and multilayer perceptron neural networks to forecast dropouts during the COVID-19 pandemic. We did not find studies that apply wrapper methods for student dropout prediction in higher education; one exception in a related setting is Youssef et al., who used a wrapper approach for predicting dropouts in MOOCs. Table 1 summarizes recent literature on feature selection techniques used for student dropout prediction.

Figure 1. Overview of Filter, Wrapper, and Embedded methods

Table 1. Related literature on feature selection techniques used for student dropout prediction (techniques include: genetic algorithm; correlation-based and entropy-based feature selection; filter methods such as correlation feature selection, chi-square analysis, and gain ratio; wrapper method; ensemble method; ensemble feature selection; filter methods with a classification algorithm; and a hybrid integrating filter-based and wrapper-based methods)

Ant Colony Optimization

Ant Colony Optimization (ACO) is inspired by the foraging behavior of ants, which relies on indirect communication through chemical pheromone trails. This communication helps ants discover the shortest paths between their nest and food sources. In ACO, optimization is achieved by continuously updating pheromone trails and moving ants throughout the search space based on simple mathematical formulas involving transition probabilities and the total pheromone in a given area. During each iteration, ACO generates global ants and evaluates their fitness. In this study, the fitness score is calculated using the classifier's accuracy, as shown in equation (1). The pheromone levels and edges in weaker regions are updated accordingly. If the fitness improves, local ants are directed toward better regions; otherwise, a new random search direction is chosen. The pheromone levels are then adjusted, including evaporation to reduce old pheromones.

$$\text{fitness} = \text{accuracy} \tag{1}$$

Continuous ACO incorporates both local and global search mechanisms. Local ants are guided toward the most promising regions based on the transition probability of region $K$, shown in equation (2), aiming to find the optimal solution:

$$P_K(t) = \frac{\tau_K(t)}{\sum_{k=1}^{n} \tau_k(t)} \tag{2}$$

where $\tau_K(t)$ is the total pheromone at region $K$ and $n$ is the number of global ants. Pheromone is updated using equation (3):

$$\tau_K(t+1) = (1 - r)\,\tau_K(t) + \Delta\tau_K(t) \tag{3}$$

where $r$ is the pheromone evaporation rate. The probability that local ants choose a region is proportional to its pheromone trail. The process flow for ACO is shown in Figure 2.

Figure 2. Process flow for ACO
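As a concrete illustration of equations (2) and (3), the following NumPy sketch computes region-selection probabilities and applies one evaporation-plus-deposit update; the pheromone values and deposit amounts are made-up numbers, and the deposit rule (proportional to the fitness found in a region) is an assumption, not the paper's exact formulation.

```python
# Sketch of the ACO transition probability (Eq. 2) and pheromone update (Eq. 3).
# All numeric values are illustrative; the deposit rule is an assumption.
import numpy as np

tau = np.array([0.5, 1.2, 0.8, 2.0])  # total pheromone per region, tau_k(t)
r = 0.1                                # pheromone evaporation rate

# Eq. (2): probability of guiding a local ant to each region K.
probs = tau / tau.sum()

# Eq. (3): evaporate old pheromone, then deposit new pheromone
# (here, proportional to the fitness found in each region).
delta_tau = np.array([0.0, 0.3, 0.0, 0.5])
tau = (1 - r) * tau + delta_tau

print("transition probabilities:", probs)
print("updated pheromone:", tau)
```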
Motivation and Justification: While Filter methods are widely used due to their simplicity, they may overlook complex interactions among features that are critical for dropout prediction. Similarly, Embedded methods, though efficient, often require tight coupling with specific algorithms. Therefore, this study adopts the Wrapper approach, which, despite its higher computational cost, allows feature selection tailored to the learning algorithm's performance, leading to better model outcomes. To address the limitations of deterministic wrapper methods, we employ a heuristic search strategy, Ant Colony Optimization (ACO), due to its proven ability to explore large search spaces effectively and avoid local optima. Despite ACO's known utility in feature selection tasks, its application to student dropout prediction, especially within a wrapper framework, remains underexplored. Our work fills that gap.

Study Objectives: Our goal is to develop a model that identifies the most informative and non-redundant features for predicting student dropout. We applied the Wrapper method for FS using ACO as the search strategy. Given the class imbalance in our dataset, we used the SMOTE-Tomek links method to balance the data. Subsequently, we tested ACO with five classifiers, Random Forest (ACO-RF), Logistic Regression (ACO-LR), K-Nearest Neighbors (ACO-KNN), Support Vector Machine (ACO-SVM), and Neural Network (ACO-NN), to evaluate the quality of the selected features. The Random Forest classifier yielded the most optimal feature subset and was selected for further analysis. We propose the following research questions (RQs):

RQ1: Is it possible to obtain an optimal feature set to predict student dropout using ACO-RF?
RQ2: Does ACO-RF help to increase the accuracy of the models?
RQ3: Do neural networks perform better than traditional machine learning algorithms on the selected features?

Addressing these questions is crucial within the context of Educational Data Mining because they directly impact the reliability, interpretability, and applicability of dropout prediction systems. Identifying the most relevant features (RQ1) supports the development of simpler and more interpretable models, which are essential for institutional stakeholders. Demonstrating that ACO-RF improves model performance (RQ2) validates the use of metaheuristic search strategies in educational domains. Finally, comparing neural networks and traditional models (RQ3) helps determine the appropriate computational complexity for deployment in real-world educational settings. By answering these RQs, the study advances EDM practices and provides actionable insights for early intervention policies in higher education.

The rest of the article is structured as follows: Section B details the proposed methodology, Section C presents results and discussion, and Section D concludes the study and outlines future directions.

Methodology

The steps of the proposed methodology are detailed below and illustrated in Figure 3.

1. Preprocessing of the Dataset: This step involves handling missing values, encoding categorical variables, and scaling numerical features to prepare the data for further analysis.

2. Balancing the Imbalanced Dataset using SMOTE-Tomek Links: SMOTE (Synthetic Minority Oversampling Technique) addresses class imbalance by generating synthetic examples of the minority class. Tomek links identify pairs of instances from different classes that are close to each other, and removing these pairs can improve the decision boundary. Combining SMOTE with Tomek links helps balance the dataset effectively.

3. Splitting the Dataset into Train and Test Sets: The dataset is divided into training and testing sets. The training set is used to train the model, while the test set is used to evaluate its performance.

Figure 3. Overview of proposed methodology

4. Applying ACO Techniques for Feature Selection:
- Initialize the Population: Start with a set of candidate solutions (feature subsets).
- Construct the Solution: Build solutions incrementally using heuristics or stochastic methods.
- Binary Encoding of Features in the Solution: Represent each feature subset as a binary string.
Binary encoding was chosen to represent feature subsets because it offers a simple, compact, and efficient representation of the feature selection problem. In this format, each feature is represented by a binary value: 1 if the feature is selected, and 0 if it is not. This encoding is highly compatible with metaheuristic algorithms like Ant Colony Optimization (ACO), which explore solution spaces through combinatorial search. Binary encoding allows the algorithm to evaluate and update feature combinations in a structured way, making it easier to track inclusion/exclusion patterns, apply pheromone updates, and compute fitness scores. Moreover, this representation reduces the complexity of the search space and enhances convergence efficiency while maintaining flexibility across different classifier types.

- Select Feature Subset Using a Classifier: Evaluate the quality of each feature subset using classifiers such as Random Forest, SVM, Logistic Regression, Neural Network, or KNN.
- Update the Best Feature Subset: Keep track of the best-performing feature subset found so far.
- Perform Local Pheromone Updating: Update pheromone levels based on the quality of solutions found within a local neighborhood.
- Perform Global Pheromone Updating Based on the Best Feature Subset: Update pheromone levels globally based on the performance of the best solution identified.
- Return the Best Feature Subset and Best Fitness Score: Once the algorithm converges or reaches a stopping criterion, return the best feature subset found and its corresponding fitness score.

5. Performance Validation of the Best Feature Subset Using Five Learning Algorithms: Train and evaluate models using the best feature subset obtained from the ACO technique. This step employs five different learning algorithms for comparison.

6. Performance Evaluation of Models Using Metrics: Evaluate each model's performance using standard classification metrics such as accuracy, precision, recall, F1-score, and AUC-ROC (Area Under the Receiver Operating Characteristic Curve). These metrics provide insights into the models' classification accuracy and their ability to correctly identify positive and negative instances.

Proposed Algorithm

Algorithm 1: ACO solution construction algorithm
Input: num_ants = 10, alpha = 1.0, beta = 2.0, num_features = number of features in X, initial pheromone level
Output: solutions
1: Initialize empty list solutions
2: For each ant in range(num_ants):
3:   Initialize empty list features
4:   For each feature index in range(num_features):
5:     Calculate selection probabilities probs using pheromone levels and heuristic information
6:     Normalize probs so that they sum to 1
7:     Randomly select a feature index based on the probabilities probs
8:     Append the selected feature index to features
9:   Encode the features list into a solution format
10:  Append the encoded solution to solutions
11: Return solutions

Algorithm 1 illustrates the solution construction process using the ACO technique. The parameter num_ants determines how many solutions are constructed per iteration. The parameters alpha and beta control the relative importance of pheromone levels and heuristic information, respectively. Pheromone levels guide the feature selection process, while num_features represents the total number of features in the dataset. For this experiment, we empirically chose the values of the ACO optimizer's parameters: num_ants = 10, alpha = 1.0, beta = 2.0, and the initial pheromone level.
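A rough Python rendering of Algorithm 1 is given below. The uniform heuristic values and the initial pheromone constant (0.5 here) are assumptions, and encoding a solution as "1 if an index was chosen at least once" is one plausible reading of the encoding step, not the paper's verified implementation.

```python
# Rough, runnable rendering of Algorithm 1 (ACO solution construction).
# The heuristic values and the initial pheromone level are assumptions.
import numpy as np

rng = np.random.default_rng(0)

num_ants, alpha, beta = 10, 1.0, 2.0
num_features = 34                        # e.g., the number of columns in X
pheromone = np.full(num_features, 0.5)   # assumed initial pheromone level
heuristic = np.ones(num_features)        # assumed uniform heuristic information

def construct_solutions():
    solutions = []
    for _ in range(num_ants):
        features = []
        for _ in range(num_features):
            # Selection probabilities from pheromone and heuristic information,
            # normalized so that they sum to 1.
            probs = (pheromone ** alpha) * (heuristic ** beta)
            probs /= probs.sum()
            features.append(rng.choice(num_features, p=probs))
        # Binary encoding: mark an index 1 if it was chosen at least once.
        solution = np.zeros(num_features, dtype=int)
        solution[sorted(set(features))] = 1
        solutions.append(solution)
    return solutions

print(construct_solutions()[0])
```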
The proposed ACO-RF algorithm is detailed in Algorithm 2. The workflow begins with a main loop that continues until the termination criterion is met (a maximum of 100 iterations). Effective termination criteria help minimize the computational cost of identifying an optimal feature subset and prevent overfitting. The choice of stopping criteria is influenced by decisions made in earlier stages. Common stopping criteria include: (1) a predefined number of features, (2) a predefined number of iterations, (3) a percentage of improvement over successive iteration steps, and (4) criteria based on the evaluation function. In our proposed algorithm, we used a predefined number of iterations.

Algorithm 2: Proposed ACO-RF feature selection algorithm
Input: best_solution = None, best_score = -∞, iteration = 0
Output: selected features, fitness score
1: While termination criterion not met:
2:   solutions = construct_solutions()
3:   For each solution in solutions:
4:     selected_features = indices of selected features in solution
5:     score = evaluate solution with Random Forest (training data with selected features, training labels, test data with selected features, test labels)
6:     If score > best_score:
7:       Update best_score = score
8:       Update best_solution = selected_features
9:     Perform local update on selected features
10:  Perform global update on best_solution
11:  Increment iteration by 1
12: Return best_solution and best_score

Within each iteration, the first step constructs the solutions using Algorithm 1. In the second step, each solution in solutions is evaluated: the selected features of each solution are identified, and the Random Forest model is trained and tested using these features. If the current score surpasses the best score, both best_score and best_solution are updated. A local pheromone update is then performed based on the selected features. The third step involves a global pheromone update using the best solution. Once the loop concludes, the algorithm returns the optimal feature subset along with its fitness score.
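The loop can be sketched in Python as follows, reusing construct_solutions() and the pheromone array from the Algorithm 1 sketch above; the simplified pheromone-update rule is an assumption standing in for the paper's local and global updates.

```python
# Sketch of the ACO-RF loop (Algorithm 2). Reuses construct_solutions() and
# `pheromone` from the Algorithm 1 sketch; the update rule is a simplified
# stand-in for the local/global pheromone updates.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def evaluate_solution(solution, X_tr, y_tr, X_te, y_te):
    """Fitness = test accuracy of an RF trained on the selected features (Eq. 1)."""
    idx = np.flatnonzero(solution)
    if idx.size == 0:
        return 0.0
    rf = RandomForestClassifier(random_state=0).fit(X_tr[:, idx], y_tr)
    return accuracy_score(y_te, rf.predict(X_te[:, idx]))

def pheromone_update(solution, score, r=0.1):
    """Simplified Eq. (3): evaporate, then deposit in proportion to fitness."""
    global pheromone
    pheromone = (1 - r) * pheromone + score * solution

def aco_rf(X_tr, y_tr, X_te, y_te, max_iter=100):
    best_solution, best_score = None, -np.inf
    for _ in range(max_iter):                  # stopping rule: fixed iterations
        for solution in construct_solutions():
            score = evaluate_solution(solution, X_tr, y_tr, X_te, y_te)
            if score > best_score:
                best_score, best_solution = score, solution
            pheromone_update(solution, score)          # local update
        pheromone_update(best_solution, best_score)    # global update on best
    return np.flatnonzero(best_solution), best_score
```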
Dataset

We used a public dataset (Realinho et al.) from the Polytechnic Institute of Portalegre, the same dataset as in our previous study (Singh & Karthikeyan). The dataset is 743 KB in size and contains 4,424 records with 37 attributes. It includes macroeconomic, socioeconomic, demographic, and admission data, together with first- and second-semester internal and external exam data from the institution. The original dataset has three target classes, dropout, graduate, and enrolled, but for our work we used only two classes, i.e., dropout and graduate. To refine the classification task, we excluded students labeled as "enrolled" and focused solely on the two outcome-based categories. This decision was based on the fact that enrolled students had not yet completed their academic journey at the time of data collection; including them would introduce ambiguity, as their final status, whether they would eventually graduate or drop out, was unknown. Retaining only instances with definitive outcomes enables the model to learn more distinct patterns and relationships, thereby improving the reliability and interpretability of dropout prediction results. Hence, after removing enrolled records, we were left with 3,630 records.

Table 2 provides a comprehensive breakdown of the dataset's attributes, categorized into distinct classes: demographic characteristics, socioeconomic factors, macroeconomic indicators, data at the time of student enrollment, and academic data of the first and second semesters.

Table 2. Attributes used, grouped by class of attribute
- Demographic data: Marital status (numeric/discrete), Nationality (numeric/discrete), Age at enrollment (numeric/discrete), Displaced (numeric/binary), Gender (numeric/binary), International (numeric/binary)
- Socioeconomic data: Father's qualification (numeric/discrete), Mother's qualification (numeric/discrete), Mother's occupation (numeric/discrete), Father's occupation (numeric/discrete), Educational special needs (numeric/binary), Debtor (numeric/binary), Tuition fees up to date (numeric/binary), Scholarship holder (numeric/binary)
- Macroeconomic data: Unemployment rate (numeric/continuous), Inflation (numeric/continuous), GDP (numeric/continuous)
- Admission data at enrollment: Application mode (numeric/discrete), Application order (numeric/ordinal), Course (numeric/discrete), Daytime/evening attendance (numeric/binary), Previous qualification (numeric/discrete)
- Academic data at the end of the 1st semester: Curricular units 1st semester (credited, enrolled, evaluations, approved, grade, without evaluations), of numeric discrete, ordinal, and continuous types
- Academic data at the end of the 2nd semester: Curricular units 2nd semester (credited, enrolled, evaluations, approved, grade, without evaluations), of numeric discrete, ordinal, and continuous types
- Target: Target (categorical)

Data Preprocessing, Balancing, and Train and Test Split

In this phase, data cleaning and preprocessing were performed. We handled missing data using imputation and removal methods, transformed the data by encoding categorical variables, scaled the features, created new features, and split the data into training and testing sets. We used preprocessed data from our previous study, where the dataset included 3,630 records with two classes: dropout and graduate.

Figure 4 illustrates the distribution of the high-dimensional student data using the t-distributed Stochastic Neighbor Embedding (t-SNE) technique. Each point represents a student instance, where the blue dots indicate students who did not drop out and the orange dots represent students who eventually dropped out. The plot shows the distribution of student records before handling the class imbalance issue, with a clear dominance of the majority class (graduate) over the minority class (dropout). This imbalance can lead to biased predictive models that favor the majority class, reducing the ability to correctly identify at-risk students. The significant overlap between the two classes in the plot also highlights the presence of noisy, irrelevant, or redundant features, which can obscure the true decision boundary. These observations emphasize the necessity of applying both class balancing techniques, such as SMOTE-Tomek links, and feature selection methods. Specifically, the ACO-based wrapper approach was employed to identify a compact, informative subset of features that improves model performance by enhancing class separation and reducing overfitting.

Figure 4. t-SNE visualization of the student dropout dataset (with/without balancing)

We applied SMOTE-Tomek links to balance the imbalanced dataset; Figure 4 also shows the t-SNE plot of the balanced dataset. As the number of dropout students in our dataset is smaller than the number of graduated students, applying SMOTE-Tomek links increased the dataset from 3,630 to 4,186 records. A minimal sketch of this balancing step follows.
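The sketch below uses SMOTETomek from the imbalanced-learn library; the synthetic data and class weights are illustrative stand-ins for the 3,630-record dropout/graduate set.

```python
# Minimal sketch of SMOTE-Tomek balancing with imbalanced-learn; the data
# here is synthetic, standing in for the 3,630-record dropout/graduate set.
from collections import Counter
from imblearn.combine import SMOTETomek
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3630, n_features=34,
                           weights=[0.65, 0.35], random_state=42)
print("before:", Counter(y))

# SMOTE oversamples the minority class; Tomek links then remove close
# cross-class pairs to clean up the decision boundary.
X_res, y_res = SMOTETomek(random_state=42).fit_resample(X, y)
print("after :", Counter(y_res))
```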
The dataset was sorted by the target feature and then split into training (80%) and testing (20%) sets, with shuffling applied. After the split, we applied standardization to the training data (X_train). Standardization, also known as feature scaling, is a crucial preprocessing step in machine learning and statistics; it transforms the values of different features in the dataset to a common scale or distribution.

Experimental Setup

The experiments were conducted using the Python programming language, with the scikit-learn framework for data preprocessing tasks and TensorFlow for the implementation, training, and testing of the neural network, run on a Google Colab environment with a T4 GPU. The parameter settings for our neural network are shown in Table 3. We optimized the hyperparameters of the machine learning algorithms with GridSearchCV and RandomizedSearchCV, as shown in Table 4; an illustrative sketch follows.

Table 3. Parameter setup for the neural network model (number of layers; neurons in the input unit and in dense layers 2-4; batch size; learning rate; epochs; Optimizer: Adam; Activation function: Sigmoid; Computation mode: T4 GPU)

Table 4. Hyperparameter setup for the machine learning models
- LR: C: 0., Penalty: l1, Solver: liblinear (GridSearchCV)
- RF: Bootstrap: False, Criterion: entropy, Max depth: None, Max features: 6, Min samples leaf: 24, Min samples split: 16 (RandomizedSearchCV)
- DT: Criterion: gini, Max depth: 5, Max leaf nodes: 5, Min impurity decrease: 0., Splitter: random (GridSearchCV)
- SVM: C: 1., Kernel: linear, Gamma: scale (GridSearchCV)
- KNN: n_neighbors: 3
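The sketch below shows the shape of this search with scikit-learn's GridSearchCV and RandomizedSearchCV; the search grids are illustrative examples built around the Table 4 values, not the exact ranges used in the study.

```python
# Sketch of the hyperparameter search reported in Table 4. The grids are
# illustrative; some of the paper's exact values are not reproduced here.
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Exhaustive grid search for the SVM (Table 4 reports kernel=linear, gamma=scale).
svm_grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"],
                "gamma": ["scale"]},
    cv=5, scoring="accuracy",
)

# Randomized search for the RF (Table 4 reports bootstrap=False, entropy, etc.).
rf_search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={
        "bootstrap": [True, False], "criterion": ["gini", "entropy"],
        "max_depth": [None, 5, 10], "max_features": [4, 6, 8],
        "min_samples_leaf": [8, 16, 24], "min_samples_split": [8, 16],
    },
    n_iter=20, cv=5, scoring="accuracy", random_state=42,
)
# Usage: svm_grid.fit(X_train, y_train); then inspect svm_grid.best_params_.
```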
Evaluation Metrics

The following metrics were used to evaluate the models. Recall (REC), or true positive rate (TPR), represents the proportion of students who dropped out and were correctly predicted to do so. The false positive rate (FPR) measures the proportion of students who graduated but were incorrectly predicted to drop out. Specificity (SPEC), also known as the true negative rate (TNR), indicates the proportion of students who did not drop out and were accurately predicted to graduate. The ROC (Receiver Operating Characteristic) curve plots the true positive rate (TPR) against the false positive rate (FPR) as the threshold for classifying events as positive or negative is varied. The AUC (Area Under the Curve) represents the area under the ROC curve, providing a single scalar value that summarizes the overall performance of a binary classification model.

The confusion matrix parameters are used to examine the performance of our prediction models: Precision (P), Recall (R), and F1-score (F). Their equations are given in (4), (5), and (6):

$$P = \frac{TP}{TP + FP} \tag{4}$$

$$R = \frac{TP}{TP + FN} \tag{5}$$

$$F1 = \frac{2 \times (R \times P)}{R + P} \tag{6}$$

where TP (True Positive) counts non-successful students who are correctly classified as dropouts, TN (True Negative) counts successful students who are correctly classified as successful, FP (False Positive) counts successful students who are misclassified as dropouts (Type I error), and FN (False Negative) counts non-successful students who are misclassified as successful (Type II error). Accuracy is simply defined as the ratio of correctly predicted student at-risk/retention statuses to the total number of students:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{7}$$

Results and Discussion

Results

The study aimed to identify students who were at risk of dropping out of their studies. Our dataset included three classes: enrolled, dropped out, and graduated. We focused solely on the dropout and graduated labels, removing the enrolled records. The preprocessed dataset was separated into two groups: 80% of the data was utilized for training, while 20% was used for testing. The dataset had 2,904 training samples and 858 testing samples, as indicated by the sample count. The ACO parameters were set to maximum iterations = 100, maximum ant count = 10, pheromone evaporation rate = 0., alpha (pheromone importance) = 1.0, and beta (heuristic information importance) = 2.0, with results averaged over 20 runs.

The experiment was performed using cross-validation (CV) to yield more robust results. CV splits the input into training data and test data independently of each other. Although CV may increase the computational time, it reduces the chance of overfitting and provides a more reliable estimate. In this study, we used parallel CV, and the final accuracy is the average of the accuracies across the folds; a brief sketch is given below.
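The following sketch shows fold-averaged accuracy with scikit-learn's cross_val_score; the data, model, and fold count are illustrative (the paper reports 10-fold CV for Figure 5), and n_jobs=-1 runs the folds in parallel.

```python
# Sketch of k-fold cross-validation averaging; data and model are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_jobs=-1 evaluates the folds in parallel; the reported score is the mean.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=10, scoring="accuracy", n_jobs=-1)
print(f"mean CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```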
In this study, we compared the performance of four machine learning models and a neural network with the feature subsets selected by ACO-RF. Tables 5 and 6 show the feature subsets selected by ACO-RF and ACO-LR, respectively, over 20 trials. Tables 7, 8, and 9 show the feature subsets selected by ACO-KNN, ACO-SVM, and ACO-NN, respectively, for the first 10 trials. The results indicate that ACO-RF achieves the highest fitness score of 0.90; therefore, we propose using ACO-RF for feature selection in predicting student dropout.

Table 5. Feature selection using ACO-RF (per-experiment selected feature subsets, e.g., (0, 1, 2, 3, 7, 12, 18, ...), with the number of features and fitness score of each subset, over 20 trials)

Table 6. Feature selection using ACO-LR (per-experiment selected feature subsets with the number of features and fitness score of each subset, over 20 trials)

Table 7. Feature selection using ACO-KNN (per-experiment selected feature subsets with the number of features and fitness score of each subset, first 10 trials)

Table 8. Feature selection using ACO-SVM (per-experiment selected feature subsets with the number of features and fitness score of each subset, first 10 trials)

Table 9. Feature selection using ACO-NN (per-experiment selected feature subsets with the number of features and fitness score of each subset, first 10 trials)

Table 10. Comparison of the performance evaluation of the models on different metrics with full features (Accuracy, Precision, Recall, and F1-score per model)

Table 11. Validation of the feature subsets extracted by ACO-RF using different ML models and a neural network (Accuracy, Precision, F1-score, Recall, and AUC per model for subsets such as (0,1,2,3,4,8,10,11,12,20), (0,1,2,3,5,8,11,14,19,30,...), (0,1,2,3,4,5,10,16,17,24,...), (0,1,2,3,4,6,8,16,...), (0,1,2,3,4,5,30,...), (0,1,2,3,7,8,11,24,...), (0,1,2,3,9,10,12,14,...), (0,1,2,3,4,6,24,...), and (0,1,2,3,7,12,17,...))
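The validation behind Table 11 can be sketched as follows: a subset of selected column indices is used to slice the feature matrix, and each classifier is trained and scored on that slice. The data wiring (X_train, y_train, X_test, y_test) and default model settings are assumptions for illustration.

```python
# Sketch of validating an ACO-RF feature subset across several classifiers,
# as in Table 11. Data wiring and model settings are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

def validate_subset(subset, X_tr, y_tr, X_te, y_te):
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "SVM": SVC(),
        "RF": RandomForestClassifier(random_state=0),
    }
    idx = np.asarray(subset)              # selected column indices
    for name, model in models.items():
        model.fit(X_tr[:, idx], y_tr)
        pred = model.predict(X_te[:, idx])
        print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
              f"f1={f1_score(y_te, pred):.3f}")

# Example with one subset from Table 11:
# validate_subset([0, 1, 2, 3, 4, 8, 10, 11, 12, 20],
#                 X_train, y_train, X_test, y_test)
```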
Tables 10 and 11 present a comparative analysis of model performance on the full feature set and on optimized subsets obtained through ACO-based feature selection, respectively. Notably, the neural network (NN) achieved the highest accuracy (93.25%) and AUC after feature selection, surpassing its original accuracy of 89.81% on the full dataset. KNN showed the most significant improvement, with its F1-score rising from 0.8007 to 0.8685 and its accuracy increasing by about 2%. This indicates KNN's sensitivity to irrelevant features and its benefit from dimensionality reduction. In contrast, logistic regression (LR), which previously outperformed all models with 92.15% accuracy on the full dataset, experienced a decline in accuracy post-selection, suggesting that LR benefits more from a complete feature space. The random forest (RF) and decision tree (DT) models showed minor changes in accuracy but consistent improvements in F1-score, highlighting their robustness to feature pruning. Overall, these results demonstrate that feature selection can significantly enhance model generalization, particularly for complex and non-parametric models such as NN and KNN.

Table 12. Selected features by ACO-RF
1. Marital status
2. Application mode
3. Application order
4. Course
5. Nationality
6. Father's qualification
7. Mother's occupation
8. Father's occupation
9. Age at enrollment
10. Curricular units 2nd semester (grade)

Figure 5. ROC curves of the compared models using the best subset (0, 1, 2, 3, 4, 8, 10, 11, 12, 20) obtained by ACO-RF

Figure 5 shows the ROC curves of all compared models with 10-fold CV; NN outperformed the other machine learning models. Figure 6 depicts the models' accuracy on the best subset (0, 1, 2, 3, 4, 8, 10, 11, 12, 20) selected by ACO-RF.

Figure 6. Models' accuracy on the best subset (0, 1, 2, 3, 4, 8, 10, 11, 12, 20) selected by ACO-RF

The features selected by ACO-RF that affect the dropout rate of students in higher education, as shown in Table 12, are marital status, application mode, application order, course, nationality, father's qualification, mother's occupation, father's occupation, age at enrollment, and curricular units 2nd semester (grade). Most of the selected feature subsets included the curricular units 2nd semester (grade) feature, which indicates that the second-semester grade is one of the deciding features for students dropping out of higher education.

This study started with three research questions: first, is it possible to obtain an optimal feature set to predict student dropout using ACO-RF; second, does ACO-RF help to increase the accuracy of the models; and third, do neural networks perform better than machine learning algorithms on the selected features? To address our research questions: yes, ACO-RF performed best, with a maximum fitness score of 90%, and validating its selected features with the classifiers improved their accuracy. As Figure 6 depicts, the accuracy of the neural network model improved from 0.8981 to 0.9325. Among all the classifiers we used in our experiment, the neural network performed best, with an accuracy of 93%.

Figure 7. Improvement in accuracy of the neural network model using our proposed approach (ACO-RF)

Discussion

1 Implications

The findings reveal that applying ACO-RF-based feature selection significantly enhances the predictive performance of the neural network and KNN models. This suggests that educational institutions can implement models using compact and relevant feature subsets to more accurately identify at-risk students while reducing computational costs.
Among the features selected, second-semester academic performance consistently emerged as a critical predictor, highlighting the importance of early academic intervention during the first year. Additional features such as parental occupation, application mode, and age at enrollment also demonstrated strong predictive value. These carry significant policy implications, as they offer concrete areas for targeted support. For instance, students with weak second-semester performance may benefit from remedial academic programs, while those from certain socioeconomic backgrounds could receive tailored counseling services. These insights enable institutions to design evidence-based policies that address student needs more effectively. Overall, the results reinforce the study's contributions, particularly in improving model interpretability and enhancing prediction performance using the ACO-RF framework.

2 Research Contribution

This study contributes to the field of educational data mining by proposing a novel integration of ACO with a wrapper-based feature selection approach tailored for student dropout prediction. The ACO-RF framework not only identifies relevant features but also enhances the performance of predictive models, especially neural networks. Furthermore, it offers an interpretable subset of features, which is crucial for policy design and intervention planning in higher education institutions. Our study makes several contributions:
- Few studies have applied metaheuristic techniques for FS in student dropout prediction; we address this gap by proposing an ACO-RF approach.
- We integrated ACO with various classifiers, compared their performance, and selected the best one for feature selection.
- We improved the predictive performance of the models by using a reduced feature set.
- Our approach identifies an optimal feature subset, providing insights that can help educational institutions formulate policies to support at-risk students.

3 Limitations

Our work has two main limitations. First, the computational time increases as the number of ants and generations in the ACO-RF algorithm rises, because more ants and generations demand greater processing power and time to execute. However, increasing these parameters improves the likelihood of identifying the optimal set of features, ultimately enhancing the accuracy and effectiveness of the feature selection process. The second limitation is the limited availability of datasets. We are actively addressing this issue by collecting our own dataset, which will support future research. To ensure the broad applicability and robustness of the proposed methodology, we aim to use diverse datasets sourced from multiple institutions, which will help validate the effectiveness of our approach in various contexts.

4 Suggestions

To build on this research, future work could:
- Explore other metaheuristic algorithms (e.g., Particle Swarm Optimization or Grey Wolf Optimizer) for feature selection.
- Integrate temporal features such as progression patterns across semesters.
- Apply the method across multi-institutional or international datasets for broader generalization.
- Extend the study with SHAP or LIME to improve model interpretability and stakeholder trust.

Additionally, further qualitative analysis of the impact of each selected feature, supported by domain literature, would provide deeper insight into student behavior and the effectiveness of interventions.
Conclusion

The rising rate of student dropouts in universities signals underlying challenges in the education system. This study proposed an ACO-RF feature selection method that integrates Ant Colony Optimization (ACO) with Random Forest (RF) to identify the most influential factors contributing to student dropout. Using a dataset of undergraduate students, including demographic, socioeconomic, macroeconomic, admission, and academic performance information, our approach successfully selected the most relevant and non-redundant features. The ACO-RF method achieved a maximum fitness score of 90%, demonstrating its superior ability to enhance model accuracy compared to other classifier combinations. The selected features, such as second-semester evaluations, parental occupation, application mode, and age at enrollment, carry significant policy implications. Institutions can leverage these insights to design targeted interventions, such as early academic support for students struggling in their second semester or counseling services based on socioeconomic indicators. Our results strongly support the contributions claimed in the study, particularly regarding improved model interpretability and prediction performance. However, the method also has limitations, primarily the increase in computational cost with more ants and generations, and the restricted availability of diverse datasets. We are addressing this by actively collecting a more representative dataset from multiple universities. Future work will expand the scope by incorporating behavioral and interaction data, such as student engagement on learning platforms, attendance patterns, and course participation, and by benchmarking ACO-RF against other heuristic techniques like Particle Swarm Optimization (PSO) and Grey Wolf Optimizer (GWO). These enhancements aim to create more robust and actionable models for dropout prediction and educational policy planning.

Acknowledgment

We would like to express our deep appreciation to all entities and individuals who have made significant contributions to this research on the Wrapper Feature Selection Method for Dropout Prediction in Higher Education. The support provided, whether in the form of enlightening academic guidance, provision of crucial datasets, access to computing facilities, or invaluable moral support, has been the main foundation for the smooth and successful analysis. All forms of participation were important determinants of the findings presented in this study.

Author Contribution Statement

Anuradha Kumari Singh formulated the research problem, conducted all the experiments, wrote the complete manuscript, validated the results, created the figures and tables, and performed the final review of the paper. Karthikeyan supervised all the tasks, validated the results, and also reviewed the final manuscript.

References