Bulletin of Informatics and Data Science, Vol. 4, May 2025, Page 22-33
ISSN 2580-8389 (Media Online), DOI 10.61944/bids, https://ejurnal.id/index.php/bids/index

Implementation of Feature Selection Information Gain in Support Vector Machine Method for Stroke Disease Classification

Anisa Fitri, Iis Afrianty*, Elvia Budianita, Siska Kurnia Gusti
Faculty of Science and Technology, Informatics Engineering, Sultan Syarif Kasim Riau State Islamic University, Pekanbaru, Indonesia
Email: 12150120006@students.uin-suska.id, *iis.afrianty@uin-suska.id, elvia.budianita@uin-suska.id, siskakurniagusti@uin-suska.id
Corresponding Author Email: iis.afrianty@uin-suska.id

Abstract: Stroke is a disease with high mortality and disability rates that requires early detection. The main challenges in classifying this disease are data imbalance and the large number of irrelevant features in the dataset. This study proposes combining the Support Vector Machine (SVM) method with Information Gain feature selection and data balancing using the Synthetic Minority Over-sampling Technique (SMOTE) to improve classification accuracy. The dataset consists of 5,110 records with 10 variables and 1 label. Feature selection was performed with three threshold values (0.04, 0.01, and 0.0005), while SVM classification was tested with three kernels: Linear, RBF, and Polynomial. Models were evaluated with a Confusion Matrix, and the data were split into training and test sets using k-fold cross-validation with k = 10. The best result was obtained with the RBF kernel, parameters Cost = 100 and Gamma = 5, at an Information Gain threshold of 0.0005, reaching an accuracy of 90.51%. These results show that the combination of techniques used can identify the variables that most influence SVM classification in detecting stroke.

Keywords: Information Gain, Stroke Classification, Machine Learning, SMOTE, Support Vector Machine

INTRODUCTION

Stroke is a sudden brain disorder caused by impaired blood circulation that can permanently damage brain cells. A stroke occurs when a blood vessel ruptures (hemorrhagic stroke) or a vessel in the brain becomes blocked (ischemic stroke), cutting off the supply of oxygen and nutrients; if not treated immediately, this condition can cause brain cell death. If left untreated, stroke can lead to permanent disability or death. In hemorrhagic stroke in particular, the rupture of cerebral blood vessels can cause extensive bleeding and significantly damage brain tissue. Risk factors that contribute to stroke incidence include age, gender, history of hypertension, cholesterol levels, obesity, coronary heart disease, smoking, consumption of high-salt foods, and lack of physical activity. Among these factors, hypertension is the most dominant, with the highest reported mean influence value. Hypertension is also known to be a major cause of intracerebral hemorrhage, with a prevalence of more than 60% among stroke patients. Early detection of stroke is very important because the initial symptoms are often not recognized, and low public awareness contributes to delays in treatment. Advances in technology now allow the use of artificial intelligence, particularly Machine Learning, to detect stroke more effectively. With Machine Learning, a system can learn patterns from medical data without explicit programming, which can speed up and improve the accuracy of medical analysis.
One Machine Learning method often used for medical data classification is the Support Vector Machine (SVM). SVM works by finding a hyperplane that separates the data classes with the largest margin, which can produce more accurate classifications. Previous studies have shown that SVM can be applied to stroke disease classification with varying results. One study used linear, polynomial, sigmoid, and RBF kernels on a dataset containing 5,110 records with 11 variables; the polynomial kernel produced the highest accuracy of 78.86%, with 73.98% precision and 56.75% recall on an 80:20 training-test split, an accuracy that is still relatively low. Another study implemented the SVM algorithm for stroke prediction using polynomial, RBF, and sigmoid kernels on an 80:20 split of the same 5,110 records with 11 attributes and reported a highest accuracy of 95%. A further study used the SMOTE method to balance the same data (5,110 records, 11 attributes) and achieved 92% accuracy with the RBF kernel, Cost = 100, and Gamma = 1. In addition, another study showed that SVM with a Gaussian RBF kernel can also provide high accuracy, namely 93.90% on elementary school accreditation data, with parameters E = 3 and C = 1. This shows that choosing the right kernel and parameters greatly affects SVM classification performance. Although SVM has shown good potential across a wide range of parameters, the accuracy obtained in some studies can still be improved through parameter tuning and an optimal data processing approach. One way to improve SVM accuracy is to apply Feature Selection before classification. Feature Selection is the process of selecting the most relevant features to reduce data dimensionality, improve accuracy, prevent overfitting, and make the model easier to understand. This study uses Information Gain feature selection. Information Gain (IG) is a method widely used for feature selection and filtering. In related work, one study applied Information Gain to feature selection for a Naïve Bayes Classifier on 200 records and obtained an accuracy improvement of 4% with thresholds of 0.01 and above: 70% accuracy without the Information Gain method versus 74% with it. Another study showed that Information Gain feature selection for sentiment analysis with SVM, on a dataset of 496 tweets from the Twitter application, produced 92% accuracy, 90% precision, and 92% recall with 80:20 testing using a linear kernel. Further research showed that an SVM model with Information Gain feature selection (SVM-IG) produced a best accuracy of 72.45%, an increase of 3.08% from the initial accuracy of 69.37%, with an average accuracy increase after optimization of about 2%. SVM-IG showed better performance, so the proposed model was proven able to improve SVM classification accuracy. To optimize model performance and accuracy in stroke disease classification, this research uses Information Gain to select the best features from the 10 variables in the dataset.
This approach aims to select the most relevant features and discard less important data, so as to speed up processing time and improve model accuracy. Combined with SMOTE to handle data imbalance, this feature selection is expected to produce a faster, more accurate, and more effective SVM model for early-stage stroke classification, providing a more efficient and beneficial solution for stroke prevention.

RESEARCH METHODOLOGY

The research methodology comprises the stages carried out during the study, organized in a systematic manner. It serves as a reference or guideline during the research so that the expected goals can be achieved. This research uses the Support Vector Machine method for stroke disease classification.

1 Research Stages
The research proceeds through several stages. It starts with data collection, followed by data selection on the dataset, and then data preprocessing, which includes data cleaning, data transformation, normalization, and SMOTE. After preprocessing, feature selection is performed using Information Gain, followed by classification with the Support Vector Machine (SVM) method and evaluation using a Confusion Matrix, from which the conclusions of this study are drawn. The stages of the research can be seen in Figure 1.

Figure 1. Research Stages

2 Data Collection
The dataset used in this study is a stroke prediction dataset obtained from Kaggle (https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset). It consists of 5,110 records with 10 variables and 1 label. The available variables are id, gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, smoking_status, and the stroke label. A sample of the data can be seen in Table 1.

Table 1. Initial Research Data (sample records with the variables gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, smoking_status, and stroke)

3 Data Selection
Data selection is the process of selecting relevant features from the initial dataset; data that are not important or not needed are removed. The initial dataset consists of 12 variables, but after this step only 11 are used, namely: stroke, gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, and smoking_status. Of these, stroke is used as the target variable or label, while the other 10 serve as features in the classification process. The id variable in the initial dataset was removed because it has no relevance to the classification process and contains no information that could help predict stroke risk.
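As a minimal sketch of this data selection step, the snippet below loads the Kaggle dataset with pandas and drops the id column before separating features and label. The CSV file name is an assumption (it is the usual name of the Kaggle file), and this is an illustration rather than the authors' code.

```python
import pandas as pd

# Load the Kaggle stroke prediction dataset (the CSV file name is assumed here).
df = pd.read_csv("healthcare-dataset-stroke-data.csv")

# Data selection: drop the 'id' column, which carries no predictive information.
df = df.drop(columns=["id"])

# Separate the 10 feature variables from the 'stroke' target label.
X = df.drop(columns=["stroke"])
y = df["stroke"]
print(X.shape, y.value_counts().to_dict())  # 5,110 rows; 4,861 in class 0 vs 249 in class 1
```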
4 Preprocessing
Data preprocessing is the initial stage in data mining that transforms raw data into a cleaner format ready for analysis or modeling. This process includes data cleaning, transformation, normalization, and handling of missing or inconsistent data so that the analysis results are more accurate and efficient.

4.1 Data Cleaning
Data cleaning handles missing values so that only valid and relevant values are used in the analysis. In the bmi variable, missing values (NaN) are replaced with the mean of the bmi variable. This step avoids bias and ensures the data are ready for the model training process.

4.2 Data Transformation
Data transformation converts the coding of categorical variables into numerical form so that they can be processed by machine learning models, for example with One-Hot Encoding and Label Encoding. One-Hot Encoding converts a categorical variable into a format the model can use by creating a binary column for each category, while Label Encoding converts categories into numbers that represent them. The dataset contains 5 categorical variables, gender, ever_married, work_type, Residence_type, and smoking_status, that need to be transformed into numerical format so that they can be processed by the Support Vector Machine (SVM) algorithm. The gender, work_type, and smoking_status variables are transformed with One-Hot Encoding, producing new variables such as gender_Male, work_type_Private, and smoking_status_never smoked, each taking the values 1 and 0. Meanwhile, the ever_married and Residence_type variables are transformed with Label Encoding. Besides categorical transformation, normalization is also applied. Data normalization rescales numerical values into a smaller range, from 0 to 1 or from -1 to 1. A commonly used method is min-max normalization, which applies a linear transformation to the original data. The min-max normalization formula is shown in Equation 1.

X'_i = \frac{X_i - \min(X)}{\max(X) - \min(X)} \times (\text{new\_max}(X) - \text{new\_min}(X)) + \text{new\_min}(X)    (1)

where X_i is the original value, X'_i is the value after normalization, \min(X) and \max(X) are the minimum and maximum values of variable X, and \text{new\_min}(X) and \text{new\_max}(X) define the new desired range.

5 Synthetic Minority Over-Sampling Technique (SMOTE)
The Synthetic Minority Over-sampling Technique (SMOTE) is an oversampling method used to overcome the shortage of data in minority groups; its aim is to multiply minority data by creating synthetic samples similar to the existing minority samples. In this study, SMOTE is used to address the shortage of stroke cases. Figure 2 shows the data before SMOTE: there are 4,861 records in class 0 (no stroke) and 249 records in class 1 (stroke), out of a total of 5,110 records.

Figure 2. Data before SMOTE

Figure 3 shows the data after applying SMOTE: the data become balanced with a total of 9,722 samples, consisting of 4,861 for class 0 and 4,861 for class 1. With this balance, the model can detect patterns in both classes more effectively.

Figure 3. Data after SMOTE
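The snippet below is a sketch of the preprocessing chain described above (mean imputation for bmi, Label Encoding and One-Hot Encoding, min-max normalization as in Equation 1, and SMOTE). It assumes scikit-learn and imbalanced-learn are installed, reuses X and y from the previous sketch, and follows the Kaggle column names; it is an illustration under those assumptions, not the authors' implementation.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE

# Data cleaning: replace missing BMI values with the column mean.
X["bmi"] = X["bmi"].fillna(X["bmi"].mean())

# Label Encoding for the binary categories ever_married and Residence_type.
X["ever_married"] = X["ever_married"].map({"No": 0, "Yes": 1})
X["Residence_type"] = X["Residence_type"].map({"Rural": 0, "Urban": 1})

# One-Hot Encoding for the multi-category variables gender, work_type, smoking_status.
X = pd.get_dummies(X, columns=["gender", "work_type", "smoking_status"], dtype=int)

# Min-max normalization (Equation 1) to rescale all features into [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

# SMOTE: oversample the minority (stroke) class until both classes are balanced.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_scaled, y)
print(y_bal.value_counts().to_dict())  # expected: {0: 4861, 1: 4861}
```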
6 Information Gain Feature Selection
Feature selection determines the variables that contribute most to improving model accuracy. This research uses Information Gain, which measures the contribution of each variable based on entropy reduction. The steps are as follows: first, the 10 variables are separated according to class and the total entropy is calculated; next, the Information Gain of each variable is calculated and the variables are sorted in descending order of Information Gain; finally, the top-ranked variables are selected. This research uses three thresholds, 0.04, 0.01, and 0.0005, to determine the most influential features. The entropy and Information Gain formulas are shown in Equations 2 and 3.

Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i    (2)

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)    (3)

Information Gain(S, A) measures the reduction in entropy that occurs after the data S are divided based on variable A. It is calculated as the difference between Entropy(S) before the division and the weighted average of Entropy(S_v), i.e. the entropy of each subset S_v for the values v of variable A. The higher the Gain(S, A) value, the greater the contribution of variable A in separating the data with respect to the classification target.

7 K-Fold Cross Validation
In K-Fold Cross Validation the data are divided into folds, and the model is trained and tested on each different fold in turn. This research uses k-fold cross-validation with k = 10, which divides the data into 10 folds. In each iteration, the data are split into training data to train the model and test data to evaluate its performance. This technique ensures a more accurate evaluation of the model.

8 Support Vector Machine (SVM) Method
The Support Vector Machine method uses kernels to map low-dimensional nonlinear data into a higher-dimensional space, allowing linear separation of the data with a hyperplane. There are several types of kernels in SVM: the linear kernel is suitable for linearly separable data, the polynomial kernel uses a polynomial of a certain degree to handle more complex data, and the RBF kernel is often used because it can handle complex and irregular data patterns. The choice of kernel depends on the characteristics of the dataset and the classification needs. Classification testing is done using the k-fold cross-validation technique with k = 10. The three kernels used in the SVM method are Linear, Radial Basis Function (RBF), and Polynomial. For each kernel, testing is done with various parameters: the Linear kernel uses the C parameter with values of 1, 10, and 100; the RBF kernel uses C = 1, 10, and 100 and gamma (γ) = 1, 4, and 5; and the Polynomial kernel uses degree values of 1 and 2. This test aims to determine the best parameter configuration for detecting stroke risk and to evaluate the effect of parameter combinations on accuracy. The kernel functions used are listed in Table 2.

Table 2. Kernel Functions
Linear:      K(x_i, x_j) = x_i \cdot x_j
RBF:         K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)
Polynomial:  K(x_i, x_j) = (x_i \cdot x_j + c)^d
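As an illustration of the Information Gain ranking defined in Equations 2 and 3 above, the sketch below computes entropy-based gains directly and applies a threshold. It reuses X_bal and y_bal from the earlier sketches; the binning of continuous variables is an assumption for illustration, since the paper does not state how continuous features are discretized.

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    # Entropy(S) = sum_i -p_i * log2(p_i), Equation 2.
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(feature: pd.Series, labels: pd.Series, bins: int = 10) -> float:
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v), Equation 3.
    # Continuous features are binned so each bin plays the role of one value v of A
    # (an illustrative assumption, not stated in the paper).
    if feature.nunique() > bins:
        feature = pd.cut(feature, bins=bins)
    total = entropy(labels)
    weighted = 0.0
    for _, subset in labels.groupby(feature, observed=False):
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return total - weighted

# Rank all features and keep those at or above the chosen threshold.
threshold = 0.0005  # the paper also uses 0.04 and 0.01
gains = pd.Series({col: information_gain(X_bal[col], y_bal) for col in X_bal.columns})
selected = gains[gains >= threshold].sort_values(ascending=False)
X_selected = X_bal[selected.index]
print(selected)
```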
9 Confusion Matrix Evaluation
The confusion matrix is used in machine learning to evaluate classification results for categorical (nominal or ordinal) target variables. It uses four terms to represent the classification outcomes: TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative). The evaluation formulas are shown in Equations 7 through 10.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%    (7)

Precision = \frac{TP}{TP + FP} \times 100\%    (8)

Recall = \frac{TP}{TP + FN} \times 100\%    (9)

F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall}    (10)

Accuracy measures how close the predictions are to the actual data, i.e. the percentage of correctly classified test data out of all test data. Precision measures the accuracy of positive predictions, i.e. the ratio of correct positive predictions to all positive predictions made by the model. Recall measures the model's ability to detect positive data correctly, i.e. the ratio of positive data successfully recognized to all actual positive data. The F1-Score is the harmonic mean of precision and recall, reflecting the balance between the two, and is especially useful when the data are imbalanced.

RESULTS AND DISCUSSION

This study uses 5,110 stroke records with 10 variables that have already gone through the data cleaning stage of preprocessing. The implementation was done in Python using Google Colab. Experiments were conducted by applying feature selection with Information Gain and classification with the Support Vector Machine (SVM) method using three kernel types: linear, RBF, and polynomial. Model performance is evaluated with a Confusion Matrix to measure accuracy, and testing uses 10-fold cross-validation. The accuracy results of each kernel are presented in tables for comparison and further analysis.

1 Feature Selection Result
Figure 4 shows the feature selection process using Information Gain with a threshold of 0.04. This means that only features whose Information Gain value is greater than or equal to 0.04 are selected for the subsequent classification. This threshold is quite high, so only features with a strong influence on the target class are used.

Figure 4. Threshold 0.04

Figure 5 shows the same process, but with a threshold of 0.01. Compared to the previous threshold, this one is lower, so more features are used. This allows the model to consider features with a moderate influence on the target class; such a threshold is usually used to balance efficiency against the diversity of information from the features.

Figure 5. Threshold 0.01

The last figure uses a very low threshold of 0.0005. Almost all features with even a small Information Gain value are still selected. The aim is that no potentially informative feature is missed, although this may increase the risk of including less relevant features.

Figure 6. Threshold 0.0005

Each of these thresholds was used to explore its impact on feature selection and model performance. A higher threshold tends to result in a simpler model, while a lower threshold allows the model to consider more features.
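To make the effect of the three thresholds concrete, the short sketch below counts how many features survive each cutoff. It reuses the gains Series from the previous sketch and is only an illustration of the thresholding step, not the authors' exact procedure.

```python
# Number of features retained at each Information Gain threshold used in the paper.
for t in (0.04, 0.01, 0.0005):
    kept = gains[gains >= t].sort_values(ascending=False)
    print(f"threshold {t}: {len(kept)} features kept")
    print(list(kept.index))
```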
Figure 7 shows the results of feature selection using the Information Gain algorithm with a threshold of 0.0005. Through this process, 18 features were selected that are considered to contribute most to the classification process. This selection aims to simplify the model without sacrificing important information, while retaining features relevant to the prediction target. The selected features are: avg_glucose_level, bmi, age, work_type_children, ever_married, work_type_Self-employed, smoking_status_formerly smoked, smoking_status_Unknown, work_type_Private, work_type_Govt_job, smoking_status_smokes, hypertension, heart_disease, gender_Male, gender_Female, smoking_status_never smoked, work_type_Never_worked, and Residence_type. Of these, avg_glucose_level has the highest Information Gain value, 0.981916, indicating that it has the most significant influence on class prediction. Residence_type has the lowest Information Gain value, 0.001453, but it still exceeds the threshold and is considered to contribute enough to be included in the modeling. The feature selection results can be seen in Figure 7.

Figure 7. Feature Selection Result

2 Classification Results
Without SMOTE and Information Gain, the SVM model's performance is still not optimal due to data imbalance. The RBF kernel gives the best results with 82.54% accuracy and 86.49% F1-Score, while the Linear and Polynomial kernels have considerably lower accuracy. Compared to previous research, which reported only 78.86% accuracy, 73.98% precision, and 56.75% recall on an 80:20 training-test split, these results already show a clear improvement in performance. This shows that selecting the right kernel, such as RBF, can yield better performance even before the feature selection and data balancing stages are carried out, although further optimization is still needed to increase recall and accuracy. This can be seen in Table 3.

Table 3. Overall Average Results without SMOTE and Information Gain (Accuracy, Precision, Recall, and F1-Score for the Linear, RBF, and Polynomial kernels)

Table 4 presents the model performance after applying feature selection with the three threshold values (0.04, 0.01, and 0.0005) but without SMOTE. The results, averaged over all parameter combinations for each kernel and threshold, show that the RBF kernel remains superior across all thresholds, with the highest accuracy reaching 82.78% and the highest F1-Score. The Polynomial kernel shows stable performance with high precision, but its recall and accuracy remain low. The Linear kernel shows similar results, with high precision but low recall and accuracy. This indicates that although feature selection can help simplify the model, without data balancing such as SMOTE the model still struggles to recognize the minority class optimally. This can be seen in Table 4.
Table 4. Overall Average Results without SMOTE using Information Gain (average Accuracy, Precision, Recall, and F1-Score for each kernel at thresholds 0.04, 0.01, and 0.0005)

Table 5 shows the performance of the Support Vector Machine (SVM) model using the three kernel types, Linear, RBF, and Polynomial, without feature selection. Before model training, the data were balanced using the SMOTE method, which overcomes class imbalance by generating synthetic data in the minority class. The results shown are the averages over all tested parameters for each kernel. The RBF kernel shows the best performance, with the highest accuracy reaching 88.51%, precision 89.06%, recall 88.51%, and an F1-Score of about 88%. The Polynomial kernel obtained an accuracy of 80.82%, while the Linear kernel showed the lowest result with an accuracy of about 79%. This can be seen in Table 5.

Table 5. Overall Average Results using SMOTE without Information Gain (Accuracy, Precision, Recall, and F1-Score for the Linear, RBF, and Polynomial kernels)

Table 6 reports the results of three rounds of testing using Information Gain feature selection with the Linear, RBF, and Polynomial SVM kernels. The values shown are the average accuracy over all parameter combinations for each kernel and threshold. Based on these results, the RBF kernel again provides the best performance, with its highest accuracy at a threshold of 0.0005. In contrast, the Linear kernel at a threshold of 0.04 only reached an accuracy of about 79.84%, the lowest in this test. This shows that the choice of kernel and the number of features selected by the threshold greatly affect model performance. This can be seen in Table 6.

Table 6. Overall Average Results using SMOTE and Information Gain (average Accuracy, Precision, Recall, and F1-Score for each kernel at thresholds 0.04, 0.01, and 0.0005)
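The sketch below illustrates the kind of kernel and parameter sweep summarized in Table 6, using scikit-learn's SVC with 10-fold cross-validation. The parameter grids follow the values listed in the methodology (C in {1, 10, 100}, gamma in {1, 4, 5}, degree in {1, 2}, coef0 = 0); X_selected and y_bal come from the earlier sketches, and the shuffling and random_state are assumptions, so this is a sketch of the experiment rather than the authors' code.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Parameter grids per kernel, following the values tested in the paper.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10, 100]},
    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [1, 4, 5]},
    {"kernel": ["poly"], "C": [1, 10, 100], "degree": [1, 2], "coef0": [0]},
]

search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="accuracy", n_jobs=-1)
search.fit(X_selected, y_bal)

print("best parameters:", search.best_params_)
print("best mean CV accuracy: %.2f%%" % (100 * search.best_score_))
```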
The linear kernel of the Support Vector Machine (SVM) is often used when the data are high-dimensional and linearly separable; it does not require mapping to higher dimensions, as this would not provide a significant performance gain. In this study, Information Gain feature selection is used to select the most informative features before applying the model, in order to improve classification accuracy and efficiency. The C (Cost) parameter, with values of 1, 10, and 100, is used to control the balance between the width of the decision boundary and the classification error rate. Table 7 shows the accuracy results after the three rounds of feature selection and model application.

Table 7. Average Results of Linear Kernel using SMOTE with Information Gain (Accuracy, Precision, Recall, and F1-Score for each Cost value at thresholds 0.04, 0.01, and 0.0005)

Table 7 reports the average results of the linear kernel with various Cost values over the three tests. At a threshold of 0.04, the best result is the highest accuracy of 79.84%, precision of about 79%, recall of 78.84%, and F1-Score of 78.74% at a Cost value of 1. Testing with a threshold of 0.01 shows the highest accuracy of 79.17%, precision of 79.79%, recall of 79.17%, and F1-Score of 79.06% at Cost values of 10 and 100. Then, with a threshold of 0.0005, the best results were obtained at a Cost value of 100, with an accuracy of 79.40%, precision of 80.06%, recall of 79.40%, and an F1-Score of about 79%. Overall, although performance improves slightly at each threshold, the linear kernel shows relatively low performance and is not very responsive to changes in the Cost parameter. This can be seen in Table 7.

The RBF kernel in the Support Vector Machine (SVM) is a popular kernel function because it is effective on data that are not linearly separable. It has two main parameters, Cost (C) and Gamma (γ), where Gamma controls the range of influence of the training data. Gamma values that are too small make the model unable to capture the complexity of the data, while values that are too large can cause overfitting. Before testing, feature selection was applied using Information Gain with thresholds of 0.04, 0.01, and 0.0005. In this test, the parameter values C = 1, 10, and 100 and γ = 1, 4, and 5 were used. The accuracy results of the parameter combinations are shown in Table 8.

Table 8. Average Results of RBF Kernel using SMOTE and Information Gain (Accuracy, Precision, Recall, and F1-Score for each Cost and Gamma combination at thresholds 0.04, 0.01, and 0.0005)

Based on Table 8, the three tests of the RBF kernel with Information Gain feature selection show that the best performance is obtained at a threshold of 0.0005, with the highest accuracy of 90.51%, precision above 90%, recall of 90.51%, and F1-Score of 90.49% (C = 100, Gamma = 5); followed by a threshold of 0.01 with an accuracy of 86.60%, precision of about 87%, recall of 86.60%, and F1-Score of 86.54% (C = 100); and a threshold of 0.04 with an accuracy of 81.28%, precision of about 82%, recall of 81.28%, and F1-Score of 81.14% (C = 100). This suggests that the smaller the threshold value in feature selection, the better the performance of the resulting model.

The polynomial kernel in the Support Vector Machine (SVM) is an effective kernel function for large, normalized datasets, as it maps the data to a feature space while maintaining the relationships between samples. It works similarly to the linear kernel but considers combinations of features when measuring similarity. The polynomial kernel has three main parameters, Cost (C), Gamma (γ), and Degree (d), which affect the complexity of the data separation. Before testing, feature selection was performed using Information Gain with thresholds of 0.04, 0.01, and 0.0005 to obtain the most relevant features.

Table 9. Average Results of Polynomial Kernel using SMOTE and Information Gain (Accuracy, Precision, Recall, and F1-Score for each Cost, Degree, and Coef0 combination at thresholds 0.04, 0.01, and 0.0005)

Table 9 shows the evaluation results of the SVM model with the Polynomial kernel at the three threshold values of 0.04, 0.01, and 0.0005. For each threshold, the various Cost, Gamma, Degree, and Coef0 values are reported, together with accuracy, precision, recall, and F1-Score. At threshold 0.04, the best performance is obtained with Cost = 10, Degree = 2, and Coef0 = 0, with an accuracy of 80.04%. At threshold 0.01, performance increases with Cost = 100, Degree = 2, and Coef0 = 0, resulting in an F1-Score of about 81%.
A more significant improvement is seen at threshold 0.0005, where the configuration Cost = 100, Degree = 2, and Coef0 = 0 produces the highest accuracy of 83.04% and an F1-Score of about 82%. Overall, these results show that decreasing the threshold value (which means more features are used) has a positive impact on model performance, especially in improving accuracy and F1-Score. This can be seen in Table 9.

Figure 8. Comparison Results for each Kernel and Threshold

Figure 8 compares the accuracy of the SVM model with the three kernel types, Linear, RBF, and Polynomial, across the three threshold values of 0.04, 0.01, and 0.0005. Based on the graphs, the Linear kernel provides the lowest accuracy and is least affected by changes in the threshold value, with accuracy ranging from 78% to 80%. In contrast, the RBF and Polynomial kernels show a significant increase in accuracy as the threshold decreases. The RBF kernel with a threshold of 0.0005 yields the highest accuracy, around 91%, which indicates that selecting more features (a smaller threshold) can improve the overall performance of the model. Thus, it can be concluded that the RBF kernel is superior to the Polynomial and Linear kernels, and that using a lower threshold can improve classification performance.

The study showed that combining Information Gain for feature selection with SMOTE for data balancing successfully improved the performance of the SVM model in predicting stroke. Without these techniques, the model has difficulty recognizing stroke patients because their data are far fewer. After applying SMOTE, the data distribution becomes balanced, so the model can learn better. Feature selection using Information Gain (thresholds 0.04, 0.01, and 0.0005) helps the model focus on relevant features. The SVM model with the RBF kernel gave the best results, especially at threshold 0.0005 with Cost = 100 and Gamma = 5, reaching the highest accuracy of 90.51%. The Polynomial kernel is also quite good (83.04%), while the Linear kernel is the most stable but has lower accuracy (79.84%). Thus, proper feature selection, data balancing, and optimal parameter tuning are essential for building an accurate and effective stroke detection model.

3 Evaluation
Figure 9 shows two confusion matrices of the SVM model with the RBF kernel (C = 100, gamma = 5), comparing the model's performance without feature selection and with feature selection using Information Gain. In the first confusion matrix, without Information Gain, the model correctly classified 437 negative and 465 positive samples and produced 51 false positives and 19 false negatives. In the second confusion matrix, with Information Gain, the model decreased slightly in the classification of negative data, with 419 true negatives, but improved in the classification of positive data, with 473 true positives, producing 59 false positives and 21 false negatives. Overall, this shows that using Information Gain helps the model recognize positive (stroke) cases better, although it slightly reduces accuracy on the negative class. The confusion matrix results can be seen in Figure 9.

Figure 9. Comparison of confusion matrix results with the highest accuracy in SVM testing, (a) without Information Gain and (b) using Information Gain
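As a worked example of Equations 7 through 10, the sketch below computes the four metrics from the counts reported in Figure 9 for the model with Information Gain. These are single-fold counts, so the resulting values naturally differ from the 10-fold averages reported in the tables; the function name and output format are illustrative.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    # Equations 7-10 from the Confusion Matrix section (values in %).
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1_score": f1}

# Counts reported in Figure 9 for the RBF model (C=100, gamma=5) with Information Gain:
# 419 true negatives, 473 true positives, 59 false positives, 21 false negatives.
print(confusion_metrics(tp=473, tn=419, fp=59, fn=21))
```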
4 Discussion
This study shows that applying feature selection with Information Gain and data balancing with SMOTE successfully improves the performance of the Support Vector Machine (SVM) model in stroke disease classification. In contrast to previous research, which only evaluated various SVM kernels without parameter tuning or handling of data imbalance and achieved a maximum accuracy of 78.86% with the Polynomial kernel, this study explores the model parameters in more depth. The experimental results show that at a threshold of 0.0005 the RBF kernel with parameters Cost = 100 and Gamma = 5 gives the highest accuracy of 90.51%, while the Polynomial kernel produces 83.04% accuracy with parameters Cost = 100, Degree = 2, and Coef0 = 0 and very stable results. The Linear kernel performs worse than the other kernels, with a highest accuracy of 79.84%. Thus, it can be concluded that the use of feature selection, SMOTE, and appropriate parameter settings significantly improves the performance of stroke classification models. This approach contributes more than previous studies and shows more optimal results.

CONCLUSION

This study evaluates the performance of Information Gain feature selection in the classification process using the Support Vector Machine (SVM) algorithm to detect stroke disease. The method combines feature selection with Information Gain and data balancing with SMOTE, which proved effective in overcoming the class imbalance problem and improving model accuracy. Feature selection with a threshold of 0.0005 produced 18 features considered most relevant to the classification target: avg_glucose_level, bmi, age, work_type_children, ever_married, work_type_Self-employed, smoking_status_formerly smoked, smoking_status_Unknown, work_type_Private, work_type_Govt_job, smoking_status_smokes, hypertension, heart_disease, gender_Male, gender_Female, smoking_status_never smoked, work_type_Never_worked, and Residence_type. The gender_Other feature was not used because it did not pass the selection threshold. The results showed that the RBF kernel with the parameter combination Cost = 100 and Gamma = 5 gave the best performance, with the highest accuracy of 90.51%. The Polynomial kernel with parameters Cost = 100, Degree = 2, and Coef0 = 0 produced an accuracy of 83.04%, while the Linear kernel showed the lowest accuracy of 79.84%, although it was the most stable. Overall, the application of feature selection, data balancing, and appropriate parameter settings was proven to identify the variables that most influence SVM classification, thus improving the model's ability to detect stroke disease. For further development, it is recommended to try other feature selection methods such as Gain Ratio, Recursive Feature Elimination (RFE), and Lasso Regression to compare performance and find a more optimal approach.
In addition, the use of larger and more diverse datasets is expected to improve the generalizability and reliability of the model. Further research could also compare SVM with other algorithms such as Random Forest, XGBoost, or deep learning approaches to broaden insight into the development of stroke classification systems. These findings have great potential to be applied in technology-based early detection systems to support a fast, accurate, and efficient stroke diagnosis process.

REFERENCES