SINERGI Vol. No. June 2021: 177-184
http://publikasi. id/index. php/sinergi
http://doi. org/10. 22441/sinergi.

CLASSIFICATION OF KIDNEY DISEASE USING GENETIC MODIFIED KNN AND ARTIFICIAL BEE COLONY ALGORITHM

Ardina Ariani1, Samsuryadi2*
1Graduate Program in Computer Science, Faculty of Computer Science, Universitas Sriwijaya, Indonesia
2Department of Computer Science, Faculty of Computer Science, Universitas Sriwijaya, Indonesia

Abstract
The health care system is currently improving with the development of intelligent systems for detecting diseases. Early detection of kidney disease by recognizing its symptoms is essential to prevent more severe damage. This study introduces a classification system for kidney disease using the Artificial Bee Colony (ABC) algorithm and Genetic Modified K-Nearest Neighbor (KNN). The ABC algorithm is used for feature selection, to determine the symptoms relevant to kidney disease, and Genetic Modified KNN is used for classification. The research consists of three stages: pre-processing, feature selection, and classification. The pre-processing stage is applied to chronic kidney disease data of 400 records with 24 attributes, which are then used for feature selection and classification. The kidney disease data are classified into two classes, namely chronic kidney disease and not chronic kidney disease. Furthermore, the performance of the proposed method is compared with other methods. The results showed that an accuracy of 98.27% was obtained by dividing the dataset into 280 training and 120 test records.

Keywords: Artificial Bee Colony; Classification of Kidney Disease; Feature Selection; Genetic Modified K-Nearest Neighbor.

Article History: Received: July 6, 2020; Revised: September 20, 2020; Accepted: October 4, 2020; Published: February 20, 2021

Corresponding Author: Samsuryadi, Department of Computer Science, Faculty of Computer Science, Universitas Sriwijaya, Indonesia. Email: samsuryadi@unsri.
This is an open access article under the CC BY-NC license

INTRODUCTION

Classification is one of the most commonly used data mining techniques in the health sector. Various classification techniques are used to detect and classify diseases such as cancer, Parkinsonism, heart disease, liver disease, diabetes, hepatitis, thalassemia, and Alzheimer's. Previous studies stated that the most widely used method for classification across various applications and diseases is K-Nearest Neighbor (KNN). This method has been used to classify hepatitis with better performance than the Neural Network (NN). Fast training and a simple algorithm are the hallmarks of KNN.

However, one weakness of KNN is that the number of features influences both the nearest-neighbour distances and the accuracy level. This leads to low accuracy on multidimensional datasets. Therefore, it is necessary to determine the features that are best used in classification. Furthermore, to overcome the low accuracy of KNN, a new algorithm is needed that modifies KNN so that it validates the training data and weights the voting. Modified KNN also has a weakness in determining the parameter k. Therefore, optimization with the proven Genetic Algorithm (GA) is needed to produce k automatically with the best results. Modified KNN optimized with GA is called Genetic Modified KNN (GMKNN). In addition, features are selected to obtain the attributes that affect classification, based on the best model obtained.

Kidney disease is a worldwide health problem. It leads to a decreased glomerular filtration rate and cardiovascular disease, which cause death. It is a killer disease because most sufferers are not identified early. The dataset from the UCI Machine Learning Repository lists 24 symptoms used to determine and group people with and without kidney disease.
Therefore, there are many features. Ancillary attributes do not all need to be used as determining factors in classifying someone as suffering from chronic kidney disease.

In a previous study, Particle Swarm Optimization (PSO) was applied to feature selection for shape-based diagnosis of clustered microcalcifications in mammography. The Ant Colony Optimization (ACO) method was modified into an efficient multivariate feature selection method to select the optimal features from the initial set. This method shows higher efficiency in reducing the number of features while maintaining maximum classification accuracy. Another study combined the Artificial Bee Colony (ABC) with the ACO method to optimize feature selection. That research showed that ABC performs well at reducing the size of the feature subset, with better accuracy and speed.

Furthermore, this study examines the performance of the Genetic Modified KNN (GMKNN) classification method, with the ABC algorithm applied for feature selection, in classifying kidney disease using the chronic kidney disease dataset from the UCI Machine Learning Repository.

METHOD

Several research stages are needed to obtain the best results from the classification of chronic kidney disease. Figure 1 shows the stages carried out in this study.

Figure 1: Research Stages

Data Collection

The data used in this study are the Chronic Kidney Disease dataset obtained from the University of California repository (UCI Machine Learning Repository benchmark). This dataset consists of 400 patient records with 24 symptoms, divided into chronic and non-chronic groups, as shown in Table 1.

Table 1. Distribution of Chronic Kidney Disease Datasets
Class: CKD (Chronic Kidney Disease); Not CKD (Not Chronic Kidney Disease)

Many factors influence the performance of machine learning methods.
The representation and quality of the data are among the most essential. This is because irrelevant data and noise make knowledge discovery difficult during the training process. Therefore, it is important to pre-process the data by cleaning and normalizing it to obtain the right information.

Pre-processing

At this stage, data pre-processing was conducted using mean, median, and mode calculation techniques. Meanwhile, normalization, which brings the data to the same scale, was performed using the min-max scaler technique. Feature selection is an effective data pre-processing technique that takes a portion of selected features from the original dataset. The purpose of feature selection is to eliminate noisy, irrelevant, and excessive features. Applying this selection reduces computational cost and increases accuracy in the data analysis process. The reduced dataset was normalized with the min-max scaler technique, as shown in the following equation:

$$ S' = \frac{s - \min(S_k)}{\max(S_k) - \min(S_k)} $$

Where:
s = value from the input data
min(S_k) = the minimum value of the data
max(S_k) = the maximum value of the data

Feature Selection

The next step is feature selection using the Artificial Bee Colony (ABC) algorithm. It is a simple concept that is easy to implement and has few control parameters. The three important components of the ABC algorithm are food sources, employed bees, and bees waiting in the hive. In the ABC algorithm, the position of a food source represents a possible solution to the optimization problem, and the amount of nectar of a food source corresponds to the quality of the associated solution. The number of observer (onlooker) bees is equal to the number of solutions in the population.

p-ISSN: 1410-2331 e-ISSN: 2460-1217

Each search cycle consists of the following three steps:
Sending employed bees to their food sources and evaluating the amount of nectar.
Selecting food source areas by observer bees and evaluating the amount of nectar from the food sources.
Determining scout bees and randomly sending them to possible new food sources.

In the initialization stage, a set of food sources is randomly chosen by the bees, and the amount of nectar of each is determined. The main steps of the ABC algorithm are as follows:

Initialize the population
Repeat
    Place the employed bees on the food sources
    Place the observer bees on the food sources depending on the amount of nectar
    Send scouts to the search area to discover new food sources
    Memorize the best food source found
Until the requirements are met

Classification

In this last stage, kidney disease was classified using the Modified K-Nearest Neighbor (MKNN) method combined with the GA method. The kidney disease data used are the result of feature selection with the ABC algorithm in the previous stage.

The KNN method has two parameters: the distance between the test data and the training data, and the number of nearest neighbours k. The classification process of the KNN method begins by calculating the distance between the test data and the training data. The distances are then sorted from smallest to largest, the parameter k is determined, and the k shortest distances are selected. Finally, a vote is taken on the majority class among these neighbours so that the class can be predicted. The performance of this algorithm also depends greatly on the quality of the data used.

The KNN method requires development to improve its performance, namely the Modified KNN. In KNN, classification is done only by calculating distances and voting among the k nearest neighbours. In the MKNN method, a validity value is calculated and weighted voting is applied, which can increase the resulting accuracy compared to traditional KNN. The Modified K-Nearest Neighbor (MKNN) method is a development of the original KNN algorithm.
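As a point of reference for the plain KNN procedure described above (compute distances, sort, keep the k nearest, vote), the following is a minimal Python sketch. It is not the authors' implementation: the function names `euclidean` and `knn_predict`, the class labels, and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, sample, k=3):
    """Predict the label of `sample` by majority vote among its k nearest
    training records (hypothetical helper, plain KNN without validity)."""
    # Step 1: distance from the test sample to every training record.
    distances = [(euclidean(row, sample), label)
                 for row, label in zip(train_X, train_y)]
    # Step 2: sort from smallest to largest distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the k closest records.
    nearest = distances[:k]
    # Step 4: majority vote over the labels of those k records.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, already-normalized data (illustrative values, not from the dataset).
train_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = ["notckd", "notckd", "ckd", "ckd"]
prediction = knn_predict(train_X, train_y, [0.85, 0.75], k=3)
print(prediction)
```

Because every one of the k neighbours counts equally in the vote, the choice of k and the presence of noisy training records directly affect the result, which is the weakness that MKNN addresses.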
In MKNN, there is an additional classification step in which class labels are determined, based on the parameter k, from the validated training data points. Furthermore, weighted KNN is applied to the test data. The MKNN pseudo-code algorithm is as follows:

Output_label := MKNN(train_set, test_sample)
Begin
    For i := 1 to train_size
        Validity(i) := Compute Validity of i-th point
    End For
    Output_label := Weighted_KNN(Validity, test_sample)
    Return Output_label
End

Every training record needs to go through a validation process. After all the training data have been validated, the validity value provides more information about each point:

$$ Validity(x) = \frac{1}{H} \sum_{i=1}^{H} S\big(lbl(x),\, lbl(N_i(x))\big) $$

Where:
H = number of closest points
lbl(x) = class of x
lbl(N_i(x)) = class label of the i-th closest point to x
S = the function used to calculate the similarity between point x and its i-th nearest neighbour

$$ S(a, b) = \begin{cases} 1, & a = b \\ 0, & a \neq b \end{cases} $$

Where:
a = class a in the training data
b = a class other than a in the training data

In addition to using the k closest neighbours, weighted voting is applied to each of the voting samples. The validity of each training record is multiplied by a weight based on the Euclidean distance:

$$ W(i) = Validity(i) \times \frac{1}{d_e + 0.5} $$

Where:
W(i) = weight voting calculation
Validity(i) = validity value
d_e = Euclidean distance

The Euclidean distance equation is as follows:

$$ d_e(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (x_{ir} - x_{jr})^2} $$

Where:
d_e(x_i, x_j) = Euclidean distance
n = number of data dimensions
x_i = training data (x_i = x_{i1}, x_{i2}, ..., x_{in})
x_j = testing data (x_j = x_{j1}, x_{j2}, ..., x_{jn})
r = 1, 2, ..., n

A genetic algorithm is a proven and effective technique used in data mining and pattern recognition. Besides being useful for optimization, it can also be used directly as a classifier. The genetic algorithm method is very popular because it has been successfully applied to many optimization problems.
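The validity and weighted-voting steps of MKNN given above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `compute_validity` and `mknn_predict`, the class labels, and the toy data are hypothetical, each training point is excluded from its own neighbour list, and at least two training records are assumed.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance d_e between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compute_validity(train_X, train_y, h):
    """Validity(x) = (1/H) * sum of S(lbl(x), lbl(N_i(x))) over the H
    nearest training neighbours of x (x itself is excluded, an assumption)."""
    validity = []
    for i, (xi, yi) in enumerate(zip(train_X, train_y)):
        others = [(euclidean(xi, xj), yj)
                  for j, (xj, yj) in enumerate(zip(train_X, train_y))
                  if j != i]
        others.sort(key=lambda pair: pair[0])
        neighbours = others[:h]
        # S(a, b) is 1 when the labels match and 0 otherwise.
        same = sum(1 for _, yj in neighbours if yj == yi)
        validity.append(same / len(neighbours))
    return validity


def mknn_predict(train_X, train_y, validity, sample, k):
    """Weighted vote: each of the k nearest training records contributes
    W = Validity * 1 / (d_e + 0.5) to the score of its class."""
    scored = [(euclidean(x, sample), y, v)
              for x, y, v in zip(train_X, train_y, validity)]
    scored.sort(key=lambda item: item[0])
    class_weight = defaultdict(float)
    for dist, label, v in scored[:k]:
        class_weight[label] += v * (1.0 / (dist + 0.5))
    return max(class_weight, key=class_weight.get)


# Toy, already-normalized data (illustrative values only).
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
           [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
train_y = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]
validity = compute_validity(train_X, train_y, h=2)
label = mknn_predict(train_X, train_y, validity, [0.8, 0.8], k=3)
print(validity, label)
```

The constant 0.5 in the denominator keeps the weight finite when a test sample coincides with a training record, and a training record with low validity contributes little to the vote even when it is close.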
One advantage of the genetic algorithm is its ability to complete searches in spaces of enormous dimension. In this study, the GA method was combined with the MKNN method to classify kidney disease, with GA determining the optimal k value.

The GMKNN-ABC method for kidney disease classification is compared with other methods to determine the performance of the proposed method. GMKNN-ABC is compared with the MKNN method without ABC feature selection and without GA optimization, and with the MKNN method using ABC for feature selection. The performance of the methods applied in this kidney disease classification study was measured from the confusion matrix, namely accuracy, precision, recall, and f1-score. These values were compared across the three methods, namely MKNN, MKNN-ABC, and GMKNN-ABC.

Experimental Setup

The dataset used in this research is the Chronic Kidney Disease dataset from the University of California repository (UCI Machine Learning Repository benchmark). The dataset is open access and can be downloaded from http://archive. edu/ml/datasets/Chronic_Kidney_Disease. The kidney disease dataset consists of 400 records with 24 features and two classes, namely 250 CKD records and 150 not CKD records. The dataset is divided into 280 training records and 120 testing records. In this study, kidney disease classification was tested with several methods, namely MKNN, MKNN-ABC, and GMKNN-ABC, in order to compare the performance of each method in classifying kidney disease.

RESULTS AND DISCUSSION

Classification of kidney disease is conducted using the ABC method for feature selection and Genetic Modified KNN for classification. The results of testing the GMKNN-ABC method are assessed from the resulting confusion matrix, namely accuracy, precision, recall, and f1-score. Testing is conducted over a limited range of the parameter k, starting from k = 1. Out of the 24 features in the chronic kidney disease dataset, only features 1, 2, 5, 7, 11, 13, 14, 19, and 21 were used.
The selected features are blood pressure, specific gravity, red blood cells, pus cell clumps, serum creatinine, potassium, haemoglobin, diabetes mellitus, and appetite. These features are used in the classification of kidney disease. The confusion matrix results for each value of the parameter k are shown in Table 2. They show that accuracy, precision, recall, and f1-score are all high, with an average above 90%. The same results are shown as a graph in Figure 2. The graph shows that the size of the parameter k has no significant effect on accuracy, precision, recall, and f1-score, which all remain high. To test the performance of the GMKNN-ABC method, MKNN, MKNN-ABC, and GMKNN-ABC are compared, as shown in Figure 3. Figure 3 shows a curve comparing the accuracy levels.

Table 2. Confusion Matrix Testing of the GMKNN-ABC Method
Columns: Parameter k; Confusion Matrix (%): Accuracy, Precision, Recall, F1-score

Figure 2. GMKNN-ABC Confusion Matrix Test Results

Figure 3. The Accuracy Curve of MKNN, MKNN-ABC and GMKNN-ABC

The size of the parameter k greatly influences the accuracy of the MKNN test. In general, the greater the parameter k, the higher the level of accuracy obtained. However, the lowest accuracy occurs when k ranges from 40 to 50. Meanwhile, the MKNN-ABC test gives a higher accuracy level than MKNN, although its accuracy decreases as the parameter k increases. In GMKNN-ABC, the accuracy produced is more stable and higher than that of MKNN and MKNN-ABC. The size of the parameter k does not influence the high level of accuracy produced.
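For reference, the four measures reported in this section can be derived from confusion-matrix counts as in the short Python sketch below. The helper names `confusion_counts` and `metrics`, the choice of "ckd" as the positive class, and the toy labels are assumptions made for illustration; the numbers are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive="ckd"):
    """Count true positives, false positives, false negatives and true
    negatives for a two-class problem (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and f1-score from the four counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (illustrative only, not the study's test set).
y_true = ["ckd", "ckd", "ckd", "notckd", "notckd"]
y_pred = ["ckd", "ckd", "notckd", "notckd", "ckd"]
counts = confusion_counts(y_true, y_pred)
scores = metrics(*counts)
print(counts, scores)
```

Averaging these values over the tested values of k gives per-method summaries of the kind compared between MKNN, MKNN-ABC, and GMKNN-ABC.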
Table 3 shows the comparison of the average confusion matrix results of MKNN, MKNN-ABC, and GMKNN-ABC. Table 4 displays a comparison with other related works. The comparison of the average performance of the MKNN, MKNN-ABC, and GMKNN-ABC methods is also displayed as a bar graph in Figure 4.

Table 3. Comparison of Average Confusion Matrix
Columns: Accuracy, Precision, Recall, F1-score
Rows: MKNN, MKNN-ABC, GMKNN-ABC

Figure 4. Comparison of Average Confusion Matrix Graphs

Table 4. Comparing with other related works
Author | Year | Method | Accuracy (%)
Kunwar et al. | 2016 | ANN |
Gharibdousti et al. | 2017 | NB |
Subasi, Alickovic, and Kevric | 2017 | ANN |
Fadilla, Adikara, and Perdana | 2018 | ELM |
Abdelaziz et al. | | Hybrid LR and NN |
Elhoseny, Shankar, and Uthayakumar | 2019 | D-ACO |
Rady and Anwar | 2019 | Probabilistic Neural Network |
Hamedan et al. | | Fuzzy Expert System |
Proposed | 2020 | MKNN*, MKNN-ABC*, GMKNN-ABC* |

Figure 4 shows that the GMKNN-ABC method provides the best results compared to MKNN and MKNN-ABC. Feature selection has a strong influence on accuracy, precision, recall, and f1-score. The GMKNN-ABC method provides the highest average results, above 95%, compared to MKNN and MKNN-ABC.

CONCLUSION

This study showed that the ABC method, implemented as feature selection for GMKNN, influences the classification of kidney disease. The comparison of the average confusion matrix results for the MKNN, MKNN-ABC, and GMKNN-ABC methods shows that GMKNN-ABC provides the best average accuracy, precision, recall, and f1-score. In addition, the GMKNN-ABC method produces an average accuracy of about 98%, which is the highest compared to MKNN, MKNN-ABC, and the other methods. The method proposed in this study, especially the GMKNN-ABC method, has never been carried out by previous studies.
The increase in accuracy resulting from the GMKNN-ABC method in the classification of kidney disease is the contribution of this study.

ACKNOWLEDGMENT

The authors are grateful to the UCI Machine Learning Repository for providing the required dataset. The authors also wish to thank the leaders of the Faculty of Computer Science, Universitas Sriwijaya, who have supported this study.

REFERENCES