Institut Riset dan Publikasi Indonesia (IRPI)
MALCOM: Indonesian Journal of Machine Learning and Computer Science
Journal Homepage: https://journal. id/index. php/malcom
Vol. 3 Iss. 2 October 2023, pp: 366-372
ISSN(P): 2797-2313 | ISSN(E): 2775-8575

Fuzzy Clustering-Based Grouping for Mapping the Distribution of Student Success Data

Mustakim1*, Delvi Nur Aini2, Ana Uzla Batubara3, Moh. Erkamim4, Legito5

Program Studi Sistem Informasi, Fakultas Sains dan Teknologi, Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
Sekolah Tinggi Ilmu Kesehatan Binalita Sudama Medan, Indonesia
Program Studi Sistem Informasi Kota Cerdas, Universitas Tunas Pembangunan Surakarta, Indonesia
Jurusan Teknik Informatika, Sekolah Tinggi Teknologi Sinar Husni Deliserdang, Sumatera Utara, Indonesia

Email: 1mustakim@uin-suska. id, 212050320493@students. uin-suska. 3anauzla@gmail. com, 4erkamim@lecture. id, 5legitostt@gmail.

Received Sep 03rd 2023. Revised Oct 25th 2023. Accepted Nov 10th 2023
Corresponding Author: Mustakim

Abstract

Learning activities are the core of the overall teaching and learning process in schools, because whether educational goals are achieved depends on how students carry out the learning process. The uneven level of student success in learning is one of the obstacles to the efforts of SMKN 5 Pekanbaru to realize its vision and mission of preparing skilled graduates who can work in particular sectors in line with the public interest and the industrial field. In this study, student grade data were mapped and grouped using the Fuzzy C-Means algorithm to give the school information for making the right decisions and optimizing the learning process. Clustering was carried out in several experiments, from K=3 to K=7, and the best validity value, measured with the Silhouette Index, was 0.4277 at K=5.
At K=5, the student score data were distributed as follows: cluster 1 with 1 student, cluster 2 with 27 students, cluster 3 with 1 student, cluster 4 with 10 students, and cluster 5 with 23 students.

Keyword: Clustering, Fuzzy C-Means, Silhouette Index, The Success of Students

INTRODUCTION

Vocational high school is a form of formal education that aims to equip graduates with knowledge and skills that benefit them in their professions and serve the public interest. Improving the quality of graduates therefore requires sustained effort, including good collaboration and coordination among the government, schools, industry partners, teachers with the right expertise, parents, and the students themselves.

According to data from the Education Department of Riau Province, SMKN 5 Pekanbaru is a vocational high school located on Yos Sudarso Street, Rumbai District, Pekanbaru City, Riau. The school was established in 1995 and currently offers 12 study programs. SMKN 5 Pekanbaru is one of the vocational schools registered in the "Center of Excellence Vocational School" program, which includes several components such as strengthening human resources (HR) in both hard skills and soft skills. The program prioritizes student achievement, graduate skills, and student success in learning.

The uneven level of student success in learning is one of the challenges in the school's efforts to realize the vision and mission of SMKN 5 Pekanbaru of preparing skilled graduates for careers in specific sectors according to the needs of the community and the industrial field. To better understand the factors that have the greatest impact on student success, schools need a technique or method that helps them analyze data and make informed decisions.
Data mining is an effective method for this purpose. Data mining is a process that applies mathematical, statistical, artificial intelligence, and machine learning methods to extract useful facts from a dataset. To group the distribution of student success data, a data-grouping (clustering) technique is required. The Fuzzy Clustering algorithm, or Fuzzy C-Means (FCM), is one of the algorithms applied in clustering data, in which a degree of membership determines the presence of each data point in a group.

DOI: https://doi. org/10. 57152/malcom.

Extensive research has been conducted on clustering techniques for grouping data. In one comparison, the Fuzzy C-Means algorithm achieved the highest silhouette index, 0.2559, outperforming the k-medoids algorithm, and in a comparison with the k-means algorithm it reached an accuracy of 76%. The Fuzzy C-Means algorithm has been reported to group data in detail based on the attributes the data possess and to perform well in finding high-level clusters. It can also map interactions between different cluster shapes, with reported test results reaching a best percentage of 71.23%. In another study, a Silhouette Index of 0.84321191, which falls in the strong structure category, indicated that the Fuzzy C-Means algorithm is well suited to grouping product data into three clusters. A further study that grouped student achievement with the Fuzzy C-Means clustering algorithm obtained a validity value of 1.3554 and mapped 3 clusters based on academic scores. The Fuzzy C-Means method also proved optimal in cluster center placement compared with other clustering methods when predicting student truancy factors.

Based on the problem statement and the previous research, this study applies data mining with the Fuzzy Clustering algorithm to map the distribution of student success data at SMKN 5 Pekanbaru.
This study also aims to assess the ability and reliability of the Fuzzy Clustering algorithm, with the hope of assisting the school in making informed decisions and optimizing the learning process at SMKN 5 Pekanbaru.

MATERIALS AND METHOD

Data

The data in this study are a dataset of report card scores of grade 12 students in the Department of Building Modeling and Information Design (DPIB) for the 2019-2020 Academic Year, sourced from the Curriculum Department Database at SMKN 5 Pekanbaru. The data were collected from July 15 to August 31, 2022, and comprise 62 records, detailed in Table 1.

Table 1. Student Grade Data from Cleaning (rows: Student 1 through Student 62; columns include Name, Gender, Religion, Religion Edu., Pancasila Edu., EBK, KUG, KJJ)

Method

The method applied in this study consists of several stages: Data Collection, Data Pre-Processing, Clustering, Validity Comparison, Results, and Analysis. The research begins with data collection, followed by pre-processing, which cleans the data by deleting records with missing values. The pre-processed data are then normalized to balance low and high values, using the minimum and maximum of the data. The Fuzzy C-Means algorithm is then applied to map the student grade data and generate clusters, which are evaluated with the Silhouette Index validity measure. In the final analysis, the validity values obtained from the clustering experiments are compared. The research methodology is described in Figure 1.

Data Mining

Data mining is a method in which one or more machine learning techniques are used to acquire and analyze knowledge in a series of stages, adding value to a database that could not be obtained manually.
Data mining is also a technique for discovering valuable facts in a set of databases, detecting previously unknown trends, patterns, and relationships that can be used to form predictive models. It aims to find specific models that can support valid decision-making in the future.

Clustering

Clustering is a process of categorizing data based on their level of similarity. The process aims to maximize the similarity among objects within a cluster while minimizing the similarity between different clusters. The similarity between two objects is measured by the relationship between them. Two types of data clustering techniques are commonly used: hierarchical clustering and non-hierarchical clustering. Hierarchical clustering classifies objects step by step to obtain the total number of clusters required. Non-hierarchical clustering, on the other hand, begins by specifying the desired number of clusters. Fuzzy clustering is a widely used non-hierarchical clustering technique.

Figure 1. Research Methodology

Fuzzy C-Means

Fuzzy C-Means (FCM) is a clustering method in which the presence of each data point in a cluster is determined by its degree of membership in a fuzzy set. This helps researchers determine the similarity of each data object to a group. Fuzzy C-Means is an unsupervised learning method in which the total number of clusters is fixed before the clustering process begins. FCM is the algorithm most commonly used in fuzzy clustering. The calculation steps of the Fuzzy C-Means algorithm are as follows:

1. Input the data to be clustered, X, a matrix of size n x m (n = total number of data samples, m = number of attributes of each sample). X_{ij} is the i-th data sample (i = 1, 2, ..., n) and the j-th attribute (j = 1, 2, ..., m).

2. Specify: the total number of clusters, c; the rank (weighting exponent), w; the maximum number of iterations, MaxIter; the smallest expected error, \xi; the initial objective function, P_0 = 0; and the initial iteration, t = 1.

3. Generate random numbers \mu_{ik}, i = 1, 2, ..., n; k = 1, 2, ..., c, as the elements of the initial partition matrix U. Compute the sum of each row and normalize:

$$Q_i = \sum_{k=1}^{c} \mu_{ik} \qquad (1)$$

$$\mu_{ik} = \frac{\mu_{ik}}{Q_i} \qquad (2)$$

with i = 1, 2, ..., n.

4. Compute the k-th cluster center V_{kj}, where k = 1, 2, ..., c and j = 1, 2, ..., m:

$$V_{kj} = \frac{\sum_{i=1}^{n} (\mu_{ik})^{w} X_{ij}}{\sum_{i=1}^{n} (\mu_{ik})^{w}} \qquad (3)$$

5. Compute the objective function at the t-th iteration, P_t:

$$P_t = \sum_{i=1}^{n} \sum_{k=1}^{c} \left( \left[ \sum_{j=1}^{m} (X_{ij} - V_{kj})^{2} \right] (\mu_{ik})^{w} \right) \qquad (4)$$

6. Compute the update of the partition matrix:

$$\mu_{ik} = \frac{\left[ \sum_{j=1}^{m} (X_{ij} - V_{kj})^{2} \right]^{\frac{-1}{w-1}}}{\sum_{k=1}^{c} \left[ \sum_{j=1}^{m} (X_{ij} - V_{kj})^{2} \right]^{\frac{-1}{w-1}}} \qquad (5)$$

7. Check the stopping condition: if (|P_t - P_{t-1}| < \xi) or (t > MaxIter), then stop. If not, set t = t + 1 and repeat from step 4.

Silhouette Index (SI)

The SI method can be applied to validate the results of the Fuzzy C-Means algorithm. SI uses average distances within and between clusters to assess cluster quality and to indicate how well each object fits the cluster it was assigned to. It does so by estimating the similarity or dissimilarity between objects within one cluster and objects in another cluster from the analysis results. A clustering is considered more optimal the closer its value is to 1. The formula for calculating the SI value is given in equation 6:

$$SI = \frac{1}{n} \sum_{i=1}^{n} \frac{b_i - a_i}{\max\{a_i, b_i\}} \qquad (6)$$

where a_i is the average distance from object i to the other objects in its own cluster, and b_i is the smallest average distance from object i to the objects of any other cluster.

The Success of Students

Learning activities are a fundamental aspect of the overall teaching and learning process in schools, because the achievement of educational goals depends on how students engage in the learning process. To realize students' success in teaching and learning activities, the role of the teacher is crucial, acting as both a motivator and a facilitator. Implementing effective and understandable teaching methods is one type of effort that can be made by teachers.
Furthermore, the adopted teaching methods also influence the success of the teaching and learning process in achieving the intended goals.

The role of a teacher in the learning process is quite complex. Teachers are responsible not only for delivering course materials, but also for encouraging and guiding students to become more enthusiastic about learning. To achieve this, teachers must be proficient in the subject matter they teach and use effective delivery models. They should also continuously develop their skills and create a conducive learning environment to support their students' success in learning.

Tools

This study uses RapidMiner for the cluster formation process. RapidMiner is an application with a GUI (Graphical User Interface) that makes the software easier to use. The software is open source and written in Java under the GNU Public License. This study also uses MATLAB for the cluster validation process with the Silhouette Index. MATLAB (Matrix Laboratory) is a high-level programming language used specifically for numerical computing and programming. MATLAB allows matrix manipulation, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages.

RESULTS AND DISCUSSION

Preprocessing Data Result

After pre-processing, the attributes used are gender, religion, and a total of 11 subjects. The cleaned data are then normalized using the min-max normalization method. The results of the data normalization process are presented in Table 2.

Table 2. Normalization of Student Grade Data (rows: Student 1 through Student 62; columns include Name, Gender, Religion, Religion Edu., Pancasila Edu., EBK, KUG, KJJ)

Clustering with Fuzzy C-Means Algorithm

In this study, the Fuzzy C-Means algorithm was used to cluster the data that had been pre-processed and normalized in the previous stage. The objective of the clustering tests was to obtain the most appropriate cluster mapping. The algorithm was tested with different numbers of clusters (ranging from 3 to 7), with the power (weighting exponent) set to 2 and MaxIter set to 100. The results of the clustering mapping are presented in Table 3.

Table 3. Fuzzy C-Means clustering results (columns: experiments K=3, K=4, K=5, K=6, K=7; rows: clusters)

Cluster Validity Comparison

Across the Fuzzy C-Means experiments with 3 to 7 clusters, the best cluster result is determined by comparing the SI validity values obtained in each experiment. The comparison of the experiments is presented in Figure 2.

Figure 2. Validity Value Comparison (chart title: The Comparison of Validity Scores; x-axis: K=3 to K=7; plotted values: 0.4277 at K=5, together with 0.2648, 0.2395, 0.000, and 0.000)

Figure 2 presents the validity values for all cluster experiments from K=3 to K=7. Among these, the validity value at K=5, 0.4277, is the most optimal for cluster placement because it is the closest to 1. The clustering at K=5 is therefore used for the analysis.

Figure 3. Cluster Results Distribution (chart title: The Distribution of Cluster Results; x-axis: clusters 1 to 5)

Figure 3 presents the clustering at K=5 for the total of 62 students. In the distribution of student score data, cluster 2 is the largest with 27 students, while clusters 1 and 3 are the smallest, each containing only 1 student. From this study, it can be seen that the Fuzzy C-Means algorithm can group student grade data into 5 clusters based on the best validity value, even though that validity value falls in the weak structure category.

CONCLUSION

Based on the results and analysis, it can be concluded that mapping and grouping student grade data with the Fuzzy C-Means algorithm obtained its best SI validity value among the cluster experiments at K=5, namely 0.4277. At K=5, the student score data are distributed as follows: cluster 1 with 1 student, cluster 2 with 27 students, cluster 3 with 1 student, cluster 4 with 10 students, and cluster 5 with 23 students. The clusters fall into two groups based on learning outcome scores: students with scores below 80 are in clusters 1 and 2, while students with scores above 80 are in clusters 3, 4, and 5. The results of this study give a clearer picture of student learning outcomes and provide a basis for the school to conduct more effective coaching to improve overall student learning success. A more in-depth analysis of the factors that affect student learning outcomes at SMKN 5 Pekanbaru is also needed to give the school more comprehensive insight when designing a more effective coaching program. Future research could apply other methods or algorithms that may improve the validity of the results, and could examine more deeply the factors and features that affect student learning outcomes within the school.

REFERENCES