Matrik: Jurnal Manajemen, Teknik Informatika, dan Rekayasa Komputer
Vol. No. July 2023, pp. 431-440
ISSN: 2476-9843, accredited by Kemenristekdikti, Decree No: 200/M/KPT/2020
DOI: 10.30812/matrik.

Comparison of Machine Learning Methods for Classifying User Satisfaction Opinions of the PeduliLindungi Application

Putu Tisna Putra, Anthony Anggrawan, Hairani Hairani
Universitas Bumigora, Mataram, Indonesia

Article Info

ABSTRACT
Since the emergence of the Covid-19 virus, the Indonesian government has urged people to study, work, and worship from home. The social restriction policy changed people's behavior by requiring physical distance in social interaction. To minimize the spread of Covid-19, the government developed a contact-tracing application called PeduliLindungi. The government's policy of requiring the PeduliLindungi application during Covid-19 drew both support and criticism from the public. The volume of PeduliLindungi review data on Google Play kept increasing, so manual analysis was no longer feasible, and a new analytical approach such as sentiment analysis was needed. This research aimed to analyze user reviews of the PeduliLindungi application using three classification methods: Support Vector Machine (SVM), Random Forest, and Naïve Bayes. The Synthetic Minority Oversampling Technique (SMOTE) was used to balance the user review data, and classification was carried out after the data had been balanced. The results showed that the Random Forest method with SMOTE achieved better accuracy than the SVM and Naïve Bayes methods, namely 96.3%, based on the division of training and testing data using 10-fold cross-validation.
Thus, using the SMOTE method could improve the accuracy of classification methods in classifying opinions of user satisfaction with the PeduliLindungi application.

Article history: Received April 20, 2023; Revised May 24, 2023; Accepted June 06, 2023

Keywords: Machine Learning; PeduliLindungi Application; Sentiment Analysis; Text Mining

Copyright © 2022 The Authors. This is an open access article under the CC BY-SA license.
https://creativecommons.org/licenses/by-sa/4.

Corresponding Author: Hairani, 6287839793970, Faculty of Engineering and Computer Science Undergraduate, Universitas Bumigora, Mataram, Indonesia. Email: hairani@universitasbumigora.

How to Cite: P. Tisna Putra, A. Anggrawan, and H. Hairani, "Comparison of Machine Learning Methods for Classifying User Satisfaction Opinions of the PeduliLindungi Application", MATRIK: Jurnal Manajemen, Teknik Informatika, dan Rekayasa Komputer, Vol. No. 3, pp. Jul, 2023.

Journal homepage: https://journal. id/index. php/matrik

INTRODUCTION
Covid-19 has been a public concern since its appearance was detected in the city of Wuhan, China, in 2020. The emergence of Covid-19 has changed the order of human life, especially social life. Since the beginning of the outbreak, the Indonesian government has urged people to study, work, and worship from home. Habits such as gathering and shaking hands have had to give way to social restrictions. The social restriction policy has changed people's behavior by requiring physical distance in social interaction. To minimize the spread of Covid-19, the government developed the PeduliLindungi application, a contact-tracing application.
The features of the PeduliLindungi application include vaccination locations and location sharing while traveling, so that a user's contact history with Covid-19 patients can be traced. In addition, the application notifies users when they are in a red zone. The government's policy of requiring the PeduliLindungi application during Covid-19 drew both support and criticism from the public. Public complaints about the application include (1) frequent errors, (2) vaccine certificates that do not appear, (3) vaccine status that does not change, and (4) concerns about data leakage. In addition, the volume of PeduliLindungi review data on Google Play keeps increasing, so manual analysis is no longer feasible and a new analytical approach, such as sentiment analysis, is needed.

Sentiment analysis emerged as an automated process for examining semantic relationships and meanings in reviews. One sentiment analysis technique that can be applied to social media data is the machine learning approach, which uses algorithms to extract and detect sentiment from opinions. Therefore, this research proposes a machine learning approach for sentiment analysis of user satisfaction with the PeduliLindungi application, automatically classifying reviews as satisfied, neutral, or dissatisfied.

Several previous studies have discussed user reviews of the PeduliLindungi application using various methods. One study used the Naïve Bayes method to analyze user reviews of the PeduliLindungi application on a dataset of 1179 instances and obtained an accuracy of 83. Another study compared the Naïve Bayes and SVM methods for sentiment analysis of the PeduliLindungi application using a dataset of 4636 instances. In that study, SVM achieved higher accuracy than Naïve Bayes: 91% for SVM and 90% for Naïve Bayes. Another study used SVM,
Naïve Bayes, and KNN methods to classify community responses to the PeduliLindungi application using a dataset of 6000 instances. With 80% training data and 20% testing data, the SVM method achieved the best accuracy among the three methods, at 76. Another study applied the Naïve Bayes method to sentiment analysis of the PeduliLindungi application with a dataset of 1000 instances. With 80% training data and 20% testing data, the Naïve Bayes method obtained an accuracy of 73%. Another study used the Long Short-Term Memory (LSTM) method to classify user sentiment toward the PeduliLindungi application with 3000 reviews divided into positive and negative classes, obtaining an accuracy of 82.44%, a precision of 78.66%, and a recall of 87%. Another study used the Naïve Bayes method to classify user reviews of the PeduliLindungi application with 496 reviews divided into positive and negative classes, obtaining an accuracy of 85%, a precision of 77.7%, and a recall of 98%. Another study combined the Naïve Bayes method with Particle Swarm Optimization (PSO) feature selection to classify 1364 user reviews of the PeduliLindungi application divided into positive and negative classes. The combination of Naïve Bayes and PSO obtained an accuracy of 93% and an AUC of 97. Another study used the Convolutional Neural Network (CNN) method for aspect-based sentiment analysis of PeduliLindungi user reviews, covering the aspects Visual Experience, Scan, Vaccine Certificate, eHac, Covid-19 Test, Login, Performance, and Security. The data comprised 2320 instances, and the study obtained an F1 score of 95. Another study used the SVM method for sentiment analysis of user reviews of the PeduliLindungi application with an accuracy of 64%.

Outside the PeduliLindungi case study, one study used the C4.5 method to classify the nutritional status of toddlers with an accuracy of 95%. Another study used the SVM and Naïve Bayes methods to analyze reviews of online news media applications on Google Play with a dataset of 5000 instances, obtaining an accuracy of 88% for SVM and 87% for Naïve Bayes. Another study used the SVM method for sentiment analysis of electronic money service customers with a dataset of 3852 instances obtained from Twitter, with an accuracy of 98%. Finally, one study used the Random Forest method to analyze comments on the relocation of the country's capital with a dataset of 1639 instances, and the accuracy obtained was 76%.

The difference between this research and previous research is that this research handles the problem of unbalanced data in PeduliLindungi application reviews using SMOTE. After the data is balanced, the user reviews are classified using three methods, namely SVM, Random Forest, and Naïve Bayes. The novelty of this research is the combination of the SMOTE method with several machine learning methods (SVM, Random Forest, and Naïve Bayes) for classifying user satisfaction opinions of the PeduliLindungi application. Therefore, this study aims to analyze user reviews of the PeduliLindungi application using these three classification methods.

RESEARCH METHOD
Figure 1 shows the flow of this research, which consists of several stages. The first stage is collecting the dataset by scraping user reviews of the PeduliLindungi application on Google Play. The next stage is labeling the user review data as satisfied, neutral, or dissatisfied based on the rating given.
Ratings of more than three are considered satisfied reviews, ratings of three are neutral, and ratings of less than three are dissatisfied. The next stage is text preprocessing, which turns raw data into quality data through case folding, tokenization, filtering, and stemming. The preprocessed text is then weighted using the TF-IDF method. The next stage is text classification using the Naïve Bayes, Random Forest, and SVM methods, based on the division of training and testing data using 10-fold cross-validation. The last stage is performance testing based on accuracy using the confusion matrix table.

Figure 1. Research Flowchart

Dataset Collection
The data used are user reviews of the PeduliLindungi application on Google Play, obtained by scraping. The scraped reviews do not yet contain review categories such as satisfied, neutral, and dissatisfied, so each review must be labeled based on the rating value given by the user. A rating ≥ 4 becomes Satisfied, a rating < 3 becomes Dissatisfied, and a rating = 3 becomes Neutral.

Text Preprocessing
Text preprocessing turns raw data into quality data by performing case folding, tokenization, filtering, and stemming. The commonly used preprocessing stages are shown in Figure 2. Preprocessing begins with case folding, which converts capital letters into lowercase letters. Tokenization then separates sentences into words. Filtering removes less essential words and keeps important ones. For example, common Indonesian words such as "yang" (which), "dan" (and), "di" (in), and "dari" (from) are removed. Finally, stemming reduces each word to its base form.

Figure 2. Preprocessing Stages
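The labeling rule and the four preprocessing stages described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the four-word stop-word list, and the identity `stem` placeholder are all hypothetical, and a real run would plug in a full Indonesian stop-word list and stemmer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (assumption): the paper
# only names these four words as examples of what filtering removes.
STOPWORDS = {"yang", "dan", "di", "dari"}


def label_from_rating(rating: int) -> str:
    """Map a star rating to an opinion label using the rule in the text."""
    if rating > 3:
        return "Satisfied"
    if rating == 3:
        return "Neutral"
    return "Dissatisfied"


def stem(token: str) -> str:
    """Placeholder stemmer (assumption): returns the token unchanged.

    Replace with a real Indonesian stemmer to reduce words to base form.
    """
    return token


def preprocess(text: str) -> List[str]:
    """Apply case folding, tokenization, filtering, and stemming in order."""
    lowered = text.lower()                             # case folding
    tokens = re.findall(r"[a-z0-9]+", lowered)         # tokenization
    kept = [t for t in tokens if t not in STOPWORDS]   # filtering
    return [stem(t) for t in kept]                     # stemming
```

For instance, `preprocess("Aplikasi yang susah DAN ribet!")` yields `['aplikasi', 'susah', 'ribet']`, and `label_from_rating(2)` yields `'Dissatisfied'`.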
Word Weighting
The preprocessed data, still in the form of words, is converted to numerical form through word weighting, which assigns each word a weight that is then used as a feature. The TF-IDF weight of a word is the product of its TF and IDF values, as in Equation (1), where $W$ is the TF-IDF weight, $TF$ is the term frequency, and $IDF$ is the inverse document frequency.

$$W = TF \times IDF \tag{1}$$

Data Sampling
The data used contains class imbalance, which can affect the performance of the classification methods. Therefore, the imbalance must be handled to improve their performance. This research employed the SMOTE method to balance the data by adding synthetic instances to the minority classes based on their nearest neighbors. Figure 3 displays the stages of the SMOTE process.

Figure 3. The SMOTE Process

Implementation Model
At this stage, the classification algorithms are implemented, namely SVM, Random Forest, and Naïve Bayes.

The Naïve Bayes classifier is a classification method based on Bayes' theorem. It uses probability and statistics to predict future outcomes based on previous experience. Its classification process uses Equation (2), where $P(H|X)$ is the probability that hypothesis $H$ is true given the observed sample data $X$, $P(X|H)$ is the probability of the sample data $X$ if hypothesis $H$ is true, $P(H)$ is the probability of hypothesis $H$, and $P(X)$ is the probability of the observed sample.

The SVM method works by separating two different classes with a hyperplane, using Equation (3), where $w$ is the weight vector, $x$ is the input data, $y$ is the class, and $b$ is the bias.

The Random Forest algorithm is an ensemble learning-based classification method. Its advantages are good accuracy, robustness against outliers and noise, and faster training than bagging and boosting.
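To make the Data Sampling step above more concrete, the following is a minimal pure-Python sketch of the core SMOTE idea: pick a minority instance, pick one of its nearest minority neighbors, and create a synthetic instance somewhere on the line segment between them. The function name, the `k=5` default, and the fixed seed are assumptions for illustration; the paper does not state which SMOTE implementation or neighbor count was used.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors from a list of minority-class vectors.

    Sketch only: k and seed are illustrative assumptions, not the
    settings used in the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # new point = base + gap * (neighbour - base), feature by feature
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

With the class sizes reported in this paper (506 Satisfied, 500 Neutral, 2999 Dissatisfied), balancing every class to 2999 instances would require 2999 - 506 = 2493 synthetic Satisfied vectors and 2999 - 500 = 2499 synthetic Neutral vectors.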
The stages of the Random Forest method are shown in Figure 4.

Equations (2) and (3), for the Naïve Bayes and SVM methods respectively, are:

$$P(H|X) = \frac{P(X|H)\,P(H)}{P(X)} \tag{2}$$

$$w \cdot x + b = 0 \tag{3}$$

Figure 4. Random Forest Flowchart

Performance Testing
At this stage, the classification methods are tested. This research uses accuracy to measure classification performance, as in Equation (4). Accuracy is computed from a confusion matrix table that consists of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). True Positive is an actual positive instance predicted as positive. True Negative is an actual negative instance predicted as negative. False Positive is an actual negative instance predicted as positive. Finally, False Negative is an actual positive instance predicted as negative.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4}$$

RESULT AND ANALYSIS
This section describes the research results obtained by following the stages in Figure 1. The PeduliLindungi user review data comprises 4005 instances that do not yet have labels, so labeling is done based on user ratings. A rating ≥ 4 becomes Satisfied, a rating < 3 becomes Dissatisfied, and a rating = 3 becomes Neutral. The results of the data labeling are shown in Table 1. The data is then preprocessed with case folding, tokenization, filtering, and stemming to improve its quality. The preprocessed data is shown in Table 2. At this point the data is still in the form of words, so it must be converted to numeric form to obtain a weight for each word that the classification methods can process. The results of word weighting using the TF-IDF method are shown in Table 3.
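The word weighting step referred to above can be illustrated with a small from-scratch TF-IDF computation. The paper only gives the product form W = TF × IDF, so the specific definitions here are assumptions: TF is the term count divided by the document length, and IDF is the natural logarithm of N / df. Library implementations typically use smoothed or normalized variants, so this sketch is not expected to reproduce the values in Table 3.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed definitions (not stated in the paper):
      tf  = count of term in document / number of tokens in document
      idf = ln(number of documents / number of documents containing term)
    """
    n_docs = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

As a usage example, three of the tokenized reviews from this paper can be passed in as `tf_idf([["biar", "masuk", "pabrik"], ["update", "berat", "forced", "closed"], ["bermanfaat", "butuh", "butuh"]])`. Under the assumed definitions, "butuh" receives the weight (2/3) × ln 3, because it fills two of the three token positions in its review and appears in only one of the three documents. A term that appears in every document receives a weight of zero.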
Table 1. Dataset Labelling Result
Columns: Content, Score, Opinion
Content:
aplikasi tolong susah cek hasil sertifikasi vaksin
fitur isi data tanggal lahir repot tulis manual
min ijin scan barcode muncul notifikasi ijin
biar masuk pabrik
aplikasi gak suruh upload sertifikat vaksin
Update berat forced closed
Masuk aplikasi minta kirim kode verifikasi
Manfaat aplikasi bikin ribet masuk mall
Bermanfaat butuh butuh
Opinion: Dissatisfied, Dissatisfied, Dissatisfied, Satisfied, Dissatisfied, Satisfied, Dissatisfied, Dissatisfied, Satisfied, Neutral

Table 2. Text Preprocessing Result
Columns: Content, Score, Opinion
Content:
[aplikasi, tolong, susah, cek, hasil, sertifikasi, vaksin]
[fitur, isi data, tanggal, lahir, repot, tulis, manual]
[min, ijin, scan, barcode, muncul, notifikasi, ijin]
[biar, masuk, pabrik]
[aplikasi, gak, suruh, upload, sertifikat, vaksin]
[update, berat, forced, closed]
[masuk, aplikasi, minta, kirim, kode, verifikasi]
[manfaat, aplikasi, bikin, ribet, masuk, mall]
[bermanfaat, butuh, butuh]
Opinion: Dissatisfied, Dissatisfied, Dissatisfied, Satisfied, Dissatisfied, Satisfied, Dissatisfied, Dissatisfied, Dissatisfied, Neutral

Table 3. Term Weighting Results Using TF-IDF Method
Columns: Index, Term, TF-IDF Values
TF-IDF Values: 0.25, 0.29, 0.53, 0.31, 0.35, 0.26, 0.73, 0.68

The data is still unbalanced, which affects the performance of the classification methods. The number of Dissatisfied labels (2999 instances) is far larger than the number of Satisfied (506) and Neutral (500) labels, so the opinion label data needs to be balanced. Therefore, this research uses the SMOTE method to oversample the data so that it becomes balanced. The amounts of unbalanced and balanced data can be seen in Figure 5. The classification methods used in this research are Naïve Bayes, Random Forest, and SVM. The three methods' performance testing is based on accuracy, precision, and specificity using the confusion matrix table. Based on testing the Naïve Bayes,
Random Forest, and SVM methods using 10-fold cross-validation, the results are obtained as confusion matrices, shown in Figure 6 for Naïve Bayes, Figure 7 for SVM, and Figure 8 for Random Forest. Figure 6 shows that without SMOTE, the Naïve Bayes method correctly classifies 0 Satisfied instances, 88 Neutral instances, and 2992 Dissatisfied instances. With SMOTE, the Naïve Bayes method correctly classifies 2625 Satisfied instances, 2583 Neutral instances, and 2136 Dissatisfied instances. According to Figure 7, the SVM method without SMOTE correctly classifies 17 Satisfied instances, 258 Neutral instances, and 2950 Dissatisfied instances. With SMOTE, the SVM method correctly classifies 2776 Satisfied instances, 2916 Neutral instances, and 2942 Dissatisfied instances. Figure 8 shows that without SMOTE, the Random Forest method correctly classifies 11 Satisfied instances, 238 Neutral instances, and 2958 Dissatisfied instances. With SMOTE, the Random Forest method correctly classifies 2866 Satisfied instances, 2913 Neutral instances, and 2888 Dissatisfied instances.

Figure 5. Data Distribution of Satisfaction Opinion on the Use of PeduliLindungi Application

Figure 6. (a) Naïve Bayes Method Confusion Matrix Results Before SMOTE, (b) Naïve Bayes Method Confusion Matrix Results After SMOTE

Figure 7. (a) SVM Method Confusion Matrix Results Before SMOTE, (b) SVM Method Confusion Matrix Results After SMOTE

Figure 8. (a) Confusion Matrix Result of Random Forest Method Before SMOTE, (b) Confusion Matrix Result of Random Forest Method After SMOTE

In this section, we analyze the results obtained by the Random Forest, Naïve Bayes, and SVM methods based on accuracy, as shown in Figure 9.
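The accuracy of each method can be checked directly from the correctly classified counts reported above: for a multi-class confusion matrix, accuracy is the sum of the diagonal divided by the total number of instances. The sketch below assumes that each confusion matrix pools all 10 cross-validation folds, so the totals are the full dataset sizes reported in this paper (4005 reviews before SMOTE and 8997 after). The dictionary layout and function names are illustrative only.

```python
# Correctly classified instances per class (Satisfied, Neutral, Dissatisfied),
# taken from the counts reported in the text for Figures 6-8.
CORRECT = {
    "Naive Bayes": {"before": (0, 88, 2992), "after": (2625, 2583, 2136)},
    "SVM": {"before": (17, 258, 2950), "after": (2776, 2916, 2942)},
    "Random Forest": {"before": (11, 238, 2958), "after": (2866, 2913, 2888)},
}

# Dataset size before and after SMOTE, as reported in the paper.
# Assumption: each confusion matrix covers the whole dataset (all folds pooled).
TOTAL = {"before": 4005, "after": 8997}


def accuracy(correct_per_class, total):
    """Accuracy = correctly classified instances / all instances."""
    return sum(correct_per_class) / total


def accuracy_table():
    """Return {method: (accuracy before, accuracy after, gain)} as fractions."""
    rows = {}
    for name, counts in CORRECT.items():
        before = accuracy(counts["before"], TOTAL["before"])
        after = accuracy(counts["after"], TOTAL["after"])
        rows[name] = (before, after, after - before)
    return rows


if __name__ == "__main__":
    for name, (before, after, gain) in accuracy_table().items():
        print(f"{name}: {before:.1%} -> {after:.1%} (gain {gain * 100:.2f} points)")
```

Under these assumptions, Random Forest with SMOTE comes out at 8667 / 8997, about 96.3%, which agrees with the accuracy reported in the abstract. The gains computed this way for Random Forest and SVM fall within about 0.1 percentage point of the reported increases of 16.2% and 15.4%, and the same calculation gives a gain of about 4.7 points for Naïve Bayes.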
Based on Figure 9, the Random Forest method with SMOTE achieves the best accuracy compared to the Naïve Bayes and SVM methods, at 96.3%. Using SMOTE also increases the accuracy of all three classification methods, a finding supported by previous research. After applying SMOTE, the accuracy of the Random Forest method increased by 16.2%, the SVM method by 15.4%, and the Naïve Bayes method by 4. A comparison of these results with the results of previous research is shown in Table 4. Table 4 shows that the proposed method achieves better accuracy than several previous studies using the same case study.

Figure 9. Accuracy of the Classification Methods Used

Table 4. Comparison Results of this Research with Previous Research
Researcher | Methods | Case Study | Accuracy
Aritonang, et al. | CNN | PeduliLindungi Application Review |
Mustapa, et al. | Naïve Bayes + PSO | PeduliLindungi Application Review |
Saputra, et al. | Naïve Bayes | PeduliLindungi Application Review |
Lustiansyah, et al. | LSTM | PeduliLindungi Application Review |
Rais, et al. | Naïve Bayes | PeduliLindungi Application Review |
Salma, et al. | SVM | PeduliLindungi Application Review |
Locarso, et al. | Naïve Bayes | PeduliLindungi Application Review |
Darusman | SVM | PeduliLindungi Application Review |
Proposed Method | Random Forest + SMOTE | PeduliLindungi Application Review |

CONCLUSION
The dataset in this research consists of 4005 user reviews of the PeduliLindungi application on Google Play, divided into 506 Satisfied reviews, 500 Neutral reviews, and 2999 Dissatisfied reviews. The data is unbalanced, so it is necessary to balance it using the SMOTE method.
After SMOTE, the dataset grew to 8997 reviews of the PeduliLindungi application, divided into 2999 Satisfied reviews, 2999 Neutral reviews, and 2999 Dissatisfied reviews. Based on the test results, the Random Forest method with SMOTE achieves better accuracy than the Naïve Bayes and SVM methods. Therefore, it can be concluded that using SMOTE to handle unbalanced data in public opinion reviews of the PeduliLindungi application improves the performance of the classification methods used. Future research can use text representation methods other than TF-IDF, such as Word2Vec, BERT, or Bag of Words, as well as feature selection methods, to improve classification accuracy.

REFERENCES