11th ISC 2024 (Universitas Advent Indonesia, Indonesia) “Research and Education Sustainability: Unlocking Opportunities in Shaping Today's Generation Decision Making and Building Connections” October 22-23, 2024

Sentiment Analysis of Customer Satisfaction of Shopee Service Quality

Debby E. Sondakh1, Semmy W. Taju2, Regina Patricia3, Reinhard Tumbal4
Faculty of Computer Science, Universitas Klabat
debby.sondakh@unklab.ac.id

ABSTRACT
E-commerce has become one of the most popular services, especially during the coronavirus (COVID-19) pandemic, which restricted activities outside the home. Customer satisfaction is an important factor in the success of e-commerce companies. This study therefore analyzes customer satisfaction with the service quality of Shopee, one of the most widely used e-commerce platforms. Sentiment analysis of customer reviews was conducted in five steps: data collection, pre-processing, data classification, modelling, and evaluation. The SVM, random forest, and k-NN algorithms were used to model customer satisfaction, based on 6000 reviews collected from the Google Play Store. SVM achieved the best performance. In the positive class, sensitivity reached 81.7%, specificity 95.3%, accuracy 90.8%, and MCC 0.79. In the neutral class, sensitivity reached 91.8%, specificity 83.5%, accuracy 86.3%, and MCC 0.721. In the negative class, sensitivity reached 77.5%, specificity 96.6%, accuracy 90.3%, and MCC 0.778. The resulting model can therefore identify customer sentiment from the reviews provided with high accuracy.

Keywords: E-commerce, Sentiment Analysis, SVM, k-NN, Random Forest

INTRODUCTION
Electronic commerce, or e-commerce, is one form of technology utilisation in the business world. E-commerce includes digital platforms used to conduct buying and selling transactions through electronic systems, and it continues to develop rapidly.
According to the results of the eCommerce 2022 survey, the number of e-commerce businesses in Indonesia in 2021 was 2,868,178, an increase of 506,755 from the previous year (Oktora et al., 2022). The increase is also reflected in the value of e-commerce transactions: the Coordinating Ministry for Economic Affairs recorded that the value of e-commerce transactions in Indonesia in the first quarter of 2022 was 23 per cent higher than in the same period of 2021 (Uly & Djumena, 2022). E-commerce is not limited to retail product sales but also involves more complex services, such as delivery and financial services (Kütz, 2016). One of the most popular e-commerce services is online shopping, in which products and services are bought and sold between sellers and buyers over the internet, either through a web browser or through an application on a mobile device. Technological advances have also brought about the marketplace concept, in which many business people can sell their products or services on a single platform (Veselova, 2020), using customer-to-customer (C2C) and business-to-customer (B2C) models to facilitate online trading transactions (Erlyana & Hartono, 2017). As a result, more and more business people now sell products or services through marketplaces. Shopee is one of the most widely used marketplaces in Indonesia.

In the ever-evolving digital era, customer satisfaction has become an important determinant of the performance and survival of e-commerce companies. Companies operating in this industry compete to provide efficient services that fulfil the demands of their clients.
Shopee seeks to improve customer satisfaction by offering a wide array of products and by providing customer service features for customers who experience problems while using the platform. Several factors need to be considered in increasing customer satisfaction, such as product quality, service quality, and price. It is therefore important to analyze customer satisfaction with Shopee and to understand the determinants that influence customers' perceptions of and attitudes towards this e-commerce platform. Measuring and analysing customer satisfaction can provide valuable insights that help companies improve service quality, enhance interactions with customers, and strengthen customer loyalty. The literature shows that sentiment analysis is widely used to analyse service quality and customer satisfaction, as well as government policies (Sandag et al., 2022). Sentiment analysis maps a person's opinion by automatically classifying text into positive, neutral, or negative classes (Sitepu et al., 2022). Machine learning is one of the most widely used techniques for sentiment analysis, alongside dictionary-based and ontology-based approaches (Abirami & Gayathri, 2017). The machine learning approach uses learning algorithms to automatically identify patterns in a collection of review texts. Several machine learning classification algorithms have been used for sentiment analysis, such as Maximum Entropy, Decision Tree, Naïve Bayes, k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM). Previous studies have analysed user sentiment towards the Shopee application (Cahyaningtyas et al., 2021; Pratmanto et al., 2020) as well as towards products sold online, both in general and for specific products (Hariguna et al., 2019; Rhohmawati et al., 2019). Customer satisfaction plays a central role in this context.
In today's competitive e-commerce landscape, customer satisfaction is essential for survival. When customers have a positive experience on Shopee, whether through fast delivery, helpful customer service, or easily finding what they are looking for, they are more likely to stay on the platform. Retained customers not only make more purchases but often recommend the platform to friends and family, helping Shopee grow its user base and profits. Conversely, when customers experience late deliveries, product issues, or poor customer service, they tend to share their frustrations in reviews. This research aims to analyze customer satisfaction with the Shopee platform, and its results are expected to help Shopee develop better strategies to fulfil the needs and expectations of its customers. Specifically, this research compares the ability of the k-NN, SVM, and random forest classification algorithms to build a model of customer satisfaction with the Shopee platform.

METHODS
Figure 1 shows the research steps, which comprise data collection, pre-processing, data classification, modelling, and evaluation.

Figure 1. Research Methodology

The steps are described below.
• Data collection. A web scraping technique was used to retrieve 6000 user reviews, together with their ratings, from the comments section of the Shopee app on the Google Play Store. The keywords used were ‘kepuasan pelanggan’ (customer satisfaction) and ‘kualitas layanan’ (service quality). This research employs Python's Natural Language Toolkit (NLTK).
• Pre-processing.
At this stage, the collected reviews first go through case folding, which converts capital letters to lowercase to make words uniform and to minimise errors at the tokenizing stage. Stopword removal then removes words that carry no meaning, stemming reduces affixed words to their root forms, and tokenizing splits each review sentence into tokens (words or character sequences) that carry meaning. Finally, term weighting converts the text into numerical values based on the frequency of terms in each review. After pre-processing, 5992 reviews remained for further analysis.
• Data classification. The data is divided into three classes: positive, neutral, and negative. The negative class consists of 1-star and 2-star reviews, the neutral class of 3-star reviews, and the positive class of 4-star and 5-star reviews. The result is a labelled dataset for sentiment analysis.
• Modelling. After the data was processed and converted into a numerical format, three classification algorithms were used to build the classifier: random forest (an ensemble learning method), k-NN, and SVM. These algorithms were selected because of their proven effectiveness in text classification. SVM is well suited to complex, high-dimensional data such as text, where the boundaries between positive, neutral, and negative feedback are not always clear; it works by finding the hyperplane that best separates the sentiment classes, even when they are not straightforward to separate (Dey et al., 2020). Random forest is an ensemble of decision trees whose individual predictions are combined to produce more accurate predictions.
It is reported to handle imbalanced data well, which is common in sentiment analysis (for example, a dataset may contain far more neutral reviews than negative ones), and it is robust to noisy data, which makes it a strong candidate for this task (Singh & Tripathi, 2021). k-NN is a simple algorithm that classifies a data point according to the classes of its nearest neighbours. It works well on smaller datasets and requires little tuning, which made it a suitable baseline for this study. Although it can struggle with larger datasets, its simplicity and reliability made it worth including (Isnain et al., 2021).
• Evaluation. After the classifier models were built, their performance was evaluated using accuracy, precision, recall (sensitivity), specificity, and the Matthews Correlation Coefficient (MCC).

RESULTS AND DISCUSSION
Table 1 provides examples of reviews of the Shopee platform obtained from the Google Play Store, together with their pre-processing results and classes.

Table 1. Sample of Reviews and Pre-processing Results
| Review | Pre-processing Result | Class |
|---|---|---|
| Sebelumnya tidak ada kendala baik akses lambat ataupun error. Tapi sejak saya lakukan update aplikasi, shopee menjadi sangat lambat dan hampir setiap saya mengakses, selalu force logout. Kalo lebih jelek dari versi sebelumnya, buat apa ada update aplikasi (Previously there were no problems, neither slow access nor errors. But since I updated the app, Shopee has become very slow and almost every time I access it, it forces me to log out. If it is worse than the previous version, why update the app at all?) | ‘kendala’ ‘akses’ ‘lambat’ ‘error’ ‘laku’ ‘update’ ‘aplikasi’ ‘shopee’ ‘lambat’ ‘akses’ ‘force’ ‘logout’ ‘jelek’ ‘versi’ ‘update’ ‘aplikasi’ | NEGATIVE |
| Belanja cepat dan mantap (Shopping is fast and great) | ‘belanja’ ‘cepat’ ‘mantap’ | POSITIVE |
| Market yang sesuai kebutuhan netizen yang ramah dan lembut. Pokoknya mantap shopee sukses selalu (A market that suits the needs of netizens, friendly and gentle. Simply great, Shopee, always successful) | ‘market’ ‘sesuai’ ‘butuh’ ‘netizen’ ‘ramah’ ‘lembut’ ‘pokok’ ‘mantap’ ‘shopee’ ‘sukses’ | POSITIVE |
| Aplikasi sangat membantu, semoga semakin sukses ke depannya (The app is very helpful; hopefully it will be even more successful in the future) | ‘aplikasi’ ‘bantu’ ‘moga’ ‘sukses’ ‘depan’ | POSITIVE |
| Pengiriman selalu cepat (Delivery is always fast) | ‘kirim’ ‘cepat’ | POSITIVE |

Table 2 shows the distribution of the 5992 reviews across the three classes. The neutral class has the highest proportion (34%), while the positive and negative classes each account for 33%. The data was split into 80% training data and 20% testing data. Each class name was then converted into numerical form: positive (0), neutral (1), and negative (2). The data was then modelled using the three classification algorithms, and sensitivity, specificity, accuracy, and MCC were measured. The modelling results are shown in Table 3.

Table 2. Data Distribution
| | Positive | Neutral | Negative |
|---|---|---|---|
| No. of Reviews | 1977 | 2037 | 1977 |
| Percentage | 33% | 34% | 33% |
| Training (80%) | 1582 | 1630 | 1582 |
| Testing (20%) | 395 | 407 | 395 |

Table 3. Modelling Results Using Random Forest, k-NN, and SVM Algorithms
| Classifier | Class | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|---|
| Random Forest | Positive | 80.1 | 78.8 | 79.2 | 0.565 |
| Random Forest | Neutral | 64.1 | 85 | 78 | 0.499 |
| Random Forest | Negative | 59.4 | 88 | 78.5 | 0.49 |
| SVM | Positive | 81.7 | 95.3 | 90.8 | 0.79 |
| SVM | Neutral | 91.8 | 83.5 | 86.3 | 0.72 |
| SVM | Negative | 77.5 | 96.6 | 90.3 | 0.778 |
| k-NN | Positive | 64.6 | 78.6 | 75 | 0.221 |
| k-NN | Neutral | 57.9 | 65.2 | 62.8 | 0.221 |
| k-NN | Negative | 40 | 87.4 | 71.6 | 0.314 |

In addition, a visual analysis was carried out using Word Cloud text visualisation, which displays the words that appear most frequently in Shopee reviews.
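The per-class word frequencies behind a word cloud depend on the pre-processing and labelling steps described in the Methods section. The Python sketch below illustrates those steps; it is not the authors' NLTK code. `STOPWORDS` and `ROOTS` are hypothetical, deliberately tiny stand-ins for a full Indonesian stopword list and a real stemmer, and the regular-expression tokenizer is an assumption. Tokenization is applied right after case folding because stopword removal and stemming operate on tokens. Raw term counts are used as the weighting scheme, which is also an assumption: the paper only states that terms are weighted by frequency. The star thresholds and class codes follow the paper.

```python
import re
from collections import Counter

# HYPOTHETICAL stopword list for illustration only; a real run would use a
# full Indonesian stopword list.
STOPWORDS = {"dan", "yang", "selalu", "sangat", "ke", "semakin", "tidak",
             "ada", "baik", "ataupun", "tapi", "sejak", "saya", "menjadi",
             "hampir", "setiap", "dari", "buat", "apa", "kalo", "lebih"}

# HYPOTHETICAL lookup table standing in for a real Indonesian stemmer;
# it maps a few affixed words from Table 1 to their root forms.
ROOTS = {"pengiriman": "kirim", "membantu": "bantu", "semoga": "moga",
         "kebutuhan": "butuh", "pokoknya": "pokok", "depannya": "depan"}

# Class codes as stated in the paper: positive (0), neutral (1), negative (2).
CLASS_CODES = {"positive": 0, "neutral": 1, "negative": 2}


def preprocess(review: str) -> list[str]:
    """Case folding, tokenizing, stopword removal, then stemming."""
    text = review.lower()                               # case folding
    tokens = re.findall(r"[a-z]+", text)                # tokenizing (ASCII letters only)
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [ROOTS.get(t, t) for t in tokens]            # stemming (lookup stand-in)


def star_to_class(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment class (1-2 neg, 3 neutral, 4-5 pos)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"rating out of range: {stars}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


def term_frequency_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Raw term counts over a fixed vocabulary (one simple weighting scheme)."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def class_word_frequencies(reviews: list[tuple[str, int]]) -> dict[str, Counter]:
    """Per-class word counts, i.e. the input a word cloud is drawn from."""
    freqs = {label: Counter() for label in CLASS_CODES}
    for text, stars in reviews:
        freqs[star_to_class(stars)].update(preprocess(text))
    return freqs


if __name__ == "__main__":
    sample = [("Belanja cepat dan mantap", 5),
              ("Pengiriman selalu cepat", 4),
              ("Aplikasi sangat membantu, semoga semakin sukses ke depannya", 5)]
    freqs = class_word_frequencies(sample)
    print(freqs["positive"].most_common(1))   # [('cepat', 2)]
    print(CLASS_CODES[star_to_class(3)])      # 1
```

With the lookup table above, the fourth sample review in Table 1 becomes ‘aplikasi’ ‘bantu’ ‘moga’ ‘sukses’ ‘depan’, matching the table. A real stemmer would replace the lookup without changing the structure of the pipeline.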
Word Cloud visualises Shopee user reviews based on word frequency: the more frequently a word is used in the reviews, the larger it is drawn and the closer it is placed to the centre of the visualisation. Figures 2, 3, and 4 show the Word Cloud visualisations of the positive, neutral, and negative class reviews, respectively. The most frequent word in the positive class is ‘suka’ (like).

Figure 2. Word Cloud of Positive Reviews Class
Figure 3. Word Cloud of Neutral Reviews Class
Figure 4. Word Cloud of Negative Reviews Class

DISCUSSION
The results presented above show that, in analyzing customer sentiment towards Shopee's service quality, the Support Vector Machine (SVM) model achieves the highest overall performance. The SVM classifier achieved high sensitivity, specificity, accuracy, and Matthews Correlation Coefficient (MCC) in the positive, neutral, and negative sentiment classes. Compared with the other two algorithms tested, Random Forest and k-Nearest Neighbor (k-NN), SVM performed better on almost every metric, and it particularly excelled at identifying positive and neutral reviews, with sensitivity values of 81.7% and 91.8%, respectively. The accuracy of the SVM model reached 90.8% for positive reviews and 90.3% for negative reviews, indicating that the model can accurately distinguish between sentiment classes.
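Table 3 reports sensitivity, specificity, accuracy, and MCC separately for each class, which corresponds to a one-vs-rest reading of the three-class confusion matrix: the class of interest is treated as positive and the other two classes as negative. The paper does not state exactly how these figures were computed, so the following Python function is a sketch under that one-vs-rest assumption, using the standard definitions sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N, and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The toy labels are invented for illustration and use the paper's class codes.

```python
import math


def one_vs_rest_metrics(y_true, y_pred, label):
    """Sensitivity, specificity, accuracy and MCC for one class vs the rest."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == label:
            if t == label:
                tp += 1   # review of this class correctly detected
            else:
                fp += 1   # review of another class wrongly assigned to it
        else:
            if t == label:
                fn += 1   # review of this class missed
            else:
                tn += 1   # review of another class correctly not assigned to it
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels (hypothetical) using the paper's codes:
    # positive = 0, neutral = 1, negative = 2.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    for name, code in [("positive", 0), ("neutral", 1), ("negative", 2)]:
        print(name, one_vs_rest_metrics(y_true, y_pred, code))
```

Under this reading, per-class accuracy has a high floor: on the balanced test set in Table 2, predicting "not this class" for every review already gives about 67% accuracy for any single class. This is one reason MCC, which stays near 0 for such a trivial predictor, is a useful companion metric.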
The high specificity for the negative class (96.6%) shows that the model rarely labels positive or neutral reviews as negative. In addition, the MCC values of the SVM model (0.72 to 0.79) are the highest of the three classifiers in every class, which confirms the reliability of the model in predicting sentiment. Consistent with the model performance, the Word Cloud analysis provides a visual representation of the words used most frequently in each sentiment class. In the positive class, terms such as ‘suka’ (like), ‘bagus’ (good), ‘mudah’ (easy), and ‘mantap’ (great) indicate that customers are satisfied with the service and the overall shopping experience. In contrast, words such as ‘kecewa’ (disappointed), ‘parah’ (terrible), and ‘susah’ (difficult) in the negative class highlight common problems, such as technical difficulties, that affect customer satisfaction. Customer reviews are often difficult to classify because they do not always fall clearly into a “positive” or “negative” category; some are neutral and others are more nuanced. In this study, SVM handled this complexity better than the other two algorithms, offering a more precise and reliable understanding of customer feedback and a clearer picture of what customers actually experience.

CONCLUSION
This study shows how sentiment analysis using three machine learning algorithms, namely Random Forest, k-NN, and SVM, can be applied to assess customer satisfaction with the quality of e-commerce services, in this case Shopee. Each algorithm has its own advantages, but the results suggest that the SVM model performs better than the other two.
The SVM model outperforms the others in sensitivity (the proportion of reviews of a class that it correctly identifies) and overall accuracy for every sentiment class, and in specificity (its ability to avoid false positives) for the positive and negative classes. For e-commerce platforms such as Shopee, these results are more than a technical achievement: they offer a practical way to stay competitive by continuously monitoring customer satisfaction. This kind of analysis helps businesses categorize feedback efficiently and, at the same time, identifies areas for improvement. Shopee can therefore use sentiment analysis results to respond to customer needs more effectively.

The following suggestions are offered for future research:
1. Other tools or software could be tried in the data collection and pre-processing stages to obtain higher-quality review data.
2. Other classification algorithms could be explored.
3. Higher-order n-grams, such as 3-grams or longer, could be used for feature extraction, because they capture longer word combinations and more detailed phrases, providing richer contextual information than shorter n-grams.

REFERENCES
Abirami, A. M., & Gayathri, V. (2017). A survey on sentiment analysis methods and approach. 2016 Eighth International Conference on Advanced Computing (ICoAC), 72–76. https://doi.org/10.1109/ICoAC.2017.7951748
Cahyaningtyas, C., Nataliani, Y., & Widiasari, I. R. (2021). Analisis Sentimen Pada Rating Aplikasi Shopee Menggunakan Metode Decision Tree Berbasis SMOTE. AITI, 18(2), 173–184. https://doi.org/10.24246/aiti.v18i2.173-184
Dey, S., Wasif, S., Tonmoy, D. S., Sultana, S., Sarkar, J., & Dey, M. (2020). A Comparative Study of Support Vector Machine and Naive Bayes Classifier for Sentiment Analysis on Amazon Product Reviews. 2020 International Conference on Contemporary Computing and Applications (IC3A), 217–220. https://doi.org/10.1109/IC3A48958.2020.233300
Erlyana, Y., & Hartono, H. (2017). Business model in marketplace industry using business model canvas approach: An e-commerce case study. IOP Conference Series: Materials Science and Engineering, 277, 012066. https://doi.org/10.1088/1757-899X/277/1/012066
Hariguna, T., Baihaqi, W. M., & Nurwanti, A. (2019). Sentiment Analysis of Product Reviews as A Customer Recommendation Using the Naive Bayes Classifier Algorithm. IJIIS: International Journal of Informatics and Information Systems, 2(2), 48–55. https://doi.org/10.47738/ijiis.v2i2.13
Isnain, A. R., Supriyanto, J., & Kharisma, M. P. (2021). Implementation of K-Nearest Neighbor (K-NN) Algorithm For Public Sentiment Analysis of Online Learning. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 15(2), 121. https://doi.org/10.22146/ijccs.65176
Kütz, M. (2016). E-commerce: Combining Business and Information Technology. Bookboon.
Oktora, R., Kusumatrisna, A. L., Hasyyati, A. N., & Untari, R. (2022). Statistik eCommerce 2022.
Pratmanto, D., Rousyati, R., Wati, F. F., Widodo, A. E., Suleman, S., & Wijianto, R. (2020). App Review Sentiment Analysis Shopee Application In Google Play Store Using Naive Bayes Algorithm. Journal of Physics: Conference Series, 1641(1), 012043. https://doi.org/10.1088/1742-6596/1641/1/012043
Rhohmawati, U., Slamet, I., & Pratiwi, H. (2019). Sentiment Analysis Using Maximum Entropy on Application Reviews (Study Case: Shopee on Google Play). Jurnal Ilmiah Teknik Elektro Komputer Dan Informatika, 5(1). https://doi.org/10.26555/jiteki.v5i1.13087
Sandag, G. A., Soegiarto, E. H. E., Laoh, L., Gunawan, A., & Sondakh, D. (2022). Sentiment Analysis of Government Policy Regarding PPKM on Twitter Using LSTM. 2022 4th International Conference on Cybernetics and Intelligent System (ICORIS), 1–6. https://doi.org/10.1109/ICORIS56080.2022.10031474
Singh, J., & Tripathi, P. (2021). Sentiment analysis of Twitter data by making use of SVM, Random Forest and Decision Tree algorithm. 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), 193–198. https://doi.org/10.1109/CSNT51715.2021.9509679
Sitepu, M. B., Munthe, I. R., & Harahap, S. Z. (2022). Implementation of Support Vector Machine Algorithm for Shopee Customer Sentiment Analysis. Sinkron, 7(2), 619–627. https://doi.org/10.33395/sinkron.v7i2.11408
Uly, Y. A., & Djumena, E. (2022, August 3). Nilai Transaksi E-Commerce Indonesia Capai Rp 108,54 Triliun di Kuartal I-2022. Kompas.com. https://money.kompas.com/read/2022/08/03/211200826/nilai-transaksi-e-commerce-indonesia-capai-rp-108-54-triliun-di-kuartal-i-2022
Veselova, A. (2020). Marketplace vs. Online Shop. Proceedings of the 62nd International Scientific Conference on Economic and Social Development, 30–36.