Journal of Computer System and Informatics (JoSYC)
ISSN 2714-8912 (media online), ISSN 2714-7150 (media cetak)
Volume 5. No. February 2024. Page 442-453
https://ejurnal.seminar-id.com/index.php/josyc
DOI 10.47065/josyc.

Sentiment Classification of Robot Hotel Content using NBC and SVM Algorithm

Yerik Afrianto Singgalen
Faculty of Business Administration and Communication, Tourism Study Program, Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia
Correspondence Author Email: yerik.afrianto@atmajaya.
Submitted: 08/02/2024; Accepted: 23/02/2024; Published: 28/02/2024

Abstract: Sentiment analysis plays a pivotal role in comprehending public sentiment, notably within digital communication, where copious amounts of textual data are generated daily. This study examines the efficacy of two sentiment classification models, the Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM), on the imbalanced datasets commonly encountered in sentiment analysis tasks. Employing a comparative analysis methodology, a dataset comprising robot hotel reviews from online platforms is the basis for evaluation. Both NBC and SVM models undergo training and assessment, with and without the Synthetic Minority Over-sampling Technique (SMOTE), to rectify the class imbalance. Performance evaluation relies on accuracy, recall, precision, f-measure, and Area Under Curve (AUC) to gauge model effectiveness. Findings demonstrate SVM's superiority over NBC in terms of accuracy (SVM: NBC: 67.43%), precision (SVM: 92. NBC: 86.87%), recall (SVM: 58. NBC: 41.00%), f-measure (SVM: NBC: 55.63%), and AUC (SVM: 0. NBC: 0.). Incorporating SMOTE significantly enhances both models' performance, particularly in addressing class imbalance concerns. Although NBC exhibits a more balanced performance across precision and recall metrics, SVM demonstrates heightened accuracy and predictive capability in sentiment classification tasks.
These findings underscore the pivotal role of algorithm selection and preprocessing techniques in optimizing sentiment analysis performance, thereby providing insights for practitioners and researchers alike.

Keywords: Sentiment; Classification; Robot; Hotel; NBC; SVM

INTRODUCTION

The emergence of robot hotels represents a notable advancement in digital innovation and the Internet of Things (IoT), revolutionizing accommodation services rooted in human-machine interactions. This development poses a significant challenge within the tourism sector, particularly concerning workforce absorption to enhance community economies. As these establishments increasingly integrate automated systems and robotic functionalities into their operations, the traditional dynamics of hospitality service delivery undergo a paradigm shift. Such transformative endeavors underscore stakeholders' need to balance technological integration and preserving human-centric hospitality experiences. Integrating robots in hotels not only reshapes service provision but also prompts critical reflections on the socio-economic implications of automation in the tourism industry.

The publication of digital operational documentation of robot hotels on platforms such as YouTube and social media has elicited praise and criticism, as reflected in public sentiments. This publication is crucial for disseminating information and insights into these innovative establishments. However, it also engenders divergent viewpoints regarding the implications of robotic integration in the hospitality industry. Proponents argue that such transparency fosters consumer understanding and trust, bolstering robot hotels' credibility as viable accommodation options. Conversely, critics contend that the widespread accessibility of this documentation may exacerbate concerns surrounding privacy, security, and the displacement of human workers.
Despite the polarizing nature of these perspectives, the visibility of digital operational documentation underscores the significance of informed discourse in shaping perceptions and policy frameworks surrounding the proliferation of robot hotels.

The urgency of this research lies in the endeavor to identify and analyze public sentiments regarding the existence of robot hotels and the sustainability challenges the tourism sector faces. The proliferation of robot hotels represents a significant shift in the hospitality landscape, prompting inquiries into its societal implications and long-term viability. By comprehensively assessing public sentiments, all stakeholders gain insights into the acceptance, concerns, and expectations surrounding these automated accommodations. Furthermore, understanding the sustainability challenges posed by integrating robotic technology into the tourism industry is imperative for devising strategies to mitigate negative impacts and promote responsible development. In conclusion, this research endeavor seeks to contribute to informed decision-making processes and formulate policies conducive to the harmonious coexistence of technological innovation and sustainable tourism practices.

The practical implication of this research hinges on the utilization of information regarding public sentiments towards robot hotels and the adoption of digital technology and IoT in accommodation service businesses. By gaining insights into public perceptions and attitudes towards robot hotels, industry stakeholders can tailor their strategies to align with consumer preferences and concerns, thus enhancing customer satisfaction and loyalty. Additionally, understanding the adoption patterns of digital technologies and IoT in the hospitality sector can inform investment decision-making processes, operational enhancements, and competitive positioning. In conclusion, leveraging the findings of this research can facilitate the integration of innovative technologies in accommodation services, fostering greater efficiency, competitiveness, and responsiveness to evolving consumer needs.

Copyright © 2024 Author. This Journal is licensed under a Creative Commons Attribution 4.0 International License.

The theoretical implication of this research emphasizes the concept of digitalization in the tourism sector, which influences economic, socio-cultural, and ecological dimensions. By exploring the integration of digital technologies and IoT in accommodation services, this research contributes to understanding broader trends in the digital economy within the tourism context. It sheds light on how advancements in digitalization reshape traditional economic models, impact social and cultural dynamics, and pose challenges to ecological sustainability. Furthermore, this research underscores the interconnectedness between technological innovation and the multifaceted aspects of tourism development, providing insights for theoretical frameworks addressing the evolving landscape of digital economies in tourism. In conclusion, this research contributes to advancing theoretical perspectives on the implications of digitalization for the tourism sector, thereby enriching scholarly discourse and guiding future research endeavors.

The limitation of this research lies in the methodology, which is confined to sentiment classification using algorithms, potentially overlooking nuanced qualitative insights. While sentiment analysis provides quantitative data on public perceptions, it may fail to capture the complexity and depth of human emotions and experiences.
Furthermore, the perspective adopted primarily focuses on tourist or hotel guest behavior, neglecting other relevant stakeholders such as hotel staff, local communities, and policymakers. Despite these methodological constraints, this research serves as a foundational step towards understanding public sentiments and attitudes towards robot hotels, paving the way for future studies to employ more comprehensive methodologies and consider diverse perspectives in elucidating the societal implications of technological innovations in the tourism sector.

Similar research in the field has also explored the performance of sentiment classification models in handling imbalanced datasets. Previous studies have investigated various machine learning algorithms, including Naive Bayes Classifier (NBC) and Support Vector Machine (SVM), to analyze sentiment in textual data. These investigations often employ similar methodologies, utilizing metrics such as accuracy, recall, precision, f-measure, and Area Under Curve (AUC) for performance evaluation. While some studies focus on specific domains or datasets, others adopt a broader approach to assess model generalization. Overall, these analogous research endeavors contribute to the cumulative understanding of sentiment analysis methodologies and offer insights into the optimization of model performance in real-world applications.

The contribution to knowledge of this research lies in its comprehensive exploration of public sentiments toward robot hotels and the broader implications of digitalization in the tourism sector. By employing sentiment analysis algorithms and examining the integration of digital technologies and IoT in accommodation services, this study provides insights into the evolving landscape of hospitality and its impact on economic, socio-cultural, and ecological dimensions.
Furthermore, identifying methodological limitations underscores the need for future research to adopt more nuanced approaches and consider diverse perspectives in addressing the complex interplay between technology, society, and tourism. Overall, this research contributes to advancing scholarly understanding of the implications of digitalization for the tourism industry, thereby informing policy-making, managerial practices, and academic discourse in this field.

RESEARCH METHODOLOGY

1 Research Gap and Trends Mapping: Climate Change and Tourism

The research gap in this topic pertains to the need for further investigation into the long-term socio-economic and environmental implications of robot hotels and the digitalization of the tourism industry. While existing studies provide insights into public sentiments and the adoption of digital technologies in accommodation services, there remains a dearth of research examining the broader systemic effects of these trends on local economies, cultural dynamics, and natural ecosystems. Additionally, the limited exploration of alternative methodological approaches and the perspectives of diverse stakeholders highlight opportunities for future research to expand the scope and depth of inquiry in this field. Closing this research gap is essential for informing evidence-based policies, sustainable business practices, and holistic strategies to foster responsible technological innovation and inclusive tourism development.

Figure 1. Network,
Overlay, and Density Visualization

The contribution of this research, which focuses on conducting a comparative analysis of sentiment classification models using Naive Bayes classifier and Support Vector Machine (SVM) on robot hotel content, is significant in bridging the gap in sentiment analysis methodologies within the context of digital tourism. By systematically evaluating the performance of these two widely-used classifiers, the study provides insights into their effectiveness in accurately categorizing public sentiments towards robot hotels. This comparative analysis enhances our understanding of the strengths and limitations of different machine learning techniques and offers practical implications for improving sentiment analysis methodologies in hospitality. Ultimately, this research contributes to advancing scholarly knowledge and guiding future endeavors to enhance the accuracy and reliability of sentiment analysis models in digital tourism research.

2 Cross-Industry Standard Process for Data Mining (CRISP-DM)

The method employed in testing the performance of algorithms is the Cross-Industry Standard Process for Data Mining (CRISP-DM), comprising stages of business understanding, data understanding, modeling, evaluation, and deployment. This structured approach facilitates a systematic and comprehensive analysis of sentiment classification models applied to robot hotel content, ensuring a thorough understanding of the business objectives and the data characteristics before model development. By adhering to the CRISP-DM framework, researchers can effectively evaluate the effectiveness of the Naive Bayes classifier and Support Vector Machine in categorizing public sentiments toward robot hotels, thereby enhancing the reliability and validity of the study's findings.

Figure 2.
Implementation of CRISP-DM

In the business understanding stage, the data source originates from the YouTube platform, mainly focusing on content related to robot hotels with the video ID I3uUlztDbM&t=1, comprising 5604 reviews. This phase is a foundational step in comprehensively understanding the context and scope of the sentiment analysis study. By leveraging data from YouTube, specifically targeting content related to robot hotels, researchers gain insights into the perceptions, opinions, and experiences of users interacting with such establishments. This strategic approach facilitates the identification of relevant data sources and ensures alignment with the research objectives and the study's specific context.

In the data understanding stage, the data to be extracted is textual data, with the total of 5604 reviews divided into testing data and training data. This phase plays a crucial role in comprehensively assessing the nature and characteristics of the data set, mainly focusing on text-based information regarding robot hotels derived from user reviews on YouTube. By partitioning the data set into testing and training subsets, researchers can effectively evaluate the performance of sentiment classification models while ensuring the robustness and generalizability of the findings. Such a structured approach enhances the reliability and validity of the sentiment analysis study, providing a solid foundation for subsequent modeling and evaluation processes.

In the modeling stage, the algorithms employed are Naive Bayes Classifier (NBC) and Support Vector Machine (SVM).
Additionally, the SMOTE operator is utilized to compare the performance of both models. This phase constitutes a critical component in the sentiment analysis study, as it involves implementing and evaluating machine learning algorithms to classify sentiment in the textual data extracted from YouTube reviews on robot hotels. By leveraging NBC and SVM algorithms, researchers can assess the efficacy of different approaches in categorizing sentiments expressed by users towards robot hotels. Incorporating the SMOTE operator further enhances the comparative analysis by addressing potential class imbalances and improving the robustness of the models. Overall, the modeling stage facilitates the exploration and comparison of different algorithmic techniques, ultimately contributing to the advancement of sentiment analysis methodologies in digital tourism.

In the evaluation stage, performance is measured based on accuracy, precision, recall, f-measure, and Area Under Curve (AUC) values. This phase represents a critical juncture in the sentiment analysis study, as it systematically assesses the effectiveness and reliability of the Naive Bayes Classifier (NBC) and Support Vector Machine (SVM) algorithms in categorizing sentiment within the YouTube reviews on robot hotels. By employing a range of performance metrics such as accuracy, precision, recall, f-measure, and AUC, the models' ability to classify sentiments will be evaluated. This rigorous evaluation process ensures the robustness and validity of the findings, providing insights into the comparative performance of different algorithmic approaches in sentiment analysis tasks. Ultimately, the evaluation stage facilitates evidence-based decision-making and advances sentiment analysis methodologies in digital tourism.

In the deployment stage, an immersive experience in a robot hotel, as well as the sentiment of tourists towards robot hotels, can be narrated.
This pivotal phase in the sentiment analysis study involves translating research findings into practical applications and actionable insights within the hospitality industry. By deploying sentiment classification models trained on YouTube reviews, hotel operators can gain insights into tourists' perceptions and experiences with robot hotels, thereby informing strategic decision-making processes to enhance customer satisfaction and loyalty. Moreover, by providing an immersive experience in robot-assisted services, hotels can further engage and delight guests, fostering positive sentiments and potentially driving business growth. This deployment phase underscores the importance of translating research outcomes into real-world applications to maximize their impact and relevance in tourism.

3 Naive Bayes Classifier (NBC) and Support Vector Machine (SVM)

The Naive Bayes classifier requires only a relatively small amount of training data to determine the estimated parameters required for the classification process. At the classification stage, the class value is determined from the terms that occur in the data, using the following equation:

$$P(v_1 \mid C = c) = \frac{\operatorname{count}(v_1, c)}{\operatorname{count}(c)}$$

The posterior probability of a class is calculated from its prior probability and these term likelihoods, divided by the sum over all classes. Here $v_1$ is one of the terms that appear in the content reviews, $\operatorname{count}(v_1, c)$ refers to the number of occurrences of that word in reviews labeled $C = c$ ("positive" or "negative"), and $\operatorname{count}(c)$ refers to the sum of all words labeled $c$ in the dataset. To avoid zero values in probability, Laplace smoothing is implemented, which reduces the probability of observed results and increases the probability of unobserved results. Thus, the equation used is as follows:

$$P(v_1 \mid C = c) = \frac{\operatorname{count}(v_1, c) + 1}{\operatorname{count}(c) + |V|}$$

where $|V|$ refers to the number of distinct words in the review data present in the dataset.

The SVM algorithm possesses distinct advantages and is highly relevant for sentiment classification based on datasets.
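The Laplace-smoothed estimate from the Naive Bayes discussion above can be illustrated with a minimal Python sketch. It is not the RapidMiner operator used in the study; the function names (`train_nb`, `smoothed_likelihood`, `classify`) and the four toy reviews are hypothetical and serve only to show how the +1 and |V| terms enter the calculation.

```python
import math
from collections import Counter

def train_nb(docs):
    """Collect per-class term counts, per-class document counts, and the vocabulary."""
    word_counts = {}        # label -> Counter of term occurrences, i.e. count(v, c)
    doc_counts = Counter()  # label -> number of training reviews (for the prior)
    vocab = set()           # distinct terms; len(vocab) plays the role of |V|
    for tokens, label in docs:
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def smoothed_likelihood(term, label, word_counts, vocab):
    """Laplace-smoothed P(term | label) = (count(v, c) + 1) / (count(c) + |V|)."""
    count_vc = word_counts[label][term]
    count_c = sum(word_counts[label].values())
    return (count_vc + 1) / (count_c + len(vocab))

def classify(tokens, word_counts, doc_counts, vocab):
    """Pick the label with the highest log prior plus summed log likelihoods."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        score = math.log(doc_counts[label] / total_docs)
        for term in tokens:
            score += math.log(smoothed_likelihood(term, label, word_counts, vocab))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy reviews, not taken from the study's dataset.
toy_reviews = [
    ("robots are creepy".split(), "negative"),
    ("scary hotel".split(), "negative"),
    ("love the robot hotel".split(), "positive"),
    ("great future hotel".split(), "positive"),
]
word_counts, doc_counts, vocab = train_nb(toy_reviews)
print(classify("creepy robots".split(), word_counts, doc_counts, vocab))
```

Because of the smoothing, a term never seen with a class (for example "creepy" in positive reviews) still receives a small non-zero probability instead of zeroing out the whole product.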
Its ability to handle high-dimensional data and nonlinear relationships effectively makes it a preferred choice for sentiment analysis tasks. Additionally, SVM's robustness in handling binary and multiclass classification problems further enhances its applicability in diverse scenarios. The algorithm's versatility and proven performance in various research domains underscore its relevance and utility in sentiment analysis applications. In conclusion, leveraging SVM algorithms offers a powerful approach to accurately classify sentiment based on dataset characteristics, contributing to advancements in sentiment analysis methodologies. Meanwhile, the decision function of the SVM method is as follows:

$$f(x) = w \cdot x + b$$

where $f(x)$ is the decision function, $w$ is the weight vector perpendicular to the hyperplane, $x$ is the input feature vector, and $b$ is the bias term. In the case of binary classification, the class label of a data point can be determined by the sign of $f(x)$. In non-linearly separable cases, SVM utilizes a kernel function $K(x_i, x)$ to map the input feature vectors into a higher-dimensional space where the data becomes linearly separable. The decision function then becomes:

$$f(x) = \sum_{i} \alpha_i y_i K(x_i, x) + b$$

where $\alpha_i$ are the Lagrange multipliers obtained during training and $y_i$ is the class label of training vector $x_i$.

In addition, SMOTE is utilized to address data imbalance, thereby enhancing the performance of NBC and SVM classifiers. This methodological approach is pivotal in mitigating the potential biases from skewed class distributions within the sentiment analysis dataset, ensuring more robust and reliable classification outcomes. By synthetically generating minority class instances,
SMOTE effectively augments the representation of underrepresented sentiments, enabling NBC and SVM algorithms to learn from a more balanced dataset and thus improve their ability to accurately classify sentiments expressed in YouTube reviews on robot hotels. Integrating SMOTE into the modeling process exemplifies the importance of addressing data imbalance issues to enhance the efficacy of machine learning models in sentiment analysis tasks.

RESULT AND DISCUSSION

The existence of robot hotels not only provides an immersive experience for hotel guests but also enhances competitiveness in the hospitality industry through a variety of unique attractions. This paradigm shift in hotel services, facilitated by the integration of robotic technology, offers guests novel and memorable experiences, contributing to increased customer satisfaction and loyalty. Additionally, robot hotels differentiate themselves from traditional accommodation establishments by offering innovative and diverse attractions such as robot-assisted services, interactive experiences, and futuristic amenities. Such unique offerings attract tourists seeking novel experiences and position robot hotels as frontrunners in the competitive hospitality landscape. In conclusion, the emergence of robot hotels signifies a transformative trend in the industry, where innovation and differentiation play pivotal roles in securing market relevance and sustainable growth.

The challenge in ensuring the sustainability of these hotels lies in efforts to enhance positive public sentiment regarding digital innovation to provide an immersive experience for hotel guests.
While integrating robotic technology and digital innovation offers unique and engaging experiences, it also prompts concerns and skepticism among the public regarding job displacement, privacy, and the erosion of traditional hospitality. Therefore, hotel operators must proactively address these concerns through transparent communication, education, and community engagement initiatives to foster understanding and acceptance of digital innovations in the hospitality sector. By cultivating positive public sentiment, hotel operators can mitigate potential resistance and garner support and enthusiasm for their innovative endeavors, laying a solid foundation for long-term operational sustainability and growth.

Figure 3. Number of Posts over Time

Based on the content of robot hotel videos published on the YouTube platform, public responses can be discerned through reviews related to the content, which can be classified based on sentiment. By analyzing data posted over time, sentiment towards the documentation of robot hotels can be identified. This approach provides insights into how users perceive and react to the portrayal of robot hotels in online media, shedding light on the public's sentiments and attitudes towards this innovative concept. Such analysis informs hotel operators and stakeholders about the reception of robot hotels and offers opportunities to address concerns and enhance positive perceptions through targeted strategies and communication efforts.
Thus, leveraging data from YouTube reviews allows for a comprehensive understanding of public sentiment towards robot hotels, aiding in developing effective strategies to ensure their acceptance and success in the hospitality industry.

Furthermore, the Most Frequently Used Words and Top Ten Posters can be determined. This analysis provides insights into the dominant themes and topics discussed in the reviews of robot hotels on YouTube and the key contributors shaping the discourse surrounding this innovative concept. By identifying the most commonly used words and the individuals or entities posting the most frequently, researchers can understand the prevailing sentiments, opinions, and influencers within the online community discussing robot hotels. This information can inform strategic decision-making processes and communication strategies to engage with critical stakeholders, address concerns, and foster positive perceptions of robot hotels. Therefore, analyzing the Most Frequently Used Words and Top Ten Posters offers insights into the dynamics of public discourse surrounding robot hotels, aiding in formulating effective strategies to navigate the complexities of public opinion and ensuring the success of these innovative ventures in the hospitality industry.

Figure 4. Most Frequently Used Words and Top Ten Posters

Based on information related to reviews of robot hotel content and the most frequently appearing words, the process can be continued with extraction using the RapidMiner application to reaffirm the frequency of the most commonly occurring words in the review data, as well as the classification results based on negative and positive classes. This approach facilitates a more in-depth analysis of the sentiment expressed in the reviews, enabling researchers to validate initial findings and refine the sentiment classification model.
By leveraging RapidMiner for data extraction and analysis, researchers can enhance the accuracy and reliability of sentiment classification outcomes, thereby providing more robust insights into public perceptions of robot hotels. Therefore, utilizing RapidMiner in the extraction process is a valuable tool for advancing the understanding of sentiment dynamics surrounding robot hotels and informing strategic decision-making processes in the hospitality industry.

Figure 5. Extract Sentiment Process in Rapidminer

Before the sentiment extraction process in the RapidMiner application, the collected data is cleaned of symbols and words that do not have significant meaning in sentiment classification. Additionally, duplicate data is removed to ensure the integrity and accuracy of the analysis. This preliminary data-cleaning step is essential in preparing the dataset for sentiment analysis, as it eliminates noise and irrelevant information that may affect the performance of the classification model. By refining the dataset in this manner, researchers can improve the quality of the sentiment extraction process and enhance the reliability of the findings. Therefore, data preprocessing plays a crucial role in facilitating more accurate and insightful sentiment analysis outcomes using the RapidMiner application.

Table 1. Extract Sentiment in Rapidminer
Review | Scoring String | Total Score
When there is no more employment crimes will rise. | no (-0. crime (-0. |
When crime is high, illegal works happens (drugs). When people are motivated by drugs and weapons then war starts. It is the cycle of the future. | illegal (-0. weapons (-0. war (-0. | 2,82051282051282
Wow that was great i wud love to visit japan cause i like the way they treat robots with respect like a human. | | 4,07692307692308

Based on sentiment extraction, it can be observed that sentiment is determined by scoring strings of text data: a negative total score is classified as negative sentiment, while a positive total score is classified as positive sentiment. This process involves the application of machine learning algorithms to assign sentiment scores to individual text entries based on predefined criteria and training data. By analyzing these sentiment scores, researchers can effectively categorize text data into positive and negative sentiment categories, providing insights into the prevailing sentiments expressed in the dataset. This approach facilitates a systematic and quantitative analysis of sentiment dynamics, enabling researchers to identify trends, patterns, and fluctuations in public opinion toward robot hotels. Therefore, sentiment extraction is a critical step in sentiment analysis, enabling researchers to derive meaningful insights from textual data and inform decision-making processes in various domains, including the hospitality industry.

Figure 6. Word Cloud Visualization

Based on the frequently used words provided, it can be observed that specific terms such as "robots", "robot", "human", "humans", and "people" are commonly mentioned in the reviews, indicating a significant focus on the interaction between robots and humans in the context of robot hotels. Additionally, terms like "hotel", "Japan", and "future" suggest discussions related to the concept of robot hotels and their implications for the future of hospitality. However, the presence of words such as "creepy" and "scary"
indicates that there are concerns or negative perceptions associated with the use of robots in hotel settings, highlighting potential challenges that need to be addressed. Analyzing frequently used words provides insights into the key themes, topics, and sentiments prevalent in robot hotel discussions, informing further research and strategic decision-making in the hospitality industry.

Analyzing the frequently used words provides several insights into robot hotel discussions. Firstly, the high occurrences of terms such as "robots," "robot," "human," "humans," and "people" suggest a predominant focus on the interaction between robots and humans within the context of robot hotels. This indicates a keen interest in understanding how automation and robotics are integrated into hospitality services and the implications for guest experiences. Additionally, the repetition of terms like "hotel," "Japan," and "future" underscores discussions related to the concept of robot hotels, particularly in the context of Japan, where these establishments have gained significant attention. However, words such as "creepy" and "scary" with notable occurrences indicate concerns or negative perceptions associated with using robots in hotel settings. This suggests potential challenges in addressing guest apprehensions and ensuring a seamless technology integration with the hospitality experience. Overall, the analysis highlights both the opportunities and challenges associated with adopting robotic technology in the hotel industry, underscoring the need for further research and strategic planning to navigate these complexities effectively.
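The word-frequency analysis discussed above can be approximated with a short Python sketch. This is only an illustration under stated assumptions: the study used its own tooling, and the stopword list, the `top_terms` helper, and the sample comments below are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "is", "are", "to", "and", "of", "in", "i", "it", "this"}

def top_terms(comments, k=5):
    """Lowercase each comment, split it into word tokens, drop stopwords, count terms."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)

# Hypothetical comments, not taken from the study's dataset.
sample_comments = [
    "The robots are creepy",
    "I love this robot hotel in Japan",
    "Robots replacing humans is scary",
    "The future of hotel service is robots",
]
top = top_terms(sample_comments, k=3)
print(top)
```

Note that without stemming or lemmatization, "robots" and "robot" are counted as separate terms, which matches the way both forms appear side by side in the frequency list reported for the reviews.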
Figure 7. Modeling Process in RapidMiner

The modeling results using the NBC and SVM algorithms show that both perform differently before and after the SMOTE operator is applied. This observation underscores the importance of addressing class imbalance in sentiment analysis tasks, as it significantly affects the efficacy of classification models. The variations in performance highlight the need for careful algorithm selection and preprocessing to ensure robust and reliable sentiment classification outcomes. Consequently, these findings advance our understanding of the complexities involved in sentiment analysis and underscore the importance of employing appropriate methodologies to enhance the accuracy and effectiveness of classification models in analyzing textual data.

The NBC without SMOTE demonstrates the following results. The model's accuracy is 49.97%, with a micro average of 49.97%. The confusion matrix indicates that out of 2,172 instances classified as negative, 761 are true negatives, while 1,411 are false negatives. Similarly, out of 1,218 instances classified as positive, 285 are false positives, and 933 are true positives. The AUC scores are reported as 0. , 0. , and 0. for the positive class. Precision is 76.71% with a micro average of 76.60%, while recall is 39.80% with a micro average of 39.80%. Lastly, the f-measure is 52.34%, with a micro average of 52.39% for the positive class. These results suggest a moderate performance of the NBC model without SMOTE in sentiment classification. Further analysis and refinement may be necessary to improve its effectiveness in categorizing sentiments accurately.

The SVM without SMOTE exhibits the following results. The model's accuracy is 71.74%, with a micro average of 71.74%. The confusion matrix reveals that out of 196 instances classified as negative, 142 are true negatives, while 54 are false negatives. Conversely, out of 3,194 instances classified as positive, 904 are false positives, and 2,290 are true positives. The AUC scores are reported as 0. , 0. , 802 . for the positive class. Precision is reported as 71.70% with a micro average of 71.70%, while recall is 97.70% with a micro average of 97.70%. The f-measure is 82.70%, with a micro average of 82.70% for the positive class. These results suggest a significantly better performance of the SVM model without SMOTE in sentiment classification compared to the NBC model without SMOTE. The model demonstrates strong precision, recall, and f-measure values, indicating its effectiveness in accurately categorizing positive sentiments.

The difference in performance between the NBC and SVM models without SMOTE, and the subsequent change once SMOTE is included, underscores the significance of addressing class imbalance in sentiment analysis tasks. Without SMOTE, both NBC and SVM models may struggle to classify sentiments accurately, particularly when one class significantly outweighs the other. This imbalance can lead to biased predictions in which the model favors the majority class, resulting in suboptimal accuracy, precision, recall, and f-measure values. With SMOTE, which synthetically generates minority class instances, the class distribution becomes more balanced. This allows the NBC and SVM models to learn from a more representative dataset, improving their ability to accurately classify sentiments expressed in textual data. Consequently, SMOTE enables the models to achieve higher accuracy, precision, recall, and f-measure values, leading to more reliable sentiment analysis outcomes. Overall, SMOTE is needed to improve the performance of the NBC and SVM models because it mitigates class imbalance, thereby enhancing the robustness and effectiveness of sentiment classification in real-world applications.

Table 2. NBC and SVM with SMOTE

NBC Using SMOTE
PerformanceVector:
accuracy: 67.43% +/- 2.22% (micro average: 67.43%)
ConfusionMatrix:
True: Negative Positive
Negative: 2200 1383
Positive: 144 961
AUC (optimistic): 0.961 (positive class: Positive)
AUC: 0.669 (positive class: Positive)
AUC (pessimistic): 0.402 (positive class: Positive)
precision: 86.87% +/- 2.16% (micro average: 86.97%) (positive class: Positive)
recall: 41.00% +/- 4.11% (micro average: 41.00%) (positive class: Positive)
f_measure: 55.63% +/- 4.14% (micro average: 55.73%) (positive class: Positive)

SVM Using SMOTE
PerformanceVector:
accuracy: 76.88% +/- 1.28% (micro average: 76.88%)
ConfusionMatrix:
True: Negative Positive
Negative: 2224 964
Positive: 120 1380
AUC (optimistic): 0.907 (positive class: Positive)
AUC: 0.907 (positive class: Positive)
AUC (pessimistic): 0.906 (positive class: Positive)
precision: 92.03% +/- 1.76% (micro average: 92.00%) (positive class: Positive)
recall: 58.88% +/- 2.43% (micro average: 58.87%) (positive class: Positive)
f_measure: 71.78% +/- 1.87% (micro average: 71.80%) (positive class: Positive)

The NBC with SMOTE yields the following results. The model's accuracy is reported as 67.43%, with a micro average of 67.43%.
The confusion matrix illustrates that out of 3,583 instances classified as negative, 2,200 are true negatives, while 1,383 are false negatives. Conversely, out of 1,105 instances classified as positive, 144 are false positives, and 961 are true positives. The AUC scores indicate a significant improvement, with an optimistic AUC of 0.961, a standard AUC of 0.669, and a pessimistic AUC of 0.402 for the positive class. Precision is reported as 86.87% with a micro average of 86.97%, while recall is 41.00% with a micro average of 41.00%. Furthermore, the f-measure is reported as 55.63%, with a micro average of 55.73% for the positive class. These results demonstrate notable improvements in the NBC model's performance with the inclusion of SMOTE, particularly in terms of accuracy and AUC, indicating its effectiveness in mitigating class imbalance issues and enhancing sentiment classification outcomes.

The SVM with SMOTE demonstrates the following results. The model's accuracy is 76.88%, with a micro average of 76.88%. The confusion matrix indicates that out of 3,188 instances classified as negative, 2,224 are true negatives, while 964 are false negatives. Conversely, out of 1,500 instances classified as positive, 120 are false positives, and 1,380 are true positives. The AUC scores are consistently high, with an optimistic AUC of 0.907, a standard AUC of 0.907, and a pessimistic AUC of 0.906 for the positive class. Precision is reported as 92.03% with a micro average of 92.00%, while recall is 58.88% with a micro average of 58.87%. Furthermore, the f-measure is 71.78%, with a micro average of 71.80% for the positive class.
These results underscore the effectiveness of SVM with SMOTE in accurately classifying sentiments, particularly in achieving high precision and AUC values, thereby demonstrating its suitability for sentiment analysis tasks with imbalanced classes.

The comparison between NBC and SVM using SMOTE reveals distinct differences in performance metrics and effectiveness in sentiment analysis tasks. First, SVM with SMOTE achieves a higher accuracy of 76.88% compared to NBC with SMOTE, which has an accuracy of 67.43%. This suggests that SVM is better at accurately classifying sentiments in the dataset. Additionally, SVM demonstrates higher precision (92.03% for SVM vs. 86.87% for NBC) and recall (58.88% for SVM vs. 41.00% for NBC), indicating its ability to correctly identify positive sentiments while minimizing false positives. NBC, on the other hand, has a lower recall, indicating that it tends to miss positive sentiments more often. The f-measure, which combines precision and recall into a single metric, is also lower for NBC (55.63% for NBC vs. 71.78% for SVM). While SVM outperforms NBC in accuracy, precision, and recall, NBC demonstrates a more balanced performance across precision and recall. These differences highlight each model's varying strengths and weaknesses and underscore the importance of selecting the most suitable algorithm based on the specific requirements of the sentiment analysis task.
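The metrics compared above can be re-derived from the confusion-matrix counts reported for the two models with SMOTE. The following Python sketch is an illustrative check, not part of the RapidMiner workflow; the function name `classification_metrics` is an assumption. Because it works on the pooled counts, its output corresponds to the micro averages, which may differ slightly from the fold-averaged headline values (for example, 86.97% versus 86.87% precision for NBC).

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and f-measure (in percent)
    for the positive class, computed from pooled confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f_measure": 100 * f_measure,
    }


# Counts taken from the confusion matrices reported for the SMOTE runs.
nbc_smote = classification_metrics(tp=961, fp=144, fn=1383, tn=2200)
svm_smote = classification_metrics(tp=1380, fp=120, fn=964, tn=2224)

for name, result in (("NBC + SMOTE", nbc_smote), ("SVM + SMOTE", svm_smote)):
    summary = ", ".join(f"{k}: {v:.2f}%" for k, v in result.items())
    print(f"{name}: {summary}")
```

Recomputing the values this way also serves as a consistency check: each model's four cell counts sum to 4,688 instances, with 2,344 actual instances per class after oversampling.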
The recommendation stemming from the findings of this research is to utilize a combination of both NBC and SVM algorithms with the inclusion of SMOTE to enhance sentiment analysis accuracy in imbalanced datasets. Given the distinct performance characteristics observed between NBC and SVM with SMOTE, it is advisable to leverage the strengths of each algorithm to achieve more robust sentiment classification outcomes. Moreover, further exploration of advanced techniques, such as ensemble methods or deep learning architectures, may be warranted to improve the efficacy of sentiment analysis models. Overall, this research underscores the importance of algorithm selection and preprocessing techniques in optimizing sentiment analysis performance and suggests avenues for future investigations in the field.

CONCLUSION

In conclusion, by applying the SMOTE technique, this study has provided valuable insights into the performance of sentiment classification models, specifically NBC and SVM, on imbalanced datasets. The findings indicate that the two algorithms exhibit distinct strengths and weaknesses. SVM demonstrates higher accuracy, precision, recall, f-measure, and AUC than NBC. However, NBC displays a more balanced performance across precision and recall metrics. The incorporation of SMOTE significantly improves the performance of both algorithms, as illustrated in the confusion matrices. For NBC with SMOTE, the accuracy is 67.43%, recall is 41.00%, precision is 86.87%, f-measure is 55.63%, and AUC is 0.669. Meanwhile, for SVM with SMOTE, the accuracy is 76.88%, recall is 58.88%, precision is 92.03%, f-measure is 71.78%, and AUC is 0.907. These metrics highlight the effectiveness of the models in accurately predicting sentiment labels and underscore the recommendation to leverage a combination of NBC and SVM with SMOTE to enhance sentiment analysis accuracy further.
Overall, these findings contribute to advancing the understanding of sentiment analysis methodologies and provide valuable insights for practitioners and researchers in the field.

REFERENCES