Jurnal SISFOKOM (Sistem Informasi dan Komputer), Volume 14, Nomor 03, PP 291-297

Analysis of Public Sentiment Towards LGBT on Twitter Social Media Using Naïve Bayes Method

Yudhi Franata, Rizal, Rizki Suwanda
University of Malikussaleh, Lhokseumawe, Aceh
180170124@mhs. , rizal@unimal. , rizkisuwanda@unimal.

Abstract: The advancement of information technology and the widespread use of social media have provided a platform for individuals to express their views on various social issues, including those related to Lesbian, Gay, Bisexual, and Transgender (LGBT) topics. This study aims to assess public sentiment towards LGBT issues on Twitter by employing the Naïve Bayes classification algorithm. Relevant tweets were collected through web scraping based on specific LGBT-related keywords within a defined time frame. The collected data underwent several preprocessing stages, including data cleaning, tokenization, stopword removal, and stemming. The processed data were then categorized into three sentiment classes: positive, negative, and neutral. Naïve Bayes was chosen for its effectiveness and efficiency in handling large-scale textual data. The analysis revealed that negative sentiment toward LGBT issues was predominant, although a considerable portion of tweets expressed neutral and positive sentiments. These findings offer valuable insights for policymakers, social activists, and academics in understanding public perception and formulating more effective communication strategies related to LGBT discourse in Indonesia. The classification model achieved an accuracy of 57%, precision of 52%, recall of 100%, and an F1-score of 68%. While the Naïve Bayes approach proved capable of sentiment classification, the model's accuracy could be further enhanced through improved data preparation or the application of more advanced algorithms.

Keywords: Sentiment Analysis, LGBT, Twitter, Naïve Bayes
I. INTRODUCTION

Social media has become an integral part of modern society, serving as a platform for expressing opinions, sharing entertainment, and disseminating information. Among the many social issues discussed online, the topic of LGBT (Lesbian, Gay, Bisexual, and Transgender) has gained significant attention. Since the 1990s, the term LGBT has replaced the word "homosexual" to better represent the diversity of gender identities and sexual orientations. In Indonesia, the LGBT issue remains highly controversial, especially as members of the LGBT community have become increasingly visible through various campaigns and promotional activities on digital platforms. This cultural shift has drawn considerable attention due to its perceived misalignment with the values held by the majority of Indonesians.

In particular, Twitter, a microblogging platform, has played a significant role in the conversation surrounding LGBT issues. With its fast dissemination and wide reach, Twitter allows users to share their thoughts in short but impactful messages known as tweets. The platform offers a unique window into the dynamics of public opinion. Events such as the proposed LGBT-themed "Paris Parade" in Indonesia and the police intervention in an alleged LGBT community party in Bogor demonstrate that social media functions not only as a discussion space but also as a stage for societal conflicts.

Sentiment analysis, a computational technique used to identify public attitudes towards specific topics, has become a valuable tool for analyzing these dynamics. By utilizing natural language processing techniques, sentiment analysis categorizes textual data into positive, negative, or neutral sentiments, while also detecting emotional states such as anger or sadness. In this study, the Naïve Bayes classification algorithm was selected for its efficiency and accuracy in processing large-scale and diverse textual datasets.
Previous research has demonstrated the effectiveness of Naïve Bayes in sentiment analysis tasks. For instance, a study on public opinions regarding COVID-19 vaccination in Indonesia using Naïve Bayes achieved an accuracy of 93%. Another study by Ridwan, which employed the Support Vector Machine (SVM) algorithm to analyze public sentiment on LGBT issues, found that public opinion was predominantly neutral and negative, with the highest accuracy of 74% achieved using a linear kernel. The Naïve Bayes algorithm remains a preferred method due to its strong classification performance with relatively low computational complexity, and it has been reported to deliver reliable results consistently in real-world applications. In the context of public opinion toward LGBT issues, this algorithm can provide valuable insights and support the development of more effective communication strategies.

Based on this background, the present study is entitled "Public Sentiment Analysis on LGBT Issues on Twitter Using the Naïve Bayes Method." This research aims to explore Indonesian public opinion towards LGBT-related discussions on Twitter and contribute to a better understanding of societal perspectives, which can inform academic research, social advocacy, and policy development.

p-ISSN 2301-7988, e-ISSN 2581-0588. DOI: 10.32736/sisfokom. Copyright ©2025. Submitted: May 26, 2025. Revised: June 4, 2025. Accepted: June 13, 2025. Published: July 28, 2025.

II. LITERATURE REVIEW

Previous Research
Many studies have shown that sentiment analysis using machine learning, especially the Naïve Bayes algorithm, can be highly effective. For example, Nurdin et al. used Naïve Bayes to classify student academic papers and achieved an average accuracy of 86.68%. Similarly, Adek et al. applied the same method to analyze perfume product reviews on Bukalapak.com, reaching an accuracy of 44%.
In the realm of social media, Adek et al. explored sentiment classification on Twitter using unigram, bigram, and trigram models, finding that accuracy improved as the size of the training data increased. Kosasih and Alberto also applied Naïve Bayes in their analysis of game product reviews on Shopee, combining it with TF-IDF and reporting 22% accuracy. Meanwhile, Karami et al. developed an automatic classification system for LGBT Twitter profiles, including those featuring explicit content, and reported around 88% accuracy. These findings collectively underscore the adaptability and dependability of the Naïve Bayes classifier in a wide range of applications. Its strength lies in its solid probabilistic foundation, straightforward implementation, and reliable performance in predictive tasks.

LGBT
LGBT stands for Lesbian, Gay, Bisexual, and Transgender, a term that has been used since the 1990s to replace the more limited term "homosexual." Each identity within LGBT represents a different sexual orientation or gender expression, and in Indonesia, the existence of LGBT individuals remains controversial. Although human rights advocates support equal treatment, resistance still prevails due to cultural and religious values. Campaigns promoting LGBT awareness face significant challenges in a Muslim-majority country like Indonesia.

Sentiment Analysis
Sentiment analysis is a method for evaluating public opinions on specific topics by categorizing text as positive or negative. Typically applied to social media data, this technique uses machine learning algorithms to classify large volumes of textual content. Naïve Bayes is suitable for this task as it allows for manual evaluation of training data and delivers clear sentiment labels such as P for positive and N for negative sentiments.

Text Mining
Text mining refers to the process of extracting valuable insights and patterns from unstructured textual data. Unlike structured data mining, text mining must first apply preprocessing to clean and standardize the text. Hermanto and Noviriandini explained that this process includes information extraction, clustering, classification, and visualization, enabling researchers to analyze opinions, emotions, and behaviors related to certain topics or entities.

Text Preprocessing
Text preprocessing is an essential step in text mining aimed at eliminating noise and improving classification accuracy. This process involves several steps, including case folding, tokenization, stopword removal, and stemming, to prepare the text for further analysis. It helps standardize text input, making it more suitable for machine learning models. The effectiveness of sentiment analysis often hinges on the quality of preprocessing applied to the raw text data.

Naïve Bayes Classification
The Naïve Bayes algorithm is a popular choice for text classification because it is both simple and surprisingly effective. By applying Bayes' theorem, it estimates how likely a piece of text is to belong to a certain category based on how often specific words appear. Simorangkir and Lhaksmana highlighted how well this method handles large volumes of tweets, offering fast processing and dependable predictions. What makes Naïve Bayes especially appealing is its solid foundation in probability and how easy it is to implement in real-world applications. The flowchart of the classification process is shown in the figure below. Bayes' theorem is written as:

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}$$

where:
X : data with an unknown class
H : hypothesis that data X belongs to a specific class
P(H|X) : probability of hypothesis H given X
P(H) : probability of hypothesis H
P(X|H) : probability of X given hypothesis H
P(X) : probability of X
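As a small worked illustration of Bayes' theorem as stated in the Naïve Bayes Classification section, the sketch below plugs in made-up numbers. The function name `posterior` and all three probabilities are assumptions chosen for illustration; none of them come from the study's data.

```python
def posterior(prior_h: float, likelihood_x_given_h: float, evidence_x: float) -> float:
    """Return P(H|X) = P(X|H) * P(H) / P(X)."""
    if evidence_x <= 0:
        raise ValueError("P(X) must be positive")
    return likelihood_x_given_h * prior_h / evidence_x


# Hypothetical values (not from the paper):
#   P(H)   = 0.40  share of tweets that are negative
#   P(X|H) = 0.30  share of negative tweets containing a given word
#   P(X)   = 0.15  share of all tweets containing that word
p_negative_given_word = posterior(0.40, 0.30, 0.15)
print(round(p_negative_given_word, 2))
```

With these assumed inputs, seeing the word raises the probability of the negative class from 0.40 to about 0.80, which is the kind of update the classifier performs for every word in a tweet.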
Fig. Flowchart of the Naïve Bayes classification process

The steps in the Naïve Bayes classification method are as follows:
• Calculate the probability of each document category:

$$P(c) = \frac{N_c}{N}$$

where N_c is the number of training documents in category c and N is the total number of training documents.
• Calculate the probability of each word appearing in a category using:

$$P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}$$

where count(w, c) is the number of occurrences of word w in category c, count(c) is the total number of words in category c, and |V| is the vocabulary size.
• Determine the class with the highest probability using:

$$V_{MAP} = \arg\max_{c \in C} P(c) \prod_{i} P(w_i \mid c)$$

• Evaluate model performance using a confusion matrix consisting of accuracy, precision, and recall.

Confusion Matrix
To assess the performance of classification models, a confusion matrix is commonly used. It measures accuracy, precision, and recall by comparing the predicted outcomes with the actual values. Muktafin et al. emphasized that high precision and recall indicate a strong model, while additional metrics like AUC (Area Under Curve) and ROC (Receiver Operating Characteristic) curves help visualize classification accuracy and performance across different thresholds.

TABLE I. CONFUSION MATRIX MODEL
Correct Classification | Classified as Positive | Classified as Negative
Positive | True positives (A) | False negatives (B)
Negative | False positives (C) | True negatives (D)

The classification results are evaluated using a confusion matrix, with the following metrics:
• Accuracy = (A + D) / (A + B + C + D)
• Precision = A / (A + C)
• Recall = A / (A + B)

III. RESEARCH METHODS

Time of Research Implementation
This research was conducted from June 2023 to September 2023. In this time span, all stages, from data collection, system design, and algorithm implementation to model evaluation, were carried out gradually and systematically.

Problem Formulation
The LGBT phenomenon in Indonesia is a topic that generates both support and opposition in society. Therefore, this research aims to develop a sentiment analysis system for public opinion on Twitter using the Naïve Bayes algorithm, in order to identify trends in public attitudes towards the issue.

Research Steps
This research uses a waterfall approach that includes the stages of data collection, data preprocessing, classification, and system testing. The process flow is presented visually in a flow chart to make the sequence of research steps easier to follow.

Fig. Research Steps

System Requirements Analysis
System requirements analysis includes identifying the hardware and software specifications used during the system development and testing process. The hardware used is a laptop with an AMD A10 processor, 8 GB RAM, and a 1 TB HDD, which is sufficient to handle data crawling, text preprocessing, and classification model training. Meanwhile, the software used includes the Windows 10 Home 64-bit operating system, Visual Studio Code as a text editor, and Python as the main programming language, supported by various additional libraries for web scraping, machine learning, and text preprocessing. All of these requirements serve as a reference for ensuring the system can run efficiently.

System Scheme
The general sentiment analysis system scheme can be seen in the figure below:

Fig. System Scheme

Basic Model Design

Context Diagram
Fig. Context Diagram

Data Flow Diagram (DFD)

System Schematic of Naïve Bayes Method
Fig. System Schematic of Naïve Bayes Method

The diagram above shows the overall workflow of the system. It starts with collecting data through scraping using the Twitter API. Once the data is gathered, it goes through preprocessing and is labeled based on sentiment.
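The preprocessing stage of this workflow (case folding, tokenization, stopword removal, and stemming) could look roughly like the sketch below. It is only an illustration under stated assumptions: the stopword set, the suffix list, and the function names `clean`, `stem`, and `preprocess` are hypothetical, and the crude suffix stripping merely stands in for a proper stemmer. It is not the authors' implementation.

```python
import re

# Hypothetical stopword list for illustration; a real system would load a full list.
STOPWORDS = {"yang", "dan", "di", "ke", "itu", "ini", "the", "a", "is"}

# Toy suffix list standing in for a real stemmer (an assumption, not the paper's method).
SUFFIXES = ("nya", "kan", "lah")


def clean(text: str) -> str:
    """Case-fold, then strip URLs, user mentions, and any non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()


def stem(token: str) -> str:
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Clean, tokenize on whitespace, drop stopwords, then stem each token."""
    tokens = clean(text).split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [stem(t) for t in tokens]


print(preprocess("Beritanya ini VIRAL!! https://example.com @user #LGBT di Twitter"))
```

Keeping each step in its own function makes it easy to swap the toy stemmer or stopword list for a production-quality resource without touching the rest of the pipeline.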
After labeling, the dataset is split: 80% is used to train the model, while the remaining 20% is reserved for testing. A Naïve Bayes classifier is then built using the training data and evaluated on the test set. To assess how well the model performs, a confusion matrix is used to calculate its accuracy (Afifah & Voutama, 2...).

IV. RESULTS AND DISCUSSION

Research Results
This study conducted sentiment analysis of Indonesian public opinion related to LGBT using data from the X platform. Data was collected through crawling techniques using Python, with authentication in the form of an auth_token. The collected sentiments cover the period 2023 to 2024 and were processed using the Naïve Bayes algorithm. The implementation produced three sentiment categories, namely positive, negative, and neutral. Model evaluation was done by measuring accuracy as an indicator of the feasibility of the Naïve Bayes method in classifying public opinion on LGBT issues.

System Analysis
System analysis is carried out to decompose the problem into smaller components so that it is easier to understand. The main problem in this research is how to identify public opinion on LGBT automatically. Therefore, a sentiment analysis system based on the Naïve Bayes algorithm was built using data crawled from X. The system receives input in the form of tweets, performs preprocessing to clean the text, and finally classifies each tweet into a sentiment category. The system is developed using the PHP and Python programming languages and a MySQL database, and its final output is the sentiment prediction and its accuracy level.

Fig. Data Flow Diagram (DFD)

Entity Relationship Diagram (ERD)
Fig. Entity Relationship Diagram (ERD)

Performance Evaluation of the Naïve Bayes Model Using a Confusion Matrix
To evaluate the performance of the Naïve Bayes model, a confusion matrix is used that measures accuracy, precision, recall, and F1-score, and provides an overview of the model's ability to classify positive, negative, and neutral sentiments.

TABLE II. CONFUSION MATRIX
Actual \ Predicted | Positive | Negative | Neutral | Total
Positive | | | |
Negative | | | |
Neutral | | | |
Total | | | |

Accuracy = (100 + 40) / (100 + 40 + 57 + 50) = 140 / 247 ≈ 0.57 = 57%
Precision = 0.5208 ≈ 52%
Recall = 100 / (100 + 0) = 1 = 100%
F1-Score = 2 × (0.52 × 1) / (0.52 + 1) = 2 × 0.3421 ≈ 0.68 = 68%

Database Management
This research uses MySQL as a database management system to facilitate structured data processing and storage. The use of an appropriate DBMS supports the relationships between data and simplifies system management.

Users Table
TABLE III. USERS TABLE
Name | Type | Width | Notes
id_user | | | Primary Key
id_role | | |
is_active | | |

Menu Table
TABLE IV. MENU TABLE
Name | Type | Width | Notes
id_menu | | | Primary Key

Sub Menu Table
TABLE V. SUB MENU TABLE
Name | Type | Width | Notes
id_sub_menu | | | Primary Key
id_menu | | | Foreign Key

User Menu Table
TABLE VI. USER MENU TABLE
Name | Type | Width | Notes
id_user_menu | | | Primary Key
id_user | | | Foreign Key
id_menu | | | Foreign Key

Role Table
TABLE VII. ROLE TABLE
Name | Type | Width | Notes
id_role | | | Primary Key

Discussion

Testing the Naïve Bayes Method
The testing carried out in this research uses the Naïve Bayes model. In the testing process, the system requires research data, namely sentiment data obtained by crawling the X application.

Research Data
The research data was obtained by crawling X. It consists of X user sentiment data on LGBT from 2023 to 2024. In this process the system also performs automatic data labeling using AutoTokenizer.
Preprocessing Data
Data preprocessing is the initial stage in data processing, which aims to clean, transform, and prepare raw data so that it is ready to be used in the Naïve Bayes modeling process.

Naïve Bayes Implementation
The implementation of the Naïve Bayes algorithm in this study produces a classification model, which is then evaluated using the following performance metrics:
Accuracy = 57%
Precision = 52%
Recall = 100%
F1-score = 68%

System Implementation
The implementation of the sentiment analysis system follows a structured process as outlined in the flowchart. Each stage of the system plays a crucial role in ensuring the model functions as expected.

Data Collection
The data collection process involves scraping 300 tweets related to LGBT from Twitter over a three-month period (January to March 2...). This data was collected using Python's web scraping libraries with specific LGBT-related keywords.

System Design & Implementation
The system was developed using Python for machine learning tasks and PHP for the web interface. The design focuses on creating a user-friendly system for data collection, processing, and sentiment analysis with Naïve Bayes.

Text Processing
Text preprocessing steps included tokenization, stopword removal, and stemming. After cleaning the raw data, around 250,000 words remained, ready for use in training the model.

Training Data
80% of the cleaned data, about 240,000 words, was used for training. The data was categorized into three sentiment classes: positive, negative, and neutral.

Testing with Testing Data
The remaining 20% of the data (...,000 words) was used for testing. The model's predictions were compared with actual results to assess its accuracy and effectiveness.
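The train-and-test procedure described above can be sketched as follows: an 80/20 split, a multinomial Naïve Bayes model with add-one smoothing, and a comparison of predictions against the actual labels. This is a minimal sketch under stated assumptions, not the system built in the study. The ten labeled token lists, the seed, and the function names `train` and `predict` are all hypothetical, and log probabilities are used instead of raw products purely to avoid numerical underflow.

```python
import math
import random
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (tokens, label). Returns class priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict(tokens, priors, word_counts, vocab):
    """Return the class with the highest log posterior, using add-one smoothing."""
    best_label, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in tokens:
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Hypothetical, already preprocessed examples (not the study's data).
data = [
    (["dukung", "hak", "setara"], "positive"),
    (["hormati", "semua", "orang"], "positive"),
    (["dukung", "toleransi"], "positive"),
    (["tolak", "kampanye"], "negative"),
    (["tolak", "keras", "acara"], "negative"),
    (["larang", "acara", "itu"], "negative"),
    (["tolak", "promosi"], "negative"),
    (["berita", "hari", "ini"], "neutral"),
    (["diskusi", "publik", "berita"], "neutral"),
    (["laporan", "berita", "acara"], "neutral"),
]

rng = random.Random(42)            # fixed seed so the split is repeatable
shuffled = data[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))     # 80% for training, 20% for testing
train_docs, test_docs = shuffled[:cut], shuffled[cut:]

model = train(train_docs)
correct = sum(predict(tokens, *model) == label for tokens, label in test_docs)
accuracy = correct / len(test_docs)
print(f"test accuracy: {accuracy:.2f}")
```

On a toy set this small the accuracy figure is not meaningful; the point is only the shape of the procedure, in which the model never sees the held-out 20% until evaluation.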
Test Result
The model's performance was evaluated, resulting in an accuracy of 57%, precision of 52%, recall of 100%, and an F1-score of 68%.

The system developed in this research is a web-based application that performs sentiment analysis of public opinion on LGBT using the Naïve Bayes algorithm. The system is built using the PHP and Python programming languages and is supported by a MySQL database. Users can log in and access various features, such as managing sentiment data, preprocessing text, and viewing sentiment classification results. One of the important parts of this system is the dashboard page, which displays a summary of the amount of data, classification results, and model performance in an informative and easy-to-use interface, as shown in the figure below:

Fig. Access User Pages

V. CONCLUSION

This study successfully developed a sentiment analysis system to capture Indonesian public opinion on LGBT issues through Twitter. The system was built using the Naïve Bayes algorithm and involved several key stages: data collection, text preprocessing, sentiment labeling, and classification. The evaluation results indicate that the Naïve Bayes algorithm can classify tweets into positive, negative, and neutral categories, achieving an accuracy of 57%, a precision of 52%, a perfect recall of 100%, and an F1-score of 68%. These findings suggest that while the Naïve Bayes method shows promise in handling sentiment classification tasks, particularly in identifying relevant data, there is still significant room for improving overall accuracy, potentially through model refinement or by exploring more advanced algorithms.

REFERENCES