OPEN ACCESS ISSN 2356-5462 http://socj.id/ijoict/ Intl. Journal on ICT, Dec 2024. doi: 10.21108/ijoict.
Buzzer Account Detection in Political Hate Tweets Using IndoBERT and Ensemble Learning: Case Study of the Indonesian Presidential Election 2024
Fizio Ramadhan Herman1*, Ade Romadhony2
School of Computing, Telkom University, Bandung, Indonesia
*fizioramadhan@gmail.com
Abstract
The Indonesian Presidential Election of 2024 has seen widespread use of social media such as Twitter for political campaigning and discussion. However, this has also enabled the spread of hate speech from buzzer accounts that are created to influence public opinion. This study implements a machine learning approach to classify buzzer accounts that spread hate speech during the presidential election period, utilizing IndoBERT for hate speech classification and traditional machine learning models for buzzer account classification. This study analyzes 62,341 tweets for hate speech classification and 961 accounts for buzzer account classification. Our implementation of IndoBERT achieved strong performance, with 91.12% precision and recall and 91.19% accuracy and F1-score in hate speech classification. For buzzer account classification, we compared Decision Tree, Random Forest, and XGBoost, with Decision Tree achieving the highest performance at 64% precision, recall, accuracy, and F1-score. Our results demonstrate the effectiveness of combining deep learning for hate speech classification with traditional machine learning for buzzer account classification, contributing to the development of more effective content filtering for election discourse on social media.
Keywords: buzzer detection, ensemble learning,
IndoBERT, presidential election, sentiment analysis, social media
I. INTRODUCTION
The 2024 Indonesian Presidential Election has captured the attention of Indonesians for the past few months. Since late November 2023, there has been growing discussion of this topic, especially on the internet, as political party campaigns have started. Social media platforms such as Twitter and Facebook play a significant role in politics by providing spaces for individuals to share views and engage in political discussions. As the campaign season intensifies, the use of social media by Indonesian political parties to promote and advertise candidates is one example that demonstrates the role social media occupies in politics. However, the open nature of Twitter can both encourage informed political discourse and create opportunities for the dissemination of hate speech, which could potentially shape public opinions about presidential candidates.
Received on 20 Nov 2024. Revised on 12 Dec 2024. Accepted and Published on 10 Jan 2025.
The increasing amount of political hate speech on Twitter has emerged as a notable concern. Based on a study conducted by Mozafari et al., hate speech is commonly defined as any form of communication that involves the criticism of an individual or a group based on characteristics such as gender, nationality, religion, or race. It is important to consider the potential consequences of this issue, as it could lead to division within communities, incite violence, and undermine the integrity of fair elections. To tackle this problem, the proposed research seeks to develop a robust deep learning approach to detect and identify buzzer accounts that are specifically created or used to spread political hate speech during the 2024 Indonesian Presidential Election. While previous studies have explored the role of Twitter in political polarization, the influence of e-campaigns on voters,
and the detection of malicious social bots, there is a need for a comprehensive study focused on the detection of buzzer accounts actively engaged in spreading political hate speech during the Indonesian Presidential Election. Previous research has also demonstrated significant progress in buzzer detection. Suciati et al. evaluated multiple ensemble learning algorithms for buzzer detection, achieving 62.3% accuracy and 61.3% precision using AdaBoost, while Ibrahim et al. developed an automatic buzzer detection system combined with sentiment analysis to predict election results with a low mean absolute error. However, while these studies have broadened our understanding of buzzer detection and political sentiment analysis separately, this leaves a gap in the integrated study of hate-speech-spreading buzzer accounts. A detailed review of related studies is provided in Section II.
This research aims to bridge this gap by leveraging ensemble learning for buzzer detection and deep learning approaches, specifically IndoBERT, for hate speech classification. Through collaboration with NoLimit Indonesia, we analyze a dataset of 62,341 tweets and 961 accounts to evaluate our proposed approach to identifying buzzer accounts that spread hate speech during election periods. The study focuses on developing a framework that integrates ensemble learning and IndoBERT, assessing its effectiveness on a real Twitter dataset collected during the election period, and establishing quantitative benchmarks for identifying hate-speech-spreading buzzer accounts. Additionally, the dataset used in this research is expected to serve as a resource for future studies on hate speech classification and buzzer account detection.
II. LITERATURE REVIEW
This section analyzes recent research in the field of buzzer account detection and hate speech classification. We study this body of work by splitting it into several components to conduct a comprehensive review.
Buzzer Account
Definition of Buzzer Account: According to a study conducted by Rismi Juliadi in 2017, a "buzzer" on Twitter is expected to spread information that will have an impact on their followers. The widespread adoption of buzzers can be understood through the lens of the Diffusion of Innovation Theory by Everett Rogers. This theory explains that communicators or messengers who get information from the mass media have the ability to influence those around them. This process of influence is particularly well illustrated by the model in Fig. 1.
Fig. 1. Two-Step Flow Model (Katz & Lazarsfeld)
As described in the two-step flow model, the communication process is divided into two stages. The initial source communicates messages through mass media. Then, the message is received by the opinion leader. During the second stage, the opinion leader relays messages from the mass media to the public. These messages may already include the opinion leader's interpretations and responses.
Characteristics of Buzzer Account: Within the social media landscape, two primary categories of buzzer accounts exist: automated and human-operated. Automated buzzers, essentially computer programs, are programmed to react to specific triggers. These programs can be designed for broadcast spamming, reaching a wide audience, or context-aware spamming, targeting specific groups. As stated by Ibrahim in 2015, an automated buzzer might automatically retweet posts containing keywords like "pemilu" (election). Human-operated buzzers, in contrast, are individuals who are paid or driven by strong allegiance to a particular cause. These accounts frequently post content and actively engage with related tweets, potentially influencing public opinion.
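As a toy illustration of the trigger-based behavior described above, a keyword check of this kind might look as follows. The function name and keyword list are hypothetical examples, not taken from the paper:

```python
# Hypothetical sketch: flag tweets that an automated buzzer might react to,
# based on simple trigger keywords such as "pemilu" (election).
TRIGGER_KEYWORDS = {"pemilu", "pilpres", "capres"}  # example triggers, not from the paper

def matches_trigger(tweet_text, keywords=TRIGGER_KEYWORDS):
    """Return True if the tweet contains any trigger keyword."""
    tokens = tweet_text.lower().split()
    # Strip common leading/trailing symbols so "#Pemilu" matches "pemilu".
    return any(tok.strip("#@,.!?") in keywords for tok in tokens)

print(matches_trigger("Ayo sukseskan #Pemilu 2024!"))  # True
print(matches_trigger("Cuaca hari ini cerah"))         # False
```

A real automated buzzer would of course combine such triggers with an automatic retweet action; this sketch only shows the matching step.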
Notably, local media reports suggest a possible link between candidates and paid social media users who promote specific campaigns. This highlights the potential for manipulation within the online political sphere.
Political Hate Speech
In today's political landscape, social media is commonly used to express opinions and for campaigning. Platforms like Twitter and Facebook can be utilized to communicate one's views in favor of a political party or political candidate. However, while the creation of online communities is not inherently a significant issue, it can become concerning when politically active individuals utilize these platforms to spread radicalism. This radicalization can be facilitated by the widespread use of provocative language, as defined by Seigner et al., which includes aggressive expressions, disobedience towards regulations, swearing, and antagonistic behavior.
Related Works: Buzzer Account Detection and Hate Speech Classification
Several studies have explored methods for detecting buzzer accounts on Twitter. Ibrahim et al. conducted research on buzzer detection for the 2014 Indonesian Presidential Election. They employed a machine learning approach to develop a computational model for classifying Twitter users as buzzers or non-buzzers. Their method achieved an accuracy of 86% in detecting buzzer accounts. Suciati et al. investigated buzzer detection for the 2019 Indonesian Presidential Election. They examined the performance of four machine learning algorithms: AdaBoost (AB), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), and Histogram-based Gradient Boosting (HGB). Their results showed that AdaBoost achieved the best accuracy (62.3%) and precision (61.3%) with 25 features, while XGB attained the highest recall (67%).
Hate speech classification is another important task related to social media analysis, particularly in the context of elections. Geni et al.
conducted sentiment analysis on Indonesian tweets related to the 2024 election using various models, including IndoBERT, which achieved the highest accuracy of 83%. This method can be used for tasks like hate speech classification. While significant progress has been made in both buzzer detection and hate speech classification independently, there remains a critical gap at the intersection of these challenges: the detection of buzzer accounts that spread hate speech during election periods.
III. RESEARCH METHOD
System Overview Description
To develop a model that can identify whether a Twitter account that posts hate speech is a buzzer or non-buzzer, a dataset is required. Through collaboration with NoLimit Indonesia, we gathered a dataset of 62,341 tweets along with their corresponding account features. The system implementation begins by using IndoBERT to classify these tweets for hate speech content. If a tweet is not identified as hate speech, it is disregarded. For tweets classified as containing hate speech, we then examine the frequency of the account's appearance in the dataset. Based on the characteristic behavior of buzzer accounts, which involves high posting frequency, we set a threshold of 10 appearances in the dataset to differentiate potential buzzer accounts from regular users. If an account appears more than 10 times in the gathered dataset, we proceed to classify it using ensemble learning methods (Decision Tree, Random Forest, and XGBoost) to determine whether it is a buzzer or non-buzzer. If the account appears fewer than 10 times, it is excluded from further analysis. The final output consists of a classification result indicating whether each qualifying account is categorized as a buzzer or non-buzzer based on its posting patterns (Fig. 2).
Fig. 2. System Overview
Dataset Preparation
Aimed Dataset: The dataset aimed for this research is composed of social media posts on Twitter that include hashtags with the potential to incite hate speech related to the 2024 Indonesian Presidential Election, along with accompanying account features. Fig. 3 shows some examples of preferred tweets for the dataset. For the account features, we selected nine specific features based on similar research conducted by Suciati et al. These features are presented in Table 1.
Fig. 3. Examples of Hate Speech on Twitter
Based on the research conducted by Suciati et al., there were originally 64 features before feature selection was performed. They used mutual information (MI) to avoid the curse of dimensionality; among these features, only the 11 with the highest MI scores were utilized. For our research, however, we are constrained to using only 9 of these features, as obtaining the remaining data is unfeasible.
Data Collection Method: The data for this study were collected through a collaboration with NoLimit Indonesia, a company that provides social media analysis as a service. NoLimit Indonesia supplied all the required data, such as tweet text, metadata, and account features of a user. Their team handled the data gathering process, allowing us to focus on the analysis and modelling work.
TABLE I
DESCRIPTION OF ACCOUNT FEATURES
from_username: unique Twitter handle/username of the account (e.g., @username)
followers count: total number of current followers for the account
account_age (Integer): account age in years (calculated as the current date minus the account creation date)
post_frequency_per_year: average number of posts per year
post_frequency_per_month: average number of posts per month
post_frequency_per_day: average number of posts per day
tweet text: full text content of the tweet/post, including any hashtags and mentions
final_sentiment: binary classification of content (0: non-hate speech, 1: hate speech)
is_buzzer: binary classification of account type (0: non-buzzer, 1: buzzer account)
Data Labeling: For data labeling, NoLimit Indonesia provided a dataset of 62,341 tweets that were already labeled with sentiments (hate/non-hate). For buzzer account labeling, we determine whether an account is a buzzer based on the definition and characteristics of buzzers explained in Section II, where buzzers are identified through several key characteristics:
Information Spreading Patterns: Based on the Two-Step Flow Model and the Diffusion of Innovation Theory, buzzer accounts act as opinion leaders who actively spread and amplify messages to influence their followers.
Account Type: Following Ibrahim's research, buzzer accounts can be categorized into two types: automated buzzers and human-operated buzzers. Automated buzzers are programmed accounts that automatically react to specific triggers, such as retweeting posts containing keywords like "pemilu" or other specific keywords.
Behavioral Patterns: We also identify buzzer accounts by examining the frequency and consistency of tweet posting, message patterns, and connections to an organized political campaign or a specific presidential candidate.
Data Preprocessing: Data preprocessing conducted for this research dataset includes several steps, namely:
Text Cleaning: involves removing any irrelevant characters or symbols from the text data to ensure consistency and improve the quality of the analysis. This typically includes removing punctuation marks, special characters, mentions/URLs, and leading/trailing whitespace, and converting all letters to lowercase.
Tokenization: the process of breaking down a text into smaller units called tokens. These tokens can be words, phrases, or even individual characters.
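A minimal sketch of the cleaning and tokenization steps just described might look as follows. The regex patterns and function names are illustrative assumptions, not taken from the paper's implementation:

```python
import re

def clean_text(text):
    """Apply the cleaning steps described above: remove URLs, mentions,
    punctuation and special characters, trim whitespace, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)         # remove URLs
    text = re.sub(r"@\w+", " ", text)                 # remove mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # drop punctuation/symbols
    return re.sub(r"\s+", " ", text).strip()          # normalize whitespace

def tokenize(text):
    """Naive whitespace tokenization of the cleaned text."""
    return clean_text(text).split()

print(tokenize("Jangan pilih DIA!!! cek https://example.com @user"))
# ['jangan', 'pilih', 'dia', 'cek']
```

In practice, subword tokenization would be handled by IndoBERT's own tokenizer; this sketch only illustrates the word-level cleaning pipeline.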
Tokenization facilitates further analysis by providing a structured representation of the text, enabling algorithms to process and understand the underlying content more efficiently.
Slang Words Removal: This process involves identifying and replacing slang words in the text data with their corresponding standard Indonesian equivalents using a slang word dictionary. The dictionary contains a collection of informal, non-standard words and phrases commonly used in casual conversations, especially among youth and in online communication. For example, "ak" is a slang word for "saya" (I), "bkn" means "bukan" (not), and "jgn" is short for "jangan" (do not). By mapping these slang words to their standard equivalents, we can standardize the text data, reduce the variability introduced by informal language usage, and improve the performance of subsequent natural language processing tasks.
Proposed Method
IndoBERT for Hate Speech Classification: IndoBERT is a transformer-based pretrained language model designed specifically for the Indonesian language. It is built on the BERT architecture, which utilizes bidirectional encoding to capture contextual information from both directions in a sentence. For hate speech classification, IndoBERT can be fine-tuned by adding a classification head, enabling it to learn task-specific patterns from labeled data. IndoBERT is chosen for hate speech classification because its language-specific design ensures optimal understanding of Indonesian linguistic nuances, and its bidirectional encoding captures the complex contextual information necessary for identifying hate speech.
Ensemble Learning for Buzzer Detection: Ensemble learning combines predictions from multiple base models to improve overall classification performance. This method is particularly effective for buzzer detection, as it leverages diverse classifiers to handle various data patterns. Common techniques include bagging (Random Forest)
and stacking, which combines outputs from base models through a meta-classifier. Ensemble learning is chosen for buzzer detection due to its ability to handle noisy and diverse data.
Evaluation Method
The evaluation of the hate speech detection and buzzer account classification tasks involves assessing the models' performance using a variety of evaluation metrics tailored to the specific objectives of the research. The evaluation metrics are used in two distinct stages: the training stage, which involves the learning process, and the testing stage. During the testing phase, an assessment metric is utilized to measure the effectiveness of the classifier when applied to new, unseen data. Four evaluation metrics are commonly used in classification tasks: precision, recall, F1-score, and accuracy.
Precision: Precision measures the positive patterns that are correctly predicted out of the total predicted patterns in the positive class. The formula for calculating precision is shown in Equation 1, where tp stands for true positive and fp stands for false positive. Precision is the ratio of true positive predictions (tp) to the total number of positive predictions (tp + fp). A higher precision value indicates a lower number of false positives.

Precision = tp / (tp + fp)    (1)

Accuracy: Accuracy is used as an evaluation metric in classification tasks because it provides a straightforward measure of a model's overall performance. This metric is particularly useful when the classes are balanced and the cost of false positives and false negatives is similar. The formula for calculating accuracy is shown in Equation 2, where tp stands for true positive, tn for true negative, fp for false positive, and fn for false negative. Accuracy is the ratio of correctly predicted instances (both true positives and true negatives) to the total number of instances.
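For concreteness, the two metrics introduced so far can be computed directly from raw confusion-matrix counts. The counts below are made-up illustrative values, not the paper's results:

```python
def precision(tp, fp):
    """Equation 1: fraction of predicted positives that are correct."""
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    """Equation 2: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up example counts for illustration only.
tp, tn, fp, fn = 90, 80, 10, 20
print(precision(tp, fp))         # 0.9
print(accuracy(tp, tn, fp, fn))  # 0.85
```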
A higher accuracy value indicates a higher number of correct predictions (both positives and negatives) relative to the total number of predictions made.

Accuracy = (tp + tn) / (tp + tn + fp + fn)    (2)

Recall: Recall measures the fraction of positive patterns that are correctly classified. The formula for calculating recall is shown in Equation 3, where tp stands for true positive and fn stands for false negative. Recall is the ratio of true positive predictions to the total number of actual positive instances, which is the sum of true positives and false negatives. A higher recall value indicates that the model is effective at identifying most of the positive instances in the dataset.

Recall = tp / (tp + fn)    (3)

F1 Score: The F1 score provides a balanced measure of a model's performance by combining precision and recall into a single metric. The formula for calculating the F1 score is shown in Equation 4. This metric is particularly useful when there is an imbalance between the number of positive and negative samples, or when the costs of false positives and false negatives are significantly different.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (4)

IV. RESULTS AND DISCUSSION
Dataset Characteristics
Our analysis encompassed two primary datasets. The first dataset consisted of 62,341 tweets for hate speech detection, with 41,249 (66.17%) classified as hate speech and 21,094 (33.83%) as non-hate speech. For buzzer account detection, after applying our filtering criterion of a minimum of 10 tweets per account, we analyzed 961 accounts, comprising 580 (60.35%) buzzer accounts and 381 (39.65%) non-buzzer accounts. Fig.
4 presents the detailed distribution of both datasets, showing the class proportions for hate speech and buzzer accounts.
Fig. 4. Dataset Distribution
Experimental Setup
For hate speech classification, we implemented the IndoBERT model. Additionally, we employed three machine learning models for buzzer account detection: Decision Tree, Random Forest, and XGBoost. Table 2 presents the hyperparameter configurations for each model.
TABLE II
MODEL HYPERPARAMETER CONFIGURATION
IndoBERT: training batch size, evaluation batch size, number of epochs, learning rate, weight decay, warmup steps, warmup ratio, gradient accumulation steps
Decision Tree: max depth, min samples split, min samples leaf
Random Forest: number of estimators, max depth, min samples split, min samples leaf
XGBoost: number of estimators, learning rate, max depth, subsample, gamma, min child weight
Hate Speech Classification Result
The IndoBERT model demonstrated robust performance with 91.12% precision, recall, accuracy, and F1-score. Analysis of the confusion matrix reveals strong discrimination ability, with 7,881 true negatives and 3,490 true positives, while maintaining relatively low false positive and false negative counts. These metrics demonstrate the model's effectiveness in detecting hate speech tweets, and the balanced scores across all metrics indicate consistent performance. With only 1,098 misclassifications out of 12,469 total samples, this suggests that IndoBERT effectively learned the linguistic patterns of hate speech in Indonesian political discourse. These results are visualized in Fig. 5 and Fig. 6, which illustrate the model's performance metrics and classification distribution, respectively.
Fig. 5. Performance Metrics (Precision, Recall, F1-Score, and Accuracy) of IndoBERT
Fig. 6. Confusion Matrix of IndoBERT
Fig.
7. Model Performances Comparison
Buzzer Account Classification Result
For buzzer account detection, we evaluated three machine learning models: Decision Tree (DT), Random Forest (RF), and XGBoost (XGB). The Decision Tree classifier achieved the highest performance with 64% precision, recall, and F1-score. Random Forest and XGBoost performed slightly lower, achieving 61% and 60% F1-scores, respectively. These results reflect the inherent complexity of buzzer account detection, where sophisticated account behaviors can make classification challenging. The performance metrics are visualized in Fig. 7.
Discussion
IndoBERT demonstrated excellent performance in hate speech classification, achieving 91.12% accuracy with balanced precision and recall scores. This indicates the model's strong capability in understanding Indonesian political discourse and distinguishing hate speech patterns. However, there is still room for improvement through several approaches: expanding the dataset to include more diverse forms of political hate speech, and incorporating contextual features to better understand implicit forms of hate speech. Meanwhile, buzzer account detection achieved moderate performance (64% accuracy with Decision Tree), highlighting the need for a more comprehensive dataset. The current dataset of 961 accounts, while providing a foundation for classification, appears insufficient to capture the full complexity of buzzer behavior. Future work should focus on: collecting a larger, more diverse dataset of buzzer accounts; developing more sophisticated feature engineering techniques to capture evolving buzzer behaviors; and exploring other machine learning methods to better handle the variety of buzzer account strategies.
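Looking back at the pipeline of Section III, the frequency-threshold filter that decides which accounts reach the buzzer classifier can be sketched as follows. The account handles and helper function are hypothetical illustrations, not from the paper's code:

```python
from collections import Counter

def accounts_to_classify(hate_flagged_accounts, threshold=10):
    """Keep only accounts appearing more than `threshold` times among
    tweets classified as hate speech, per the pipeline in Section III."""
    counts = Counter(hate_flagged_accounts)
    return {acct for acct, n in counts.items() if n > threshold}

# Hypothetical stream: account handles of tweets flagged as hate speech.
flagged = ["@acct_a"] * 12 + ["@acct_b"] * 3 + ["@acct_c"] * 11
print(accounts_to_classify(flagged))  # {'@acct_a', '@acct_c'} (set order may vary)
```

Accounts surviving this filter would then be passed, with their Table 1 features, to the Decision Tree, Random Forest, or XGBoost classifier.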
V. CONCLUSION
This research demonstrates the significant presence of buzzer accounts spreading hate speech on Twitter during the 2024 Indonesian Presidential Election, highlighting the urgent need for effective detection mechanisms before implementing countermeasures. Our analysis reveals that, of the 62,341 tweets analyzed, 66.17% contained hate speech, indicating the substantial scale of coordinated negative messaging in political discourse. By leveraging advanced natural language processing and machine learning techniques, we have developed a comprehensive framework for creating healthier social media environments during election periods. Our implementation of IndoBERT for hate speech classification achieved robust performance with 91.12% precision and F1-score and 91.19% accuracy and recall, demonstrating its effectiveness in identifying harmful content in Indonesian political discourse. For buzzer account detection, the Decision Tree classifier emerged as the most effective among traditional machine learning models, achieving 64% across all metrics (precision, recall, F1-score, and accuracy). These results validate the potential of combining deep learning approaches for sentiment analysis with traditional machine learning for behavior classification in addressing social media manipulation. While the performance metrics indicate promising capabilities, they also highlight areas for future improvement, particularly in buzzer account detection through expanded datasets and enhanced feature engineering.
DATA AND COMPUTER PROGRAM AVAILABILITY
Data and the program used in this paper can be accessed at the following site: github.
REFERENCES