Media Jurnal Informatika, Vol. 17, December 2025, pp. 305-316, p-ISSN: 2088-2114, e-ISSN: 2477-2542

Emotion Detection in Indonesian Text Using the Logistic Regression Method

Erfian Junianto a,1,*, Mila Puspitasari b,2, Salman Ilyas Zakaria b,3, Toni Arifin a,4, Ignatius Wiseto Prasetyo Agung a,5

a Program Studi Teknik Informatika, Fakultas Teknologi Informasi, Universitas Adhirajasa Reswara Sanjaya, Bandung, Indonesia 40282
b Program Studi Sistem Informasi, Fakultas Teknologi Informasi, Universitas Adhirajasa Reswara Sanjaya, Bandung, Indonesia 40282
1 erfian.ejn@ars. 2 mileuups14@gmail. 3 salmanzakaria38@gmail. 4 toni.arifin@ars. 5 wiseto.agung@ars.
* corresponding author

ARTICLE INFO
Article history: Received; Revised; Accepted
Keywords: Emotion Detection; Logistic Regression; Ensemble Bagging; Text Mining; Textual Data

ABSTRACT
Emotion detection in Indonesian text has become a crucial topic in the advancement of human-computer interaction and sentiment analysis on digital platforms. Despite its importance, challenges arise from the linguistic complexity and frequent use of slang in Indonesian text. This study evaluates the performance of three classification models (Logistic Regression, K-Nearest Neighbors (KNN), and Naive Bayes) in detecting emotions from Indonesian text. The dataset comprises 1,000 texts categorized into four emotions: happy, sad, angry, and fear. Preprocessing steps included slang normalization, text cleaning, tokenization, stopword removal, and stemming, followed by TF-IDF weighting. Each model was trained and further optimized using ensemble bagging to improve classification performance. The optimized Logistic Regression model achieved the best performance, with an accuracy of 89%, precision of 0.90, recall of 0.89, F1-score of 0.89, and an average ROC-AUC score approaching 1. Both KNN and Naive Bayes reached 81% accuracy after optimization, but their overall performance remained lower than that of Logistic Regression.
These results indicate that Logistic Regression provides the most consistent and reliable performance for emotion detection in Indonesian text, while the ensemble approach mainly contributes to improving prediction stability and yields more substantial benefits for weaker classifiers. This study contributes to the development of emotion analysis models for Indonesian text, supporting applications in social computing and affective computing.

Introduction

Research on human emotions has long been a central focus across various disciplines, such as cognitive science, psychology, and, more recently, computer science with the rapid growth of social media. Understanding emotions is essential for advancing human-computer interaction and for exploring social trends in diverse fields, particularly those related to psychological issues. Emotions play a significant role in daily human life, influencing social relationships, memory, and even decision-making processes.

Text is one of the primary media used for communication and information delivery. Beyond conveying information, text can also express emotions. Emotional textual data are increasingly abundant with the growing use of social media. This trend necessitates the development of more efficient methods for detecting emotions in Indonesian text. Moreover, textual data often exhibit diverse features, such as variations in writing style, the use of slang, and dialectal differences, all of which can affect the accuracy of emotion detection. Therefore, this study also emphasizes the importance of selecting appropriate feature engineering techniques to address such diversity and to enhance the performance of emotion detection models.

Text mining is a technology used to discover useful knowledge within collections of text documents, enabling the identification of trends, patterns, or similarities in natural language texts that serve specific purposes.
Text mining is also a process of extracting valid and applicable knowledge from various documents and utilizing this knowledge to better organize information for future reference. Text within documents consists of various types of words, such as prepositions, conjunctions, pronouns, adjectives, and so on. Some of these words cannot be used as document indices because their occurrences are not specific or unique to particular documents. Through text mining, it becomes possible to extract and uncover valuable information from textual data.

Emotion classification methods are used to detect emotions, where emotion classes are determined based on the analyzed text. Emotion detection in Indonesian text is one of the challenges in text mining that requires appropriate approaches to achieve accurate results. Emotions expressed in text can be utilized to understand public sentiment, analyze social media, and support other applications related to human interaction. In the process of emotion identification, several physiological characteristics can also be employed, such as voice, facial expressions, hand gestures, body movements, heartbeat, and blood pressure, as well as information obtained from textual data.

Emotion detection in Indonesian text is increasingly important with the growing use of the language across digital platforms such as social media, e-commerce, and online forums. The choice of Indonesian text in this study is based on the significant growth of internet and social media users in Indonesia, which provides great opportunities for text-based emotion analysis in a local context. Indonesia's rapid growth in internet usage has led to a substantial increase in user-generated textual content on social media, online forums, and digital communication platforms.
These texts are characterized by informal language, extensive slang usage, abbreviations, code-mixing, and relatively simple grammatical structures. Such linguistic properties present unique challenges for emotion detection, particularly in low-resource language settings where standardized lexical resources are limited. Consequently, robust and interpretable machine learning models that can effectively handle sparse features and lexical variation, such as Logistic Regression, are highly relevant for Indonesian emotion classification tasks.

Table 1. Internet Penetration Rate (2018-2024)
Penetration: 64.80%, 73.70%, 77.01%, 78.19%, 79.50%

Table 1 presents the development of internet penetration rates in Indonesia from 2018 to 2024. The data illustrate a consistent upward trend, with internet penetration reaching 79.50% in 2024.

The Indonesian language offers several advantages in terms of its relatively simpler structure compared to English. The absence of verb conjugation based on tenses or subjects facilitates text processing and the application of modeling techniques. This structural simplicity can also reduce the complexity of detecting emotions expressed explicitly in text. However, emotion detection in Indonesian text also faces unique challenges. First, emotional expressions in Indonesian are often more formal and less explicit compared to English, making emotion classification more difficult. Additionally, the limited vocabulary of Indonesian in representing diverse emotional nuances often relies on loanwords or local dialects, which are not always consistent. Another challenge is the scarcity of adequate Indonesian-language datasets for training emotion detection models, which often necessitates manual annotation.

Previous studies have evaluated the performance of deep learning methods in detecting emotions in social media text using several datasets, including SemEval, WASSA, Tweet Pemilu, and Crowdflower.
The experimental results demonstrated that deep learning is among the most effective methods for emotion detection. On the SemEval dataset, the CNN architecture achieved the highest accuracy of 81. Meanwhile, for the WASSA dataset, the CNN, MLP, and GRU methods showed comparable performance. On the Tweet Pemilu dataset, LSTM and GRU achieved the highest accuracy. For the Crowdflower dataset, the LSTM, RNN, and GRU methods yielded the best performance, with the highest accuracies of 92.33% for LSTM and 92.30% for both RNN and GRU.

Subsequent research compared the performance of Logistic Regression and K-Nearest Neighbors (KNN) in text classification using TF-IDF and hyperparameter tuning. TF-IDF significantly improved the performance of KNN, with accuracy and F1-score increases of up to 48.4% and 54.84%, respectively, whereas for Logistic Regression, only precision improved, by 2. The combination of TF-IDF and hyperparameter tuning yielded the best results for Logistic Regression, with an accuracy of 65% and an F1-score of 66%.

Another study focused on classification methods and feature extraction for sentiment analysis, with most datasets derived from Twitter. Commonly used algorithms included Naive Bayes, Support Vector Machine (SVM), Logistic Regression, and lexicon-based approaches. The results showed that Logistic Regression achieved the highest accuracy at 93.60%, followed by the lexicon-based approach with 92%, while Naive Bayes, SVM, Random Forest, and K-Means achieved 88.20%, 85.50%, 81%, and 84.16%, respectively. The superior performance of Logistic Regression and lexicon-based methods was likely influenced by optimal feature extraction and effective dataset management.

Previous studies also compared the performance of machine learning and deep learning models in sentiment analysis of Shopee customer reviews using a dataset of 6,002 comments.
The machine learning models tested included Logistic Regression, Naive Bayes, and Multinomial Naive Bayes, with Logistic Regression achieving the highest accuracy at 94. In contrast, for deep learning models, BERT achieved an accuracy of 92.83%, while Multilingual BERT produced the best results with 97.41% accuracy. These findings suggest that deep learning models demonstrate superior accuracy compared to machine learning models in customer sentiment analysis.

Other studies have also revealed that Logistic Regression achieved the highest accuracy in detecting emotions in English text, particularly when combined with ensemble bagging techniques. The Indonesian language, however, presents unique characteristics compared to English, such as a more limited vocabulary for describing emotions and the tendency to use more formal expressions. These factors pose challenges for emotion detection in Indonesian text. Therefore, Logistic Regression is considered more suitable for text classification tasks in both Indonesian and English, due to its ability to handle high-dimensional and sparse data, which are common in text representations using TF-IDF. Models such as KNN, although effective for non-linear data distributions, are generally less efficient for textual data because they require more memory and computational time, especially when applied to large-scale datasets.

Accordingly, this study aims to analyze the effectiveness of Logistic Regression in detecting emotions in Indonesian text. The main focus is on applying ensemble bagging techniques and feature engineering to improve the accuracy of emotion detection models, taking into account the simpler grammatical structure of Indonesian compared to English. In addition, this research incorporates an extra preprocessing stage for handling slang words, which are commonly found in Indonesian text.
The study also seeks to identify key challenges, such as the tendency of emotional expressions in Indonesian to be more formal and less explicit, as well as the limitations in vocabulary that may influence detection outcomes.

Method

Figure 1 illustrates the stages or processes of the proposed model for emotion detection using the ensemble bagging approach. The diagram outlines the sequential steps performed, starting from the input of textual data to the final output of detected emotions.

Fig. 1. Research Method

1 Dataset Collection
In this study, the data were obtained from a GitHub repository providing a dataset to support emotion detection analysis in Indonesian text. The dataset consists of four emotion categories: happy, sad, angry, and fear. Each emotion category contains 250 text samples, with the total dataset comprising 1,000 samples. The dataset size is presented in Table 2.

Table 2. Dataset Size
Class | Total
Happy | 250
Sad | 250
Angry | 250
Fear | 250

Table 2 shows the distribution of the four emotions in the dataset: happy, sad, angry, and fear. Each emotion category contains 250 samples, resulting in a total of 1,000 samples. This balanced distribution is essential to prevent bias toward any particular emotion, ensuring that the analysis and the developed model can provide more accurate and balanced results in identifying or classifying emotions.

2 Preprocessing
Text preprocessing is the process of preparing text for analysis by converting unstructured data into structured data. Typically, structured data are represented in numerical form. The preprocessing stages in this study include handling slang words, cleaning and lowercasing, tokenization, stopword removal, and stemming. Slang words refer to language commonly used in informal conversations by teenagers or young adults, whether in the United States, the United Kingdom, or Indonesia.
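These preprocessing stages can be sketched end-to-end in a few lines. Everything in this sketch (the slang map, stopword list, and suffix rules) is a tiny illustrative stand-in for the study's actual resources: the 542-pair slang dictionary, a full Indonesian stopword list, and the Sastrawi stemmer.

```python
import re

# Illustrative stand-ins; the study uses a 542-pair slang dictionary,
# a full Indonesian stopword list, and the Sastrawi stemmer.
SLANG = {"gk": "tidak", "mw": "mau", "tp": "tapi", "sy": "saya", "udah": "sudah"}
STOPWORDS = {"ada", "yang", "tapi", "saya", "karena", "sudah", "dan"}
SUFFIXES = ("kan", "an", "i")  # naive suffix stripping, not real stemming

def stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def preprocess(text):
    # slang normalization first, as in the paper (case-insensitive lookup)
    text = " ".join(SLANG.get(w.lower(), w) for w in text.split())
    text = re.sub(r"[^a-z\s]", " ", text.lower())       # cleaning + lowercasing
    tokens = text.split()                               # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [stem(t) for t in tokens]                    # stemming

print(preprocess("Gk ada yang mw ngerjain tp sy harus lakuin karena udah telat banget"))
# ['tidak', 'mau', 'ngerjain', 'harus', 'lakuin', 'telat', 'banget']
```

With the full resources, informal forms such as "ngerjain" and "lakuin" would also be normalized (Table 3), and Sastrawi would reduce "mengerjakan" to its root "kerja" (Table 7); the toy stand-ins above only capture the shape of the pipeline.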
The slang words stage is performed prior to cleaning, where non-standard or slang words are converted into standard words. For example, "knp" is transformed into "kenapa," and "jg" becomes "juga," to ensure that the text is more formal and conforms to standard grammar. This process is illustrated in Table 3.

Table 3. Slang Words
Text: Gk ada yang mw ngerjain tp sy harus lakuin karena udah telat banget, dan harus segera selesai.
Output: Tidak ada yang mau mengerjakan tapi saya harus lakukan karena sudah telat banget, dan harus segera selesai.

The cleaning stage aims to reduce noise in the data by removing account names, numbers, "RT" tags, hashtags, duplicates, emoticons, punctuation, and hyperlinks. The lowercasing stage converts all text in the documents into a standard format, typically lowercase letters from 'a' to 'z'. This process is shown in Table 4.

Table 4. Cleaning and Lowercasing
Text: Tidak ada yang mau mengerjakan tapi saya harus lakukan karena sudah telat banget, dan harus segera selesai.
Output: tidak ada yang mau mengerjakan tapi saya harus lakukan karena sudah telat banget, dan harus segera selesai

The next stage, tokenization, separates sentences into individual words, which are then arranged in an array format, as shown in Table 5.

Table 5. Tokenizer
Text: tidak ada yang mau mengerjakan tapi saya harus lakukan karena sudah telat banget, dan harus segera selesai
Output: ["tidak", "ada", "yang", "mau", "mengerjakan", "tapi", "saya", "harus", "lakukan", "karena", "sudah", "telat", "banget", "dan", "harus", "segera", "selesai"]

The subsequent stage is stopword removal, which eliminates common words that frequently appear but do not affect sentiment, as presented in Table 6.

Table 6. Stopwords
Text: ["tidak", "ada", "yang", "mau", "mengerjakan", "tapi", "saya", "harus", "lakukan", "karena", "sudah", "telat", "banget", "dan", "harus", "segera", "selesai"]
Output: ["tidak", "mau", "mengerjakan", "harus", "lakukan", "telat", "banget", "harus", "segera", "selesai"]

The final stage is stemming, the process of converting words to their root form by removing prefixes, infixes, or suffixes. The Sastrawi library was used for stemming in this study, as illustrated in Table 7.

Table 7. Stemming
Text: tidak mau mengerjakan harus lakukan telat banget harus segera selesai
Output: tidak mau kerja harus laku telat banget harus segera selesai

3 Vector Creation
After preprocessing, the next step is to convert the collection of words into vectors using TF-IDF weighting. Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical method used in natural language processing and information retrieval systems to assess the importance of a term within a document relative to a larger collection of documents. TF-IDF generates a score that reflects how significant a term is in a specific document compared to other documents. This score can be used to measure and compare the relevance of documents within a retrieval system. Documents with higher TF-IDF scores for a given term are generally considered more relevant to user queries containing that term. TF-IDF is a widely used method for enhancing accuracy and relevance in information retrieval.

The dataset was divided using a 90:10 ratio. From the total dataset of 1,000 samples, this split results in 900 samples for training and 100 samples for testing. The purpose of this division is to ensure that the model can be evaluated on previously unseen data, allowing its performance to be assessed objectively.

4 Classification Model
After the dataset splitting process, the next step is to apply the classification model using a combination of machine learning algorithms, namely K-Nearest Neighbors (KNN), Logistic Regression, and Naive Bayes, through the ensemble bagging approach. In this study, the results of the three algorithms are integrated at the testing stage.
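Bagging trains multiple copies of a base learner on bootstrap resamples of the training data and aggregates their test-time votes. The sketch below is a minimal, self-contained illustration of that idea; the toy one-dimensional nearest-centroid base learner and the toy data are invented stand-ins for the study's TF-IDF-based classifiers.

```python
import random
from collections import Counter

# Toy base learner: a one-dimensional nearest-centroid classifier,
# standing in for the study's Logistic Regression / KNN / Naive Bayes.
def fit_centroids(X, y):
    by_label = {}
    for x, label in zip(X, y):
        by_label.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def bagging_fit(X, y, n_estimators=15, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]  # aggregate by majority vote

# Invented toy data: low scores labeled "sad", high scores "happy".
X = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
y = ["sad", "sad", "sad", "happy", "happy", "happy"]
models = bagging_fit(X, y)
print(bagging_predict(models, 0.15), bagging_predict(models, 0.95))  # sad happy
```

Because each resample sees slightly different data, the individual models vary; the majority vote smooths out that variance, which is the stability benefit the study attributes to ensemble bagging.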
Each algorithm first processes the test data separately. The predictions generated by each algorithm are then combined using the ensemble bagging technique, producing a final prediction that is more stable and accurate. Equation 1 illustrates the ensemble process in the bagging method, where the final prediction is obtained by aggregating the predictions generated by several base learners. The measurement process evaluates the ability of the learning algorithm models to solve problems or manage data effectively.

5 Measurement Value
This evaluation utilizes true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values arranged in a confusion matrix to determine the performance of the machine learning algorithms. The data from the confusion matrix are used to calculate accuracy, precision, recall, and F1-score, which serve as indicators of the applied algorithm's performance. The accuracy score reflects the proportion of correct predictions relative to the total number of predictions, with a maximum value of 1 and a minimum value of 0. The formulas for calculating accuracy, precision, recall, and F1-score are presented in Equations 2, 3, 4, and 5.

Results and Discussion

This section presents the results and discussion of the conducted research. The main aspects discussed include Research Data, Data Preprocessing, Weighting Results, Dataset Splitting, Classification Modeling, and Evaluation Metrics.

Research Data
The research data were obtained through a web scraping process from a GitHub repository containing the relevant dataset, including a slang dictionary used in this study, as shown in Table 8.

Table 8. Sample Texts from the Dataset in the GitHub Repository
Ga harus ngomel-ngomel,bisa?! lah ko ketawa masalah ?
Makan Sehat Hidup Senang
pas gua udah mulai suka , udah membuka hati lagi lu malah ngecewain ku sedih banget
bulan itu aku udah uts jadi gak bisa ke jakarta :(( huhuhu pengen banget ktmu kalex
Aten rindu saya sampai letak gmbar berdua dengan saya di wechat
GUE GA PERDULI LO MAU NGOMONG APAAN
Pinter boong, suka ngeles, pinter cari muka, otak kecil , yah itu si "budakkecikbalita"
smtime bkn krn kebohongan utk membenci ssorg, tp krn sedih menerima kenyataan bahwa ia tak bisa lg dipercaya:')
Jangan sedih bila sekarang masih dipandang sebelah mata, buktikan bahwa anda layak mendapatkan kedua matanya.

Table 8 illustrates sample texts from the dataset employed in this study, consisting of 1,000 Indonesian-language texts that had been annotated with emotion labels. Each text is categorized into several types of emotions, such as anger, sadness, and fear. For instance, under the anger label, an example text is: "SUSAH NGOMONG SAMA ORANG YANG GA TAU DIRI." Meanwhile, under the sadness label, a sample text is: "macam mw balas tweet klau sy x pandai english. haiszz #sedih." Each emotion category covers texts with diverse contexts, ranging from expressions of frustration and loss to fear regarding particular situations.

The dataset was sourced from a GitHub repository that compiles Indonesian-language texts related to emotional expressions. Since the dataset contained informal and non-standardized language, preprocessing was required to enhance data quality. One critical step in this process was slang word replacement. This was accomplished by using the slang dictionary included within the dataset to substitute non-standard words or abbreviations with their standardized forms, such as converting "sy" to "saya" and "gk" to "tidak." This step was essential to ensure textual consistency and to facilitate subsequent model processing.
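Dictionary-based replacement like the "sy" to "saya" example can be sketched with whole-word regular-expression substitution. The two pairs below are the ones documented in the text; the study's full dictionary contains 542 entries.

```python
import re

# Two pairs documented in the paper; the full dictionary has 542 entries.
SLANG = {"sy": "saya", "gk": "tidak"}

# \b word boundaries keep replacements from corrupting substrings of
# longer words (e.g. "sy" inside "asyik").
pattern = re.compile(r"\b(" + "|".join(map(re.escape, SLANG)) + r")\b")

def normalize(text):
    return pattern.sub(lambda m: SLANG[m.group(1)], text.lower())

print(normalize("macam mw balas tweet klau sy x pandai english"))
# macam mw balas tweet klau saya x pandai english
```

Slang outside the dictionary (here "mw" and "klau") passes through unchanged, which is why dictionary coverage directly affects how much of the informal text is actually normalized.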
The preprocessing prepared the dataset to be optimally utilized for emotion detection in Indonesian texts, as presented in Table 9.

Table 9. Sample of the Slang Dictionary
Slang | Standard Word

Table 9 displays 10 word pairs out of a total of 542 pairs as a sample from the dataset. For example, slang terms such as "ad" are transformed into "ada," "btw" into "banget," and "bnyk" into "banyak." This slang dictionary plays a crucial role in the preprocessing stage to ensure that the texts follow a more standardized format, making them easier to process by computational algorithms. The dictionary was employed for text normalization, i.e., converting informal or non-standard words into their formal equivalents. Such normalization is vital for maintaining data consistency in text processing, which in turn facilitates analysis and improves model accuracy, particularly in tasks such as emotion detection. This process significantly contributes to enhancing the performance of emotion detection models and other text analysis applications by reducing ambiguity and linguistic variation.

Data Preprocessing
The comments presented in Table 10, which are intended for classification or prediction purposes, were first processed through a text preprocessing pipeline. This stage included handling slang or non-standard words, removing stopwords, eliminating punctuation marks, converting all text to lowercase, and applying stemming to each word. At this stage, Indonesian-language comments were processed through several sequential steps. First, informal or slang words were cleaned and converted into their standardized forms. Next, all text was transformed into lowercase. The text was then tokenized into individual words. The process continued with the removal of common words that do not carry significant meaning, such as "dan," "atau," "ke," and "di" (stopword removal). Finally, stemming was applied to reduce words to their root forms, as shown in Table 10.
Table 10. Preprocessing
Text: knp ka?? aku jg gk denger siaran trakhir kk lg. #sedih. Mksih ya ka udh jd tmn aku di stiap kk lg siaran
Slang Words: kenapa ka?? aku juga tidak denger siaran terakhir kak lagi. #sedih. makasih ya ka sudah jadi teman aku di setiap kak lagi siaran
Cleaning & Lowercase: kenapa ka aku juga tidak denger siaran terakhir kak lagi sedih makasih ya ka sudah jadi teman aku di setiap kak lagi siaran
Tokenizer: [kenapa, ka, aku, juga, tidak, denger, siaran, terakhir, kak, lagi, sedih, makasih, ya, ka, sudah, jadi, teman, aku, di, setiap, kak, lagi, siaran]
Stopword Removal: [ka, denger, siaran, terakhir, kak, sedih, makasih, ka, jadi, teman, kak, siaran]
Stemming: [ka, denger, siar, akhir, kak, sedih, makasih, ka, jadi, teman, kak, siar]

Term Weighting
Word weighting using the TF-IDF (Term Frequency-Inverse Document Frequency) method aims to determine the importance of a word within a document relative to the entire dataset. This method calculates term frequency while assigning higher weights to words that are unique or rarely appear in other documents (inverse document frequency). The process strengthens the influence of relevant words and diminishes the impact of frequently occurring but less meaningful words. This technique is commonly implemented using libraries such as scikit-learn to efficiently compute weight values. The TF-IDF results are presented in Table 11.

Table 11. TF-IDF Result
Text | TF-IDF
SUSAH NGOMONG SAMA ORANG YANG GA TAU DIRI |
macam mw balas tweet klau sy x pandai english |
knp ka?? aku jg gk denger siaran trakhir kk lg. Mksih ya ka udh jd tmn aku di stiap kk lg siaran |
hujan lbat petirr aku takut pengen off aja Yang tadinya tenang dirumah |

Dataset Splitting
The dataset employed in this study consisted of 1,000 text samples distributed across four emotion categories: happy, sad, angry, and fear, with 250 samples in each category.
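The term weighting described above can be illustrated with a from-scratch sketch of the classic tf-idf formula, weight(t, d) = tf(t, d) * log(N / df(t)). Note that scikit-learn's TfidfVectorizer applies idf smoothing and L2 row normalization, so its exact values differ; the toy documents below are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Classic tf-idf over whitespace-tokenized documents:
    weight(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    tokenized = [d.split() for d in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (tf[t] / len(toks)) * math.log(n / df[t]) for t in tf})
    return weights

# Toy preprocessed documents (invented for illustration).
docs = ["sedih banget hari ini", "senang banget hari ini", "takut sekali"]
w = tf_idf(docs)
# "banget" occurs in two of the three documents, so its idf is lower
# than that of "sedih", which is unique to the first document.
```

This is exactly the behavior the text describes: terms shared across many documents are down-weighted, while terms distinctive to a document (and thus potentially indicative of its emotion) are emphasized.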
To ensure an objective evaluation of model performance, the dataset was divided into two subsets with a 90:10 ratio. A total of 900 samples were allocated for training, while the remaining 100 samples were reserved for testing. The 90:10 ratio was selected to provide a sufficiently large portion of training data, allowing the model to better learn patterns from each emotion category. At the same time, the testing subset enabled effective evaluation on previously unseen data, which is crucial for assessing model generalization. The class distribution within the dataset was deliberately balanced to prevent bias toward any particular emotion category. This balance is important to avoid classification results being overly influenced by a majority class. Additionally, a simple validation method, i.e., a train-test split, was employed to maintain efficiency in the modeling process, considering the relatively small dataset size. A similar 90:10 split ratio has been adopted in prior studies, which demonstrated that this division offers an effective trade-off between training performance and testing evaluation in text classification tasks.

Classification Modeling
This study implemented three primary algorithms for emotion detection in text: Logistic Regression, K-Nearest Neighbors (KNN), and Naive Bayes. To enhance model performance, an ensemble bagging approach was applied, combining predictions from multiple models to produce more stable and accurate outcomes.

Logistic Regression
Logistic Regression was chosen due to its capability to handle high-dimensional text data, such as TF-IDF representations, and its robustness when processing sparse datasets. Logistic Regression has also demonstrated competitive performance in text classification tasks. The baseline evaluation results showed an accuracy of 88%, with precision, recall, and F1-score of 0.89, 0.88, and 0.88, respectively.
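Scores like these follow directly from confusion-matrix counts via Equations 2-5. As a minimal worked check, the aggregate counts reported later in Table 13 for the optimized Logistic Regression model (TP = 89, FP = 11, FN = 11, TN = 89) reproduce its 0.89 figures; the macro-averaged precision of 0.90 in Table 12 differs slightly because it averages per-class values.

```python
# Accuracy, precision, recall, and F1 from confusion-matrix counts
# (the quantities defined in Equations 2-5).
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Aggregate counts from Table 13 (optimized Logistic Regression).
acc, prec, rec, f1 = metrics(tp=89, fp=11, fn=11, tn=89)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))  # 0.89 0.89 0.89 0.89
```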
After hyperparameter tuning with ensemble bagging, performance improved to 89% accuracy, with precision, recall, and F1-score of 0.90, 0.89, and 0.89, respectively.

K-Nearest Neighbors (KNN)
KNN was used as a comparison to Logistic Regression because of its intuitive nature and its ability to capture non-linear patterns within the data. However, this algorithm is sensitive to noise and often requires additional optimization. Baseline results indicated an accuracy of 62%, with precision, recall, and F1-score of 0.64, 0.62, and 0.62, respectively. After optimization through ensemble bagging, accuracy improved significantly to 81%, with precision, recall, and F1-score of 0.83, 0.81, and 0.81, respectively.

Naive Bayes
Naive Bayes was selected due to its probabilistic approach, which is well-suited for text classification, particularly with relatively small datasets. The baseline model achieved an accuracy of 80%, with precision, recall, and F1-score of 0.81, 0.80, and 0.80, respectively. After optimization with ensemble bagging, accuracy increased to 81%, with precision, recall, and F1-score of 0.83, 0.81, and 0.81, respectively.

Model Comparison
At this stage, an analysis was conducted to evaluate the performance of the models used in this study. The objective was to assess how effectively each algorithm could detect emotions with high accuracy, both under baseline conditions and after optimization with ensemble bagging. Table 12 presents a comparison of the performance results for the three algorithms, including outcomes before and after optimization.

Table 12. Model Evaluation Results
Method | Accuracy (%) | Precision | Recall | F1-Score
Logistic Regression (LR) | 88 | 0.89 | 0.88 | 0.88
Ensemble Bagging (LR) | 89 | 0.90 | 0.89 | 0.89
KNN | 62 | 0.64 | 0.62 | 0.62
Ensemble Bagging (KNN) | 81 | 0.83 | 0.81 | 0.81
Naive Bayes | 80 | 0.81 | 0.80 | 0.80
Ensemble Bagging (NB) | 81 | 0.83 | 0.81 | 0.81

The Ensemble Bagging models for KNN and Naive Bayes achieved accuracy, precision, recall, and F1-score values of 81%. Logistic Regression initially obtained an accuracy of 88%; however, applying ensemble bagging increased its accuracy to 89%, yielding the best performance among all models. On the other hand, the KNN model without bagging showed the lowest performance with an accuracy of 62%, but after applying ensemble bagging, its performance improved significantly to 81%. These results demonstrate that the ensemble bagging method is effective in enhancing model performance.

Measurement Value
The performance of the models was evaluated using accuracy, precision, recall, and F1-score to measure their ability to detect emotions in Indonesian texts. These metrics were calculated based on the confusion matrix, which illustrates the relationship between predicted and actual values. According to the evaluation results, Logistic Regression achieved the best performance, both before and after optimization with ensemble bagging. Table 13 and Equations 2-5 present the performance calculation of Logistic Regression after optimization, based on the following confusion matrix.

Table 13. Predicted and Actual Values
 | Predicted Positive | Predicted Negative
Actual Positive | 89 (TP) | 11 (FN)
Actual Negative | 11 (FP) | 89 (TN)

The results indicate that Logistic Regression with ensemble bagging exhibited excellent performance and emerged as the best-performing model in this study. In addition to the aforementioned metrics, the models were also evaluated using the ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) metric.
The ROC-AUC curve measures the ability of a model to distinguish between positive and negative classes across various thresholds. The closer the value is to 1, the better the model's classification ability. The following figures illustrate the ROC-AUC curves for the three models employed in this study: Logistic Regression, KNN, and Naive Bayes.

Figure 2 shows the ROC-AUC curve of the Logistic Regression model in a multiclass classification setting. Each class achieved a high AUC value, with Class 2 reaching approximately 0.96. AUC values close to 1 indicate that the model has excellent discriminative capability among the emotion classes in the dataset. This curve further reinforces the superiority of Logistic Regression as the best-performing model in this research.

Figure 3 illustrates the ROC (Receiver Operating Characteristic) curve of the K-Nearest Neighbors (KNN) algorithm in multiclass classification. The graph presents the model's performance for each class, as represented by varying AUC values: Class 1 achieved the highest performance, followed by Class 0, Class 3, and Class 2. This curve demonstrates the trade-off between the False Positive Rate (FPR) and True Positive Rate (TPR) for each class. To improve clarity and reproducibility, the corresponding AUC values for each emotion class are summarized in Table 14.

Table 14. ROC-AUC Values per Class for Each Classification Model
Model | Class 0 | Class 1 | Class 2 | Class 3
Logistic Regression | | | |
K-Nearest Neighbors | | | |
Naive Bayes | | | |

Fig. 2. ROC-AUC Curve for the Logistic Regression Model
Fig. 3. ROC-AUC Curve for the K-Nearest Neighbors (KNN) Model
Fig. 4. ROC-AUC Curve for the Naive Bayes Model
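The per-class values in these curves are one-vs-rest areas under the ROC curve. A minimal sketch of the underlying quantity: the AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count half). The scores below are illustrative toy values, not taken from the study, which presumably computed its curves with a library such as scikit-learn.

```python
# One-vs-rest ROC-AUC for a single class, computed directly via its
# rank-statistic interpretation: the fraction of (positive, negative)
# pairs in which the positive example receives the higher score.
def auc(y_true, scores):
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores for "does this text belong to class 0?" (illustrative only).
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc(y, s))  # 8/9, about 0.889: one positive is out-ranked by one negative
```

A value of 1 means every positive outranks every negative (perfect separation across all thresholds), which is why per-class AUCs near 1 in Table 14 indicate strong discriminative capability.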
Conclusion

This study demonstrates that Logistic Regression optimized with the ensemble bagging approach is an effective solution for detecting emotions in Indonesian texts. The model successfully addressed challenges such as slang variations, the presence of stopwords, and the limited size of Indonesian-language datasets. With an accuracy of 89%, this method achieved the best performance compared to other models such as KNN and Naive Bayes. Furthermore, the comprehensive preprocessing steps contributed significantly to improving data quality and predictive outcomes. The findings of this study are expected to serve as a reference for the development of Indonesian text-based emotion analysis applications, such as social media analytics or human-computer interaction. However, the limitations of this research lie in the relatively small dataset size and the restricted coverage of emotion categories. Future studies are recommended to employ larger and more diverse datasets to further enhance model generalization.

Acknowledgment
The authors would like to express their gratitude to the Faculty of Engineering, Universitas Suryakancana, for providing a platform to conduct and develop this research. It is hoped that this study will contribute significantly to the advancement of scientific knowledge in Indonesia.

Declarations
Author contribution. All authors contributed substantially to this research. Contributions include problem formulation, literature review, methodology design, data analysis, and manuscript preparation. All authors have read and approved the final manuscript for publication.
Funding statement. This research was self-funded by the authors.
Conflict of interest. The authors declare no conflict of interest regarding the research or the publication of this article.
Additional information. No additional information is available for this paper.
Data and Software Availability Statements
The data and processes implemented using Python that support the findings of this study are accessible at the following public repository: https://github.com/erfianjunianto/deteksi-emosi-mj. All datasets analyzed or generated during this research/experiment are openly available for academic purposes and further research.

References