TELKOMNIKA Telecommunication Computing Electronics and Control
April 2026, pp. 564-573, ISSN: 1693-6930, DOI: 10.12928/TELKOMNIKA

Evaluating learning rate effects on long short-term memory for Indonesian sentiment classification

Serly Eldina, Tekad Matulatan, Novrizal Fattah Fahmitra
Department of Informatics Engineering, School of Electrical and Informatics, Universitas Maritim Raja Ali Haji, Tanjungpinang, Indonesia

ABSTRACT
Hyperparameter optimization is a crucial process for enhancing the performance of deep learning models, particularly in the context of Indonesian sentiment classification. This study examines the impact of varying learning rates on a long short-term memory (LSTM) architecture trained with the adaptive moment estimation (Adam) optimizer. The dataset comprises 9,295 Indonesian comments automatically labeled by the Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT) model. Stratified k-fold cross-validation was employed to maintain class balance during training. Learning curves were analyzed to evaluate convergence and identify potential overfitting, while early stopping was applied when performance improvements became insignificant. The one-way analysis of variance (ANOVA) test (p-adj = 0.000575 < 0.05) revealed significant differences among the learning rate variations. Post-hoc analysis indicated that the learning rates of 0.0001, 0.001, and 0.002 differ significantly from 0.02. Descriptive statistics showed that a learning rate of 0.001 was the most optimal, achieving the highest validation accuracy while maintaining a relatively low variance. Evaluation across two data categories demonstrated that lower learning rates (0.0001 and 0.002) achieved the best accuracy, 78.71%, on in-domain data, whereas higher learning rates (0.01 and 0.02) performed better on cross-domain data with 36% accuracy. These findings highlight the crucial role of learning rate selection in determining model stability and generalization capability.
Article history: Received Jul 14, 2025; Revised Jan 12, 2026; Accepted Jan 30, 2026

Keywords: Adaptive moment estimation; Early stopping; Learning rate; Long short-term memory; One-way analysis of variance; Sentiment classification

This is an open access article under the CC BY-SA license.

Corresponding Author: Serly Eldina, Department of Informatics Engineering, School of Electrical and Informatics, Universitas Maritim Raja Ali Haji, Tanjungpinang, Indonesia. Email: serlyeldina03@gmail.com

Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA

1. INTRODUCTION
Sentiment analysis on social media has become a prominent research area that leverages natural language processing (NLP) and computational linguistics to extract subjective information from textual data, thereby improving the understanding of public opinion, brand perception, and social dynamics. It is widely used to classify Indonesian text into three sentiment categories: positive, negative, and neutral. One of the most widely used models for this purpose is the long short-term memory (LSTM) network, a type of recurrent neural network (RNN) capable of capturing long-term dependencies within sequential data, which allows it to perform effectively across a wide range of text processing tasks. Learning rate optimization plays a crucial role in deep learning, as it significantly affects model stability and convergence. Among various optimization algorithms, the adaptive moment estimation (Adam) optimizer is one of the most popular, as it adaptively adjusts the learning rate using both momentum and gradient mean estimation, enabling faster and more stable convergence. The learning rate itself is a critical hyperparameter that influences the stability and speed of the training process. A high learning rate increases the risk of overshooting the optimal solution, whereas a low learning rate leads to gradual and slower convergence.
Proper tuning of this parameter is vital to achieving optimal model performance and to mitigating overfitting or underfitting. Therefore, additional strategies such as early stopping and dropout are often applied to prevent these issues. Previous studies have explored the implementation of the LSTM model and the use of the Adam optimizer for Indonesian sentiment analysis. For example, a study analyzing various learning rate configurations reported that different learning rate values (0.1 and 0.01, among others) and epoch settings (500, among others) influenced model stability and accuracy in time-series prediction of chlorophyll-a concentration, with suboptimal learning rates causing training instability. Similarly, a study on the CIFAR-10 dataset showed that a learning rate of 0.001 combined with a dropout rate of 0.5 yielded the best performance in addressing overfitting and underfitting issues. Another study on the Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT) sentiment analysis model concluded that variations in the Adam learning rate significantly influenced stability and accuracy: the optimal learning rate of 2e-5 achieved an accuracy of 94.14%, while an excessively low value of 1e-7 caused instability and reduced accuracy to 69.76%. Although previous studies have examined the impact of learning rate variations, there remains a research gap. To the best of our knowledge, no study has specifically investigated the effect of learning rate variations on the LSTM model for Indonesian text classification using automatically labeled data generated by a fine-tuned IndoBERT model. Most prior works focused on time-series, image, or sentiment datasets without performing statistical significance tests on the results. Therefore, this study aims to evaluate the effect of different Adam learning rate configurations on the performance of a text classification model.
The model was trained using 9,295 Indonesian comments, and stratified k-fold cross-validation was applied to ensure result reliability and to address class imbalance by maintaining proportional class distributions in each fold. Furthermore, a one-way analysis of variance (ANOVA) test with a significance level of p-adj < 0.05 was conducted to determine statistical significance, followed by a post-hoc test to assess differences among learning rate variations. The evaluation was conducted using a confusion matrix under two data scenarios to assess the model's generalization ability: in-domain (comments from the Sirekap application on the Play Store) and cross-domain (news headlines related to the "Makanan Bergizi Gratis" (MBG) topic). The main contributions of this study are as follows: (i) providing an empirical analysis of the effect of learning rate variations on the accuracy, stability, and generalization ability of the LSTM model and (ii) presenting the first study that examines the impact of learning rate variations on an Indonesian LSTM model automatically labeled using the IndoBERT model, with validation through a one-way ANOVA statistical test and evaluation conducted on both in-domain and cross-domain datasets. The remainder of this paper is organized as follows: section 2 describes the research methodology, section 3 presents the experimental results and analysis, and section 4 concludes the study and provides directions for future research.

2. METHOD
This study employs a sentiment classification approach to analyze textual data systematically. The research methodology is shown in Figure 1, which presents the process from data collection and preprocessing to model training and evaluation, providing a clear overview of the workflow used in this study.

Figure 1. Research methodology
Data scraping
Data collection was performed through a web scraping technique using the Google Play Scraper to systematically obtain user comments. In total, 19,936 entries were collected, focusing on "Most Relevant" reviews of the Sirekap application available on the Google Play Store.

Data preparation
The data preparation stage involves several essential processes to ensure that the dataset is suitable for analysis and modeling. This stage consists of preprocessing, labeling, feature extraction, and formatting, each contributing to improved data quality and model compatibility.

Preprocessing
Data preprocessing is a fundamental step in this study to ensure that the data used for analysis is clean, consistent, and properly prepared for further processing. This step is crucial in sentiment analysis, as it converts raw text into a structured format suitable for computational processing. The process generally involves several key operations, including redundancy removal, text cleaning, tokenization, stopword removal, case folding, and stemming, as well as handling missing values. These preprocessing steps collectively improve the quality and reliability of the dataset for model training.

Labeling
The data labeling process was conducted using the IndoBERT model available on the Hugging Face platform, which facilitates community collaboration by providing open-source tools. The pretrained model used for this task was mdhugol/indonesia-bert-sentiment-classification. Manual labeling was not performed due to limited linguistic expertise and the need to minimize annotator subjectivity. The IndoBERT model was chosen because it is a modern transformer-based model trained on a large-scale Indonesian corpus, enabling a deeper understanding of contextual meaning. However, lexicon-based approaches, such as the Indonesian Sentiment Lexicon (InSet)
, have fundamental limitations because they rely on keyword matching within the lexicon and cannot capture semantic meaning patterns expressed in full sentences. Valence aware dictionary and sentiment reasoner (VADER) and TextBlob require translation into English, which may result in the loss of the original meaning. IndoBERT was therefore used in this study to automatically label sentiments into three classes: negative, neutral, and positive. After the labeling process, a total of 17,152 text entries were obtained, consisting of 12,276 (71.6%) negative, 2,946 (17.2%) neutral, and 1,930 (11.2%) positive sentiments. The resulting labeled dataset was then used to train the LSTM model.

Resampling
The dataset generated from the IndoBERT-based labeling process was further refined through undersampling using the random under-sampler technique to reduce class imbalance and minimize model bias toward the majority class. After the undersampling process, the dataset consisted of 9,295 entries, with class distributions of 4,419 (47.5%) negative, 2,946 (31.7%) neutral, and 1,930 (20.8%) positive. The distribution after resampling is shown in Figure 2.

Figure 2. Class distribution comparison before and after undersampling

The dataset was stratified into 90% for training (8,365 entries) and 10% for testing (930 entries), ensuring that class proportions remained balanced across subsets. A stratified 10-fold cross-validation was then applied to the training set. Each fold produced training and validation subsets with balanced class distributions, enabling a comprehensive model evaluation and ensuring more reliable and representative results.

Splitting data
The dataset used in this study consisted of 9,295 entries, divided into a training set comprising 89.99% (8,365 entries) and a testing set comprising 10.01% (930 entries).
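The stratified split and 10-fold cross-validation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the label list below is a synthetic stand-in that only reproduces the paper's class counts after undersampling, and the random seed is an arbitrary choice.

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in labels reproducing the paper's post-undersampling
# class counts (4,419 negative, 2,946 neutral, 1,930 positive); the real
# inputs are the IndoBERT-labeled comments.
labels = ["neg"] * 4419 + ["neu"] * 2946 + ["pos"] * 1930
texts = [f"comment {i}" for i in range(len(labels))]

# 90/10 stratified train/test split keeps class proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.10, stratify=labels, random_state=42
)

# Stratified 10-fold cross-validation on the training portion: each fold's
# validation subset mirrors the overall class distribution
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X_train, y_train):
    fold_train = [y_train[i] for i in train_idx]
    fold_val = [y_train[i] for i in val_idx]
```

With 9,295 entries, a 10% stratified test split yields exactly the 930 test and 8,365 training entries reported in the paper.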
The training data were subsequently partitioned using stratified 10-fold cross-validation, with 90% (7,528 entries) used for training and 10% (837 entries) for validation in each fold during model training.

Feature extraction
Feature extraction was conducted to convert textual data into numerical representations suitable for processing by the classification algorithm. This process involved tokenization, in which the text was transformed into integer sequences using the Keras Tokenizer, and padding, which standardized the input length by truncating texts that exceeded the maximum length or appending zeros to shorter sequences. The categorical sentiment labels were mapped into three classes and encoded using one-hot encoding. The feature extraction process is illustrated schematically in Figure 3.

Figure 3. Illustration of the feature extraction process

Training with long short-term memory
The LSTM algorithm for sentiment analysis was trained using specific hyperparameter configurations, with a primary focus on optimizing the learning rate to ensure training stability. The evaluated learning rate values were 0.0001, 0.001, 0.002, 0.005, 0.01, and 0.02. During training, an early stopping mechanism was employed to terminate the process when improvements in validation performance became insignificant, thereby preventing overfitting and accelerating convergence. The LSTM model was implemented using the Keras library in Python. The model configuration consisted of a 64-dimensional embedding, an LSTM layer with 32 units, and a dense layer with 3 output units, along with a dropout rate of 0.2 and a batch size of 32. This configuration was adapted from previous studies, in which an embedding size of 64 and a dropout rate of 0.2 were found effective for sentiment analysis tasks, while the use of 32 LSTM units followed a hybrid architecture applied to hotel review sentiment analysis.
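The architecture above can be sketched in Keras. This is our own minimal reconstruction, not the authors' implementation: the vocabulary size is an illustrative placeholder (the paper does not report it), and the dropout layer's position after the LSTM is an assumption, since only the rate of 0.2 is given.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam

VOCAB_SIZE = 10000  # illustrative placeholder; not reported in the paper

def build_model(learning_rate):
    """LSTM classifier with the reported hyperparameters: 64-d embedding,
    32 LSTM units, dropout 0.2 (placement assumed), 3-class softmax."""
    model = Sequential([
        Embedding(VOCAB_SIZE, 64),
        LSTM(32),
        Dropout(0.2),
        Dense(3, activation="softmax"),
    ])
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# One model per evaluated learning rate; batch_size=32 would be passed to fit()
models = {lr: build_model(lr) for lr in [0.0001, 0.001, 0.002, 0.005, 0.01, 0.02]}
```

Only the Adam learning rate varies across the six configurations, so any performance differences can be attributed to that hyperparameter alone.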
A batch size of 32 was consistently adopted in both prior studies.

Testing
The testing dataset was reserved exclusively for evaluating the final performance of the model and was not used during the training or validation phases, to prevent bias. The evaluation was conducted across two domains using a confusion matrix: in-domain, comprising 930 test samples from Sirekap application reviews on the Google Play Store (442 negative, 295 neutral, and 193 positive), and cross-domain, comprising 100 news headlines related to the MBG topic (30 negative, 40 neutral, and 30 positive).

Measurement
Model evaluation was conducted through cross-validation on the training data, followed by a one-way ANOVA test to determine the significance of performance differences among the learning rate configurations. A post-hoc test was subsequently applied to further examine pairwise differences. To validate the statistical results, a final evaluation was conducted on the testing dataset under two scenarios: comments from the Sirekap application on the Google Play Store and news headlines related to the MBG topic. The representative model with the best performance from each learning rate configuration was selected based on the optimal fold. Model performance was measured using standard classification metrics derived from the confusion matrix, including accuracy, precision, recall, and F1-score.

3. RESULTS AND DISCUSSION
Experimental results and analysis
The experiments were conducted on a text classification task using an LSTM architecture optimized with the Adam optimizer. An early stopping mechanism with a patience of three epochs was applied to prevent overfitting. Training was halted when no improvement in validation performance was observed for three consecutive epochs, and the best-performing model was selected based on the lowest validation loss before reaching the patience threshold.
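The early-stopping rule above can be illustrated independently of any framework. The sketch below is our own illustration, not the authors' code: it tracks validation loss with a patience of three epochs and returns the epoch whose (lowest-loss) model would be kept.

```python
def early_stopping_select(val_losses, patience=3):
    """Return the index of the epoch whose model is selected: training
    stops once validation loss fails to improve for `patience`
    consecutive epochs, and the best epoch seen so far is kept."""
    best_epoch, best_loss = 0, float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs
                break
    return best_epoch

# Validation loss drops, then rises for three epochs -> epoch 2 is selected
print(early_stopping_select([0.9, 0.7, 0.6, 0.65, 0.7, 0.8]))  # -> 2
```

In Keras, the equivalent behavior is obtained with the `EarlyStopping` callback (`monitor="val_loss"`, `patience=3`, `restore_best_weights=True`).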
Figure 4 shows the validation loss curve of the model achieving the highest validation accuracy, presented as a representative example since all folds exhibit similar patterns. The red dash-dot line indicates the selected model prior to the onset of increasing loss, illustrating the loss behavior before potential overfitting.

Figure 4. Representative loss history for the selected learning rates: 0.0001, 0.001, 0.002, 0.005, 0.01, and 0.02

Based on the loss curves shown in Figure 4, the model with the smallest learning rate (0.0001) required more epochs to reach convergence due to the limited number of weight updates, resulting in a slower, more stable optimization process. In contrast, higher learning rates (0.001-0.02) accelerated the reduction of training loss and caused an earlier increase in validation loss, indicating the onset of overfitting. These configurations achieved their best performance in fewer epochs, exhibiting faster but less stable convergence patterns. In Figure 4, panel (a) corresponds to 0.0001, while panels (b)-(f) represent learning rates 0.001, 0.002, 0.005, 0.01, and 0.02, respectively. The results shown in Figure 5 indicate that a learning rate of 0.001 provides the best overall performance, with high accuracy, low variance, and consistent results across folds. This learning rate also converges faster than 0.0001 while maintaining similar accuracy levels, demonstrating an optimal balance between training stability and convergence speed. Although the datasets and model architectures in prior work differ, the findings consistently indicate that a moderate learning rate improves convergence and enhances the model's learning capability. To assess the statistical significance of performance differences among learning rates, a one-way ANOVA was conducted using the 10-fold cross-validation results, with accuracy values averaged across folds.
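The significance test can be sketched as follows, assuming SciPy and NumPy. The per-fold accuracy values below are synthetic illustrations, not the study's actual results (those are in the paper's repository); the eta-squared computation shows how the reported effect size is derived from the same groups.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Illustrative 10-fold validation accuracies per learning rate (synthetic)
acc = {
    0.0001: rng.normal(0.78, 0.01, 10),
    0.001:  rng.normal(0.78, 0.01, 10),
    0.002:  rng.normal(0.78, 0.01, 10),
    0.005:  rng.normal(0.77, 0.02, 10),
    0.01:   rng.normal(0.76, 0.02, 10),
    0.02:   rng.normal(0.72, 0.03, 10),
}

# One-way ANOVA: does mean accuracy differ across learning rate groups?
f_stat, p_value = f_oneway(*acc.values())

# Eta squared = SS_between / SS_total, the effect-size measure reported
all_vals = np.concatenate(list(acc.values()))
grand_mean = all_vals.mean()
ss_between = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in acc.values())
ss_total = ((all_vals - grand_mean) ** 2).sum()
eta_sq = ss_between / ss_total
print(f"F={f_stat:.3f}, p={p_value:.6f}, eta^2={eta_sq:.3f}")
```

A significant ANOVA only says that at least one group mean differs; the pairwise post-hoc comparisons (Table 2) identify which learning rates actually differ.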
Assumption tests confirmed that the data were normally distributed (Shapiro-Wilk test, W = 0.9614) and had homogeneous variances (Levene's test).

Figure 5. Comparison of learning rates using stratified 10-fold cross-validation

The ANOVA results showed a significant effect of learning rate on model accuracy (p = 0.000575), with eta squared indicating a large effect size, suggesting statistically significant differences among learning rate configurations. The use of one-way ANOVA is consistent with previous studies that also assessed significance at p < 0.05. Furthermore, the descriptive statistics of model performance across the tested learning rates are summarized in Table 1. As shown, the lower learning rates (0.0001 and 0.001) yielded the highest mean validation accuracy with relatively low variance, indicating stable model behavior. In contrast, higher learning rates resulted in decreased performance.

Table 1. Descriptive statistics of validation results across various learning rates
Learning rate | Mean | Std dev | Min | Max

The post-hoc analysis revealed that significant differences in model accuracy were found only for the highest learning rate (0.02) compared with 0.0001, 0.001, and 0.002. Table 2 summarizes these comparisons; the remaining pairwise comparisons did not show significant differences at a significance level of α = 0.05.

Table 2. Post-hoc analysis of differences between learning rates
Group 1 | Group 2 | Mean difference | P-adj | Lower | Upper | Significant

Evaluation
Evaluation was conducted to validate the training results using the test dataset. Lower learning rates (0.0001 and 0.001) exhibited more stable convergence, with the one-way ANOVA indicating a significant effect on accuracy.
The model's generalization capability was further assessed through confusion matrix analysis under two scenarios: in-domain and cross-domain. The detailed evaluation results are shown in Table 3.

Table 3. Classification report: in-domain and cross-domain
Domain | Learning rate | Accuracy | Macro average (Precision, Recall, F1-score) | Weighted average (Precision, Recall, F1-score)

The results show that, for in-domain data, learning rates of 0.0001, 0.001, and 0.002 achieved the best performance, with accuracies between 78.49% and 78.71% and correspondingly high F1-scores. These results indicate a good balance between precision and recall under the imbalanced data condition. In contrast, for cross-domain data, the overall accuracy decreased: lower learning rates (0.0001-0.005) reached only 33-34%, while higher learning rates (0.01 and 0.02) slightly improved accuracy to about 36%, with F1-scores starting around 0.27. Although the differences were not substantial, the results suggest that higher learning rates provide slightly better generalization to other domains, while lower learning rates perform better on in-domain data. This difference indicates the model's limited generalization ability, which may be affected by the distinct characteristics of the datasets: user comments are typically informal and context-dependent, whereas news titles tend to be more formal and topic-focused.

Performance
The model performance for the different learning rates is shown in Tables 4 and 5. During the experiments, the training time was recorded using the Python time module, and memory usage was monitored through the psutil library in Google Colaboratory. The findings reveal that higher learning rates provided the most efficient training times, while a learning rate of 0.001 achieved the lowest memory consumption without significantly increasing the training duration.
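The measurement setup above can be sketched as follows. The paper names only the time module and psutil; the `measure` helper and the placeholder workload are our own illustration of that pattern, not the authors' code.

```python
import os
import time

import psutil

def measure(train_fn):
    """Run `train_fn` and report wall-clock time (s) and the process's
    resident memory (MB) afterwards, mirroring the time/psutil setup
    described in the paper."""
    process = psutil.Process(os.getpid())
    start = time.perf_counter()
    train_fn()
    elapsed = time.perf_counter() - start
    mem_mb = process.memory_info().rss / (1024 ** 2)
    return elapsed, mem_mb

# Placeholder workload standing in for one cross-validation fold of training
def dummy_training():
    sum(i * i for i in range(100_000))

elapsed, mem_mb = measure(dummy_training)
print(f"time: {elapsed:.3f} s, memory: {mem_mb:.1f} MB")
```

Resident set size (RSS) reflects the whole process, so per-fold memory figures obtained this way include the framework's baseline footprint, not just the model's.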
Overall, the performance across in-domain and cross-domain datasets remained relatively consistent, with only minor variations observed despite differences in the number of test samples and evaluation periods.

Table 4. Average training time and memory usage for 10-fold cross-validation
Learning rate | Average time (s) | Average memory (MB)

Table 5. Model performance comparison on in-domain and cross-domain testing
Dataset | Best fold | Learning rate | Time (s) | Memory (MB)

4. CONCLUSION
This study analyzed the effect of learning rate variations on the stability and accuracy of LSTM-based text classification models using the Adam optimizer. The results demonstrate that the learning rate significantly influences model performance. Based on 10-fold cross-validation and a one-way ANOVA test, significant differences were observed among learning rates, with post-hoc analysis indicating that a learning rate of 0.02 differed significantly from 0.0001, 0.001, and 0.002. Analysis of the training and validation loss curves showed that a learning rate of 0.0001 produced the most stable convergence, albeit requiring more epochs. Learning rates of 0.0001 and 0.002 achieved the highest in-domain accuracy of 78.71%, while a learning rate of 0.001 provided an optimal balance between stability and training efficiency, reaching 78.49% in-domain accuracy. Higher learning rates (0.01-0.02) demonstrated better adaptability to cross-domain data, achieving 36% accuracy, indicating potential for improving generalization despite lower absolute accuracy. Therefore, a learning rate of 0.001 is recommended for achieving both high accuracy and efficient training, while 0.0001 is preferable when training stability is prioritized. Higher learning rates (0.01-0.02)
can be leveraged to enhance adaptability on cross-domain data. These findings provide practical guidance for NLP practitioners in Indonesia on effectively tuning learning rates for Indonesian sentiment analysis tasks. Future research may explore adaptive learning rate mechanisms or scheduler-based optimization and extend the experiments to other deep learning architectures, such as bidirectional or attention-based LSTM models, to further improve cross-domain generalization.

ACKNOWLEDGMENTS
We acknowledge the Department of Informatics Engineering, Faculty of Engineering and Maritime Technology, Universitas Maritim Raja Ali Haji, for providing the facilities and academic environment that enabled the completion of this research.

FUNDING INFORMATION
Authors state no funding involved.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration. Authors: Serly Eldina, Tekad Matulatan, Novrizal Fattah Fahmitra. Roles: C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal analysis; I: Investigation; R: Resources; D: Data curation; O: Writing - original draft; E: Writing - review & editing; Vi: Visualization; Su: Supervision; P: Project administration; Fu: Funding acquisition.

CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.

DATA AVAILABILITY
The data supporting the experimental findings of this study, including the model results on training and test data as well as all related experiment graphs, are publicly available in the project repository: https://github.com/Serly-Eldina/LSTM-LearningRate-Indonesia-Sentimen.

REFERENCES