JOURNAL INFORMATIC, EDUCATION AND MANAGEMENT (JIEM)
Vol 8 No 1: September 2025 - February 2026. ISSN: 2716-0696. DOI: 10.61992/jiem

Sentiment Analysis of JMO Application Reviews on the Google Play Store Using BERT

Hendi Putra Wijaya 1*, Adhityah Anugrah 1, Mira Afrina 1, Ali Ibrahim 1
1 Universitas Sriwijaya

Article Info
Article history: Received 1 January 2026; Revised 4 January 2026; Accepted 7 January 2026
Keywords: Sentiment Analysis; BERT; Natural Language Processing

ABSTRACT
The development of digital technology has encouraged increased use of online-based public service applications, including the JMO (Jamsostek Mobile) application developed by BPJS Ketenagakerjaan to provide its participants with easy access to services. The application has received many user reviews on the Google Play Store, reflecting the level of satisfaction and public perception of service quality. However, the large and unstructured volume of comments makes manual analysis inefficient. This study aims to conduct sentiment analysis on user comments about the JMO application on the Play Store using the Bidirectional Encoder Representations from Transformers (BERT) model. The research method involves collecting comments through web scraping, text preprocessing (such as data cleaning, normalization, and tokenization), and sentiment labeling (positive, negative, and neutral). Precision, recall, and F1-score are used to describe the model's classification performance. The study is expected to identify patterns of user sentiment and public perceptions of the JMO application, and to serve as evaluation material and input for developers to improve service quality and user experience.

This is an open access article under the CC BY-SA license.

Corresponding Author: Hendi Putra Wijaya | Universitas Sriwijaya
Email: hendiputrawijaya@unsri.

Introduction

In the context of employment social security services, digital transformation has become crucial due to the high volume of participant interactions, especially through application-based services such as JMO, which is used by millions of active workers. One form of this digital transformation is the Jamsostek Mobile (JMO) application developed by BPJS Ketenagakerjaan as an online platform for employment social security services. The application allows participants to check account balances, submit claims, and access membership information without in-person visits (BPJS Ketenagakerjaan).

As use of the JMO application has increased, it has received thousands of user reviews on the Google Play Store, including praise, complaints, and suggestions for improvement. These reviews directly reflect users' perceptions and experiences of the quality of the services provided (Google Play Console). However, the unstructured nature of textual comments makes manual analysis inefficient and prone to subjectivity. Therefore, an Artificial Intelligence (AI)-based approach is required to automatically extract opinions and emotions from user-generated text (Liu, 2015; Medhat et al., 2014). One of the most widely used methods for processing public opinion is sentiment analysis, which identifies and classifies opinions into positive, negative, or neutral categories based on textual expressions (Pang & Lee, 2008).
Traditional approaches based on lexicons and machine learning algorithms such as Naïve Bayes and Support Vector Machines (SVM) have long been used, but their performance is limited in capturing complex semantic context. Consequently, deep learning-based approaches have emerged, enabling a deeper understanding of language context through vector representations. One of the most influential deep learning models in Natural Language Processing (NLP) is BERT, developed by Google (Devlin et al., 2018). BERT employs a Transformer architecture that can capture the full contextual meaning of words within a sentence (Vaswani et al., 2017). This approach has demonstrated strong performance in various NLP tasks such as text classification, sentiment analysis, and named entity recognition (Sun et al., 2019). For the Indonesian language, IndoBERT has been developed as an adaptation of BERT that better fits local linguistic structures (Koto et al., 2020).

Several studies have shown the superiority of IndoBERT in analyzing Indonesian-language texts compared to conventional models. IndoBERT has achieved higher accuracy than LSTM and Naïve Bayes in e-commerce review sentiment analysis (Putri et al.). Other studies have demonstrated that BERT can classify social media sentiment with accuracy exceeding 90% (Rahmawati & Pratama). Furthermore, Oktaviani and Wahyudi applied BERT to analyze government application reviews on the Play Store and obtained similar results. These findings indicate that Transformer-based models are highly suitable for application in public service platforms such as JMO.

Therefore, this study aims to apply the IndoBERT model to identify user sentiment toward the JMO application on the Play Store. The analysis is expected to provide empirical insights into public perceptions of the performance and quality of the JMO application. In addition, the results are intended to serve as an evaluation tool and contribute to future improvements in digital service quality and user experience.

Research Methodology

The stages conducted to determine the sentiment of user comments on the JMO application on the Play Store using the BERT model are: (1) data collection, (2) data preprocessing, (3) data labeling, (4) BERT model training, and (5) model evaluation. This study adopts a descriptive quantitative approach with a computational experimental method to analyze user sentiment toward the Jamsostek Mobile (JMO) application on the Google Play Store. The focus of the study is the application of the BERT model, particularly the IndoBERT variant, to classify sentiments into three categories: positive, negative, and neutral (Koto et al., 2020). Each stage is designed to follow the principles of Indonesian text analysis using deep learning approaches (Devlin et al., 2018; Rahmawati & Pratama).

Data Collection

The data used in this study consist of user reviews of the JMO application obtained from the Google Play Store. The data were collected using a web scraping technique with the Google Play Scraper library implemented in the Python programming language, as sketched below.
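The following is a minimal sketch of this collection step using the open-source google-play-scraper Python library. The package id "com.bpjstku" and the output file name are illustrative assumptions rather than details confirmed by the paper; the actual id should be verified on the application's Play Store page.

from google_play_scraper import Sort, reviews_all
import pandas as pd

# Fetch Indonesian-language reviews for the app (package id assumed).
scraped = reviews_all(
    "com.bpjstku",      # assumed JMO package id; verify before running
    lang="id",
    country="id",
    sort=Sort.NEWEST,
)

# Keep the fields described in the paper: username, date, rating, text.
df = pd.DataFrame(scraped)[["userName", "at", "score", "content"]]
df.to_csv("jmo_reviews_raw.csv", index=False)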
The extracted information includes the username, review date, star rating, and the textual content of each review.

Data Preprocessing

Comments collected from the Play Store are generally raw text containing various irrelevant elements such as excessive punctuation, emoticons, numbers, hyperlinks, and special symbols. A preprocessing step is therefore required to prepare the text for model training. The preprocessing stage focuses on handling the informal language that is dominant in JMO reviews, including the removal of non-linguistic symbols, normalization of non-standard words, and text tokenization using the built-in IndoBERT tokenizer to ensure consistency with the model's representation.

Subsequently, during the data labeling stage, each comment is classified into three sentiment categories:
- Positive: comments containing positive or favorable expressions
- Negative: comments that include complaints, dissatisfaction, or negative experiences
- Neutral: comments that are informative, provide suggestions, or do not express a specific emotional tone

BERT Model Implementation

The main model used in this study is BERT, specifically the IndoBERT variant, which has been pre-trained on Indonesian-language corpora. The implementation stages are:

Pre-trained Model Loading. The indobenchmark/indobert-base-p1 model from the Hugging Face Transformers library is used as the pre-trained model.

Text Tokenization. The review texts are converted into token IDs according to the BERT input format using the WordPiece tokenizer.

Model Training (Fine-tuning). The pre-trained BERT model is fine-tuned on the JMO review dataset by adding a classification layer on top of the BERT architecture. The training parameters are as follows:
- Batch size: 16
- Learning rate: 2e-5
- Epochs: 2
- Optimizer: AdamW

Sentiment Prediction. After the training process is completed, the model is used to predict the sentiment of new user comments.

Model Evaluation

Model evaluation is conducted to measure how well the BERT model can correctly classify user sentiment. The dataset is split using an 80:20 ratio to maintain a balance between training and testing, considering that the number of samples in each class has been equalized. The evaluation metrics used in this study include:
- Accuracy: the proportion of correct predictions out of the total number of samples
- Precision: the degree to which the model correctly identifies a particular sentiment class
- Recall (Sensitivity): the model's ability to detect all instances belonging to a specific sentiment class
- F1-score: the harmonic mean of precision and recall

Results and Discussion

The study employed several Python functions during the experimental process, particularly in the preprocessing stage, including text lowercasing, duplicate removal, and word normalization. The experiments were conducted on a system equipped with an AMD Ryzen 7 5800H processor with Radeon Graphics (approximately 3.2 GHz) and 32 GB DDR4 RAM, running Windows 11 Pro. The experiments were implemented in Python.

The research dataset consists of 100,000 reviews of the JMO application from BPJS Ketenagakerjaan collected from the Google Play Store. The dataset underwent preprocessing steps such as lowercasing, normalization to remove non-standard words, and the removal of duplicate comments; a sketch of these cleaning steps follows.
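As an illustration, the sketch below implements the cleaning steps named above (lowercasing, removal of hyperlinks, numbers, punctuation, and emoticons, and duplicate removal) with standard Python tooling. The regular expressions, file names, and column names are assumptions, not the authors' exact code; normalization of non-standard (slang) words would additionally require a slang dictionary, which is omitted here.

import re
import pandas as pd

def clean_text(text: str) -> str:
    text = text.lower()                            # lowercasing
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # remove hyperlinks
    text = re.sub(r"\d+", " ", text)               # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)          # remove punctuation/emoji
    return re.sub(r"\s+", " ", text).strip()       # collapse whitespace

df = pd.read_csv("jmo_reviews_raw.csv")            # assumed input file
df["clean"] = df["content"].astype(str).map(clean_text)
df = df.drop_duplicates(subset="clean")            # duplicate removal
df.to_csv("jmo_reviews_clean.csv", index=False)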
After preprocessing, 49,901 reviews remained for labeling, from which 3,000 samples were selected for each sentiment category.

Table 1. Data Processing Results
Label      Data After Preprocessing    Data Used
Positive   30,184                      3,000
Negative   16,160                      3,000
Neutral    3,557                       3,000

The dataset used in this study therefore consists of 9,000 user comments, with each sentiment class (positive, negative, and neutral) containing 3,000 samples. The data were split into training and testing sets using an 80:20 ratio, with the parameters test_size = 0.2 and random_state = 42 to ensure consistency in data partitioning. Two BERT-based approaches were evaluated in this study:
- BERT without fine-tuning (baseline), in which the model uses only the pre-trained BERT representations for classification, without additional training on the application review data.
- BERT with fine-tuning, in which the model is retrained using the review dataset to adjust the representation weights so that they become more specific to the target data domain.

Table 2. Evaluation Results of BERT without Fine-tuning
Label      Precision    Recall    F1-Score
Positive   0.79                   0.75
Negative
Neutral                           0.49
Accuracy: 0.60

In the baseline model, an accuracy of 0.60 was achieved. The positive class showed the highest performance, with a precision of 0.79 and an F1-score of 0.75, indicating that the model was more effective at identifying positive comments than the other two classes. However, the neutral class exhibited relatively low performance (F1-score of 0.49), suggesting that the model had difficulty distinguishing neutral comments from positive or negative ones.

Table 3. Evaluation Results of BERT with Fine-tuning
Label      Precision    Recall    F1-Score
Positive
Negative
Neutral
Accuracy: 0.65

The experimental results demonstrate that the BERT model has strong capability in handling sentiment analysis tasks for user reviews on the Google Play Store. However, the model's performance is highly dependent on how well it is optimized for the characteristics of the local dataset. In the first stage, the baseline BERT model was used without retraining on the application review dataset. The results showed an accuracy of 0.60, with the highest F1-score observed in the positive class (0.75). These results indicate that the general representations learned by BERT during pre-training are still able to capture some sentiment patterns in Indonesian-language review data. However, the relatively low performance on the neutral class (F1-score of 0.49) highlights the model's limitations in recognizing ambiguous or implicit sentiment expressions.

After fine-tuning was performed using the same dataset, the accuracy increased to 0.65, with improvements in F1-scores across all classes, particularly in the neutral class (up from 0.49) and the positive class (up from 0.75). Fine-tuning had a significant impact on the model's ability to understand the specific linguistic context of Indonesian user comments. Overall, the fine-tuning process made the model more sensitive to variations in informal language, abbreviations, and writing styles commonly found in Play Store reviews. A hedged sketch of this fine-tuning and evaluation setup is given below.
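The sketch assumes the hyperparameters stated in the methodology (indobenchmark/indobert-base-p1, batch size 16, learning rate 2e-5, 2 epochs, the Trainer's default AdamW optimizer) and an 80:20 split with random_state = 42. File names, column names, and the integer label encoding are illustrative assumptions.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Assumed labeled file with columns "clean" and "label" (0=neg, 1=neu, 2=pos).
df = pd.read_csv("jmo_reviews_labeled.csv")
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"])

tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
model = AutoModelForSequenceClassification.from_pretrained(
    "indobenchmark/indobert-base-p1", num_labels=3)

def encode(batch):
    # WordPiece tokenization into fixed-length token-id sequences.
    return tokenizer(batch["clean"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = Dataset.from_pandas(train_df).map(encode, batched=True)
test_ds = Dataset.from_pandas(test_df).map(encode, batched=True)

args = TrainingArguments(
    output_dir="indobert-jmo",
    per_device_train_batch_size=16,   # batch size 16
    learning_rate=2e-5,               # learning rate 2e-5
    num_train_epochs=2,               # 2 epochs; AdamW is the default optimizer
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()

# Report accuracy, precision, recall, and F1 per class on the held-out 20%.
preds = trainer.predict(test_ds).predictions.argmax(axis=-1)
print(classification_report(test_df["label"], preds,
                            target_names=["negative", "neutral", "positive"]))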
This improvement is consistent with the theory that a BERT model fine-tuned on domain-specific data can produce more accurate semantic representations (Devlin et al., 2018).

Conclusions

This study analyzed the effectiveness of the BERT model in performing sentiment analysis on Play Store application reviews using a labeled dataset of 9,000 samples categorized as positive, negative, and neutral. The model was evaluated under two conditions: without fine-tuning (baseline) and with fine-tuning, using 80% of the data for training and 20% for testing. The results show that:
- The BERT model without fine-tuning achieved an accuracy of 0.60, while the fine-tuned BERT model improved to 0.65.
- Fine-tuning provided a noticeable performance improvement across all classes, particularly for sentiment categories that were previously difficult to identify.
- The highest F1-score was achieved in the positive class after fine-tuning, indicating the model's strong ability to consistently recognize positive sentiment.

It can be concluded that BERT with fine-tuning is more effective for sentiment analysis on application review data, primarily because it is able to adapt word representations and contextual understanding to the characteristics of local user language.

References