Jurnal Ilmu Komputer dan Informatika (JIKI) P-ISSN: 2807-6664
E-ISSN: 2807-6591
Vol.
No.
December 2025.
Page.
https://jiki.
jurnal-id.
DOI: https://doi.
org/10.
54082/jiki.
Comparative Analysis of Bidirectional Encoder Representations from Transformers Models for Twitter Sentiment Classification using Text Mining on Streamlit

Ahmad Fajar Tatang*1, Mohammad Hasbi Assidiqi2
Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
Email: 1atatang0001@stu.

Received: Jun 9, 2025. Revised: Nov 20, 2025. Accepted: Nov 22, 2025. Published: Dec 11, 2025

Abstract
Social media platforms like Twitter have become highly influential in shaping public opinion, making sentiment analysis on tweet data crucial.
However, traditional techniques struggle with the nuances and complexities of informal social media text.
This research addresses these challenges by conducting a comparative analysis between the non-optimized BERT (Bidirectional Encoder Representations from Transformers) model and the BERT model optimized with Fine-Tuning for sentiment analysis on Indonesian Twitter data using text mining methods.
Following the CRISP-DM methodology, the study collects data by crawling Twitter with the keyword biznet and preprocesses it through case folding, cleaning, tokenization, normalization, and data augmentation. The dataset is then split into training, validation, and testing subsets for modeling and evaluation with IndoBERT-base-p1, a model trained specifically for the Indonesian language.
The results demonstrate that the fine-tuned BERT model significantly outperforms the non-optimized BERT, achieving 91% accuracy, 0.91 precision, 0.90 recall, and 0.91 F1-score on the test set.
Fine-Tuning enables BERT to adapt to the unique characteristics of Twitter sentiment data, allowing better recognition of language and context patterns associated with sentiment. The optimized model is implemented as a web application for practical use.
This research affirms the superiority of Fine-Tuned BERT for accurate sentiment analysis on Indonesian Twitter data, providing valuable insights for businesses, governments, and researchers leveraging social media data.
Keywords: BERT, Fine-Tuning, Sentiment Analysis, Social Media, Text Mining, Twitter

This work is an open access article licensed under a Creative Commons Attribution 4.0 International License.
INTRODUCTION
Social media platforms, such as Twitter, have become a powerful force in shaping public opinion and influencing decision-making processes.
The vast amount of textual data generated on these platforms presents both opportunities and challenges for sentiment analysis.
Accurately determining the sentiment expressed in tweets can provide valuable insights for businesses, governments, and researchers.
However, the inherent complexity and nuances of natural language, coupled with the informal and often ambiguous nature of social media text, pose significant hurdles for traditional sentiment analysis techniques.
To overcome the challenges in sentiment analysis, researchers have explored various methods and algorithms, including deep learning models and ensemble techniques.
One promising approach is to use sophisticated language models such as Bidirectional Encoder Representations from Transformers (BERT) with Fine-Tuning.
Although BERT is a powerful model, it has drawbacks: it is limited in capturing the nuances and special characteristics of particular domains or tasks such as sentiment analysis, and the distribution of its pre-training data differs from that of sentiment analysis data.
By using relevant data, BERT can learn to understand the unique characteristics of language and context associated with sentiment, such as expressions of emotion, sarcasm, or opinion.
Fine-Tuning helps BERT recognize the aspects of language and context that affect overall sentiment, improving its ability to understand and analyze sentiment.
Through a gradual learning process using data relevant to sentiment analysis, BERT with Fine-Tuning can learn a wider and more diverse range of language and context patterns, improving its accuracy and overall performance on social media sentiment analysis compared with BERT without Fine-Tuning.
In several previous studies, various classification methods have been applied in real-world settings, and BERT is one of the frequently used algorithms.
BERT with Fine-Tuning was used by Muhammad Bilal to classify online customer reviews as useful or not useful. That study used BERT with Fine-Tuning to obtain a more general approach to predicting the usefulness of online customer reviews than traditional machine learning methods.
The main objective of this research is to conduct a comparative analysis of the non-optimized BERT model and the optimized BERT model with Fine-Tuning for sentiment analysis on Twitter data using text mining techniques.
By utilizing the power of text mining, this research aims to extract relevant features and patterns from textual data, thus enabling a comprehensive evaluation of both approaches.
The ultimate goal is to determine which method yields better performance based on a comparison of Precision, Recall, and Accuracy, so that conclusions can be drawn regarding the best algorithm for classifying the sentiment of tweets based on user-supplied keywords or phrases.
METHOD
The research stages using CRISP-DM include Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
Business Understanding identifies the problem, and Data Understanding covers data collection.
Data Preparation covers manual data elimination, preprocessing, and labeling.
Modeling covers data modeling, word weighting, and classification using the BERT model.
Evaluation tests the validity of the results using a confusion matrix.
Deployment ensures that the web application runs properly using the Python language and the Streamlit framework, and publishes it so that other users can access and use it.
Figure 1. CRISP-DM Diagram
Figure 1 shows the stages of CRISP-DM.
This research aims to classify positive, negative, and neutral sentiments from tweets using unoptimized BERT and BERT optimized through Fine-Tuning.

Business Understanding
The first stage identifies the existing challenges in Twitter sentiment analysis using text mining techniques.
The main challenge relates to the efficiency of data collection: the previous manual process was not only time-consuming but also prone to errors.
As a solution, this research proposes using the Node Package Manager (npm) tweet-harvest package, run through Python, which enables automated, real-time data collection. This significantly speeds up the process while increasing the volume of data that can be processed.
The second challenge is improving the accuracy of sentiment analysis, where traditional analysis is often unable to capture complex language nuances and specific sentence contexts.
To address this, this research uses the IndoBERT model, a language model trained specifically for the Indonesian language.
This model is designed to be more sensitive to language context and nuances, so it is expected to improve sentiment classification accuracy.
This research compares the performance of the standard BERT model with the BERT model optimized through Fine-Tuning to evaluate the effectiveness of this technique in improving model performance on Twitter sentiment analysis.
Data Understanding
The second stage is data collection.
The data comes from the Twitter social media platform.
It is collected by crawling tweets based on keywords entered by the user.
Users enter keywords relevant to the topic or subject they want to analyze, and tweets containing those keywords are collected for further processing.
In this research, the dataset used is obtained from crawling tweet data using the keyword biznet in Indonesian.
The selection of the keyword biznet is based on the consideration that the word refers to an Indonesian telecommunication company, so it is expected that people's sentiment towards the keyword tends to be neutral, not too positive or negative.
In addition, with Biznet as an Indonesian company, the tweet data obtained is expected to be in Indonesian, so it is in accordance with the qualifications of the IndoBERT model used in this research to perform sentiment analysis on Indonesian tweet data.
Crawling is a text mining technique to automatically collect text data from online resources such as social media, websites, and blogs based on certain keywords.
This process involves a script or computer program that searches for resources, identifies, and downloads relevant text data.
In this research, crawling is used to collect tweet data from Twitter by utilizing npm tweet-harvest.
This technique allows for efficient and automated collection of large amounts of text data as opposed to manual collection which is time and labor consuming.
Data Preparation
In the third stage, the collected Twitter data goes through a series of preprocessing steps before being classified by the non-optimized BERT model and by the BERT model optimized with Fine-Tuning.
Preprocessing is carried out because it can improve the accuracy of the BERT model.
In this study, preprocessing includes case folding, data cleaning, tokenization, normalization, data augmentation, and dataset splitting, using techniques such as Fine-Tuning Tokenizer.
Case Folding
This stage converts all characters to lowercase because the collected data is not always consistent in its use of capital letters.
Case folding is done using the lower() function available in Python. This helps the sentiment classification process by eliminating capitalization differences.
Data Cleaning
Data cleaning is an important step in classifying sentiment from social media data.
It removes noise and irrelevant information such as mentions, hashtags, and URLs from the text before further processing, which increases the accuracy of the classification model in determining sentiment.
In this study, data cleaning removes mentions (@username), hashtags (#), and irrelevant URLs to avoid bias or noise in the form of unneeded words, symbols, or characters.
When a model is trained on data with excessive noise, its prediction accuracy can decrease and it becomes inefficient at analyzing sentiment.
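The case folding and cleaning steps described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `clean_tweet` and the regular expressions are assumptions, and the sketch assumes that a hashtag is removed as a whole token rather than only its `#` symbol.

```python
import re

# Assumed patterns; the source only names the categories of noise to remove.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet, then strip URLs, mentions, and hashtags."""
    text = text.lower()                 # case folding
    text = URL_RE.sub(" ", text)        # remove URLs
    text = MENTION_RE.sub(" ", text)    # remove @username mentions
    text = HASHTAG_RE.sub(" ", text)    # remove hashtags (whole token, by assumption)
    return SPACE_RE.sub(" ", text).strip()  # collapse leftover whitespace


if __name__ == "__main__":
    print(clean_tweet("@Biznet WiFi LAMBAT sekali https://t.co/abc #biznet"))
```

URLs are removed before mentions and hashtags so that a `#` or `@` inside a link does not leave a fragment behind.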
Tokenization
Tokenization is the process of separating text into smaller semantic units (such as words or phrases) before classification.
It isolates the "tokens", or semantic components, that may affect sentiment classification.
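As a rough illustration of separating text into tokens, the sketch below splits a sentence into words and punctuation marks. It is a simplification: the function name `tokenize` and the regular expression are assumptions, and a BERT tokenizer splits text further into subword units rather than whole words.

```python
import re

# Words (runs of letters, digits, underscores) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("wifi biznet stabil atau tidak?"))
```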
Normalization
Normalization is an important process in sentiment analysis that makes the text more consistent and uniform.
Normalization steps include: replacing nonstandard or slang words with standard Indonesian forms (e.g., "gak" becomes "enggak"), correcting typos (e.g., "sech" becomes "sih"), removing unnecessary special characters (such as inappropriate punctuation marks, emojis, and other symbols), and changing all letters to lowercase to ensure consistency (so that "INDONESIA" and "Indonesia" are considered the same).
With normalization, the text data becomes cleaner and more structured, which improves the accuracy of the model in sentiment classification.
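The normalization steps can be sketched with a small lookup table. The two dictionary entries come from the examples in the text; the function name `normalize`, the dictionary name `SLANG`, and the character filter are assumptions, and a real slang dictionary would be far larger.

```python
import re

# Only the two mappings mentioned in the text; a real dictionary would be much larger.
SLANG = {"gak": "enggak", "sech": "sih"}


def normalize(text: str, slang: dict[str, str] = SLANG) -> str:
    """Lowercase, drop special characters and emojis, and replace slang or typos."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep only letters, digits, whitespace
    words = [slang.get(word, word) for word in text.split()]
    return " ".join(words)


if __name__ == "__main__":
    print(normalize("Biznet GAK stabil sech!!!"))
```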
Data Augmentation
Data augmentation is performed using the back translation technique, which translates text from Indonesian to English and then back to Indonesian.
The process includes converting the text to tensors, translating them to English, translating the English text back to Indonesian, ensuring the meaning does not change significantly, and merging the augmented data with the original dataset.
This technique increases data diversity while preserving context and meaning, helping the model learn new patterns to improve sentiment classification performance.
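The back translation flow can be sketched independently of any particular translation model. In the sketch below, `to_english` and `to_indonesian` are hypothetical callables standing in for whatever translation models are used, and the rule of skipping paraphrases that duplicate existing text is an assumption. The meaning-preservation check mentioned above is not implemented here.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]
Example = Tuple[str, str]  # (text, label)


def back_translate(text: str, to_english: Translator, to_indonesian: Translator) -> str:
    """Translate Indonesian text to English and back to Indonesian."""
    return to_indonesian(to_english(text))


def augment(dataset: List[Example], to_english: Translator,
            to_indonesian: Translator) -> List[Example]:
    """Return the original examples plus new back-translated paraphrases."""
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        paraphrase = back_translate(text, to_english, to_indonesian).strip()
        # Assumption: paraphrases identical to existing text add no diversity.
        if paraphrase and paraphrase not in seen:
            augmented.append((paraphrase, label))  # the label is carried over
            seen.add(paraphrase)
    return augmented


if __name__ == "__main__":
    # Toy lookup tables standing in for real translation models.
    id_en = {"wifi biznet lambat sekali": "biznet wifi is very slow"}
    en_id = {"biznet wifi is very slow": "wifi biznet sangat lambat"}
    data = [("wifi biznet lambat sekali", "negative")]
    print(augment(data, id_en.__getitem__, en_id.__getitem__))
```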
Split Dataset
The dataset obtained from crawling with the keyword biznet was divided into three subsets for further processing and analysis.
The first subset, 70% of the data, is the training subset, used to train both the BERT sentiment classification model before optimization and BERT with Fine-Tuning; the more training data available, the more patterns the model can learn, which improves classification accuracy.
The second subset, 20% of the data, is the validation subset, used to monitor model performance during training and prevent overfitting: the model is evaluated periodically during training so that adjustments can be made if performance decreases.
The third subset, 10% of the data, is the test subset, used to evaluate the final performance of the model after training is complete. It is kept separate from the training and validation data and is used to calculate evaluation metrics such as accuracy, precision, recall, and F1-score.
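A 70/20/10 split can be sketched as follows. Stratifying by label, so that each subset keeps roughly the same class proportions, is an assumption of this sketch and is not stated in the text; the function name `stratified_split` and the seed are likewise illustrative.

```python
import random
from collections import defaultdict


def stratified_split(examples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split (text, label) pairs into train/validation/test, label by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


if __name__ == "__main__":
    toy = [(f"tweet {i}", "negative") for i in range(50)]
    toy += [(f"tweet {i}", "neutral") for i in range(50, 80)]
    toy += [(f"tweet {i}", "positive") for i in range(80, 100)]
    train, val, test = stratified_split(toy)
    print(len(train), len(val), len(test))
```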
Modeling
The fourth stage, Modeling, is the core of data classification.
The classifiers used in this study are BERT without optimization and BERT optimized with Fine-Tuning.

BERT without Optimization
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that has achieved remarkable results in various NLP (Natural Language Processing) tasks.
To apply BERT to specific tasks such as sentiment analysis, a common approach is fine-tuning.
BERT without optimization refers to applying the pre-trained language model directly to the analysis task without any special adjustments.
This model is used as the baseline in this study, showing the basic performance of BERT under standard conditions, without a fine-tuning process that further adapts the model to specific contexts or data.
Nonetheless, the BERT model without optimization lacks the broader world knowledge or higher reasoning that can help in more complex tasks.
Consequently, a fine-tuning process is needed to prevent overfitting and to better capture the context of the data, yielding higher accuracy in sentiment analysis tasks.
BERT with Fine-Tuning
BERT with Fine-Tuning is a language model training technique that involves two stages.
First, the BERT model is initialized with weights that have been pre-trained on a large dataset.
This stage utilizes the general knowledge acquired by BERT during its initial training.
Next, training continues by adjusting those weights using data specific to the target task, such as sentiment analysis.
These adjustments allow BERT to more effectively capture the nuances of language and relevant context in the given data.
Figure 2. BERT Tokenizer

Fine-Tuning BERT involves the concept of Transfer Learning, where the general knowledge that BERT gains when trained on large data can be transferred and adapted for specific tasks with smaller amounts of data.
Although BERT models themselves lack the wider world knowledge or higher reasoning that can help in more complex tasks, the Fine-Tuning process helps prevent overfitting and allows BERT to better understand the context of the data and thus achieve higher accuracy.
In this research, the IndoBERT-base-p1 model based on BERT-base architecture is used.
Before the Fine-Tuning process, the dataset must be prepared by tokenizing the sentences using BertTokenizer, so that the input received by BERT matches the expected format.
BertTokenizer converts the input text into a sequence of tokens with a maximum length of 512 tokens.
During the fine-tuning process, the tokens are represented as embedding vectors through several embedding layers, namely token embedding, segment embedding, and position embedding.
These embedding layers are used to incorporate contextual and positional information of the tokens.
The embedding vectors of the tokens are then fed into the BERT encoder stack which consists of 12 identical layers.
Each encoder layer uses a self-attention mechanism to extract contextual relationships between tokens in the sentence.
The output of the BERT encoder stack is a vector representation for each token.
The output of interest is that of the [CLS] token, which represents the entire sentence. This [CLS] vector is then used as input to the classifier, which generates a logit (a rough probability prediction) for each sentiment class (positive, negative, or neutral).
Softmax is then used to convert the logits into probabilities with values between 0 and 1.
To optimize BERT's performance on a specific task, hyperparameters such as batch size, number of epochs, and learning rate can be adjusted.
In this study, the hyperparameters are a maximum encoded sequence length of 512, batch size 32, 8 workers, learning rate 5e-5 with AdamW, and 5 epochs.
With Fine-Tuning, BERT can utilize the general knowledge it has previously acquired and customize it for specific tasks such as sentiment analysis, thus achieving optimal performance on the task.
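The softmax step that turns the classifier's logits into class probabilities can be sketched in a few lines. The label order in `LABELS`, the function names, and the example logits are assumptions for illustration and do not come from the trained model.

```python
import math

# Assumed label order; the actual index-to-label mapping depends on the training setup.
LABELS = ("positive", "neutral", "negative")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels=LABELS):
    """Return the most probable label together with all class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


if __name__ == "__main__":
    label, probs = predict([0.2, 0.1, 2.5])  # made-up logits
    print(label, [round(p, 3) for p in probs])
```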
Evaluation
The fifth stage evaluates the models that have been implemented.
Evaluation is done by testing.
In this research, the dataset is divided into three subsets (training, validation, and test), and the performance of BERT before and after optimization with Fine-Tuning is evaluated using metrics derived from the confusion matrix, such as accuracy, precision, recall, and F1-score.
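As a minimal sketch (not the authors' code), accuracy and per-class precision, recall, and F1-score can be derived from a three-class confusion matrix as follows. The function name `per_class_metrics`, the label order, and the toy counts are assumptions; rows are taken to be the predicted sentiment and columns the true sentiment, and values are returned as fractions rather than percentages.

```python
LABELS = ("positive", "neutral", "negative")


def per_class_metrics(cm, labels=LABELS):
    """cm[i][j] counts tweets predicted as labels[i] whose true label is labels[j]."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[i])              # row sum: everything predicted as this class
        actual = sum(row[i] for row in cm)  # column sum: everything truly in this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


if __name__ == "__main__":
    toy_cm = [[8, 1, 1], [1, 7, 2], [1, 2, 7]]  # made-up counts, not results from the paper
    accuracy, report = per_class_metrics(toy_cm)
    print(round(accuracy, 3), report["positive"])
```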
Table 1. Confusion matrix

Predicted sentiment   True Positive   True Neutral   True Negative
Positive
Neutral               FNt             TNt            FNt
Negative

Precision is the estimated proportion of predicted positive cases that are correct, formulated as:

$$\text{Precision} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}} \times 100\%$$

Recall is the estimated proportion of actual positive cases that are correctly identified, formulated as:

$$\text{Recall} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FNt} + \mathit{FN}} \times 100\%$$
F1-score combines precision and recall, balancing the trade-off between these metrics and summarizing the model's ability to correctly identify positive cases while minimizing false predictions. It is formulated as:

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\%$$

Accuracy is the proportion of correct predictions among all predictions, formulated as:

$$\text{Accuracy} = \frac{\mathit{TP} + \mathit{TNt} + \mathit{TN}}{\mathit{TP} + \mathit{FP} + \mathit{TNt} + \mathit{FNt} + \mathit{TN} + \mathit{FN}} \times 100\%$$

Where:
TP: True Positive
TNt: True Neutral
TN: True Negative
FP: False Positive
FNt: False Neutral
FN: False Negative

Deployment
The sixth stage, Deployment, puts the results of this research into practice.
In this stage, the crawling, preprocessing, and the best model from the training and evaluation process will be implemented on the Streamlit web application.
This application allows users to enter certain keywords or phrases and get sentiment predictions from the processed Twitter data, whether positive, negative, or neutral.
With the Streamlit web application, the results of this research can be widely accessed and utilized by users or other parties who need sentiment analysis from Twitter data.
RESULT
This section discusses the results of the research, covering data understanding, data preparation, modeling, evaluation, and deployment.

Crawling Result
The crawling process collects tweet data from Twitter using the keyword biznet.
Crawling with the npm tweet-harvest package allows automatic, real-time data collection.
The crawl, shown in Figure 3, produced a dataset of 3,051 tweets, which is used for training and evaluating the sentiment analysis model.
Figure 3. Crawling Result
Of the 3,051 tweets crawled using the keyword biznet, 660 are positive (21.6%), 949 are neutral (31.1%), and 1,442 are negative (47.3%).
Case Folding Result
After obtaining the dataset from the crawling results, the first preprocessing stage performed is case folding.
By performing case folding, the entire text is converted into lowercase letters as can be seen in Figure 4, thus eliminating the difference in the use of uppercase and lowercase letters.
This helps improve data consistency and reduce redundancy in the tokenization and sentiment modeling process.
Figure 4. Case Folding Result

Data Cleaning Result
The data cleaning stage is the second preprocessing stage, performed after case folding.
The data cleaning process that can be seen in Figure 5 aims to clean the noise so that the accuracy of the model in sentiment analysis can be improved.
In addition, the removal of mentions, hashtags, and URLs can also help maximize the next tokenization process and text normalization.
Figure 5. Data Cleaning Result

Tokenization Result
The next preprocessing stage is tokenization.
Figure 6 shows that the text has been separated into tokens.
This separation of text into tokens allows the model to more easily recognize language patterns.
Figure 6. Tokenization Result

Normalization Result
Figure 7 shows the result of the text normalization process.
By performing normalization, the text becomes cleaner, structured, and uniform.
This helps machine learning models to more easily recognize patterns and extract relevant features from the text, thus improving accuracy in tasks such as sentiment analysis.
Figure 7. Normalization Result

Data Augmentation Result
After augmentation with back translation, the initial dataset of 3,051 tweets grew to 6,016, as can be seen in Figure 8.
The 2,965 added examples make up about 49.3% of the total. This additional data aims to increase data diversity and help the model learn new patterns, so it is expected to improve sentiment classification performance.
Figure 8. Data Augmentation Result
Split Dataset Result
The dataset of 6,016 examples was divided into three subsets: a train subset of 4,211 examples (70%), a validation subset of 1,209 examples (20%), and a test subset of 596 examples (10%).
The details of the split are shown in Table 2.

Table 2. Split Dataset Result (columns: Train, Validation, Test)

In the train subset of 4,211 examples, 911 are positive (21.6%), 1,991 are negative (47.3%), and 1,309 are neutral (31.1%).
In the validation subset of 1,209 examples, 262 are positive (21.7%), 571 are negative (47.2%), and 376 are neutral (31.1%).
Finally, in the test subset of 596 examples, 129 are positive (21.6%), 282 are negative (47.3%), and 185 are neutral (31%).
Overall, the sentiment distribution is nearly identical across the three subsets.
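The reported counts can be checked with a short calculation. The sketch below uses only the per-class counts given in the text; the variable and function names are illustrative.

```python
# Per-class counts for each subset, as reported in the text.
splits = {
    "train": {"positive": 911, "negative": 1991, "neutral": 1309},
    "validation": {"positive": 262, "negative": 571, "neutral": 376},
    "test": {"positive": 129, "negative": 282, "neutral": 185},
}


def percentages(counts):
    """Share of each class within one subset, in percent with one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


subset_sizes = {name: sum(counts.values()) for name, counts in splits.items()}
grand_total = sum(subset_sizes.values())
subset_share = {name: round(100 * n / grand_total, 1) for name, n in subset_sizes.items()}

if __name__ == "__main__":
    print(subset_sizes, grand_total)
    print(subset_share)
    for name, counts in splits.items():
        print(name, percentages(counts))
```

The subset sizes add up to the 6,016 examples obtained after augmentation, and the class shares differ by at most about a tenth of a percentage point between subsets.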
Result Analysis
After preprocessing the data, configuring the model is essential to achieve optimal performance.
This research investigates two configurations: BERT without optimization and BERT with Fine-Tuning.
The results of each configuration will be analyzed using metrics such as accuracy, precision, recall and F1-score.
By comparing performance metrics across configurations, this research aims to identify the model that achieves the most robust and accurate results in twitter sentiment analysis.
BERT without Optimization Tests were conducted on the BERT model without optimization using example tweets outside the dataset, as follows:
Text: "wifi biznet lambat sekali" ("Biznet wifi is very slow"). Label: neutral.
Text: "wifi biznet stabil atau tidak?" ("is Biznet wifi stable or not?"). Label: neutral.
Text: "wifi biznet cepat dan lancar" ("Biznet wifi is fast and smooth"). Label: neutral.
From the three example sentences, it can be seen that the BERT model without optimization tends to predict all sentences as neutral sentiment, even though the first sentence is actually negative and the third sentence is positive.
Table 3. BERT without Optimization Confusion Matrix (rows: predicted sentiment; columns: true sentiment; classes: Positive, Neutral, Negative)

Table 3 shows the model performance on the validation data subset.
The classification report derived from this confusion matrix, given in Table 4, shows an accuracy of only 0.32 (32%).
The precision, recall, and F1-Score values for each sentiment class are also relatively low.
Table 4. Classification Report on BERT without Optimization (rows: Negative, Neutral, Positive, Accuracy, Macro Avg, Weighted Avg; columns: Precision, Recall, F1-Score, Support)

These results show that the BERT model without optimization is less able to recognize sentiment patterns in tweet data.
This is because the BERT model used is still general and has not been adapted to the characteristics of Twitter sentiment data.
Therefore, optimization with Fine-Tuning is needed to improve the performance of the model in sentiment analysis on Twitter data.
Fine-Tuning helps the BERT model adjust its weights to the characteristics of Twitter sentiment data, so that the model can better recognize language and context patterns associated with sentiment.

BERT with Fine-Tuning
Fine-Tuning Process
In this stage, the weights of the BERT model pre-trained on a large dataset are adjusted using a more specific Twitter sentiment dataset.
This adjustment is done over several additional training epochs, as can be seen in Table 5, by further learning the language and context patterns associated with sentiment analysis on Twitter data.

Table 5. Epoch Process on BERT with Fine-Tuning (rows: Train and Validation for Epochs 1 to 5; columns: Accuracy, Precision, Recall, F1-Score)

Figure 9. Learning Curve
Figure 9 shows the accuracy graph of the BERT model on training and validation data during the fine-tuning process for 5 epochs.
At epoch 1, the accuracy is still low.
However, the accuracy continues to increase significantly in subsequent epochs, both for train and validation data.
This shows that the Fine-Tuning process successfully adjusted the model weights to better learn sentiment analysis patterns on Twitter data.
Tests were also conducted on the BERT model with Fine-Tuning using example tweets outside the dataset, as follows:
Text: "wifi biznet lambat sekali". Label: negative.
Text: "wifi biznet stabil atau tidak?". Label: neutral.
Text: "wifi biznet cepat dan lancar". Label: positive.
The test results show that the BERT model with Fine-Tuning is able to classify sentiment well according to the context of the sentence.
In the sentence "wifi biznet lambat sekali", the model classifies the sentiment as negative (poor service).
On "wifi biznet cepat dan lancar", the model classifies it as positive (good service).
This success is because the Fine-Tuning process helps the model to efficiently learn sentiment-related language and context patterns, resulting in more accurate sentiment classification performance on Twitter data.
Table 6. BERT with Fine-Tuning Confusion Matrix Validation Set (rows: predicted sentiment; columns: true sentiment; classes: Positive, Neutral, Negative)

Table 7. Classification Report on BERT with Fine-Tuning Validation Set (rows: Negative, Neutral, Positive, Accuracy, Macro Avg, Weighted Avg; columns: Precision, Recall, F1-Score, Support)

Results on Validation Data
Table 6 shows the performance of the model on the validation data subset after the Fine-Tuning process.
Based on Table 7, the model accuracy reaches 0.89 (89%).
The precision, recall, and F1-Score values for each sentiment class are also very good, all at or above 0.81.

Table 8. BERT with Fine-Tuning Confusion Matrix Test Set (rows: predicted sentiment; columns: true sentiment; classes: Positive, Neutral, Negative)
Table 9. Classification Report on BERT with Fine-Tuning Test Set (rows: Negative, Neutral, Positive, Accuracy, Macro Avg, Weighted Avg; columns: Precision, Recall, F1-Score, Support)

Results on Test Data
Table 8 and Table 9 show the model performance on the separate test data subset.
Accuracy increased to 0.91 (91%), compared with 0.89 on the validation data.
The precision, recall, and F1-Score values for each class are also still very good.
The above results show that by performing Fine-Tuning, the performance of the BERT model in performing sentiment analysis on Twitter data becomes much better than BERT without optimization.
Fine-Tuning helps the model adjust its weights to the characteristics of Twitter sentiment data, so that it can better recognize language and context patterns associated with sentiment expressions.
This can be seen from the significant increase in accuracy, precision, recall, and F1-Score after Fine-Tuning.
Deployment Result
At this stage, the developed system is deployed on Streamlit Community Cloud, a free hosting service provided by Streamlit.
Users can access and utilize the system through the following URL: https://soeara-sentweet.
The deployment process involves integrating the core components of the research, including the data crawling module, preprocessing techniques, and the optimized BERT model with Fine-Tuning.
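As an illustration of the preprocessing component, the sketch below applies case folding, cleaning, tokenization, and normalization to a single tweet. It is a simplified sketch under stated assumptions, not the module deployed in the application: the function name `preprocess`, the regular expressions, the three-entry `SLANG` dictionary, and the sample tweet are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical slang dictionary; a real normalization list would be far larger.
SLANG: Dict[str, str] = {"gk": "tidak", "bgt": "banget", "yg": "yang"}


def preprocess(tweet: str, slang: Dict[str, str] = SLANG) -> List[str]:
    """Case-fold, clean, tokenize, and normalize one tweet."""
    text = tweet.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)   # cleaning: remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # cleaning: remove mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # cleaning: remove digits, punctuation, emoji
    tokens = text.split()                       # tokenization on whitespace
    return [slang.get(tok, tok) for tok in tokens]  # normalization of slang tokens


# Made-up example tweet, for illustration only.
sample = "@BiznetHome Internet gk stabil bgt!! https://t.co/abc #biznet"
tokens = preprocess(sample)
print(tokens)
```

URLs are removed before the punctuation filter so that fragments of a link do not survive as stray tokens.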
DISCUSSIONS
This study compared the BERT model without optimization against the BERT model with Fine-Tuning optimization for sentiment analysis on Twitter data using text mining techniques.
The results show that the BERT model with Fine-Tuning optimization performs much better than the BERT model without optimization.
Without optimization, the model achieves an accuracy of only 0.32, or 32%, on the validation data.
The precision, recall, and F1-Score values for each sentiment class are also relatively low.
This shows that the BERT model without optimization does not recognize sentiment patterns in tweet data well.
The reason is that the model is still general and has not been adapted to the characteristics of Twitter sentiment data.
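One way to put the 0.32 figure in context is to compare it with random guessing. With three sentiment classes, a classifier that guesses uniformly at random is expected to score about 1/3 on a balanced set. The simulation below is a rough sketch of that baseline. It assumes balanced classes and uniform guessing, neither of which is stated in the study, and the function name `chance_accuracy` is hypothetical.

```python
import random


def chance_accuracy(n_samples: int, n_classes: int, seed: int = 0) -> float:
    """Accuracy of uniform random guesses against uniformly drawn labels."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_classes) for _ in range(n_samples)]
    guesses = [rng.randrange(n_classes) for _ in range(n_samples)]
    hits = sum(1 for y, g in zip(labels, guesses) if y == g)
    return hits / n_samples


# Assumption: three balanced classes (negative, neutral, positive).
baseline = chance_accuracy(n_samples=30_000, n_classes=3)
print(f"simulated chance accuracy: {baseline:.3f}")
```

Under these assumptions, an accuracy near 1/3 would suggest that the non-optimized model has learned little about the task.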
In contrast, after optimization with Fine-Tuning, the performance of the BERT model improved significantly.
On the validation data, the accuracy reached 0.89, or 89%, with precision, recall, and F1-Score values for each sentiment class in the range of 0.81 to 0.
On the test split, the accuracy increased further to 0.91, or 91%, and the other evaluation metrics were also high.
This improvement indicates that the Fine-Tuning process successfully adapted the BERT model weights to the characteristics of Twitter sentiment data, so that the model can better recognize the language and context patterns associated with sentiment expressions.
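The weight adaptation described above could be carried out with the Hugging Face `transformers` library, as in the sketch below. This is a minimal sketch under stated assumptions, not the training script used in the study. The Hub identifier `indobenchmark/indobert-base-p1`, the label mapping, the helper names `TweetDataset` and `fine_tune`, and all hyperparameters (3 epochs, batch size 16, learning rate 2e-5, maximum length 128) are illustrative assumptions.

```python
from typing import Dict, List

# Assumption: Hugging Face Hub identifier for the IndoBERT-base-p1 checkpoint.
MODEL_NAME = "indobenchmark/indobert-base-p1"
# Assumption: integer encoding of the three sentiment classes.
LABEL2ID: Dict[str, int] = {"negative": 0, "neutral": 1, "positive": 2}


class TweetDataset:
    """Map-style dataset pairing tokenizer output with integer labels."""

    def __init__(self, encodings: Dict[str, List[List[int]]], labels: List[int]):
        self.encodings = encodings
        self.labels = labels

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        item: Dict[str, object] = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item


def fine_tune(train_texts: List[str], train_labels: List[str],
              val_texts: List[str], val_labels: List[str],
              output_dir: str = "indobert-sentiment") -> Dict[str, float]:
    """Fine-tune the pretrained encoder plus a new 3-class head; return eval metrics."""
    # Imported here so the helpers above can be used without the heavy dependencies.
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=len(LABEL2ID))

    def build(texts: List[str], labels: List[str]) -> TweetDataset:
        # Pad to a fixed length so the default collator can stack examples.
        enc = tokenizer(texts, truncation=True, padding="max_length", max_length=128)
        return TweetDataset(dict(enc), [LABEL2ID[name] for name in labels])

    # Illustrative hyperparameters; not taken from the study.
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    trainer = Trainer(model=model, args=args,
                      train_dataset=build(train_texts, train_labels),
                      eval_dataset=build(val_texts, val_labels))
    trainer.train()            # updates all weights, encoder and classification head
    return trainer.evaluate()  # reports the loss on the validation set
```

Calling `fine_tune` downloads the checkpoint and requires `transformers` and PyTorch to be installed.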
The results of this study are in line with several previous studies that also show the superiority of BERT models with Fine-Tuning optimization in sentiment analysis tasks.
For example, research conducted by Muhammad Bilal used BERT with Fine-Tuning for online customer review classification and obtained a more generalized approach to predicting review usability than traditional machine learning methods.
However, this study has its own advantages because it focuses on Twitter data in Indonesian and uses the IndoBERT model specifically trained for Indonesian.
In addition, this research also applies several preprocessing techniques such as data augmentation with back translation to improve the performance of the model.
Overall, the results of this study make an important contribution to the development of sentiment analysis methods on Twitter data, especially for the Indonesian language.
With the advantages of the BERT model optimized through Fine-Tuning, sentiment analysis can be performed more accurately and reliably, thus providing valuable insights for various parties that utilize Twitter data.
CONCLUSION
This research has conducted a comparative analysis between the BERT model without optimization and the BERT model with Fine-Tuning optimization for sentiment analysis on Twitter data using text mining techniques.
The analysis results show that the BERT model with Fine-Tuning optimization is superior in performance compared to the BERT model without optimization.
On the validation data, the accuracy, precision, recall, and F1-score of the BERT model without optimization are relatively low for each sentiment class.
After optimization with Fine-Tuning, these metrics improve significantly.
On the test data, the Fine-Tuning optimized BERT model scores higher still, with accuracy rising to 0.91.
This indicates that the Fine-Tuning process successfully adapted the BERT model to the characteristics of Twitter sentiment data, resulting in better recognition of language and context patterns related to sentiment expressions.
Fine-Tuning facilitates the transfer of BERT model learning from previously acquired general knowledge to the specific task of Twitter sentiment analysis.
Through additional training stages, the model is able to understand the nuances and context associated with sentiment expressions, resulting in more accurate sentiment classification.
Overall, this research confirms the superiority of the Fine-Tuning optimized BERT model for sentiment analysis on Twitter data, particularly in the Indonesian language, with the ability to provide valuable insights for Twitter data users.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest between the authors or with the research object in this paper.
REFERENCES