Informatics, Electrical and Electronics Engineering (Infotron), Volume 5, Number 1, ISSN 2798-0197
Sinta 4 Accreditation No. SK. 177/E/KPT/2024
http://riset.id/index.php/infotron/article/view/23242
DOI: 10.33474/infotron

Leveraging BiLSTM for Deep Learning-Based Mental Health Chatbots

Nur Afnis Agustina (a), Abd. Charis Fauzan (b)*, Harliana (c)
Department of Computer Science, Universitas Nahdlatul Ulama Blitar, Kota Blitar, Indonesia
(a) nurafnisagustina30@gmail.com, (b) abdcharis@unublitar..., (c) harliana@unublitar...
*Corresponding author

Article History: Received 31 January 2025; Reviewed 21 February 2025; Revised 21 March 2025; Accepted 30 May 2025
License: CC BY-SA

ABSTRACT
The high prevalence of mental health issues and limited access to professional information and support have driven the search for innovative solutions. One promising approach is the development of chatbot systems that provide quick and accessible mental health information. This study evaluates the performance of the Bidirectional Long Short-Term Memory (BiLSTM) algorithm in identifying and classifying user inputs within a mental health chatbot system. BiLSTM is chosen for its ability to process sequential data in both directions, allowing it to capture context more effectively than unidirectional models and better understand user intent. Deep learning methods such as BiLSTM have also demonstrated higher accuracy than traditional machine learning models. This study focuses solely on BiLSTM to evaluate its performance in this context. The mental health dataset used in this study was sourced from previous research published on the GitHub platform and contains 100 classes of mental health-related questions and statements. This dataset was used to train the BiLSTM model to recognize user intent and generate relevant responses. The model achieved 98% accuracy on the training data. For evaluation on the test set, a confusion matrix was used, yielding an accuracy of 82%. The chatbot is implemented as a web-based application using a Python framework and is designed to provide users with insights and knowledge through text-based interaction. These results highlight the potential of the BiLSTM-based chatbot system to deliver effective and efficient mental health information services.

Keywords: Chatbot, Mental Health, BiLSTM, Deep Learning, Classification

Introduction
Mental health refers to an individual's psychological state, reflecting their ability to adapt and solve problems, both internally (within oneself) and externally (in their environment). Just like physical health, mental health plays a crucial role in maintaining a person's quality of life. It encompasses emotional, psychological, and social aspects that influence how individuals interact with their surroundings. Despite its significant impact, mental health often receives insufficient attention. In many developing countries, mental health concerns are overshadowed by issues related to infectious diseases. The World Health Organization (WHO) reports that over 450 million people worldwide suffer from mental health disorders, with cases increasing each year. This issue is exacerbated by low awareness of available services, limited mental health literacy, high social stigma, a shortage of trained professionals, and insufficient specialized facilities. These challenges contribute to delays in diagnosis and treatment. Therefore, preventive measures are essential to help individuals recognize early symptoms, reduce social stigma, prevent severe disorders, and improve access to early treatment.
According to prior research, increasing knowledge and awareness is a key strategy in preventing mental disorders and strengthening self-protection against potential mental health triggers. Although mental health disorders affect over 450 million people globally, a major concern lies in the limited access to accurate information and adequate care services. Social stigma often discourages individuals from seeking help, even when experiencing symptoms. Additionally, low mental health literacy makes it difficult for many to recognize early warning signs, leading to delayed diagnosis and worsening conditions. The shortage of mental health professionals and specialized facilities further exacerbates the issue, especially in underserved regions. These challenges highlight the urgent need for accessible and preventative solutions that can educate the public, promote early symptom recognition, and reduce stigma. In this context, chatbot-based digital interventions offer a promising alternative by providing fast, reliable, and 24/7 access to mental health information.

With advancements in technology, one innovative approach to improving mental health awareness is through digital information services. A promising solution is the development of chatbots that provide users with reliable answers to mental health-related queries. A chatbot is a computer program designed to simulate conversations with humans via text or voice interactions. Research findings indicate that chatbots are highly efficient, cost-effective, time-saving, and available 24/7 for information retrieval. In the context of mental health, several studies have shown that chatbots can serve as effective tools for improving mental health literacy. AI-based chatbots can help reduce mental health stigma by facilitating social contact and encouraging users to open up about their experiences. Chatbots also have the potential to provide accessible and scalable mental health interventions, helping users manage conditions such as anxiety and depression. Furthermore, they can provide valuable support, especially in underserved populations, by offering timely access to information and promoting mental well-being. However, a major challenge in chatbot development is ensuring that the chatbot can understand and respond to conversations naturally and contextually, mimicking human-like communication.

To address this challenge, this study integrates artificial intelligence (AI) techniques, particularly deep learning, which has proven effective in solving complex problems. Deep learning, a subfield of machine learning, utilizes multi-layered neural networks to process and understand complex data. Several deep learning algorithms can be applied to chatbot systems, one of which is Long Short-Term Memory (LSTM). LSTM has been shown to effectively process long and sequential data, making it well suited for chatbot applications that require contextual understanding. However, LSTM has limitations, as it processes text in only one direction. To overcome this, researchers developed Bidirectional Long Short-Term Memory (BiLSTM), which allows text to be processed in both forward and backward directions, enhancing the chatbot's comprehension ability. A previous study utilized a dataset containing fictional dialogues from movie scripts and applied LSTM and BiLSTM models.
The results demonstrated that the BiLSTM model outperformed the standard LSTM model in terms of accuracy. Previous studies have also implemented LSTM for mental health chatbots but faced challenges such as overfitting due to limited data. However, there is still limited research on the application of BiLSTM for intent classification in mental health domains. This study aims to address these gaps by using BiLSTM with a larger and more varied dataset to enhance performance and contextual understanding. Based on the discussion above, this study employs the Bidirectional Long Short-Term Memory (BiLSTM) algorithm to develop an advanced chatbot system for mental health using text-based conversational data. By leveraging deep learning, this chatbot aims to provide more contextually aware and accurate responses, ultimately improving access to mental health information. This study utilizes a larger and more diverse dataset, applies hyperparameter tuning, and incorporates comprehensive evaluation metrics (precision, recall, F1-score) to ensure optimal performance and generalization.

Method

Fig. 1. Research Method

Based on Fig. 1, this study begins with data collection from public sources, consisting of conversation data related to mental health. Next, the collected dataset undergoes text preprocessing to clean and prepare raw text data for processing by machine learning algorithms. This preprocessing involves case folding, removing punctuation, tokenizing, padding, and label encoding. Following this, the modeling stage is carried out using the BiLSTM architecture. Once the BiLSTM model is built, training is conducted to optimize the model's weights. After training, the evaluation phase is performed using a confusion matrix, and the model is assessed based on accuracy, precision, recall, and F1-score. Finally, the deployment phase integrates the trained model into an application that can be accessed by users.

1. Data Collection
The dataset used in this research consists of conversations related to mental health. The dataset is sourced from previous studies available on the GitHub.com platform and was selected based on its relevance to the topic of mental health issues. It is stored in JSON format, which includes three main elements: tags or labels used to categorize information, patterns that contain variations of sentences users might use to express a specific intent, and responses, which are the chatbot's replies to user inquiries. An example of the dataset is shown in Table 1.

Table 1. Example Dataset
Tag | Patterns | Responses
Greeting | ... | "Hai! Mindcare disini, ada yang bisa dibantu?"
Definisi | "Apa yang dimaksud gangguan mental ..."; "Apa gangguan mental?" | "Kesehatan mental itu kondisi yang ..."
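To make this structure concrete, the sketch below shows how a single intent entry with the three elements described above (tag, patterns, responses) might look when handled in Python. The entry itself is hypothetical and only mirrors the format of the dataset; the "intents" key used for the full file is likewise an assumption based on the merging step described later.

```python
import json

# Hypothetical intent entry mirroring the JSON structure described above
# (tag, patterns, responses); the response wording is illustrative, not an
# actual record from the dataset.
example_intent = {
    "tag": "definisi",
    "patterns": [
        "Apa yang dimaksud gangguan mental?",
        "Apa gangguan mental?",
    ],
    "responses": [
        "Gangguan mental adalah kondisi yang memengaruhi pikiran, perasaan, dan perilaku.",
    ],
}

# The full dataset is a JSON file holding a list of such entries, e.g.
# {"intents": [ ... ]}, which can be loaded as a Python dictionary:
# with open("intents.json", "r", encoding="utf-8") as f:
#     data = json.load(f)    # data["intents"] -> list of intent entries (assumed key)
```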
2. Data Preprocessing
Data preprocessing is carried out to prepare the data. This stage begins with data merging, as the data used is obtained from multiple sources; the merging is done by combining the list of intents from each file. Next, data sorting is performed manually to eliminate any identified duplicate entries. The data, initially in JSON format, is then converted into a DataFrame for further text preprocessing, which includes:
- Case folding: converting and standardizing all text to lowercase so that differences in uppercase and lowercase letters do not affect the model. Example: "Saya SEDIH" → "saya sedih".
- Punctuation removal: removing punctuation marks such as periods (.), commas (,), question marks (?), exclamation marks (!), and other symbols that carry no meaningful value. Example: "saya sedih hari ini!!" → "saya sedih hari ini".
- Tokenizing: breaking text into smaller units called tokens. Example: "saya sedih" → ["saya", "sedih"].
- Padding: adding values (usually zeros) to each sequence of numbers so that all sequences have the same length. Example (if max length = 5): [12, 7] → [12, 7, 0, 0, 0].
- Label encoding: converting text-based labels into numerical form. Example: "Stres" → 0, "Depresi" → 1, "Cemas" → 2.

Once data preprocessing is complete, the dataset is split into training data and test data. The training data is used to teach the model to recognize various patterns, while the test data is used to objectively evaluate the model's performance.

3. Modeling
At this stage, the BiLSTM architecture is used to build the model. Bidirectional Long Short-Term Memory (BiLSTM) is an advanced version of the LSTM algorithm. The LSTM algorithm consists of three gates: the forget gate, input gate, and output gate, which control the balance between the input stored in the memory cell and the previous state that needs to be discarded. Equations (1), (2), and (3) are applied to the gates in the LSTM architecture:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)    (1)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)    (2)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)    (3)

where x_t is the input at time step t, h_{t-1} is the previous hidden state, W, U, and b are the weights and biases of each gate, and σ is the sigmoid activation function.

BiLSTM combines two LSTM networks that operate in opposite directions. One network processes the sequence forward (forward layer), while the other processes it backward (backward layer). The output of BiLSTM is a hidden representation that integrates contextual information from both processing directions. In the BiLSTM architecture, the forward LSTM can be expressed as equation (4), the backward LSTM as equation (5), and the final output of BiLSTM, a combination of the forward and backward outputs, as equation (6):

h_t^{fwd} = LSTM(x_t, h_{t-1}^{fwd})    (4)
h_t^{bwd} = LSTM(x_t, h_{t+1}^{bwd})    (5)
h_t = [h_t^{fwd}; h_t^{bwd}]    (6)

In this research, the BiLSTM model consists of five layers as follows:
- Input layer: serves as the starting point of the network. It receives an integer vector with a length determined by a predefined limit. This vector is obtained from the data preprocessing stage carried out earlier.
- Embedding layer: performs word embedding, which represents words or tokens as dense, high-dimensional numerical vectors. These embedding vectors store semantic relationships between words, allowing the model to understand meaning and contextual connections in text.
- BiLSTM layer: executes the Bidirectional Long Short-Term Memory (BiLSTM) method, which consists of both forward and backward LSTM units. This enables the model to capture context from both past and future words, improving text comprehension.
- Dropout layer: helps prevent overfitting by randomly deactivating a certain number of neurons, setting them to zero. This ensures the model does not become overly reliant on specific neurons.
- Dense layer: serves as the output layer, utilizing the softmax activation function. This function assigns a probability distribution across the class labels, indicating the likelihood that an input belongs to each class. The model then selects the class with the highest probability as the final prediction, representing the most likely class for the input. Based on this highest-probability class, the chatbot system retrieves the most suitable response to answer the user's query.

4. Model Training
After the BiLSTM model is built, the next step is training the model. The main objective of training is to optimize the model's weights so that it can learn patterns from the training data. Before training, the model is configured by specifying several key parameters: the loss function (such as categorical cross-entropy), which calculates the error between the model's predictions and the actual labels; the optimizer (such as Adam), which updates the model's weights; the learning rate, which controls how much the weights are updated at each training step; the batch size, which determines how many data samples are processed before the weights are updated; the number of epochs, which defines how many times the model passes through the entire training dataset; and the evaluation metric (such as accuracy), which assesses the model's performance during and after training.

5. Model Evaluation
The evaluation is conducted using a confusion matrix, as shown in Fig. 2. The test data used is separate from the training data to ensure that the model can generate accurate predictions on previously unseen data. The primary objective of the confusion matrix evaluation is to assess the model's performance in classifying the test data by comparing the model's predictions against the actual labels. The confusion matrix provides the number of correct and incorrect predictions for each class. From this matrix, evaluation metrics such as accuracy, precision, recall, and F1-score can be calculated as percentages using the formulas in equations (7), (8), (9), and (10).

Fig. 2. Confusion Matrix

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (7)
Precision = TP / (TP + FP)    (8)
Recall = TP / (TP + FN)    (9)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (10)

where True Positive (TP) refers to positive data that is correctly predicted, True Negative (TN) refers to negative data that is correctly predicted, False Positive (FP) refers to negative data that is incorrectly predicted as positive, and False Negative (FN) refers to positive data that is incorrectly predicted as negative.

6. Deployment
The trained and evaluated BiLSTM model is saved as the foundation for the chatbot system. This stored model is integrated into an interactive web application using the Streamlit framework, which was chosen for its ease of development and deployment of Python-based web applications. As a result, the chatbot can be easily accessed and used via a website, from anywhere and at any time.
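A minimal sketch of this deployment step is shown below. It assumes the trained model, the fitted tokenizer, and a per-class list of responses have been saved to files named chatbot_model.h5, tokenizer.pkl, and responses.pkl; these file names and the response lookup are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal Streamlit sketch of the deployment step (assumed artifact names).
import pickle
import numpy as np
import streamlit as st
from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 15  # same sequence length used during training

@st.cache_resource
def load_artifacts():
    model = keras.models.load_model("chatbot_model.h5")       # saved BiLSTM model (assumed name)
    with open("tokenizer.pkl", "rb") as f:
        tokenizer = pickle.load(f)                             # fitted Keras tokenizer
    with open("responses.pkl", "rb") as f:
        responses = pickle.load(f)                             # list: class index -> response options
    return model, tokenizer, responses

model, tokenizer, responses = load_artifacts()

st.title("MindCare Mental Health Chatbot")
user_input = st.text_input("Tulis pertanyaan Anda:")           # text input area for the user

if user_input:
    # Preprocess the input the same way as during training, predict the intent
    # class, and display one of the stored responses for that class.
    seq = tokenizer.texts_to_sequences([user_input.lower()])
    padded = pad_sequences(seq, maxlen=MAX_LEN, padding="post")
    probs = model.predict(padded)[0]
    tag_index = int(np.argmax(probs))                          # class with the highest probability
    st.write(np.random.choice(responses[tag_index]))
```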
Results and Discussion

1. Data Collection
The dataset used in this study consists of 100 tags, 1,508 patterns, and 172 responses, all stored in JSON format. The total number of dataset entries was obtained after performing manual merging and sorting of the data. A sample of the JSON dataset is shown in Fig. 3.

Fig. 3. JSON Dataset

Next, the dataset, originally in JSON format, is converted into a DataFrame to ensure compatibility with the machine learning model. This transformation is important because machine learning models typically operate on structured, tabular data, which is easier to process, analyze, and use for training. The result of converting the JSON data into a DataFrame is shown in Fig. 4.

Fig. 4. DataFrame Dataset

2. Text Preprocessing
After the data is converted into a DataFrame, the next step is text preprocessing, which aims to clean and prepare the data for more efficient processing and analysis. This step is particularly important for the 'patterns' column, which serves as the primary input for the model. The preprocessing consists of three main steps. First, case folding converts all letters to lowercase, ensuring text uniformity. Second, punctuation removal eliminates irrelevant symbols and punctuation marks that do not contribute to the analysis. Third, tokenization breaks the text into individual words or tokens, allowing the model to process them more effectively. By applying these techniques, the efficiency and accuracy of the BiLSTM model can be improved while reducing training complexity. The cleaned and standardized data helps the model better recognize patterns and generate more accurate responses in the chatbot system. The results of these preprocessing steps are shown in Table 2.

Table 2. Result of Preprocessing
Method | Before | After
Case Folding | "Bagaimana cara mengurangi risiko bunuh diri?" | "bagaimana cara mengurangi risiko bunuh diri?"
Remove Punctuation | "bagaimana cara mengurangi risiko bunuh diri?" | "bagaimana cara mengurangi risiko bunuh diri"
Tokenizing | "bagaimana cara mengurangi risiko bunuh diri" | token = ["bagaimana", "cara", "mengurangi", "risiko", "bunuh", "diri"]; Token = [..., 6, 134, 135, 43, ...]

The next step in text preprocessing is padding, which standardizes the sequence length of the data. Since text data naturally varies in length, it must be adjusted to a uniform length to ensure efficient processing by the model. This sequence length adjustment is crucial for maintaining data consistency during model training. In this study, the maximum sequence length is set to 15, meaning that all sequences, whether shorter or longer, are adjusted to a fixed length of 15: shorter sequences are padded with additional elements, while longer sequences are truncated. Padding ensures that all input data has the same dimensions, allowing the model to process it more efficiently and effectively. The result of the padding process is shown in Fig. 5.

Fig. 5. Result of Padding

The final step in text preprocessing is label encoding, which is applied specifically to the 'tag' column. This process converts labels or tags, which are typically words or phrases, into numerical representations. This transformation is essential because machine learning models, such as BiLSTM, generally operate on numerical data. Label encoding assigns a unique numerical value to each unique label, allowing the model to process and interpret categorical information more effectively. The results of the label encoding process can be seen in Fig. 6, where each label in the 'tag' column has been converted into its corresponding numeric representation. With this transformation, both the text data and the labels are in a format ready for training the BiLSTM model.

Fig. 6. Result of Encoding
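The sketch below pulls these preprocessing steps together using Keras and scikit-learn utilities, under the assumption that the converted DataFrame has 'patterns' and 'tag' columns as described above; the two-row DataFrame and the tag names in it are only stand-ins for the real data, the padding position ("post") is assumed, and the 80/20 split matches the proportion reported in this study.

```python
# Sketch of the preprocessing pipeline described above (Keras / scikit-learn).
import re
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 15  # maximum sequence length used in this study

# Tiny illustrative DataFrame standing in for the converted dataset
df = pd.DataFrame({
    "patterns": ["Bagaimana cara mengurangi risiko bunuh diri?", "Apa gangguan mental?"],
    "tag": ["pencegahan_bunuh_diri", "definisi"],   # hypothetical tag names
})

# Case folding and punctuation removal
df["patterns"] = (
    df["patterns"]
    .str.lower()
    .apply(lambda s: re.sub(r"[^\w\s]", "", s))
)

# Tokenizing: map each word to an integer index
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df["patterns"])
sequences = tokenizer.texts_to_sequences(df["patterns"])

# Padding/truncating every sequence to a fixed length of 15
X = pad_sequences(sequences, maxlen=MAX_LEN, padding="post", truncating="post")

# Label encoding of the 'tag' column
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df["tag"])

# 80/20 split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```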
After completing the text preprocessing stage, the dataset is divided into two parts: training data and testing data. The split is conducted with a ratio of 80% for training and 20% for testing. Based on this proportion, the dataset consists of 1,207 samples for training and 302 samples for testing. This division ensures that the model has sufficient data for learning while also providing a separate dataset to evaluate its performance and generalization ability.

3. Modeling
The modeling results using the BiLSTM algorithm are shown in Fig. 7. Below is an explanation of each layer in the model:
- Input layer: receives input with dimensions (None, 15), where None represents a flexible batch size and 15 is the sequence length, i.e., the number of input tokens.
- Embedding layer: maps tokens from the input layer into a 100-dimensional vector representation, resulting in an output of (None, 15, 100).
- BiLSTM layer: takes the output of the embedding layer as input, processes it in both directions to capture contextual meaning, and combines the information into a single vector with a dimension of 64.
- Dropout layer: takes the BiLSTM output as input and randomly deactivates 50% of the elements during training. This prevents overfitting and encourages the model to generalize better. Since dropout only affects training and not the data structure, the output remains a 64-dimensional vector, but with some elements set to zero.
- Dense layer: serves as the output layer of the model. It receives the vector representation from the previous layer and applies the softmax activation function, producing an output of (None, 100), representing the probability distribution over 100 classes. The probability values indicate the model's confidence that the user's input belongs to each class, and the class with the highest probability is selected as the model's final prediction.

Fig. 7. Result of Modeling

4. Model Training
After the modeling stage is completed, the next step is configuring the BiLSTM model parameters for training. Prior to finalizing the configuration, several experiments were conducted to determine the most effective combination of hyperparameters. Based on the evaluation results, the best performance was achieved using the following settings: the Sparse Categorical Crossentropy loss function to measure the model's error, accuracy as the evaluation metric during training, and the Adam optimizer for updating the model's weights. A learning rate of 0.001 was selected to control the weight update step size, the batch size was set to 16, meaning that the weights are updated every 16 samples, and the model was trained for 100 epochs, meaning it processed the entire training dataset 100 times. The training process results are stored in the history variable and are visualized in Fig. 8.

Fig. 8. Result of Model Training

Based on Fig. 8, at the 100th epoch the model achieved a very high training accuracy of 0.9887 together with a very low loss value. This indicates that the model has successfully learned the patterns and relationships within the training data.
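As a sketch of how this architecture and training configuration might be expressed in Keras, the code below follows the layer sizes and hyperparameters reported in this section (100-dimensional embeddings, a 64-dimensional BiLSTM output assumed to come from 32 units per direction, 50% dropout, 100 output classes, Adam with a learning rate of 0.001, batch size 16, 100 epochs); the variables tokenizer, X_train, and y_train are carried over from the preprocessing sketch above.

```python
# Keras sketch of the five-layer BiLSTM model and the training configuration above.
from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 100                                # number of intent tags
VOCAB_SIZE = len(tokenizer.word_index) + 1       # vocabulary size from the fitted tokenizer
MAX_LEN = 15
EMBEDDING_DIM = 100

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),                   # input layer: (None, 15)
    layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),      # embedding layer: (None, 15, 100)
    layers.Bidirectional(layers.LSTM(32)),            # BiLSTM: 32 forward + 32 backward = 64
    layers.Dropout(0.5),                              # dropout: deactivates 50% of units
    layers.Dense(NUM_CLASSES, activation="softmax"),  # output layer: (None, 100)
])

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=optimizers.Adam(learning_rate=0.001),
    metrics=["accuracy"],
)

# Training results are kept in `history`, as described in the paper
history = model.fit(X_train, y_train, batch_size=16, epochs=100)
```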
To visualize the training process, accuracy and loss graphs were generated over the 100 epochs, as shown in Fig. 9 and Fig. 10. These graphs show the accuracy improving and the loss decreasing as the number of epochs increases, demonstrating an effective learning process.

Fig. 9. Accuracy Graph

Fig. 10. Loss Graph

The training results indicate that the model has been well trained and exhibits high performance on the training data. The model has effectively learned the patterns and features within the data, achieving high accuracy and low loss. However, performance on the training data does not always reflect the model's performance on unseen data.

5. Model Evaluation
After completing the model training, the next step is model evaluation using a confusion matrix, with the results shown in Fig. 11. The model's performance is evaluated on separate test data that the model has not seen before. The evaluation metrics, including accuracy, precision, recall, and F1-score, are presented in Table 3. By comparing the performance on the test data with the training results, we can assess the model's ability to generalize to new, unseen data. The model's performance on the test data confirms the consistency and reliability of the training outcomes.

Fig. 11. Confusion Matrix of BiLSTM

Table 3. Classification Report (Sampled Classes)
Class | Precision | Recall | F1-Score | Support
manfaat_rutinitas_harian_untuk_menjaga_kesehatan_mental | ... | ... | ... | ...
obat_depresi | ... | ... | ... | ...
definisi_kesehatan_emosional | ... | ... | ... | ...
terapi_psikologi | ... | ... | ... | ...
... | ... | ... | ... | ...
cara_penanganan_ocd | ... | ... | ... | ...
faktor_genetik_gangguan_mental | ... | ... | ... | ...
latihan_relaksasi_dapat_membantu_menjaga_kesehatan_mental | ... | ... | ... | ...
macro avg | ... | ... | ... | ...
weighted avg | ... | ... | ... | ...

Fig. 11 and Table 3 display only a small subset of the 100 classes to simplify the visualization; however, the evaluation metrics are still calculated over all 100 classes. The dataset already covers 100 diverse classes, providing a broad range of mental health topics for evaluation. While this ensures variability in the analysis, adding test classes that cover more specific or less common mental health conditions could further enhance the model's ability to generalize to a wider range of real-world inputs. For the purposes of this study, however, the current set of 100 classes is sufficient. The evaluation results indicate that the BiLSTM algorithm achieved strong overall performance, with an accuracy of 82.5%, an average macro precision of 83.7%, an average macro recall of 84.8%, and an average macro F1-score of roughly 82%. The BiLSTM model demonstrated exceptional performance in classifying dominant classes such as 'emotional health definition', 'psychological therapy', and 'benefits of daily routines for mental health', achieving a perfect F1-score of 1.00. Additionally, BiLSTM performed well on the 'antidepressant medication' class, achieving a strong F1-score. This suggests that the model's performance may be influenced by the amount of available data for each class, highlighting the importance of balancing class distribution to improve classification accuracy, especially for minority classes.
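For completeness, the sketch below shows how this evaluation could be computed with scikit-learn, reusing the model, test split, and label encoder from the earlier sketches; it produces the accuracy, the confusion matrix, and the per-class precision, recall, and F1-score values of the kind summarized in Table 3.

```python
# Sketch of the evaluation step: compare predictions on the held-out test data
# with the true labels via a confusion matrix and a per-class classification report.
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_test), axis=1)   # class with the highest probability

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# Per-class precision, recall, and F1-score, plus macro/weighted averages,
# with numeric labels mapped back to the original tag names.
all_labels = np.arange(len(label_encoder.classes_))
print(classification_report(
    y_test, y_pred,
    labels=all_labels,
    target_names=label_encoder.classes_,
    zero_division=0,
))
```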
6. Deployment
The results of the chatbot system deployment phase, including its user interface and functionality, are presented in Fig. 12. The chatbot is accessed via a website with a simple, user-friendly design, featuring a text input area for users and an output area displaying the chatbot's responses.

Fig. 12. Chatbot Interface

Conclusion
This study successfully implemented the Bidirectional Long Short-Term Memory (BiLSTM) algorithm in the development of a chatbot for mental health information services. The BiLSTM model demonstrated strong performance during training, achieving 98% accuracy. However, its accuracy on the test data dropped to 82%, indicating that the model still needs improvement in adapting to new data. Nevertheless, an 82% accuracy still highlights the model's potential in correctly classifying user inputs. To further enhance chatbot performance, future research is recommended to utilize a larger and more diverse dataset, as well as to implement various optimization techniques to improve the model's performance. The development of this BiLSTM-based chatbot is expected to provide more accessible and responsive mental health information services for users.

References