Advance Sustainable Science, Engineering and Technology (ASSET)
Vol. No. January 2024, pp. 0240102-01 to 0240102-08
ISSN: 2715-4211
DOI: https://doi.org/10.26877/asset.

A Good Evaluation Based on Confusion Matrix for Lung Diseases Classification using Convolutional Neural Networks

Izza Putri Kamila*, Christy Atika Sari, Eko Hari Rachmawanto, Nur Ryan Dwi Cahyo

Faculty of Computer Science, University of Dian Nuswantoro, Imam Bonjol No., Semarang, Central Java, 50131, Indonesia

*111202214398@mhs.

Abstract. Convolutional neural networks (CNNs) are widely used to detect patterns in image classification. This study uses a CNN to classify lung abnormalities in chest X-ray images. The dataset consists of 5,732 2D images with dimensions of 200 x 200 x 1, divided into training data (85%) and testing data (15%). Preprocessing includes image resizing, enhancement to increase contrast and reduce image complexity, and filtering to improve visibility and reduce noise. The CNN classifies images into three categories: Normal (no abnormalities), Pneumonia, and Tuberculosis. The results show a good level of accuracy, with an accuracy of 97.79% and a consistently high accuracy of 100% in 6 trials. This research provides insights into the detection of lung disorders and encourages further exploration in medical diagnosis.

Keywords: Pneumonia, Tuberculosis, TBC, CNN, Image Classification

(Received 2023-11-03, Accepted 2023-11-14, Available Online by 2023-11-)

Introduction

The lungs are one of the important organs of the human respiratory system. They can develop abnormalities such as pneumonia and tuberculosis when exposed to pathogens in the form of bacteria, viruses, fungi, or parasites. Pneumonia is an infection that results in swelling and inflammation of the lungs' alveoli, or air sacs. In Indonesia, pneumonia is one of the leading causes of mortality among children under the age of five.
Pneumonia affected up to 278,261 children under the age of five in 2021. Tuberculosis (TB), meanwhile, is a contagious infection caused by Mycobacterium tuberculosis. According to the World Health Organization (WHO), there were 2.97 million TB patients in Southeast Asia in 2021, with Indonesia ranking second in the region. The Ministry of Health of the Republic of Indonesia states that an estimated 969 thousand individuals were infected with tuberculosis by 2023. If these lung conditions are not detected in time, they can lead to death.

Detection determines whether or not someone has a certain disease. Detecting lung abnormalities is difficult in medicine because of the complexity of the anatomical structure of the lungs, the diversity of illnesses that can be diagnosed, and the need for high precision. To address these problems, the researchers propose a study that uses artificial intelligence, specifically a Convolutional Neural Network (CNN), to recognize and classify lung disorders in the form of pneumonia and TB.

CNNs have several advantages: they process images efficiently, achieve a high level of accuracy, are resistant to noise, and extract features automatically. However, reaching high accuracy requires augmentation when the data to be processed is relatively small. In addition, CNNs are computationally intensive to train and run, certain models are difficult to interpret, they are not always suitable for processing both 2D and 3D images, and they are vulnerable to changes in images such as lighting, rotation, and added noise.

A previous study by Harshvardhan GM and colleagues in 2021 addressed detecting pneumonia using a CNN and chest X-rays.
In that study, the authors used a CNN for binary classification of chest X-ray data into cases affected and not affected by pneumonia. The study reported an accuracy of 93%. Meanwhile, a study by Abdulfattah Alawi and colleagues in 2021 explored a Convolutional Neural Network model for tuberculosis disease screening. That study used a CNN-based model to separate the lung areas and classify chest X-ray images as infected with tuberculosis or not, obtaining an accuracy of 71%.

To recognize, detect, and classify these abnormalities, X-rays, magnetic resonance imaging (MRI), and computed tomography (CT) can be used. X-rays are frequently used to examine a patient's chest cavity by looking for characteristics of bone density. These images can be used to determine whether there are abnormalities in the patient's lungs. The approach used in this study aims to distinguish, diagnose, and categorize anomalies in patients' lungs, because it is critical to detect the presence or absence of abnormalities in the patient's lungs as soon as possible. The CNN approach was used in this work to identify these lung illnesses, particularly pneumonia and TB. Artificial intelligence can thus aid the early detection and treatment of lung illnesses, particularly in places with high incidence rates, such as Indonesia.

Methods

In this study, a Convolutional Neural Network was used to classify lung disorders based on X-ray images. The flow diagram in Figure 1 illustrates the steps in the proposed method.

Figure 1. Research Methodology

Dataset

Figure 2. Normal Image Sample, Pneumonia Image, and Tuberculosis Image.

The dataset used in this study consists of 2D X-ray radiography images of the lungs, totaling 5,732 samples with pixel dimensions of 200 x 200 x 1.
The 5,732 images are divided into two parts: training data, comprising 245 images of tuberculosis, 3,054 images of pneumonia, and 1,573 images of normal lungs, and testing data totaling 860 samples. The training data is used to learn and recognize the pattern of each type of lung disorder to be categorized. The testing data is used to evaluate how well the proposed method identifies the types of lung illnesses in question. The dataset was obtained via the Kaggle platform.

Preprocessing

In this step, the dataset is divided into 15% testing data and 85% training data. The first step is to resize each image so that all processed images have the same size and dimensions. The image then undergoes an enhancement process that increases its contrast to emphasize the area to be recognized by the next process. Next, the complexity of the color information is reduced and simplified. This conversion makes it possible to focus on pixel intensity values for simpler analysis and to extract characteristics from the images for further processing or analysis. It is also intended to improve computing efficiency and simplify some image processing steps.

Convolutional Neural Network (CNN)

A convolutional neural network is a form of artificial neural network architecture created primarily for image processing and visual pattern detection. The CNN architecture learns the internal structure of features and generalizes those features to image analysis problems, such as object recognition and computer vision. A CNN is made up of three major layers: a convolutional layer, a pooling layer, and a fully connected layer. Convolutional layers apply filters to the input image and produce feature maps, extracting features while maintaining the spatial relationships between pixels.
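The filtering step of a convolutional layer described above can be illustrated with a minimal NumPy sketch. It is not the network used in this study: the function name `conv2d_valid`, the random 200 x 200 stand-in image, and the fixed 3 x 3 edge filter are all assumptions made for illustration, and the operation is the unflipped sliding-window sum (cross-correlation) that CNN libraries conventionally call convolution.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is the weighted sum of one local patch, so
            # neighbouring outputs come from neighbouring input regions.
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map


# Hypothetical input: a random 200 x 200 single-channel array standing in for an X-ray.
rng = np.random.default_rng(0)
image = rng.random((200, 200))

# Hypothetical 3 x 3 vertical-edge filter; in a trained CNN the weights are learned.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (198, 198)
```

Without padding, a 3 x 3 filter shrinks each spatial dimension by 2, which is why the 200 x 200 input yields a 198 x 198 feature map. A real convolutional layer applies many such filters in parallel and learns their weights during training.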
The pooling layer reduces the spatial dimension of the image features obtained from the previous process, which helps reduce the number of parameters and calculations required and helps prevent overfitting. There are two types of pooling layers: the max pooling layer keeps the maximum value in each zone and reduces the image size, while the average pooling layer keeps the average information in each zone. The fully connected layer functions as a link between the previous layers of the network and the subsequent layers.

Figure 3. Chest X-Ray Classification Process with Convolutional Neural Network Method.

The classification process starts by processing the selected image and then applying image adjustments without changing the intensity, in the form of increasing or decreasing contrast and adjusting brightness and color. After the image adjustment process, the image is processed by the three major layers. The images are then classified according to the main goal of the classification process.

Confusion Matrix

The confusion matrix is an important evaluation tool for CNN classification models. It shows the extent to which a classification model succeeds in correctly classifying data. The confusion matrix consists of four main components. True Positive (TP) denotes the number of correctly identified positive cases. True Negative (TN) denotes the number of correctly categorized negative cases. False Positive (FP) denotes the number of negative cases classified as positive by the model. False Negative (FN) denotes the number of cases that are truly positive but are labeled as negative by the model. Accuracy, Precision, Recall, and F1-Score are then calculated from these four components.
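For a three-class problem, the four components above are usually counted per class by treating that class as positive and the other two as negative (one-vs-rest). The sketch below shows this counting and the metrics derived from it. It is an illustration under stated assumptions, not this study's evaluation code: the matrix values are made up, rows are assumed to be actual classes and columns predicted classes, the function names are hypothetical, and specificity uses its standard definition TN / (TN + FP).

```python
import numpy as np

CLASSES = ["Normal", "Pneumonia", "Tuberculosis"]


def per_class_counts(cm):
    """Return TP, FP, FN, TN arrays (one entry per class) from a square matrix.

    Assumed convention: rows are actual classes, columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as this class, actually another
    fn = cm.sum(axis=1) - tp      # actually this class, predicted as another
    tn = cm.sum() - tp - fp - fn  # everything not involving this class
    return tp, fp, fn, tn


def per_class_metrics(cm):
    """Compute the metrics in percent, one value per class."""
    tp, fp, fn, tn = per_class_counts(cm)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),
        "precision": 100 * precision,
        "recall": 100 * recall,
        "specificity": 100 * tn / (tn + fp),
        "f1": 100 * 2 * precision * recall / (precision + recall),
    }


# Made-up counts for illustration only; these are NOT the results of this study.
cm = [[50, 2, 0],
      [3, 90, 1],
      [0, 1, 13]]

scores = per_class_metrics(cm)
for k, name in enumerate(CLASSES):
    row = ", ".join(f"{metric}={values[k]:.2f}%" for metric, values in scores.items())
    print(f"{name}: {row}")
```

If a class is never predicted or never occurs, some denominators become zero and NumPy returns `nan` with a warning, so a production version would need to handle that case explicitly.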
The metrics are defined as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \]

\[ \text{Precision} = \frac{TP}{TP + FP} \times 100\% \]

\[ \text{Recall} = \frac{TP}{TP + FN} \times 100\% \]

\[ \text{F1-Score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \times 100\% \]

Results and Discussion

This study used a dataset of 5,732 images with dimensions of 200 x 200 x 1 from three different classes. The dataset is divided into 15% testing data and 85% training data. The training data comprises 245 tuberculosis images, 3,054 pneumonia images, and 1,573 normal lung images, and the testing data totals 860 samples. Preprocessing is then applied to each image, filtering it with image adjustments such as increasing the contrast. The CNN then classifies the images into three categories: Normal (no pneumonia or tuberculosis), Pneumonia, and Tuberculosis. The values of the confusion matrix's four main components are shown in Figure 4.

Figure 4. Confusion Matrix Result

To obtain the confusion matrix values, the model must first be trained on the training data, which yields the component values contained in the confusion matrix, namely True Positive, True Negative, False Positive, and False Negative. As indicated in Table 1, the performance of the classification model can be assessed using Accuracy, Recall (Sensitivity), Precision, Specificity, and F1-Score.

Table 1. Confusion Matrix.
No.  Partition (Testing-Training)  Training      Accuracy  Recall  Precision  Specificity  F1-Score
1    15% - 85%                     1st Training  97.79%    96.73%  97.79%     98.89%       97.28%
2    20% - 80%                     2nd Training  97.03%    97.84%  93.54%     96.52%       95.64%
3    25% - 75%                     3rd Training  97.49%    95.45%  97.14%     98.55%       96.29%
4    30% - 70%                     4th Training  97.79%    96.58%  97.10%     95.51%       96.84%

Figure 5. Training's Graphics (Accuracy, Recall, Precision, Specificity, and F1-Score for the 1st to 4th training)

In Table 1 above, the authors tried four divisions of testing and training data: the first training with 15%-85%, the second training with 20%-80%, the third training with 25%-75%, and the fourth training with 30%-70%. As seen in Figure 5, the highest accuracy is 97.79%, obtained in the 1st and 4th training. However, the highest Recall, Precision, Specificity, and F1 values are generated in the 1st training. Thus, this study uses a split of 15% testing data and 85% training data.

After the Confusion Matrix values are acquired, more testing is performed to evaluate the model's performance. The Confusion Matrix provides a clear picture of the extent to which the model is able to correctly classify data, based on its four main components. With this information, more informed decisions can be made about how well the image processing model works and whether improvements or adjustments are needed, as tested in the following table.

Table 2. Dataset Testing

No.  Input Image  Actual Class   Classification Result  Result
1    (image)      Normal         Normal                 True
2    (image)      Pneumonia      Pneumonia              True
3    (image)      Tuberculosis   Tuberculosis           True
4    (image)      Normal         Normal                 True
5    (image)      Pneumonia      Pneumonia              True
6    (image)      Tuberculosis   Tuberculosis           True

Table 3. Comparison Results

Research                      Method       Total Dataset  Precision  Accuracy  Specificity  Sensitivity (Recall)  F1 Score
Rahman et al.                 CNN with
GM, Kumar Gourisaria, et al.  CNN ANN
Rahman T., et al.             CNN Chexnet
Proposed Method               CNN

Based on Table 3, which covers lung anomalies examined by three studies using different approaches, the authors recommend using the CNN method in future investigations. The CNN approach makes it feasible to operate more efficiently and accurately.

Conclusion

This study used a dataset of 5,732 samples with dimensions of 200 x 200 x 1, divided into 85% training data and 15% testing data. The samples include normal lungs, lungs with pneumonia, and lungs with tuberculosis. The training data comprised 1,573 normal lung samples, 3,054 lung samples with pneumonia, and 245 lung samples with tuberculosis. Samples classified using the CNN approach had an accuracy of 97.79% and a consistently high accuracy of 100% in 6 trials. Overall, the CNN method works well for classifying lung abnormalities.

In future studies, evolved CNN architectures may be used to attain the same accuracy when deciding the proportions of testing and training data. The CNN may also be combined with other architectures to improve accuracy and efficacy in detecting lung problems, such as CNN ChexNet, Recurrent Neural Network (RNN), or Long Short-Term Memory (LSTM). Combining CNN with ChexNet, RNN, or LSTM can result in a more comprehensive approach to medical image processing, focusing on the spatial correlations between pixels in images and capturing the temporal sequence in sequential images such as CT scans and MRIs. This can assist the medical community in dealing with issues linked to changes in medical images during the illness detection process, such as rotation and added noise.

References