ICMSE . : 74-84
International Conference on Mathematics, Science, and Education
QRCBN 62-6861-0508-697
https://proceeding. id/icmse

Performance Analysis of the Convolutional Neural Network VGG16 Model on COVID-19 Classification Using Radiographic and CT-Scan Images

Fifin Dewi Ratnasari 1, Nida Rihadatul Aisy Nahdah 2

Author Affiliations
Department of Physics, Universitas Negeri Semarang, Indonesia, Sekaran, Gunungpati, 50229
Department of Physics, Universitas Negeri Semarang, Indonesia, Sekaran, Gunungpati, 50229

Author Emails
Corresponding author: nidanahdah@gmail. fifin_fisika@mail.

Abstract. Coronavirus Disease 2019 (COVID-19) is an infectious disease of public health concern. Diagnosis through medical images can identify COVID-19 cases and help combat the virus. However, as the number of COVID-19 cases continues to increase, the time available to review each case is limited. This can lead to a heavy workload and high stress levels for radiologists, which in turn can lead to errors in analyzing images (missed findings). Therefore, automatic detection of COVID-19 infection based on deep learning is needed to analyze medical images such as radiographs and CT-Scans quickly and efficiently. This study analyzes the performance of the VGG16 model in distinguishing COVID-19 from Normal on radiographic and CT-Scan images. To find the optimal model for the classification task, the VGG16 model was tested with two approaches: without and with transfer learning (TL). A total of 800 images were used in this study and divided into three parts: 80% for training, 10% for validation, and 10% for testing. The results showed that the VGG16 model with TL achieved the highest accuracy, namely 93.75% on radiographic images and 73.3% on CT-Scan images. These results indicate that transfer learning with the VGG16 model proposed in this study is effective in recognizing binary class cases using radiographic and CT-Scan images.
Thus, applying this model can contribute significantly to disease diagnosis, help reduce radiologist workload, and improve patient handling.

INTRODUCTION

Coronavirus Disease 2019 (COVID-19) is an acute respiratory infection transmitted through droplets or respiratory secretions, for example from coughing and sneezing. Patients infected with COVID-19 have flu-like symptoms such as high fever, chest pain, headache, dry cough, sore throat, and several other symptoms of respiratory illness (Huang et al. , 2. The spread of the virus began with reports from several hospitals in the Huanan region on December 31, 2019. Several patients were reported to have the same infection and symptoms and were then diagnosed with a new virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The high level of international migration activity is responsible for the global spread of the infection. The World Health Organization (WHO) declared COVID-19 a global pandemic on March 11, 2020 (Ahmad, 2. From December 31, 2019 to October 2, 2022 there were 615,310,890 confirmed cases of COVID-19 and 6,525,708 deaths (WHO, 2.

The Indonesian government announced the first COVID-19 case on March 6, 2020, when a woman tested positive for COVID-19 after transmission from a Japanese citizen (Tosepu et , 2. Indonesia has gone through at least two pandemic waves. The first began in December 2020 and peaked on January 25, 2021, with 98,902 cases per week. At the peak of the second wave on June 21, 2021, the number of weekly cases increased almost 5 times higher than the first peak, reaching 125,396 cases (Haryono et al, 2. Accurate and rapid diagnostics are essential in identifying and treating COVID-19 patients because of the high transmission rate of the infection.
There are several COVID-19 diagnostic tests used by the public, including Reverse Transcriptase Polymerase Chain Reaction (RT-PCR) with nasopharyngeal and oropharyngeal swabs. RT-PCR examination is considered the recommended primary standard in COVID-19 diagnostics (Aditia, 2. However, RT-PCR is considered less efficient because obtaining a result takes about 5-6 hours, longer than other diagnostic methods. This delay can further expand the spread of infection (Stogiannos et al.

Medical images play an important role in projecting the anatomy and physiology of the human body so that they become useful clinical information (Rundo et al. , 2. Currently, thoracic radiography and computed tomography scan (CT-Scan) are vital thoracic imaging modalities in the early detection of COVID-19 cases (Aljondi et al. , 2. CT-Scan is known to have a higher sensitivity, 98%, in identifying human lung abnormalities than other methods (Long et al. , 2. However, the American College of Radiology (ACR) does not recommend radiographic images and CT-Scan as first-line COVID-19 diagnostic methods (ACR. Radiographic images and CT-Scan have limitations and low accuracy in distinguishing between COVID-19 and other respiratory infections (Li et al. , 2. Moreover, radiologists often experience physical and mental fatigue due to high workload (Cao et al. , 2. With an ever-increasing number of COVID-19 cases, radiologists face complex challenges. The case review process takes considerable time and can result in fatal errors, such as missed findings (Zhou et al. , 2.

Deep learning (DL) methods are widely applied in medical imaging to tasks such as segmentation, classification, and assessment of disease severity (Nabavi et al. , 2.
Convolutional neural network (CNN), one of the DL methods, is commonly used because it has achieved remarkable success in medical image analysis to date. Its predictive function and automatic feature identification also make it popular in disease diagnosis. Fundamental CNN architectures such as AlexNet, VGGNet, ResNet, and GoogleNet are widely used for disease classification through medical images.

The VGG16 model, a member of the VGGNet family, has been widely used to classify COVID-19 in radiographic and CT-Scan images. Ramadhan et al. modified the VGG16 model by reducing its 138 million parameters to 40 million and obtained an accuracy of 97.5% in multi-class classification and 99.7% in binary classification. Sarki et al. modified the VGG16 model with fine-tuning and contrast enhancement on radiographic images and achieved accuracy, sensitivity, and specificity values of 100%. Mishra et al. applied a transfer learning (TL) approach to the VGG16 model in detecting COVID-19 and obtained an accuracy of more than 97%. Kogilavani et al. compared several CNN models, including VGG16, which had the best performance with an accuracy of 68%. Zouch et al. applied data augmentation techniques on VGG19 using two image types, radiographic and CT-Scan images, with accuracy rates of 99.35% and 84.87%, respectively. Furthermore, VGG16 has also proven effective in detecting other diseases, such as gastrointestinal diseases (Xiao et al. , 2.

Based on the research cited above, the VGG16 model performs well, achieving high accuracy in classifying COVID-19. Therefore, the purpose of this study is to compare the performance of the VGG16 model without transfer learning (TL) and with TL in binary classification, namely COVID-19 and Normal, on radiographic and CT-Scan images.
METHOD

Model development environment

The CNN model in this study was developed on Google Colaboratory (Colab), a cloud computing platform based on Jupyter Notebook, which was used for the training and testing process. The platform comes preconfigured with Python 2 and 3 as well as several libraries required for DL and visualization, such as TensorFlow, Keras, Matplotlib, NumPy, and scikit-learn, so that no manual installation is required to start the program (Carneiro et al. , 2. With GPU support, Colab can shorten the runtime of DL training, especially for CNNs (Praveen Gujjar et al. In addition, all datasets and program code are stored in Google Drive, which is integrated with Colab. All experiments in this study were conducted from a laptop with an Intel Celeron processor and 4 GB of RAM.

Research Flow

This research examines binary classification between COVID-19 infected and normal lung medical images using the CNN VGG16 architecture. Figure 1 shows the workflow of the proposed CNN model from beginning to end. Each step in the research flow is explained in detail below.

Data aggregation

This research aggregates radiographic image datasets for binary (2-class) classification (COVID-19 and Normal). The dataset is sourced from an open-source data repository, the COVID-19 Radiography Database (Rahman et al, 2. for both normal and COVID-19 radiographic image datasets. All images are in *. format and then go through a pre-processing stage. A total of 800 radiographic images, divided into two classes (COVID-19 and Normal), are used for the training, validation, and testing processes.
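As a rough illustration of how the 80% / 10% / 10% partition of these images could be produced, the sketch below shuffles each class separately and cuts it into three parts. It is not the authors' script: the per-class count of 400 images, the file names, the fixed seed, and the choice to split each class on its own (so every part stays balanced) are all assumptions made for the example.

```python
import random


def split_class(paths, train=0.8, val=0.1, seed=0):
    """Shuffle one class's file list and cut it into train/val/test parts."""
    items = list(paths)
    random.Random(seed).shuffle(items)  # reproducible shuffle (seed is an assumption)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],  # whatever remains becomes the test part
    )


# Hypothetical file names; 400 images per class is an assumption that matches
# a balanced set of 800 images.
classes = {
    "COVID-19": [f"covid_{i:03d}.img" for i in range(400)],
    "Normal": [f"normal_{i:03d}.img" for i in range(400)],
}

splits = {name: split_class(files) for name, files in classes.items()}

for name, (tr, va, te) in splits.items():
    print(name, "train:", len(tr), "val:", len(va), "test:", len(te))
```

Splitting per class rather than over the pooled list is one simple way to keep every part balanced; a pooled random split would only be balanced on average.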
The images are randomly sorted into three parts to construct a balanced dataset: 80% for training, 10% for validation, and 10% for testing. The distribution of the dataset is shown in Table 1.

Table 1
Modality         Radiography              CT-scan
Class            COVID-19     Normal      COVID-19     Normal
Training
Validation
Testing
Total
Total images

Data preprocessing

The collected images must be resized because they come from datasets whose dimensions vary. Resizing serves to reduce or eliminate the effects of interference and homogenizes the image resolution to 224x224 pixels.

Data Augmentation

Data augmentation is applied to improve the learning performance of the model by creating varied training image samples (Shorten et al. , 2. Various augmentation techniques have been applied to the training data. In addition, several parameters are also applied to this method. Detailed values of the hyperparameters and data augmentation are tabulated in Table 2.

Table 2
Detail value             Radiography                     CT-Scan
Hyperparameters
  Batch size             32                              32
  Epoch                  50                              50
  Optimizer              Adam                            Adam
  Learning rate          0.001                           0.001
  Dropout
  ReduceLROnPlateau      Factor = 0.3, Patience = 5
Data Augmentation
  Re-scaling             1/255                           1/255
  Width shift range
  Height shift range
  Shear range
  Zoom range
  Horizontal flip        True                            True
  Vertical flip          True                            True

Building of VGG16 Model Architecture

The VGG16 model architecture in this study was built with two test approaches: without transfer learning (TL) and with TL.

Approach without TL

The VGG16 model architecture in this approach consists of 13 convolution layers with 3x3 kernels, each followed by a ReLU activation function, 5 max pooling layers to reduce the dimensions of the feature maps, a flatten layer, and finally several dense layers with a softmax function for classification.
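To make the size of this architecture concrete, the following sketch counts the learnable parameters of a VGG16-style network by hand. The block layout (2, 2, 3, 3, 3 convolutions with 64 to 512 filters) is the standard VGG16 configuration; the two 4096-unit dense layers are an assumption taken from that standard configuration, since the text only says "several dense layers". The function names are illustrative.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolution layer."""
    return (k * k * in_ch + 1) * out_ch


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return (n_in + 1) * n_out


# (number of conv layers, filters) for the five standard VGG16 blocks
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_base_params(in_ch=3):
    """Return (parameter count, number of conv layers) of the conv base."""
    total, n_layers = 0, 0
    for n_convs, out_ch in BLOCKS:
        for _ in range(n_convs):
            total += conv_params(in_ch, out_ch)
            in_ch = out_ch
            n_layers += 1
    return total, n_layers


conv_total, n_conv = conv_base_params()

# A 224x224 input halved by 5 max pooling layers gives 7x7 maps with 512 channels.
flat = (224 // 2 ** 5) ** 2 * 512

# Assumed dense head: 4096 -> 4096 -> output layer (standard VGG16 sizes).
hidden = dense_params(flat, 4096) + dense_params(4096, 4096)
total_imagenet = conv_total + hidden + dense_params(4096, 1000)  # 1000-class output
total_binary = conv_total + hidden + dense_params(4096, 2)       # 2-class softmax

print("conv layers:", n_conv)
print("conv base parameters:", conv_total)
print("total, 1000 classes:", total_imagenet)
print("total, 2 classes:", total_binary)
```

Under these assumptions the 1000-class total is about 138 million, in line with the figure quoted for VGG16 in the introduction, and almost all of it sits in the first dense layer rather than in the 13 convolution layers.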
Figure 3 illustrates the architectural arrangement of the VGG16 model trained using radiographic images for binary classification.

Approach with TL

One of the problems often encountered in DL is an insufficient amount of training data, which affects the performance of CNN models. Transfer learning (TL) addresses the problem of limited datasets while offering a more efficient running time (Zhao, 2. In this study, TL is applied to the VGG16 model architecture, where the model has been previously trained on the ImageNet dataset. There are two stages of TL. The first is to import the pre-trained VGG-16 CNN model through the Keras library. The second is feature extraction: the convolutional base (feature extraction layers) is frozen and the classification layer is replaced with a new classifier layer.

The architecture of the VGG-16 model with TL on radiographic images is shown in Figure 4. After the image feature extraction layers, there is a classification block that begins with a global average pooling (GAP) layer, which takes the average value of each feature map. GAP helps reduce overfitting, reduces the number of parameters the model must learn, and has a low computational burden (Hsiao et al. , 2019. Lin et al. , 2. The GAP layer is followed by two dense layers with 512 neurons and the ReLU activation function. Each dense layer is accompanied by a dropout layer that serves to avoid overfitting, with a dropout level of 0. Finally, a softmax activation function is added as the classifier.

Performance evaluation

Evaluation of the CNN model's performance in predicting the target class is based on the confusion matrix with four variables, namely TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative), for
binary classification with two classes, namely COVID-19 as the positive class and Normal as the negative class. The confusion matrix with four variables is illustrated in Figure 4.

- True Positive (TP): the amount of data correctly predicted as the positive COVID-19 class by the model.
- True Negative (TN): the amount of data correctly predicted as the negative Normal class by the model.
- False Positive (FP): the amount of data incorrectly predicted as the positive COVID-19 class by the model, even though the real class is the negative Normal class.
- False Negative (FN): the amount of data incorrectly predicted as the negative Normal class when the actual class is the positive COVID-19 class.

Furthermore, these parameters are used to evaluate the performance of the model, with accuracy, precision, and recall as the metrics for classifying the binary classes (COVID-19 and Normal). This aims to obtain information about the classification quality of the proposed VGG16 model and the extent to which the model can distinguish between the two classes correctly and accurately. The formulas of the three performance metrics are as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

Table 3
Modality      Model CNN           Class      Precision   Recall   Accuracy (%)
Radiography   VGG16 without TL    COVID-19   0.50        1.00     50.00%
                                  Normal     1.00        0.00
              VGG16 with TL       COVID-19   1.00        0.89     93.75%
                                  Normal     0.88        1.00
CT-Scan       VGG16 without TL    COVID-19   0.50        1.00     50.00%
                                  Normal     1.00        0.00
              VGG16 with TL       COVID-19   0.66        0.98     73.33%
                                  Normal     0.97        0.48

RESULTS AND DISCUSSION

Several parameters, namely 50 epochs, a batch size of 32, a learning rate of 0.001, and the Adam optimizer, were applied to the VGG16 model to assess its performance on radiographic and CT-Scan images.
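The three metric definitions from the performance evaluation subsection can be checked against the confusion-matrix counts that the discussion reports for Figure 5. The sketch below is only an illustration: the function name is hypothetical, returning 0.0 when a denominator is zero is an assumed convention, and the FP count of 60 for the CT-Scan model without TL is inferred from its 50% accuracy rather than stated directly.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, precision, and recall for the class treated as positive."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 when nothing was predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# (TP, FN, TN, FP) with COVID-19 as the positive class, as quoted for Figure 5.
FIGURE_5 = {
    "Radiography, without TL": (40, 0, 0, 40),
    "Radiography, with TL": (35, 5, 40, 0),
    "CT-Scan, without TL": (60, 0, 0, 60),  # FP = 60 is inferred, not quoted
    "CT-Scan, with TL": (59, 1, 29, 31),
}

for name, (tp, fn, tn, fp) in FIGURE_5.items():
    acc, prec, rec = binary_metrics(tp, fn, tn, fp)
    # Swapping the roles of the two classes gives the Normal-class metrics.
    _, prec_n, rec_n = binary_metrics(tn, fp, tp, fn)
    print(f"{name}: accuracy={acc:.2%} "
          f"COVID-19 P/R={prec:.2f}/{rec:.2f} Normal P/R={prec_n:.2f}/{rec_n:.2f}")
```

Computing the Normal-class precision and recall by swapping the positive and negative roles is what allows per-class rows to be reported for a binary problem.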
There are two treatments applied to the VGG16 model: the model without TL and the model with TL. The image dataset was divided into three parts, 80% for training, 10% for validation, and 10% for testing. The evaluation of the tested models produced several metrics for each image modality, namely precision, recall, and accuracy.

Table 3 shows the performance evaluation metrics of the VGG16 model on radiographic and CT-Scan images for identifying COVID-19 and Normal lungs in the test dataset. On the radiographic test images, the VGG16 model without TL achieved an accuracy of only 50%. With TL, however, accuracy increased significantly to 93.75%. In the table, the highest precision and recall values were obtained by the VGG16 model with TL: 1.00 and 0.89 for the COVID-19 class and 0.88 and 1.00 for the Normal class. The average precision and recall values obtained by the model for COVID-19 and Normal radiographic images are 0.945 and 0.

On the CT-Scan image dataset, the accuracy of the VGG16 model with TL increased to 73.33%. The model also obtained considerably higher precision and recall values than the model without TL, namely 0.66 and 0.98 in the COVID-19 class and 0.97 and 0.48 in the Normal class. The average precision and recall values produced by the model for CT-Scan images are 0.82 and 0.

Figure 5 displays the confusion matrices of the VGG16 model, beginning with the model without TL on radiographic images. The confusion matrix for this model shows that 40 images are correctly classified as COVID-19 (TP) and 40 Normal images are incorrectly predicted as COVID-19 (FP).
However, there are no FN and TN because both values are 0. For the model with TL on radiographic images, Figure 5 shows that the model recognizes 35 COVID-19 images correctly (TP), while 5 cases of COVID-19 are misclassified as Normal (FN). Likewise, 40 Normal cases were correctly classified (TN) and no Normal cases were misclassified as COVID-19 (FP).

The confusion matrices showing the classification ability of the VGG16 model on CT-scan images are also given in Figure 5. Without TL, 60 images were classified as COVID-19 (TP) and no images were correctly classified as Normal (TN). With TL, 59 COVID-19 images were correctly detected (TP) and one COVID-19 image was incorrectly detected as Normal (FN). Meanwhile, 29 Normal images were correctly predicted (TN) but 31 Normal images were incorrectly predicted as COVID-19 (FP).

Figure 6 visualizes the training and validation stages, giving a glimpse of the performance of the VGG16 model with TL across epochs. Based on both figures, the curve for radiographic images shows the most stable increase in accuracy compared with CT-Scan images. The first accuracy plot in Figure 6 shows that the training accuracy at the initial stage is about 48%, then increases slowly at each epoch, and reaches about 89% at the last epoch. Similarly, the validation accuracy reached about 85% at the last epoch. Meanwhile, in the corresponding loss plot, the training loss decreases dramatically to 0.3 at epoch 8 and then remains stable until epoch 50. Likewise, the validation loss decreases sharply in the first five epochs.

The second accuracy plot in Figure 6 shows a significant increase in accuracy on the training data.
In the first epoch, the accuracy is only about 42%, then it increases to 78% in the last epoch. Meanwhile, the validation accuracy was around 62% at the first epoch and reached 95% at the last epoch. Although there were fluctuations during the training process, the overall accuracy of the training and validation data experienced a drastic increase. The fluctuations can be seen when the accuracy value increases and decreases irregularly without any clear pattern. The corresponding loss plot in Figure 6 shows that the training loss starts at a value of 1. and then decreases until it reaches 0.5 at the last epoch. Similarly, the validation loss only decreased by about 5 from the first to the last epoch.

These results indicate that the TL method effectively improves model performance with limited datasets. This is supported by the high accuracy, precision, and recall values, as well as by accuracy and loss graphs that show stability on radiographic and CT-Scan images. Additionally, the confusion matrix results support the effectiveness of the TL method in distinguishing between the COVID-19 and Normal classes. Figure 5 shows that the model with TL has a good level of accuracy on radiographic images, with 35 COVID-19 images and 40 Normal images correctly classified. However, on the CT-Scan images the model still has a high error rate, with 31 Normal images incorrectly classified as COVID-19. The model still has difficulty classifying some specific cases correctly, especially in identifying the Normal class. On the other hand, the VGG16 model without TL cannot distinguish between the COVID-19 and Normal classes because all Normal images in the test dataset were predicted as COVID-19, so the model has no FN and TN values since both are 0.

The CNN model used in this study is similar to the model used in research by (Sahinbas et al. , 2.
, where the accuracy obtained was 80%. However, the model in this study achieved a 13% improvement in accuracy compared to that model. The VGG16 model proposed by Mohammadi et al. obtained an accuracy close to that of the model in this study, 93.75%, with a difference of 10%. In contrast to the total number of images used in this study, Ramadhan et al. collected 2159 radiographic images for training and testing. Moreover, that study also reduced the number of VGG16 parameters to 40 million by applying only four blocks so that the accuracy reached 99. However, classification that uses many image samples increases the computational load and can slow down the hardware.

Previous research conducted by Mishra et al. on CT-Scan resulted in a model accuracy of 99%. There is a difference of about 30% between that accuracy and the accuracy of the model used in this study. Although it uses the VGG-16 architecture, the model used here is therefore considered less than optimal in detecting COVID-19 lungs through CT-Scan. They recommend fine-tuning the hyperparameters by finding the most optimal combination of values so that the model can achieve good performance. After adjusting the parameters, some of the parameter values applied to the model to achieve optimal performance are a learning rate of 0.00007, 500 epochs, and so on. However, it is important to remember that this research faces hardware limitations that affect the implementation of very detailed hyperparameter values in the CNN model, since such values increase the required computational power. Furthermore, the Colab session usage time also limits parameter settings with more specific values.

CONCLUSION

The pre-trained VGG16 convolutional neural network (CNN) architecture has been applied to binary classification (COVID-19 and Normal) using radiographic images.
Two approaches were used to evaluate the performance of the CNN model: with and without transfer learning. Based on the research findings, it can be concluded that the model with transfer learning can effectively address the dataset limitation in this study. The model achieved high classification performance for COVID-19 identification, with an accuracy of 93.75% on radiographic images and 73.33% on CT-Scan images. The model has the potential to serve as a clinical diagnostic aid that can help medical personnel in the radiology department. A limitation of our study is the inadequate hardware capability to run the program with large datasets. In the future, upgrading hardware components could improve performance when processing larger datasets and parameters.

REFERENCES