Available online at: https://jurnal.id/index.php/RESTI

JURNAL RESTI (Rekayasa Sistem dan Teknologi Informasi), Vol. 9 No. 6, pp. 1385-1397, e-ISSN: 2580-0760

CNN-Based Skin Cancer Classification with Combined Max and Global Average Pooling

Chairani Fauzi1, Fitra Salam S. Nagalay2*
1,2 Institute of Informatics and Business Darmajaya, Lampung, Indonesia
1 chairani@darmajaya.id, 2 fitra.2321211010@mail.

Abstract

Skin cancer represents a significant threat to human health, with a rising incidence of new cases annually. Timely identification is essential for enhancing recovery rates; however, conventional diagnostic methods such as biopsy are often invasive, time-consuming, and costly. To address this issue, artificial intelligence-based diagnostic systems, particularly Convolutional Neural Networks (CNNs), offer a promising solution for enhancing diagnostic accuracy and efficiency. This study assesses the efficacy of a CNN model that integrates Max Pooling and Global Average Pooling (GAP) for the detection of skin cancer in digital dermoscopic images. The ISIC dataset was used, focusing on two classes: malignant and benign. The combination of Max Pooling and GAP is intended to increase model precision while reducing the risk of overfitting. The experimental results show that the proposed model achieved a precision of 96.35%, indicating strong performance in minimizing false positives. However, the recall was relatively low at 85.99%, suggesting reduced sensitivity in detecting malignant cases. The overall accuracy of the combined model was 91.68%, slightly lower than that of the Max Pooling-only model (91.79%). Although the combination does not significantly improve accuracy, it effectively enhances precision to 96.35%. This is a critical advantage in a clinical setting, as it directly translates to minimizing false positive diagnoses and preventing patients from undergoing unnecessary invasive procedures such as biopsies.

Keywords: convolutional neural network, global average pooling,
max pooling, skin cancer

How to Cite: C. Fauzi and F. Nagalay, "CNN-Based Skin Cancer Classification with Combined Max and Global Average Pooling", J. RESTI (Rekayasa Sist. Teknol. Inf.), vol. 9, no. 6, pp. 1385-1397, Dec. 2025. Permalink/DOI: https://doi.org/10.29207/resti
Received: May 2, 2025 | Accepted: July 23, 2025 | Available Online: December 16, 2025
This is an open-access article under the CC BY 4.0 License. Published by Ikatan Ahli Informatika Indonesia.

1. Introduction

The human body's most extensive organ is the skin, functioning as a critical shield against environmental hazards such as ultraviolet rays, harmful microorganisms, physical trauma, and toxic substances. Ideally, the skin should remain healthy and disease-free. However, due to poor hygiene, environmental factors, extreme weather conditions, and allergies, the skin becomes vulnerable to various diseases, one of which is skin cancer. Skin cancer is one of the most dangerous diseases globally, with increasing incidence over the past few decades. Data from the World Health Organization (WHO) indicates that over 1.5 million individuals were diagnosed with skin cancer globally in 2022, resulting in approximately 60,000 fatalities. Although the number of cases in Indonesia remains relatively low, early detection is crucial to prevent severe complications or even death if the disease is left untreated into later stages. There are two general types of skin cancer: benign and malignant. Malignant lesions show greater aggressiveness and the ability to metastasize, encompassing disorders such as melanoma, vascular lesions, basal cell carcinoma, actinic keratosis, and squamous cell carcinoma. Meanwhile, benign types such as melanocytic nevus and benign keratosis are less dangerous but still require accurate diagnosis. Early diagnosis significantly improves the chance of recovery, with some studies reporting up to 95% survival if detected early.
Early detection is often challenging due to the subtlety of initial symptoms and the reliance on the expertise of specialists. Inexperienced practitioners may misdiagnose or overlook critical signs. Biopsy, the most commonly used method, involves removing tissue samples and is often painful, slow, and expensive. Therefore, there is a pressing need for faster, less invasive, and more cost-effective diagnostic methods.

The rapid progression of artificial intelligence and machine learning has facilitated the creation of computer-aided diagnostic (CAD) programs that can analyze medical imaging with considerable precision. A multitude of studies have successfully utilized deep learning, particularly Convolutional Neural Networks (CNNs), for the classification of medical images. CNNs can automatically extract features from raw images, outperforming traditional diagnostic methods and even rivaling dermatologists in accuracy. However, a significant challenge in CNNs lies in the selection of the appropriate pooling method. A fundamental element of CNN architecture is the pooling layer, a mechanism designed to lower image dimensionality while preserving essential feature information, with two commonly used approaches: local pooling, such as max pooling, and global pooling, such as global average pooling (GAP). Max pooling retains the largest value in each pooling area, helping to capture salient local features. GAP, on the other hand, averages all values of each feature map and improves model generalization.

Previous studies have often used these pooling techniques independently. For instance, Luqman Hakim applied Max and Average Pooling, achieving only 75% accuracy. Reynaldi Saputro achieved approximately 92% accuracy using Max Pooling with different training configurations. Teresia R.
Savera compared CNN and K-Nearest Neighbor (KNN) for skin cancer classification, where CNN performed better, though still with modest accuracy. Another study by Lokesh Kumar used GAP in brain tumor classification and achieved a high accuracy of 97.48%, highlighting GAP's potential to reduce overfitting and improve generalization. These findings indicate that the integration of Max Pooling and GAP may combine the advantages of both methods, resulting in improved efficacy in skin cancer classification.

This study aims to develop a CNN model that incorporates both Max Pooling and GAP to detect skin cancer automatically from digital images. Using the International Skin Imaging Collaboration (ISIC) dataset, this research investigates the effect of combining these pooling techniques on the performance of CNN models. It is expected that this combination can enhance detection accuracy while reducing overfitting. This research seeks to augment the reliability and efficiency of skin cancer diagnostic techniques, enabling early detection and enhancing patient outcomes. While many novel CNN architectures have been proposed, there is limited research that specifically investigates the critical trade-off between general classification accuracy and clinical precision. This work seeks to address that deficiency by offering a comprehensive comparative analysis to ascertain how the integration of existing pooling approaches might be optimized for enhanced diagnostic outcomes, with a primary focus on decreasing false positives.

2. Research Methods

This research involved several stages to develop and assess a model integrating Max Pooling and Global Average Pooling (GAP) within a Convolutional Neural Network for skin cancer prediction, including literature review, data collection, preprocessing, model construction, training, and evaluation. A detailed description of each stage of this research is presented in Figure 1.

Figure 1.
Research Flow for CNN-Based Skin Cancer Classification

The research flow began with a literature review to explore existing approaches in medical image classification, particularly those involving deep learning and pooling strategies in CNN architectures. This step also helped identify gaps in previous studies, where most have implemented either Max Pooling or GAP in isolation, without exploring the combined effect of both.

2.1 Data Collection

At this stage, a dataset was obtained from the ISIC Archive (International Skin Imaging Collaboration), which provides a large set of labeled dermoscopic images for both benign and malignant skin cancer. A total of 5,500 benign and 5,105 malignant images were used, resulting in a slight imbalance between the two classes. To mitigate the potential impact of this imbalance, class weighting was applied during model training, ensuring that the minority class was given appropriate emphasis. Furthermore, a data preprocessing stage was applied to enrich image diversity and improve the model's generalization capability. Samples of benign and malignant image data are shown in Figure 2.

Following resizing, normalization was performed to adjust the pixel value scale to a consistent range between 0 and 1. The raw .jpg images consist of RGB channels (Red, Green, Blue), where pixel intensities vary between 0 and 255. This wide range can negatively impact model training, as neural networks tend to converge faster when input values are within a smaller, standardized range. Therefore, in this step, each pixel value in the image was divided by 255 to produce a normalized matrix, as represented in Equation 1.

Image_normalized = img / 255    (1)

Here, img represents the pixel matrix of the image. This process ensures that each pixel value lies within a consistent scale, enhancing training efficiency and contributing to a more stable learning process.
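As a concrete illustration, Equation 1 amounts to a single element-wise division; the following is a minimal NumPy sketch, not the authors' code:

```python
import numpy as np

# Sketch of Equation 1: each 8-bit RGB pixel value (0-255) is divided
# by 255 so that all inputs lie in the range [0, 1].
def normalize(img: np.ndarray) -> np.ndarray:
    return img.astype(np.float32) / 255.0

# Example with a dummy 256x256 RGB image.
img = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
norm = normalize(img)
```

Casting to float32 before dividing avoids integer division and matches the precision typically used for network inputs.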
Figure 2. Benign and Malignant Samples

2.2 Data Preprocessing

Following data collection, the study implements a preprocessing framework consisting of image resizing, normalization, and augmentation. These steps are requisite to ensure scale uniformity and to bolster the model's robustness when encountering unseen data. To address the inherent dimensional variations within the ISIC dataset, all images were standardized to a uniform resolution of 256×256 pixels. The selection of this size was based on an experiment conducted by Suhendro Y. Irianto et al., which demonstrated that although the difference in validation accuracy between various image sizes was not significant, the execution time remained within an acceptable range. The experiment showed that resizing images to 256×256 pixels resulted in a training accuracy of 0.9721 and a validation accuracy of 0.9520, with an average execution time of 26.4 seconds. These results indicate that this resolution offers an ideal trade-off between computational efficiency and classification accuracy.

The final preprocessing step was data augmentation, applied to enrich the dataset by introducing variations to existing images. Several augmentation techniques were implemented, including a rotation range of 10 degrees as well as width and height shift ranges of 0.1, allowing images to be rotated and shifted up to 10% in both vertical and horizontal directions. To handle the empty areas generated by these transformations, the fill mode was set to 'nearest', which replaces missing pixels with the value of the nearest neighboring pixel. These strategies are extensively employed to augment the training dataset and help the model acquire more generalized patterns, rather than depending exclusively on the original visual patterns. As a result, augmentation is essential for enhancing model resilience and reducing overfitting during training.
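The 'nearest' fill behaviour can be illustrated in a few lines. This is a simplified sketch of a horizontal shift only (the study presumably used a standard augmentation pipeline rather than this function): content moves right and the exposed columns are filled by replicating the nearest surviving column.

```python
import numpy as np

# Simplified sketch of a width shift with 'nearest' fill: content moves
# right by dx pixels and the exposed left columns are filled with the
# nearest remaining column (the original first column).
def shift_right_nearest(img: np.ndarray, dx: int) -> np.ndarray:
    out = np.empty_like(img)
    out[:, dx:] = img[:, :img.shape[1] - dx]  # shifted content
    out[:, :dx] = img[:, :1]                  # nearest-neighbour fill
    return out

img = np.arange(16).reshape(4, 4)
shifted = shift_right_nearest(img, 1)
```

Because the fill copies an existing edge column rather than inserting zeros, no artificial dark border is introduced at the lesion boundary.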
2.3 CNN Model

Following the completion of the data preprocessing phase, the study proceeded to the construction of the CNN model. CNNs are extensively utilized in digital image analysis, distinguished by their superior accuracy and robust capacity for identifying intricate patterns. The proposed CNN model is shown in Figure 3. The CNN architecture created in this study is specifically designed for the binary classification of skin lesions. As depicted in Figure 3, the network accepts input images with dimensions of 256×256 pixels. Structurally, the model is composed of two primary segments: the feature extraction layers and the classification layers.

The feature extraction component consists of three convolutional blocks. Each block includes two Conv2D layers, each followed by a Rectified Linear Unit (ReLU) activation function and batch normalization, which helps stabilize the training process and improve convergence speed. Max pooling is implemented after each block to downsample the spatial dimensions and minimize computational complexity, while effectively retaining the most salient features. To further improve generalization and prevent overfitting, Dropout is introduced after each block.

Figure 3. The Proposed CNN Architecture Design

As the input data progresses through the layers, the number of filters increases progressively. This hierarchical architecture facilitates the acquisition of progressively abstract and complex features: initial layers detect elementary patterns, such as edges and textures, whereas deeper layers discern more intricate structures, including lesion morphology and boundaries. Consequently, this multilevel learning capability renders CNNs highly effective for classification tasks, as it eliminates the necessity for manual feature engineering. Following the feature extraction phase, the output is processed by a Global Average Pooling (GAP) layer.
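The GAP operation simply replaces each feature map with its spatial mean; a minimal NumPy sketch:

```python
import numpy as np

# Sketch of Global Average Pooling: an (H, W, C) stack of feature maps
# is reduced to a length-C vector by averaging over the spatial axes.
def global_average_pooling(feature_maps: np.ndarray) -> np.ndarray:
    return feature_maps.mean(axis=(0, 1))

# e.g. a final-block output of 32x32 maps with 128 channels -> 128 values
fmap = np.random.rand(32, 32, 128)
vec = global_average_pooling(fmap)
```

Unlike flattening, this reduction contributes no trainable parameters, which is why GAP is often described as a structural regularizer.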
GAP downsamples the spatial dimensions by calculating the mean value of each feature map. This operation streamlines the data representation and minimizes the number of trainable parameters, effectively serving as a mechanism to mitigate overfitting. The output generated by the GAP layer is subsequently propagated to a Fully Connected layer, followed by an additional Dropout layer to reinforce regularization. The architecture culminates in a sigmoid activation function within the output layer, producing a probability value between 0 and 1 for the final categorization: 0 for benign and 1 for malignant.

2.4 Model Training

The model was trained using k-fold cross-validation, a commonly used statistical method. K-fold cross-validation mitigates undue reliance on a particular data subset and offers a more reliable and stable assessment of the model's generalization capability by utilizing the entire dataset for both training and testing. The dataset is partitioned into k subsets of comparable size, commonly referred to as folds. During each cycle, the model utilizes k-1 folds for training purposes, while the single remaining fold serves as the validation data. This cycle repeats until every fold has functioned as the validation set exactly once. After all iterations, performance metrics such as loss and accuracy are averaged to obtain a final estimate of the model's performance. This method ensures that no data is wasted and that the evaluation takes into consideration the variety in the data distribution.

2.5 Model Evaluation

To validate the model trained with k-fold cross-validation, a confusion matrix was employed to quantify performance deviations between predicted and actual classes. This approach provides a comprehensive assessment of the model's categorization capabilities by breaking down the output into four major counts: True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN).
True Positives denote the number of positive instances accurately classified by the model, whereas False Positives represent negative instances erroneously categorized as positive. Conversely, False Negatives refer to positive cases incorrectly labeled as negative, whereas True Negatives measure the accurately detected negative instances. Based on these values, several evaluation metrics are derived to assess the classification performance.

Accuracy, as shown in Equation 2, measures the overall correctness of the model.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (2)

Precision, defined in Equation 3, measures the ratio of accurately identified positive predictions to the total number of positive predictions.

Precision = TP / (TP + FP)    (3)

Recall, presented in Equation 4, denotes the model's capacity to identify true positive cases.

Recall = TP / (TP + FN)    (4)

F1-Score, in Equation 5, represents the harmonic mean of precision and recall, providing a balance between the two metrics.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)    (5)

These metrics provide detailed insight into the model's strengths and weaknesses in distinguishing between the two classes. For critical tasks like skin cancer screening, Recall is of particular significance, as minimizing false negatives is essential to prevent the adverse consequences of missed diagnoses. Furthermore, the Area Under the Receiver Operating Characteristic Curve (AUC) was computed to assess the overall effectiveness of each model in differentiating between the benign and malignant categories.

3. Results and Discussions

Following the research stages outlined previously, this study was fully executed, ranging from data acquisition to performance assessment.
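Equations 2-5 can be computed directly from the four confusion-matrix counts; a short sketch with illustrative counts (not the paper's results):

```python
# Sketch of Equations 2-5 from confusion-matrix counts.
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only: few false positives (high precision)
# but a sizeable number of false negatives (weaker recall).
m = metrics(tp=900, tn=1000, fp=30, fn=150)
```

The example mirrors the trade-off discussed later in the paper: lowering FP raises precision, while any undetected malignant case (FN) pulls recall down.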
The process began by gathering dermoscopic images from the ISIC Archive, which were utilized as the primary dataset to train and validate the CNN model. A total of 10,605 images were used in this study, consisting of 5,500 images labeled as benign and 5,105 images labeled as malignant. This relatively balanced distribution ensured that the model could learn effectively from both classes and avoid bias toward one category. After collecting the data, all images were resized to a consistent dimension of 256×256 pixels to standardize the input dimensions and facilitate efficient training. The next step was image normalization, in which pixel values originally ranging from 0 to 255 were rescaled to a range between 0 and 1. The purpose of this normalization was to ensure uniform data scaling, allowing the neural network to train more stably and converge more quickly.

Figure 4. The Pixel Distribution of RGB Channels for Benign and Malignant Images Before and After Normalization

Figure 4 illustrates the distribution of pixel values in both benign and malignant image sets before and after normalization. The graphs show that the overall distribution patterns remain similar, but the values are compressed into a smaller scale. Following normalization, data augmentation was applied to enrich the dataset with varied image samples and improve the model's ability to generalize. Augmentation techniques used in this study included rotating the images and shifting them horizontally and vertically by 10% of the image size. These transformations were applied to increase dataset variability while preserving lesion structure.

Figure 5. Original and Augmented Image Samples in the Benign Class

Figure 6. Original and Augmented Image Samples in the Malignant Class

Figures 5 and 6 display sample images from the benign and malignant classes, respectively, before and after augmentation. The augmented samples show realistic variations in orientation and position, helping the model become more robust in detecting lesions under different conditions.
After preprocessing was completed, this study proceeded with the implementation of two CNN model architectures for analysis. The first architecture is the proposed model, which combines Max Pooling and Global Average Pooling, as summarized in Table 1.

Table 1. Model summary of the proposed CNN architecture using a combination of Max Pooling and Global Average Pooling
Block 1: Conv2D ×2 (32 filters, 3×3, ReLU) | Batch Normalization | MaxPooling2D (2×2) | Dropout (0.2)
Block 2: Conv2D ×2 (64 filters, 3×3, ReLU) | Batch Normalization | MaxPooling2D (2×2) | Dropout (0.3)
Block 3: Conv2D ×2 (128 filters, 3×3, ReLU) | Batch Normalization | MaxPooling2D (2×2) | Dropout
Classifier: GlobalAveragePooling2D | Dense (256, ReLU, L2) | Dropout | Dense Output (1, Sigmoid)

The proposed architecture is organized into sequential convolutional blocks. The initial block comprises two Convolutional (Conv2D) layers equipped with 32 filters: a first convolutional layer followed by batch normalization, then a second convolutional layer with the same configuration, again followed by batch normalization to further improve stability. After these two convolution layers, a max pooling layer is used to reduce the spatial dimensionality of the extracted features, together with a dropout layer (rate 0.2) to reduce overfitting. The second block expands the network depth with two convolutional layers of 64 filters, each followed by a batch normalization layer to stabilize the activations. After that, a max pooling layer with a pool size of 2×2 is again applied, and a dropout layer with a rate of 0.3 is integrated to mitigate the risk of overfitting. The third block features a pair of convolutional layers equipped with 128 filters and a 3×3 kernel size, both utilizing ReLU activation and succeeded by a Batch Normalization layer. After these two convolution layers, a max pooling layer with a pool size of 2×2 is used to further reduce the spatial dimensionality, followed by a dropout layer.
Once the convolution and pooling process is complete, the model uses a Global Average Pooling (GAP) layer to produce a one-dimensional vector of size 128. This process helps to capture important information from all the extracted features while reducing the spatial dimensions. Next, a dense layer with 256 units uses the ReLU activation function and L2 regularization to strengthen the feature representation, followed by a dropout layer. Finally, the model has an output layer with one unit that uses a sigmoid activation.

To compare performance, the second model architecture is built using only Max Pooling, without the GAP component. The structure follows a similar convolutional block pattern to the proposed model, as detailed in Table 2. The baseline model starts with a first convolutional layer (Conv2D) of 32 filters with a 3×3 kernel size, using the ReLU activation function. This layer is followed by batch normalization to stabilize the activation distribution during training. The second convolutional layer also has 32 filters and a 3×3 kernel size, followed by batch normalization. After that, a max pooling operation (MaxPooling2D) with a 2×2 pool size downsamples the spatial features. In the second block, the network implements a convolutional layer containing 64 filters with 3×3 kernels, immediately followed by batch normalization for improved training stability. This is succeeded by another identical convolutional configuration (64 filters, 3×3 kernel) with batch normalization. The block concludes with another 2×2 max pooling operation for further spatial reduction. In the third block, the network doubles the filter count with a convolutional layer using 128 filters (3×3 kernel) paired with batch normalization. This is followed by an identical convolutional setup (128 filters, 3×3 kernel), also with batch normalization.
The block finishes with a final 2×2 max pooling layer to further compress the spatial dimensions. Upon completion of the convolution and pooling stages, the feature maps are flattened into a one-dimensional vector. The resulting vector is then fed into a dense layer comprising 256 units, utilizing the ReLU activation function alongside L2 regularization to improve the robustness of the extracted features. Subsequently, a dropout rate of 0.5 is implemented to mitigate overfitting. The architecture concludes with an output layer utilizing a sigmoid activation function to execute binary classification between malignant and benign cases.

Table 2. Model summary of the baseline CNN architecture using Max Pooling only
Block 1: Conv2D ×2 (32 filters, 3×3, ReLU) | Batch Normalization | MaxPooling2D (2×2) | Dropout
Block 2: Conv2D ×2 (64 filters, 3×3, ReLU) | Batch Normalization | MaxPooling2D (2×2) | Dropout
Block 3: Conv2D ×2 (128 filters, 3×3, ReLU) | Batch Normalization | MaxPooling2D (2×2) | Dropout
Classifier: Flatten | Dense (256, ReLU, L2) | Dropout (0.5) | Dense Output (1, Sigmoid)

Both CNN models were trained and tested using k-fold cross-validation. In this study, the parameter k was set to 5, meaning that the dataset was partitioned into five equal-sized folds. During each iteration, four folds were utilized for training, while the remaining fold functioned as validation data. The dataset was split proportionally to preserve the original distribution between benign and malignant classes across all folds, ensuring that each subset reflected a balanced representation. Each fold was trained for 50 epochs using preprocessed images from both classes. The training aimed to evaluate the consistency and stability of the models' performance over multiple subsets of the data.
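Putting the pieces of Table 1 together, the proposed Max Pooling + GAP stack can be sketched in Keras as follows. This is a reconstruction, not the authors' code: the third block's dropout rate, the L2 factor, and the classifier dropout rate are not stated in the text and are assumed here.

```python
from tensorflow.keras import layers, models, regularizers

def conv_block(x, filters, drop_rate):
    # Two Conv2D + BatchNorm pairs, then MaxPooling and Dropout (Table 1).
    for _ in range(2):
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D((2, 2))(x)
    return layers.Dropout(drop_rate)(x)

inputs = layers.Input(shape=(256, 256, 3))
x = conv_block(inputs, 32, 0.2)
x = conv_block(x, 64, 0.3)
x = conv_block(x, 128, 0.4)              # rate assumed (truncated in the source)
x = layers.GlobalAveragePooling2D()(x)   # 128-value vector, as in the text
x = layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-2))(x)  # L2 factor assumed
x = layers.Dropout(0.5)(x)               # rate assumed
outputs = layers.Dense(1, activation="sigmoid")(x)
model = models.Model(inputs, outputs)
```

Swapping the GlobalAveragePooling2D layer for a Flatten layer reproduces the baseline classifier of Table 2.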
To address the slight class imbalance, class weighting was applied during training by calculating weights based on the frequency of each class in the training data for every fold. This approach helped reduce model bias toward the majority class and improved sensitivity to the minority class. Figure 7 depicts the accuracy and loss curves of the combined model for folds 1, 3, and 5.

Based on the accuracy and loss graphs for each fold, several observations can be made. In Fold 1, training accuracy shows a consistent upward trend from 0.80. Validation accuracy fluctuates in the early epochs but stabilizes toward the end of training. Training loss steadily decreases from 0.8 to 0.2, while validation loss is more erratic but ultimately stabilizes. For Fold 3, training accuracy increases from 0.85. Validation accuracy begins at 0.60 and stabilizes around 0.85 by the end of training. Training loss drops to around 0.25, while validation loss fluctuates with some spikes but eventually settles. Finally, in Fold 5, training accuracy rises from 0.75. Validation accuracy exhibits significant fluctuation, beginning at 0.50 and peaking around 0.90, indicating substantial instability. Training loss decreases steadily, while validation loss presents sharp spikes, suggesting periods of instability.

Overall, the training performance shows the model's good ability to capture patterns from the training data, with training accuracy increasing and training loss decreasing consistently across all folds. Validation accuracy is generally stable at the end of training despite early fluctuations. However, the significant fluctuations in validation loss indicate a potential overfitting or instability problem during training.
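The 5-fold scheme with per-fold class weights described in this section can be sketched in plain Python; this is an illustration of the bookkeeping, not the authors' code, and the inverse-frequency weighting formula is an assumption (it mirrors the common "balanced" scheme):

```python
import random

# Sketch: 5-fold index splits plus per-fold inverse-frequency class weights.
def kfold_indices(n_samples, k=5, seed=42):
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

def class_weights(labels):
    # weight = total / (n_classes * class_count): the minority class gets > 1
    counts = {c: labels.count(c) for c in set(labels)}
    total, k = len(labels), len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

labels = [0] * 5500 + [1] * 5105          # 0 = benign, 1 = malignant
splits = list(kfold_indices(len(labels)))
fold_weights = [class_weights([labels[j] for j in train])
                for train, _ in splits]
```

Each sample lands in the validation set exactly once across the five folds, and the malignant (minority) class receives a weight slightly above 1, as the text describes.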
The training and testing performance of the baseline CNN model, which uses only Max Pooling without GAP, is shown in the accuracy and loss graphs for folds 1, 3, and 5 in Figure 8. Based on the graphs, the following observations can be made for each fold. In Fold 1, training accuracy improves steadily from approximately 0.70 to around 0.90 as the number of epochs increases. Validation accuracy follows a similar trend, although it exhibits significant fluctuations early in training, and stabilizes toward the final epochs. The training loss decreases consistently from about 2.0 to near 0, and validation loss shows a similar pattern with slight variations. For Fold 3, training accuracy increases from 0.70, with validation accuracy following a similar path. Although there are notable fluctuations during the early training stages, validation accuracy eventually stabilizes. Training loss decreases steadily from approximately 2.0 to near 0, while validation loss initially fluctuates but stabilizes toward the end of the training process. Lastly, for Fold 5, training accuracy improves consistently from 0.70. Validation accuracy shows significant fluctuations throughout training but ultimately settles. Training loss again shows a steady decline from 2.0 to nearly 0, while validation loss displays irregular variations, indicating instability during certain phases of training.

Figure 7. Accuracy and loss graphs of the proposed CNN model (Max Pooling + GAP) for folds 1, 3, and 5.

Figure 8. Accuracy and loss graphs of the baseline CNN model (Max Pooling Only) for folds 1, 3, and 5.
The results indicate that the Max Pooling-only model is capable of achieving relatively high training and validation accuracy, similar to the proposed model. However, the presence of more pronounced fluctuations in validation loss across all folds may suggest that this model is more susceptible to instability or potential overfitting, particularly when compared to the more stabilized performance of the proposed model.

Comparing the training and testing curves of the two models, the combined Max Pooling and GAP model shows fluctuations in validation loss at the beginning and middle of training, after which it tends to stabilize. The Max Pooling-only model, in contrast, shows rapid stabilization after high initial fluctuations, with validation accuracy that eventually approaches training accuracy and loss that remains low after the first few epochs. This is a good outcome: after the initial fluctuations, the model quickly stabilizes and maintains good performance. In line with the observation in Prechelt's study, the validation error curve often has several minimum points before reaching the best result. The combined model experienced more fluctuations before stabilizing, while the Max Pooling-only model stabilized faster after a poor training start. Overall, the combined model had a more variable learning pattern but remained stable at the end of training, while the Max Pooling-only model stabilized faster after high initial fluctuations. This shows that the Max Pooling model does not suffer from severe overfitting, but instead achieves a balance between training and generalization.
This is due to the use of other regularization techniques such as Dropout and Batch Normalization, and because the training data is diverse and large enough that overfitting can be controlled even with Max Pooling alone. Table 3 shows a comparative analysis of the average evaluation outcomes of the proposed model (Max Pooling + GAP) and the baseline model (Max Pooling only).

Table 3. Comparison of Average Evaluation Results Between Models
CNN Model         | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%)
Max Pooling + GAP | 91.68        | 96.35         | 85.99      | 90.87        |
Max Pooling Only  | 91.79        | 93.60         | 89.04      | 91.26        |

Based on the evaluation results in Table 3, an interesting trade-off between the two models becomes apparent. The Max Pooling Only model demonstrates superior performance across several general metrics: it achieved a slightly higher accuracy (91.79%), recall (89.04%), F1-Score (91.26%), and a notably higher AUC. This superior AUC value, visualized in Figure 9, indicates that the Max Pooling Only architecture is generally more robust at discriminating between benign and malignant lesions across all thresholds. However, the primary contribution and most critical advantage of the proposed model (Max Pooling + GAP) lies in its significantly superior precision (96.35% vs. 93.60%). This sharp increase in precision highlights that the combined model is effectively optimized for a more specific and clinically vital purpose: minimizing false positives. In medical applications, where a misdiagnosis can lead to patients undergoing painful, costly, and unnecessary biopsies, the ability of a model to be highly trustworthy when predicting a positive case is an invaluable asset. Therefore, this study demonstrates that while not outperforming on all metrics, the proposed combined model offers a specialized solution that is safer and more reliable for clinical scenarios where precision is the highest priority.

Figure 9. Comparison of ROC Curves for the combined model and the Max Pooling-only model

To further assess the models' behavior, confusion matrices were generated for each fold.
These matrices visualize the number of correct and incorrect predictions for the benign and malignant categories and help interpret where the models performed well or made errors. The confusion matrices for the proposed model and the baseline model are presented in Table 4 and Table 5, respectively. The comparison of the confusion matrices of the two models reveals a significant difference in their skin cancer detection patterns. In the combination model (Max Pooling GAP), the True Positive (TP) value indicates the number of malignant skin cancer cases correctly detected by the model. For example, in fold 1 the value 927 indicates that 927 malignant cancer samples were correctly identified. This TP value varies across folds, with fold 1 having a lower TP than the other folds, indicating that detection consistency depends on the distribution of data within each fold. The combined model showed superiority in terms of True Negatives (TN), such as in fold 1 with a value of 1016, meaning 1016 benign skin cases were correctly categorized as benign. A significant advantage of the combination model is its consistently lower False Positive (FP) values across folds. For example, in fold 2 the combined model produced only 27 FP cases (benign cases misclassified as malignant), while the Max Pooling-only model produced 53. This lower FP count indicates better precision, in line with the higher precision value in the evaluation table (96.35% vs 93.60%). However, the trade-off is evident in the False Negative (FN) values, where the combined model has higher numbers in most folds. In fold 2, there were 169 FN cases (undetected malignant cases), compared to only 104 for the Max Pooling-only model.
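The FP/FN trade-off described above maps directly onto the precision and recall metrics. The sketch below uses hypothetical single-fold counts, chosen only for illustration and not taken from the study's confusion matrices, to show how a model with fewer FP attains higher precision while its extra FN cases lower recall.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # how trustworthy a "malignant" prediction is
    recall    = tp / (tp + fn)   # how many malignant cases are actually caught
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: model A is conservative (few FP, more FN),
# model B is sensitive (more FP, few FN) -- mirroring the trade-off above.
acc_a, prec_a, rec_a, f1_a = metrics_from_counts(tp=900,  fp=30, fn=150, tn=1000)
acc_b, prec_b, rec_b, f1_b = metrics_from_counts(tp=1000, fp=70, fn=50,  tn=960)
```

With these assumed counts, the conservative model wins on precision and the sensitive model wins on recall, which is exactly the pattern reported for the combined and Max Pooling-only models.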
This higher FN value indicates that the combined model tends to be more conservative in classifying samples as malignant, which correlates with its lower recall (85.99% vs 89.04%).

Table 4. Confusion matrix for the Max Pooling GAP model, per fold

Table 5. Confusion matrix for the Max Pooling-only model, per fold

In the Max Pooling-only model, a different pattern emerges, with more consistent and generally higher TP values across all folds. For example, in fold 1 this model correctly identified 1034 malignant cases, compared to 927 for the combined model. TN values are also quite stable, as in fold 3 with 909 correctly identified benign cases. However, the main drawback of this model is its higher FP count; in fold 4, for instance, 73 benign cases were incorrectly categorized as malignant, which in a clinical context could lead to unnecessary medical intervention. These differences explain why the Max Pooling-only model achieves better recall, owing to its ability to detect more positive cases, but at the cost of lower precision due to higher FP. Its slightly higher F1-Score (91.26% vs 90.87%) indicates a marginally better balance between precision and recall, although the difference is not significant. Performance differences among the folds reflect the models' sensitivity to specific data distributions. This provides a comprehensive assessment of robustness across various data segments, demonstrating the primary advantage of using k-fold cross-validation for evaluation.

Conclusions

Based on the results, several conclusions can be drawn regarding the research objectives that were initially formulated. The proposed CNN model that combines Max Pooling and Global Average Pooling (GAP) demonstrated solid performance in skin cancer prediction, achieving an overall accuracy of 91.68%.
This level of accuracy indicates the model's strong capability to correctly classify the majority of skin lesion cases, whether benign or malignant. These findings confirm that a CNN architecture incorporating both pooling techniques is effective and applicable for digital image-based skin cancer detection systems. One of the most notable strengths of the proposed model is its precision, which reached 96.35%, outperforming the baseline model. This result highlights the model's ability to significantly reduce false-positive cases, where benign lesions are incorrectly classified as malignant. Such precision is highly valuable in clinical settings, as it can help avoid overdiagnosis and unnecessary patient anxiety. While the proposed model is slightly outperformed on several general metrics such as accuracy and AUC, this is a justified trade-off for the substantial and clinically vital gain in precision. This study demonstrates that incorporating Global Average Pooling (GAP) is not merely about boosting a single accuracy metric but about optimizing the model for diagnostic reliability and safety. Therefore, it can be concluded that the combined pooling approach offers an intelligent and valuable trade-off: while it may not yield the highest absolute accuracy, it contributes more meaningfully by enhancing precision, which is a paramount priority in medical applications where diagnostic errors can have serious consequences. However, this study has some limitations. The analysis was conducted on a single dataset (ISIC), and the dataset contained a slight class imbalance which, despite the use of class weighting and data augmentation, may not be fully resolved.
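The class weighting mentioned above is commonly computed as inverse-frequency ("balanced") weights, so that each minority-class sample contributes proportionally more to the loss. The sketch below illustrates the standard formula with hypothetical benign/malignant counts, not the actual ISIC class distribution used in this study.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: w_c = N / (n_classes * n_c).
    A class rarer than average gets weight > 1, a common class < 1."""
    total = sum(counts.values())
    k = len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}

# Hypothetical counts reflecting a slight benign/malignant imbalance
weights = balanced_class_weights({"benign": 5500, "malignant": 4500})
```

Passing such a dictionary to the training loop upweights malignant samples slightly, which mitigates but, as noted above, does not fully eliminate the effect of class imbalance.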
Furthermore, as our contribution focuses on the analysis of existing components, future work can be directed towards enhancing the originality and impact of the architectural design. Promising avenues for future research include proposing a novel hybrid pooling mechanism or an adaptive selection method that combines the strengths of different pooling strategies dynamically; applying and evaluating the architecture on a multi-class classification problem (e.g., distinguishing melanoma, carcinoma, and nevus) to assess its robustness on more complex tasks; and performing validation on an external dataset to rigorously test the model's generalization capabilities. This study provides a foundational analysis that opens the door for these future investigations.

Acknowledgements

Fitra Salam S. Nagalay gratefully acknowledges the guidance and support of Dr. Chairani, S.Kom., M.Kom., from the Department of Informatics, Institute of Informatics and Business Darmajaya, whose expertise in artificial intelligence contributed significantly to the direction and depth of this research. The author also appreciates the assistance of academic peers and the institutional support that made the completion of this study possible. Special thanks are extended to the ISIC Archive for providing the publicly accessible dataset used in this research.

References