International Journal of Electrical and Computer Engineering (IJECE) Vol. No. October 2025, pp. ISSN: 2088-8708. DOI: 10.11591/ijece.

Comparative analysis of convolutional neural network architecture for post forest fire area classification based on vegetation image

Ahmad Bintang Arif1, Imas Sukaesih Sitanggang1, Hari Agung Adrianto1, Lailan Syaufina2
1 School of Data Science, Mathematics and Informatics, IPB University, Bogor, Indonesia
2 Department of Silviculture, Faculty of Forestry and Environment, IPB University, Bogor, Indonesia

Article history: Received Aug 26, 2024; Revised Apr 17, 2025; Accepted Jul 3, 2025

Keywords: Architecture comparison; Convolutional neural network; Field imagery; Forest and land fire

ABSTRACT
This study presents a comparative analysis of seven convolutional neural network (CNN) architectures (MobileNetV2, VGG16, VGG19, LeNet5, AlexNet, ResNet50, and InceptionV3) for classifying post-forest fire areas using field-based vegetation imagery. A total of 56 models were evaluated through combinations of batch size, input size, and optimizer. The results show that MobileNetV2, VGG16, and VGG19 outperformed the other models, with validation accuracies exceeding 88%. MobileNetV2 emerged as the most balanced model, achieving 96% accuracy with the lowest model size and training time, making it ideal for resource-constrained applications. This study highlights the potential of CNN-based classification using mobile field imagery, offering an efficient alternative to costly and condition-dependent satellite or drone data. The findings support real-time, localized identification of burned areas after forest fires, providing actionable insights for prioritizing recovery areas and guiding ecological restoration and land rehabilitation strategies.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Imas Sukaesih Sitanggang
School of Data Science, Mathematics and Informatics, IPB University
Meranti Street, Wing 20 Level 5,
IPB Campus Darmaga, Bogor 16680, Indonesia
Email: imas.sitanggang@apps.

INTRODUCTION
Indonesia has a long-standing history of recurring forest fires, which have become an annual occurrence, severely affecting the environment and local communities. Forest fires have been a significant issue since as early as the 1980s. These fires have contributed to air pollution, health issues, biodiversity loss, and the degradation of ecosystems. Analyzing post-fire areas is therefore crucial for mitigation, restoration, and sustainable land management.

Recent studies have introduced convolutional neural networks (CNNs) as a promising solution for wildfire damage classification due to their powerful feature extraction and image recognition capabilities. Most existing works utilize satellite and drone imagery to classify post-forest fire areas using CNNs. For example, one study used drone images with CNN-based models such as VGG-16 and FFireNet, achieving high accuracy levels of 96% and 98.42%, respectively. Other researchers utilized Sentinel-2 satellite images to classify images into five categories ("field," "forest," "smoke," "urban," and "burned") using ResNet and Xception models, with Xception achieving the highest accuracy of 96.7%. Despite their success, these approaches still depend on expensive image acquisition and varying image resolutions, and are limited by spatial or weather-related constraints.

To overcome these challenges, field-based or ground-level imagery has emerged as a viable alternative. Field images captured using mobile phones provide detailed vegetation and soil information and are less affected by cloud cover or atmospheric distortions. A previous study applied MobileNetV2 on field imagery from Jambi Province, achieving 77.7% accuracy. However, research in this direction remains limited, particularly in optimizing CNN architectures and hyperparameter combinations for such datasets.
This study addresses this gap by systematically evaluating seven CNN architectures (MobileNetV2, VGG16, VGG19, LeNet5, AlexNet, ResNet50, and InceptionV3) on a dataset of post-fire vegetation images collected in the field. Hyperparameter tuning involving batch size, input size, and optimizer was applied to determine their influence on model performance. The goal of this study is to identify an architecture that balances high classification accuracy with computational efficiency, which is especially relevant for real-time applications in remote or resource-limited environments. The key contributions of this work include a comparative evaluation of CNN architectures specifically tailored for post-forest fire classification in field conditions, an analysis of how batch size, input size, and optimizer affect performance, and valuable insights into the viability of using mobile-captured field images as a cost-effective and practical alternative to traditional remote sensing methods.

METHOD
The methodology of this research comprises five main stages: data collection, preprocessing, data splitting, modeling, and evaluation, as illustrated in Figure 1. These stages are adapted from widely accepted image classification workflows, and each is carefully designed to ensure high-quality input data, optimal model performance, and rigorous evaluation. The overall approach integrates field-collected image data with deep learning-based classification to assess post-fire land conditions.

Figure 1. Research stages

Data collection
This study utilized field images captured using mobile phones from four post-fire locations in Jambi Province, namely Pematang Rahim, Pematang Lumut, Pelayangan, and Tenam. Field imagery was chosen due to its high resolution, clearer visual details, immunity to atmospheric disturbances, and lower cost compared to satellite or drone imagery. The collected images were initially categorized into three factors: area, soil, and vegetation.
However, only vegetation-related images were used for the classification model, as vegetation plays a significant role in assessing post-fire areas. In total, 239 images were used, divided into two classes: burned area and unburned area. Sample images from each class are presented in Figures 2 and 3.

Figure 2. Burned area image
Figure 3. Unburned area image

Int J Elec & Comp Eng, Vol. No. October 2025: 4723-4731

Preprocessing data
Preprocessing is a critical step in deep learning pipelines to improve model learning efficiency and output quality. In this study, three key preprocessing operations were applied. First, resizing was performed by standardizing all images to dimensions of either 192×192 or 224×224 pixels. This not only reduced computational cost and memory usage but also ensured consistent input dimensions across models. While resizing can introduce information loss, this was mitigated by selecting relatively high target resolutions and preserving aspect ratios where possible. Second, normalization was applied by scaling pixel values to the [0, 1] range, which facilitated faster and more stable convergence during training, particularly when using gradient-based optimizers. Third, data augmentation techniques such as random rotation, flipping, and zooming were employed to increase data variability, reduce overfitting, and improve model generalization.

Data partition
Before training, the dataset was divided into three subsets: training, validation, and test sets. The training set was used to fit the model, the validation set was used to tune the model and prevent overfitting during training, and the test set was reserved for final performance evaluation on unseen data. This partitioning ensures that the model has sufficient data to learn effectively while also being properly validated and tested. The dataset was split with the following proportions: 80% for training (burned and 80 unburned images)
, 10% for validation (burned and 10 unburned images), and 10% for testing (burned and 10 unburned images).

Model development
This study explored two approaches to CNN model development: transfer learning using pretrained models and training from scratch. The pretrained models included MobileNetV2, VGG-16, VGG-19, ResNet-50, and InceptionV3, all of which were pretrained on the ImageNet dataset. Leveraging transfer learning allows for improved performance on small datasets and faster convergence due to the reuse of learned feature representations. In parallel, two CNN models, LeNet-5 and AlexNet, were trained from scratch. These architectures serve as baseline models and enable comparison of shallow versus deep feature extractors, particularly in the context of forest fire classification, which lacks dedicated pretrained models.

To optimize model performance, hyperparameter tuning was conducted on three primary parameters: batch size (16 and 32), input size (192×192 and 224×224), and optimizer (Adam and RMSProp). Batch size determines how many samples are processed at each training step, while input size defines the image dimensions provided to the CNN model. Optimizers, which help find the optimal model parameters by minimizing the loss function through gradient computation, significantly impact training speed and convergence. Each optimizer also affects the learning speed and convergence of a model. These hyperparameter combinations resulted in 8 unique schemes, as shown in Table 1. Hyperparameter tuning is critical to finding the most effective configuration for achieving accurate and efficient classification of post-fire areas. All models were trained for 50 epochs, and other training parameters, such as learning rate and dropout rate, were kept at their default values to isolate the impact of the selected hyperparameters.

Model training and evaluation were conducted on a local machine equipped with an AMD Ryzen 5 5600X CPU, 64 GB RAM, and a 512 GB SSD. The software environment included Python 3.
8 along with essential libraries such as TensorFlow 2, Scikit-learn, Pillow, NumPy, Seaborn, Pandas, and Matplotlib.

Table 1. Hyperparameter schemes in building the models
Scheme | Batch size | Input size | Optimizer
1      | 32         | 192×192    | Adam
2      | 32         | 224×224    | Adam
3      | 16         | 192×192    | Adam
4      | 16         | 224×224    | Adam
5      | 32         | 192×192    | RMSProp
6      | 32         | 224×224    | RMSProp
7      | 16         | 192×192    | RMSProp
8      | 16         | 224×224    | RMSProp

Evaluation and comparison
The performance of each model was evaluated through quantitative metrics, training metrics, and visual assessment techniques. Quantitative metrics involved the calculation of accuracy, precision, recall, and F1-score based on the confusion matrix. These metrics provide a comprehensive assessment of classification performance, especially in scenarios involving class imbalance. Accuracy, in particular, was computed as

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives. To assess training dynamics and model generalization, training metrics such as training and validation accuracy, training time (in seconds), and model size (in megabytes) were also recorded. Visual analysis was conducted using confusion matrices to observe misclassification trends, along with accuracy and loss plots throughout the training epochs to evaluate learning progress and detect potential overfitting or underfitting. The overall evaluation aimed to identify not only the best-performing CNN architecture but also the optimal combination of hyperparameters by considering both predictive performance and computational efficiency.

RESULTS AND DISCUSSION
Model training results
Based on the training process, a total of 56 CNN models were generated from 8 experiments conducted across 7 different architectures.
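The confusion-matrix metrics used in the evaluation stage can be computed in a few lines; the sketch below is illustrative only (the function name and the example counts are our own, not from the paper):

```python
def classification_metrics(tp, tn, fp, fn):
    # Accuracy: fraction of all predictions that are correct
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Precision: fraction of predicted positives that are truly positive
    precision = tp / (tp + fp)
    # Recall: fraction of actual positives that were detected
    recall = tp / (tp + fn)
    # F1-score: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a small burned/unburned test split
m = classification_metrics(tp=10, tn=9, fp=1, fn=2)
```

Reporting all four metrics together, rather than accuracy alone, matters here because the burned and unburned classes are not necessarily balanced.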
The performance summary of each architecture is presented in Table 2, which includes metrics such as training accuracy, validation accuracy, training time, model size, and the corresponding hyperparameter combinations. Each architecture exhibited varying performance depending on the hyperparameter combinations applied.

Table 2. Summary of model performance for each architecture (MobileNetV2, VGG16, VGG19, LeNet5, AlexNet, ResNet50, and InceptionV3), reporting training and validation accuracy, training and validation loss, training time (s), model size (MB), and the corresponding scheme

From Table 2, it can be observed that VGG16, VGG19, and MobileNetV2 demonstrated the best performance among the seven CNN architectures evaluated. MobileNetV2 achieved the highest validation accuracy of 96%, with a relatively fast training time and the smallest model size, making it highly efficient for deployment on resource-constrained devices. Meanwhile, VGG16 recorded the highest training accuracy at 98%, although its validation accuracy was lower than that of MobileNetV2 and VGG19. VGG19 also performed well, with a validation accuracy of 96% and a low validation loss, despite its relatively large model size. In contrast, architectures such as AlexNet and ResNet50 showed poor performance, with low validation accuracies of 58% and 63%, respectively. These results suggest that despite their architectural depth, these models (AlexNet and ResNet50) may be less effective in learning from the limited, ground-level imagery used in this study.

Considering validation accuracy, training time efficiency, and model size, MobileNetV2 is the most balanced model, while VGG19 and VGG16 stand out in terms of accuracy. To further analyze model performance, training history plots and confusion matrices of the three models are presented. The performance of MobileNetV2, VGG19, and VGG16 is illustrated in Figures 5, 6, and 7.

Figure 5. MobileNetV2 performance result

Figure 6.
VGG-19 performance result

Figure 7. VGG-16 performance result

Figures 5, 6, and 7 present the training history and confusion matrices of MobileNetV2, VGG19, and VGG16. These visualizations indicate that all three models exhibited steadily increasing accuracy and decreasing loss over the epochs, confirming stable learning behavior. However, MobileNetV2 shows more fluctuation during training, suggesting that its performance, although efficient, may be more sensitive to data variations or insufficient data. In summary, VGG19 produced the most accurate predictions, while MobileNetV2 offered the best trade-off between performance and efficiency. This reinforces prior findings which state that MobileNetV2 was designed to balance accuracy and computational demands through hyperparameter flexibility, making it ideal for field-based, low-resource scenarios. These results indicate that deep learning models, especially lightweight architectures like MobileNetV2, can effectively distinguish between burned and unburned areas based on field imagery, supporting studies of CNN applicability in post-fire assessment.

The effect of hyperparameters on model performance
In this study, several hyperparameters were tested on CNN models to evaluate their impact on performance. Each architecture was assigned a numerical label for ease of reference (1=MobileNetV2, 2=VGG16, 3=VGG19, 4=LeNet5, 5=AlexNet, 6=ResNet50, and 7=InceptionV3). The hyperparameters tested include batch size, input size, and optimizer. The batch sizes tested were 16 and 32, input sizes were 192×192 and 224×224, and the optimizers used were Adam and RMSProp. The following sections discuss each hyperparameter in detail.

Batch size
Batch size affects several aspects of training, including convergence time, training stability, and the model's ability to generalize to unseen data. For instance, smaller batch sizes often allow faster computations but may require more iterations to converge compared to larger batch sizes.
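The iteration count behind this trade-off is simple arithmetic: the number of gradient updates per epoch is the training-set size divided by the batch size, rounded up. A minimal sketch (the training-set size of 191 images is an assumption, taken as roughly 80% of the 239 collected images):

```python
import math

# Hypothetical training-set size: roughly 80% of the 239 collected images
n_train = 191

def steps_per_epoch(n_samples, batch_size):
    # Each epoch must cover every sample once, so the number of gradient
    # updates equals the sample count divided by the batch size, rounded
    # up to account for the final partial batch.
    return math.ceil(n_samples / batch_size)

for bs in (16, 32):
    print(bs, steps_per_epoch(n_train, bs))
```

Doubling the batch size from 16 to 32 halves the number of updates per epoch, which is why the larger batch size tends to shorten training time even though each step processes more data.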
To further explore the effect of batch size on model performance, the results of the experiments are presented in Table 3. Based on Table 3, the use of different batch sizes had a significant impact on both validation accuracy and training time across various CNN architectures. With a batch size of 16, models such as MobileNetV2 (1) performed exceptionally well, achieving validation accuracies of up to 1.00 in certain schemes. However, other models like VGG16 (2) and VGG19 (3) showed more variability, with validation accuracies tending to be lower. Conversely, a batch size of 32 generally produced more consistent validation accuracy. For instance, InceptionV3 (7) achieved near-perfect validation accuracy in several experiments. However, VGG16 and VGG19 showed a decrease in accuracy when using the larger batch size.

In terms of training time, batch size 32 generally reduced the overall training time compared to batch size 16, as the model could process more data in a single iteration, thus requiring fewer iterations to complete the training process. Nevertheless, smaller batch sizes tended to result in higher validation accuracy, at the cost of longer training time. For architectures like AlexNet and ResNet50, validation accuracy remained relatively low under both batch size settings, suggesting that batch size had less influence on these models within the context of this dataset. In summary, smaller batch sizes yielded better accuracy for certain models, whereas larger batch sizes were more efficient in terms of training time but could potentially compromise accuracy. This trend is consistent with findings from previous research, where smaller batch sizes yielded higher accuracy compared to larger batch sizes.

Table 3. Model performance results for each batch size (validation accuracy and training time in seconds for each architecture and scheme under batch sizes 16 and 32)

Input size
Input size, or image resolution, influences the extent to which spatial features within an image can be effectively extracted by a model. To examine the impact of input size variations on model performance, experiments were conducted using input sizes of 192×192 and 224×224 pixels. This allowed for an assessment of how input size affects both validation accuracy and training time across different CNN architectures. The experimental results are summarized in Table 4.

Overall, Table 4 shows that an input size of 224×224 tends to produce higher and more consistent validation accuracy compared to 192×192, particularly in architectures such as MobileNetV2, VGG16, and VGG19. For example, in MobileNetV2 (1), the highest validation accuracy of 1.00 was achieved with a 224×224 input size during experiment 8, whereas with the 192×192 input size, the highest accuracy reached only 0.96 in experiment 5. This suggests that increasing image resolution enables the model to better recognize patterns and extract features. This is in line with findings from studies which state that larger image resolutions tend to improve model performance, though they also increase computational time and resource consumption, leading to a trade-off between computational efficiency and recognition accuracy. However, other studies have shown that increasing image size does not necessarily improve deep learning model performance, as it highly depends on the complexity of the images and the problem being solved. In general, increasing input size does not guarantee better accuracy; in some cases, smaller input sizes can yield better performance and vice versa. This is because each dataset may have an optimal input size that yields the best results, and accuracy can even decrease if image size exceeds a certain threshold.
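The computational side of this trade-off scales roughly with the number of input pixels; a quick, purely illustrative calculation shows the overhead of the larger input size used in these experiments:

```python
# Pixel counts for the two input sizes evaluated in this study
small = 192 * 192
large = 224 * 224

# The larger input carries roughly 36% more pixels per image, a rough
# proxy for the extra memory and convolution work in the early layers.
ratio = large / small
print(round(ratio, 2))
```

This back-of-envelope ratio is consistent with the observation below that 224×224 models took longer to train than their 192×192 counterparts.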
In terms of training time, larger input sizes tend to increase training duration. This is observable across several architectures, where models trained on 224×224 inputs required more time than those trained on 192×192. This finding is consistent with prior work, which notes that larger input sizes invariably lead to longer training times.

Table 4. Model performance results for each input size (validation accuracy and training time in seconds for each architecture and scheme under input sizes 192×192 and 224×224)

Optimizer
In this study, two types of optimizers were employed: Adam and RMSProp. Both are widely used in deep learning model training due to their adaptive learning rate capabilities. The impact of each optimizer on validation accuracy across different CNN architectures was examined to assess the consistency and optimization effectiveness during training. The performance results based on the optimizer used are presented in Table 5.

According to Table 5, RMSProp generally provided higher validation accuracy across several models, most notably MobileNetV2 and InceptionV3. For instance, MobileNetV2 reached a perfect validation accuracy of 1.00 under RMSProp in one training scheme, while it only reached up to 0.92 under Adam. InceptionV3 also performed consistently well with both optimizers, often achieving near-perfect validation accuracy. On the other hand, architectures such as ResNet50 and AlexNet delivered lower performance, with validation accuracy typically ranging between 0.58 and 0.63, indicating their limitations in handling the complex textures and features present in burned field imagery. In addition to accuracy, training time was also analyzed. Both optimizers showed comparable training durations, though RMSProp occasionally offered slightly shorter training times in models like ResNet50 and InceptionV3.
Despite the small differences, these time savings could be beneficial when scaling up to large datasets or deploying models in resource-constrained environments. Among all schemes, the combination of RMSProp and MobileNetV2 proved to be the most effective, achieving perfect classification accuracy in just 556 seconds of training, suggesting an ideal balance of performance and efficiency. This aligns with findings from a prior study which highlights RMSProp as one of the best default optimizers due to its use of decay and momentum variables to optimize image classification accuracy.

Table 5. Model performance results for each optimizer (validation accuracy and training time in seconds for each architecture and scheme under Adam and RMSProp)

Discussion
The findings highlight that lightweight CNNs can be highly effective for image classification in constrained environments, such as post-fire field settings. Unlike satellite imagery, field images captured with mobile devices are flexible and cost-effective, yet remain underutilized in wildfire damage assessment. This study demonstrates that, with proper preprocessing and model selection, CNNs can achieve competitive results even with such ground-level data. Notably, our best models rivaled or exceeded reported performances from previous drone-based studies. For instance, VGG19's performance is comparable to that of VGG16 in a previous study that used drone imagery for a similar task. More importantly, the MobileNetV2 model achieved 96% validation accuracy, significantly outperforming the 77.7% reported in earlier work that also used MobileNetV2 on field imagery from Jambi Province. This improvement may be attributed to the use of enhanced preprocessing, optimized hyperparameters, and more systematic training schemes.
These findings reinforce the practical value of using mobile imagery for post-fire classification and demonstrate that careful model tuning can yield competitive, even superior, results compared to prior approaches using the same model architecture.

CONCLUSION
This study compared seven CNN architectures for classifying post-forest fire areas using mobile-captured field imagery and found that MobileNetV2, VGG16, and VGG19 achieved the best performance, with MobileNetV2 offering the best balance between accuracy, training time, and model size. The results highlight the potential of field imagery as a low-cost and flexible alternative to satellite or drone data for wildfire damage assessment. The study's novelty lies in its focus on mobile imagery and systematic evaluation of hyperparameters across multiple architectures. These findings support the development of lightweight, real-time tools for post-fire assessment and suggest future research should explore larger datasets, more extensive hyperparameter tuning, and on-device deployment to enhance practical applications.

REFERENCES