COGITO Smart Journal, December 2025
P-ISSN: 2541-2221, E-ISSN: 2477-8079
ISSN: 1978-1520
Neural Dynamic Network for Brain Tumor Classification:
An Attention-Based Feature Selection Approach

Muchammad Naseer*1, Nova Agustina2, Harya Gusdevi3, Niken Riyanti4
1,2,3,4Department of Informatics, Faculty of Creative Industries, Universitas Teknologi Bandung, Bandung, Jawa Barat, Indonesia
e-mail: *1naseer@utb-univ.id, 2nova@utb-univ.id, 3devi@utb-univ.id, 4niken@utb-

Abstract
Magnetic Resonance Imaging (MRI) plays a vital role in the early detection of brain tumors. However, standard Convolutional Neural Network (CNN) models often struggle to extract truly relevant features from complex MRI structures.
This limitation creates a gap in achieving robust and clinically interpretable classifications, as feature redundancy and weak attention toward tumor-specific regions may reduce diagnostic reliability.
To address this gap, this study introduces a Neural Dynamic Network (NDN) that integrates EfficientNetV2S with a dynamic attention-based mechanism to adaptively highlight informative features while suppressing noise.
The proposed model was evaluated using a 5-fold cross-validation scheme and tested on unseen data.
Compared with the baseline CNN, the NDN consistently demonstrated higher accuracy, precision, recall, and F1-score across folds and final testing, reflecting improved robustness and balanced sensitivity.
NDN yielded significant improvements: in 5-fold cross-validation it averaged an accuracy of 88.44%, a precision of 87.84%, a recall of 87.88%, and an F1-score of 87.…%.
Beyond numerical performance, interpretability analysis using Grad-CAM demonstrated that NDN generates more concentrated and clinically consistent heatmaps.
In contrast, the baseline CNN produced dispersed activations that exhibited less alignment with tumor regions.
Overall, the findings confirm that incorporating a dynamic attention-based mechanism substantially enhances both feature selection and visual interpretability.
This makes the NDN architecture more reliable for MRI-based brain tumor classification and highly suitable as a decision-support tool in clinical workflows.
Keywords: Neural Dynamic Network (NDN), Attention-Based Mechanism, Grad-CAM, Brain Tumor Classification

INTRODUCTION
Brain tumors represent one of the most critical life-threatening conditions, imposing a substantial global health burden.
In 2022, approximately 322,000 new cases were reported worldwide. Yet accurate diagnosis of these tumors remains a formidable challenge: their irregular shapes, heterogeneous structures, and proximity to complex healthy brain tissue complicate detection.
Magnetic Resonance Imaging (MRI) is one of the primary medical imaging modalities used to identify abnormalities, including brain tumors, because of its high-resolution output. MRI produces highly detailed images, which are essential for distinguishing healthy regions from abnormal tissue and enable precise diagnosis and tumor localization.
Compared with CT scans, MRI offers several advantages: it is non-invasive, non-ionizing, and provides superior soft-tissue contrast. Through multiple sequences that depict structural anatomy, contrast-enhanced active tumor, edema, and fluid-adjacent edema, MRI highlights brain abnormalities from various perspectives, facilitating both anatomical and pathological analysis.
Nevertheless, scan interpretation remains challenging. Radiologists frequently have difficulty delineating indistinct or infiltrative tumor boundaries, owing to inter-observer variability and the visual overlap between tumors and surrounding healthy tissue.
This complexity has driven the adoption of artificial intelligence-based computational methods, such as radiomics and radiogenomics, to detect tumor margin infiltration precisely and to enhance diagnostic transparency. Accurate AI-driven computational approaches are crucial for determining optimal treatment strategies that improve patient survival, making this an active research focus, especially in deep learning.
Recent advancements in deep learning have accelerated research in MRI-based tumor classification, with CNNs extensively employed to automate pattern recognition and assist clinical decision-making.
Prior works demonstrate promising results: ensemble models combining VGG16, DenseNet121, and Inception-ResNet-v2 achieved around 86% accuracy, while ResNet-based approaches incorporating feature selection reached 86.77% accuracy. Despite these results, significant challenges persist.
CNNs often generate an extensive array of feature maps in each convolutional layer, many of which are redundant, noisy, or irrelevant.
These weakly discriminative features reduce model generalization on heterogeneous MRI datasets, particularly when images are acquired at diverse institutions or with varying acquisition protocols. Furthermore, CNNs with fixed receptive fields struggle to capture both global context and fine-grained tumor details, which limits their sensitivity to small or anatomically complex lesions.
To address these limitations, this study proposes the Neural Dynamic Network (NDN), an attention-based deep learning architecture that integrates EfficientNetV2S with a Dynamic Attention mechanism that adapts its feature selection to each MRI input. Within the attention-based learning paradigm, the proposed approach differs from conventional static attention modules: whereas static modules keep their attention weights fixed after training, Dynamic Attention generates attention weights conditioned on the extracted feature representations of each MRI input.
This attention-based yet dynamically adaptive mechanism allows the network to highlight discriminative tumor-related features while suppressing redundant or misleading ones.
Through this dynamic attention-based feature selection process, NDN aims to produce more stable classification performance and more concentrated Grad-CAM visualizations that align with clinically relevant tumor regions.
The proposed model is evaluated against a baseline CNN using a five-fold cross-validation scheme and independent test data to assess its robustness, interpretability, and applicability for real-world clinical support.
In Indonesia and the broader Southeast Asian region, research on MRI-based tumor classification has predominantly focused on traditional CNN architectures or handcrafted feature extraction, with limited attention given to adaptive attention-based mechanisms.
Furthermore, significant variability in MRI acquisition across local hospitals poses substantial challenges to model robustness.
By introducing an adaptive attention-based framework, this study contributes to the enhancement of AI-assisted diagnostic tools in regional healthcare settings.
The approach aligns with ongoing national initiatives aimed at advancing medical imaging analytics and improving early cancer detection.
Despite the success of CNN-based methods, existing models continue to encounter difficulties in adaptively selecting salient tumor-related features, leading to dispersed activation maps and reduced robustness across imaging variations.
These limitations hinder clinical interpretability and practical deployment.
This study aims to address these weaknesses by integrating an attention-based mechanism into a CNN architecture, allowing the model to emphasize meaningful features while suppressing irrelevant information.
Specifically, this study proposes the Neural Dynamic Network (NDN), a deep learning framework built upon EfficientNetV2S combined with a dynamic attention-based mechanism.
The main contributions are:
(1) the development of the NDN architecture, which adaptively adjusts attention weights to highlight tumor-specific features and suppress noise;
(2) a comparison between the baseline CNN and the proposed NDN using 5-fold cross-validation and independent test evaluation; and
(3) the integration of Grad-CAM to provide visual interpretability, enabling clinicians to verify whether the model focuses on medically relevant tumor regions.
LITERATURE REVIEW
To clarify the research gap and analytically position the contribution of the proposed NDN model, Table 1 presents a comparative summary of prior studies, highlighting their methodological characteristics, their inherent limitations, and how the proposed approach addresses these limitations.

Table 1. Comparison of Related Works and the Proposed NDN Model
Author | Method | Performance | Limitation / Research Gap | Comparison to This Study
Musthafa et al. | ResNet50 | Accuracy 98.…%; precision 98–99%; recall 97–99%; F1-score 98% | Focuses on binary classification only (tumor vs. non-tumor), not multi-class tumor recognition. | NDN performs multi-class brain tumor classification, not merely binary tumor/no-tumor classification.
Pacal et al. | Enhanced EfficientNetV2-Small with Global Attention Mechanism (GAM) and Efficient Channel Attention (ECA) | …76%; precision 99.…%; recall 99.…%; F1-score … | Not validated with cross-validation, only a train–val–test split, leading to potential dataset-specific overfitting. | NDN is evaluated using 5-fold cross-validation, which reduces the risk of overfitting, and shows a generalization gain (1.72%).
 | FDR-TransUNet | | Lacks an explainability mechanism; no XAI method such as Grad-CAM is provided. | NDN provides Grad-CAM overlays that highlight the regions actually used by the model during prediction.
Iftikhar et al. | CNN (Conv–Pool–BN–Dense) | …21% accuracy; F1-score … | Generalization drop: –5.…% (…21% to 94.72%). | NDN does not suffer from a generalization drop; it achieves a generalization gain of 1.72% (88.44% to 90.16%).
Jebin et al. | U-Net | …69% (SARTAJ), 97.…% (Br35H), 98.18% (Figshare) | Output consists of segmentation overlays (tumor contours) that show only anatomical boundaries, not feature-level interpretability. | Our study applies overlays to Grad-CAM results rather than mere segmentation, allowing visualization of the actual image regions used by the model for classification.
Muksimova et al. | Dense CNN | Accuracy 98.…%; sensitivity 98.…%; specificity 99.…% | Interpretability exists, but not via Grad-CAM; the method does not reveal the reasoning behind classification. | The attention-based mechanism in NDN adapts to each input rather than relying on static, connection-wise attention.
Sánchez-Moreno et al. | Majority-voting ensemble of CNNs (VGG16, DenseNet121, InceptionResNet-v2) | | Grad-CAM does not influence feature selection; it only visualizes attention after the model has made a prediction. | Dynamic attention filters features from the beginning of the pipeline, resulting in Grad-CAM visualizations that are more representative and clinically meaningful.
Chaoyang et al. | Lung segmentation (COVID-19 Radiography Database) | Accuracy 86.…%; precision 86.…%; recall …; F1-score 85.…% | |
A synthesis of the studies summarized in Table 1 reveals several consistent limitations across existing deep learning approaches for brain tumor analysis. For instance, Musthafa et al. demonstrate high accuracy using ResNet50, but their model is restricted to binary classification, limiting its applicability to multi-class tumor scenarios. Pacal et al. achieve exceptional performance using EfficientNetV2 enhanced with GAM and ECA, but their evaluation relied solely on a single train–val–test split, leaving the model vulnerable to dataset-specific overfitting.
Similarly, segmentation-focused architectures such as FDR-TransUNet and U-Net-based models often lack feature-level interpretability, providing only anatomical contours rather than insight into the underlying decision-making process.
Furthermore, ensemble CNN models often depend on post-hoc XAI modules such as Grad-CAM, which explain predictions without improving feature extraction and consequently produce heatmaps that are sensitive to noise. This instability is also evident in the work of Iftikhar et al., where a notable generalization drop was observed.
Finally, although architectures like the Dense CNN with connection-wise attention proposed by Muksimova et al.
incorporate an attention mechanism, they rely on static attention, which fails to adapt to heterogeneous tumor appearances and MRI variations.
Collectively, these studies highlight clear research gaps, including limited multi-class capability, reliance on static (non-adaptive) feature selection, and dependence on post-hoc explainability. Furthermore, the absence of intrinsic attention mechanisms often results in unstable generalization across datasets.
To address these limitations, the proposed Neural Dynamic Network (NDN) introduces an attention-based mechanism that adaptively filters salient tumor features during the training process.
Unlike previous approaches, the NDN produces intrinsically focused representations and more consistent Grad-CAM overlays.
Experimental results demonstrate a positive generalization gain from five-fold cross-validation to the independent test set, addressing the shortcomings identified in prior literature.
RESEARCH METHODS
This study utilizes an exploratory approach by comparing two neural network architectures: a baseline CNN and the proposed Neural Dynamic Network (NDN), which incorporates EfficientNetV2S as its backbone.
The main research phases comprise MRI preprocessing, feature extraction using EfficientNetV2S and Global Average Pooling (GAP), the implementation of an attention-based mechanism within the NDN, and multi-class brain tumor classification. To ensure robust results, we applied 5-fold cross-validation to the training dataset and evaluated performance using accuracy, precision, recall, F1-score, and confidence intervals during both validation and final testing on unseen data.
The comprehensive research workflow for implementing the baseline CNN and the proposed NDN in MRI-based brain tumor classification is presented in Figure 1.
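The text does not state how the confidence intervals are computed. One common choice, assumed here, is a Student-t interval over the five per-fold scores. The sketch below hard-codes the two-sided 95% critical value for 4 degrees of freedom (about 2.776); the function name `fold_confidence_interval` and the fold scores are hypothetical illustrations, not results from this study.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 folds).
T_CRIT_95_DF4 = 2.776


def fold_confidence_interval(scores, t_crit=T_CRIT_95_DF4):
    """Return (mean, lower, upper) of a t-based interval over per-fold scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical per-fold accuracies for illustration only (not results from this study).
mean, lower, upper = fold_confidence_interval([0.80, 0.82, 0.84, 0.86, 0.88])
print(f"{mean:.4f} [{lower:.4f}, {upper:.4f}]")
```

With only five folds the t critical value is noticeably larger than the normal-approximation 1.96, so the interval is wider; a different number of folds needs a different critical value.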
Figure 1.
Research Workflow Highlighting the Attention-Based Mechanism in the NDN Architecture.
Figure 1 illustrates the end-to-end workflow of the proposed MRI-based brain tumor classification framework, from data preprocessing to evaluation and interpretability.
The pipeline consists of standardized preprocessing, feature extraction using EfficientNetV2S and Global Average Pooling, and a comparative modeling stage between a baseline CNN and the proposed Neural Dynamic Network with an attention-based mechanism.
Model performance is assessed using 5-fold cross-validation and multiple classification metrics, while Grad-CAM provides visual explanations of model decisions, highlighting the contribution of salient MRI regions to the tumor classification process.
Data Collection
The Brain Tumor MRI dataset used in this study was sourced from Kaggle and contains a total of 7,023 human brain MRI images in four classes: glioma, meningioma, pituitary, and no tumor. The dataset is split into a training set of 5,712 images (1,321 glioma, 1,339 meningioma, 1,457 pituitary, and 1,595 no tumor) and a testing set of 1,311 images (300 glioma, 306 meningioma, 300 pituitary, and 405 no tumor). Although the original images vary in resolution, the dataset was selected for its substantial volume, roughly balanced class labels, high-quality annotations, and public availability, all of which facilitate replication of this study.
Preprocessing
Preprocessing was applied to all MRI images in this study to ensure a uniform format for model training. The pipeline consisted of the following stages.
MRI Image Resizing: All MRI images were resized to 224 × 224 pixels, balancing spatial detail against computational efficiency.
RGB Conversion: Following resizing, the images were converted to the RGB color space. This step ensured compatibility with pre-trained ImageNet weights, which require three color channels.
Normalization: Pixel values were rescaled from the [0, 255] range to the [0, 1] range by dividing each pixel by 255. This stabilized optimization during training and accelerated model convergence.
Label Encoding: After normalization, the categorical labels were transformed into four-dimensional one-hot vectors, the format expected by the categorical cross-entropy loss used during training.
Data Augmentation: To enhance model generalization, data augmentation was performed using geometric transformations, including horizontal flipping and 90-degree rotation. These augmentations introduce variability that makes the model more robust to orientation changes and anatomical symmetry without altering the underlying class labels.
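The preprocessing stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: images are treated as nested lists of grayscale values in [0, 255], resizing uses nearest-neighbour sampling (the text does not name an interpolation method), and the label order in the hypothetical `CLASSES` list is assumed. A real pipeline would use an image library instead.

```python
# Hypothetical label order; the index assigned to each class is an assumption.
CLASSES = ["glioma", "meningioma", "pituitary", "notumor"]


def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def to_rgb(img):
    """Replicate the single grayscale channel into three identical RGB channels."""
    return [[(p, p, p) for p in row] for row in img]


def normalize(img_rgb):
    """Rescale every channel value from [0, 255] to [0, 1] by dividing by 255."""
    return [[tuple(ch / 255.0 for ch in px) for px in row] for row in img_rgb]


def one_hot(label):
    """Four-dimensional one-hot vector for categorical cross-entropy."""
    vec = [0] * len(CLASSES)
    vec[CLASSES.index(label)] = 1
    return vec


def hflip(img):
    """Horizontal flip: mirror each row."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate 90 degrees counter-clockwise: transpose, then reverse the row order."""
    return [list(col) for col in zip(*img)][::-1]


raw = [[0, 255], [128, 64]]  # toy 2 x 2 grayscale "scan"
x = normalize(to_rgb(resize_nearest(raw)))
y = one_hot("pituitary")
augmented = [hflip(raw), rot90(raw)]
```

Flipping and rotation leave the class label unchanged, which is why they can be applied to training images without touching `y`.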
Feature Extraction
This study uses EfficientNetV2S as the backbone for feature extraction, selected for its proven efficacy in capturing complex patterns in medical imaging tasks. The resulting feature maps are then passed to both the baseline CNN and the Neural Dynamic Network (NDN). In the baseline CNN (Convolutional Neural Network), the feature maps are connected directly to the fully connected classification layers for final category prediction. As an example, the feature maps extracted from an MRI image at the pixel level are illustrated in Figure 2.
Figure 2.
The original brain MRI image and extracted feature maps using EfficientNetV2S
In contrast, within the NDN, the feature maps are first processed by a dynamic attention-based mechanism that emphasizes significant regions of the feature map before being passed to the fully connected layer for final classification.
Each input image $X \in \mathbb{R}^{224 \times 224 \times 3}$ is mapped into a set of feature maps $F$ by EfficientNetV2S.
Several components are involved in the convolution process. The indices $i$ and $j$ denote spatial positions in the feature map, while $u$ and $v$ denote positions within the convolution kernel. The index $c$ refers to the input channels (in the first layer, the RGB channels), whereas $k$ indexes the filters of the convolutional layer. The kernel weights $W_{u,v,c,k}$ are trainable parameters that generate the $k$-th feature map, and $b_k$ is the corresponding bias term. The activation function is ReLU, defined as $\mathrm{ReLU}(x) = \max(0, x)$, where $x$ is the convolution output obtained by multiplying the kernel weights with the input patch and adding the bias. In summary, the convolution process in EfficientNetV2S that generates the feature maps can be formulated as follows:
$$F_{i,j,k} = \mathrm{ReLU}\left( \sum_{u} \sum_{v} \sum_{c} W_{u,v,c,k} \, X_{i+u,\, j+v,\, c} + b_k \right)$$
The multiplication by the kernel weights $W$ captures local visual patterns, such as edges, tumor textures, intensity variations, and morphologies, while the summation across channels ($\sum_c$) aggregates information from all input channels, enabling the model to learn multi-channel features. The kernel's movement over spatial positions $(i, j)$ maps how tumor-related patterns appear in different regions of the image.
The ReLU activation suppresses negative values and retains only signals that are considered relevant for tumor representation.
As the network goes deeper, the learned features become increasingly abstract, from simple edges to edema textures, to high-level tumor shape patterns.
This process forms a hierarchical transformation that converts raw MRI images into meaningful semantic representations.
The extracted feature maps are smaller in spatial dimensions than the original image but possess greater depth (i.e., a larger number of channels).
This process illustrates that the deeper the network, the more complex the semantic representations obtained, which are subsequently utilized in the pooling stage.
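The convolution equation above can be checked with a direct, loop-based implementation. The sketch below is a hypothetical helper, `conv2d_relu`, that follows the equation term by term (stride 1, no padding, kernel indexed as `w[u][v][c][k]`); like deep learning frameworks, it computes cross-correlation, i.e. the kernel is not flipped. It is meant for reading, not speed, and is far simpler than the actual blocks inside EfficientNetV2S.

```python
def conv2d_relu(x, w, b):
    """F[i][j][k] = ReLU(sum_u sum_v sum_c w[u][v][c][k] * x[i+u][j+v][c] + b[k]).

    x: input, H x W x C nested lists (x[i][j][c])
    w: kernel, Kh x Kw x C x K nested lists (w[u][v][c][k])
    b: K biases
    Stride 1 and no padding, so the output is (H - Kh + 1) x (W - Kw + 1) x K.
    """
    in_h, in_w, in_c = len(x), len(x[0]), len(x[0][0])
    kh, kw, n_filters = len(w), len(w[0]), len(w[0][0][0])
    out = []
    for i in range(in_h - kh + 1):
        row = []
        for j in range(in_w - kw + 1):
            pixel = []
            for k in range(n_filters):
                s = b[k]
                for u in range(kh):
                    for v in range(kw):
                        for c in range(in_c):
                            s += w[u][v][c][k] * x[i + u][j + v][c]
                pixel.append(max(0.0, s))  # ReLU keeps only positive responses
            row.append(pixel)
        out.append(row)
    return out


# Toy 3 x 3 single-channel input and two 2 x 2 filters:
# filter 0 sums the patch, filter 1 negates the sum (so ReLU zeroes it out).
x = [[[1.0], [2.0], [3.0]],
     [[4.0], [5.0], [6.0]],
     [[7.0], [8.0], [9.0]]]
w = [[[[1.0, -1.0]], [[1.0, -1.0]]],
     [[[1.0, -1.0]], [[1.0, -1.0]]]]
F = conv2d_relu(x, w, b=[0.0, 0.0])
```

The output is spatially smaller (2 × 2) than the 3 × 3 input but has one channel per filter, which mirrors the trade of spatial size for depth described above.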
Global Average Pooling (GAP)
The next stage after feature extraction is pooling, which reduces the spatial dimensions of the feature maps while preserving information. In this study, pooling is implemented using Global Average Pooling (GAP). Unlike conventional pooling methods such as max pooling, which select the maximum value from a patch, GAP computes the average of all values within each channel, reducing each channel to a single value and yielding a compact feature vector. Let $F$ denote the feature maps from the convolutional process, with dimensions $H \times W \times C$ (height × width × channels), where $c$ is the channel index. The GAP operation that yields the scalar $z_c$ for each channel is defined as:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_{i,j,c}$$
Through GAP, global information from each channel is summarized, thereby reducing computational complexity before reaching the fully connected layer.
This process also helps mitigate the risk of overfitting, as it prevents the retention of excessive spatial information that may not be relevant for classification.
The feature vector produced by GAP serves as the input to the fully connected layers in both the baseline CNN and the attention-based model using a dynamic mechanism within the NDN.
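A minimal sketch of the GAP equation follows, with a global max pool alongside for contrast. Both helper names are hypothetical, and the feature map is a nested list indexed `F[i][j][c]`.

```python
def global_average_pool(F):
    """z_c = (1 / (H * W)) * sum_i sum_j F[i][j][c], one scalar per channel."""
    h, w, n_ch = len(F), len(F[0]), len(F[0][0])
    return [sum(F[i][j][c] for i in range(h) for j in range(w)) / (h * w)
            for c in range(n_ch)]


def global_max_pool(F):
    """For contrast: keep only the largest activation of each channel."""
    n_ch = len(F[0][0])
    return [max(px[c] for row in F for px in row) for c in range(n_ch)]


# Toy 2 x 2 x 2 feature map: channel 0 holds 1, 3, 5, 7 and channel 1 holds 0, 2, 4, 6.
F = [[[1.0, 0.0], [3.0, 2.0]],
     [[5.0, 4.0], [7.0, 6.0]]]
z = global_average_pool(F)
```

GAP summarizes every spatial position of a channel, whereas max pooling keeps only the single strongest response.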
Baseline CNN
The baseline Convolutional Neural Network (CNN) in this study was designed as a comparative model for the proposed Neural Dynamic Network (NDN) architecture. It uses EfficientNetV2S as the backbone for feature extraction; the extracted features are then processed through Global Average Pooling (GAP) and fully connected layers. An overview of the baseline CNN architecture is presented in Figure 3.
Figure 3. Baseline CNN architecture
As illustrated in Figure 3, the fully connected layers are implemented as several sequential Dense layers with the ReLU activation function. Dropout layers are inserted between the Dense layers to prevent overfitting by randomly deactivating a portion of the neurons during training.
Mathematically, the computation in the fully connected layer of the baseline CNN, denoted as $h$, is formulated as:
$$h = \mathrm{ReLU}(W_h z + b_h)$$
where $z$ is the GAP feature vector.
The final layer is a Dense layer with a Softmax activation function, which generates a probability distribution over the four brain tumor classes: glioma, meningioma, pituitary, and no tumor. The Softmax function normalizes the exponentials of the output logits. The output logit score $o_k$ for class $k$ is obtained from $h$ as:
$$o_k = W_k^{\text{out}} h$$
Consequently, each Softmax output lies within the range (0, 1), and the predicted probabilities sum to 1. Each output represents the likelihood that a given MRI image belongs to class $k$ out of the total $C$ classes.
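The baseline classification head can be sketched as follows. The helper names and toy weights are hypothetical; a standard Dense layer with a bias term is assumed (set the bias to zero to match the logit equation exactly), and Dropout is omitted because it is inactive at inference time. The Softmax subtracts the largest logit before exponentiating, which leaves the probabilities unchanged but avoids overflow.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: y_k = sum_i weights[k][i] * x[i] + bias[k]."""
    return [sum(w_ki * x_i for w_ki, x_i in zip(row, x)) + b_k
            for row, b_k in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(logits):
    """p_k = exp(o_k) / sum_j exp(o_j); subtracting max(logits) avoids overflow."""
    m = max(logits)
    exps = [math.exp(o - m) for o in logits]
    total = sum(exps)
    return [e / total for e in exps]


def baseline_head(z, hidden_layers, out_layer):
    """GAP vector z -> ReLU Dense layers -> Softmax over the classes.

    hidden_layers: list of (weights, bias); the paper uses Dense(512) and
    Dense(128). Dropout is omitted because it is disabled at inference time.
    """
    h = z
    for weights, bias in hidden_layers:
        h = relu(dense(h, weights, bias))
    return softmax(dense(h, *out_layer))


# Toy example: 2-D GAP vector, one hidden layer of 3 units, 4 output classes.
hidden = [([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -10.0])]
out = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
       [0.0, 0.0, 0.0, 0.0])
p = baseline_head([1.0, 2.0], hidden, out)
```

Here the toy logits are [1, 2, 0, 0], so the second class receives the highest probability and all four probabilities sum to 1.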
In summary, the Softmax computation is expressed as:
$$p_k = \frac{e^{o_k}}{\sum_{j=1}^{C} e^{o_j}}$$
Neural Dynamic Network (NDN)
The Neural Dynamic Network (NDN) extends the baseline CNN by integrating a dynamic attention-based mechanism after the feature extraction stage.
As in the baseline model, EfficientNetV2S serves as the backbone to extract feature maps, which are then aggregated using Global Average Pooling (GAP).
However, unlike the baseline approach, the NDN further processes the feature maps through dynamic attention-based processing before sending them to the fully connected layer.
This critical step enables the model to suppress irrelevant feature noise while simultaneously amplifying salient diagnostic patterns.
Figure 4 provides an overview of the NDN architecture.
Figure 4. Neural Dynamic Network architecture using a dynamic attention-based mechanism
As illustrated in Figure 4, the dynamic attention-based mechanism computes an attention weight vector using two sequential non-linear operations. First, the output of the convolutional layer passes through a ReLU activation function in FC1, which removes negative values. Next, the result is fed into a Sigmoid function in FC2, which maps the values into the range (0, 1) so that they can serve as attention weights. The operations performed in FC1 and FC2 are given, respectively, by:
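$$FC_1 = \mathrm{ReLU}(W_1 z + b_1)$$
$$FC_2 = \mathrm{Sigmoid}(W_2 \, FC_1 + b_2)$$
where $z$ is the GAP feature vector and $W_1, b_1, W_2, b_2$ are the trainable parameters of FC1 and FC2.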
After FC1 and FC2 are computed, the attention weights in FC2 are combined with the features through element-wise multiplication, denoted by the symbol $\odot$, which integrates the attention weights with the feature maps. The refined representation $\tilde{z}$ is computed as:
$$\tilde{z} = z \odot FC_2$$
Subsequently, the output of the element-wise multiplication is forwarded to the fully connected layer.
This layer consists of several sequential Dense layers with the ReLU activation function. As in the baseline CNN, Dropout layers are inserted between the Dense layers.
The computation of the fully connected layer $h$ in the NDN and the calculation of the output logit scores $o_k$ are expressed as follows, where $\tilde{z}$ is the attention-refined feature vector:
$$h = \mathrm{ReLU}(W_h \tilde{z} + b_h)$$
$$o_k = W_k^{\text{out}} h$$
Finally, in the output layer, each output value of the NDN is transformed to fall within the range (0, 1), ensuring that the probabilities sum to 1. The final stage of the NDN process is:
$$p_k = \frac{e^{o_k}}{\sum_{j=1}^{C} e^{o_j}}$$
FC1 (ReLU) serves as an information compression step that reduces dimensionality and captures the core relationships across channels. FC2 (Sigmoid) transforms the activations into attention weights ranging from 0 to 1, functioning as an importance estimation mechanism. The element-wise multiplication ($\odot$) performs feature refinement by amplifying tumor-related features and suppressing irrelevant ones.
The dynamic attention-based mechanism recalculates the channel weights (FC2) for each MRI image individually, so every image produces a new weighting pattern based on tumor location, edema intensity, and mass size. In this stage, the model learns to adjust feature emphasis adaptively according to the characteristics of the input. Unlike SE/ECA, which compute static weights that do not change across samples, the dynamic attention-based mechanism updates its weights according to the MRI characteristics, for instance assigning higher weights to texture-sensitive channels when the tumor is small and reducing the weights of edge-sensitive channels when extensive edema is present.
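The dynamic attention step (FC1, then FC2, then element-wise multiplication) can be sketched on a single GAP vector. This is an illustrative sketch, not the authors' implementation: the helper names, the toy weights, and the compression of C = 4 channels to 2 hidden units in FC1 are assumptions, since the text states only that FC1 reduces dimensionality. The usage shows two different inputs receiving different channel weights from the same fixed parameters.

```python
import math


def dense(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dynamic_attention(z, fc1, fc2):
    """Recompute channel weights for one input and rescale its GAP vector.

    z   : GAP feature vector of length C for one MRI image
    fc1 : (weights, bias) mapping C -> fewer units (compression)
    fc2 : (weights, bias) mapping back to C units
    """
    s1 = [max(0.0, a) for a in dense(z, *fc1)]                  # FC1 = ReLU(W1 z + b1)
    s2 = [1.0 / (1.0 + math.exp(-a)) for a in dense(s1, *fc2)]  # FC2 = Sigmoid(W2 FC1 + b2)
    z_tilde = [zc * wc for zc, wc in zip(z, s2)]                # z ⊙ FC2
    return z_tilde, s2


# Hypothetical trained parameters: C = 4 channels compressed to 2 hidden units.
fc1 = ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.0])
fc2 = ([[10.0, 0.0], [0.0, -10.0], [0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0, 0.0])

z_a = [1.0, 1.0, 1.0, 1.0]  # toy GAP vector for one image
z_b = [0.0, 0.0, 1.0, 1.0]  # toy GAP vector for another image
refined_a, weights_a = dynamic_attention(z_a, fc1, fc2)
refined_b, weights_b = dynamic_attention(z_b, fc1, fc2)
```

For `z_a` the first channel is kept almost intact and the second is nearly suppressed, while `z_b` receives a uniform weight of 0.5 on every channel.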
Explainability with Grad-CAM
In general, Grad-CAM operates by leveraging the gradients of the class score with respect to the feature maps generated by the network. In the baseline CNN, gradients are computed with respect to the feature maps of the final convolutional layer, before Global Average Pooling (GAP). In the NDN, by contrast, gradients are computed with respect to the feature maps obtained after feature selection in the dynamic attention-based layer. These gradients represent the contribution of each channel to the target class, and they are averaged to obtain the importance weight of each channel. These weights are then used to recombine the channels of the feature maps, producing the class activation map, also known as a heatmap. The heatmap is normalized to the range [0, 1], resized to the input image resolution (224 × 224), and overlaid on the MRI image to highlight the model's focus areas.
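To generate Grad-CAM, the first step is to compute the gradient of the class score $y^c$ with respect to the feature maps and average it into channel weights $\alpha_k^c$, as follows for the baseline CNN and the NDN, respectively:
$$\alpha_k^c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial y^c}{\partial A^{k}_{i,j}}$$
$$\alpha_k^c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial y^c}{\partial A'^{k}_{i,j}}$$
where $A^k$ is the $k$-th feature map of the final convolutional layer and $A'^k$ is the $k$-th feature map after dynamic attention-based feature selection.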
In these equations, $\alpha_k^c$ is obtained by dividing the sum of the gradients by $H \times W$, the total number of spatial units in each channel. The process is similar to average pooling, but it pools gradients instead of activation values. In the baseline CNN, the gradient $\partial y^c / \partial A^k_{i,j}$ represents the sensitivity of the class logit score to changes in the activation value at position $(i, j)$ in the $k$-th channel; in the NDN, $\partial y^c / \partial A'^k_{i,j}$ plays the same role for the $k$-th channel after feature selection.
The next step is to construct the Class Activation Map (CAM) by multiplying each channel of the feature maps by its corresponding weight $\alpha_k^c$ and summing the results across all channels. This operation produces an activation map that highlights the MRI regions most important to the class prediction. To keep only positive contributions, the ReLU activation function is applied to the sum, as shown below for the baseline CNN (feature maps $A^k$) and the NDN (feature maps $A'^k$ after feature selection), respectively:
$$L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$$
$$L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_k \alpha_k^c A'^k\right)$$
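Given one layer's activations and the gradients of the class score with respect to them, the two Grad-CAM equations reduce to a few lines. In practice the gradients come from the framework's automatic differentiation (for example `tf.GradientTape` in TensorFlow); here they are supplied directly as toy values, and `grad_cam` is a hypothetical helper. Normalization to [0, 1] divides by the map's maximum, which is one common choice; resizing to 224 × 224 and overlaying on the MRI are omitted.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one class from a single layer.

    activations[i][j][k] : A^k_{ij} (baseline) or A'^k_{ij} (NDN), shape H x W x K
    gradients[i][j][k]   : d y^c / d A^k_{ij}, same shape
    Returns an H x W map scaled to [0, 1].
    """
    h, w, n_ch = len(activations), len(activations[0]), len(activations[0][0])
    # alpha_k^c: average the gradients over all H x W spatial positions
    alphas = [sum(gradients[i][j][k] for i in range(h) for j in range(w)) / (h * w)
              for k in range(n_ch)]
    # L^c = ReLU(sum_k alpha_k^c * A^k)
    cam = [[max(0.0, sum(alphas[k] * activations[i][j][k] for k in range(n_ch)))
            for j in range(w)] for i in range(h)]
    # Scale to [0, 1] by the peak value (one common normalization choice)
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy 1 x 2 map with two channels: channel 0 has positive gradients and
# channel 1 negative ones, so only the location where channel 0 fires survives ReLU.
acts = [[[1.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, -1.0], [1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

The only difference between the baseline and NDN variants is which activations are passed in: the final convolutional maps for the baseline, or the maps after attention-based feature selection for the NDN.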
Architecture and Training Setup
In this study, three fully connected layers were used sequentially. The first is a Dense layer with 512 units and ReLU activation, followed by Dropout with a rate of 0.3 to prevent overfitting. Next is a Dense layer with 128 units and ReLU activation. The final output layer is a Dense layer with 4 units and a Softmax activation function, corresponding to the number of target classes. The training process employed stratified 5-fold cross-validation, which helped ensure more stable performance evaluation and minimized bias in data partitioning. Training also included an Early Stopping mechanism (patience 10, monitoring val_loss) to prevent overfitting. The AdamW optimizer was used with an initial learning rate of $1 \times 10^{-4}$, together with categorical cross-entropy as the loss function.
Model evaluation was conducted using accuracy, precision, recall, and F1-score metrics.
This provided a more comprehensive assessment of performance across each validation fold.
The training scheme included a batch size of 32 and a maximum of 100 epochs.
In addition, the learning rate was adjusted adaptively with ReduceLROnPlateau (reduction factor = 0.2, minimum learning rate 1 × 10^-…), and model checkpointing stored the best weights based on validation accuracy.
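Stratified 5-fold partitioning keeps each class's proportion roughly equal in every fold. The paper does not describe its splitting code; the sketch below is a hypothetical stand-in (`stratified_kfold`, with an assumed seed of 42) that shuffles each class and deals its indices round-robin across folds. In practice, scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)` provides a comparable stratified split.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_indices, val_indices) with per-class proportions preserved.

    Each class is shuffled with a fixed seed and its indices are dealt
    round-robin across the folds, so every fold receives ~1/n_splits of each class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val


# Toy labels: 10 glioma and 5 no-tumor images.
labels = ["glioma"] * 10 + ["notumor"] * 5
splits = list(stratified_kfold(labels))
```

Each validation fold here holds two glioma images and one no-tumor image, matching the 2:1 ratio of the full set, and every index appears in exactly one validation fold.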
Computational Environment Specifications
All experiments in this study were conducted on an NVIDIA A100 Tensor Core GPU using Python 3. The system was configured with 83.5 GB of RAM, 40.0 GB of GPU memory, and 235.… GB of storage.
RESULTS AND DISCUSSION
After preprocessing, feature extraction, model training, and applying the dynamic attention-based mechanism, the next step was to evaluate model performance.
This section presents the experimental results from the baseline CNN and the Neural Dynamic Network (NDN).
Results are presented under the 5-fold cross-validation scheme and from final testing on held-out test data.
Quantitative evaluation used accuracy, precision, recall, and F1-score metrics.
Model interpretability was also analyzed with Grad-CAM to provide deeper insights into regions of focus in MRI-based brain tumor classification.
To reduce repetition, the overall trends are summarized visually in Figures 5–10 and supported by statistical testing and analytical interpretation to address reviewer feedback.
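The four evaluation metrics can be computed from true and predicted labels as below. The text does not specify how precision, recall, and F1-score are averaged over the four classes; macro averaging (an unweighted mean of per-class scores) is assumed here, and `classification_metrics` is a hypothetical helper. The toy labels are illustrative only.

```python
def classification_metrics(y_true, y_pred, classes):
    """Return accuracy and macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy predictions over three classes (not results from this study).
y_true = ["glioma", "glioma", "meningioma", "pituitary"]
y_pred = ["glioma", "meningioma", "meningioma", "pituitary"]
acc, prec, rec, f1 = classification_metrics(
    y_true, y_pred, ["glioma", "meningioma", "pituitary"])
```

Recall is the per-class sensitivity to true positives, so a model that misses many images of one class shows a low recall even when its overall accuracy looks acceptable.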
Model Performance
In the baseline model, the Early Stopping mechanism was triggered in Folds 1, 2, 4, and 5. This means that, at those points, the validation loss had shown no significant improvement for 10 consecutive epochs, so training stopped early. In contrast, Early Stopping was not triggered in Fold 3, so training continued for the full 100 epochs.
The activation of Early Stopping in the baseline model suggests the validation loss had plateaued.
Meanwhile, training performance continued to improve without corresponding validation gains, indicating the onset of overfitting.
In the NDN model, Early Stopping was not activated. The validation loss improved consistently until training approached the maximum number of epochs. This shows that the model equipped with the dynamic attention-based mechanism maintained a stable downward trend in validation loss and therefore did not meet the criteria for early termination. It also suggests that NDN is more resistant to overfitting and generalizes better than the baseline CNN.
For clarity, Figure 5 presents the average validation performance per fold for the baseline CNN, and Figure 6 illustrates the corresponding results for the proposed Neural Dynamic Network (NDN).
Figure 5. Average Validation Performance per Fold (Baseline CNN)
Figure 6. Average Validation Performance per Fold (NDN)
In the baseline CNN (Figure 5), validation accuracy ranged from 81.12% to 83.[…]%, with an average of approximately 82%. Precision ranged from 80.82% to 85.[…]%. Recall was lower, peaking at 81.[…]%. The F1-score stayed stable, ranging from 80.28% to 83.[…]%. These results indicate that the baseline CNN could extract features but struggled to maintain consistent recall. This suggests that the model's sensitivity to true positives was relatively low, limiting its ability to detect all tumor classes evenly.
Conversely, the proposed NDN model (Figure 6) achieved higher performance on all metrics. Validation accuracy ranged from 85.81% to 88.[…]%, precision and recall consistently exceeded 85%, and the F1-score remained above 87%, reaching 88.90% in the best fold. This improvement is likely attributable to the dynamic attention-based mechanism, which emphasizes tumor-relevant features and suppresses noisy ones. These findings indicate that NDN maintains a better balance between precision and recall, yielding a substantial improvement in F1-score over the baseline CNN.
For completeness, Table 1 summarizes the average results, while Figures 5 and 6 provide fold-wise visual summaries that streamline the presentation and avoid repetitive textual descriptions.
Table 1. Comparative Results of the Baseline Model (CNN) and NDN
Model | Phase | Accuracy | Precision | Recall | F1-Score
Baseline Model (CNN) | Training | 82.66% | 84.78% | 79.90% | 82.[…]%
Baseline Model (CNN) | Testing | 84.89% | 85.85% | 84.21% | 77.48%
NDN | Training | 88.44% | 87.84% | 87.88% | 87.[…]%
NDN | Testing | 90.16% | 90.39% | 89.70% | 89.[…]%
Table 1 shows that the baseline CNN achieved an average accuracy of 82.66% across all folds during training, with a precision of 84.78%, a recall of 79.90%, and an F1-score of 82.[…]%. In contrast, the NDN model showed consistent improvements across all average training metrics, with an accuracy of 88.44%, a precision of 87.84%, a recall of 87.88%, and an F1-score of 87.[…]%. These results suggest that the dynamic attention-based mechanism in NDN enhanced the quality of feature representation during training, making the model not only more accurate but also more balanced in its ability to identify both positive and negative cases.
In testing, NDN outperformed the baseline CNN, improving accuracy from 84.89% to 90.16%, precision from 85.85% to 90.39%, recall from 84.21% to 89.70%, and F1-score from 77.48% to 89.[…]%. These improvements suggest that integrating the dynamic attention-based mechanism into NDN enhances feature selection, resulting in a more robust model with better generalization for MRI-based brain tumor classification.
The next evaluation examines the training and validation loss of CNN and NDN in the best-performing fold, illustrated in Figures 7 and 8.
Figure 7. Loss Curve of the Baseline CNN
Figure 8. Loss Curve of the NDN
In Figure 7, the validation loss of the baseline CNN fluctuates noticeably during the initial epochs before gradually converging. Although the training loss decreases progressively, a noticeable gap remains between the training and validation curves, indicating that the model generalizes less consistently.
Conversely, as shown in Figure 8, the NDN demonstrates a smoother and more stable convergence pattern.
Both training loss and validation loss consistently decrease throughout the epochs, with a smaller gap between them compared to the baseline CNN.
This pattern suggests that the dynamic attention-based mechanism enhances feature selection and suppresses irrelevant information, enabling the model to generalize better and deliver more reliable performance on the test data.
Next, to evaluate the dynamic aspects of this study, i.e., the extent to which Global Average Pooling (GAP), Fully Connected (FC) layers, and the dynamic attention-based mechanism influence model performance, an ablation study was conducted; the results are summarized in Table 2.
Table 2. Ablation Study Results of CNN Variants and the Full NDN Model
Model Variant | Accuracy | Precision | Recall | F1-Score
Baseline CNN with GAP and Fully Connected Layer
Baseline CNN without GAP and Fully Connected Layer
Full NDN with dynamic attention-based mechanism, CNN, GAP, and FC Layers
In Table 2, the ablation study shows that the Global Average Pooling (GAP) and Fully Connected (FC) layers are essential components of the CNN architecture.
When both components are removed, the model's performance drops drastically, with accuracy falling to only 23.39% and the F1-score to 9.48%, indicating that the model is almost unable to perform classification. When the GAP and FC layers are reinstated, performance improves to an accuracy of 82.[…]%, demonstrating that a standard CNN can still recognize basic patterns in MRI images. However, the largest improvement is obtained when the dynamic attention-based mechanism is integrated into the architecture (NDN), increasing accuracy to 88.44%, along with improvements in precision, recall, and F1-score. This suggests that the feature selection mechanism in NDN highlights relevant features and suppresses noise, resulting in more stable representations and greater sensitivity to tumor presence. Overall, the ablation study shows that each architectural component, especially the dynamic attention-based mechanism, contributes to model performance.
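The role of the two components removed in the ablation can be seen in a minimal classification head: Global Average Pooling collapses each feature map to one value per channel, and a fully connected layer maps those values to class scores, which softmax turns into probabilities. This is a generic illustration in plain Python with made-up weights and tiny 2 × 2 feature maps; it is not the NDN head, whose exact layer sizes are not given in this section.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def global_average_pool(feature_maps: Sequence[Matrix]) -> List[float]:
    """Collapse each H x W feature map to its mean: one value per channel."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]


def fully_connected(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Dense layer: one weight row (and one bias term) per output class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_maps: Sequence[Matrix], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """GAP -> FC -> softmax, returning class probabilities."""
    return softmax(fully_connected(global_average_pool(feature_maps), weights, bias))


# Two 2x2 feature maps (channels) and a hypothetical 2-class FC layer.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 2.0]]]
print(global_average_pool(maps))
print(classify(maps, weights=[[2.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0]))
```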
To further evaluate the model's generalization capability, a cross-dataset experiment was conducted using two different MRI datasets, each used in binary form, i.e., the Brain Tumor MRI Dataset (MRI Dataset 4 Class) and the Mendeley MRI Dataset (MRI Dataset 2 Class), as shown in Table 3.
Table 3. Cross-Dataset Classification Performance Between the Mendeley MRI Dataset (Binary) and the Kaggle Brain Tumor MRI Dataset (Binary)
Model | Train Dataset | Test Dataset | Accuracy | Precision | Recall | F1-Score
CNN | MRI Dataset 4 Class | MRI Dataset 2 Class
CNN | MRI Dataset 2 Class | MRI Dataset 4 Class
NDN | MRI Dataset 4 Class | MRI Dataset 2 Class
NDN | MRI Dataset 2 Class | MRI Dataset 4 Class
The results in Table 3 show that when the models were trained on the dataset containing 4 classes and tested on the dataset containing 2 classes, both CNN and NDN experienced a decrease in accuracy (to […].98% and 79.52%). Nevertheless, both models still produced very high recall ([…].70% for CNN and 96.69% for NDN). This indicates that the models become highly sensitive to the presence of tumors but lack precision in distinguishing positive from negative cases. A likely cause is the characteristics of MRI Dataset 4 Class, which has greater variation in intensity and noise, leading the models to predict “tumor” more frequently and thus lowering precision and accuracy.
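The recall/precision pattern described above can be reproduced with a toy example. The sketch assumes that multi-class labels are collapsed to tumor versus non-tumor for the binary cross-dataset setting; the class names below (`glioma`, `meningioma`, `pituitary`, `notumor`) are assumptions about the label set, not taken from this section.

```python
from typing import Iterable, List, Tuple

# Assumed label names; any label not listed here maps to 0 (non-tumor).
TUMOR_CLASSES = {"glioma", "meningioma", "pituitary"}


def to_binary(labels: Iterable[str]) -> List[int]:
    """Collapse multi-class tumor labels to 1 (tumor) / 0 (non-tumor)."""
    return [1 if label in TUMOR_CLASSES else 0 for label in labels]


def binary_precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (tumor) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = to_binary(["glioma", "pituitary", "notumor", "notumor"])
y_pred = [1, 1, 1, 1]  # a model biased toward predicting "tumor" for every image
print(binary_precision_recall(y_true, y_pred))
```

Predicting “tumor” everywhere keeps recall at 1.0 while precision falls to the share of tumor images in the test set (0.5 in this toy example).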
Conversely, when the models were trained on the dataset with 2 classes and tested on the dataset with 4 classes, performance improved substantially. CNN achieved an accuracy of 94.32% and NDN 93.91%, with precision, recall, and F1-score all above 94%. The dataset with 2 classes has cleaner and more uniform image quality, allowing the models to learn tumor patterns more stably. These learned patterns can then be transferred effectively when the models are exposed to the more varied MRI Dataset 4 Class. In this scenario, the performance difference between CNN and NDN is small, because the high quality of the training data already sufficiently supports the CNN's learning process. However, NDN still retains an advantage in prediction stability, as reflected in the highest precision value of 96.95% when tested on the dataset with 4 classes.
Note, however, that MRI Dataset 2 Class contains only two classes, i.e., tumor and non-tumor, so models trained on it tend to learn simpler and less diverse patterns. The binary setting makes learning more stable and less noisy, but it also limits the model's understanding of more complex variations in tumor morphology. As a result, when tested on MRI Dataset 4 Class, which has more diverse image characteristics (such as intensity variations, tumor shape variations, and differences in anatomical background), the model still performs well because the main tumor pattern has been learned, but it gains little additional benefit from the dynamic attention-based mechanism. This explains why the performance gap between CNN and NDN becomes relatively small when training on MRI Dataset 2 Class.
Overall, this cross-dataset evaluation demonstrates that NDN exhibits more stable performance than CNN, particularly when the training data originate from a noisier dataset.
Models trained on high-quality datasets are able to transfer their learned knowledge to other datasets more effectively.
These findings reinforce the importance of training data quality in influencing the generalization ability of MRI-based brain tumor classification models.
Grad-CAM
This study used Gradient-weighted Class Activation Mapping (Grad-CAM) to enhance model interpretability. Grad-CAM generates heatmaps of the MRI regions that contribute most to the model's prediction. The activation maps from the CNN were dispersed and less focused on tumor regions. The CNN heatmaps showed broad attention, making it difficult to confirm whether the true pathological areas were fully highlighted. This aligns with the lower recall of the baseline CNN (79.90% in training, 84.21% in testing), indicating limited sensitivity in detecting positive cases. Thus, although the CNN could extract discriminative features, its interpretability remained limited, which may raise concerns in clinical applications. Figure 9 illustrates the baseline CNN results.
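For readers unfamiliar with the technique, Grad-CAM weights each feature map of a chosen convolutional layer by the spatial mean of the class-score gradient with respect to that map, sums the weighted maps, and keeps only positive evidence with a ReLU. The framework-free sketch below assumes the activations and gradients have already been extracted for one image and one target class; obtaining them from a real network requires the framework's automatic differentiation, and upsampling the map to the MRI resolution is also omitted.

```python
from typing import List, Sequence

Map2D = Sequence[Sequence[float]]


def grad_cam(activations: Sequence[Map2D], gradients: Sequence[Map2D]) -> List[List[float]]:
    """Return an H x W Grad-CAM heatmap normalised to [0, 1].

    activations[k] is feature map A^k; gradients[k] is dY/dA^k for the
    target class score Y.
    """
    # alpha_k: global-average-pooled gradient = importance of feature map k.
    alphas = [sum(sum(row) for row in g) / (len(g) * len(g[0])) for g in gradients]
    height, width = len(activations[0]), len(activations[0][0])
    # Weighted sum of feature maps followed by ReLU (keep positive evidence only).
    cam = [[max(0.0, sum(a * fm[i][j] for a, fm in zip(alphas, activations)))
            for j in range(width)] for i in range(height)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Map 0 has a positive weight, map 1 a negative one, so only map 0's activation survives.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))
```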
Figure 9. Grad-CAM of the Baseline CNN
In contrast, Figure 10 presents the NDN results, showing more focused and consistent activation in the tumor regions, especially in the pituitary area, which matches the image label. The generated heatmaps are more directed, with high intensity concentrated in the target regions while other image areas are relatively suppressed. This suggests that the dynamic attention-based mechanism in NDN can adaptively select important features, suppress noise, and guide the model toward relevant regions. The higher recall and F1-scores of NDN compared with CNN further support this, indicating a better balance between sensitivity and precision.
Figure 10. Grad-CAM of the NDN
Overall, this comparison shows that NDN not only improves numerical performance but also, as visualized with Grad-CAM, focuses more clearly on tumor regions. Therefore, NDN may be better suited to supporting clinical diagnostic processes, as its predictions can be checked against the visualized tumor regions in MRI images.
Discussion
The experimental results show that the Neural Dynamic Network (NDN) consistently outperforms the baseline CNN in accuracy, precision, recall, and F1-score in both cross-validation and final testing. This study also conducted statistical significance tests to verify whether the performance improvements achieved by NDN were meaningful rather than due to random variation. Based on the fold-wise results from the 5-fold cross-validation scheme, the paired t-test produced p-values of 0.[…], 0.[…], 0.[…], and 0.1328 (F1-score), and the Wilcoxon signed-rank test produced p-values of 0.1250, 0.3125, and 0.1875. Since all p-values were greater than 0.05, these performance improvements cannot be considered statistically significant. This outcome reflects the limited sample size and the low variance across folds, two common conditions that reduce the statistical power of significance testing in deep learning experiments. With only five paired folds, for example, the exact two-sided Wilcoxon signed-rank test cannot produce a p-value below 2/2⁵ = 0.0625, even when one model wins every fold.
However, NDN still outperformed CNN on all folds, demonstrating a practical advantage despite the lack of statistical significance.
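Both tests can be sketched in standard-library Python: a paired t statistic on the fold-wise differences, and an exact two-sided Wilcoxon signed-rank p-value obtained by enumerating all 2ⁿ sign assignments (cheap for five folds). Converting the t statistic to a p-value needs the Student t distribution (for example via SciPy), which is omitted here. The helper names and the fold scores in the usage lines are invented for illustration and are not the study's results.

```python
import math
from itertools import product
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def wilcoxon_exact_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value (zero differences dropped)."""
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    magnitudes = [abs(x) for x in d]
    # Average ranks of |d|, so tied magnitudes share a rank.
    ranks = [sum(m < v for m in magnitudes) + (sum(m == v for m in magnitudes) + 1) / 2
             for v in magnitudes]
    total = sum(ranks)
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w = min(w_plus, total - w_plus)
    # Under H0 every sign pattern is equally likely; count those at least as extreme.
    extreme = 0
    for signs in product((False, True), repeat=n):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if s <= w or s >= total - w:
            extreme += 1
    return extreme / 2 ** n


# Hypothetical fold accuracies in which model A beats model B on every fold.
model_a = [0.88, 0.87, 0.89, 0.86, 0.88]
model_b = [0.82, 0.83, 0.81, 0.82, 0.83]
print(paired_t_statistic(model_a, model_b), wilcoxon_exact_p(model_a, model_b))
```

With five clean wins, the exact test already returns the smallest value that five folds allow, so adding more folds is the main lever for statistical power in this design.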
Next, the study evaluated confidence scores, defined as the average softmax probability assigned to the correct class. The higher confidence achieved by NDN ([…].72% compared with 88.79% for CNN) suggests that NDN produces more confident and stable predictions.
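The confidence score as defined here, the mean softmax probability of the true class, can be computed from raw logits as sketched below. The helper names and the example logits are hypothetical; this passage does not restate whether the average was taken over validation or test predictions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_true_class_confidence(batch_logits: Sequence[Sequence[float]],
                               labels: Sequence[int]) -> float:
    """Average softmax probability assigned to each sample's correct class."""
    probs = [softmax(z)[y] for z, y in zip(batch_logits, labels)]
    return sum(probs) / len(probs)


logits = [[2.0, 0.5, 0.1, -1.0], [0.2, 1.5, 0.3, 0.0]]  # hypothetical 4-class outputs
print(mean_true_class_confidence(logits, labels=[0, 1]))
```

This score rewards confident correct predictions and is pulled down by confident mistakes, but it is not by itself a calibration measure.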
From a performance interpretation perspective, the improvement introduced by the dynamic attention-based approach can be attributed to the model's ability to adaptively adjust feature response weights for each MRI input. Rather than treating all extracted features equally, the dynamic attention-based mechanism strengthens activations related to tumor regions while suppressing irrelevant or noisy signals. This results in clearer class separation, better sensitivity to subtle tumor structures, and more consistent predictions across folds.
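To illustrate what input-dependent feature weighting means, the sketch below implements a squeeze-and-excitation-style channel gate: each channel is summarized by its spatial mean, passed through a small two-layer network, and rescaled by a sigmoid weight in (0, 1). This is a stand-in chosen for illustration, not the NDN block itself, whose exact form is not specified in this section; the weight matrices `W1` and `W2` are arbitrary, whereas in a trained model they would be learned.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_channel_attention(features: Sequence[Sequence[float]],
                              w1: Sequence[Sequence[float]],
                              w2: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float]]:
    """Squeeze-and-excitation-style gating (illustrative stand-in for the NDN attention).

    features: C channels, each a flattened list of spatial activations.
    w1: hidden x C weights; w2: C x hidden weights.
    Returns the reweighted features and the per-channel gates.
    """
    squeezed = [sum(ch) / len(ch) for ch in features]                             # squeeze: spatial mean
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]  # ReLU bottleneck
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]      # excite: weights in (0, 1)
    reweighted = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return reweighted, gates


W1 = [[1.0, 0.0]]     # hypothetical 2 -> 1 bottleneck
W2 = [[4.0], [-4.0]]  # hypothetical 1 -> 2 expansion
_, gates_a = dynamic_channel_attention([[1, 1, 1, 1], [0, 0, 0, 0]], W1, W2)
_, gates_b = dynamic_channel_attention([[0, 0, 0, 0], [1, 1, 1, 1]], W1, W2)
print(gates_a, gates_b)  # same weights, different inputs -> different channel gates
```

The point of the example is that the gates are recomputed for every input, whereas a static weighting would apply the same channel weights to all images.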
This functional benefit is further supported by the interpretability results.
The Grad-CAM maps produced by NDN are more focused on the actual tumor regions, indicating that the model is not only more accurate but also more aligned with meaningful spatial patterns.
Furthermore, the cross-dataset evaluation provides additional evidence regarding the robustness of NDN.
When trained on a noisier and more heterogeneous dataset (MRI Dataset 4 Class), both CNN and NDN showed reduced accuracy but maintained extremely high recall when tested on MRI Dataset 2 Class. This indicates a strong tumor-sensitivity bias, suggesting that models exposed to high-variation data tend to over-predict positive cases. In contrast, when trained on the cleaner MRI Dataset 2 Class and tested on the more complex MRI Dataset 4 Class, both models achieved substantially higher overall performance, with NDN maintaining the highest precision. This suggests that NDN benefits most clearly in scenarios involving noisy or heterogeneous training data, where the dynamic attention-based mechanism effectively suppresses irrelevant activations. These findings collectively illustrate that the dynamic attention-based mechanism contributes not only to better numerical performance but also to improved stability across datasets and clearer localization of tumor regions.
This has practical implications for clinical decision support, as models that generalize more consistently across datasets are more reliable when deployed across different hospitals or imaging protocols.
However, the study still has limitations. A component-level ablation has not yet been performed to isolate the specific contribution of the dynamic attention-based mechanism relative to other architectural components.
CONCLUSION
This study proposes the Neural Dynamic Network (NDN), a modification of EfficientNetV2S with an added dynamic attention-based mechanism, to improve MRI-based brain tumor classification. The model shows consistent performance improvements over the baseline CNN in accuracy, sensitivity, and prediction confidence. The main contribution of this study is showing that dynamically weighting features for each input, rather than applying static weights across the entire dataset, can reduce irrelevant activations, strengthen tumor representation, and stabilize the learning process.
This mechanism provides a new direction in attention design for CNNs by emphasizing contextual weight adjustment without increasing architectural complexity.
Although the statistical significance tests (paired t-test and Wilcoxon signed-rank test) did not yield p-values < 0.05, owing to the small number of folds and low fold-to-fold variance, the performance improvements remain consistent across all folds. NDN also provides a higher prediction confidence ([…].72% compared with 88.79% for CNN) and more focused Grad-CAM attention regions aligned with actual tumor locations. These properties support the model's reliability, an important aspect in clinical application. The cross-dataset experiment further supports this robustness, showing that NDN maintains more stable precision than CNN when trained on noisier data and tested on an external dataset.
This study has several limitations. First, a component-level ablation isolating the specific contribution of the dynamic attention-based mechanism has not been conducted. Second, the Grad-CAM evaluation remains qualitative, without quantitative metrics. Future work should include component-level ablation, multi-institutional validation, and quantitative interpretability measures such as overlap ratio (IoU) and pointing-game accuracy to empirically verify the model's attention quality. Evaluating multimodal MRI inputs and integrating clinical metadata also represent promising directions to further improve robustness and clinical applicability.
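The two interpretability measures named as future work could be computed along the following lines, given a Grad-CAM heatmap normalized to [0, 1] and a binary tumor mask of the same size. The binarization threshold of 0.5 and all helper names are assumptions for illustration; the study itself does not report these metrics.

```python
from typing import Callable, Sequence, Set, Tuple

Grid = Sequence[Sequence[float]]


def _cells(grid: Grid, keep: Callable[[float], bool]) -> Set[Tuple[int, int]]:
    """Coordinates of the cells whose value satisfies `keep`."""
    return {(i, j) for i, row in enumerate(grid) for j, v in enumerate(row) if keep(v)}


def heatmap_iou(heatmap: Grid, mask: Grid, threshold: float = 0.5) -> float:
    """Intersection over union between the thresholded heatmap and the tumor mask."""
    pred = _cells(heatmap, lambda v: v >= threshold)
    truth = _cells(mask, lambda v: v > 0)
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0


def pointing_game_hit(heatmap: Grid, mask: Grid) -> bool:
    """True if the heatmap's maximum falls inside the tumor mask."""
    i, j = max(((i, j) for i, row in enumerate(heatmap) for j in range(len(row))),
               key=lambda ij: heatmap[ij[0]][ij[1]])
    return mask[i][j] > 0


def pointing_game_accuracy(heatmaps: Sequence[Grid], masks: Sequence[Grid]) -> float:
    """Fraction of images whose heatmap peak lands on the tumor."""
    hits = [pointing_game_hit(h, m) for h, m in zip(heatmaps, masks)]
    return sum(hits) / len(hits)


heatmap = [[0.9, 0.6], [0.1, 0.0]]
mask = [[1, 0], [1, 0]]
print(heatmap_iou(heatmap, mask), pointing_game_hit(heatmap, mask))
```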
ACKNOWLEDGMENTS
The research was funded under the Main Contract dated 28 May 2025 (Contract No. 125/C3/DT.00/PL/2…) and supported through the Derivative Contracts dated 04 June 2025 (Contract No. 8078/LL4/PG/2…) and 05 June 2025 (Contract No. 049/WRRICP/UTB/VI/2…).
REFERENCES