International Journal of Electrical and Computer Engineering (IJECE) Vol.
No.
October 2025, pp.
ISSN: 2088-8708.
DOI: 10.11591/ijece.
Anomaly-based intrusion detection leveraging optimized firewall log analysis: a real-time machine learning solution
Tran Cong Hung1, Dam Minh Linh2, Han Minh Chau3, Ngo Xuan Thoai2, Thai Duc Phuong1, Huynh De Thu1
1School of Computer Science and Engineering, The Saigon International University, Ho Chi Minh City, Vietnam
2Information Security Technology Lab and Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Ho Chi Minh City, Vietnam
3Faculty of Information Technology, HUTECH University, Ho Chi Minh City, Vietnam
Article Info
Article history:
Received May 26, 2025
Revised Jul 7, 2025
Accepted Jul 12, 2025
Keywords:
Anomaly detection
Cybersecurity
Firewall logs
Intrusion detection
Multi-threading
Real-time systems
ABSTRACT
Firewall logs play a vital role in cybersecurity by recording network traffic and flagging potential threats.
This study evaluates five machine learning algorithms, namely decision tree (DT), random forest (RF), extra trees (ET), CatBoost (CB), and AdaBoost (AB), on a dataset of 65,532 firewall log records.
Models were assessed using accuracy, precision, recall, and training/prediction time across multiple train-test splits, with Pearson correlation used for feature selection.
The DT model achieved the best performance, 45% test accuracy, 97.457% precision, and 93.389% recall at a 7:3 split, along with the fastest training time.
We propose real-time flow-level intrusion detection (RT-FLID), a novel, lightweight, real-time intrusion detection system that leverages multithreaded processing and flow-level analysis to boost detection speed and scalability.
Unlike existing approaches that rely heavily on deep packet inspection or computationally intensive processing, RT-FLID requires minimal resources while maintaining high detection accuracy.
The architecture efficiently handles large traffic volumes and dynamically identifies anomalies such as distributed denial-of-service (DDoS) and port scans.
Validated on real-world logs, the system maintained high accuracy in critical classes like "deny" and "reset-both."
These findings highlight RT-FLID's novelty and practical advantages, demonstrating its potential for deployment in high-throughput, low-latency network environments.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Dam Minh Linh
Information Security Technology Lab and Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Ho Chi Minh City, Vietnam
Email: linhdm@ptit.
INTRODUCTION
Firewalls play a crucial role in an organization's network security, acting as the primary barrier against threats.
They offer protection against both external and internal attacks, ensuring comprehensive security coverage.
Given their significance in safeguarding systems, the logs generated by firewalls provide valuable insights into network traffic patterns and facilitate enhanced monitoring and analysis.
Cisco's 2024 Cybersecurity Readiness Index offered an in-depth assessment of global cybersecurity preparedness within organizations, revealing that only 3% of companies are equipped to address current threats, whereas two-thirds of organizations are at the beginner or formative stages of preparedness.
Notably,
73% of respondents expected a cybersecurity incident to impact their business within the next 12 to 24 months.
The price of unpreparedness can be significant, as 52% of impacted respondents incurred a cost of at least US $300,000.
In a "Tech Tuesday" release, the Federal Bureau of Investigation (FBI) Phoenix Field Office alerted the public about criminal actors engaging in phishing and spoofing scams and provided tips to reduce the likelihood of falling victim.
The FBI's Internet Crime Complaint Center (IC3) reported that phishing scams were the most prevalent type of cybercrime in 2020, with over 240,000 victims and losses of nearly $50 million.
Although spoofing scams impacted fewer individuals (approximately 28,000), the financial losses exceeded $215 million.
Noting the rise in global security threats, the 2024 Microsoft Digital Defense Report highlighted a sharp increase in identity-related attacks, especially password breaches, which account for over 99% of the 600 million daily identity-focused attacks.
However, one key solution inferred from recent cybersecurity reports is to enhance firewall log data analysis to improve real-time threat detection and significantly reduce the risk of successful cyberattacks.
In recent years, the application of artificial intelligence (AI) techniques, particularly machine learning (ML) and deep learning (DL), has gained increasing attention in cybersecurity, especially in firewall anomaly detection.
Traditional firewall systems, which rely on manually configured rule sets to filter network traffic, are prone to misconfigurations and may fail to detect novel threats.
To overcome these limitations, advanced ML and DL approaches have been proposed to enhance the detection of suspicious activities in firewall logs.
Komadina et al. investigated the effectiveness of anomaly detection by injecting synthetic attack data into firewall logs, offering a controlled yet realistic evaluation environment.
Building upon this foundation, subsequent research leveraged AI to autonomously optimize and validate firewall rule sets in high-performance network infrastructures, thereby enhancing policy accuracy and reducing manual configuration errors.
Extending this line of work, a lightweight and cost-effective Smart Unified Threat Management System was proposed and deployed on a Raspberry Pi platform to secure home networks against modern cyber threats.
The system achieved a detection accuracy of up to 99% while reducing memory consumption by approximately 55% compared to traditional signature-based solutions.
Firewall logs are a vital component of network security, providing visibility into data traffic and enabling the identification of potential network security risks and suspicious behaviors.
The critical role of firewall logs in intrusion detection systems (IDS) has spurred the adoption of ML approaches to evaluate network security datasets, particularly for identifying abnormal traffic patterns indicative of targeted attacks on critical servers.
Aljabri et al. developed an ML-based framework to classify firewall sessions into categories such as "Allow," "Drop," "Deny," and "Reset-both."
Their evaluation utilized a firewall log dataset containing 65,532 records distributed across four classes: Allow, Deny, Drop, and Reset-Both.
By introducing two novel features, application and category, the study significantly improved classification performance, with the random forest (RF) algorithm achieving accuracy in the 99% range.
However, the system remains suboptimal in terms of availability and rapid response for real-time network environments.
Based on the same firewall log dataset, another study proposed a multiclass classification method using a support vector machine (SVM) to categorize connection sessions into four actions (allow, deny, drop, and reset-both), systematically comparing kernel functions to optimize classification performance.
Subsequently, further research implemented and evaluated multiple classification algorithms, with RF achieving up to 99% accuracy, demonstrating the effectiveness of feature extraction techniques in firewall log analysis.
Nevertheless, both research efforts primarily rely on prelabeled data and provide limited exploration of real-time responsiveness and data imbalance issues.
Li et al. introduced a novel intrusion detection model, adversarial environment with soft actor-critic (AE-SAC), which integrates adversarial learning and deep reinforcement learning to address challenges in imbalanced datasets and minority attack detection.
Experimental results demonstrate that AE-SAC achieves an accuracy of 84.15% and an F1-score of 83.97% on the NSL-KDD dataset and exceeds 98.9% for both metrics on the AWID dataset.
Similarly, Bamber et al. emphasized optimizing feature selection through recursive feature elimination combined with a DT classifier and evaluated multiple DL architectures, with the convolutional neural network-long short-term memory (CNN-LSTM) model outperforming others by achieving 95% accuracy, 0.89 recall, and a 0.94 F1-score on the NSL-KDD dataset.
However, these two studies exhibit certain limitations related to model accuracy, processing time, and varying data split ratios between training and testing phases.
To effectively mitigate targeted attacks on server infrastructures, the systematic collection and analysis of log files play a pivotal role in anomaly detection and tracing attack behavior chains.
As demonstrated in the study by Artioli et al., advanced security information and event management (SIEM) systems can be enhanced by integrating natural language processing-based classifiers trained on synthetic log datasets, such as SIEM Ingesting EVEnts, which are generated using the semantic augmentation technique known as semantic perturbation and instantiation for content enrichment.
The SVM model maintained consistent performance across both synthetic and real-world logs (Macro-F1 from 0.9477).
In contrast,
although the bidirectional encoder representations from transformers (BERT) model, a pre-trained DL model widely used in text classification, achieved high performance on synthetic data (Macro-F1 from 0.9528), it exhibited limited generalization capability when applied to real-world logs (Macro-F1 from 0.8864).
To further support anomaly detection and classification in IDS, a context-aware logging and advanced log analysis framework, the semantic-aware generator, has been proposed.
Meanwhile, other studies emphasize the role of policies and log analysis in distributed firewalls, proposing a data mining and ML approach to detect anomalies from large-scale logs collected in real-world environments.
At the same time, Bringhenti and Valenza emphasize that optimizing firewall configurations to reduce energy consumption while maintaining cybersecurity is essential for enhancing system sustainability.
Similarly, another study proposes a semi-automated approach for firewall anomaly detection and resolution, which automatically addresses sub-optimizations while involving human intervention for conflict management, thereby reducing the workload of administrators.
Subsequently, Park et al. developed a visualization tool to assist administrators in monitoring and managing cybersecurity incidents, while also performing anomaly classification within firewall policies.
Additionally, related research addresses effective and standardized alarm rationalization for cybersecurity monitoring.
Sharma et al. made a significant contribution by proposing an optimized solution for firewall packet classification using advanced ensemble models, applied to a large dataset consisting of 65,532 log entries with four firewall action labels (accept, drop, reject, and TCP reset).
The study employed both voting and stacking ensemble models based on five popular ML algorithms, with the stacking model using an RF as the meta-classifier, achieving an accuracy of 99.8% and a precision of 91%.
However, this research mainly focused on classification using static log data and did not thoroughly explore aspects such as real-time processing capabilities and latency, which remain important areas for further investigation.
Efeoğlu and Tuna evaluated multiple classification algorithms on a firewall log dataset comprising 65,532 entries with 12 attributes, using the action field as the target class.
Simple cart and Naive Bayes (NB) tree achieved the highest accuracy .
84%), while decision stump performed the worst .
68%).
To address class imbalance, the study employed the Matthews correlation coefficient for more robust evaluation.
However, the approach lacks real-time processing capabilities.
In contrast, Mingze contributed a firewall log visualization system that achieved 98.3% accuracy, 92.1% precision, 97.5% recall, 1% F1-score, and 91.2% real-time performance in experimental evaluations.
To clearly position our work in relation to existing research, Table 1 summarizes the key approaches, advantages, and limitations of the related studies discussed.
Table 1. Summary of related work on firewall log analysis and intrusion detection
Study/Reference | Dataset used | Algorithms | Best accuracy | Training/testing time | Real-time
Aljabri et al. | 65,532 firewall | RF, k-nearest neighbors (KNN), NB, J48, artificial neural network (ANN) | RF: Accuracy 99. | Not reported | Not supported
Rahman et al. | 65,532 firewall | RF (others not specified) | RF: F1-score 99% | Not reported | Not supported
Efeoğlu and Tuna | 65,532 firewall | Simple cart, NB tree, FT tree, J48, BF tree, decision stump | Simple cart: Accuracy | Not reported | Not supported
Mingze | 65,532 firewall | IG-based feature selection, visualization-based | Accuracy 98.; F1-score 98. | Not reported | Supported
Li et al. | NSL-KDD | AE-SAC (adversarial Env SAC RL) | Accuracy: 84. | Not reported | Limited performance on NSL-KDD; not tested on firewall logs
Bamber et al. | NSL-KDD | ANN, long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent unit (GRU), bidirectional GRU (BiGRU), convolutional neural network (CNN)-LSTM | CNN-LSTM: Accuracy 95%, Recall 89%, F1-score 94% | Not reported | Not applied to real logs; lacks real-time processing
Our study | 65,532 firewall | DT, RF, ET, CatBoost, AdaBoost; proposed: RT-FLID | Accuracy (Train/Test): 99.86% / 99.; Precision (Train/Test): 99.20% / 97.; Recall (Train/Test): 99.25% / 93. | Train: ; Test: 0.05292 s | RT-FLID: multithreaded, real-time capable; distributed denial-of-service (DDoS)/port scan; effective on real logs
The major contributions of this paper are:
- A real-time detection algorithm is proposed to identify abnormal increases in total packet size within server traffic, based on key statistical characteristics extracted from the dataset. Additionally, the study utilizes a publicly available dataset and demonstrates superior accuracy and real-time applicability compared to prior works, particularly in practical server environments.
- Five ML algorithms (DT, RF, ET, CB, and AB) were evaluated based on multiple performance metrics, including accuracy, precision, training/testing time, and confusion matrix, across diverse train-test ratios ranging from 1:9 to 9:1. Among them, the DT model demonstrated superior accuracy and consistent performance, leading to its selection as the final prediction model. The experimental results revealed that a 7:3 train-test split provided the most balanced trade-off between training sufficiency and testing reliability. Furthermore, Pearson correlation coefficient analysis was incorporated to assess the relevance of input features, supporting effective feature selection and enhancing overall model performance.
- The proposed experimental system presents a novel multi-threaded network intrusion detection architecture that leverages parallel machine learning-based prediction to achieve real-time and scalable threat detection. By integrating efficient flow aggregation, synchronized multi-threaded processing, and automated resource management, the experimental implementation significantly enhances detection accuracy and operational performance compared to traditional IDS solutions. This architecture is particularly well-suited for dynamic and high-throughput network environments, providing a robust foundation for adaptive and intelligent network security.
These limitations reveal a significant research gap in developing real-time, lightweight intrusion detection systems that can effectively operate in dynamic and high-volume network environments.
The motivation for this study stems from the growing volume and complexity of modern network traffic, which presents significant challenges for timely and accurate intrusion detection.
Traditional methods often rely on deep packet inspection or computationally intensive techniques, which are not suitable for high-speed, resource-constrained environments.
Meanwhile, firewall logs, although readily available, remain underutilized for real-time threat detection.
To address these limitations, we propose RT-FLID, a lightweight and scalable intrusion detection system that leverages machine learning and flow-level log analysis to detect anomalies such as DDoS and port scanning efficiently in live network environments.
The structure of the paper is as follows: section 2 outlines the proposed algorithm, including the detection model architecture and the real-time network traffic anomaly detection algorithm.
Section 3 describes the evaluation methods and dataset, covering the firewall logs dataset, evaluation metrics and statistical techniques, as well as dataset splitting strategies for model assessment.
Section 4 presents the results and discussion, focusing on the performance evaluation of ML-based intrusion detection models using firewall logs, the experimental validation of real-time processing on a live server environment, and an overall discussion of key findings.
Finally, section 5 concludes the study and provides acknowledgements.
PROPOSED ALGORITHM
The proposed algorithm introduces a real-time, flow-level network intrusion detection approach that leverages machine learning for accurate and adaptive traffic classification.
The detection model architecture is organized into seven interconnected phases, encompassing packet acquisition, flow aggregation, thread synchronization, parallel background processing, multi-threaded prediction, result aggregation, and automated flow management.
This design enables the system to efficiently process high-volume network traffic, promptly identify both known and unknown threats, and maintain robust performance under dynamic network conditions.
The RT-FLID algorithm integrates synchronized feature extraction and parallel ML inference to deliver timely and reliable anomaly detection for modern network environments.
Detection model architecture
As illustrated in Figure 1, the proposed multi-threaded network intrusion detection system is architected as a sequence of interconnected processing phases, each responsible for a distinct functional aspect of real-time traffic analysis and threat detection.
By decomposing the system into well-defined phases, we ensure modularity, scalability, and efficient resource utilization throughout the detection pipeline.
The following sections detail each phase of the system, from initial packet acquisition to parallelized machine learning-based prediction and automated flow management.
Phase 1: Packet capture and processing
The system initiates with real-time packet sniffing on the server's network interface (e.g., Tailscale).
Each incoming transmission control protocol/user datagram protocol (TCP/UDP) packet is parsed to extract metadata, including source IP, destination port, protocol type, packet size, and TCP flags (e.g., SYN).
Flows are uniquely identified by the tuple (source IP, destination port), enabling granular behavioral tracking.
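As a rough illustration of Phase 1, the sketch below turns already-parsed packet metadata into the (source IP, destination port) flow key and checks for a SYN without ACK. The `PacketMeta` type, its field names, and the sample addresses are hypothetical; only the TCP flag bit values (SYN = 0x02, ACK = 0x10) come from the TCP header layout. Packet capture itself is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

TCP_SYN = 0x02  # SYN bit in the TCP flags byte
TCP_ACK = 0x10  # ACK bit in the TCP flags byte


@dataclass(frozen=True)
class PacketMeta:
    """Hypothetical container for the metadata extracted in Phase 1."""
    src_ip: str
    dst_ip: str
    dport: int
    protocol: str       # "TCP" or "UDP"
    size: int           # packet length in bytes
    tcp_flags: int = 0  # left at 0 for UDP packets


def flow_key(pkt: PacketMeta) -> Tuple[str, int]:
    """Identify a flow by (source IP, destination port)."""
    return (pkt.src_ip, pkt.dport)


def is_connection_attempt(pkt: PacketMeta) -> bool:
    """True for a TCP packet with SYN set and ACK clear."""
    if pkt.protocol != "TCP":
        return False
    return bool(pkt.tcp_flags & TCP_SYN) and not (pkt.tcp_flags & TCP_ACK)


# Illustrative packets: a bare SYN and a SYN+ACK on the same flow.
syn = PacketMeta("10.0.0.5", "10.0.0.1", 443, "TCP", 60, tcp_flags=TCP_SYN)
syn_ack = PacketMeta("10.0.0.5", "10.0.0.1", 443, "TCP", 60,
                     tcp_flags=TCP_SYN | TCP_ACK)
```

Keeping the key to two fields means all connections from one host to one service share a single entry, which is what lets a per-flow SYN count hint at repeated connection attempts.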
Phase 2: Flow aggregation and statistics compilation
Extracted flow metadata is aggregated into a structured dictionary, where each entry stores cumulative statistics:
- Total packets: Count of packets per flow.
- Total data volume: Sum of payload sizes.
- SYN counter (pktsent): Number of SYN packets (indicative of connection attempts).
- Last activity timestamp: Time of the last packet seen in the flow.
This phase ensures continuous monitoring of flow behavior while minimizing redundant computations.
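The aggregation step can be sketched as a single update function over a dictionary keyed by flow. This is a minimal sketch, not the authors' implementation: the function name `update_flow` and its signature are assumptions, while the field names (`total_length`, `packet_count`, `pktsent`, `last_update`) follow Algorithm 1.

```python
import time
from typing import Dict, Optional, Tuple

FlowKey = Tuple[str, int]  # (source IP, destination port)


def update_flow(flows: Dict[FlowKey, dict], key: FlowKey, size: int,
                is_syn: bool, now: Optional[float] = None) -> dict:
    """Create or update the cumulative statistics of one flow."""
    now = time.time() if now is None else now
    entry = flows.get(key)
    if entry is None:
        # First packet of the flow: start every counter at zero.
        entry = {"total_length": 0, "packet_count": 0,
                 "pktsent": 0, "last_update": now}
        flows[key] = entry
    entry["total_length"] += size   # total data volume in bytes
    entry["packet_count"] += 1      # total packets
    if is_syn:
        entry["pktsent"] += 1       # SYN counter
    entry["last_update"] = now      # last activity timestamp
    return entry


# Two packets on the same flow: a SYN followed by a data packet.
flows: Dict[FlowKey, dict] = {}
update_flow(flows, ("10.0.0.5", 443), size=60, is_syn=True, now=100.0)
update_flow(flows, ("10.0.0.5", 443), size=1500, is_syn=False, now=101.5)
```

Because each packet only touches one dictionary entry, the per-packet cost stays constant regardless of how many flows are being tracked.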
Phase 3: Thread synchronization
A thread lock safeguards concurrent access to the shared aggregation dictionary.
This prevents race conditions during parallel updates by the packet processing thread (Phase 1) and background maintenance threads (Phase 4), ensuring data integrity.
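To make the role of the lock concrete, the following sketch has several threads update the same flow entry while holding a `threading.Lock`. The names `flow_lock`, `shared_flows`, and `record_packet` are illustrative assumptions; the point is only that the read-modify-write on the shared dictionary happens inside the locked region.

```python
import threading

flow_lock = threading.Lock()
shared_flows = {}


def record_packet(key, size):
    """Update one flow entry while holding the lock."""
    with flow_lock:
        entry = shared_flows.setdefault(
            key, {"total_length": 0, "packet_count": 0})
        entry["total_length"] += size
        entry["packet_count"] += 1


def worker(n_packets):
    # Every worker hammers the same flow to provoke contention.
    for _ in range(n_packets):
        record_packet(("10.0.0.5", 443), 100)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, the final counters equal the number of simulated packets; the same lock would also be held by the summary and cleanup threads whenever they read or delete entries.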
Phase 4: Parallel background threads
Three dedicated threads operate concurrently:
- Packet processing thread: Core thread for continuous packet capture and flow updates.
- Summary and prediction thread: At fixed intervals (e.g., 60 seconds), this thread triggers flow analysis, invokes parallel ML predictions via a ThreadPoolExecutor, and generates intrusion reports.
- Flow cleanup thread: Periodically purges inactive flows (e.g., no packets within 60 seconds) to optimize memory usage.
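One way to arrange the background threads is a small periodic-runner helper driven by a `threading.Event`, sketched below. The helper `run_periodically`, the placeholder tasks, and the very short intervals are assumptions made so that the example finishes quickly; the system described here uses intervals on the order of seconds.

```python
import threading
import time


def run_periodically(interval, task, stop_event):
    """Call task() every `interval` seconds until stop_event is set."""
    # Event.wait returns False on timeout, so the body runs once per interval
    # and the loop exits promptly once stop_event is set.
    while not stop_event.wait(interval):
        task()


calls = {"summary": 0, "cleanup": 0}
calls_lock = threading.Lock()


def summary_task():
    # Placeholder for flow analysis and prediction.
    with calls_lock:
        calls["summary"] += 1


def cleanup_task():
    # Placeholder for purging inactive flows.
    with calls_lock:
        calls["cleanup"] += 1


stop = threading.Event()
workers = [
    threading.Thread(target=run_periodically,
                     args=(0.02, summary_task, stop), daemon=True),
    threading.Thread(target=run_periodically,
                     args=(0.01, cleanup_task, stop), daemon=True),
]
for w in workers:
    w.start()
time.sleep(0.3)  # the packet processing loop would run here
stop.set()
for w in workers:
    w.join()
```

Using an event instead of a bare sleep lets the threads shut down cleanly without waiting for a full interval to elapse.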
Phase 5: Multi-threaded ML prediction
During each summary cycle, flow features (dport, total data, packet count, pktsent) are normalized using a pre-trained scaler and fed into a machine learning model.
Predictions (Allow/Deny) are executed in parallel across a thread pool, leveraging multi-core CPUs to maintain low latency under high traffic loads.
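The prediction step might look roughly like the sketch below, which fans flow entries out to a `ThreadPoolExecutor`. The `StubScaler` and `StubModel` classes are hypothetical stand-ins for the pre-trained scaler and classifier (which would normally be loaded from disk), and the threshold rule inside the stub is invented purely so the example runs; it is not the trained decision tree.

```python
from concurrent.futures import ThreadPoolExecutor


class StubScaler:
    """Hypothetical stand-in for the pre-trained scaler."""
    def __init__(self, maxima):
        self.maxima = maxima

    def transform_one(self, features):
        return [value / maximum
                for value, maximum in zip(features, self.maxima)]


class StubModel:
    """Hypothetical stand-in for the classifier: 1 = Allow, 0 = Deny."""
    def predict_one(self, scaled):
        _dport, _total, _count, scaled_syn = scaled
        return 0 if scaled_syn > 0.5 else 1  # invented rule for the demo


def classify_flow(item, scaler, model):
    (src, dport), stats = item
    features = [dport, stats["total_length"],
                stats["packet_count"], stats["pktsent"]]
    prediction = model.predict_one(scaler.transform_one(features))
    return src, dport, "Allow" if prediction == 1 else "Deny"


def classify_all(flows, scaler, model, max_workers=4):
    # Snapshot first so workers never iterate a dict that is being updated.
    snapshot = list(flows.items())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the snapshot.
        return list(pool.map(
            lambda item: classify_flow(item, scaler, model), snapshot))


flows = {
    ("10.0.0.5", 443): {"total_length": 9000, "packet_count": 12, "pktsent": 1},
    ("10.0.0.9", 22): {"total_length": 4800, "packet_count": 80, "pktsent": 80},
}
scaler = StubScaler(maxima=[65535, 100000, 100, 100])
results = classify_all(flows, scaler, StubModel())
```

In CPython, a thread pool gives real parallelism only while the prediction call releases the global interpreter lock, so the benefit is worth measuring for the specific model in use.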
Phase 6: Result consolidation and reporting
Prediction results are aggregated into a human-readable summary, highlighting suspicious flows (e.g., high DENY rates).
Alerts are logged or forwarded to security tools for automated mitigation.
Phase 7: Flow aging and cleanup
Inactive flows are automatically evicted from the aggregation dictionary, ensuring the system remains lightweight and responsive to evolving traffic patterns.
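Flow eviction can be sketched as a pure function over the aggregation dictionary. The function name `evict_expired` and the sample timestamps are assumptions; the 60-second timeout mirrors the value used in the cleanup procedure of Algorithm 1.

```python
def evict_expired(flows, now, timeout=60.0):
    """Remove flows idle for more than `timeout` seconds; return removed keys."""
    # Collect keys first: deleting while iterating a dict raises RuntimeError.
    expired = [key for key, stats in flows.items()
               if now - stats["last_update"] > timeout]
    for key in expired:
        del flows[key]
    return expired


flows = {
    ("10.0.0.5", 443): {"last_update": 100.0},  # idle for 75 s at now=175
    ("10.0.0.9", 22): {"last_update": 170.0},   # idle for 5 s at now=175
}
removed = evict_expired(flows, now=175.0)
```

In a multi-threaded setting this call would run while holding the same lock that protects packet updates.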
The architecture in Figure 1 visually encapsulates these phases, emphasizing the parallelized workflow between packet processing, ML inference, and resource management.
The ThreadPoolExecutor's role in enabling concurrent predictions is a critical innovation, distinguishing this system from conventional single-threaded IDS solutions.
This phased design not only enhances detection accuracy through ML-driven analysis but also ensures operational efficiency in dynamic network environments.
Figure 1. Architecture of a multi-threaded network IDS with parallel machine learning-based prediction
Proposed real-time network traffic anomaly detection algorithm
To address the challenge of detecting malicious traffic in real-time, we propose a lightweight yet effective algorithm that operates directly at the flow level.
The proposed approach leverages a pre-trained machine learning classification model and maintains minimal in-memory statistics to achieve both low latency and high throughput.
The algorithm, named real-time flow-level intrusion detection (RT-FLID), is designed to process live packets, extract essential flow-level features, and periodically classify traffic behavior without interrupting the ongoing packet capture process.
Algorithm 1. RT-FLID: real-time flow-level intrusion detection using ML classification
Input: Incoming packets captured on network interface
Output: Traffic label ∈ {ALLOW, DENY}
1: Initialize model ← Load classification model from joblib
2: Initialize scaler ← Load feature scaler from joblib
3: Initialize ip_aggregation as empty dictionary
4: Start background thread to execute Print_Summary_Every_60s()
5: Start background thread to execute Cleanup_Expired_IPs()
6: Set my_ip ← Get IP of monitored interface
7: while True do
8:     pkt ← Capture next incoming packet from network interface
9:     if pkt contains IP and (TCP or UDP) then
10:        src ← Source IP address of pkt
11:        dst ← Destination IP address of pkt
12:        dport ← Destination port from TCP/UDP header
13:        size ← Length of pkt in bytes
14:        timestamp ← Current time
15:        if dst = my_ip then
16:            ip_key ← (src, dport)
17:            Acquire Lock
18:            if ip_key not in ip_aggregation then
19:                ip_aggregation[ip_key] ← {total_length: size, packet_count: 1, pktsent: 0, last_update: timestamp}
20:                if pkt is TCP and SYN flag is set and ACK is not set then
21:                    ip_aggregation[ip_key].pktsent ← 1
22:                end if
23:            else
24:                Update total_length, packet_count, last_update
25:                if pkt is TCP and SYN flag is set and ACK is not set then
26:                    ip_aggregation[ip_key].pktsent ← pktsent + 1
27:                end if
28:            end if
29:            Release Lock
30:        end if
31:    end if
32: end while
Procedure: Print_Summary_Every_60s()
1: while True do
2:     Sleep 60 seconds
3:     Acquire Lock
4:     for each (ip_key, data) in ip_aggregation do
5:         if packet_count > 0 then
6:             features ← [dport, total_length, packet_count, pktsent]
7:             Scale features using scaler
8:             prediction ← model.predict(scaled features)
9:             label ← "Allow" if prediction = 1 else "Deny"
10:            Print prediction with src IP and dport
11:        end if
12:    end for
13:    Release Lock
14: end while
Procedure: Cleanup_Expired_IPs()
1: while True do
2:     Sleep 5 seconds
3:     Get current_time
4:     Acquire Lock
5:     for each (ip_key, data) in ip_aggregation do
6:         if current_time - last_update > 60 then
7:             Remove ip_key
8:             Print timeout message
9:         end if
10:    end for
11:    Release Lock
12: end while
RT-FLID is a proposed real-time detection algorithm that leverages ML to classify network traffic based on aggregated flow statistics.
The system continuously captures packets from the monitored interface and maintains per-flow statistics based on the tuple (source IP, destination port).
Every 60 seconds, these statistics are converted into feature vectors, scaled, and classified using a pretrained ML model to determine whether the traffic should be allowed or denied.
- Initialization phase (Lines 1-6): The system loads the pretrained ML model and corresponding scaler using joblib. It initializes a dictionary (ip_aggregation) to store flow statistics and launches two background threads: one for periodic prediction and one for garbage collection of stale flow entries.
- Real-time packet processing loop (Lines 7-32): Each incoming packet is analyzed if it includes IP and transport-layer headers (TCP or UDP). If the destination matches the monitored host, the flow is identified by its (source IP, destination port) tuple. The algorithm then updates the total bytes, packet count, and SYN packet count, holding a lock to ensure thread safety.
- Prediction procedure, Print_Summary_Every_60s() (Lines 1-14): Every 60 seconds, the system iterates over active IP flows, extracts feature vectors (dport, total_length, packet_count, pktsent), applies scaling, and performs classification using the trained model. Each flow is then labeled as Allow or Deny based on the prediction result. The outcome is printed for monitoring and logging purposes, supporting real-time intrusion detection and decision auditing.
- Cleanup procedure, Cleanup_Expired_IPs() (Lines 1-12): To maintain memory efficiency and prevent resource exhaustion, this background thread periodically (every 5 seconds) checks for inactive IP flows in the ip_aggregation structure. If a flow has not been updated for more than 60 seconds, it is removed from memory, and a timeout message is logged. This mechanism helps keep memory use bounded as the number of observed flows grows.
Overall, RT-FLID achieves a balance between low-latency packet handling and batch-based ML classification, making it suitable for lightweight deployment in real-time intrusion detection on resource-constrained systems.
EVALUATION METHODS AND DATASET
In this section, we utilize a firewall log dataset collected from a real-world environment, which contains detailed information about inbound and outbound packets as well as network access behaviors.
The performance of the models is evaluated using quantitative metrics, including confusion matrix (CMa), accuracy (Acc), precision (Pre), and recall (Rec), along with training and prediction time.
These are combined with statistical analysis techniques to ensure the reliability and significance of the results.
The dataset is partitioned using various training/testing ratios to examine the stability and generalization capabilities of the models across different data distribution scenarios.
The integration of quantitative evaluation metrics with statistical validation enhances the objectivity and credibility of the abnormal traffic detection assessment.
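The nine partitioning scenarios can be illustrated with a small helper that shuffles the records and cuts them at the requested ratio. This is a sketch under assumptions: the function `split_dataset`, the fixed seed, and the use of plain (non-stratified) shuffling are not stated in the text, and the records are placeholder row indices rather than real log entries.

```python
import random


def split_dataset(records, train_parts, test_parts, seed=42):
    """Shuffle and split records by a train:test ratio such as 7:3."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


records = list(range(65532))  # placeholder indices, same size as the dataset
sizes = {}
for train_parts in range(1, 10):  # 1:9, 2:8, ..., 9:1
    train, test = split_dataset(records, train_parts, 10 - train_parts)
    sizes[f"{train_parts}:{10 - train_parts}"] = (len(train), len(test))
```

For a dataset with a very rare class such as Reset-Both, a stratified split may be preferable so that every partition contains at least a few examples of each label.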
Firewall logs dataset
This dataset collected events from the firewall device in the attack detection system, and the dataset was named "firewall logs."
The firewall device is used to collect logs of events that occur during the process of preventing attacks such as DDoS and phishing.
The firewall logs dataset contained 65,532 records.
The number of corresponding actions for each type is presented in Table 2, in which the Action attribute was classified into four labels: "Allow," "Deny," "Drop," and "Reset-Both."
These actions are directly related to granting or denying access to resources from the external network to the internal network of the firewall system.
Additionally, the dataset includes 12 different attributes, as described in Table 3.
Machine learning models
Machine learning algorithms were selected in this study due to their efficiency, interpretability, and strong generalization capabilities across diverse data distributions.
These characteristics make ML particularly suitable for processing complex, high-volume traffic logs in real-time network environments.
To comprehensively evaluate their applicability to anomaly detection in firewall logs, five widely used ML algorithms were systematically selected and assessed:
- Decision tree: A simple yet effective tree-based model that splits data based on feature thresholds, offering high interpretability and fast training time.
- Random forest: An ensemble of decision trees that improves classification performance and reduces overfitting through bootstrap aggregation (bagging).
- Extra trees: A randomized ensemble method similar to random forest, which introduces additional randomness in feature splits to enhance computational speed and robustness.
- CatBoost: A gradient boosting algorithm optimized for handling categorical features efficiently, offering competitive accuracy with minimal preprocessing.
- AdaBoost: An adaptive boosting technique that iteratively focuses on misclassified instances to construct a strong composite classifier from multiple weak learners.
The five machine learning models were evaluated across nine distinct train-test split scenarios (ranging from 1:9 to 9:1).
Their performance was assessed using standard evaluation metrics, including Acc, Pre, Rec, CMa, and training/prediction time (in seconds).
Pearson correlation coefficient analysis was also applied to examine feature relevance and support effective feature selection.
Among all evaluated models, the decision tree consistently achieved the highest Acc and exhibited the most stable performance across all experimental configurations, leading to its selection as the final prediction model for abnormal traffic detection in server systems.
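For readers unfamiliar with the feature-relevance step, the Pearson correlation coefficient between two columns can be computed as in the sketch below. The function name `pearson` and the toy columns are illustrative assumptions and are not taken from the dataset; returning 0.0 for a constant column is a convention chosen here to avoid division by zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


# Toy columns: total bytes is roughly twice the bytes sent.
bytes_sent = [94, 120, 300, 800, 1500]
bytes_total = [177, 240, 590, 1620, 3010]
constant = [1, 1, 1, 1, 1]
r = pearson(bytes_sent, bytes_total)
```

A coefficient close to 1 or -1 between two input features suggests redundancy, while the coefficient between a feature and the encoded label gives a first indication of its relevance.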
Table 2. Firewall logs dataset
Action type | Allow | Deny | Drop | Reset-both | Total
No. of actions | | | | | 65,532
Table 3. Attributes in the dataset
No. | Attribute | Description
1 | Source port | Port from which the packet is sent on the source machine; integer data type.
2 | Destination port | Port by which the gateway on the destination device knows which application or service the packet needs to be delivered to; integer data type (e.g., requesting access to a website over HTTPS).
3 | NAT source port | Network address translation (NAT) source port: the source port of an internal IP address is converted to a source port of the public IP address.
4 | NAT destination port | NAT destination port: the destination port seen from outside is changed to the internal destination port, and the packet is forwarded to the internal server.
5 | Action | The IPS system detects and processes traffic. Allow: allow normal connections (classified as "3"). Deny: reject the connection with notification to the sender ("2"). Drop: block traffic suspected to be malicious ("1"). Reset-Both: terminate any connection detected as malicious ("0").
6 | Bytes | Sum of bytes sent and bytes received (e.g., 177 bytes = 94 bytes + 83 bytes).
7 | Bytes sent | Total data (in bytes) sent over the network by a device or application (e.g., 94 bytes).
8 | Bytes received | Total data (in bytes) received from the network by the device or application (e.g., 83 bytes).
9 | Packets | Sum of packets sent and packets received. Each packet is up to 1,500 bytes in size (Ethernet standard).
10 | Elapsed time (sec) | The firewall records the elapsed time (in seconds) to track the lifetime of each connection.
11 | Pkts_Sent (packets sent) | Total number of packets sent by the device or application over the network.
12 | Pkts_Received (packets received) | Total number of packets received by the device or application from the network.
Evaluation metrics and statistical methods
To assess the classification performance of the proposed ML models, several standard evaluation metrics were employed, including the CMa, Acc, Pre, and Rec, as described in (1)–(4). These metrics allow for a detailed analysis of both overall performance and error distribution across classes. In particular, the confusion matrix, as presented in (1), captures the correspondence between predicted and ground truth labels in the classification process.

$$ CMa = \begin{bmatrix} TN & FP \\ FN & TP \end{bmatrix} \tag{1} $$
The confusion matrix provides four essential quantities for binary classification analysis. True positive (TP) represents correctly identified malicious instances (e.g., drop, deny, reset-both), while true negative (TN) corresponds to correctly identified benign instances (e.g., allow). In contrast, false positive (FP) and false negative (FN) represent misclassifications of benign and malicious traffic, respectively, and serve as the foundation for further metric calculations.
Int J Elec & Comp Eng. Vol. No. October 2025: 4785-4802

$$ Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{2} $$

$$ Pre = \frac{TP}{TP + FP} \tag{3} $$

$$ Rec = \frac{TP}{TP + FN} \tag{4} $$

Acc measures the overall correctness of predictions. Pre quantifies the proportion of positive identifications that were actually correct, while Rec (or true positive rate) indicates the model's ability to identify positive instances.
A high recall is especially important in security contexts, where failing to detect a malicious instance can be costly.
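The metrics in (2)–(4) can be illustrated with a minimal sketch. It assumes, as in the text, that "allow" is the benign class and that every other action counts as malicious; the function names and the six sample labels are hypothetical and are not taken from the dataset.

```python
def count_outcomes(y_true, y_pred, benign="allow"):
    """Count TP, TN, FP, FN, treating every non-benign action as malicious."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        truth_malicious = truth != benign
        pred_malicious = pred != benign
        if truth_malicious and pred_malicious:
            tp += 1
        elif not truth_malicious and not pred_malicious:
            tn += 1
        elif pred_malicious:
            fp += 1  # benign traffic flagged as malicious
        else:
            fn += 1  # malicious traffic missed
    return tp, tn, fp, fn


def binary_metrics(tp, tn, fp, fn):
    """Return (Acc, Pre, Rec); a zero denominator yields 0.0."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    return acc, pre, rec


# Hypothetical labels for illustration only.
y_true = ["allow", "deny", "drop", "allow", "reset-both", "allow"]
y_pred = ["allow", "deny", "allow", "deny", "reset-both", "allow"]

tp, tn, fp, fn = count_outcomes(y_true, y_pred)
acc, pre, rec = binary_metrics(tp, tn, fp, fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} Acc={acc:.3f} Pre={pre:.3f} Rec={rec:.3f}")
```

In this toy case one malicious flow is missed (a false negative), which lowers Rec without affecting Pre; this is the kind of error the text identifies as costly in security settings.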
The Pearson correlation coefficient ($r$) is a statistical measure used to quantify the strength and direction of the linear relationship between two continuous variables. It is frequently used in ML and data analysis for feature selection, where identifying strong correlations can improve model interpretability. For example, Chen et al. utilized this coefficient to assess the importance of input variables in reliability analysis and predictive modeling. The coefficient is computed as (5).

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{5} $$

The average values of variables $x$ and $y$ are computed as $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$, respectively. In these formulas, $x_i$ and $y_i$ denote the values of $x$ and $y$ at the $i$-th observation, and $n$ represents the total number of observations. Furthermore, the Pearson correlation coefficient $r$ quantifies the degree of linear association between the two variables, where $r = 1$ indicates a perfect positive correlation, $r = -1$ indicates a perfect negative correlation, and $r = 0$ signifies no linear correlation.
Dataset splitting for model evaluation
Equation (6) was used to define multiple training–testing splits in order to evaluate model performance metrics, including Acc, Pre, and Rec, across five ML algorithms: DT, RF, ET, CB, and AB. For each simulation scenario $i$, the test size refers to the proportion of the dataset reserved for testing, while the training size is the complement of that fraction, as illustrated in Figure 2.
This ratio-based splitting approach enabled the identification of an optimal data partitioning scheme for predictive modeling.
Figure 2. Distribution of training and testing data proportions across nine simulation scenarios using the firewall logs dataset (65,532 records)

$$ Testing(i) = 1 - Training(i) \tag{6} $$

where $i \in \{1, 2, \ldots, 9\}$ represents different simulation configurations.
The Firewall logs dataset, consisting of 65,532 records, was divided accordingly into training and testing subsets.
As the training portion increases across simulations, the corresponding testing portion decreases, allowing an empirical examination of how model performance varies with data availability.
The computation time for training or testing in each scenario was calculated using (7), by taking the difference between the end and start timestamps:

$$ \Delta t = t_2 - t_1 \tag{7} $$

where $\Delta t$ denotes the elapsed time, $t_1$ is the start time, and $t_2$ is the end time.
The training and testing data proportions used in this study are illustrated in Figure 2, based on the firewall logs dataset, which comprises 65,532 records.
Across nine simulation iterations (from 1 to 9), the proportion of training data progressively increased, while the testing data proportion decreased. Each iteration corresponded to a specific train–test ratio, ranging from 1:9 to 9:1.
These data partitioning schemes were employed to investigate the effect of different data splits on the performance of classification models.
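The splitting and timing procedure can be sketched as follows. The sketch assumes that scenario $i$ uses a training fraction of $i/10$ (which matches the stated ratios 1:9 to 9:1) and that record counts are obtained by simple rounding; the paper does not state its rounding rule, and the helper names `split_sizes` and `timed` are hypothetical.

```python
import time

TOTAL_RECORDS = 65_532  # dataset size reported in the paper


def split_sizes(i, total=TOTAL_RECORDS):
    """Return (training fraction, testing fraction, n_train, n_test) for scenario i."""
    if not 1 <= i <= 9:
        raise ValueError("scenario index must be in 1..9")
    training = i / 10        # assumption: ratio i:(10 - i)
    testing = 1 - training   # complement of the training fraction
    n_train = round(total * training)
    return training, testing, n_train, total - n_train


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds) as t2 - t1."""
    t1 = time.perf_counter()
    result = fn(*args)
    t2 = time.perf_counter()
    return result, t2 - t1


scenarios = {i: split_sizes(i) for i in range(1, 10)}
for i, (train_frac, test_frac, n_train, n_test) in scenarios.items():
    print(f"scenario {i}: {i}:{10 - i} -> train={n_train}, test={n_test}")

# Any callable can be timed; here a stand-in workload replaces model fitting.
_, elapsed = timed(sum, range(100_000))
print(f"elapsed: {elapsed:.6f} s")
```

`time.perf_counter` is used here because it is a monotonic, high-resolution clock suited to short intervals; the timer actually used in the study is not specified.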
RESULTS AND DISCUSSION
The experimental assessments were conducted on a high-performance computing server equipped with dual Intel Xeon E5-2696 v3 processors (2.30 GHz, 36 cores/72 threads) and 64 GB of DDR4 RAM, powered by an ASUS TUF 1200 W Gold ATX 3.0 power supply to ensure operational stability during intensive workloads. Although the system was equipped with an NVIDIA GeForce RTX 3090 XC3 Ultra Hybrid GPU (24 GB GDDR6X, 10,496 CUDA cores), all training and inference tasks were executed on the CPU, as the study focused exclusively on ML models rather than DL architectures. DL models typically require larger datasets and more computational resources, which may not be feasible or necessary for the problem of detecting abnormal traffic in server systems. Therefore, ML algorithms were preferred for their efficiency and interpretability in this study.
Five ML algorithms (DT, RF, ET, CB, and AB) were systematically evaluated across nine train–test ratio scenarios (ranging from 1:9 to 9:1). Model performance was assessed using standard metrics, including Acc, Pre, Rec, and CMa analysis, the training and testing times (measured in seconds) of the five algorithms, and an evaluation of the correlation matrix. Among these models, DT consistently demonstrated the highest accuracy and most stable performance, leading to its selection as the final prediction model in this study on detecting abnormal traffic in server systems.
Assessing ML-based intrusion detection models using firewall logs
As shown in Table 4, DT consistently achieved the highest and most stable accuracy across all train–test splits, with training accuracy ranging from 99.862% to 99.893% and test accuracy from 98.835% to 99.45%. At the 7:3 split, DT attained its peak test accuracy of 99.45%, outperforming ET (….389%), CB (….338%), RF (….005%), and AB (….005%). While ET and CB remained competitive (e.g., 99.374% and ….343% at the 9:1 and 8:2 splits), their performance was consistently lower than that of DT. In contrast, RF and AB yielded noticeably lower test accuracies, particularly under minimal training conditions (e.g., both at ….108% with the 1:9 split). These results from Table 4 confirm that DT offers superior generalization and robustness across varying data distributions for detecting abnormal traffic in server systems. Figure 3 further reinforces this finding by comparing the accuracy of the five algorithms at the 7:3 training–testing ratio, where DT demonstrates the best performance.
Table 4. Accuracies of the five algorithms across different train–test splits (columns: training–testing ratio, followed by train and test accuracy for each of the five algorithms)
Figure 3. Comparison of the accuracy of the five algorithms at the 7:3 training–testing ratio

According to the results presented in Table 5, DT consistently achieved the highest training precision across all train–test ratios, ranging from 99.2% to 99.…%. Although its test precision initially started lower (….985% at the 1:9 split), it improved significantly to 99.455% at the 8:2 split, demonstrating strong generalization as more training data was introduced. In comparison, RF and AB exhibited limited improvement, with test precision remaining below 72.…%. ET and CB showed stronger performance gains (CB reached 99.341% and ET achieved 99.472% at the 8:2 split), yet they still trailed DT overall. These findings from Table 5 further affirm DT's robustness and adaptability in detecting abnormal traffic within server systems under varying data conditions.
Table 5. Comparison of precision among the five algorithms after training and testing (columns: training–testing ratio, followed by train and test precision for each of the five algorithms)

As detailed in Table 6, DT consistently achieved the highest recall on the training data, ranging from ….982% to 99.923%, and demonstrated notable test recall performance, peaking at 93.389% with the 7:3 split. In contrast, RF and AB maintained relatively low recall on test sets across all splits, remaining around ….000% to 73.…%. ET displayed improved test recall, reaching 98.020% at the 7:3 split, whereas CB demonstrated stable performance, with test recall gradually increasing to 87.492% at the 5:5 split and ….987% at the 8:2 split. Despite fluctuations, DT consistently outperformed the other methods in both training and test scenarios. These findings reinforce DT's effectiveness in capturing true abnormal traffic patterns within server systems.
As shown in Table 7, DT consistently exhibited the lowest computational time across all train–test splits, with training times ranging from 0.05843 seconds to 0.20642 seconds and testing times from ….05292 seconds to ….13196 seconds. RF and ET demonstrated moderate training and testing times; for instance, RF's training time increased from 0.23983 seconds to ….99801 seconds, while ET's training time rose more steeply from 0.45540 to 2.30398 seconds, with its maximum test time of ….03108 seconds at the 9:1 split. In contrast, CB incurred the highest training costs among all models, starting at 6.43591 seconds and climbing to 32.7983 seconds, although its testing time remained relatively stable, ranging from 0.41535 to 0.47654 seconds. AB maintained relatively low testing times, such as 0.4228 seconds at 7:3 and 0.43434 seconds at 9:1, but its training time still increased steadily from ….22743 to 1.00326 seconds across the splits. These results highlight the computational efficiency of DT, particularly in scenarios requiring low-latency model deployment, while underscoring the substantial training overhead introduced by ensemble methods like CB and ET.
Table 6. Comparison of recall across the five algorithms (columns: training–testing ratio, followed by train and test recall for each of the five algorithms)

The correlation matrix in Figure 4, computed using (5), was used to evaluate the linear relationships among the 12 attributes listed in Table 3. Pairwise correlations were analyzed, with values ranging from -1 to 1. A strong positive correlation was observed between "Bytes Sent" and "Packets", suggesting that the number of bytes sent is closely related to the number of packets transmitted. Multicollinearity was also assessed: if two attributes exhibited a correlation coefficient above 0.8, such as "Pkts_Sent" and "Bytes Sent", one of them could be excluded to reduce redundancy. Anomalies and potential data skewness were detected, as "Source Port" showed no meaningful correlation with traffic-related attributes (e.g., "Bytes Sent"), indicating inconsistency. Overall, the correlation matrix supported both the identification of key attributes and the reduction of redundant features.
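The redundancy rule described above (exclude one attribute of any pair whose correlation exceeds 0.8) can be sketched as a simple greedy filter. This is only an illustration under stated assumptions: the greedy keep-first ordering, the use of the absolute value of the coefficient, the helper names, and the sample values are all hypothetical and are not the procedure or data reported in the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient as defined in (5)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return numerator / denominator


def drop_redundant(columns, threshold=0.8):
    """Keep attributes in order; skip one if |r| with a kept attribute exceeds threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson_r(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical sample values for three attributes named in Table 3.
columns = {
    "Bytes Sent": [94, 200, 310, 405, 520, 600],
    "Pkts_Sent": [1, 2, 3, 4, 5, 6],
    "Source Port": [5000, 1200, 4300, 800, 3900, 2500],
}

kept = drop_redundant(columns)
print(kept)
```

Which member of a correlated pair is dropped depends on column order in this sketch; in practice that choice would normally be guided by domain knowledge or by each attribute's relevance to the target label.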
Table 7. Computational time (s) for training and testing the five algorithms (columns: training–testing ratio, followed by train and test time for each of the five algorithms)

Based on Figure 5, which presents the confusion matrices for the five classification algorithms under a 7:3 training–testing split, clear differences in classification performance are evident, particularly for sensitive classes such as reset-both and deny. The DT model demonstrated highly accurate classification across all classes, with 11,227 out of 11,292 reset-both samples correctly predicted and minimal confusion across other classes. ET performed comparably well, correctly identifying 11,219 reset-both instances and misclassifying only 73. In contrast, both RF and AB struggled significantly with the deny and reset-both classes. For example, RF misclassified 191 deny samples and 626 reset-both samples, indicating poor discrimination capability between those categories. CatBoost, while not as precise as DT or ET, maintained competitive accuracy with 11,206 correct predictions for reset-both and low misclassification rates for other classes. These results suggest that DT and ET offer superior robustness in distinguishing between nuanced traffic control actions, making them more suitable for detecting abnormal behaviors in server systems.
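As a worked check of the counts quoted above, the per-class recall for reset-both follows from (4) when the 11,292 reset-both test samples are taken as the class total. This is an illustrative calculation from the figures in the text, not a value read from Figure 5.

```latex
\begin{align*}
Rec_{\text{reset-both}}^{DT} &= \frac{11{,}227}{11{,}292} \approx 0.9942 \quad (65 \text{ samples misclassified}) \\
Rec_{\text{reset-both}}^{ET} &= \frac{11{,}219}{11{,}292} \approx 0.9935 \quad (73 \text{ samples misclassified})
\end{align*}
```

The gap between the two models on this class is therefore about 0.07 percentage points, which is consistent with the description of ET as performing comparably to DT.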
Experimental evaluation of real-time
Figure 6 presents a comprehensive real-time summary and prediction report generated by the proposed intrusion detection system, highlighting its operational effectiveness in a dynamic network environment. The system continuously aggregates flow-level statistics from multiple source IPs, all targeting the server's HTTP port (80), each exhibiting identical behavioral metrics, namely 192 packets, 7680 bytes of data transferred, and 96 SYN packets per flow. This uniformity in traffic patterns is indicative of automated or coordinated activities, such as DDoS attacks or systematic port scanning, which are often challenging for traditional signature-based IDS to promptly detect. Leveraging machine learning-based classification, the system assigns a DENY label to all detected flows, as evidenced by the consistent "[Prediction] DENY – Src IP: … -> Dst Port: 80" output, thereby demonstrating the model's ability to accurately recognize and respond to anomalous and potentially malicious behaviors in real time. These experimental results underscore the robustness of the RT-FLID algorithm in synthesizing flow features and executing parallel inference, enabling timely and reliable defense against both known and emerging network threats, a capability that aligns with recent advancements in deep learning-based intrusion detection systems.
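The flow-level aggregation described above can be sketched as follows. The record layout (`src_ip`, `dst_port`, `length`, `syn`), the `FlowStats` class, and the placeholder address `10.0.0.1` are assumptions made for illustration; the paper does not publish RT-FLID's internal data structures. The synthetic input is chosen so that one flow reproduces the reported per-flow figures (192 packets, 7680 bytes, 96 SYN packets).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Per-flow counters accumulated from individual packet records."""
    packets: int = 0
    byte_count: int = 0
    syn_packets: int = 0


def aggregate_flows(packet_records):
    """Group packet records by (source IP, destination port) and sum counters."""
    flows = defaultdict(FlowStats)
    for pkt in packet_records:
        stats = flows[(pkt["src_ip"], pkt["dst_port"])]
        stats.packets += 1
        stats.byte_count += pkt["length"]
        if pkt["syn"]:
            stats.syn_packets += 1
    return dict(flows)


# Synthetic traffic: 192 packets of 40 bytes each, every second one a SYN.
packet_records = [
    {"src_ip": "10.0.0.1", "dst_port": 80, "length": 40, "syn": i % 2 == 0}
    for i in range(192)
]

flows = aggregate_flows(packet_records)
for (src_ip, dst_port), stats in flows.items():
    print(src_ip, dst_port, stats)
```

Keying flows on the source address and destination port keeps one counter set per attacker and target service, which is what makes identical per-flow metrics across many sources easy to spot.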
Figure 4. Correlation matrix showing the statistical relationships among 12 attributes in the dataset

Figure 7 presents the system status table containing a list of active nodes in the network, including their IP addresses, along with information on devices, users, operating systems, connection status (active, direct), and transmission statistics (tx/rx). These nodes act as attacking machines, generating abnormal traffic toward the victim machine by using the HPing3 tool to simulate various attack patterns. The consistency between the IP addresses shown in this table and those labeled as DENY in the traffic summary report (see Figure 6) confirms that the intrusion detection system operates effectively not only on simulated data but also in a real-world network environment with multiple nodes and concurrent connections. Furthermore, the traffic statistics per node provide a solid basis for verifying the model's prediction decisions, thereby enhancing the reliability of the Allow/Deny labels. The combination of evidence from both figures demonstrates that the proposed RT-FLID algorithm is capable of accurately analyzing and classifying traffic in real time, while also proving its practical applicability in monitoring and protecting dynamic, multi-device network environments, including real-world attack scenarios.
Figure 5. Confusion matrices illustrating the performance of five classification algorithms under a 7:3 training–testing split

Figure 6. Real-time summary report and ML-based flow-level prediction results of the proposed intrusion detection system

Discussion
This study conducted a comprehensive evaluation of five machine learning algorithms (DT, RF, ET, CB, AB) across various training/testing data split ratios. The models were assessed using standard performance metrics such as accuracy, precision, recall, confusion matrix, training time, and prediction time. Among these, the DT model demonstrated the highest accuracy and stability, achieving a peak test accuracy of 99.45% at the 7:3 split ratio. Moreover, DT outperformed the others in processing speed, recording the shortest training time (0.20642 seconds at the 7:3 ratio) and rapid prediction times. These results highlight the model's strong potential for deployment in low-latency environments and serve as compelling evidence of DT's effectiveness and practicality in detecting abnormal traffic in server systems.
Experimental results indicate that a 7:3 training/testing split offers an optimal balance between providing sufficient training data and ensuring reliable test evaluation.
At this ratio, most models, particularly the DT model, achieved the highest accuracy and recall scores, demonstrating strong generalization capabilities. Specifically, DT reached a precision of up to 99.455% and a recall of 93.389%, significantly outperforming the other algorithms.
This analysis provides valuable practical guidance for determining appropriate data split ratios in real-world network monitoring applications.
The proposed system, named RT-FLID, is a real-time intrusion detection architecture that employs a multi-threaded mechanism to concurrently process network traffic flows.
The architecture integrates three core components: flow aggregation by session, synchronized multi-threaded processing, and automated resource management.
As a result, the system significantly enhances detection speed, accuracy, and scalability, making it suitable for dynamic and high-throughput network environments.
This represents a substantial contribution, especially considering the limitations of traditional IDS systems when confronted with distributed and sophisticated cyberattacks.
Figure 7. Active node and IP status table in the monitored network environment

Another key contribution of this study is the integration of quantitative evaluation with statistical analysis to enhance the reliability and objectivity of the results.
In addition to conventional metrics such as accuracy, precision, recall, and training/prediction time, the study employs Pearson correlation analysis to effectively select input features, thereby optimizing model performance and reducing computational cost.
More importantly, the system is validated on firewall log data collected from real-world network environments, demonstrating its ability to accurately identify abnormal behaviors such as DDoS and port scanning by correctly labeling suspicious flows ("DENY"), with the DT model consistently achieving high accuracy in critical classes like "deny" and "reset-both" as evidenced by the confusion matrix.
Although parallel processing can be resource-intensive and time-consuming in certain contexts, the application of ThreadPoolExecutor and multithreading in this study significantly reduces latency compared to sequential processing, while also enhancing the scalability of the system.
This design choice represents a meaningful contribution and can be considered a breakthrough in the field of real-time network intrusion detection. Exploring this direction further in subsequent projects offers a promising avenue for developing more efficient and accurate solutions.
The proposed approach demonstrates strong potential and is expected to deliver substantial value to the research community in both academic and practical domains.
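The multithreaded inference step mentioned above can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The sketch is not the RT-FLID implementation: `classify_flow` is a hypothetical stand-in rule (a SYN-ratio threshold) used in place of the trained DT model, and the flow fields and placeholder addresses are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_flow(flow):
    """Hypothetical stand-in for the trained model's prediction on one flow."""
    syn_ratio = flow["syn_packets"] / flow["packets"] if flow["packets"] else 0.0
    return "DENY" if syn_ratio >= 0.5 else "ALLOW"


def classify_flows(flows, max_workers=4):
    """Classify flows concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        labels = list(pool.map(classify_flow, flows))
    return list(zip(flows, labels))


# Five flows with the per-flow metrics reported in the text, plus one light flow.
flows = [
    {"src_ip": f"10.0.0.{n}", "dst_port": 80, "packets": 192, "syn_packets": 96}
    for n in range(1, 6)
]
flows.append({"src_ip": "10.0.0.200", "dst_port": 80, "packets": 20, "syn_packets": 1})

results = classify_flows(flows)
for flow, label in results:
    print(f"[Prediction] {label} - Src IP: {flow['src_ip']} -> Dst Port: {flow['dst_port']}")
```

In CPython, threads give the most benefit when the per-flow work waits on I/O or calls into native code that releases the global interpreter lock; for purely Python-level computation the gain over sequential processing may be small, so the latency improvement should be measured on the actual workload.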
CONCLUSION
This study thoroughly evaluated five machine learning algorithms (DT, RF, ET, CB, and AB) across various training/testing data splits. The DT model demonstrated superior performance, achieving a peak test accuracy of 99.45%, precision of 99.455%, and recall of 93.389% at the optimal 7:3 split ratio. DT also outperformed others in efficiency, with the shortest training time of just 0.20642 seconds and rapid prediction speed, highlighting its suitability for low-latency environments.
The proposed RT-FLID system, leveraging multi-threaded processing, significantly enhanced detection speed, accuracy, and scalability in real-time intrusion detection.
Moreover, integrating Pearson correlation for feature selection optimized model performance and reduced computational costs.
Validated on real-world firewall logs, the system accurately identified abnormal activities like DDoS and port scanning, with the DT model maintaining high accuracy in critical classes such as "deny" and "reset-both," as confirmed by confusion matrix analysis.
While RT-FLID demonstrates strong real-time performance and scalability, several enhancements are planned for future research.
These include exploring hybrid deep learning models for improved detection of complex attack patterns, enabling compatibility with encrypted traffic through flow-based behavioral analysis, and integrating external threat intelligence feeds to enhance anomaly context awareness.
Additionally, extending the system to analyze logs from multiple sources (e.g., IDS, system logs) could provide a more comprehensive security posture.
ACKNOWLEDGMENTS
The authors sincerely thank the Associate Editor, reviewers, and Editor-in-Chief for their valuable feedback. As a doctoral student, we appreciate the Posts and Telecommunications Institute of Technology, Ministry of Information and Communications of Vietnam, for their partial financial support through an International Scientific Research Grant.
FUNDING INFORMATION
This study was partially supported by the Theme-based Research Grant on International Science [grant number 999/QĐ-HV] from the Ministry of Information and Communications, Vietnam.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author: Tran Cong Hung, Dam Minh Linh, Han Minh Chau, Ngo Xuan Thoai, Thai Duc Phuong, Huynh De Thu

C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review & Editing; Vi: Visualization; Su: Supervision; P: Project administration; Fu: Funding acquisition

CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
DATA AVAILABILITY
The data that support the findings of this study have been previously published and are openly available at GitHub: https://github.com/MinhLinhEdu/Firewall-logs-dataset.
REFERENCES