SINERGI Vol. No. October 2025: 615-624
http://publikasi. id/index. php/sinergi
http://doi. org/10. 22441/sinergi.

An intelligent approach for detection and classification of security attacks in a Passive Optical Network using Light Gradient Boosting Machine

Sumayya Bibi1, Nadiatulhuda Zulkifli1*, Farabi Iqbal1, Sajid Iqbal2, Arnidza Ramli1, Adam Wong Yoon Khang3

1Department of Communication Engineering, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Malaysia
2Department of Information Systems, College of Computer Science and Information Technology, King Faisal University, Saudi Arabia
3Department of Engineering Technology, Faculty of Electronics and Computer Technology and Engineering, Universiti Teknikal Malaysia Melaka, Malaysia

Abstract
Over the past decade, Passive Optical Networks (PONs) have emerged as a leading solution for next-generation broadband access, providing high-speed and cost-effective communication. However, PONs face significant security challenges, including data interception, denial-of-service (DoS) attacks, and resource exhaustion caused by malicious Optical Network Units (ONUs). Machine learning (ML), particularly advanced models such as the Light Gradient Boosting Machine (LightGBM), has proven to be a promising solution for managing complex security issues in PONs. Leveraging its ability to handle imbalanced, high-dimensional datasets, LightGBM was employed in this study to detect and classify malicious ONUs based on their bandwidth usage patterns. The model achieved an accuracy of 95.27%, a Matthews Correlation Coefficient (MCC) of 90%, and a precision of 93%. While a traditional classifier, Naive Bayes (NB), achieved an accuracy of 88.36%, LightGBM demonstrated superior robustness in addressing class imbalance and improving detection accuracy. This work highlights the potential of LightGBM for enhancing PON security and enabling intelligent, resilient broadband networks.

Keywords: Attack detection system,
Classification, LightGBM, Machine Learning, Naive Bayes

Article History:
Received: September 18, 2024
Revised: January 13, 2025
Accepted: February 12, 2025
Published: September 1, 2025

Corresponding Author:
Nadiatulhuda Zulkifli
Universiti Teknologi Malaysia, 81310 UTM, Johor, Malaysia
Email: nadiatuhuda@utm.

This is an open-access article under the CC BY-SA license.

INTRODUCTION
Passive Optical Networks (PONs) have emerged in recent years as a premier approach for alleviating access congestion. Their ability to deliver higher transmission speeds, guaranteed and consistent quality of service (QoS), and cost effectiveness has established them as the leading fiber-access network option […, 2, 3, 4, 5, …]. A PON uses a tree topology that connects one point to multiple endpoints to provide user access. In a standard time-division multiplexing (TDM) PON configuration, an optical fiber is passively branched by an optical power splitter, allowing a single fiber to carry the traffic exchanged between the optical line terminal (OLT) and the optical network units (ONUs). Typically, the communication channels between these two elements use distinct wavelengths: 1490 nm for downstream transmission and 1310 nm for upstream transmission. Owing to its inherently passive design, a PON offers a high level of security, creating substantial difficulties for any attacker attempting to intercept the optical signal […]. For example, Gigabit PON (GPON) incorporates security measures such as data encryption, identity authentication, and key management, along with other functionalities. However, recent studies have shown that attackers have devised multiple techniques, including splitting and bending attacks, to illicitly access a PON […].
This setup can potentially be exploited by malicious users aiming to disrupt the standard operation of the ONUs within the medium access control (MAC) layer. In these scenarios, rogue ONUs might intercept sensitive information intended for other ONUs, resulting in information theft. Every ONU is required to comply with the dynamic bandwidth allocation (DBA) agreement, and this reliance on compliance can create network vulnerabilities that potentially undermine the security of the DBA. Avoiding this is crucial for optimal GPON functionality. During the DBA process, a degradation attack attempts to acquire additional bandwidth at the expense of other ONUs rather than causing a complete disruption of the GPON. Nevertheless, countering degradation attacks remains a difficult challenge.

Numerous strategies have been suggested in the literature to counter such network threats. For instance, one method to mitigate IP spoofing and DoS attacks involves labelling network packets and tracing their origin at the perimeter routers. Another method is to block spoofed packets at the perimeter routers using hop-limit or time-to-live (TTL) filtering as the criterion […]. Because a PON operates within the access network while its DBA operates primarily within the MAC layer, a DoS attack directed at the network and transport layers would notably increase the traffic frames in both the downstream and upstream links of a PON […]. Other potential attacks include IP spoofing, routing attacks, selective forwarding attacks, session hijacking attacks, port scanning attacks, and distributed denial-of-service (DDoS) attacks. Specifically, an ONU under attack will experience a heightened demand for bandwidth on the shared upstream link, which decreases the bandwidth available to the other, normal ONUs. A typical DBA scheme for managing upstream bandwidth often falls short in addressing this situation.
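To make this starvation effect concrete, the following minimal sketch assumes a hypothetical proportional-share DBA in which the OLT grants each cycle's upstream capacity in proportion to the ONUs' reported requests. This rule is an illustrative stand-in, not the GPON-standard DBA or a scheme from this paper; `proportional_dba`, `capped_dba` (an equal-share cap shown only for contrast), and all request values are assumptions.

```python
def proportional_dba(requests, capacity):
    """Hypothetical proportional-share DBA: split `capacity` in proportion to the
    reported requests, never granting an ONU more than it asked for."""
    total = sum(requests.values())
    if total <= capacity:
        return dict(requests)  # enough capacity: every request is fully granted
    return {onu: capacity * req / total for onu, req in requests.items()}


def capped_dba(requests, capacity):
    """Hypothetical capped variant: each ONU gets at most an equal share of the
    capacity, so one inflated report cannot starve the others.
    (Unused share is not redistributed in this simplified version.)"""
    cap = capacity / len(requests)
    return {onu: min(req, cap) for onu, req in requests.items()}


# Illustrative numbers: 4 ONUs, 1000 bandwidth units per upstream cycle.
normal = {"onu1": 300, "onu2": 300, "onu3": 300, "onu4": 300}
attack = dict(normal, onu4=3000)  # onu4 reports ten times its real demand

for name, reqs in (("normal", normal), ("attack", attack)):
    print(name, "proportional:", proportional_dba(reqs, 1000))
    print(name, "capped:      ", capped_dba(reqs, 1000))
```

Under the proportional rule, each honest ONU's grant drops from 250 to roughly 77 units once onu4 inflates its report, whereas the capped rule keeps it at 250; detecting the misbehaving ONU, as proposed here, is a complementary defence to such grant limits.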
The high prediction accuracy achieved when machine learning (ML) methods are applied to real traces for predicting network traffic in the Next Generation Ethernet Passive Optical Network (NG-EPON) has been illustrated in […]. The proposed approach uses a single Long Short-Term Memory (LSTM) model located at the Optical Line Terminal (OLT) to forecast the bandwidth requirements of all ONUs under various network loads. It was demonstrated that applying ML algorithms for traffic prediction enhances performance in NG-EPON, mainly because of the ability to gather and use knowledge effectively. This result suggests that high-performing ML-based methods could significantly enhance, or potentially replace, traditional network control in the near future. Additionally, […] proposed an intelligent approach for classification and prediction within PON, introducing an advanced classification technique that autonomously and incrementally predicts and categorizes future traffic into various types using LSTM and Gated Recurrent Unit (GRU) models. Similarly, other studies […] demonstrated the detection and classification of events in PON applications for network traffic monitoring by combining LSTM with an ensemble classifier and with a neural network, respectively. Also, […] demonstrated fault detection in PON using a Support Vector Machine (SVM) classifier.

Nevertheless, most existing DBA algorithms, with only a few exceptions, lack security awareness and tend to overlook potential network attacks; hence, security has become an emerging topic in optical access networks. Notable studies on secure bandwidth allocation algorithms include Drakulic et al. and Fadila et al. However, these schemes do not incorporate ML techniques as security measures; their threat detection and mitigation techniques rely on collision monitoring per ONU.
This approach identifies only the ONU with the fewest collisions as a potential threat and imposes penalties accordingly. Previous studies have used algorithms such as Naive Bayes, Support Vector Machines (SVM), Decision Trees, and, most often, LSTMs for classification tasks. While these models demonstrated satisfactory performance in terms of classification accuracy, receiver operating characteristic (ROC), and precision, their performance was inferior to that of modern ensemble techniques such as the Light Gradient Boosting Machine (LightGBM). For instance, a previous study […] employed LSTM and GRU in industrial passive optical networks for a dynamic bandwidth allocation algorithm based on traffic classification. Similarly, the study in […] explored the combined use of XGBoost, TabPFN, and LightGBM to enhance classification performance. LightGBM's strengths, such as handling class imbalance and efficiently processing large datasets, enabled superior classification of security attacks in PONs. It outperformed older methods in accuracy, precision, and ROC metrics, reliably distinguishing between attack types and advancing real-world PON security applications.

p-ISSN: 1410-2331 e-ISSN: 2460-1217

In summary, the results of studies applying machine learning methods to PON models […, 13, 14, …] can be outlined as follows. Supervised learning methods are the most commonly used. While K-Nearest Neighbors (KNN), SVM, and Bayesian algorithms have received increased research attention given their prevalence in many studies, few studies apply them to PON models. Most supervised learning methods consistently achieve high mean accuracies, exceeding 90% in detection effectiveness across different assessment criteria. In PON implementations, the majority of studies have used SVM and Decision Tree (DT) methods.
Different kinds of datasets have been used. Certain studies have used data from online sources such as Kaggle, whereas others have generated datasets through flow generation methods for use with machine learning algorithms. No studies have identified the main attributes of flow records in PON (including priority and action attributes) for all types of datasets employed in existing machine learning studies. Precision, recall, and F1-score are the most frequently employed metrics for evaluating ML algorithms in most research, whereas accuracy and execution time are rarely used as performance measures.

In view of the above trends, this paper aims to address the gap in detecting and mitigating various security threats in PON, such as eavesdropping, DoS attacks, masquerading, and Theft of Service (ToS), by classifying malicious versus normal ONUs using ML algorithms. Additionally, we propose a novel approach that applies BorderlineSMOTE as a post-processing step to further refine a model's performance on imbalanced datasets, especially after an initial model has been trained. This method adjusts and enhances the model's predictions by generating synthetic samples specifically in the regions where the model misclassifies minority-class instances. It is an effective strategy for handling class imbalance, particularly when the initial model struggles with minority-class instances near the decision boundary. By generating synthetic samples in these critical regions and re-training the model, better classification performance, through improved recall and more balanced precision, can be achieved, ultimately leading to a more robust and reliable model.

The rest of this paper is structured as follows: Section II reviews related work on machine learning algorithms applied in PON. The proposed methods are outlined in Section III. Section IV provides a detailed account of the experimental findings and discussion. Finally,
Section V concludes the paper and highlights future directions.

METHOD
The model proposed in this study consists of two primary stages: detection and classification. Figure 1 illustrates the proposed model for identifying and classifying malicious ONUs. The initial phase distinguishes between malicious and normal ONUs. In this phase, the algorithms assess the impact of a DDoS attack by monitoring the behavior of the ONUs. Features of a DDoS attack, such as bandwidth usage, are crucial for distinguishing between normal and malicious ONUs. Consequently, the results of these feature-checking methods determine whether each ONU is normal or malicious. Normal ONUs are passed directly, while malicious ONUs are sent to the next phase for further classification. The SVM algorithm proposed for detecting malicious ONUs was developed and implemented to improve its effectiveness in terms of accuracy and execution time. SVM was selected because it has demonstrated strong performance in prior research across various PON applications […]. Figure 1 shows the algorithmic steps employed in detecting malicious ONUs. The detection stage comprises the following steps:
1. Run the detection method.
2. The method examines the characteristics of the ONUs.
3. The method examines the priority and bandwidth usage of each ONU.
4. The malicious ONUs are forwarded to the classification algorithm.

Figure 1. The proposed model for identifying and classifying ONUs

The second stage of the proposed framework is the classification of ONUs. In this stage, the DDoS attacks identified during the detection stage are analyzed by an algorithm to assess the behavior of the ONUs. The three characteristics of the ONUs considered are priority, bandwidth, and time. Once the checking process is complete, the ONUs are classified.
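As an illustration of the BorderlineSMOTE step proposed in the Introduction, the sketch below implements the standard Borderline-SMOTE1 idea in plain Python. It is an assumption-based sketch, not the authors' implementation: the function name `borderline_smote`, the neighbourhood sizes `m` and `k`, and the two toy features (average and peak bandwidth) are hypothetical. A minority (malicious) sample counts as "in danger" when at least half, but not all, of its `m` nearest neighbours belong to the majority class, and synthetic points are interpolated only from those samples toward their minority neighbours.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=5, k=5, n_per_sample=1, seed=0):
    """Borderline-SMOTE1 sketch: synthesize minority samples only near the class boundary.

    X is a list of numeric feature tuples and y the matching 0/1 labels.
    Returns (synthetic_X, synthetic_y); the caller appends them to the training set.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    new_X = []
    for i in minority_idx:
        # m nearest neighbours among ALL training samples (excluding the sample itself)
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(X[i], X[j]))[:m]
        n_majority = sum(1 for j in nbrs if y[j] != minority)
        # "Danger" samples: at least half, but not all, neighbours are majority.
        # All-majority neighbourhoods are treated as noise; mostly-minority ones as safe.
        if not (m / 2 <= n_majority < m):
            continue
        # k nearest minority neighbours serve as interpolation partners
        partners = sorted((j for j in minority_idx if j != i),
                          key=lambda j: math.dist(X[i], X[j]))[:k]
        if not partners:
            continue
        for _ in range(n_per_sample):
            j = rng.choice(partners)
            gap = rng.random()  # uniform in [0, 1): position along the segment
            new_X.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return new_X, [minority] * len(new_X)


# Hypothetical (average, peak) bandwidth features; label 1 = malicious (minority).
X = [(10, 12), (11, 13), (12, 14), (10, 15), (13, 12), (11, 11), (12, 12),
     (14, 15), (15, 16), (16, 18),
     (15, 17), (17, 19), (30, 40), (31, 42), (32, 41)]
y = [0] * 10 + [1] * 5
X_syn, y_syn = borderline_smote(X, y)
print(len(X_syn), "synthetic samples:", X_syn)
```

Here only the two malicious samples surrounded mostly by normal ONUs are oversampled, while the well-separated cluster is left alone. For real experiments, the `BorderlineSMOTE` class of the imbalanced-learn library provides a tested implementation of the same idea.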
Figure 1 also shows how the LightGBM algorithm is used to classify malicious ONUs. The classification stage can be outlined as follows:
1. Run the LightGBM classifier.
2. The method starts by detecting malicious ONUs.
3. The method examines the priority and bandwidth usage of each ONU.
4. It then classifies the types of ONUs based on the features assessed in step 3.

Gradient Boosting Algorithm
Gradient boosting is a type of ensemble learning method. Unlike bagging-style ensembles, in which models are created independently, boosting builds models sequentially, iteratively reducing the errors of previously learned models […]. It develops a predictive model by combining M additive tree models $(f_0, f_1, f_2, \ldots, f_M)$ to forecast the outcomes […]:

$$\hat{y}_i = \sum_{m=0}^{M} f_m(x_i)$$

The tree ensemble model is optimized by minimizing the expected generalization error L, as described in […]:

$$L = \sum_{i} \left( y_i - \hat{y}_i \right)^2$$

where L is a loss function that quantifies the difference between the target value $y_i$ and the predicted value $\hat{y}_i$ for a given data point i. There are three main motivations for using an ensemble-based approach:
1. Statistical: combining and averaging multiple learners enhances data learning and reduces the risk of selecting an inappropriate classifier.
2. Computational: during learning, finding a local optimum that accurately represents the data, such as decision boundaries, is computationally challenging. For instance, neural networks use gradient descent to minimize the loss function, starting from a single initial point. Ensemble methods, however, leverage multiple starting points for local searches, enabling more accurate estimation of functions such as decision boundaries than individual classifiers.
3. Representational: in some cases, a single classifier may struggle to capture complex decision boundaries. Ensemble-based learning addresses this by combining diverse decision boundaries from multiple classifiers […].
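A minimal pure-Python sketch of the additive model and squared loss above follows, assuming depth-1 regression trees (stumps) as the base learners $f_m$ and a shrinkage factor (`learning_rate`) that the equations do not include. LightGBM itself uses deeper leaf-wise trees, histogram binning, and a logistic loss for binary classification, so this only illustrates how each new tree fits the residuals (negative gradients) of the current ensemble; all function names and the toy data are hypothetical.

```python
def fit_stump(X, r):
    """Least-squares regression stump: the single-feature threshold split that best fits r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[f] <= t]
            right = [ri for x, ri in zip(X, r) if x[f] > t]
            if not right:  # threshold at the maximum value: no split
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    if best is None:  # all feature values identical: fall back to a constant
        mean = sum(r) / len(r)
        return lambda x: mean
    _, f, t, lv, rv = best
    return lambda x: lv if x[f] <= t else rv


def gradient_boost(X, y, n_trees=50, learning_rate=0.1):
    """Additive model y_hat(x) = f_0 + sum_m learning_rate * f_m(x) fitted under squared loss."""
    f0 = sum(y) / len(y)  # f_0: constant initial model
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        # For L = sum (y_i - y_hat_i)^2 the negative gradient is 2 (y_i - y_hat_i);
        # the constant factor 2 is absorbed into the learning rate.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(X, residuals)
        trees.append(h)
        pred = [pi + learning_rate * h(x) for pi, x in zip(pred, X)]
    return lambda x: f0 + learning_rate * sum(tree(x) for tree in trees)


def squared_loss(model, X, y):
    return sum((yi - model(x)) ** 2 for x, yi in zip(X, y))


# Toy 1-D example (hypothetical "bandwidth" feature, 1 = malicious)
X = [(1.0,), (2.0,), (3.0,), (4.0,)]
y = [0, 0, 1, 1]
model = gradient_boost(X, y)
print([round(model(x), 3) for x in X], squared_loss(model, X, y))
```

Each round shrinks the remaining residual by the factor (1 - learning_rate) on this separable toy set, which is the sequential error reduction that distinguishes boosting from independently trained ensembles.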
Gradient boosting enhances classifier robustness by reducing variance and bias while mitigating the errors of individual learners. This study uses LightGBM, a novel and highly efficient gradient boosting algorithm, to build a more robust model.

Light Gradient Boosting Machines
LightGBM […] is a gradient boosting method that grows trees vertically, that is, leaf-wise. LightGBM selects the leaf with the greatest loss reduction for splitting and uses histogram-based methods to identify optimal split points. To speed up training, it employs Gradient-based One-Side Sampling (GOSS), which prioritizes data samples with larger gradients and down-samples those with smaller gradients, on the assumption that the latter have smaller errors and are already well trained […, 21, 22, 23, 24, 25, …]. Thus, GOSS discards less-informative data points and uses the remaining ones to compute the information gain for the optimal split. However, this can introduce bias toward samples with larger gradients and distort the original data distribution. To address this issue, GOSS retains all high-gradient points and randomly samples the low-gradient ones, compensating by increasing the weights of the sampled low-gradient points during the information-gain calculation. LightGBM also uses a feature grouping algorithm to address data sparsity: its Exclusive Feature Bundling (EFB) method merges mutually exclusive features in a nearly lossless manner, reducing the feature count while retaining key information. This enables efficient processing of high-dimensional data, a common challenge in many classification tasks.

Algorithm: LightGBM Training Process (Figure 2)
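The GOSS step described above can be sketched in a few lines. This is a simplified illustration, not LightGBM's internal code: the function name `goss_sample` and the gradient values are hypothetical, and the defaults a = 0.2 and b = 0.1 mirror the LightGBM library's `top_rate` and `other_rate` parameters. It keeps the top a·n instances by absolute gradient, randomly samples b·n of the rest, and multiplies the weights of the sampled small-gradient instances by (1 - a)/b.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Gradient-based One-Side Sampling (GOSS) sketch.

    Keeps the top a*n instances by |gradient|, randomly samples b*n of the remaining
    instances, and weights the sampled small-gradient instances by (1 - a) / b so that
    the information-gain estimate is not biased toward large-gradient samples.
    Returns a list of (instance_index, weight) pairs.
    """
    if not (0 < a < 1 and 0 < b <= 1 - a):
        raise ValueError("need 0 < a < 1 and 0 < b <= 1 - a")
    n = len(gradients)
    rng = random.Random(seed)
    # sort instance indices by absolute gradient, largest first
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top, n_rand = int(a * n), int(b * n)
    top, rest = order[:n_top], order[n_top:]
    sampled = rng.sample(rest, min(n_rand, len(rest)))
    amplify = (1 - a) / b
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]


# Hypothetical gradients for 10 training instances
grads = [0.9, -0.8, 0.05, 0.01, -0.02, 0.03, 0.04, -0.01, 0.02, 0.06]
subset = goss_sample(grads)
print(subset)  # two large-gradient instances plus one re-weighted small-gradient one
```

Only 3 of the 10 instances are used to evaluate splits, yet their weights still sum to 10, which is how GOSS preserves the scale of the original data distribution while cutting computation.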
LightGBM Training Procedure
The operation of LightGBM is illustrated by the algorithm depicted in Figure 2. Additionally, LightGBM has been shown to converge faster than other algorithms within the gradient boosting framework. In this study, we adopt the same hybrid method, detailed in the section on related work.

We develop a LightGBM classifier to differentiate between ONUs (i.e., normal and malicious). The performance of a classifier is widely recognized to depend heavily on the features used for training. Throughout this research, our goals are to:
1. Develop a classification model using LGBM to analyze ONUs affected by DDoS attacks in terms of bandwidth utilization.
2. Examine how the proposed features perform on our dataset.
3. Explore the correlation between normal and malicious ONUs derived from our dataset.
4. Compare LightGBM with another classifier, specifically Naive Bayes.
5. Assess the effectiveness of the analysis.

The contributions of this study are as follows:
1. Create a LightGBM-based model for analyzing DDoS attacks on ONUs.
2. Conduct comprehensive evaluations of classification algorithms using different feature subsets through experiments on our dataset.

Naive Bayes
The algorithm applies Bayes' theorem under the assumption that the variables are independent given the class variable, a simplification that is rarely accurate in practice, hence the term "naive". Nonetheless, it performs efficiently in supervised classification tasks […], computing probabilities under known conditions as:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

where $P(A \mid B)$ is the probability of event A occurring given that event B has occurred, $P(A)$ is the probability of event A occurring, $P(B \mid A)$ is the probability of event B occurring given that event A has occurred, and $P(B)$ is the probability of event B occurring.
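To make the Naive Bayes decision rule concrete, the following minimal sketch assumes Gaussian class-conditional densities for the continuous bandwidth features; the text does not state which Naive Bayes variant was used, and the class name and toy values are hypothetical. It scores each class by log P(c) plus the sum of per-attribute log-likelihoods and picks the maximum, dropping the denominator P(x) of Bayes' theorem because it is identical for every class.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for x, label in zip(X, y):
            rows_by_class[label].append(x)
        n = len(X)
        self.priors = {c: len(rows) / n for c, rows in rows_by_class.items()}  # P(c)
        self.params = {}
        for c, rows in rows_by_class.items():
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # variance floor avoids division by zero for constant features
            variances = [max(sum((v - mu) ** 2 for v in col) / len(col), 1e-9)
                         for col, mu in zip(cols, means)]
            self.params[c] = (means, variances)
        return self

    def _log_joint(self, x, c):
        """log P(c) + sum_k log P(x_k | c): the 'naive' independence assumption."""
        means, variances = self.params[c]
        total = math.log(self.priors[c])
        for xk, mu, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (xk - mu) ** 2 / (2 * var)
        return total

    def predict_proba(self, x):
        """Posterior P(c | x), normalised over classes (P(x) cancels out)."""
        logs = {c: self._log_joint(x, c) for c in self.params}
        top = max(logs.values())  # subtract the max for numerical stability
        unnorm = {c: math.exp(v - top) for c, v in logs.items()}
        z = sum(unnorm.values())
        return {c: v / z for c, v in unnorm.items()}

    def predict(self, x):
        """Class with the maximum posterior probability."""
        return max(self.params, key=lambda c: self._log_joint(x, c))


# Hypothetical (average, peak) bandwidth features; 0 = normal, 1 = malicious
X = [(100, 150), (110, 160), (105, 155), (95, 140), (400, 900), (420, 950), (390, 880)]
y = [0, 0, 0, 0, 1, 1, 1]
nb = GaussianNaiveBayes().fit(X, y)
print(nb.predict((102, 152)), nb.predict((410, 920)))
```

Working in log space avoids numerical underflow when many small per-attribute probabilities are multiplied. scikit-learn's `GaussianNB` implements the same model for production use.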
The concept behind the Naive Bayes algorithm is to determine the posterior probability that a data instance $t_i$ belongs to a class $c_j$ within the data model. The class-conditional probability $P(t_i \mid c_j)$, which measures how likely instance $t_i$ is under the label $c_j$, is determined by multiplying the probabilities of all attributes of the data instance within the data model:

$$P(t_i \mid c_j) = \prod_{k=1}^{p} P(t_{ik} \mid c_j)$$

where p is the number of attributes in each data instance and $t_{ik}$ is the k-th attribute of $t_i$. The posterior $P(c_j \mid t_i)$, which is proportional to $P(t_i \mid c_j)\,P(c_j)$, is computed for all classes, and the class with the maximum posterior probability is assigned as the label for the instance. The flowchart for this algorithm is shown in Figure 3.

Figure 3. The Naive Bayes algorithm training process

RESULTS AND DISCUSSION
This section discusses the implementation of Naive Bayes and LightGBM and analyzes the performance of these models based on evaluation metrics. The metrics are computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

This work uses data from an OMNeT++ simulated network comprising 64 ONUs and one OLT with a fiber distance of 40 km in the ODN. All models were trained on a synthetic dataset created to train a classifier to detect malicious ONUs within a PON based on their bandwidth usage patterns. This dataset includes bandwidth demand profiles from ONUs recorded under normal conditions and during simulated attacks. The response variable for the binary classification is labeled 0 for normal ONUs and 1 for malicious ONUs. Predictive features include each ONU's average and peak bandwidth usage. Data cleaning involved removing outliers from the bandwidth data. During the simulation,
ONUs were labeled as either malicious or normal based on their behavior in the attack scenarios. The accuracy, precision, and MCC of the proposed methods are computed from the confusion-matrix counts using the standard definitions of these metrics.

The initial model used for analysis is Naive Bayes. The model produces predictions on the validation set, and three distinct metrics were computed for these predictions: precision, Matthews Correlation Coefficient (MCC), and accuracy. The Naive Bayes classifier achieved a precision of 81.185%, an MCC of 80.503%, and an accuracy of 88.359% on the validation data. Subsequently, the same trained model was used to predict the labels of the test data. The numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) can be determined from the confusion matrix, which compares the actual labels with the values predicted by our model. The confusion matrix offers a detailed breakdown of the classifier's performance by displaying actual versus predicted classifications: with TP = 31, the model correctly identified 31 instances as positive; with TN = 16, it correctly predicted 16 instances as negative; the false positives (FP) are instances incorrectly predicted as positive, and the false negatives (FN) are instances incorrectly predicted as negative, as illustrated in Figure 4.

Figure 4. Confusion matrix for actual class and predicted class detection

ROC Curve
The ROC curve shows the trade-off between sensitivity and specificity by plotting the true positive rate against the false positive rate at various threshold settings. The area under the curve (AUC) is 0.88, indicating that the model has good discriminative ability, as shown in Figure 5.
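The confusion-matrix metrics and the AUC discussed above can be computed from labels and scores with the short sketch below. It uses toy, hypothetical labels and scores (not the paper's data) and a 0.5 decision threshold chosen for illustration, and it computes AUC through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. In practice, scikit-learn's `accuracy_score`, `precision_score`, `matthews_corrcoef`, and `roc_auc_score` provide the same quantities.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = malicious ONU."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (hypothetical, not the paper's data)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                # 3 3 1 1
print(accuracy(tp, tn, fp, fn))      # 0.75
print(precision(tp, fp))             # 0.75
print(mcc(tp, tn, fp, fn))           # 0.5
print(roc_auc(y_true, scores))       # 0.9375
```

The toy example also shows why MCC is stricter than accuracy: with one false positive and one false negative out of eight samples, accuracy is 0.75 but MCC is only 0.5.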
A model with an AUC closer to 1 is considered excellent, while an AUC close to 0.5 suggests no discriminative power.

Performance Metrics
The bar charts in Figure 6 summarize the key performance metrics: accuracy, precision, and MCC. The NB classifier achieved a precision of 81.185%, an MCC of 80%, and an accuracy of 88.359% on the validation data.
Accuracy: the high accuracy indicates that the model is reliable in its predictions.
Precision: the precision value indicates the model's effectiveness in minimizing false positives, which is particularly important in scenarios where false positives are costly.
MCC (Matthews Correlation Coefficient): a high MCC score indicates strong overall performance, accounting for true and false positives and negatives. This makes it useful for a comprehensive view of the model's performance.

Figure 6. Performance metrics of the Naive Bayes (NB) classifier for ONU fault detection: accuracy, precision, and MCC

The LightGBM model demonstrates excellent performance across all evaluation metrics. Figure 7 shows the confusion matrix for actual and predicted classes using the LGBM model, broken down into true negatives, false positives, false negatives, and true positives. The model demonstrates high accuracy, with only two misclassifications, indicating strong predictive performance. Its discriminative ability is highlighted by the ROC curve in Figure 8, where the LightGBM (LGBM) classifier achieves an AUC of 0.95. This high AUC reflects the model's strong ability to distinguish between classes, outperforming the Naive Bayes classifier (AUC = 0.88), which typically achieves lower AUC scores because of its simplifying assumptions and limitations.

Figure 7. Confusion matrix for the actual class and predicted class

Figure 5. ROC for the Naive Bayes (NB) classifier with AUC 0.88 for ONU fault detection
Table 1. Performance of our models at detecting security threats and classifying ONUs
Model    Accuracy (%)    MCC (%)    Precision (%)
NB       88.359          80.503     81.185
LGBM     95.27           90         93

For optimal threat detection with minimal false positives, LGBM is preferred.

Figure 8. ROC for the LightGBM classifier with AUC 0.95 for ONU fault detection

Figure 9. Performance metrics of the LightGBM classifier for ONU fault detection: accuracy, precision, and MCC

Figure 9 shows the performance metrics of the LightGBM classifier for ONU fault detection, focusing on accuracy, precision, and MCC. The LGBM classifier achieved a precision of 93%, an MCC of 90%, and an accuracy of 95.27% on the validation data. The high accuracy reflects the model's overall correctness in predictions, while the high precision indicates its ability to minimize false positives, making it reliable for fault detection. Additionally, the high MCC score demonstrates the classifier's balanced and robust performance, accounting for all aspects of the confusion matrix, even in scenarios with potential class imbalance. Overall, the results confirm the effectiveness of the LightGBM classifier in accurately detecting ONU faults. Based on Table 1, the LGBM model outperforms NB in all metrics, making it the superior choice for detecting security threats and classifying ONUs. While NB is simpler and more efficient, its performance is lower.

CONCLUSION
This study evaluates the classification of normal and faulty ONUs using LightGBM and Naive Bayes (NB). It compares LGBM, an advanced classification method, with NB, one of the earliest classification algorithms, and shows that LightGBM achieves superior scores. The findings indicate that the LGBM classifier outperforms NB in terms of accuracy, precision, and Matthews correlation coefficient (MCC). LightGBM excels in addressing class imbalance, delivering the best results with a detection accuracy of 98.49% and outperforming NB.
LGBM shows robustness in accuracy, precision, and MCC, achieving the highest scores among the evaluated techniques. Future work could explore the impact of varying DoS attack intensity levels on classification performance. To the best of our knowledge, this is the first effort to apply machine learning algorithms to detecting and classifying the nature of ONUs.

ACKNOWLEDGMENT
This work was supported by the Ministry of Higher Education under the Fundamental Research Grant Scheme (FRGS/1/2023/TK07/UTM/02/.

REFERENCES