JOIV : Int. Inform. Visualization, 8. : IT for Global Goals: Building a Sustainable Tomorrow - November 2024 1678-1685

INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
journal homepage : w. org/index. php/joiv

Comparative Analysis of Machine Learning Algorithms for Cross-Site Scripting (XSS) Attack Detection

Khairatun Hisan Hamzah a, Mohd Zamri Osman a,*, Tumusiime Anthony a, Mohd Arfian Ismail b, Zubaile Abdullah c, Alde Alanda d

a Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
b Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
c Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
d Department of Information Technology, Politeknik Negeri Padang, Padang, Indonesia
Corresponding author: *mohdzamri. osman@utm.

Abstract: Cross-Site Scripting (XSS) attacks pose a significant cybersecurity threat by exploiting vulnerabilities in web applications to inject malicious scripts, enabling unauthorized access and execution of malicious code. Traditional XSS detection systems often struggle to identify increasingly complex XSS payloads. To address this issue, this research evaluated the efficacy of Machine Learning algorithms in detecting XSS threats within online web applications. The study conducts a comprehensive comparative analysis of XSS attack detection using four prominent Machine Learning algorithms: Extreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The selected algorithms are assessed through a comparative methodology based on the confusion matrix, 10-fold cross-validation, and training time. By exploring dataset characteristics and evaluating the performance metrics of each selected algorithm, the study determined the most robust Machine Learning solution for XSS detection. Results indicate that Random Forest is the top performer, achieving 99.93% accuracy and balanced metrics across all criteria evaluated. These findings can help strengthen web application security by providing reliable defenses against evolving XSS threats.

Keywords: Cross-site scripting (XSS); machine learning; RF; XGBoost; KNN; SVM; web application security.

Manuscript received 5 Mar. revised 17 Jul. accepted 24 Sep. Date of publication 30 Nov.
International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

I. INTRODUCTION

Cross-site scripting (XSS) remains a challenging threat in cybersecurity, exploiting vulnerabilities in online web applications to inject malicious scripts into web pages. XSS is a web security vulnerability where attackers inject malicious scripts into trusted websites, exploiting the site's failure to validate or encode user input properly. This poses a significant risk to users, enabling attackers to gain unauthorized access to sensitive information and execute malicious code. Traditional XSS detection systems need to be improved, considering the increasingly diverse forms of XSS payloads. OWASP's 2021 report indicates that 94% of the applications tested are susceptible to injection vulnerabilities, with 33 Common Weakness Enumerations (CWEs) falling into this category. Traditional methods for detecting XSS focus on signature-based approaches, which involve investigating known attack patterns. As a result, online web applications and users that rely on traditional methods are left vulnerable. This calls for a dynamic and adaptive solution that can keep up with constantly evolving XSS payloads.

This research implemented a machine learning (ML) approach to XSS detection to address the increasing complexity of XSS payloads. The study focuses on four prominent algorithms: Extreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). Each model is carefully tuned to enhance its ability to distinguish between malicious scripts and benign code. Our study aims to comprehensively analyze these algorithms to determine the most effective model for robust XSS detection in web applications. The performance of each model is evaluated based on multiple metrics, including training time, confusion matrix, and 10-fold cross-validation. By assessing these metrics, the study aims to identify the optimal approach for XSS detection that can adapt to the evolving nature of web-based attacks.

The findings of this research are expected to contribute to the advancement of XSS detection models and provide valuable insights for enhancing cybersecurity. They also yield recommendations for improving the accuracy and performance of XSS detection models by selecting the most effective classifier for identifying various types of XSS attacks.

Type of XSS Attack

Distinguishing XSS attacks can be exceedingly challenging because the malicious script is concealed, whether it resides on the server side or the client side. XSS threats can be classified into three distinct types: Stored XSS, Reflected XSS, and DOM-based XSS. Each type demonstrates a different method attackers use to exploit web application vulnerabilities. Understanding the mechanisms and potential impacts of these XSS variants is essential to protecting web applications and their users against these persistent threats.
Machine Learning-based XSS Detection Model

XSS attacks can be prevented by employing a machine learning algorithm in an XSS detection model. This model aims to differentiate, within the dataset for the proposed approach, between XSS attacks and non-XSS inputs, which are regular web application inputs. These standard inputs can include text, numbers, and other types of inputs, including combinations of these elements. Hence, the model serves to monitor and detect XSS attacks on web applications.

Related Studies

The related research on XSS attack detection using ML-based models includes efforts to analyze XSS attack patterns, apply machine learning algorithms for XSS attack identification, and evaluate various ML algorithms for XSS detection. Research on XSS attack patterns has explored the characteristics of XSS attack payloads, the evolution of XSS attack methods, and the impact of XSS attacks on web applications. As Table I shows, these studies provide valuable insights into the nature of XSS attacks, paving the way for more effective detection strategies.

One study proposed a fusion verification method that combines traffic detection and XSS payload detection. The approach, utilizing Random Forest and a novel Web Application Intrusion Detection Prevention Firewall System (WAIDPFS), demonstrated superior real-time detection capabilities.

A comprehensive evaluation of multiple machine learning algorithms was performed on a dataset comprising 13,686 samples. The analysis focused on the efficacy of AdaBoost, Random Forest, Decision Tree, SVM, KNN, Logistic Regression, and XGBoost. Findings revealed that AdaBoost, Random Forest, and Decision Tree exhibited superior performance regarding accuracy and F1-score.

A hybrid feature methodology integrating n-gram modeling and feature selection techniques was also proposed. Utilizing logistic regression on 16,361 samples, the method attained exceptional accuracy with minimal false positives. This hybrid strategy exhibited enhanced efficacy relative to standalone linguistic and feature selection techniques. The n-gram approach has also been studied comprehensively for email spam detection, and a hybrid method has been employed for phishing attack detection to improve performance.

Another study presented an extensive evaluation of multiple ML models, including Random Forest, XGBoost, and ensemble methods. It utilized a large dataset of 138,569 samples and incorporated feature selection techniques. The Random Forest model achieved high accuracy, while the ensemble models combining Random Forest with Decision Trees and Gradient Boosting also showed high performance. The Isolation Forest, meanwhile, was deployed to detect diabetes mellitus, reducing the complexity of stacking.

A further study conducted a comparative analysis of five ML algorithms using a Kaggle dataset, evaluating AdaBoost, XGBoost, Decision Tree, Logistic Regression, and Naive Bayes. Among these, AdaBoost demonstrated the highest accuracy. AdaBoost also excelled in precision, specificity, and F1-score, further establishing its effectiveness for XSS attack detection.

TABLE I
RELATED RESEARCH PAPERS
Description | Result
Used XGBoost in a hybrid learning approach for XSS | Hybrid (XGBoost RF): 96. XGBoost: 94. RF: 93. SVM: 91. KNN: 89.
Compared XGBoost, RF, KNN, and SVM for XSS | XGBoost: 95. RF: 93. SVM: 92. KNN: 89.
Compared KNN and SVM for XSS detection | SVM: 93. KNN: 88.
Compared XGBoost, KNN, RF, and SVM for XSS | RF: 94. XGBoost: 94. SVM: 91. KNN: 88.
Proposed Random Forest, WAIDPFS | Random Forest: 99.
Compared AdaBoost, Random Forest, Decision Tree, SVM, KNN, LR, XGBoost | AdaBoost: 99. Random Forest: 99.
Proposed Logistic Regression, N-gram, Feature Selection | Logistic Regression: 99. accuracy, 0.039% false positive rate
Evaluated Random Forest, XGBoost, Decision Trees, Gradient Boosting, MLP, Ensemble Learning | Random Forest: 99. Ensemble (RF DT GB): Ensemble (RF MLP):
Used AdaBoost, XGBoost, Decision Tree, Logistic Regression, Naive Bayes | AdaBoost: 97. XGBoost: Decision Tree: Logistic Regression: Naive Bayes: 86.

Proposed Solutions

This research proposes the use of machine learning algorithms, specifically XGBoost, RF, SVM, and KNN, for detecting XSS attacks. From the summary of related studies, as shown in Table II, XGBoost is the superior algorithm based on its performance in XSS detection, and it will be one of the selected algorithms. However, this research will include a comparative analysis with other prominent machine learning algorithms to assess their effectiveness in identifying XSS attacks. In addition to XGBoost, RF, SVM, and KNN will be explored as part of the comparative analysis.

TABLE II
COMPARATIVE OF RESEARCH PAPERS
Selected ML Algorithms | Advantages | Disadvantages
KNN | Simple and effective for small datasets; handles multi-class classification well | Computationally expensive with large datasets; sensitive to irrelevant features
SVM | Effective in high-dimensional spaces; robust to overfitting | Requires careful tuning of parameters and is intensive with large datasets
XGBoost | High performance; handles missing data; scalable | Requires careful tuning and complex implementation
RF | High accuracy; handles non-linear data; reduces overfitting | Computationally intensive; less interpretable

Extreme Gradient Boosting (XGBoost)

XGBoost's iterative learning method constructs an ensemble of decision trees in which each tree leverages knowledge acquired from previous iterations. This iterative nature allows XGBoost to continually improve its accuracy in predicting XSS attacks. Regardless of the volume and complexity of XSS data, XGBoost is well prepared to process the information efficiently and reveal concealed patterns. This dual capability, iterative learning combined with efficient processing, makes XGBoost a strong candidate for XSS detection.

Random Forest (RF)

In Random Forest, each tree in the forest is trained on a random subset of the data, and the final prediction is determined by combining the predictions of the individual trees. RF functions as an ensemble technique, which helps reduce overfitting and yields strong performance across diverse data sources. A notable feature is its capacity to analyze complex, high-dimensional data sets, which makes it versatile in detecting XSS in various settings.

K-Nearest Neighbors (KNN)

KNN offers a more straightforward approach. It is a non-parametric algorithm for classification that assigns an object to the class most common among its k nearest neighbors, where k is a user-defined parameter. KNN can be computationally expensive for large datasets and struggles with noisy or imbalanced data.

Support Vector Machine (SVM)

SVM can effectively handle high-dimensional data. It works by finding the hyperplane that best separates the classes of data points in the feature space, maximizing the separation margin to ensure optimal classification performance. Despite its high computational cost, it has a strong theoretical foundation and can generalize well to unseen data, making it valuable for XSS detection.

II. MATERIAL AND METHOD

This section explains the research methods employed in planning and evaluating the experiment. Additionally, the research workflow is outlined to clarify the methodologies proposed at each stage of the research. The section justifies the tools, datasets, and procedures used to carry out the experiment for the comparative analysis. A list of the performance metrics used in this investigation is also included.

Workflow

This research follows a three-stage workflow, each stage aligned to specific research objectives. In the initial stage, data preparation and cleaning processes are executed, followed by a rigorous assessment and examination of methods. In the second stage, the selected machine learning algorithms are applied to train and test the model. Finally, in the third stage, the experimental results are thoroughly analyzed and discussed. As shown in Fig. 1, the research framework is covered in three phases:

Phase 1 (Recognition of Dataset and Data Preparation to Train Machine Learning Classifier): Understanding existing machine learning algorithm findings from the literature review and exploring a dataset for the machine learning detection model.

Phase 2 (Development of Proposed XSS Detection and Classification Model): Creating an efficient model for detecting XSS attacks using the machine learning algorithms XGBoost, RF, SVM, and KNN, including model training and testing.

Phase 3 (Performance Comparison of Trained Machine Learning Classifier): The performance comparison findings demonstrate the efficacy of the suggested XSS detection model in terms of performance metrics. The comparison analysis will uncover insights into the most effective model for XSS detection.

Fig. 1 Flowchart of proposed work

The selected dataset, "XSS_dataset.csv," obtained from the Kaggle platform, is a suitable and deliberate choice for the evaluation in this study. The dataset's name, which clearly indicates its emphasis on XSS threats, matches the study goal of training machine learning models to detect XSS. In short, the validity and importance of this dataset for evaluating machine learning methods for XSS detection result from its specific focus on XSS attacks and its applicability for training machine learning models.

Data Labelling

The dataset has 13,686 raw entries prepared with two primary columns, "Sentence" and "Label". The "Sentence" column contains textual data in the form of scripts, including both benign and malicious occurrences associated with XSS attacks. The "Label" column assigns the binary values "0" and "1" to each script, indicating the absence or presence of an XSS attack, respectively. Fig. 2 shows that benign data ("Label 0") account for 6,316 entries, while malicious data ("Label 1") account for 7,373.

The labeling scheme is straightforward. "Label 0" indicates cases without possible XSS attacks, representing safe scripts. In contrast, "Label 1" signifies the existence of XSS attacks, explicitly referring to malicious scripts. This binary classification enables the training of machine learning models to differentiate between these two categories.

Fig. 2 Distribution of dataset

Performance Measurement

This section describes the performance metrics used in the comparative analysis. First, training time is measured as how long it takes to train a machine learning model on the selected dataset. This step is essential for assessing the machine learning classifier's efficiency because a shorter training period results in lower computational cost. The training time can be computed using the formula: Training time = end time - start time.

Second, the confusion matrix summarizes predictions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Based on Table III, FN counts the positive instances that the model incorrectly classifies as negative. TN counts the negative instances that the model accurately identifies as negative. TP counts the positive instances that the model accurately identifies as positive. Meanwhile, FP counts the negative instances that the model incorrectly identifies as positive.

TABLE III
CONFUSION MATRIX
 | Predicted Positive | Predicted Negative
Actually Positive | True Positive (TP) | False Negative (FN)
Actually Negative | False Positive (FP) | True Negative (TN)

Then, using this data, accuracy, precision, recall, and F1-score are calculated. These will be compared to previous models to evaluate the performance of the proposed XSS detection model. Accuracy is calculated by dividing the number of correct predictions by the total number of predictions the model makes: Accuracy = (TP + TN) / (TP + TN + FP + FN).

Precision is the proportion of positive predictions that are truly positive: Precision = TP / (TP + FP).

The research design also explains the machine learning classifiers that have been trained, assessed, and compared for detecting and classifying XSS attacks.

Experiment Setup

The experiment is based on a Python environment. A personal laptop running Jupyter Notebook with Anaconda Navigator is utilized. The exploratory data analysis is conducted by analyzing the raw dataset obtained from Kaggle.

Experiment Design

To achieve the research objectives, the experiment design explains the implementation procedure and the tools utilized in the experiment. This includes detailing how the dataset was prepared, the specific machine learning models chosen, and the evaluation metrics employed to assess their performance in detecting Cross-Site Scripting (XSS) vulnerabilities:

Data Preprocessing: The experiment begins with loading the 'XSS_dataset.csv' dataset into a pandas DataFrame. Each entry contains sentences labeled for Cross-Site Scripting (XSS) vulnerabilities. As observed initially, the raw dataset contains various forms of HTML tags and JavaScript. The preprocessing stage transforms the raw dataset into a cleaned version in which elements such as HTML tags and JavaScript event handlers are systematically removed. This cleaning process consists of several steps, such as converting all text to lowercase for uniformity (removing case sensitivity) and tokenizing sentences into individual words so that non-informative words can be filtered out.
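The cleaning steps described above can be illustrated with a minimal sketch. It is not the authors' code: the regular expressions, the function name `clean_sentence`, and the `STOP_WORDS` set are assumptions made for illustration, since the paper does not specify the exact patterns or which words were treated as non-informative.

```python
import re
from typing import List

# Hypothetical stop-word list; the paper does not specify which words were filtered.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is"}

TAG_RE = re.compile(r"<[^>]*>")            # anything shaped like an HTML tag
EVENT_RE = re.compile(r"\bon[a-z]+\s*=")   # inline handlers such as onclick=, onerror=
TOKEN_RE = re.compile(r"[a-z0-9_]+")       # runs of lowercase letters, digits, underscores


def clean_sentence(sentence: str) -> List[str]:
    """Lowercase, strip tags and event handlers, tokenize, and drop stop words."""
    text = sentence.lower()            # uniform case
    text = TAG_RE.sub(" ", text)       # remove HTML tags
    text = EVENT_RE.sub(" ", text)     # remove JavaScript event-handler attributes
    tokens = TOKEN_RE.findall(text)    # split into word-like tokens
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(clean_sentence("<b>Hello</b> the World"))      # ['hello', 'world']
    print(clean_sentence('" onmouseover=alert(1) x'))    # ['alert', '1', 'x']
```

Because whole tags are removed before event handlers, a handler that sits inside a tag disappears with the tag; the separate handler pattern only matters for payloads that are not wrapped in angle brackets.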
Recall is the proportion of actual positive cases in the dataset that the model correctly identifies; it measures the detection model's true positive rate: Recall = TP / (TP + FN).

The F1-score combines the precision and recall of a model into a single score: F1-score = 2 × Precision × Recall / (Precision + Recall).

10-fold cross-validation is a technique that involves training and testing a machine learning model on several subsets of a dataset to assess its performance. In a series of ten iterations, the dataset is divided into ten equal-sized folds, of which nine are used for training and one for testing, with a different fold held out in each iteration. Fig. 3 illustrates how the iterations work.

Fig. 3 10-fold cross-validation

Research Design and Implementation

This section provides an overview of the research methodology, which involves selecting the appropriate software and measurements to assess the effectiveness of the models.

Vectorization Data: In this experiment, the text data undergoes vectorization using the CountVectorizer from the scikit-learn library, a crucial step in natural language processing (NLP) tasks. This process transforms textual data into a numerical format suitable for machine learning. Fig. 4 includes the output result, which displays the transformed data as a NumPy array of integer type, where each element represents the frequency count of a specific term in a given input sentence. This numerical representation allows machine learning models to process and learn from the textual data.

Fig. 4 Data after vectorization

Splitting Data and Model Development Execution Setup: To assess the performance of the models in detecting XSS threats, the dataset was split so that 70% of the data was allocated for training and the remaining 30% was reserved for testing. The splitting process was executed with a fixed random state to ensure reproducibility of the results. The evaluation function begins by recording the start time to calculate the duration of the training. It then performs 10-fold cross-validation on the training data to estimate the model's performance stability. After cross-validation, the model is trained on the entire training set, and the end time is recorded to determine the total training duration. The trained model is then used to predict the labels of the test set, and several performance metrics are computed, such as accuracy, precision, recall, and F1 score. Then, the function evaluate_model is applied to four different classifiers.

Model Selection: A ranking approach was applied using the performance metrics stored in the results_df DataFrame to determine the most effective model for detecting XSS attacks. This method uses the DataFrame to assess and compare the performance of the various classifiers based on the measured metrics, and then identifies and presents the metrics of the best-performing model.

III. RESULT AND DISCUSSION

Result of Confusion Matrix

The study evaluates the performance of four key algorithms: Random Forest, XGBoost, K-Nearest Neighbors, and SVM. Each model is trained and evaluated using accuracy, precision, recall, and F1-score metrics. To obtain these metrics, a confusion matrix is produced for each model. Each confusion matrix has rows Actually Positive and Actually Negative and columns Predicted Positive and Predicted Negative.

TABLE IV
CONFUSION MATRIX FOR RANDOM FOREST

TABLE VI
CONFUSION MATRIX FOR KNN

TABLE VII
CONFUSION MATRIX FOR SVM

[Performance metrics table: columns XGBoost, K-Nearest Neighbors, SVM; rows Test Accuracy, Precision, Recall, F1-Score]

Result of Cross-Validation

Cross-validation results provide a robust evaluation of the models by partitioning the data into subsets, training the model on some subsets while validating on others, and repeating this process to ensure that the evaluation metrics are not biased by a particular data split.
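The partitioning scheme described above can be sketched in plain Python. This is an illustrative sketch only: the helper names `k_fold_indices`, `cross_val_accuracy`, and `majority_class` are hypothetical, the seed is arbitrary, and the majority-class predictor merely stands in for the real classifiers, which the study trained with scikit-learn.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]     # k folds of near-equal size
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_val_accuracy(fit_predict: Callable, X: Sequence, y: Sequence,
                       k: int = 10, seed: int = 42) -> List[float]:
    """Accuracy on each held-out fold, training on the remaining k - 1 folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(1 for p, i in zip(preds, test_idx) if p == y[i])
        scores.append(correct / len(test_idx))
    return scores


def majority_class(train_X, train_y, test_X):
    """Toy stand-in for a classifier: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)


if __name__ == "__main__":
    X = list(range(100))
    y = [1] * 60 + [0] * 40
    scores = cross_val_accuracy(majority_class, X, y, k=10)
    print(sum(scores) / len(scores))   # mean accuracy across the ten folds
```

Averaging the per-fold scores gives the single cross-validation accuracy usually reported, while the spread of the scores indicates how stable a model is across splits.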
The 10-fold cross-validation accuracy for each model is detailed, showing the stability of the algorithms.

TABLE IX
CROSS-VALIDATION FOR MODELS
[Columns: Metric, Random Forest, XGBoost, K-Nearest Neighbors, SVM; rows: 10-fold CV Accuracy, Test Accuracy]

Result of Training Time

Training time is important, especially for large datasets or real-time applications. The training time for each model is recorded and compared to highlight the models' efficiency.

TABLE X
TRAINING TIME RESULTS FOR MODELS

TABLE V
CONFUSION MATRIX FOR XGBOOST
[Rows: Actually Positive, Actually Negative; columns: Predicted Positive, Predicted Negative]

TABLE VIII
RESULT OF EVALUATION USING PERFORMANCE METRICS

It can be concluded that XGBoost demonstrates a balanced performance with a notable accuracy rate. At the same time, Random Forest shows superior results, likely due to its ensemble approach that combines multiple decision trees for enhanced prediction. Despite its simplicity, K-Nearest Neighbors performs competitively, underscoring its efficiency in handling text classification tasks. SVM also shows high performance, although its training time is significantly longer. The confusion matrices capture each model's strengths and weaknesses in predicting XSS threats.

This section provides the results and discussion, along with an analysis of the experiments conducted. The proposed models are evaluated in terms of performance metrics and cross-validation in detecting Cross-Site Scripting (XSS) threats, specifically in online web applications. This section also provides an overview of the most effective model identified from the evaluation and offers insights for future work.

Model Evaluation: Throughout the model evaluation phase, each model was tested thoroughly to determine how well it detected XSS vulnerabilities. The evaluate_model function was used with the models' classifiers and the training and testing datasets.
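A minimal sketch of what such an evaluation routine might look like is given below. It is an assumption-laden illustration rather than the authors' implementation: only the name `evaluate_model` comes from the paper, the body and return format are guesses, `ThresholdClassifier` is a hypothetical toy model standing in for Random Forest, XGBoost, KNN, or SVM, and the cross-validation step that the paper's function also performs is left out to keep the example short.

```python
import time
from typing import Dict, List


class ThresholdClassifier:
    """Hypothetical toy model: flags a sample when its single numeric feature exceeds a learned threshold."""

    def fit(self, X: List[float], y: List[int]) -> "ThresholdClassifier":
        positives = [x for x, label in zip(X, y) if label == 1]
        negatives = [x for x, label in zip(X, y) if label == 0]
        self.threshold = (min(positives) + max(negatives)) / 2
        return self

    def predict(self, X: List[float]) -> List[int]:
        return [1 if x > self.threshold else 0 for x in X]


def evaluate_model(model, X_train, y_train, X_test, y_test) -> Dict[str, float]:
    """Time the training, predict on the test set, and derive metrics from the confusion matrix."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    training_time = time.perf_counter() - start   # end time minus start time

    preds = model.predict(X_test)
    tp = sum(1 for p, t in zip(preds, y_test) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(preds, y_test) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(preds, y_test) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_test) if p == 0 and t == 1)

    accuracy = (tp + tn) / len(y_test)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "training_time": training_time}


if __name__ == "__main__":
    result = evaluate_model(ThresholdClassifier(),
                            [1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1],
                            [2, 4, 6, 8, 5.5], [0, 1, 1, 1, 0])
    print(result)
```

Any object exposing `fit` and `predict` can be passed in, which is why the same routine can be reused unchanged across the four classifiers being compared.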
This function provided several performance metrics, including cross-validation scores, accuracy, precision, recall, F1 score, confusion matrix, and training time.

Comparison and Result Discussion

The comparative analysis of the algorithms focuses on the advantages and limitations of each approach in the context of XSS detection models.

TABLE XI
TRAINING TIME RESULT FOR MODELS
[Columns: Metric, Random Forest, XGBoost, K-Nearest Neighbors, SVM; rows: 10-fold CV Accuracy, Test Accuracy, Precision, Recall, F1-Score, Training Time]

Referring to Table XI, Random Forest emerges as the top performer with an accuracy of 99.93%, benefiting from its ability to manage complex decision boundaries through ensemble learning. XGBoost follows closely with an accuracy 78%, showcasing its efficacy in handling linear and non-linear relationships. K-Nearest Neighbors (KNN), achieving an accuracy of 99.53%, is a valuable model due to its simplicity and computational efficiency. SVM also shows strong performance, though its training time is significantly longer, which could be a drawback for time-sensitive applications.

IV. CONCLUSION

This research provides a comprehensive summary of the primary objectives, methodologies, and techniques employed in the comparative analysis of machine learning algorithms for detecting Cross-Site Scripting (XSS) threats in online web applications. The fast-changing nature of web security requires effective detection systems. This research aimed to determine how well different machine learning models can identify XSS attacks and which model is the most effective. The study used thorough research and careful evaluation to improve the understanding and application of machine learning in web security. Based on the findings and constraints of this research, several suggestions are made for future improvements and further research.
Future work should explore advanced feature extraction techniques like word embeddings and deep learning-based methods to capture more detailed patterns in the data. Expanding the dataset to include more samples and a wider variety of XSS attack patterns will help the models generalize better. Using techniques like grid search, random search, or Bayesian optimization for more extensive hyperparameter tuning can improve model performance. REFERENCES