SINERGI Vol. No. June 2022: 213-222
http://publikasi. id/index. php/sinergi
http://doi. org/10. 22441/sinergi.

Comparative analysis of classification algorithm: Random Forest, SPAARC, and MLP for airlines customer satisfaction

Safira Amalia1, Irene Deborah1, Intan Nurma Yulita2,3
1 Master of Science Management, Universitas Padjadjaran, Indonesia
2 Research Center for Artificial Intelligence and Big Data, Universitas Padjadjaran, Indonesia
3 Department of Computer Science, Universitas Padjadjaran, Indonesia

Abstract
The airline business is one of the businesses determined by the quality of its services. Every airline designs its best service so that customers feel satisfied and stay loyal to it. Customer satisfaction is therefore an essential metric for measuring the features and services provided. With a database on customer satisfaction, a company can use the data for machine learning. The resulting model can predict customer satisfaction from the existing feature criteria and serve as a decision support system for management. This article compares three machine learning methods, Split Point and Attribute Reduced Classifier (SPAARC), Multilayer Perceptron (MLP), and Random Forest (RF), in predicting customer satisfaction. Based on the data testing, the Random Forest algorithm provides better results with the lowest training time compared to SPAARC and MLP. It has an accuracy of 95.837%, an F-score of 0.958, and a training time of 84.53 seconds.

Keywords: Customer satisfaction; Multilayer Perceptron; Random Forest; Split Point and Attribute Reduced Classifier.

Article History:
Received: August 20, 2021
Revised: December 26, 2021
Accepted: January 2, 2022
Published: June 15, 2022

Corresponding Author:
Safira Amalia
Master of Science Management, Universitas Padjadjaran, Indonesia
Email: safira20003@mail.
This is an open access article under the CC BY-NC license

INTRODUCTION
Every company competes to provide the best service and features to create and increase customer satisfaction. One of the businesses determined by the quality of its service is the airline business. Every airline provides its best service so that consumers are satisfied and stay loyal, allowing the company to keep growing and competing in the industry. Passengers tend to rate airlines based on their satisfaction with in-flight services, so increasing the quality of in-flight service is one of the success factors of an airline.

The quality of services can be evaluated by checking customer satisfaction. Customer satisfaction is an essential metric for measuring consumer loyalty and the intention to use services or products again, increasing positive ratings, and reducing the cost of acquiring new customers. Companies can use customer satisfaction surveys to gather consumer ratings and evaluate features and services. The data obtained can be used as training data for supervised machine learning. The training process produces a model that predicts customer satisfaction from the existing feature criteria. Furthermore, the model can be used as a decision support system that helps management plan future business strategies as well as strategies for retaining customers and acquiring new ones.

Many studies have analyzed customer satisfaction with airlines. Kumar and Zymbler analyzed tweets to improve customer experience using Support Vector Machine (SVM), Artificial Neural Networks (ANN), and Convolutional Neural Network (CNN). The study showed that CNN improved the performance of the classification model and provided better results than ANN and SVM. Gracia et al. used an ensemble regression model to predict customer satisfaction, and the results showed that ensemble regression produced the best results. Hulliyah studied the prediction of flight passenger satisfaction using several classification algorithms: KNN, Logistic Regression, Gaussian NB, Decision Tree, and Random Forest (RF). That study concentrates on the Wi-Fi service experience, and the algorithm that provides the best result is RF, with 99.00% accuracy at a threshold of 0.

Given the many classification algorithms in machine learning, this study develops classifiers using three models: Split Point and Attribute Reduced Classifier (SPAARC), Multilayer Perceptron (MLP), and Random Forest (RF). SPAARC is a new method that few studies have used, so this study aims to test the SPAARC algorithm and compare it with other algorithms on a case study. The advantage of the SPAARC method is that it reduces the computational workload of the decision tree by selecting attributes dynamically according to the tree depth levels involved. Random Forest is used because it gave high accuracy in previous studies. MLP is used because of its ability to classify large amounts of data.

This paper contains a comparative analysis of these classification algorithms. Accuracy results are obtained from the collected data, and the study shows which algorithm achieves a high level of accuracy under the given parameters.

MATERIAL AND METHOD
Supervised Learning
Supervised learning is the setting in which the class labels are known and the class boundaries are well represented in the data set. Methods included in supervised learning are Decision Trees, Naive Bayes, Neural Networks, and Deep Learning.
The function of supervised learning is to build a classifier from a labelled training data set. The method processes the training data set to find the relationship between the input attributes and the target attribute. The assembled model is then used to predict the target attribute value for new data. The challenge of supervised learning is generalization: the classifier has to work appropriately on all data, not only on the training set.

The supervised learning paradigm can be seen in the Neural Network, which includes the MLP algorithm and can efficiently find solutions for several linear and non-linear problems, such as classification. Moreover, MLP has distinctive characteristics: (1) nonlinearity in the neuron activation, which is differentiable, (2) one or more hidden layers of neurons that enable the network to solve complex problems, and (3) an interconnected model.

Meanwhile, decision trees are used for prediction tasks such as classification and regression. The nodes represent the data set features, and the branches represent the decision rules. A decision tree has two types of nodes: decision nodes and leaf nodes. A decision node is used to make a decision and has several branches, while a leaf node is the output of a decision and has no branches.

Split Point and Attribute Reduced Classifier (SPAARC)
Split Point and Attribute Reduced Classifier (SPAARC) is one of the classification tree algorithms derived from the Decision Tree, or Classification and Regression Tree (CART), method. SPAARC has two components for dealing with decision tree problems. The technique is used to reduce the computational workload and shorten processing time while minimizing the impact on classification accuracy. The SPAARC method is applied to the classification algorithm by implementing split-point numerical attribute analysis and recursive selection of attribute nodes.
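To make the idea of split-point analysis on a numeric attribute more concrete, the following is a minimal sketch, not the SPAARC implementation. It assumes an entropy-based information gain criterion, candidate thresholds at the midpoints between distinct values, and an optional random subsample of candidates (the `max_candidates` parameter is hypothetical) to show how testing fewer split points reduces work.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def candidate_split_points(values):
    """Midpoints between consecutive distinct values of a numeric attribute."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def best_split(values, labels, max_candidates=None, seed=0):
    """Return (threshold, gain) of the best tested candidate split point.

    If max_candidates is given, only a random subset of the candidates is
    tested. This subsampling is an illustrative assumption, not the exact
    sampling rule used by SPAARC.
    """
    candidates = candidate_split_points(values)
    if not candidates:
        return None, 0.0  # attribute is constant, nothing to split on
    if max_candidates is not None and len(candidates) > max_candidates:
        candidates = random.Random(seed).sample(candidates, max_candidates)
    scored = ((t, information_gain(values, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    ages = [22, 25, 31, 40, 47, 52]
    labels = ["dissatisfied"] * 3 + ["satisfied"] * 3
    print(best_split(ages, labels))                    # all candidates tested
    print(best_split(ages, labels, max_candidates=2))  # only two tested
```

Testing every candidate costs one pass over the data per threshold, so limiting the number of candidates trades a possibly slightly worse split for less computation.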
The process includes split-point sampling, which reduces the number of split points tested when checking the suitability of attributes at each node of the decision tree, and node-attribute sampling, which tests each alternative horizontally at the tree node level. The components of SPAARC are therefore Node Attribute Sampling (NAS) and Split-Point Sampling (SPS). The purpose of combining the NAS and SPS components is to balance the competing requirements of classification accuracy and processing time. This builds on supporting evidence for speeding up decision tree induction from Fayyad and Irani, who used entropy as a heuristic in decision trees. Yates et al. proposed the NAS component to avoid testing every non-class attribute at each tree node. It dynamically selects the attribute space by switching between complete attribute lists and subsets based on the tree depth being tested. At the same time, the SPS component dynamically reduces the number of possible split points tested, although it only handles numeric attributes. Together, these two components of the SPAARC algorithm save time during the modelling process by accelerating the pruning process. However, the improvement of SPAARC can contribute significantly to implementing the [...].

The hyperparameters in SPAARC are minNumObj (M), numFoldPruning (N), size (C), and seed (S). The first, minNumObj (M), is the minimum number of branches on a node. The smaller the minNumObj value, the longer the processing time compared with a larger minNumObj. The second is numFoldPruning (N), which sets the trimming of the data used to reduce pruning errors in each tree. Pruning a decision tree can reduce the effect of outliers and data noise and so increase classification accuracy. The third is size (C), the percentage of the training data set size that is used. The last is the seed (S), which sets a local random seed for randomization.

Multilayer Perceptron (MLP)
Multilayer Perceptron (MLP) is a type of Artificial Neural Network (ANN). A neural network starts by receiving inputs, multiplying them by weights, adding the results (weighted sum), and adding a bias. The result is passed as the argument of the activation function, whose value is the neuron's output. MLP is a widely used neural network structure that consists of three layers, namely the input layer, the hidden layer, and the output layer. Each layer contains several neurons, depending on how complex the process is. Neurons in MLP are trained with the backpropagation algorithm. MLP is commonly used for classification, recognition, prediction, and forecasting.

MLP works by moving the data forward from the input layer toward the output layer, as depicted in Figure 1. The input layer receives the input signal. The input is then processed by the MLP computing engine in the hidden layer, which is located between the input and output layers. Finally, the required task and the computed results are delivered by the output layer.

In neural networks, hyperparameters determine the structure of the network and how the model is trained. Hyperparameter tuning is the key to reducing computation time while keeping the error reasonable. Hyperparameters that can be adjusted in MLP are the number and size of hidden layers (the depth of the algorithm model), learning rate, momentum, and dropout rate. The learning rate sets the step size for each iteration. The learning rate affects how quickly the model reaches a solution (for example, minimum error). A small learning rate can produce a smoother model and smaller errors than a large learning rate.
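The neuron computation described in this section (weighted sum, plus bias, passed through an activation function, layer by layer from input to output) can be sketched as follows. The sigmoid activation and the list-based layer representation are assumptions made for illustration; the paper does not state which activation or data layout was used.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed here; other activations are possible)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum, adds its bias,
    and applies the activation. weights[j] holds the weights of neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Propagate inputs through a list of (weights, biases) layers."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


if __name__ == "__main__":
    # Two inputs -> hidden layer with two neurons -> one output neuron.
    hidden = ([[0.5, -0.2], [0.1, 0.4]], [0.0, -0.1])
    output = ([[1.0, -1.0]], [0.2])
    print(mlp_forward([1.0, 2.0], [hidden, output]))
```

Training with backpropagation would adjust the weights and biases in `layers`; only the forward direction is shown here.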
Momentum in a neural network is a weight change based on the direction of the gradient of the last pattern together with that of the previous pattern. Using the momentum parameter steers the learning process towards faster and more stable convergence.

Figure 1. Multilayer Perceptron Block Diagram

Figure 2. Random Forest Block Diagram

Random Forest
Random Forest is an ensemble machine learning algorithm that can be used for classification and regression. A Random Forest consists of a collection of decision trees, each associated with a bootstrap sample drawn from the original dataset, as shown in Figure 2. The nodes are split based on the entropy of the selected feature subset. Suthaharan explained that each subset formed from the original dataset by bootstrapping has the same size as the original dataset. The advantage of Random Forest over a single decision tree is that it combines the classifications of several decision trees in the testing phase. In addition, the accuracy of Random Forest is higher. It retains some good qualities of the decision tree, such as the ability to interpret the relationship between predictors and outcomes. These characteristics make it preferable to a single decision tree. Suthaharan identified it as a good technique for solving classification problems in big data because its flexible parallel structure works with big data technologies such as Hadoop and MapReduce.

The Random Forest algorithm has several hyperparameters that the researcher can set. By selecting the hyperparameters carefully, the model can perform better. The hyperparameters used in Random Forest concern the structure of each tree (minimum node size), the forest structure and size (number of trees), and categorical elements (number of variables considered in each branch/tree).
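The two ensemble mechanics described here, bootstrap samples of the same size as the original dataset and a combined vote over several trees, can be sketched as below. This is only an illustration: `train_majority_stub` is a hypothetical stand-in for real decision tree induction (it ignores features and entropy-based splitting), and `build_forest` accepts any training function with the same shape.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so the sample has the same
    size as the original dataset."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


def train_majority_stub(sample):
    """Hypothetical placeholder for a tree learner: it always predicts the
    most frequent label of its bootstrap sample."""
    label = majority_vote([y for _, y in sample])
    return lambda features: label


def build_forest(rows, n_trees, train_tree, seed=0):
    """Train n_trees models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_tree(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def forest_predict(trees, features):
    """Classify one observation by majority vote over all trees."""
    return majority_vote([tree(features) for tree in trees])


if __name__ == "__main__":
    rows = [({"seat_comfort": 4}, "satisfied")] * 7 + [
        ({"seat_comfort": 1}, "dissatisfied")
    ] * 3
    forest = build_forest(rows, n_trees=25, train_tree=train_majority_stub)
    print(forest_predict(forest, {"seat_comfort": 5}))
```

Because each tree is trained independently of the others, the loop in `build_forest` is the part that parallel frameworks can distribute.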
The node size sets the minimum number of observations in a terminal node. Setting it low leads to trees with greater depth, so more branches are produced before the terminal nodes are reached. Setting the node size higher reduces computation time without reducing prediction performance. The number of trees, in turn, is a parameter that is recommended to be set to a large value. More trees result in good performance.

Cross-Validation
Cross-validation, or rotation estimation, is a model validation technique used to assess how well the results of a statistical analysis generalize beyond the data set it was built on. Cross-validation can be used to estimate prediction error or to evaluate the performance of a model. It divides the data into k subsets of almost the same size. Training and testing are then repeated k times; in each repetition, one subset is used as test data while the other k-1 subsets serve as training data. K-fold is a well-known way to evaluate a classifier's performance, and the K-fold method can be used if the amount of data is [...]. The best choice for the number of folds in the validity test is 10-fold cross-validation for each model. Cross-validation is also a validation method used to increase the accuracy of the algorithms of other methods.

Method
The learning methods used in the comparative analysis are SPAARC, MLP, and Random Forest. Each method follows the stages of Knowledge Discovery in Database (KDD): data collection and selection, preprocessing/cleaning, transformation, data mining, and interpretation/evaluation, as shown in Figure 3.

Table 1.
Data Features
Features | Description
Gender | Customer's gender (male/female)
Consumer type | Type of customer (loyal/disloyal)
Age | Customer's age
Type of travel | Travel purpose (business/personal)
Class | Type of class (Eco/Business/Eco Plus)
Flight distance | Flight distance
Seat comfort | Rating of seat comfort (Likert scale 1-5)
Departure/Arrival time | Rating of departure/arrival time convenience (Likert scale 1-5)
Food & drink | Rating of food and drink (Likert scale 1-5)
Gate location | Rating satisfaction of gate location (Likert scale 1-5)
Inflight Wi-Fi service | Rating satisfaction of Wi-Fi services (Likert scale 1-5)
Inflight entertainment | Rating of inflight entertainment (Likert scale 1-5)
Online support | Rating satisfaction of online support (Likert scale 1-5)
Ease of Online booking | Rating satisfaction of online booking feature (Likert scale 1-5)
Onboard service | Rating satisfaction of onboard services (Likert scale 1-5)
Legroom service | Rating satisfaction of legroom service (Likert scale 1-5)
Baggage handling | Rating satisfaction of baggage handling (Likert scale 1-5)
Checkin service | Rating satisfaction of checkin service (Likert scale 1-5)
Online boarding | Rating satisfaction of online boarding (Likert scale 1-5)
Cleanliness | Rating satisfaction of airplane's cleanliness (Likert scale 1-5)
Departure Delay in Minutes | Departure delay duration (in minutes)
Arrival Delay in Minutes | Arrival delay duration (in minutes)

Figure 3. Research Method (stages shown: Data Collecting; Data Pre-Processing and Cleaning; Data Transformation; Choosing Algorithm Classification; Rule Model; Ten-Fold Cross Validation; Validation and Testing; Analyze Testing Result)

The process starts with data collection; the data were obtained from Kaggle. The dataset consists of 129,880 data entries, 22 features, and 2 class labels (satisfied and dissatisfied). The data features are listed in Table 1. The next step is data preprocessing and cleaning, which aims to increase the quality of the data before processing. In this phase, the data are cleaned to fit a Likert scale of 1-5. A score of 0 is considered an unanswered survey item. After deleting entries with a score of 0 and statements that were not filled in, the total data available for modelling is 119,611 data entries.

Once the data set is ready, the algorithms are chosen and used to build the rule models. Next, validation and testing are carried out to determine the accuracy, precision, recall, and classification error of the predictions. Finally, the test results are analyzed in the discussion and compared across the selected classification algorithms.

After preprocessing, the data are ready to be used for machine learning with the selected method. Data transformation is applied when needed. For model testing, a validity test measures the accuracy, F1-score, classification error, and training time. The last step is analyzing the results and comparing the modelled algorithms.

Modelling
The learning methods used in this study are SPAARC, MLP, and RF. SPAARC is a new algorithm, while MLP and Random Forest are among the most popular algorithms for large data sets. Each algorithm (SPAARC, MLP, and Random Forest) has hyperparameter settings that are tuned to improve its performance. The modelling uses a 10-fold cross-validation technique. The device used for modelling has a 1.6 GHz dual-core Intel Core i5 processor and 4 GB of 1600 MHz DDR3 memory, running macOS Big Sur version 11.1. The hyperparameters of each method are listed in Table 2.

After the algorithms are applied, performance is measured with several metrics, which are then used for the comparative analysis. This paper uses two metrics commonly used in classification: accuracy and F-score. Accuracy measures how much of the data the model classifies correctly.
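The evaluation protocol described above, k-fold cross-validation scored with accuracy and F-score, can be sketched as follows. The sketch assumes the standard definitions (accuracy as the share of correct predictions, F-score as the harmonic mean of precision and recall) and splits indices in order without shuffling; in practice the data would usually be shuffled first. All function names are illustrative.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of almost equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples, k):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = k_fold_indices(n_samples, k)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, TN, FP, FN) for the given positive class label."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for train_idx, test_idx in train_test_splits(n_samples=20, k=10):
        print(len(train_idx), len(test_idx))
    print(accuracy(tp=40, tn=45, fp=5, fn=10), f_score(tp=40, fp=5, fn=10))
```

In a full experiment, a model would be trained on `train_idx` and scored on `test_idx` in each iteration, and the per-fold metrics would be averaged.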
The accuracy calculation does not discriminate between the numbers of correct labels from different classes. Accuracy can be calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where
TP = true positive
TN = true negative
FP = false positive
FN = false negative

Meanwhile, F-score is a calculation that weights precision (how accurately the model predicts positive labels) and recall (how much of the actual positive data the model captures with positive labels, i.e., true positives). F-score aims to measure the effectiveness of the method used. F-score can be calculated as follows:

$$\text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

Table 2. Method and Hyperparameter Used
Method | Hyperparameter
SPAARC | minNumObj, numFoldPruning, percentage of training data, number of seeds
MLP | Learning rate, hidden layer, momentum
RF | Number of decision trees, node size

RESULTS AND DISCUSSION
SPAARC Model Testing
Four hyperparameter tests were run to optimize the SPAARC model. The first test changes minNumObj (M); the M values used in the experiment are 2.0, 1.0, and 0. The test results are listed in Table 3. The best accuracy and F-score results are obtained at an M value of 1.0, with an accuracy rate of 95. A smaller value of M makes the model training time longer with the same accuracy results.

Table 3. Accuracy & F-Score Result of MinNumObj (M) Setting (columns: F-score, Accuracy (%), Training Time (s))

The second hyperparameter test is numFoldPruning (N), where the N values used in the experiment are 2, 3, and 5. The test results are shown in Table 4. NumFoldPruning (N), which sets the trimming of the data used to reduce pruning errors in each tree, produces the highest accuracy when the number of pruning folds is 5, with the previously determined M value of 1.0 and an accuracy value of [...].

Table 4. Accuracy & F-Score Result of NumFoldPruning (N) Setting (columns: F-score, Accuracy (%), Training Time (s))

The third hyperparameter test covers the percentage of the training data set (C), where the C values used in this experiment are 1, 0.75, and [...]. The test results are shown in Table 5. The last hyperparameter test is the seed (S). The results are shown in Table 6.

Table 5. Accuracy & F-Score Result of Training Data (C) Setting (columns: F-score, Accuracy (%), Training Time (s))

Table 6. Accuracy & F-Score Result of Number of Seeds (S) Setting (columns: F-score, Accuracy (%), Training Time (s))

The best accuracy was obtained when the number of seeds S = 5. From the test results with hyperparameter settings, the SPAARC algorithm produces the best model with M = 1.0, N = 5, C = 1, and S = 5. The model has an accuracy rate of 95.071%, an F-score of 0.951, and a training time of 86.55 seconds.

Multilayer Perceptron Model Testing
The first hyperparameter test is the learning rate, which is used for model optimization in MLP. The learning rates used in the experiment were 0.3, 0.01, and 0. The test results are listed in Table 7. The highest accuracy in the learning rate test occurs at the 0.01 setting, with 94.78%. The test results also show that the higher the learning rate, the shorter the training time.

Next is testing the number of hidden layers. The MLP model uses the best learning rate setting from the previous test. The numbers of hidden layers tested were 5, 10, and 15. Table 8 lists the results of the test. Increasing the number of hidden layers results in a higher level of accuracy. This can be seen at the largest number of hidden layers, 15, which has an accuracy rate of 94. and an F-score of 0. A higher hidden layer value also leads to a longer training time.

The last test is momentum, using the hidden layer and learning rate settings that produce the highest accuracy. The momentum values tested are 2, 0.5, and 0. Table 9 shows the results of the hyperparameter testing.
Changes in the momentum value at a certain point can increase the accuracy and F1-score. For example, in this experiment a momentum value of 0.5 produces an accuracy of 94% and an F-score of 0. The test results with hyperparameter settings show that the MLP algorithm produces the best model with a learning rate of 0.01, 15 hidden layers, and a momentum value of 0. The model has an accuracy of 94.941% and an F-score of 0.949.

Table 7. Accuracy & F-Score Result of Learning Rate Setting (columns: Learning Rate, F-Score, Accuracy (%), Training Time (s))

Table 8. Accuracy & F-Score Result of Number of Hidden Layer Setting (columns: Num of Hidden Layer, Learning Rate, F-Score, Accuracy (%), Training Time (s))

Table 9. Accuracy & F-Score Result of Momentum Setting (columns: Momentum, Hidden Layer, Learning Rate, F-score, Accuracy (%), Training Time (s))

Random Forest Model Testing
In the Random Forest algorithm, the hyperparameters that need to be set to improve performance are the number of decision trees and the node size. The first hyperparameter test therefore changes the number of decision trees. The numbers of decision trees tested in the experiment were 100, 80, and 50, as listed in Table 10. Accuracy and F-score increase as the number of decision trees is increased. The highest accuracy is obtained with 100 decision trees, with a 95.8% accuracy rate and an F-score of 0. The result is similar to Probst's findings that training time increases linearly with the number of trees. Next is setting the maximum node size, as listed in Table 11.

Table 10. Accuracy & F-Score Result of Number Decision Tree Setting (columns: Num of Decision Tree, Accuracy (%), F-score, Training Time (s))

Table 11. Accuracy & F-Score Result of Max Node Size Setting (columns: Max Node Size, Num of Decision Tree, Accuracy (%), F-score, Training Time (s))
The model's accuracy is the same whether the maximum node size is set to 50 or to 100. The higher the maximum node size, the shorter the training time. From the test results with hyperparameter settings, the Random Forest algorithm produces the best model with 100 decision trees and a maximum node size of 100. The model has a 95.837% accuracy rate and an F-score of 0.958.

Discussion
The accuracy, F-score, and training time obtained on the airline dataset for SPAARC, MLP, and RF are compared in Table 12. With 10-fold cross-validation as the validity test for each model, the highest accuracy and F-score are obtained with the Random Forest algorithm. It has the highest accuracy and F-score and the lowest training time of the three algorithms: RF has a 95.837% accuracy rate, an F-score of 0.958, and a training time of 84.53 seconds.

Table 12. Result Comparison
Method | Training Time (s) | Accuracy (%) | F-score
SPAARC (M 1.0, N 5, C 1, S 5) | 86.55 | 95.071 | 0.951
Multilayer Perceptron (hidden layer 15, learning rate 0.01) | 394. | 94.941 | 0.949
Random Forest (Num of decision tree 100, max node size 100) | 84.53 | 95.837 | 0.958

Meanwhile, SPAARC has a 95.071% accuracy rate, an F-score of 0.951, and a training time of 86.55 seconds. MLP has a 94.941% accuracy rate, an F-score of 0.949, and a training time of 394. In a previous study, SPAARC had minimal effect on decision tree classification accuracy and reduced training time by 70%. In this study, the accuracy of SPAARC reached 95% with a training time of 86 seconds. The results of SPAARC are almost the same as those of RF. Still, after each hyperparameter of the RF algorithm was tuned, the SPAARC result is lower than that of RF. MLP also produces fairly good accuracy, but its training time is the longest. Random Forest and SPAARC are simpler methods than MLP, so their training time is much shorter. MLPs with more iterations spend longer on training or execution, which is a weakness of MLPs.

Random Forest can provide the best result because it is an ensemble method, which combines several base models to produce one optimal predictive model. A large group of uncorrelated decision trees can produce more accurate and stable results than any individual decision tree. In a Random Forest, an increasing number of trees (J) stabilizes the generalization error, which converges to a limit. Generalization error measures how accurately the algorithm can predict outcomes for unseen data. In other ensemble methods, by contrast, generalization error initially decreases as the number of trees (J) increases, but when J becomes too large, overfitting occurs and generalization error increases.

The RF model with the specified hyperparameters, which has a high accuracy rate of 95.8%, also produces a good level of precision and recall. Both the precision (positive predictive rate) and the recall, or sensitivity (true positive rate), are shown in Table 13. The tested RF model can therefore be used to build a predictive model that predicts customer satisfaction with precision.

Table 13. Confusion Matrix (rows: True (Satisfied), True (Dissatisfied), Precision (%); columns: Predicted (Satisfied), Predicted (Dissatisfied), Recall (%))

CONCLUSION
Several machine learning algorithms, Split Point and Attribute Reduced Classifier (SPAARC), Multilayer Perceptron (MLP), and Random Forest, were compared to determine the classification model for passenger satisfaction. SPAARC was chosen in this study because the method is relatively new and has the advantage of a short training time. Based on the results, all models produce an accuracy rate above 90%.
However, the highest accuracy was obtained by the Random Forest method with the number of decision trees set to 100 and the maximum node size set to 100. The accuracy of the Random Forest model was 95.837%, the F-score was 0.958, and the training time was 84.53 seconds. Random Forest can achieve the best performance because its ensemble method stabilizes the generalization error, which converges to a limit. This stabilized generalization error then results in better model performance. Further, this Random Forest modelling can be developed to identify the features that make customers satisfied with the airlines and the features that need improvement according to dissatisfied customers.

REFERENCES