ADI International Conference Series
Vol 4 January 2022
p-ISSN: 2774-9576
e-ISSN: 2747-2981

Feature Selection Technique for Improving Classification Performance in The Web-Phishing Detection Process

Anggit Ferdita Nugraha¹, Dwiky Alfian Tama², Dewi Anisa Istiqomah³, Surya Tri Atmaja Ramadhani⁴, Bayu Nadya Kusuma⁵, Vikky Aprelia Windarni⁶
Faculty of Computer Science, University of Amikom Yogyakarta¹,²,³,⁴,⁵,⁶
Jl. Padjajaran, Ring Road Utara, Condong Catur, Depok, Sleman, Yogyakarta¹,²,³,⁴,⁵,⁶
Indonesia¹,²,³,⁴,⁵,⁶
e-mail: anggitferdita@amikom.id¹, dwiky.tama@students., dewianisaist@amikom.id³, surya@amikom.id⁴, bayu.nadya@amikom., vikkyaprelia@amikom.

To cite this document: Nugraha, Tama, Istiqomah, Ramadhani, Kusuma, & Windarni. Feature Selection Technique for improving classification performance in the web-phishing detection process. Conference Series, 4, 25–31. https://doi.org/10.34306/conferenceseries.

Hash: ABCRcxeVygKM3n9LZbpfV33Aj4w6dJLrcEcl0tQunhAf6QUJWNPU6HQn2gDh5Myr

Abstract

Web phishing is a type of cybercrime that occasionally threatens the online activities of website visitors. Web phishing uses a phoney website page that closely mimics the legitimate website in order to fool its target into providing crucial information. Web phishing attacks also continue to grow in number year after year. As a result, it is vital to design a web phishing detection system in order to reduce the number of victims and the financial losses caused by web phishing attacks. The development of web phishing detection systems continues to this day, with machine learning being the most often used approach. Unfortunately, the construction of a machine learning-based web phishing detection system frequently employs only a single classification step; however, a feature selection process can increase the performance of the resulting classification.
Thus, this paper conducts an experiment that applies a feature selection procedure based on the Pearson correlation algorithm before machine learning modelling with popular algorithms such as Naive Bayes, Decision Tree, and Random Forest. Using a web phishing dataset from the UCI Machine Learning Repository, it was determined that adding the feature selection process to the decision tree and random forest algorithms resulted in an increase in accuracy of up to 60 percent and 95.50 percent, respectively, and a slight decrease in accuracy of 0.4 percent when applied to the Naive Bayes algorithm.

Keywords: Web-phishing, Feature Selection, Pearson correlation, Classification

Introduction

The Covid-19 epidemic has altered people's lifestyle and culture. Work, study, and even shopping are increasingly carried out online. This societal shift is inextricably linked to the Internet's function as a universal necessity. With the Internet, one may readily obtain information and conduct numerous transactions. Behind the convenience, there are numerous cybercrime risks that threaten and harm internet users. Web phishing is an example of cybercrime that still poses a hazard to internet users, especially those who often transact online. Web phishing is a cybercrime that attempts to deceive the target into divulging sensitive information. A false website page is created to look like the original website page, so the target does not realize that he has been trapped and that his personal information, such as login, password, or account number, has been taken over and exploited by cybercriminals. Web phishing mainly targets transactional websites such as financial, e-commerce, airline and travel, and banking websites.
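The Pearson-correlation feature selection step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold value, the function names (`pearson`, `select_features`), the snake_case feature names, the toy values, and the -1/1 label encoding are all assumptions made for illustration. The idea shown is to score each feature by its correlation with the class label and keep only features whose absolute correlation reaches a chosen threshold before training a classifier.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.5):
    """Keep feature names whose |r| with the label is at least `threshold`.

    `threshold` is a hypothetical cut-off, not a value taken from the paper.
    """
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Toy data (hypothetical values, not the UCI dataset).
# Assumed encoding: -1 = phishing, 1 = legitimate.
toy_columns = {
    "ssl_final_state": [1, 1, -1, -1, 1, -1],
    "url_length": [1, -1, 1, -1, 1, -1],
    "port": [1, 1, 1, 1, 1, 1],
}
toy_labels = [1, 1, -1, -1, 1, -1]

kept, scores = select_features(toy_columns, toy_labels, threshold=0.5)
print(kept)  # only the feature strongly correlated with the label survives
```

In this toy run only `ssl_final_state` passes the cut-off; the retained columns would then be passed to a classifier such as Naive Bayes, Decision Tree, or Random Forest. A lower threshold keeps more features, so the cut-off is a tuning choice rather than a fixed rule.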
The number of web phishing websites is 611877 through mid-2021 and is expected to rise to 54% by early 2022. To reduce the number of web phishing victims, a system that can evaluate and detect web phishing attacks is required. Machine learning is still commonly utilized in phishing site detection systems. In web phishing research, algorithms like decision trees, Naive Bayes, and random forests are commonly utilized. Unfortunately, most machine learning-based studies use only a single classification step, even though selecting the dominant features has a significant impact on the data classification process. So, in this study, we apply feature selection to see if it improves the performance of machine learning models, notably for web phishing detection. This experiment uses a web phishing dataset that can be downloaded for free from the UCI Machine Learning Repository.

Research Method

As indicated in Figure 1, this research was carried out in steps: preparation, modelling, data analysis, and model evaluation.

Figure 1. Research Flow.

Preparation involves downloading a web phishing dataset from the UCI Machine Learning Repository. The dataset contains 11055 data points with 30 features grouped into 4 categories: address bar-based features, abnormal-based features, HTML and JavaScript-based features, and domain-based features. Table 1 provides details and explanations of each of these features.

Table 1. Web Phishing Dataset Features.

| Section | Feature Name | Description |
| --- | --- | --- |
| Address Bar based Features | Having IP Address | A website that uses an IP address in hexadecimal format is indicated as a phishing website. |
| | URL Length | If the URL address is longer than 54 characters, the website is indicated as a phishing website. |
| | Shortening Service | A website is indicated as phishing when it uses short URLs such as "Tinyurl". |
| | Having at ( @ ) | If the URL contains the @ symbol, the website is indicated as phishing. |
| | Double Slash ( // ) | A legitimate website will place a double slash at the 6th character for HTTP and the 7th character for HTTPS. |
| | Prefix Suffix | A phishing website will use prefixes and suffixes, especially in the URL address. |
| | Having Domain Sub | Too many dots ( . ) in the URL address is one of the characteristics of a phishing website. |
| | SSL Final State | A legitimate website has an SSL certificate and uses HTTPS. |
| | Domain Registration Length | Phishing websites tend to use a domain for only a short time. |
| | Favicon | If the favicon is taken from an external domain, the website is indicated as a phishing website. |
| | Port | A legitimate website uses port 80 (HTTP), whereas a phishing website will open a port other than port 80. |
| | HTTPS Token | A phishing website does not have an authentication token like a legitimate website. |
| Abnormal Features | Request URL | Media on a legitimate website are in the same URL and domain. |
| | URL of Anchor | The <a> tag shows how many links are linked through the website. The more links that are connected, the more strongly the website is indicated as a phishing website. |
| | Links in Tags | Legitimate websites use tags for metadata,