Journal of Fuzzy Systems and Control, Vol. No 3, 2025. ISSN: 2986-6537. DOI: 10.59247/jfsc.

Phishing Website Detection via a Transfer Learning based XGBoost Meta-learner with SMOTE-Tomek

Joy Agboi 1,*, Frances Uche Emordi 2, Christopher Chukufunaya Odiakaose 3, Rebecca Okeoghene Idama 4, Evans Fubara Jumbo 5, Amanda Enaodona Oweimieotu 6, Peace Oguguo Ezzeh 7, Andrew Okonji Eboka 8, Anne Odoh 9, Eferhire Valentine Ugbotu 10, Paul Avwerosuo Onoma 11, Arnold Adimabua Ojugo 12, Tabitha Chukwudi Aghaunor 13, Amaka Patience Binitie 14, Christopher Chukwudi Onochie 15, Blessing Uche Nwozor 16, Patrick Ogholuwarami Ejeh 17

1 Faculty of Science, Delta State University, Abraka, Delta State, Nigeria
2, 3, 17 Faculty of Computing, Dennis Osadebay University, Asaba, Delta State, Nigeria
4 Faculty of Computing, Southern Delta University, Ozoro, Delta State, Nigeria
5, 6 School of Sciences, Edwin Clark University, Kiagbodo, Delta State, Nigeria
7, 8, 14, 15 School of Science, Federal College of Education (Technical), Asaba, Nigeria
9 School of Media and Communications, Pan-Atlantic University, Lekki, Lagos State, Nigeria
10 Department of Data Science, University of Salford, Manchester, United Kingdom
11, 12, 16 College of Computing, Federal University of Petroleum Resources, Effurun, Nigeria
13 Department of Data Intelligence and Tech, Robert Morris University, Pittsburgh, Pennsylvania, USA

Email: 1 agboijoy0@gmail.com, 2 emordi.frances@dou.ng, 3 osegalaxy@gmail.com, 4 idamaro@dsust., 5 evans3447@gmail.com, 6 oweimieotuamanda@edwinclarkuniversity.ng, 7 peace.ezzeh@fcetasaba., 8 ebokaandrew@gmail.com, 9 aodoh@pau.ng, 10 eferhire.ugbotu@gmail.com, 11 kenbridge14@gmail., 12 arnold@fupre.ng, 13 tabitha.aghaunor@gmail.com, 14 amaka.binitie@fcetasaba., 15 xtoline2@gmail.com, 16 nwozor.blessing@fupre.ng, 17 patrick.ejeh@dou.
*Corresponding Author

Abstract: The widespread proliferation of smartphones has advanced portability, ease of data access, and mobility; it has also birthed adversarial targeting of network resources that seeks to compromise unsuspecting user devices. Increased susceptibility is traced to user personality traits, which render users repeatedly vulnerable to exploits. Our study posits a stacked learning model to classify the malicious lures used by adversaries on phishing websites. Our hybrid fuses three base learners (Genetic Algorithm, Random Forest, and Modular Neural Network), with their outputs sent as input to an XGBoost meta-learner. The imbalanced dataset was resolved via SMOTE-Tomek, with predictors selected using relief rank feature selection. Our hybrid yields a Specificity of 1.000 alongside high F1, Accuracy, Recall, Precision, and MCC scores, accurately classifying the 3,316 cases of its held-out test dataset. Results affirm that it outperformed benchmark ensembles. The study shows that our proposed model, as explored on the UCI Phishing Website dataset, effectively classified phishing (cues and lures) content on websites.

Keywords: Phishing Website, SMOTE-Tomek Data Balancing, Memetic Algorithm, Tree-based Ensembles

I. INTRODUCTION

The digital revolution has ushered in a plethora of tools and processes that advance efficient knowledge exchange among users. These devices ease data-processing tasks while offering flexibility in shared resources and enhanced user connectivity. With security a major issue, such advances have continued to ignite the interest of adversaries. The proliferation of smartphones with their processing capacities has further eased their use as invasive targets, with attacks made more feasible by emergent tools. An adversary uses penetrative tools such as malware to bolster socially engineered threats that employ subterfuge to coordinate attacks on unsuspecting devices, in a bid to compromise network infrastructure and resources.
These attacks ensure that data exchange is targeted at exploits of a user's social needs, desires, and insatiable traits. Today's businesses are reshaped via a fusion of informatics as a channel to deliver high-end value to consumers, who receive services as rendered. This exchange has today become a trillion-dollar war, as businesses must seek new frontiers to curb attacks among other issues; failure to safeguard these exchanges ushers in the need for cross-cutting research. The success of many of these adversarial attacks hinges on user personality traits, which include online presence, emotional seclusion, insatiable wants, and trust issues. An adversary masks their intent as a trusted ally to exploit a compromised resource, providing the attacker with pivot access for further exploits on the infrastructure. The consequent rise in the adoption of smartphones has further eased these attacks and compromises considerably.

Phishing simply redirects a user's request to a spoofed website riddled with malicious content that seeks to expose a targeted user or device without their knowledge. Phishing consists of three elements: (i) a lure masks an attacker as a genuine user, targeting a user's empathy, fear, and curiosity; (ii) a hook is an embedded link in a message; and (iii) a catch obtains an exposed device's private data. Its success hinges on its frequency and diversity, with unrealistic demands that seek to intimidate a user's psyche with petty gifts. Vulnerability to scams can be due to demographics (age, gender, and status), as shown in Fig. 1 to Fig. 3. Girls between 24-to-42 years were the most phished due to media presence or social seclusion. There was also the factor of educational status cum societal approval.

(This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
Users between 18-to-29 years were also phished more due to behavioural traits. Victimization impacts a website's content and structure, with a greater probability that an unsuspecting user will fall prey. To identify malicious content, we must close these gaps by: (i) identifying the lures that increase believability in a user, and (ii) assessing the undetectability and potency of cues to unsuspecting users.

Learning models are successfully used to identify attacks and to detect the cues and lures that leave users susceptible. They identify data anomalies via learned outliers in a dataset, as accomplished via voting, bagging, boosting, and stacked models/schemes. MLs are veritable tools to identify attacks; a trained ML can detect anomalous patterns even with dynamic predictors. Learning schemes are grouped into machine learning (ML), deep learning (DL), and ensemble learning (EL). ML's flexibility and robustness help it learn intrinsic patterns and decode predictors, which speeds up model design and eases outlier identification. Its pitfalls are imbalanced datasets and the feature selection mode used. DLs utilize recurrent neural networks to capture chaotic, high-dimensional data patterns. Poor generalization due to the vanishing gradient problem restricts their use; however, gated variants overcome this by controlling their inputs, easing adaptability to learned changes as long-term dependencies. Their inability to handle larger datasets and the longer training time required imply the quest for a better alternative. Lastly, ELs effectively fuse ML with DL into a stronger learner to enhance performance. An EL must resolve conflicts of structure and data encoding while leveraging the merits of both ML and DL to avoid the model overfit birthed by the underlying models. Thus, we explore XGBoost to achieve such predictive abilities, leveraging its weak base learners to enhance itself. It improves its performance via error reduction on its weak
learners, and reduces its overall variance and bias in the dataset to improve generalization. It benefits from the comprehensive knowledge of its weaker base learners, improving generalization by exploiting the XGBoost scheme. With performance degraded by imbalanced data, we explore the SMOTE-Tomek balancing variant. Our study thus seeks to: (i) identify phishing lure content on spoofed websites, (ii) resolve data imbalance via SMOTE-Tomek, and (iii) select predictors relevant to the target class via relief rank feature selection.

Resolving data imbalance via oversampling has become imperative in ML, as it accounts for the minority class as crucial. This is opposed to under-samplers, which often reduce or ignore the minority class in a dataset as meaningless. Thus, we use the synthetic minority oversampling technique (SMOTE) or its variants, namely SMOTE-Tomek and SMOTE-ENN.

Our study is organized thus: Section 1 introduces the subject with the gaps that motivate the study; Section 2 explores the proposed method, covering data collection, pre-processing, dataset split-balance-normalization via SMOTE-Tomek, stacked model construction, and training and validation with XGBoost; and Section 3 discusses the experimental results obtained as evidence, in a broader sense and context, for the stacked ensemble on the phishing website dataset obtained from UCI.

Fig. 1. Scam count by age distribution
Fig. 2. Scam count by gender
Fig. 3. Students' status by year of study

II. MATERIALS AND METHODS

The stacking mode is based on Fig. 4, as follows:

Step-1 (Data Collection): We explore the UCI phishing dataset, which consists of 11,055 records distributed into 5,180 cases in the genuine class and 5,875 cases in the phishing class. The original dataset plot is seen in Fig. 5 and detailed in Table 1.

Step-2 (Pre-processing): cleans up the dataset by expunging redundancies to yield integrity, and removes missing values to yield quality.
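The clean-up of Step-2, together with the one-hot encoding applied next, can be sketched with pandas. The mini-frame below is a hypothetical stand-in for the raw records, not the UCI data itself:

```python
import pandas as pd

# hypothetical mini-frame standing in for the raw UCI-style records
df = pd.DataFrame({
    "URL_lenght": [1, 1, 0, -1, 1],
    "SSLfinal_State": [1, 1, -1, None, 1],
    "Result": [1, 1, -1, -1, 1],
})

df = df.drop_duplicates()   # expunge redundant records (integrity)
df = df.dropna()            # remove missing values (quality)

# one-hot encode the {-1, 0, 1} categorical codes into binary indicators
encoded = pd.get_dummies(df.astype("category"))
```

The same two clean-up calls scale unchanged to the full 11,055-record dataset; only the column list grows.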
The optimized data is encoded via one-hot encoding, which transforms categorical data into its equivalent binary form. Fig. 6 shows the optimized dataset.

Step-3 (Relief Rank Feature Selection): We strictly select only the predictors of relevance, to expunge all docile features and reduce dataset dimensionality, which aids faster model construction. The relief rank: (i) assumes all features have the same weight and influence on accuracy, (ii) identifies the nearest sample from the same class as the nearest hit and the nearest sample from a differing class as the nearest miss, and (iii) uses the feature values of these nearest neighbors to update each feature's weight. It assesses the correlation of all predictors with the target class, where the nearest hit and miss of an instance R are found under the Euclidean distance

d(R, X) = sqrt( sum_a (R_a - X_a)^2 )

With a computed threshold of 8.321, Algorithm 1 ranks features using relief ranking to choose a total of 20 predictors, as in Table 1, from the original UCI dataset with its initial 30 features.

Algorithm 1: Relief Ranking Feature Selection
  Input: n = number of training samples; a = number of features;
         m = number of random training samples used to update W
  initialize all feature weights W[A] = 0.0
  for i = 1 to m do:
      randomly select a target instance R
      find its nearest hit H and nearest miss M
      for A = 1 to a do:
          W[A] = W[A] - diff(A, R, H)/m + diff(A, R, M)/m
      end for
  end for
  return vector W of feature scores that estimate feature quality

Step-4 (Data Split/Balance): First, the dataset is split into train (70%) and test (30%) subsets. Balancing resamples the data, interpolating between nearest neighbors to create synthetic data that repopulates a pool, or removing data from the original pool, to yield a more balanced dataset. We adapt SMOTE-Tomek via SMOTE (over-sampler) and Tomek links (under-sampler). Fig. 7 shows the balanced plot resulting from Algorithm 2 for SMOTE-Tomek.
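The two components of Step-4's balancing can be sketched from scratch as follows. This is a hedged illustration of the underlying ideas (the study could equally use the `imbalanced-learn` implementation); the function names and parameters are ours, not the paper's:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by interpolating a randomly
    chosen minority point toward one of its k nearest minority neighbors
    (the core SMOTE idea)."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the point itself
        j = rng.choice(nbrs)
        gap = rng.random()                     # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

def tomek_links(X, y):
    """Indices of majority-class points that form Tomek links (mutual nearest
    neighbors of opposite class); removing them cleans the class boundary."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = D.argmin(axis=1)
    links = [i for i in range(n) if nn[nn[i]] == i and y[i] != y[nn[i]]]
    maj = np.bincount(y).argmax()
    return [i for i in links if y[i] == maj]
```

Over-sampling the minority class with `smote` and then pruning the `tomek_links` indices reproduces the oversample-then-clean sequence of Algorithm 2.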
Fig. 4. Proposed stacking ensemble approach with XGBoost as meta-regressor

Table 1. Ranking of Features Engineered Using Wrapper Mode

- shortening_service: Whether a URL shortening service like bit.ly is used (1: Yes, -1: No)
- double_slash_redirecting: Presence of "//" in the URL path (1: Yes, -1: No)
- having_IP_Address: Whether the URL has an IP address instead of a domain name (1: Yes, -1: No)
- having_At_Symbol: Presence of the "@" symbol in the URL (1: Yes, -1: No)
- having_Sub_Domain: Number of subdomains in the URL (1: more than one, 0: one, -1: none)
- URL_lenght: Length of the URL (1: long, 0: medium, -1: short)
- domain_registration_length: Length of time the domain has been registered (1: over a year, -1: less than a year)
- Prefix_Suffix: Presence of "-" in the domain part of the URL (1: Yes, -1: No)
- SSLfinal_State: Whether the website uses HTTPS with a valid SSL certificate (1: Yes, -1: No)
- Favicon: Whether the favicon is loaded from the same domain (1: Yes, -1: No)
- port: Use of non-standard ports (1: Yes, -1: No)
- HTTPS_token: Presence of the "HTTPS" token in the URL (1: Yes, -1: No)
- Request_URL: Percentage of external links in the source code of the website (1: High, -1: Low)
- URL_of_Anchor: Percentage of external anchor links on the website (1: High, -1: Low)
- Links_in_tags: Percentage of external links in tags (link, meta, script) (1: High, -1: Low)
- SFH: Form handler, where form data is submitted (1: external, 0: internal, -1: same)
- Submitting_to_email: Whether the form submits data to an email address (1: Yes, -1: No)
- Abnormal_URL: Whether the URL is abnormal (1: Yes, -1: No)
- Redirect: Number of redirects (1: more than one, -1: less than one)
- on_mouseover: Whether the status bar content changes on mouseover (1: Yes, -1: No)
- RightClick: Whether right-click is turned off on the website (1: Yes, -1: No)
- popUpWindow: Whether pop-up windows are present (1: Yes, -1: No)
- Iframe: Whether an iframe is used on the website (1: Yes, -1: No)
- age_of_domain: Age of the domain (1: more than 6 months, -1: less than 6 months)
- DNSRecord: Whether the DNS record exists (1: Yes, -1: No)
- web_traffic: Web traffic rank (1: High, 0: Medium, -1: Low)
- Page_Rank: Google PageRank (1: High, -1: Low)
- Google_Index: Whether Google indexes the site (1: Yes, -1: No)
- Links_pointing_to_page: Number of links pointing to the page (1: High, 0: Medium, -1: Low)
- Statistical_report: Whether the website is reported as a phishing site (1: Yes, -1: No)

(The "Selected" column of Table 1 marks the 20 predictors retained by the relief ranking.)

Fig. 5. Original dataset plot
Fig. 6. Preprocessing applied to the dataset
Fig. 7. SMOTE-Tomek data balancing

Algorithm 2: SMOTE-Tomek Links Data Balancing
  // stratified split of the dataset into train (70%) and test (30%) subsets
  from sklearn.model_selection import train_test_split, StratifiedShuffleSplit
  x_train, x_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, stratify=y, random_state=...)
  x_val, x_test, y_val, y_test = train_test_split(x_temp, y_temp, test_size=0.3, stratify=y_temp, random_state=...)
  // start SMOTE (over-sample) mode
  1. from the minority class, choose a random data point
  2. compute rel_dist between the selected point and one of its k nearest neighbors
  3. choose rnd_val = random_value(0, 1); synthetic sample = point + rnd_val * rel_dist
  4. if the required synthetic samples are obtained, then minorClassNew = minorClass with the synthetic samples added
  5. repeat steps 1 to 4 until the threshold for minor_class_new is reached
  // start Tomek links (under-sample) mode
  6. select a random sample and find its nearest neighbor
  7. if the pair spans minor_class_new and the majority class, a Tomek link is created and removed
  8. stop when no further Tomek links remain
end

Step-5 (Stacked Ensemble): fuses three base learners with the XGB meta-regressor, explained as follows:

Cultural Genetic Algorithm (CGA) uses these belief spaces: (i) normative, the values to which predictors are bound; (ii) domain, which equips predictors with knowledge about the task; (iii) temporal, which ensures each predictor knows the solution; and (iv) spatial, which ensures each
predictor knows its region of the search space. It uses an influence function to set its upper and lower bounds in its quest for the optimum; this allows knowledge transfer between the belief spaces and the population pool, and alters each predictor to conform with its belief space. Each b_i in {0, 1} is a chromosome gene, and Table 2 gives the CGA configuration. A chromosome of length L decodes to a real value within its bounds as

x = lower + ((upper - lower) / (2^L - 1)) * sum_{i=0}^{L-1} b_i * 2^i

Random Forest (RF) successively grows its decision trees independently via bootstrap samples, in bagging mode. It uses a binary split on an extra layer to extend the randomness of how its trees are constructed, so that its best nodes are selected randomly to capture intricate features in the dataset. Its inability to handle diversity in categorical data often results in poor performance; thus, we tune its hyperparameters to reduce model overfitting. With normfi_i the normalized importance of feature i in a tree, T the total number of trees, fi_i the importance of feature i about the ground truth, and ni_j the nodal importance that yields the Gini value:

ni_j = w_j * C_j - w_left(j) * C_left(j) - w_right(j) * C_right(j)
fi_i = ( sum_{j : node j splits on feature i} ni_j ) / ( sum_{k in all nodes} ni_k )
normfi_i = fi_i / sum_{j in all features} fi_j
RFfi_i = ( sum_{t=1}^{T} normfi_i^(t) ) / T

where w_j is the weighted number of samples reaching node j and C_j its impurity. Table 3 shows the Random Forest design configuration.

Table 3. RF Design Configuration
- n_estimators: number of trees constructed
- learning_rate: step size of learning updates
- min_sample_split: minimal samples needed to split
- max_depth: maximum depth of each tree
- random_state: the seed for reproducibility
- eval_metric (error, logloss): performance evaluation metrics
- eval_set (x_val, y_val): evaluation data at training
- bootstrap (True): bootstrap aggregation used

The Korhonen Modular Neural Network (KMNN) yields a deep, modular learning model that computes its output using the tan-sigmoid function.
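The Gini-based importance quantities described for the Random Forest can be read directly off a fitted forest in scikit-learn, which computes exactly this normalized mean decrease in impurity. The toy data below is a hypothetical stand-in for the 20 relief-selected predictors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# toy stand-in for the relief-selected predictors (not the UCI data itself)
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=42)

rf = RandomForestClassifier(n_estimators=100, max_depth=8, bootstrap=True,
                            random_state=42)
rf.fit(X, y)

# per-feature mean decrease in Gini impurity, normalized to sum to 1,
# i.e. the ni/fi/normfi quantities averaged over all trees
importances = rf.feature_importances_
top = importances.argsort()[::-1][:5]   # five most informative predictors
```

Because the importances are normalized, they can be thresholded directly, which mirrors how the study retains only predictors above its relief threshold.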
It splits a network into smaller units for enhanced independence and improved efficacy of its units. This improves computational efficiency, reduces convergence time, and lets it handle more tasks effectively in parallel. Its diversity grants each unit independent training, making the KMNN more robust and flexible, with improved generalization. Table 4 details the KMNN design configuration.

Table 2. CGA Design and Configuration
- max_nos_gen: maximum number of generations
- nos_individuals: number of solutions in a generation
- selection_type: 1-rank, 2-elitism, 3-steady state, 4-tourney, 5-stochastic universal
- offspring_created: offspring via 1-crossover, 2-mutation
- req_fit_function: the required fitness function
- learning_rate: determines the step size in learning
- random_state: the seed for reproducibility
- max_nos_gens: epochs, or maximum number of generations

Table 4. Korhonen Modular NN Configuration
- eval_perf_set (MSE): evaluation metric at training
- hidden_layers: number of hidden layers adopted
- training_percent: k-fold dataset used for training
- transfer_hidden: transfer function of the hidden layers
- learning_rate: step size of the learning update
- data_division: k-fold dataset for construction
- train_net_algo (LMBP): training mode of the neural network
- bkpg_momentum: backpropagation with momentum

The XGBoost meta-regressor leans on the predictive outputs of its base models, expanding its goal function through its regularizer term Ω(f_k) and loss function l(y_i, ŷ_i) to ensure its solution remains within bounds for improved accuracy via its tuned hyperparameters, as in Table 5 and the objective

obj = sum_{i=1}^{n} l(y_i, ŷ_i) + sum_{k} Ω(f_k)
Table 5. XGB Regressor Design and Configuration
- n_estimators: number of trees constructed
- max_depth: maximum depth of each tree
- eval_set (x_val, y_val): train dataset to evaluate performance
- learning_rate: step size to update XGBoost
- eval_metric (error, logloss): performance evaluation metrics
- random_state: the seed for reproducibility

Step-6 (Train/Cross-Validation): is initialized with the default configurations in Table 2 to Table 5 for tuning. Each tree is iteratively constructed and trained to ensure the collective knowledge is used in identifying intricate patterns. Training blends synthetic with original data, which guarantees comprehensive learning while improving adaptability across a variety of configurations.

III. RESULTS, FINDINGS, AND DISCUSSION

For a comprehensive evaluation devoid of overfitting, we use 5-fold cross-validation on the 70% train subset obtained via SMOTE-Tomek balancing, with a final evaluation carried out on the held-out test (30%) dataset, as in Table 6. The proposed hybrid yields an average accuracy of 99.34% with high Precision, a Recall of 98.64%, an F1 of 99.2%, a Specificity of 99.66%, an MCC of 97.7%, and an AUC-ROC above 99%. From Table 6, the high MCC score implies that the model accurately and consistently handles the minority class once data balancing is performed, while the Specificity of 99.66% indicates that the model effectively recognizes phishing, malicious websites.

The held-out test (30%) assesses the model's generalization ability on unseen data. The results showed an accuracy of 99.7%, a precision of 100%, a recall of 99.8%, and an F1 above 99%. The AUC value of 99.7% implies that the model was able to differentiate between the benign and malicious classes. Also, a Specificity of 100% indicates that no benign record was misclassified.

Table 6. Evaluation Without Feature Selection: Accuracy, Recall, Precision, MCC, Specificity, and AUC-ROC across Fold-1 to Fold-5 of 5-fold cross-validation (training) and the held-out test set.

Fig. 8 shows the AUC-ROC of 99.73%, demonstrating the model's capability to differentiate the negative and positive classes. The proposed model accurately identified the 3,591 cases of the test data, with only a single misclassified case and no false positives recorded; its specificity of 1.000 implies that no legitimate content was misclassified as phishing, which is critical when detecting phishing. The proposed model enhances phishing website detection performance on both the training data and the held-out test set. Fig. 9 implies the ensemble classified the test dataset with near-perfect accuracy. The use of feature selection, SMOTE-Tomek balancing, and data normalization did not degrade performance; rather, it focused model construction on the critical features, successfully detecting spoofed websites with reduced errors that secure user resources and enhance experience.

Fig. 8. ROC result of the held-out test dataset
Fig. 9. Confusion matrix comparison

As we explore the high performance of our proposed ensemble on this dataset to demonstrate its flexibility, adaptability, robustness, and predictive ability, we also benchmark it against previous methods that have used the same dataset. Thus, we benchmark similar design constructs on various datasets for various domain tasks, as in Table 7. We focus on the held-out test performance, as it presents a more realistic indication of the model's generalization capabilities.

Table 7. Benchmark comparison of Accuracy, Recall, Precision, Specificity, and AUC-ROC for SEM-DHH, BiGRU, DBN-GRU, FSOR, LSTM-CNN, GBM-PSO, and our model.
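For reference, the stacked pipeline of Step-5 and the cross-validation of Step-6 can be sketched with scikit-learn. This is a hedged illustration, not the study's exact configuration: a RandomForest and an MLP stand in for the RF and KMNN base learners (the CGA is omitted), and a GradientBoosting model stands in for the XGBoost meta-learner; all data and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.metrics import matthews_corrcoef, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier

# toy stand-in for the balanced phishing features
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=7)

# base learners feed out-of-fold predictions to the meta-learner
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=7)),
        ("mnn", MLPClassifier(hidden_layer_sizes=(16,), max_iter=400,
                              random_state=7)),
    ],
    final_estimator=GradientBoostingClassifier(random_state=7),
    cv=3,
)

cv_acc = cross_val_score(stack, X_tr, y_tr, cv=3).mean()  # CV on train split
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)
mcc = matthews_corrcoef(y_te, pred)
specificity = recall_score(y_te, pred, pos_label=0)  # TNR of the benign class
```

Scoring the negative class's recall is one simple way to obtain the specificity figures reported above, since scikit-learn has no dedicated specificity scorer.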
The proposed model underperforms only against the benchmark that uses a BiGRU deep learning scheme with hybrid feature selection. The other benchmark models underperformed in comparison to our proposed model across metrics on the test dataset, where it achieved high accuracy (99.7%), precision (100%), recall (99.8%), and specificity (100%), showing the best generalization with low false positives, which is crucial in phishing detection, especially given the complex lures used by adversaries in their evolving exploit methods. Although the benchmark models leverage deep learning capabilities, their performance is slightly lower across metrics, and the absence of reported specificity indicates they are less robust, whereas our model maintains high sensitivity even with its transfer learning architecture. We used the SMOTE-Tomek scheme to address the class imbalance.

IV. CONCLUSION

This study presents a hybrid fusion of a supervised CGA with an unsupervised KMNN and tree-based learners (RF and XGB) to classify websites via the UCI phishing website dataset. The model achieved highly discriminative performance by fusing statistically selected features using relief ranking. It used SMOTE-Tomek at training to successfully mitigate class imbalance, yielding enhanced recall and F1. Its final classification with the XGB kernel achieved 99.7% accuracy with 100% precision on test data. The comparative analysis with benchmarks showed our method's superior generalization and data balance. Thus, our study contributes a lightweight yet effective model that avoids complex training, handles larger dataset complexities, and proffers interpretability with high performance. Future work may extend this hybrid strategy to multiclass or multimodal datasets and test alternative fusion or dimensionality-reduction schemes.

REFERENCES