Jurnal Ilmu Komputer dan Informatika (JIKI) P-ISSN: 2807-6664
E-ISSN: 2807-6591
Vol.
No.
December 2025.
Page.
https://jiki.
jurnal-id.
DOI: https://doi.
org/10.
54082/jiki.
Performance Comparison of Logistic Regression and XGBoost for Credit Card Fraud Detection using Random Undersampling and Hyperparameter Tuning Hasri Akbar Awal Rozaq*1.
Deni Sutaji2 Graduate School of Informatics.
Department of Computer Science.
Gazi University.
Türkiye Email: 1hakbar.
rozaq@gazi.
Submitted: 05 November 2025.
Revised: 14 December 2025.
Accepted: 26 December 2025.
Published: 31 December 2025
Abstract
Credit card fraud is a growing problem due to the rise of card transactions.
This study investigates the effectiveness of Logistic Regression (LogReg) and Extreme Gradient Boosting (XGBoost) in identifying fraudulent transactions in a highly imbalanced dataset, where only about 8.74% of the data represents fraudulent activity.
To address the class imbalance, random undersampling was applied, reducing the number of legitimate transactions.
This technique significantly improved LogReg's ability to detect fraud, raising its AUC-ROC from a baseline of 0.7994.
XGBoost performed well even without hyperparameter tuning or random undersampling, indicating its robustness as a baseline model.
The study highlights the critical importance of addressing class imbalance in fraud detection.
Both LogReg and XGBoost demonstrated potential, particularly when combined with techniques like undersampling or hyperparameter tuning.
These findings underscore the need for effective data preprocessing methods to enhance the performance of machine learning models in detecting credit card fraud.
Keywords: Fraud Detection, Hyperparameter Tuning, Imbalanced Data, Logistic Regression, Random Undersampling, XGBoost
This work is an open access article licensed under a Creative Commons Attribution 4.0 International License.
INTRODUCTION
Over the past few decades, the fintech industry has experienced rapid growth, making credit cards a common choice for everyday purchases.
However, this rapid rise in credit card usage has also led to an increase in credit card fraud, creating significant financial risks for both consumers and financial institutions. Effective fraud detection has become essential to mitigate these risks and maintain the integrity of financial systems.
To address this vulnerability, it is crucial to implement effective methods that can accurately identify fraudulent transactions.
Data mining and machine learning approaches have proven to be particularly effective in this context.
Logistic Regression is a well-established method that is effective on straightforward datasets with linear relationships.
However, it faces challenges when handling complex data with many dimensions, a problem known as the "curse of dimensionality". As the number of features grows, Logistic Regression struggles to maintain accuracy and requires more data points.
In contrast, Extreme Gradient Boosting (XGBoost) is a powerful machine learning algorithm that extends the binary classification concept of logistic regression through gradient boosting.
It leverages gradient tree boosting to create a more flexible ensemble model, making it highly effective in handling complex relationships and non-linear patterns within high-dimensional data.
Both algorithms
share the foundation of predicting binary outcomes using a logistic function, but XGBoost offers a more robust approach for intricate datasets.
To maximize the performance of XGBoost, hyperparameter tuning is applied.
This method optimizes the algorithm's performance by adjusting its parameters.
In this study, Randomized Search CV is employed for hyperparameter tuning due to its simplicity and modest computational requirements.
In many cases it improves model performance.
Real-world applications often involve imbalanced data, where classes are not equally represented.
This imbalance can significantly degrade model performance by biasing predictions toward the majority class.
It is therefore crucial to find an appropriate way to address the imbalanced data problem.
However, the optimal approach depends on the characteristics of the dataset and the specific classification task.
For this study, random undersampling is a suitable approach.
Random undersampling is a simple sampling technique that addresses data imbalance by randomly selecting a subset of samples, in this study from the majority class.
It is suitable here because it effectively reduces the size of the majority class.
The dataset is large, so processing it in full requires substantial computational resources; the simplicity of random undersampling makes it an efficient choice in this setting.
In summary, this paper aims to achieve two objectives: (I) to compare the performance of Logistic Regression and XGBoost algorithms using Random Undersampling in classifying fraudulent transactions, and (II) to optimize XGBoost accuracy through hyperparameter tuning using Randomized Search CV.
METHOD
This chapter explains the methodology employed in this research to compare the performance of Logistic Regression and XGBoost for credit card fraud detection with Random Undersampling. The chapter is divided into the stages shown in Figure 1.
Figure 1. Research method stage
Data Description
The dataset is obtained from Kaggle, where it is publicly available.
It has 1,000,000 samples, with 87,403 true values and 912,597 false values in the target class.
The dataset is therefore extremely imbalanced, with the minority class making up only 8.74% of the samples.
It has 7 features, 3 numerical and 4 binary categorical, with the target class column at the end.
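The imbalance figures quoted above can be checked directly from the reported class counts. The short calculation below uses only the counts given in the text; the majority-to-minority ratio is a derived quantity added here for illustration.

```latex
\[
\frac{87{,}403}{1{,}000{,}000} = 0.087403 \approx 8.74\%,
\qquad
\frac{912{,}597}{87{,}403} \approx 10.44
\]
```

In other words, there are roughly ten legitimate transactions for every fraudulent one.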
The dataset is described in Table 1 and Table 2.
Table 1. Dataset Description
Column | Description
-------|------------
distance_from_home | Distance between the attempted transaction location and the cardholder's registered address (in km).
distance_from_last_tr | Distance between the attempted transaction location and the location of the most recent transaction using the same card (in km).
ratio_to_median_purchase_price | Ratio of the current transaction amount to the median transaction amount for this card (e.g., a transaction amount of 85 with a median of 50).
repeat_retailer | Indicates whether the attempted transaction occurs at a retailer where the card is frequently used (Yes/No).
used_chip | Indicates whether the transaction attempt involved using an RFID chip in the card (Yes/No).
used_pin_number | Indicates whether the transaction attempt involved entering a PIN number (Yes/No).
online_order | Indicates whether the transaction attempt is for an online order (Yes/No).
(target class) | Indicates whether the transaction is classified as fraudulent or legitimate (Yes/No).
Table 2. Summary of Statistics (columns: Type, Max, Min, Mean, Std)
Column | Type
-------|-----
distance_from_home | Numerical
distance_from_last_tr | Numerical
ratio_to_median_purchase_price | Numerical
repeat_retailer | Categorical (Binary)
used_chip | Categorical (Binary)
used_pin_number | Categorical (Binary)
online_order | Categorical (Binary)
(target class) | Categorical (Binary)
Preparation
This study leverages a credit card fraud dataset with minimal cleaning requirements.
In this stage, exploratory data analysis (EDA) is conducted as the initial step to gain insight into the data distribution and identify potential areas for preprocessing.
Understanding these characteristics helps guide subsequent data preparation techniques.
Following EDA, the data undergoes the following preprocessing steps:
Splitting
The data is split into training and testing sets.
An 80% training and 20% testing split is a common choice.
The training set is used to build the model, while the unseen testing set provides an unbiased evaluation of the model's generalizability.
Splitting the data before any other preprocessing is generally recommended, because it prevents information from the testing set from leaking into and unintentionally influencing model training, which would make the evaluation overly optimistic.
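The 80/20 split described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library and toy data; the function name `train_test_split`, the seed, and the toy rows are assumptions made for this example and do not reproduce the study's actual code.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split the rows into training and testing subsets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(len(rows) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# Toy data: 1,000 (feature, label) rows standing in for the real dataset.
rows = [(float(i), i % 10 == 0) for i in range(1000)]
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Every later preprocessing step would then be fitted on `train_rows` only, which is what keeps the test set unseen.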
Feature Scaling
Feature scaling is applied to standardize the data.
Standardization ensures all features have a mean of zero and a standard deviation of one, removing the influence of units and promoting equal contribution from each feature during model training.
For standardization, StandardScaler from Python's Scikit-Learn library is used.
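To make the standardization step concrete, the sketch below computes the mean and standard deviation on training columns only and reuses them for the test columns. It is a standard-library illustration of the idea, not the Scikit-Learn implementation; the helper names and the toy values are assumptions for this example.

```python
import statistics

def fit_standardizer(train_columns):
    """Return (mean, std) per feature, computed from the training data only."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in train_columns]

def standardize(columns, params):
    """Apply z = (x - mean) / std using previously fitted parameters."""
    return [[(x - mean) / std for x in col]
            for col, (mean, std) in zip(columns, params)]

# Toy data: two features, four training rows and one test row.
train_columns = [[2.0, 4.0, 6.0, 8.0], [10.0, 10.0, 30.0, 30.0]]
test_columns = [[5.0], [20.0]]

params = fit_standardizer(train_columns)           # statistics come from training data
train_scaled = standardize(train_columns, params)
test_scaled = standardize(test_columns, params)    # test data reuses the same statistics
print(test_scaled)
```

Reusing the training statistics for the test set is what keeps the scaling step free of test-set information.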
Random Undersampling (RUS)
To address class imbalance within the dataset, Random Undersampling (RUS) is employed.
This technique randomly removes majority class samples until a desired number remains, aiming to achieve a more balanced class distribution.
In this study, RUS is used to retain 200,000 majority class samples from the 800,000-sample training set.
RUS requires few computational resources, but it has a potential drawback: discarding samples may lose valuable information.
The decision to use it depends on the severity of the class imbalance and the chosen modeling approach.
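The following sketch shows one way random undersampling to a fixed majority-class size can be written. It uses only the standard library and a small toy dataset; the function name `random_undersample`, the label encoding (0 for legitimate, 1 for fraud), and the sizes are assumptions for illustration, not the study's implementation.

```python
import random

def random_undersample(rows, labels, n_majority, majority_label=0, seed=42):
    """Keep all minority rows and a random subset of n_majority majority rows."""
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]
    if n_majority > len(majority_idx):
        raise ValueError("n_majority exceeds the number of majority samples")
    kept_majority = rng.sample(majority_idx, n_majority)  # sampling without replacement
    kept = sorted(kept_majority + minority_idx)           # preserve original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data: 900 legitimate (0) and 100 fraudulent (1) rows.
labels = [1 if i % 10 == 0 else 0 for i in range(1000)]
rows = [[float(i)] for i in range(1000)]

X_res, y_res = random_undersample(rows, labels, n_majority=300)
print(y_res.count(0), y_res.count(1))
```

Only the training set would be resampled in this way; the test set keeps its original class distribution so that evaluation reflects realistic conditions.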
Implementation
This section focuses on the modelling stage, with the implementation of two machine learning algorithms: Logistic Regression and XGBoost.
Hyperparameter tuning with Randomized Search CV is also applied to XGBoost, with the aim of optimizing its performance in identifying fraudulent transactions.
Logistic Regression
Logistic regression can be interpreted as a generalized linear model (GLM) for a binary dependent variable that takes the value 0 or 1.
It has recently been used to analyze measurable independent factors and to see how a group of these factors affects the regression outcome.
Unlike linear regression, which predicts continuous values, logistic regression transforms its linear output using the logistic function (sigmoid function) to produce a probability value between 0 and 1.
This probability indicates the likelihood of an observation belonging to the positive class (fraudulent transaction).
To estimate the probability of belonging to the positive class, a linear score (z) is first calculated using Equation 1. This score is a weighted sum of the independent variables, where each variable's contribution is determined by its associated weight.
Equation 2 then employs the logistic function (σ) to transform this linear score (z) into a probability estimate (ŷ) between 0 and 1.
$$z = w^{T}x + b \tag{1}$$
$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} \tag{2}$$
Information:
z: Linear Predictor
w^T: Transpose of the Weights
x: Independent Variable
b: Bias Term
ŷ: Predicted Probability
σ: Logistic Function
e: Exponential Function
This function helps determine the likelihood of fraud.
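As a small illustration of the two steps described above, the sketch below computes a linear score from weights, inputs and a bias, then passes it through the logistic function. The weights, bias and input values are hypothetical numbers chosen for this example; they are not fitted parameters from the study.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Linear score z = w^T x + b, followed by the logistic function."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Hypothetical parameters and one standardized input row (illustration only).
weights = [0.8, -1.2, 0.5]
bias = -0.3
x = [1.0, 0.5, 2.0]

p = predict_proba(weights, bias, x)  # linear score here is 0.9
print(round(p, 4))
```

A transaction would then be flagged as fraudulent when `p` exceeds a chosen decision threshold, commonly 0.5.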
Logistic regression has several advantages, including computational efficiency, making it suitable for large datasets and resource-constrained environments. It can handle various types of variables and does not require the features to be normally distributed.
Furthermore, it captures linear relationships between features and the target variable well, leading to accurate classifications when such relationships hold.
However, logistic regression is not without limitations.
In its standard form it is restricted to binary classification, and scenarios involving more than two classes require extensions such as multinomial or one-vs-rest formulations.
Additionally, logistic regression struggles with capturing non-linear relationships and may be sensitive to outliers, potentially leading to suboptimal results.
XGBoost
Unlike logistic regression, which relies on a single model, XGBoost leverages ensemble learning, specifically a technique known as gradient boosting.
Ensemble learning builds robust models by combining predictions from multiple weak learners, often referred to as base learners, which in XGBoost are decision trees.
Gradient boosting builds these models sequentially.
Each new model focuses on correcting the errors made by the previous ones, leading to a more accurate ensemble.
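The idea that each new learner corrects the errors of the previous ones can be illustrated with a deliberately simplified boosting loop. The sketch below fits one-split decision stumps to the current residuals under squared-error loss on a toy 1-D dataset. It is not XGBoost, which additionally uses second-order gradient information and regularization; all names, data and settings here are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the remaining residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors of the current ensemble
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy 1-D data with a noisy step pattern.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]

mse_0 = mean_squared_error(gradient_boost(xs, ys, n_rounds=0), xs, ys)
mse_1 = mean_squared_error(gradient_boost(xs, ys, n_rounds=1), xs, ys)
mse_20 = mean_squared_error(gradient_boost(xs, ys, n_rounds=20), xs, ys)
print(mse_0, mse_1, mse_20)
```

The training error shrinks as rounds are added, which is the sequential error-correcting behaviour the text describes.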
XGBoost offers several advantages that make it a strong option for a range of machine learning tasks.
Its ability to handle complex, non-linear relationships between features and the target variable is a significant strength compared to simpler models like logistic regression.
Moreover, XGBoost handles sparse data well and can parallelize parts of training, which is valuable for complex real-world datasets such as credit card fraud data.
While XGBoost offers significant advantages, it is not without limitations.
One consideration is its computational demands.
Training XGBoost models can require more computational resources than simpler algorithms like logistic regression.
Additionally, XGBoost has a larger set of hyperparameters that require careful tuning for optimal performance.
This tuning process can be more complex than for LogReg, potentially requiring expertise and experimentation.
Despite these limitations, XGBoost remains a powerful tool for classification and regression tasks due to its ability to learn complex patterns, handle large datasets efficiently, and offer valuable insights into feature importance.
Hyperparameter Tuning
Machine learning models rely on hyperparameters that influence their performance, and the optimal values for these parameters need to be determined.
Hyperparameter tuning offers a method to identify values that improve model performance.
Several techniques exist for hyperparameter tuning, with grid search and randomized search being common approaches.
Grid search exhaustively evaluates every combination in a predefined set of values, which gives broad coverage.
However, this thoroughness requires a lot of computing resources.
Randomized search offers an alternative that samples a fixed number of hyperparameter combinations, reducing the computational resources needed and the risk of getting stuck in suboptimal configurations.
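The difference from grid search is easiest to see in code: randomized search draws a fixed number of configurations instead of enumerating all of them. The sketch below is a generic standard-library illustration, not Scikit-Learn's Randomized Search CV. The search space borrows common XGBoost hyperparameter names purely as labels, and `toy_score` is a made-up stand-in for a cross-validated score.

```python
import random

def sample_params(space, rng):
    """Draw one configuration: lists are sampled by choice, (low, high) tuples uniformly."""
    params = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            params[name] = rng.choice(spec)
        else:
            low, high = spec
            params[name] = rng.uniform(low, high)
    return params

def random_search(score_fn, space, n_iter=25, seed=0):
    """Evaluate n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = sample_params(space, rng)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space (names and ranges are illustrative assumptions).
space = {
    "max_depth": [3, 4, 5, 6, 8],
    "learning_rate": (0.01, 0.3),
    "n_estimators": [100, 200, 400],
}

def toy_score(params):
    # Stand-in for a cross-validated score; peaks at max_depth=5, learning_rate=0.1.
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)

best_params, best_score = random_search(toy_score, space)
print(best_params, best_score)
```

The cost is controlled by `n_iter` alone, whereas a grid over the same space would grow multiplicatively with every added hyperparameter.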
Hyperparameter tuning is particularly important for algorithms like XGBoost.
XGBoost, with its ensemble nature and gradient boosting framework, has a rich set of hyperparameters that affect model complexity, regularization, and learning rate.
Tuning these parameters allows for optimization, leading to improved accuracy.
In this study, Randomized Search CV is employed for XGBoost to efficiently explore the hyperparameter space and improve model performance.
Evaluation
The implemented models are evaluated on a held-out test set created by the 80/20 split.
Established metrics, namely accuracy, precision, recall, F1-score, and AUC-ROC, are used to compare the performance of different configurations of Logistic Regression and XGBoost, such as with and without hyperparameter tuning using Randomized Search CV, and with and without Random Undersampling.
This evaluation identifies the most effective model configuration for credit card fraud detection.
AUC-ROC (Area Under the Receiver Operating Characteristic Curve) is a performance metric that summarizes how well a model distinguishes between positive and negative cases.
The ROC curve plots the True Positive Rate against the False Positive Rate across decision thresholds.
The area under it equals the probability that a randomly chosen positive example is assigned a higher score by the model than a randomly chosen negative example.
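The probabilistic reading of AUC-ROC can be computed directly by comparing every positive example with every negative one. The sketch below does exactly that on a tiny hand-made example; it is meant to illustrate the definition and would be too slow for a dataset of the size used in the study. The labels and scores are invented for this example.

```python
def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 3 fraudulent (1) and 3 legitimate (0) transactions with model scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

auc = auc_roc(labels, scores)  # 6 of the 9 pairs are ranked correctly
print(round(auc, 4))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.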
In binary classification tasks, accurately identifying both positive and negative cases is crucial.
True positives (TP) represent correctly classified positive examples, while true negatives (TN) represent correctly classified negative examples.
On the other hand, false positives (FP) occur when the model mistakenly classifies a negative example as positive, and false negatives (FN) occur when the model misses a positive example, identifying it as negative.
These four outcomes are summarized in the confusion matrix shown in Table 3.
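The four outcome counts described above can be tallied directly from true labels and predictions, and the usual metrics follow from them. The sketch below uses the standard definitions of accuracy, precision, recall and F1-score, expressed as fractions rather than percentages; the toy labels and predictions are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Standard metrics as fractions; undefined ratios are reported as 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: 3 fraudulent (1) and 7 legitimate (0) transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, tn, fp, fn)
print((tp, tn, fp, fn), metrics)
```

In this toy case accuracy is 0.8 while recall on the fraud class is only about 0.67, which shows why accuracy alone can hide missed fraud.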
Table 3. Confusion Matrix
 | Predicted Positive | Predicted Negative
-|--------------------|-------------------
Actual Positive | TP | FN
Actual Negative | FP | TN
Accuracy is the ratio of correctly predicted instances to the total instances and is formulated in Equation 3:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{3}$$
Precision measures the proportion of true positive predictions out of all positive predictions made and is formulated in Equation 4:
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \tag{4}$$
Recall quantifies the proportion of actual positive instances that were correctly identified and is formulated in Equation 5:
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\% \tag{5}$$
F1-score is the harmonic mean of precision and recall, balancing the trade-off between these metrics to provide an overall assessment of a model's performance in identifying positive cases while minimizing false positives.
It is formulated in Equation 6:
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$
Information:
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
RESULT
This section discusses the results of the study, covering preprocessing, implementation, and evaluation.
Preprocessing result
In this stage, several data preprocessing techniques are performed.
To avoid contaminating the testing set, the data is first split into training and testing sets with a ratio of 80:20.
This ensures that the preprocessing steps are fitted only on the training data.
The details of the split are shown in Table 4.
Table 4. Data splitting
Set | Samples
----|--------
Training | 800,000
Testing | 200,000
Total | 1,000,000
Next, feature scaling is applied.
Scaling is performed after the data has been split into training and testing sets. This prevents data leakage, because the statistics used for scaling (i.e., standard deviation and mean) are not influenced by the testing data.
StandardScaler is used in this stage. Standardization transforms the features to have a mean of 0 and a standard deviation of 1.
Sampling result
After scaling, a technique for handling class imbalance, such as a sampling method, is required.
In this stage, the number of rows in the training set is reduced using random undersampling.
Only the majority class is reduced, to 200,000 rows, while the minority class rows remain the same.
This raises the percentage of the minority class from about 9% to about 30%.
The class distribution before and after sampling is shown in Figures 2 and 3.
Figure 2. Class percentage before sampling
Figure 3. Class percentage after sampling
Based on Figures 2 and 3, the ratio between the minority and majority class improves significantly after applying Random Undersampling (RUS).
Figure 2 shows the class distribution before RUS, where the majority class dominates the pie chart.
Figure 3 shows the distribution after RUS, where the minority class occupies a noticeably larger portion relative to the majority class.
This visual representation suggests that RUS effectively reduced the majority class size, leading to a more balanced class distribution.
Performance Comparison
After preprocessing the data, configuring the models is crucial for achieving optimal performance.
This research investigated several configurations: Logistic Regression with and without Random Undersampling (RUS), and XGBoost with and without RUS, additionally exploring XGBoost with hyperparameter tuning.
Due to the large dataset and limited computational resources, Randomized Search CV was employed as the hyperparameter tuning technique.
This section compares the performance of these configurations.
The results for each configuration are analyzed using accuracy, precision, recall, F1-score, and AUC-ROC.
By comparing the performance metrics across configurations, this study aims to identify the model that achieves the most robust and accurate credit card fraud detection.
Logistic Regression
Logistic Regression was evaluated in 2 model configurations, with and without Random Undersampling.
The classification reports are shown in Tables 5 and 6.
Table 5. Logistic Regression Report without RUS (columns: Precision, Recall, F1-Score, Support; summary rows: macro avg, weighted avg, AUC-ROC)
Table 6. Logistic Regression Report with RUS (columns: Precision, Recall, F1-Score, Support; summary rows: macro avg, weighted avg, AUC-ROC)
As shown in Tables 5 and 6, Logistic Regression exhibits a substantial improvement in AUC-ROC, from a baseline of 0.7994, after applying Random Undersampling (RUS), while maintaining a similar overall accuracy.
This suggests that the initial model, despite achieving high accuracy, suffers from a class imbalance issue.
The dominant class likely biases the model's predictions, leading to a lower AUC-ROC, which reflects the model's ability to differentiate between the positive and negative classes.
RUS effectively addresses this imbalance by reducing the majority class size, resulting in a more balanced distribution and a more robust performance measure as evidenced by the significant increase in AUC-ROC.
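A short calculation shows why high accuracy alone says little on data this imbalanced. Consider a trivial classifier that labels every transaction as legitimate. Using the full-dataset class counts as a stand-in for the test split (an assumption made here for illustration), its accuracy and recall would be:

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
= \frac{0 + 912{,}597}{1{,}000{,}000} \approx 91.26\%,
\qquad
\text{Recall} = \frac{TP}{TP + FN} = \frac{0}{0 + 87{,}403} = 0\%
\]
```

Such a classifier detects no fraud at all yet still scores above 91% accuracy, which is why recall and AUC-ROC are the more informative measures here.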
XGBoost
XGBoost was evaluated in 3 model configurations: the baseline model, with hyperparameter tuning using Randomized Search CV, and with Random Undersampling.
The results are shown in Tables 7 to 9.
Table 7. XGBoost Baseline Model Report (columns: Precision, Recall, F1-Score, Support; summary rows: macro avg, weighted avg, AUC-ROC)
Table 8. XGBoost Report with Hyperparameter Tuning (columns: Precision, Recall, F1-Score, Support; summary rows: macro avg, weighted avg, AUC-ROC)
Table 9. XGBoost Report with RUS (columns: Precision, Recall, F1-Score, Support; summary rows: macro avg, weighted avg, AUC-ROC)
Based on Tables 7 to 9, XGBoost performs well on this classification task, achieving high accuracy (around 99.8%) across all configurations.
However, hyperparameter tuning and Random Undersampling (RUS) appear to have minimal impact on overall accuracy.
The key difference lies in the model's ability to distinguish between classes, measured by AUC-ROC.
While the baseline XGBoost achieves a high AUC-ROC, both tuning and RUS lead to slight improvements. Notably, RUS has the most significant impact, suggesting that the original data has a class imbalance issue.
RUS helps address this imbalance, resulting in a model that can better differentiate between the positive and negative classes, as reflected in the higher AUC-ROC score.
DISCUSSIONS
This analysis yields critical insights into model performance, the effects of class imbalance, the effect of tuning, and the strengths of both algorithms.
Both XGBoost and Logistic Regression achieved high overall accuracy, with XGBoost at around 99.8% and Logistic Regression at 95.9% across all configurations.
However, a closer look reveals a significant difference in their ability to differentiate between positive and negative classes, as indicated by the Area Under the ROC Curve (AUC-ROC).
While Logistic Regression achieved high accuracy, its initial AUC-ROC score (0.7994) hinted at challenges with class imbalance.
This is further supported by the analysis of precision and recall.
While Logistic Regression demonstrated high precision for the majority class, suggesting it could predict the dominant class well, it struggled with recall for the minority class, leading to a notable number of false negatives.
This highlights Logistic Regression's limitations in handling imbalanced datasets.
Applying Random Undersampling (RUS) to Logistic Regression significantly improves its AUC-ROC score.
This suggests that RUS effectively addresses class imbalance by reducing the majority class size, leading to a more balanced distribution and, consequently, a more robust performance. The improvement in recall for the minority class further supports this notion, indicating that RUS helps Logistic Regression identify more true positives in the minority class.
In contrast, XGBoost consistently displayed a strong ability to distinguish between classes, evident from its high AUC-ROC scores across all setups.
This indicates XGBoost's inherent capability in handling class imbalance, possibly due to its ensemble learning approach and capacity to capture intricate data relationships.
Even without hyperparameter tuning or class balancing techniques, XGBoost maintained a high AUC-ROC, showcasing its robustness in imbalanced classification tasks.
The application of hyperparameter tuning and Random Undersampling did not significantly impact its performance, as the model was already performing well.
CONCLUSION
This study investigated the effectiveness of Logistic Regression (LogReg) and Extreme Gradient Boosting (XGBoost) for credit card fraud detection.
The findings highlight the importance of addressing class imbalance, which is a major challenge in this field because fraudulent transactions are rare in real-world data.
Random Undersampling, a technique that reduces the majority class size, was employed to address this imbalance and successfully improved model performance.
It was observed that LogReg required modifications, such as the use of Random Undersampling, to achieve optimal performance, while XGBoost demonstrated decent performance even in its baseline configuration. The evaluation, using accuracy, precision, recall, F1-score, and AUC-ROC, revealed that XGBoost consistently outperformed LogReg in identifying fraudulent transactions, especially when hyperparameter tuning with Randomized Search was applied.
This suggests that XGBoost's inherent ability to handle complex relationships within data makes it a more robust choice for credit card fraud detection tasks characterized by class imbalance.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest between the authors or with the research object in this paper.
REFERENCES