JOIV : International Journal on Informatics Visualization, 9, March 2025, 770-778.
INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
journal homepage: www.joiv.org/index.php/joiv

A Better Performance of GAN Fake Face Image Detection Using Error Level Analysis-CNN

Maria Ulfah Siregar a,*, Nurochman Nurochman a, Anif Hanifa Setianingrum b, Dwi Larasati a, William Santoso b, Meisia Dhea Stefany a

a Department of Informatics, UIN Sunan Kalijaga, Jl. Marsda Adisucipto, Sleman, Indonesia
b Department of Informatics, UIN Syarif Hidayatullah, Jl. Ir. Djuanda, Tangerang Selatan, Indonesia

Corresponding author: *maria.siregar@uin-suka.

Abstract: The use of face images has been widely established in various fields, including security, finance, education, social security, and others. Meanwhile, modern scientific and technological advances make it easier for individuals to manipulate images, including those of faces. Among these advancements, the Generative Adversarial Network method creates fake images that closely resemble real ones. An error level analysis algorithm combined with a convolutional neural network is proposed here to detect manipulated images generated by generative adversarial networks. There are two scenarios: a stand-alone convolutional neural network, and a combination of error level analysis and a convolutional neural network. The combined scenario has three sub-scenarios corresponding to the compression levels of the error level analysis algorithm: 10%, 50%, and 90%. After training on data obtained from a public source, it becomes evident that a convolutional neural network combined with error level analysis compression can improve the model's overall performance: accuracy, precision, recall, and other parameters. Based on the evaluation results, the highest-quality convolutional neural network training was obtained with 50% error level analysis compression, achieving 94% accuracy, 93.3% precision, 94.9% recall, 94.1% F1 score, 98.7% ROC-AUC score, and 98.8% AP score. This research is expected to serve as a reference for implementing detection processes that distinguish real images from fake images produced by generative adversarial networks.

Keywords: Compression level; manipulated images; real image.

Manuscript received 30 Apr.; revised 24 Aug.; accepted 1 Oct. Date of publication 31 Mar. International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

I. INTRODUCTION

The use of facial images has been widely established in various fields, including security, finance, education, social security, and others. Meanwhile, modern scientific and technological advances make it easier for individuals to manipulate images, including those of faces. GAN-generated images can look so real that humans struggle to tell them apart. Thus, GAN can produce realistic fake images, or deepfakes, which can be formed even when the original images are not accessed, as said by Hitaj. GAN can also be used to produce synthetic data in the case of rare data, such as synthetic chest X-ray (CXR) images during the recent COVID pandemic. Moreover, GAN is a powerful aid in augmenting data compared to classical data augmentation, and in optimization problems. In previous studies, several methods have been proposed for detecting fake images generated by GANs. In the study proposed here, Error Level Analysis (ELA) and Convolutional Neural Network (CNN) algorithms are proposed for detecting manipulated images generated by GANs. As the availability of applications that can produce deepfake images is growing massively, it is challenging for researchers to contribute to this issue. On the other hand, the use of images in social content is increasing rapidly and facing new challenges. Using the proposed method, it is possible to identify manipulated face images produced by a GAN.
Problems relating to images have been considered for many years, as in prior work and as said by Krawetz. Thus, the capability to recognize phony pictures cannot be avoided and is necessary in this era, whether the phony picture results from a global or a local perturbation. On the other hand, synthetic images are sometimes inevitably needed because the real image is difficult or too costly to obtain. In one of the technological advancements, the Generative Adversarial Network (GAN) method was used to create a fake image that is very similar to the original image. GAN belongs to the family of deep learning techniques, which makes it hard for humans to distinguish fake from real images, especially when techniques are applied to remove GAN fingerprints from the fake images. The results indicate that additional efforts are still required to develop a reliable system for detecting image manipulation. According to other research, face features were encrypted using a GAN; as a one-way process, encryption effectively protects face features. Certain studies have attempted to solve the problem of blurred images: a GAN was used to restore blurred face and body images. GANs have also helped to increase the performance of CNNs in medical image classification, as said by Frid-Adar et al.: a GAN generates medical images, which are then used for synthetic data augmentation. The generated or synthetic data should have high quality. Other research has generated data that is then treated as unlabeled samples, as said by Xin and Huang. Further research proposes that GANs bridge the gap in person re-identification (ReID), as said by Wei; in that research, the proposed method is Person Transfer GAN (PTGAN). Social GAN is challenging research relating to GANs, which predicts human motion, as said by Gupta et al., or predicts a future path for an agent, as said by Sadeghian et al.
The GAN that is built here is based on an encoder-decoder architecture. GANs, a relatively new framework for estimating generative models through an adversarial process, as said by Goodfellow et al., involve simultaneously training two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. The training procedure for G is designed to maximize the probability of D making a mistake. GANs have been integrated with CNNs, resulting in a class of CNNs, namely Deep Convolutional Generative Adversarial Networks (DCGAN). This class has architectural constraints and is claimed to be a strong candidate for unsupervised learning, as said by Radford. Other researchers have also combined ELA and Deep Learning (DL). One such study, as said by Gunawan et al., took 1771 images with tampered labels and 2940 images with real labels from the CASIA dataset. ELA is a forensic method for identifying parts of an image based on differing compression levels. In other words, the error level is computed, as said by Jeronymo et al.: an image is typically divided into small 8x8 blocks and compressed using JPEG at 95% quality. Each block should then exhibit the same compression quality; if there are blocks with different compression qualities, this indicates that manipulation has occurred, because the manipulated parts have higher error potential than the unmanipulated parts of the image, as said by Gunawan et al. The ELA process is a multimodal data analysis technique. It is accomplished by re-saving the image at a specified compression quality and then calculating and observing the difference from the original, which extracts the noise from the image.
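As a concrete sketch, the re-save-and-difference procedure described above can be implemented with Pillow. The function name and the amplification factor `scale` are illustrative assumptions, not the authors' exact implementation; `quality=50` corresponds to the 50% compression sub-scenario.

```python
from io import BytesIO

import numpy as np
from PIL import Image, ImageChops

def error_level_analysis(img, quality=50, scale=15):
    """Re-save `img` as JPEG at the given quality and amplify the
    pixel-wise difference; manipulated regions tend to show a
    different error level than the rest of the image."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf)
    diff = ImageChops.difference(img.convert("RGB"), resaved)
    ela = np.asarray(diff, dtype=np.float32) * scale  # amplify for visibility
    return np.clip(ela, 0, 255).astype(np.uint8)
```

The resulting array can be fed to the CNN in place of the raw image, which is the pipeline used in the ELA-CNN scenarios.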
ELA is an advanced image analysis technique, as said by Krawetz, but it suffers when the image noise is too intense. Other studies use ELA to improve detection accuracy; one example conducts ELA after JPEG compression, extracts features using SIFT for classification, and uses datasets from ImageNet and Caltech-256, with accuracies that outperform the state of the art. The CNN, which belongs to the deep learning area, was initially developed to recognize handwriting and subsequently proved capable of solving the problems of image recognition, detection, and segmentation, as said by Sudiatmika et al. As a deep learning technique, CNN can be used to differentiate between real and fake images. CNN has a remarkable ability to classify large-scale images. This capability stems from the arrangement of a CNN, which consists of three kinds of layers: the convolutional layer, the pooling layer, and the fully connected layer. On the other hand, the use of deep learning has privacy implications as a result of centralized training, as said by Hitaj. This study aims to compare the accuracy and other metrics of face image identification. The identification is done using the CNN method, preceded by ELA compression to enhance accuracy. Thus, the contribution of this study is a better method for detecting fake faces by combining ELA and CNN on a GAN face dataset. This proposed study differs from earlier studies, for example in the dataset used: those studies use datasets from CASIA 2.0 and MICC F200. One of them uses 90% ELA compression; another outperforms state-of-the-art deep learning models in training time and efficiency; likewise, yet another system is better than the existing methods. This paper is organized as follows: Section 1 discusses the background, related works, and objectives. Section 2 explains the method for conducting this research.
Section 3 gives the results and discusses them. Section 4 concludes the paper.

II. MATERIALS AND METHOD

The research begins with collecting image data from a public source, followed by a literature review. The next steps involve image pre-processing, edge detection with the Canny Edge Detector or image compression with ELA, detection by CNN, and accuracy analysis. Fig. 1 shows the method used in this research. To detect real and fake images, two scenarios are used: one in which the image is compressed with ELA (ELA-CNN), and one in which it is not (non-ELA-CNN).

Fig. 1 Method of the research

For the ELA-CNN scenario, there are three sub-scenarios, namely 10%, 50%, and 90% compression levels. In the non-ELA-CNN scenario, the Canny Edge Detector is applied before the CNN. The Canny Edge Detector is an edge detection method that incorporates several stages of edge detection in an image; the operator was developed by John Canny in 1986.

Data Collection

This study used a public dataset from Kaggle, namely 140k Real and Fake Faces. The dataset consists of 70k real faces from the Flickr dataset collected by Nvidia and 70k fake faces sampled from the 1 million fake faces generated by StyleGAN. The training dataset contains 100,000 faces, divided in two: 50,000 real face images and 50,000 fake face images. The validation dataset contains 20,000 faces, with 10,000 real and 10,000 fake face images; the testing dataset has the same composition as the validation dataset. Each folder's index consists of six columns: an unnamed numeric column that numbers the rows automatically, original_path of type String, id, a numeric primary key, label, which has two values, 0 or 1, label_str, a String that is either real or fake, and path of type String. Label 1 is for real images; label 0 is for fake images. Fig. 2 depicts a screenshot of these columns from the train folder; the size of the file is 63 MB.
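The six-column index layout described above can be read with Python's standard csv module; the tiny inline sample below is synthetic and the file paths in it are invented for illustration, not taken from the actual Kaggle files.

```python
import csv
from io import StringIO

# A small synthetic index in the six-column layout described above;
# the real files in the Kaggle dataset are much larger (e.g., ~63 MB).
sample = StringIO(
    ",original_path,id,label,label_str,path\n"
    "0,real/00001.jpg,1,1,real,train/real/00001.jpg\n"
    "1,fake/00002.jpg,2,0,fake,train/fake/00002.jpg\n"
)

def load_index(fp):
    """Return (path, label) pairs; label 1 = real, 0 = fake."""
    rows = list(csv.DictReader(fp))
    return [(r["path"], int(r["label"])) for r in rows]

pairs = load_index(sample)
```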
We chose the dataset because it provides users with GAN-generated fake images; thus, it suits the aim of this research. The images are 256 pixels in size and divided into three folders: train, validation, and test.

Fig. 2 Columns of the dataset

Some of the images are given in Table 1; these images are taken from the train folder.

TABLE I. SOME IMAGES FROM THE TRAIN FOLDER (real and fake examples)

Image Pre-processing

The size of the images obtained from the public dataset makes it difficult for the system to run on Google Colab. Thus, the size should be managed and normalized so that it suits the requirements of Google Colab. The online running option allows the GPU to speed up the running process. Pre-processing therefore reduces the image size to 150 pixels.

TABLE II (continued): global_average_pooling (GlobalAveragePooling2D), flatten (Flatten), dense (Dense), dropout_5 (Dropout), dense_1 (Dense), dropout_6 (Dropout), dense_2 (Dense). Total params: 7,102,533; trainable params: 7,099,459; non-trainable params: 3,074.

Edge Detection with the Canny Edge Detector

For detection scenarios without ELA, image pre-processing is continued by applying the Canny Edge Detector. The steps in this detection are:
- Convert RGB to YCbCr
- Contrast adjustment
- Convert YCbCr to grayscale
- Apply the Canny Edge Detector to the gray image

Image Detection by CNN

As can be seen in Table 2, the parameters for the CNN used in this study are given. Meanwhile, Fig. 3 shows the CNN architecture used in this study. The CNN architecture is designed as follows. The first layer is BatchNormalization. The purpose of this layer is to normalize the inputs to the network, which helps to stabilize and accelerate training by reducing internal covariate shift. It ensures that the input to each layer has a consistent distribution, which can improve convergence rates. Six layers of convolution come next.
The purpose of these layers is to apply convolution operations to extract features from the input images. Each convolutional layer detects distinct features, such as edges, textures, and more complex patterns in the images. Convolutional layers are the core of a CNN, allowing the model to learn spatial hierarchies of features; stacking them helps in capturing more complex and abstract features. The convolutional blocks include MaxPooling, BatchNormalization, and Dropout layers. The number of neurons in each convolutional layer is 64, 64, 128, 256, 512, and 512. MaxPooling layers down-sample the input by reducing its spatial dimensions, lowering the computational complexity and helping achieve spatial invariance. These layers reduce the dimensionality of the feature maps, which helps to prevent overfitting, reduces the computational load, and summarizes the most important features. The intermediate batch normalization layers have the same purpose as the initial one: they normalize the activations of the previous layers. These layers help maintain the benefits of normalization throughout the network, ensuring that each layer receives appropriately scaled inputs. Dropout layers randomly set a fraction of input units to 0 at each update during training to prevent overfitting. Dropout is a regularization technique that makes the model more robust by preventing it from depending on any individual neuron; this improves generalization to unseen data. Then, GlobalAveragePooling and Flatten layers are used. The GlobalAveragePooling layer computes the average output of each feature map, reducing the spatial dimensions of the feature maps to a single value per feature map. It helps prevent overfitting, is more interpretable, and significantly reduces the number of parameters in the model. The Flatten layer converts the 2D feature maps into a 1D vector, a necessary stage before passing the data into the fully connected (dense) layers, which require a 1D input.
These layers standardize the input size for the classification stage. Dense layers are fully connected layers that learn nonlinear combinations of the features extracted by the convolutional layers; the dense layers at the end of the network combine the features into a higher-level representation for classification. The final dense layer with a sigmoid activation outputs a probability for the binary classification task (real or fake). In the classification head, three dense layers (ANN perceptrons) are used: two hidden layers with 2048 and 1024 neurons, and one output layer. The output activation function is Sigmoid, chosen because the task is binary classification. The ReLU function is also used in the hidden layers; it helps prevent the vanishing gradient problem, making training faster and more effective. The activation functions introduce non-linearities into the model, enabling it to learn more complex patterns. In total, there are about seven million parameters.

TABLE II. THE CNN'S PARAMETERS. The layers, in order: batch_normalization_input (InputLayer), batch_normalization (BatchNormalization), conv2d (Conv2D), max_pooling2d (MaxPooling2D), batch_normalization_1 (BatchNormalization), conv2d_1 (Conv2D), max_pooling2d_1 (MaxPooling2D), batch_normalization_2 (BatchNormalization), dropout (Dropout), conv2d_2 (Conv2D), max_pooling2d_2 (MaxPooling2D), batch_normalization_3 (BatchNormalization), dropout_1 (Dropout), conv2d_3 (Conv2D), max_pooling2d_3 (MaxPooling2D), batch_normalization_4 (BatchNormalization), dropout_2 (Dropout), conv2d_4 (Conv2D), max_pooling2d_4 (MaxPooling2D), batch_normalization_5 (BatchNormalization), dropout_3 (Dropout), conv2d_5 (Conv2D), max_pooling2d_5 (MaxPooling2D), batch_normalization_6 (BatchNormalization), dropout_4 (Dropout).

The dropout rate in the convolutional layers is 0.1, and in the dense layers it is 0.5, to avoid overfitting. A rate of 0.1 means that 10% of the neurons are dropped out. This number is relatively small; it helps regularize the network without losing too much information during training.
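The quoted total of about seven million parameters cannot be fully reproduced without the convolutional kernel sizes, which the paper does not list; as a partial sanity check, the sketch below counts only the dense head, under the assumption (taken from the sizes above) that global average pooling over 512 feature maps feeds Dense(2048), Dense(1024), and a single sigmoid output.

```python
def dense_params(n_in, n_out):
    # A dense layer has one weight per input-output pair plus one bias per output.
    return n_in * n_out + n_out

# Assumed classifier head: GAP over 512 feature maps -> 2048 -> 1024 -> 1 (sigmoid)
head = [(512, 2048), (2048, 1024), (1024, 1)]
total = sum(dense_params(i, o) for i, o in head)
```

This head alone accounts for roughly 3.1 million of the reported 7.1 million parameters; the remainder sits in the convolutional and batch normalization layers.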
It can reduce overfitting by randomly setting input units to 0 during training. Dropout is important in the dense layers to ensure the network generalizes well and does not overfit the training data. Another hyperparameter is padding; in this study, it is initialized to "Same". This padding ensures that the output feature map has the same spatial dimensions as the input. It helps to maintain the spatial resolution of the input throughout the convolutional layers, which can be beneficial for capturing features at different scales and positions in the image.

Fig. 3 Architecture of CNN

Epsilon in batch normalization has a value of 0.001. This number is a common choice that ensures numerical stability without significantly affecting the normalized values. Then, binary cross-entropy is used as the loss function. This loss function is suitable for binary classification problems; it measures the difference between the predicted probabilities and the actual class labels, penalizing incorrect predictions more heavily. Next, Adam is used as the optimizer, chosen for its efficiency and adaptive learning rates. It combines the advantages of two other popular optimizers: AdaGrad (which works well with sparse gradients) and RMSProp (which works well in non-stationary settings). Adam adjusts the learning rate for each parameter, which helps in faster convergence. The learning rate was set to 0.001, a common starting point for Adam; it allows the model to converge reasonably quickly without making large updates that could overshoot the optimal solution. The learning rate decay is 1E-6, which reduces the learning rate gradually over time. It is useful for fine-tuning the model towards the end of training, ensuring that it does not oscillate around the minimum and instead settles into the optimum. The batch size was set to 150 because this number balances computational efficiency against the stability of the gradient updates.
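For reference, the Adam update with time-based learning-rate decay described above can be sketched as follows. This is the textbook update rule with the paper's stated learning rate (0.001) and decay (1e-6) as defaults, not the authors' code, and the quadratic example at the end is purely illustrative.

```python
import numpy as np

def adam_update(w, g, m, v, t, lr=1e-3, decay=1e-6,
                beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam step (step count t starts at 1) with time-based lr decay."""
    lr_t = lr / (1.0 + decay * t)          # gradual decay of the learning rate
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr_t * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Illustrative use: minimize f(w) = w^2 (gradient 2w) starting from w = 5.0
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_update(w, 2.0 * w, m, v, t, lr=0.1)
```

Per-parameter step sizes adapt to the gradient history, which is why Adam typically converges faster than plain SGD on this kind of problem.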
A batch size is the number of training examples utilized in one iteration. Larger batch sizes can lead to more stable gradients, whereas smaller ones provide more frequent updates. The number of epochs was set to 10, as this is often a starting point to observe how the model performs and to ensure that it is neither overfitting nor underfitting. An epoch determines how many times the training dataset passes through the network; more epochs can be added later if the model still shows improvement. In this study, validation data is used. A separate validation dataset helps to monitor the model's performance on unseen data during training. It provides an unbiased evaluation of the model's ability to generalize, helping with early stopping or hyperparameter tuning if needed. The training data is also shuffled to ensure that the model does not learn any unintended patterns from the order of the data. Shuffling promotes better generalization and prevents the model from becoming dependent on the order of the training examples. Another hyperparameter is verbose, which is set to 1. This parameter controls the verbosity of the output during training; setting it to 1 means that the progress of training (including loss and accuracy metrics) is displayed for each epoch, helping to monitor the training process. Overall, the effectiveness of the CNN architecture lies in its ability to gradually extract more complex and abstract features from the input images through multiple layers of convolution, pooling, and normalization. The architecture is designed to be deep and complex, making it capable of learning the intricate details necessary to distinguish between real and fake images in the deepfake detection task. The next section presents the training results: Table 3 shows the results of running the system based on accuracy, loss, and time at the training and validation stages. Four scenarios were used in this research.
III. RESULTS AND DISCUSSION

In this section, results and discussion are presented, beginning with the training stage.

Training Result

The results indicate that the accuracy of the learning process in the detection system is very good. Likewise, the validation results of the training process are positive. The highest accuracy and lowest loss are achieved when ELA with a 50% compression level is applied before CNN detection. On the other hand, the scenario without ELA produced the lowest scores among the four scenarios. However, as seen in Table 3, the differences among the four scenarios are not significant; the differences in values are marginal. The same holds for time, measured in seconds: the difference between the longest and the shortest time is less than 60 seconds.

TABLE III. THE ACCURACY AND LOSS OF THE TRAINING AND VALIDATION STAGES (columns: Accuracy, Loss, Validation Accuracy, Validation Loss, Test Accuracy, Time (s); rows: Without ELA, With ELA 10%, With ELA 50%, With ELA 90%)

Fig. 4 depicts the accuracy of CNN combined with ELA at 50% at the training and validation stages, while the model loss for the same percentage is given in Fig. 5. From Fig. 4, overfitting is assumed not to happen because the accuracy in the training stage does not worsen while the accuracy in the validation stage improves. Based on the results seen in Table 4, this percentage gives the best accuracy among the three; thus, using this model, overfitting can be prevented. On the other hand, ELA 10% and 90% tend to overfit (in validation loss), whereas ELA 50% stays stable. This indicates that ELA 50% gives the best result based on training and validation performance.

Fig. 4 Model accuracy on training and validation of CNN ELA 50%
Fig. 5 Model loss on training and validation of CNN ELA 50%

Testing Result

The confusion matrix is given in Fig. 6. The confusion matrix measures the model's performance; from it, true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) can be calculated.

Fig. 6 Confusion matrix with CNN ELA 50%

Table 4 shows the metrics of testing; it is an evaluation of the model's performance against test data. As shown in Table 4, the performance of the real and fake image detection processes improves when the CNN process is preceded by ELA compression. Based on the evaluation of the model, it is evident that using ELA can improve the overall performance of the model. The best performance was achieved by CNN ELA training at 50%, with 93.59% accuracy, 93.54% precision, 90.42% recall, an F1 score of roughly 93%, a ROC-AUC score of 98.59%, and an AP score of about 98%. However, this scenario's recall score is not the best: the best recall score was obtained with ELA at 90% compression.

TABLE IV. THE ACCURACY AND LOSS OF THE TESTING STAGE (columns: Accuracy Score, Precision Score, Recall Score, F1 Score, ROC-AUC Score, AP Score; rows: Without ELA, With ELA 10%, With ELA 50%, With ELA 90%)

One related study reports an accuracy of about 95%; it also combines ELA with CNN, using the CASIA 2.0 dataset, which is likewise public. Another related study has an accuracy of 93%, using datasets from CASIA 2.0 and MICC F200. The precision metric of this study is better than that of the first study; however, for the remaining metrics, that study beats this one. Table 5 summarizes some metric comparisons of these three studies.

TABLE V. COMPARISON OF SOME METRICS OF RELATED RESEARCH (columns: Research, Accuracy, Precision, Recall; rows include this study)

Fig. 7(a) shows one real face image from the testing folder; the image is taken from Kaggle. The face after ELA is shown in Fig. 7(b). The testing indicates that the face is real, which is a correct analysis. The other test is on a fake face image from the testing folder.
Fig. 8(a) depicts the fake face image, and the result is shown in Fig. 8(b); the system successfully identifies the face as fake. However, some faces are identified wrongly. One example is shown in Fig. 9(a), with its ELA result in Fig. 9(b); the model predicts that the image is fake although it is actually real. Fig. 10 shows another wrongly identified face: this face image is taken from the fake faces in the testing folder, and the model predicts that it is real. Next, a comparison of the tests among three related studies is given. For this study, the test was done with ELA 50%, because this compression level of ELA provides the best results. Table 5 shows the results of the testing stage. Although the accuracy of this study is fairly good, it is less than that obtained in one related study and roughly the same as that in another.

Fig. 7 Correct identification of a real face image: (a) testing on a real face image, (b) result of ELA on the real face image
Fig. 8 Correct identification of a fake face image: (a) testing on a fake face image, (b) result of ELA on the fake face image
Fig. 9 Incorrect identification of a real face image: (a) testing on a real face image, (b) result of ELA on the real face image
Fig. 10 Incorrect identification of a fake face image: (a) testing on a fake face image, (b) result of ELA on the fake face image

IV. CONCLUSION

Generative Adversarial Networks are algorithms used to generate deepfake images through AI. The combination of the CNN algorithm with the ELA compression method can be used to determine whether an image obtained from a GAN is real or fake. From training on the data, CNN with ELA compression improves the overall performance, such as accuracy, precision, recall, and other parameters. Based on the evaluation results, the most effective CNN training was obtained when using 50% ELA compression, as it achieves a 98.6% ROC-AUC score. This research is expected to be a reference for performing detection processes that distinguish real images from GAN-generated images. A suggestion for future research is to incorporate genetic algorithms to find a better compression level. Another suggestion is to use a denoising technique alongside ELA. Yet another is to analyze fake facial images other than those produced by GANs. Furthermore, this study is planned to implement facial expression recognition.

REFERENCES