International Journal of Electrical and Computer Engineering (IJECE)
Vol. No. August 2025, pp. 3727-3736
ISSN: 2088-8708, DOI: 10.11591/ijece.
Enhancing anomaly detection performance using ResNet50 and BiLSTM networks on benchmark datasets
Dipak Ramoliya1, Amit Ganatra2
1Department of Computer Science and Engineering, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Faculty of Technology and Engineering, Charotar University of Science and Technology (CHARUSAT), Anand, India
2Department of Computer Science and Engineering, Faculty of Engineering and Technology, Parul University (PU), Vadodara, India
Article Info
Article history:
Received Aug 3, 2024
Revised Mar 24, 2025
Accepted May 23, 2025
Keywords:
Anomaly detection
BiLSTM
Computer vision
Transfer learning
Video surveillance
ABSTRACT
Detection of abnormal activity from large video sequences is one of the biggest challenges because of the ambiguity between different activities.
Over the years, numerous cameras have been installed across the public and private sectors for surveillance and to monitor abnormal human activity.
In recent years, deep learning and computer vision have significantly impacted this kind of surveillance.
Intelligent systems that can automatically identify unusual events in video streams are currently in high demand.
A deep learning-based combinational model is proposed to detect abnormal activity from input video streams.
The proposed study combines convolutional and sequential models.
A ResNet50 network with residual connections is used for initial feature extraction.
A bidirectional long short-term memory (BiLSTM) network then refines the extracted ResNet50 features.
The proposed model was evaluated on two benchmark anomaly detection datasets, UCF Crime and ShanghaiTech.
The proposed architecture achieved a remarkable accuracy of 97.94% on the UCF Crime and ShanghaiTech datasets.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Dipak Ramoliya
Department of Computer Science and Engineering, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Faculty of Technology and Engineering, Charotar University of Science and Technology (CHARUSAT), Changa, Anand - 388421, India
Email: dipakramoliya.ce@charusat.
INTRODUCTION
In surveillance applications, anomalous activity detection plays a crucial role.
Automatic video capture makes it possible to record anomalous human behavior without manual intervention.
Due to its complexity, the countless possible aberrant cases, and the scarcity of available normal ones, it remains one of the most challenging and well-known research areas.
Surveillance cameras are almost everywhere these days, from security-sensitive locations (such as borders and military installations, where they monitor terrorist activity) to private residences, where they deter burglaries.
Traditionally, the monitoring function is handled by human operators, who have to visually evaluate many camera feeds simultaneously.
Because of the large volume of data involved, fatigue and monotony prevent these operators from maintaining diligent monitoring over extended periods.
Technological advancements and falling costs have accelerated the installation of surveillance cameras in both public and private spaces.
The main problem with manual monitoring is that the anomaly detection rate drops rapidly in crowded environments, where occlusions and clutter arise.
Journal homepage: http://ijece.
Anomaly detection, a critical task in various domains such as finance, cybersecurity, industrial monitoring, and healthcare, involves recognizing patterns in data that deviate significantly from the norm.
These anomalies can indicate significant but rare events such as fraud, system failures, or unusual medical conditions, necessitating timely and accurate detection to mitigate potential risks and impacts.
Traditional methods for anomaly detection often struggle with the complexity and high dimensionality of modern data, making advanced techniques essential for effective analysis.
Long short-term memory (LSTM) networks, a form of recurrent neural network (RNN), have demonstrated extraordinary success in sequential data processing due to their capacity to capture long-term dependencies.
However, standard LSTM models process data in a single temporal direction, potentially missing contextual information that could enhance anomaly detection accuracy.
Bidirectional LSTM (BiLSTM) networks overcome this problem by processing data in both forward and backward directions, thus leveraging the complete temporal context.
Anomaly detection using deep learning presents several significant challenges, despite its potential to outperform traditional methods.
One of the primary challenges is the requirement for large amounts of labelled training data, which is often scarce in real-world scenarios.
Anomalies, by nature, are rare and diverse, making it difficult to gather a representative and sufficiently large dataset for training deep learning models.
Additionally, deep learning models, especially those with complex architectures, are computationally intensive and require substantial resources for both training and inference.
This can be a barrier to deployment in environments with limited computational capabilities.
Moreover, deep learning models can be difficult to interpret, making it hard to understand why a particular instance was classified as an anomaly, which matters in the many applications where transparency and explainability are essential.
Furthermore, these models are sensitive to hyperparameter tuning and architecture design, requiring extensive experimentation and expertise to achieve optimal performance.
In this article, the authors propose an advanced architecture for anomaly detection.
The major contributions of this article are: (i) an advanced convolution and BiLSTM-based sequential architecture for anomaly detection, and (ii) a simulation study that explores the effectiveness of the proposed architecture by comparing it with state-of-the-art (SOTA) deep learning models for anomaly detection.
The proposed study contributes to improving video surveillance in sensitive sectors.
The rest of the article is organized as follows.
Section 2 presents a literature survey and a comparative analysis of recent work on anomaly detection using deep learning methods.
Section 3 presents the proposed BiLSTM architecture with residual connections.
Section 4 describes the simulation of the proposed architecture and a statistical comparison with SOTA deep learning models over different benchmark datasets.
LITERATURE SURVEY
There are two types of methods for identifying unusual activity: low-level anomaly detection and high-level anomaly detection.
Low-level anomaly detection techniques locate local spatiotemporal regions that likely have aberrant low-level feature patterns before high-level analysis, including activity classification and object tracking, is carried out.
Zhu et al. proposed an inter-activity context feature-based architecture.
The study uses a greedy search algorithm for point-based contextual anomaly detection, and the VIRAT ground dataset is used for its simulation.
Nguyen and Meunier proposed a U-Net-based architecture for anomaly detection from video.
The study uses a convolutional architecture with four streams of convolutions, with filter sizes of 1×1, 3×3, 5×5, and 7×7, for feature extraction.
The simulation uses benchmark datasets such as Avenue and UCSD Ped2, and the study achieves accuracies of 0.869 and 0.962.
A Faster R-CNN-based DeepAnomaly model was proposed by Christiansen et al. to identify unusual activity in videos.
The architecture can identify people up to 90 meters away.
Pre-trained convolutional models were employed to identify anomalies.
The study uses a region proposal approach that offers between 300 and 2,000 regions per image.
Each region is then passed through a classification network, which assigns it a label.
DeepAnomaly is a helpful region proposal technique when there are few or no regions per image.
Sabokrou et al. proposed an autoencoder (AE) based model for anomaly detection.
The architecture identifies abnormal regions from their reconstruction error in a sparse AE.
The simulation uses the UCSD Ped2 and UMN benchmark datasets with a patch size of 30×30×10, and a patch is marked as abnormal if more than 40% of its pixels are detected as abnormal.
The simulation reports an AUC of 99.… with an 82% detection rate.
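The patch-level decision rule attributed to Sabokrou et al. above can be illustrated with a short sketch. It assumes a per-pixel boolean anomaly mask (for instance, obtained by thresholding reconstruction error) laid out as (frames, height, width), reads 30×30×10 as 30×30 pixels over 10 frames, and tiles the mask into non-overlapping patches; the function name `flag_abnormal_patches` and these layout choices are illustrative assumptions, not details taken from the original work.

```python
import numpy as np


def flag_abnormal_patches(pixel_mask, patch=(10, 30, 30), ratio=0.4):
    """Return a boolean grid marking non-overlapping patches as abnormal.

    pixel_mask: boolean array of shape (frames, height, width) in which True
        marks a pixel detected as abnormal (assumed input format).
    patch: (frames, height, width) size of each patch, i.e. 30x30 pixels x 10 frames.
    ratio: a patch is abnormal when MORE than this fraction of its pixels is flagged.
    Remainders that do not fill a whole patch are ignored in this sketch.
    """
    pt, ph, pw = patch
    t, h, w = pixel_mask.shape
    nt, nh, nw = t // pt, h // ph, w // pw
    # Crop to a whole number of patches, then give every patch its own axes.
    cropped = pixel_mask[: nt * pt, : nh * ph, : nw * pw]
    blocks = cropped.reshape(nt, pt, nh, ph, nw, pw)
    # Fraction of flagged pixels inside each patch, shape (nt, nh, nw).
    flagged_fraction = blocks.mean(axis=(1, 3, 5))
    return flagged_fraction > ratio


# Usage: one fully flagged patch and one patch with only 30% flagged pixels.
mask = np.zeros((10, 60, 60), dtype=bool)
mask[:, :30, :30] = True   # patch (0, 0, 0): every pixel flagged
mask[:3, 30:, 30:] = True  # patch (0, 1, 1): 3 of 10 frames flagged, i.e. 30%
print(flag_abnormal_patches(mask))
```

Only the first patch exceeds the 40% threshold; the 30% patch stays normal, which shows that the threshold acts on the fraction of flagged pixels rather than on their mere presence.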
Khan et al. proposed a convolution-based architecture for monitoring irregular human actions and for traffic surveillance.
The authors also contributed surveillance datasets such as the vehicle accident image dataset (VAID).
The dataset contains 1,360 images of vehicle accidents.
The authors extended the simulation of the proposed architecture to different types of pooling, activation functions, and optimizers.
The study reports 82% accuracy using a modified convolutional neural network (CNN).
Le and Kim proposed a video anomaly detection system employing an attention-based residual autoencoder architecture.
The network uses both spatial and temporal information in a single, integrated network and is based on unsupervised learning.
The simulation uses three benchmark datasets: UCSD Ped2, CUHK Avenue, and ShanghaiTech.
The study reports AUC values of 73.6%, 86.7%, and 97.4% for the ShanghaiTech, CUHK Avenue, and UCSD Ped2 datasets, respectively.
Avola et al. proposed an anomaly detection approach based on a generative adversarial network (GAN).
The study uses a GAN architecture for low-altitude aerial video surveillance.
It focuses on low-altitude sequences of complex scenarios, in which a tiny object or gadget may be a cause for concern.
The simulation uses the unmanned aerial vehicle (UAV) mosaicking and change detection (UMCD) benchmark dataset.
The GAN architecture achieves a structural similarity index (SSIM) of 95.7%.
Table 1 presents a comparative examination of various deep learning methodologies for anomaly identification.
Table 1. Comparative analysis with other deep learning approaches for anomaly detection

| Authors | Year | Model | Dataset | Result |
|---|---|---|---|---|
| Zaheer et al. | | One-Class Classifier | UCSD Ped2 | |
| Tang et al. | | U-Net | Avenue | |
| Zhou et al. | | LSTM | Avenue | |
| Wan et al. | | LSTM | UCF Crime | |
| Doshi and Yilmaz | | YOLO-v4 | Avenue | |
METHODOLOGY
The proposed study uses a combinational approach for anomaly detection.
The proposed study uses the ResNet architecture for feature extraction, followed by a BiLSTM that enhances the features based on feedback.
The architecture is divided into three major stages: feature extraction with ResNet50, feature enhancement with BiLSTM, and classification with a multi-layer perceptron (MLP).
Figure 1 shows the conceptual design of the proposed architecture.
Five convolution layers interconnected with max pooling are used for feature extraction.
A stack of convolution layers is interconnected with residual connections to reduce the vanishing gradient problem in deep neural networks.
The second phase of the architecture consists of the BiLSTM.
The LSTM architecture also helps to prevent vanishing gradients during feature learning, and the proposed study uses four BiLSTM layers.
Each BiLSTM layer consists of two LSTMs connected in opposite directions, generally called the forward and backward passes of the LSTM network.
Figure 1. Residual-based BiLSTM architecture proposed for anomaly detection
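The three-stage pipeline described above (ResNet50 feature extraction, four stacked BiLSTM layers, MLP classification) can be sketched in Keras, the framework the simulation reports using. This is a minimal sketch under stated assumptions, not the authors' implementation: the clip length, frame size, LSTM width, MLP size, dropout rate, and number of classes are hypothetical placeholders, and `build_resnet50_bilstm` is an illustrative name. ResNet50 is applied to every frame through `TimeDistributed`, and each frame is reduced to a 2048-dimensional vector by global average pooling.

```python
from tensorflow import keras


def build_resnet50_bilstm(num_frames=16, frame_size=224, num_classes=2,
                          lstm_units=128, num_bilstm_layers=4):
    """Hypothetical ResNet50 -> stacked BiLSTM -> MLP clip classifier.

    Every default value is an illustrative assumption, not a setting reported
    for the proposed model.
    """
    # Per-frame feature extractor: ResNet50 without its classification head,
    # reduced to one 2048-dimensional vector per frame by global average pooling.
    # weights=None avoids a download; weights="imagenet" enables transfer learning.
    backbone = keras.applications.ResNet50(
        include_top=False,
        weights=None,
        input_shape=(frame_size, frame_size, 3),
        pooling="avg",
    )

    clips = keras.Input(shape=(num_frames, frame_size, frame_size, 3))
    # Apply the same ResNet50 to every frame: (batch, frames, 2048).
    x = keras.layers.TimeDistributed(backbone)(clips)

    # Stacked BiLSTM layers; all but the last return the full sequence so the
    # next layer still sees every time step.
    for i in range(num_bilstm_layers):
        is_last = i == num_bilstm_layers - 1
        x = keras.layers.Bidirectional(
            keras.layers.LSTM(lstm_units, return_sequences=not is_last)
        )(x)

    # MLP classification head.
    x = keras.layers.Dense(256, activation="relu")(x)
    x = keras.layers.Dropout(0.5)(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(clips, outputs, name="resnet50_bilstm_sketch")
```

Calling `build_resnet50_bilstm(num_frames=8, frame_size=112)` returns a model whose output has shape `(None, num_classes)`. It could then be compiled with, for example, an Adam optimizer and sparse categorical cross-entropy; these training settings are likewise assumptions rather than reported values.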
Enhancing anomaly detection performance using ResNet50 and A (Dipak Ramoliy.
The proposed architecture uses five convolution stages with skip connections, with output sizes of 112×112, 56×56, 28×28, and 14×14, followed by a pooling layer.
Intermediate layers of the residual connections are separated by normalization layers with the same kernel size.
The residual module aims to incorporate the features extracted by the network.
Furthermore, the vanishing gradient problem is mitigated by the use of shortcut connections.
ResNet serves as the foundation of the feature extractor.
Figure 2 illustrates the fundamental workings of residual learning.
The following equation gives the ReLU activation applied to the normalized features:
$$f(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}$$
Through the use of skip connections, which combine the outputs of stacked layers with those of earlier layers, significantly deeper networks can now be trained than in the past.
Figure 2. Conceptual design of residual connection used in the proposed study
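A minimal NumPy sketch of one residual module may make the role of the ReLU activation and the identity shortcut concrete. It follows the ResNet module formulation used in this section, $x_{l+1} = \sigma(\mathcal{F}(x_l, \{W_{l,i}\})) + x_l$ with $\mathcal{F}(x_l, \{W_{l,i}\}) = W_{l,2}^{T}\sigma(W_{l,1}^{T} x_l)$, and assumes σ is ReLU. Dense matrix products stand in for the convolutions and normalization layers of a real ResNet50 block, which is a simplifying assumption; the function names are illustrative.

```python
import numpy as np


def relu(x):
    """ReLU: f(x) = x if x > 0, otherwise 0."""
    return np.maximum(x, 0.0)


def residual_module(x_l, w1, w2):
    """One residual module: x_{l+1} = relu(F(x_l)) + x_l.

    x_l: input vector of shape (d,)
    w1: first-layer weights W_{l,1}, shape (d, k)
    w2: second-layer weights W_{l,2}, shape (k, d), so F(x_l) matches x_l in shape
    Dense products stand in for the convolutions of a real ResNet block.
    """
    residual = w2.T @ relu(w1.T @ x_l)  # F(x_l, {W_l,i}) = W_l2^T relu(W_l1^T x_l)
    return relu(residual) + x_l         # identity shortcut adds the input back


x = np.array([1.0, -2.0, 3.0])
# Zero weights: the residual branch vanishes and the module is the identity mapping.
print(residual_module(x, np.zeros((3, 4)), np.zeros((4, 3))))
```

With all residual weights set to zero, the module returns its input unchanged, so a deep stack of such modules can at worst pass the signal through; this is the usual intuition for why shortcut connections make deep networks easier to train.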
The ResNet module is composed of the basic block and the identity mapping.
Assume that the input vector of the l-th ResNet module is $x_l$ and the output vector of the l-th module is $x_{l+1}$.
The output vector of the ResNet module is then formulated as
$$\mathcal{F}(x_l, \{W_{l,i}\}) = W_{l,2}^{T}\,\sigma\!\left(W_{l,1}^{T} x_l\right)$$
$$x_{l+1} = \sigma\!\left(\mathcal{F}(x_l, \{W_{l,i}\})\right) + x_l$$
where $\sigma(\cdot)$ is the activation function and $W_{l,1}$, $W_{l,2}$ are the weights of the two stacked layers of the module.
BiLSTM
Long short-term memory (LSTM) is often regarded as a development of recurrent neural networks (RNN).
The ability of an RNN to function as "short-term memory" made it possible to apply prior knowledge, but only up to a certain point, to the current task.
The LSTM architecture, an extension of the RNN, offers "long-term memory", that is, the ability to retain information from prior inputs for a given neural node rather than only from a specific point in time.
The LSTM architecture consists of three gates: input, output, and forget.
The network's current long-term memory, which holds the history of the data, is referred to as the "cell state".
Short-term memory can be thought of as the output at the previous time step, which is referred to as the previous hidden state.
The input data contain the input value at the current time step.
A BiLSTM takes into account both the past and the future context of the input sequence.
The BiLSTM hidden layer consists of two components: the forward and backward LSTM cell states.
The input information first enters the hidden layer through the input layer to participate in the forward and reverse calculations, and the data computed by the hidden layer is then passed to the output layer.
Finally, the output layer combines the forward and reverse LSTM outputs based on predetermined weights to produce the desired output.
Figure 3 shows the structure of the BiLSTM network.
Figure 3. Functional architecture of BiLSTM used for anomaly detection
The bidirectional LSTM network allows the model to learn interdependent features from the input video.
The feature matrices are first processed by the forward LSTM pass and then by the backward pass.
The proposed study experimented with different numbers of interconnected BiLSTM layers, and good accuracy was found with a four-layer BiLSTM architecture.
The hidden state $h_t$ of the proposed architecture is calculated as
$$h_t = o_t \odot \tanh(C_t)$$
where the cell state $C_t$ is learned as
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
and $\tilde{C}_t$ denotes the candidate cell state.
Both the forward and backward passes contain the same three gates as the vanilla LSTM architecture.
The input gate of the proposed architecture helps to identify relevant information in anomalous frames.
The input gate is set to enable abnormal features, while for the rest of the video data it is set to 0 (disabled).
Similarly, the output gate is enabled or disabled based on the feature information, which determines whether that information should be forwarded to the next layer.
The input and output gates are formulated as
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
where $\sigma$ is the sigmoid function, $W_i$ and $W_o$ are weight matrices, and $b_i$ and $b_o$ are bias terms.
The output and input gates are interconnected in the forward and backward propagation of anomaly features.
However, the behavior of the input and output gates is controlled by the memory cell, which is updated by the forget gate, formulated for the proposed architecture as
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
The output of the BiLSTM layer is the concatenation of the forward and backward hidden states at each time step, as in equation (6):
$$h_t = \left[\overrightarrow{h}_t ; \overleftarrow{h}_t\right] \tag{6}$$
where $\overrightarrow{h}_t$ is the forward hidden state and $\overleftarrow{h}_t$ is the backward hidden state at time step $t$.
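The gate equations and the forward/backward concatenation above can be traced in a small NumPy sketch of a single BiLSTM layer. It assumes the standard LSTM cell (including bias terms and a tanh candidate cell state), stacks the input, forget, and output gate weights and the candidate weights into one matrix `W` for brevity, and starts both directions from zero states; these choices, the random weights, and all function names are illustrative rather than taken from the proposed model.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4H, H + D) and maps [h_{t-1}, x_t] to the stacked
    pre-activations of the input gate, forget gate, output gate and
    candidate cell state; b has shape (4H,).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i_t = sigmoid(z[:H])                # input gate
    f_t = sigmoid(z[H:2 * H])           # forget gate
    o_t = sigmoid(z[2 * H:3 * H])       # output gate
    c_tilde = np.tanh(z[3 * H:])        # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # C_t = f_t * C_{t-1} + i_t * C~_t
    h_t = o_t * np.tanh(c_t)            # h_t = o_t * tanh(C_t)
    return h_t, c_t


def run_lstm(xs, W, b):
    """Run an LSTM over xs of shape (T, D) from zero states; returns (T, H)."""
    H = b.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        outputs.append(h)
    return np.stack(outputs)


def bilstm_layer(xs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states at every time step."""
    h_fwd = run_lstm(xs, *fwd_params)
    # Process the reversed sequence, then flip back so row t aligns with input t.
    h_bwd = run_lstm(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)  # shape (T, 2H)


rng = np.random.default_rng(0)
T, D, H = 5, 3, 4
xs = rng.normal(size=(T, D))
fwd = (0.1 * rng.normal(size=(4 * H, H + D)), np.zeros(4 * H))
bwd = (0.1 * rng.normal(size=(4 * H, H + D)), np.zeros(4 * H))
out = bilstm_layer(xs, fwd, bwd)
print(out.shape)  # (5, 8)
```

Because the backward pass is flipped back, the last row's backward half summarizes only the final frame, while its forward half summarizes the whole clip; this is how each time step gets context from both directions. A stacked architecture such as the four-layer BiLSTM described in this section would feed the `(T, 2H)` output of one layer into the next.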
The proposed architecture improves the recognition of abnormal activity from video sequences.
The combination of convolutional and recurrent networks helps to extract advanced features from complex activity videos.
The deeper ResNet50 architecture helps to mitigate vanishing gradients and generates a feature matrix, which is further analyzed by the bidirectional LSTM to improve the detection of abnormal activity in common areas.
RESULTS AND DISCUSSION
This section presents the results of the research, in figures and tables, together with a comprehensive discussion.
The proposed combinational architecture was simulated on two diverse anomaly detection datasets: the University of Central Florida (UCF) Crime dataset and the ShanghaiTech Campus dataset.
The properties of the benchmark anomaly detection datasets are listed in Table 2.
Table 2. Dataset information

| Dataset | Classes | Avg. frames | Length |
|---|---|---|---|
| UCF Crime | | | 128 hr |
| ShanghaiTech | | | 130 hr |

The proposed approach was simulated on a PC with 32 GB RAM, a GeForce RTX 3080 GPU, and a Core i7 processor.
The model was implemented in Python using the Keras package and TensorFlow.
Precision, recall, F1-score, and accuracy are used as evaluation metrics, with precision, recall, and accuracy computed as
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
The proposed model achieved remarkable performance, with 97.55% accuracy on the UCF Crime dataset and 91.94% accuracy on the ShanghaiTech dataset.
The accuracy and loss graphs of the proposed ResNet50 with BiLSTM architecture are illustrated in Figure 4.
Figure 4. Accuracy and loss graphs of the proposed model with both benchmark datasets
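The metrics listed above can be computed from binary predictions with a few lines of Python. This sketch treats the abnormal class as positive (label `1` by default, an assumption) and also includes the F1-score, for which the text gives no equation; it uses the standard harmonic-mean definition $F1 = 2PR/(P+R)$. The function names are illustrative, and for multi-class labels per-class scores would be computed one-vs-rest and averaged.

```python
import numpy as np


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` (here: abnormal) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score and accuracy from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = abnormal, 0 = normal
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(classification_metrics(*confusion_counts(y_true, y_pred)))  # all four metrics are 0.75 here
```

Accuracy alone can be misleading when normal clips vastly outnumber abnormal ones, which is why precision and recall on the abnormal class are reported alongside it.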
Initially, experiments were conducted with the core convolutional and recurrent architectures to assess their potential for anomaly detection on both datasets, as shown in Table 3.
The core and state-of-the-art models achieve good accuracy, but not enough to make the model generalized and robust.
The simulation uses core recurrent network models such as the recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU).
Among these recurrent models, the addition of the BiLSTM network improves the recognition rate of abnormal activities on both datasets.
Table 3. Relative analysis of individual deep learning methods for anomaly detection

| Model | Dataset | Accuracy |
|---|---|---|
| CNN | UCF | |
| CNN | ShanghaiTech | |
| RNN | UCF | |
| RNN | ShanghaiTech | |
| LSTM | UCF | |
| LSTM | ShanghaiTech | |
| GRU | UCF | |
| GRU | ShanghaiTech | |
| BiLSTM | UCF | |
| BiLSTM | ShanghaiTech | |

The proposed study was also compared with combinations of BiLSTM and different convolutional models, such as vanilla CNN, DenseNet, ResNet50/34, and VGG16.
Table 4 presents a comparative analysis of the SOTA convolutional models combined with the proposed deep BiLSTM model.
The simulation was also extended to analyze the impact of different hyperparameters such as batch size, split ratio, and learning rate.
Figure 5 illustrates the impact of the learning rate and the train-test split ratio for both datasets.
The simulation was also analyzed with different numbers of epochs and learning layers in the proposed BiLSTM architecture.
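The learning-rate and split-ratio study can be organized as a simple grid of runs. The grid values below are hypothetical, since the exact settings behind Figure 5 are not given in the text, and `train_test_indices` and `experiment_grid` are illustrative helpers; model training for each configuration is left out.

```python
import itertools

import numpy as np

# Hypothetical values: the settings behind Figure 5 are not listed in the text.
LEARNING_RATES = [1e-2, 1e-3, 1e-4]
TRAIN_RATIOS = [0.6, 0.7, 0.8]


def train_test_indices(n_samples, train_ratio, seed=0):
    """Shuffle indices reproducibly and split them at `train_ratio`."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    return order[:n_train], order[n_train:]


def experiment_grid(n_samples, seed=0):
    """Yield (learning_rate, train_ratio, train_idx, test_idx) for every combination."""
    for lr, ratio in itertools.product(LEARNING_RATES, TRAIN_RATIOS):
        train_idx, test_idx = train_test_indices(n_samples, ratio, seed)
        yield lr, ratio, train_idx, test_idx


for lr, ratio, train_idx, test_idx in experiment_grid(n_samples=100):
    # A real study would train and evaluate the model here for each configuration.
    print(f"lr={lr:g} split={ratio:.0%}: {len(train_idx)} train / {len(test_idx)} test")
```

For video datasets, splitting by whole video rather than by individual frames would keep near-duplicate frames from the same clip out of both the training and test sets at once; applying these indices to a list of video IDs achieves that.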
Table 4. Comparative analysis of different combinational architectures for anomaly detection

| Model | Params | Accuracy | Precision | Recall |
|---|---|---|---|---|
| CNN + BiLSTM | | | | |
| VGG16 + BiLSTM | | | | |
| DenseNet + BiLSTM | | | | |
| ResNet34 + BiLSTM | | | | |
| ResNet50 + BiLSTM | | | | |

[Figure 5 plot: "Comparative analysis with different learning rate and split ratio"; y-axis: accuracy (75% to 100%); x-axis: learning rate with different split ratios; series: UCF, ShanghaiTech]
Figure 5. Comparative analysis of proposed architecture with different learning parameters
DISCUSSION
The proposed study analyzes the detection of abnormal activity on benchmark datasets such as UCF Crime and ShanghaiTech using enhanced deep learning models such as ResNet50 with BiLSTM.
The proposed architecture achieves remarkable performance, with 93.81% and 87.91% accuracy on the benchmark datasets.
The simulation section also compares the proposed model with different SOTA models for anomaly detection and identifies the impact of standalone convolutional and recurrent networks.
Combining convolutional and recurrent networks improves the recognition accuracy for abnormal activity.
The performance of the proposed architecture on unseen prediction data is illustrated in Figure 6 as a confusion matrix for the UCF Crime dataset.
Figure 6. Confusion matrix of proposed architecture on unseen UCF dataset
CONCLUSION
Video surveillance has become essential to prevent unexpected accidents in public places.
Automatic detection of abnormal activity helps to reduce crime and improve critical care in the medical sector.
The proposed study helps to improve the detection of abnormal activity from video streams.
The proposed study uses convolutional and sequential architectures to enhance the accuracy of anomaly detection.
Simulations on two benchmark datasets demonstrate the impact of the proposed architecture.
The simulation also compares the proposed architecture with state-of-the-art deep learning models for anomaly detection.
The proposed combinational architecture achieves remarkable performance, with 97.55% and 94% accuracy on the UCF Crime and ShanghaiTech datasets, respectively.
The simulation also explores different combinations of sequential models with convolutional models, and BiLSTM yields more promising results than other sequential models such as GRU and LSTM.
The proposed study can be extended to detect abnormal activity from live video feeds.
Future work can design more robust and advanced architectures for anomaly detection.
ACKNOWLEDGEMENTS
We would like to thank Charotar University of Science and Technology (CHARUSAT) for supporting this research and providing the resources used.
FUNDING INFORMATION
Authors state no funding involved.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author: Dipak Ramoliya, Amit Ganatra
C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review & Editing; Vi: Visualization; Su: Supervision; P: Project administration; Fu: Funding acquisition
CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
ETHICAL APPROVAL
The conducted research is not related to either human or animal use.
DATA AVAILABILITY
Derived data supporting the findings of this study are available from the corresponding author, DR, on request.
REFERENCES