International Journal of Electrical and Computer Engineering (IJECE)
Vol. , No. , October 2025, pp. 5031-5044
ISSN: 2088-8708, DOI: 10.11591/ijece.
Efficient fall detection using lightweight network to enhance smart internet of things
Pinrolinvic D. Manembu, Jane Ivonne Litouw, Feisy Diane Kambey, Abdul Haris Junus Ontowirjo, Vecky C. Poekoel, Muhamad Dwisnanto Putro
Department of Electrical Engineering, Faculty of Engineering, Sam Ratulangi University, Manado, Indonesia
ABSTRACT
Fall detection automatically recognizes human falls, mainly to monitor and prevent severe injury and potential fatalities.
It can be developed by applying deep learning methods to recognize human subjects during fall incidents and implemented in the internet of things (IoT) to monitor the activity of patients and elderly individuals.
The development of object detection presents you only look once v8 (YOLOv8) as an influential network, but its efficiency needs to be improved.
A modified YOLOv8 architecture is proposed to introduce a novel lightweight network version called YOLOv8-Hypernano (YOLOv8h) that recognizes fall events.
The backbone incorporates a combined spatial and channel attention module, which enhances focus on human subjects by concentrating on movement patterns to detect falls more accurately.
This work also offers a consecutive selective enhancement (CSE) module to improve efficiency and effectiveness in feature extraction while reducing computational costs.
The neck structure is modified by adding a lightweight bottleneck network.
The proposed network reconstructs feature maps in depth, paying more attention to accurate human movement patterns and enhancing efficiency and effectiveness in feature extraction.
Experimental results show that YOLOv8h with the light bottleneck and consecutive selective enhancement modules requires 5.6 giga floating-point operations (GFLOPS) and 1,194,440 parameters.
The model performance is measured in mean average precision, achieving 0.603 and 0.732 on the Le2i and Fallen datasets, respectively.
These results demonstrate that the optimized network improves accuracy while maintaining lightweight computing requirements, achieving speed and efficiency suitable for operation on low-cost IoT computing devices.
Received Aug 15, 2024; revised May 26, 2025; accepted Jun 30, 2025.
Keywords: Edge device; Efficient computation; Fall detection; Lightweight module; Smart internet of things
This is an open access article under the CC BY-SA license.
Corresponding Author:
Pinrolinvic D. Manembu
Department of Electrical Engineering, Faculty of Engineering, Sam Ratulangi University
Bahu Campus, Manado 95115, Indonesia
Email: pmanembu@unsrat.
INTRODUCTION
Vision-based fall detection is a rewarding task that analyzes and predicts fall events, which can cause serious injuries such as disability, paralysis, or death.
It is especially crucial for elderly patients, as falls can be fatal.
The risk of falls increases with the physical and mental decline that accompanies aging.
This makes fall detection systems essential for improving the quality of healthcare.
Therefore, fall detection systems are a promising solution for reducing the risk of falls and their health consequences.
Although falls cannot be prevented entirely, physical exercise and technological solutions can help reduce their frequency.
The internet of things (IoT) utilizes this detection system to reduce risk and improve the monitoring of the activities of patients and the elderly.
Besides, fall detection can enhance IoT capability by enabling continuous remote monitoring of the environment.
This system commonly requires edge devices with low computation cost, thus demanding a lightweight and effective detection model.
The smart IoT can implement more useful preventive measures by utilizing these detection systems.
It can decrease the impact of fall injuries and promote healthy and active lifestyles.
The deep learning approach has played a crucial role in extracting complex information and accurately predicting objects and behaviors.
These networks are effective due to their non-linear operations and deep layers, which simultaneously process feature maps.
The needs and challenges in computer vision emphasize convolutional neural networks (CNN) as modern methods that optimally extract important features.
CNNs have proven very effective in computer vision tasks such as object recognition and image classification.
The general convolution layer is followed by an activation layer that introduces non-linearity.
CNNs for fall detection have paved the way for powerful algorithms such as you only look once (YOLO).
This network offers an efficient solution by detecting objects in a single process, enabling real-time object detection without sacrificing accuracy.
YOLOv8, the latest version of the YOLO family, shows significant advancements over its predecessors.
It achieves higher accuracy in object detection, making it highly reliable for applications requiring robust real-time detection.
However, despite its accuracy, it requires extensive energy resources.
YOLOv8-nano has been presented as a more efficient algorithm but still runs slowly on devices with limited computing capacity.
Therefore, improving efficiency in its development is imperative to create a lighter architecture without compromising detection performance in real-world applications.
Human fall detection has been extensively studied, with models designed to shorten rescue times and reliably identify human body movements.
However, developing a suitable architecture presents several challenges, particularly in capturing global and local information while maintaining detection accuracy.
The GL-YOLO-Lite model integrates a transformer block and an attention module into the YOLOv5 architecture to address these issues.
The overlapping challenge in complex environments is addressed by the efficient diverse branch block-YOLO (ED-YOLO) model, which uses YOLOv5s as its backbone.
This research produces real-time feature extraction that encourages the network to work optimally.
Another study proposed a vision-based fall detection system that employs object tracking and image enhancement techniques.
Practical applications drive new research focused on presenting lightweight algorithms.
One study introduced a lightweight CNN architecture using YOLOv5, which replaces the entire backbone with ShuffleNetV2.
Another presented a method that integrates convolution and information suppression layers to reduce computational overhead while maintaining optimal detection performance.
The proposed study presents an efficient solution for fall detection by introducing a resource-efficient approach, enabling implementation across multiple platforms, including IoT-based hardware, particularly edge devices.
Enhancement modules have been widely used to improve the precision of object localization.
The module adopted here uses an attention mechanism designed to optimize accuracy and efficiency, enhancing the model's ability to detect falls.
This module helps feature extraction focus on the person's body, highlighting specific attributes of movement patterns and body positions that indicate falling activity.
This work focuses on integrating the enhancement module into the network's backbone to improve the precision of fall detection.
The network gains additional capability in extracting essential information by effectively separating elements from the background.
This advantage is achieved without adding significant computation or increasing the number of parameters.
The potential impact and contributions of this study are summarized as follows:
- An efficient fall detection system is developed as an IoT-based monitoring system that operates on low-cost computing devices.
- A consecutive selective enhancement (CSE) module is proposed that modifies the structural enhancement of YOLOv8-nano to improve fall detection performance. This modification refines the target features of the human body specific to fall events.
- Extensive evaluation is conducted to measure the performance of the proposed detector compared to other lightweight network detectors. Additionally, the study analyzes the model's efficiency by examining its number of parameters, computational complexity, and inference time.
METHOD
Backbone
In computer vision, the term "backbone" is analogous to the human backbone that supports the body.
Similarly, in YOLO, the backbone is the primary foundation of the CNN architecture for extracting information from input images.
In YOLOv8, the C3 module from YOLOv5 has been updated to convolutional two faster (C2F).
This update improves feature extraction by retaining information more quickly and efficiently.
After the C2F stage, the processed output goes to the SPPF stage.
This stage adds variations to the information before it moves to the neck of the YOLO architecture.
Before entering the neck, a new layer called C2F-CSE is added after SPPF.
It ensures that the varied information enhances the model's ability to highlight important vertical and horizontal information separately from the two spatial dimensions.
This approach makes the model more focused on capturing essential features in the image.
The number of channel layers is modified to reduce computation in the proposed network backbone.
Limiting channel assignment encourages the network to extract features faster during the training and inference processes.
A new YOLOv8 size variant called YOLOv8h (YOLOv8-Hypernano) limits the maximum number of channels in the network layers to 128, as presented in Figure 1.
It significantly reduces the number of parameters and computational complexity compared to the nano version.
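To make the effect of the 128-channel cap concrete, the sketch below counts the weights of a 3×3 convolution under two width settings. The stage widths (16, 32, 64, 128, 256) are an assumption based on the public YOLOv8n configuration and are not stated in this paper; `conv_weights` and `cap_channels` are illustrative helper names, and biases and normalization parameters are ignored.

```python
def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Weights of a dense k x k convolution (bias and normalization ignored)."""
    return c_in * c_out * k * k


def cap_channels(widths, max_channels=128):
    """Clamp every stage width to the channel cap."""
    return [min(w, max_channels) for w in widths]


# Assumed YOLOv8n backbone stage widths (not stated in the paper).
nano_widths = [16, 32, 64, 128, 256]
hyper_widths = cap_channels(nano_widths)

# Last downsampling convolution: 128 -> 256 channels versus 128 -> 128.
nano_last = conv_weights(nano_widths[-2], nano_widths[-1], 3)
hyper_last = conv_weights(hyper_widths[-2], hyper_widths[-1], 3)
print(hyper_widths, nano_last, hyper_last)
```

Under these assumptions only the deepest stage is affected by the cap, yet every layer that reads or writes that stage becomes smaller, which is where most of the weights of a small detector usually sit.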
Figure 1. The proposed architecture is improved from the YOLOv8 nano version. It consists of a backbone as the main feature extractor, a neck to relate information in different frequencies, and a head responsible for predicting the location and dimension of an object
C2F
The convolutional two faster (C2F) module in YOLOv8 is inspired by the convolutional three (C3) module of the previous version.
This module is designed to improve model performance and efficiency in YOLOv8.
The C2F module comprises two convolution operations at the network's beginning and end.
The input information is split into two parts after the first convolution.
The first part carries the input information forward through a residual connection.
In contrast, the second part applies a bottleneck that utilizes convolutions with different kernel sizes to balance efficiency and effectiveness.
Furthermore, the model combines both features to enrich the information.
At the end of the C2F module, a 1×1 convolution operation consolidates the information efficiently.
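The data flow described above can be traced by following channel counts alone. The sketch below assumes the layout of the public Ultralytics C2f block (a 1×1 convolution producing two halves, `n` bottlenecks chained on the second half, and a closing 1×1 convolution over the concatenation); `c2f_channel_trace` is a hypothetical helper written for illustration, not part of any library.

```python
def c2f_channel_trace(c_in: int, c_out: int, n: int = 1) -> dict:
    """Trace channel counts through a C2F-style block (assumed layout)."""
    hidden = c_out // 2                  # width of each split branch
    after_first_conv = 2 * hidden        # output of the first 1x1 convolution
    branches = [hidden, hidden]          # split into two equal parts
    for _ in range(n):
        branches.append(hidden)          # each bottleneck keeps `hidden` channels
    concat = sum(branches)               # shortcut part plus all bottleneck outputs
    return {
        "first_conv_weights": c_in * after_first_conv,  # 1x1 kernel, bias ignored
        "after_first_conv": after_first_conv,
        "concat": concat,
        "output": c_out,                 # the final 1x1 convolution maps to c_out
        "final_conv_weights": concat * c_out,
    }


trace = c2f_channel_trace(c_in=64, c_out=64, n=2)
print(trace)
```

The trace shows why the closing 1×1 convolution matters: the concatenation grows with the number of bottlenecks, and the final layer brings it back to the requested output width.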
C2F-CSE
The C2F-CSE module modifies a basic module of C2F by adding a consecutive selective enhancement (CSE) module.
As illustrated in Figure 2, two attention modules are developed to improve the model's capability in capturing and utilizing spatial and channel information, respectively.
The proposed module can increase the ability of the feature extractor to discriminate between vital and trivial information.
Its objective is to focus more on valuable features in the input feature map.
Besides, it pays attention to the critical context of the image object.
The improvement also aims to enhance the model's ability to capture features of interest while optimizing the efficiency and performance of the model.
Figure 2. The proposed CSE module combines channel and spatial representation. It implements a squeeze-and-excitation module at the beginning of the module to highlight vital information along the channel map, followed by spatial enhancement to capture the valuable features in a larger spatial area
CSE module
The proposed network utilizes the feature-selective ability of the squeeze-and-excitation (SE) and spatial attention blocks.
This module is believed to improve the precision of the detection network by summarizing the representation feature and generating weighted scaling.
An SE block is the first attention module, designed to improve network performance by explicitly modeling the relationship between feature channels through a feature recalibration process.
This process assigns weights to each feature channel.
Feature recalibration applies a squeeze technique incorporating spatial information into the channel descriptors, and the excitation process learns the channel's activation corresponding to the input.
This module can be formulated as:
\[
\mathrm{SE}(X) = \sigma\left(W_2\,\mathrm{ReLU}\left(W_1 z\right)\right) \otimes X,
\]
\[
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{i,j,c}.
\]
In the initial process, the input information ($X$) is modeled by capturing the global average of the feature map across each channel through the operation $z_c$, where $H$ and $W$ are the height and width of the feature map.
The represented feature is then processed through two fully connected layers to model channel dependencies.
This process is followed by applying the rectified linear unit (ReLU) function.
This activation eliminates negative input values, thereby preventing irrelevant or detrimental information propagation in subsequent computations.
It ensures that critical neurons are not hindered by low or negative scores, enabling the model to focus on valuable features effectively.
The weights of the two fully connected layers are denoted as $W_1$ and $W_2$, and a sigmoid function ($\sigma$) is employed to generate weighted probability scores.
Subsequently, the output vector from the sigmoid activation is multiplied with the original input feature map to refine the initial information based on channel-wise weights.
The SE network provides only channel-specific attention in feature extraction and lacks enhanced spatial representation.
Therefore, this study incorporates a spatial attention module to improve the enhancement of features.
This addition allows the network to find interesting information in spatial coverage, enabling it to recognize specific dimension patterns to indicate fall features.
The channel and spatial information combination can accurately understand unbalanced body positions as indications of a fall while ignoring irrelevant areas.
In detail, the fusion module can be illustrated as:
\[
\mathrm{CSE}(X) = \sigma\left(\mathrm{Conv}_{7\times7}\left(\left[\mathrm{AvgPool}(\mathrm{SE}(X)),\ \mathrm{MaxPool}(\mathrm{SE}(X))\right]\right)\right) \otimes \mathrm{SE}(X).
\]
The SE network boosts the quality of input features, which can improve performance by assigning more relevant weight to important feature channels.
The output then adaptively reweights essential features in the spatial dimension, enhancing the model's performance on spatially mapped features.
Average pooling (AvgPool) and max pooling (MaxPool) are used in parallel blocks to obtain feature summaries.
The two spatial features are fused using the concatenate operation ([]), and then a 7×7 convolution filter ($\mathrm{Conv}_{7\times7}$) extracts feature map information to cover a wider receptive field and detect more complex patterns.
Finally, sigmoid activation ($\sigma$) emphasizes and highlights the spatial attention map.
The proposed research integrates the modified SE network into the backbone structure of the YOLOv8 architecture to extract deeper features by emphasizing the learned weight represented in spatial and channel maps.
The combined enhancement module improves the network's accuracy in recognizing and emphasizing essential features.
The model also focuses on efficiently operating in a realistic application system.
Moreover, the attention module can enhance the optimization of the feature learning process.
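The two-step recalibration described in this subsection can be sketched in plain Python on nested lists shaped [C][H][W]. This is a minimal illustration under stated assumptions, not the authors' implementation: the average and max pooling of the spatial branch are assumed to run along the channel axis (as in CBAM-style spatial attention), zero padding is assumed for the 7×7 convolution, and every weight value below is arbitrary.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def se_block(x, w1, w2):
    """Channel attention: squeeze (global average), excite (FC-ReLU-FC-sigmoid), scale."""
    h, w = len(x[0]), len(x[0][0])
    z = [sum(sum(row) for row in ch) / (h * w) for ch in x]            # squeeze
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    gate = [sigmoid(sum(a * b for a, b in zip(row, hidden))) for row in w2]
    return [[[gate[c] * v for v in row] for row in x[c]] for c in range(len(x))]


def spatial_attention(x, kernel):
    """Spatial attention: channel-wise avg/max maps, k x k conv, sigmoid, scale."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(x[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    maps = [avg, mx]                       # concatenation along the channel axis
    k = len(kernel[0])                     # kernel shape: [2][k][k]
    pad = k // 2
    att = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:   # zero padding
                            acc += kernel[m][di][dj] * maps[m][ii][jj]
            att[i][j] = sigmoid(acc)
    return [[[att[i][j] * x[ch][i][j] for j in range(w)] for i in range(h)]
            for ch in range(c)]


def cse(x, w1, w2, kernel):
    """Consecutive selective enhancement: SE first, then spatial attention on its output."""
    return spatial_attention(se_block(x, w1, w2), kernel)


# Toy input: 2 channels on a 3 x 3 map; all weights are arbitrary illustrative values.
x = [[[1.0] * 3 for _ in range(3)], [[2.0] * 3 for _ in range(3)]]
w1 = [[0.5, 0.5]]                  # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]               # expands 1 hidden unit to 2 channel gates
kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]   # 7 x 7, all zeros
y = cse(x, w1, w2, kernel)
print(y[0][0][0], y[1][0][0])
```

With an all-zero spatial kernel the spatial gate is 0.5 everywhere, so the toy output isolates the effect of the channel gates; a trained kernel would instead produce a position-dependent map.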
SPPF
Spatial pyramid pooling faster (SPPF) is an important backbone component that aims to increase object detection capability with high efficiency on diverse input features.
This module optimizes the original spatial pyramid pooling (SPP), which pools features using varying kernel sizes (e.g., 5×5, 9×9, 13×13).
Then, the results are combined to create a richer and more diverse feature representation.
This process helps the model capture information at different scales, making it easier to detect objects of various sizes in images.
Additionally, this module improves computational efficiency.
It can speed up the inference process and strengthen the detection precision, making it an important component of the YOLOv8 backbone.
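One way to see where the efficiency gain can come from is that chained small max-pooling windows reproduce the larger ones. The sketch below assumes, following the public YOLOv5/YOLOv8 implementations rather than anything stated in this paper, that SPPF chains three stride-1 5×5 pools; it uses a 1D signal for brevity, and the same argument applies per axis in 2D because max pooling is separable.

```python
import random


def max_pool_1d(seq, k):
    """Stride-1 max pooling with window k, padded so the length is preserved."""
    pad = k // 2
    padded = [float("-inf")] * pad + list(seq) + [float("-inf")] * pad
    return [max(padded[i:i + k]) for i in range(len(seq))]


random.seed(0)
signal = [random.random() for _ in range(32)]

p5 = max_pool_1d(signal, 5)     # one 5-wide pool
p9 = max_pool_1d(p5, 5)         # two chained pools cover a 9-wide window
p13 = max_pool_1d(p9, 5)        # three chained pools cover a 13-wide window

# The chained results match direct pooling with the larger windows.
assert p9 == max_pool_1d(signal, 9)
assert p13 == max_pool_1d(signal, 13)
```

Each chained pool reuses the previous result, so the 9-wide and 13-wide summaries cost only one extra 5-wide pass each instead of a full large-window scan.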
Neck
The neck module receives and combines features from the various resolution levels produced in the backbone and then connects the information from the backbone to the head.
It helps improve the feature representation before passing it to the final prediction.
The PANet module is adopted to enable feature extraction from different levels of resolution on the map, enhancing the model's ability to recognize different-sized objects.
PANet utilizes rapid feature fusion by extracting more comprehensive information.
A light bottleneck module is offered in the C2F module at each prediction layer.
This bottleneck structure variation enhances feature extraction effectiveness while reducing computational cost.
In order to improve the efficiency of the network, this work proposes C2F-Next.
This module is inspired by the basic C2F module, but the bottleneck part is modified using the light bottleneck.
The structure of the light bottleneck applies the standard bottleneck design, as presented in Figure 3.
The module can reduce computational cost while maintaining extraction ability without a significant decline in performance.
Depthwise convolution applies a single filter to each input channel, so it does not mix information across channels.
This process saves many parameters and speeds up extraction.
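The parameter saving can be checked with simple arithmetic. The sketch below compares a dense 5×5 convolution with a depthwise 5×5 convolution followed by a pointwise 1×1 convolution at 128 channels; the helper names are illustrative, and biases and normalization parameters are ignored.

```python
def dense_conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Every output channel sees every input channel through a k x k kernel."""
    return c_in * c_out * k * k


def depthwise_weights(channels: int, k: int) -> int:
    """One k x k filter per channel, no cross-channel mixing."""
    return channels * k * k


def pointwise_weights(c_in: int, c_out: int) -> int:
    """A 1 x 1 convolution that mixes channels at each position."""
    return c_in * c_out


channels = 128
dense = dense_conv_weights(channels, channels, 5)
separable = depthwise_weights(channels, 5) + pointwise_weights(channels, channels)
ratio = dense / separable
print(dense, separable, round(ratio, 1))
```

At this width the separable pair needs roughly one twentieth of the weights of the dense layer, which is why a large 5×5 kernel remains affordable in the depthwise form.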
The bottleneck module structure incorporates several block structures, which adopt a depthwise operation (DW) on the input information ($X$) using a 5×5 kernel, as shown in Figure 4.
The large filter captures a sizable spatial area from input features and helps the network increase the variety of object elements.
Furthermore, it utilizes LayerNorm (LN) to process each feature within a layer by normalizing each sample's information.
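A light bottleneck is formulated as:
\[
\mathrm{LB}(X) = \mathrm{PW}\left(\mathrm{GR}\left(\mathrm{GELU}\left(F(X)\right)\right)\right),
\]
\[
F(X) = \mathrm{PW}\left(\mathrm{LN}\left(\mathrm{DW}_{5\times5}(X)\right)\right).
\]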
This module applies pointwise convolution (PW), which employs a 1×1 convolutional block to integrate information from various channels while preserving the spatial dimensions of the features.
The Gaussian error linear unit (GELU) activation function also helps optimize the model's performance by weighting each input with the Gaussian cumulative distribution, which lets small negative values pass through.
Subsequently, global response normalization (GR) normalizes the activation output across all spatial features within a layer, thus enhancing the training process and improving model performance.
Finally, pointwise convolution is applied in the last module to reconstruct the output channel and refine single spatial features, resulting in a more precise model.
The proposed module focuses on efficiently extracting combination features in the neck stage.
It utilizes light operation with a linear process without involving multi-channel mixing.
It ensures that the selected features work effectively without increasing the computational load.
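The two less common operations in the light bottleneck can be written out in a few lines. The sketch below gives the exact GELU and a global response normalization in the form published with ConvNeXt V2; treating the paper's GR step as that form is an assumption, and the scalar `gamma` and `beta` stand in for what would normally be learnable per-channel parameters.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def grn(x, gamma=1.0, beta=0.0, eps=1e-6):
    """Global response normalization over a [C][H][W] tensor (assumed ConvNeXt V2 form)."""
    # Per-channel L2 norm over all spatial positions.
    norms = [math.sqrt(sum(v * v for row in ch for v in row)) for ch in x]
    mean_norm = sum(norms) / len(norms)
    scale = [n / (mean_norm + eps) for n in norms]   # relative channel response
    return [[[gamma * v * scale[c] + beta + v for v in row] for row in x[c]]
            for c in range(len(x))]


# Small negative inputs are damped rather than zeroed, unlike ReLU.
print(gelu(-0.5), gelu(0.0), gelu(3.0))

# Toy tensor: channel 0 responds strongly, channel 1 is silent.
features = [[[3.0, 4.0]], [[0.0, 0.0]]]
out = grn(features)
print(out)
```

In this form a channel whose overall response is above the layer average is amplified and a weak one is left close to its input, which encourages contrast between channels.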
Head
In its final stage, the model employs three heads that constitute a neural network responsible for predicting the locations and classes of objects.
These heads determine the bounding box locations and dimensions of the corresponding classes without adding computational overhead.
These output heads are trained to generate offset scores attached to the final layers.
The types of output heads can vary depending on the object detection algorithm and task requirements.
Instead of using anchors for predicted boxes, YOLOv8 adopts an anchor-free detection method that directly predicts an object's center rather than its offset from a predefined box.
This approach reduces the number of box predictions, speeds up post-processing, and simplifies the network, making it faster and improving the suitability of the proposed model with low-cost hardware configurations.
However, object detection often encounters inaccuracies or missed detections, leading to errors.
The intersection over union (IoU) metric addresses this by calculating the ratio of the overlap area of bounding boxes to their union area.
It can further be extended to the complete IoU (CIoU), which incorporates factors such as the distance between bounding box centers and the aspect ratio.
Distribution focal loss (DFL) and binary cross-entropy (BCE) loss functions are employed to evaluate the bounding box regression and classification accuracy of detected objects, respectively.
These critical loss functions play a pivotal role in training the model, enabling enhancements in predictive performance across successive iterations.
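The IoU and CIoU quantities mentioned above can be sketched as follows. The code uses the commonly published CIoU definition (IoU minus a normalized center-distance term and a weighted aspect-ratio term); boxes are assumed to be in (x1, y1, x2, y2) format, and the small `eps` is only there to avoid division by zero.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou(a, b, eps=1e-9):
    """Complete IoU: IoU minus a center-distance penalty and an aspect-ratio penalty."""
    i = iou(a, b)
    # Squared distance between the two box centers.
    rho2 = ((a[0] + a[2] - b[0] - b[2]) ** 2 + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4.0
    # Squared diagonal of the smallest box enclosing both.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan((b[2] - b[0]) / (b[3] - b[1] + eps))
        - math.atan((a[2] - a[0]) / (a[3] - a[1] + eps))
    ) ** 2
    alpha = v / (1.0 - i + v + eps)
    return i - rho2 / c2 - alpha * v


pred = (0.0, 0.0, 2.0, 2.0)
target = (1.0, 1.0, 3.0, 3.0)
print(iou(pred, target), ciou(pred, target))
```

For two equally shaped boxes that only partly overlap, CIoU is lower than IoU because the center offset is penalized even when the aspect ratios agree; a regression loss is then typically taken as one minus CIoU.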
Figure 3. Modified C2F with light bottleneck applies efficient operation. It applies extensive computation to only half of the features but does not ignore the features of the rest
Figure 4. The proposed light bottleneck module applies a large kernel operation. It utilizes depthwise convolution with a large kernel size to capture a wider spatial area
DATASET AND IMPLEMENTATION SETUP
Dataset
The first fall detection dataset is Le2i, which consists of two classes: standing and falling humans.
These images are sourced from the Roboflow website and serve two main objectives: standing detection and falling detection.
This dataset comprises 3,010 images, split into 70% for training, 20% for testing, and 10% for validation.
The second dataset, Fallen, consists of three classes: fallen, sitting, and standing.
The dataset includes 3,290 images, divided into three parts: 74% training, 15% validation, and 10% testing.
Implementation setup
The proposed method is implemented in the PyTorch framework and trained on a workstation with an AMD Ryzen 5 4500 6-core CPU @ 4.2 GHz, 32 GB RAM, and an RTX 4060 Ti graphics card to increase the training speed.
The system utilizes the fall detection datasets for training and evaluation.
The evaluation process relies on the values of average precision (AP) and mean average precision (mAP), using an IoU threshold of 0.
The training was performed over 200 epochs with a batch size of 16.
The optimizer used was stochastic gradient descent (SGD), with a momentum of 0.937 and a learning rate of 0.
This work implements mosaic augmentation, which utilizes crop, flip, zoom, and shift geometry approaches to enrich the training data.
Mosaic augmentation was applied for only 190 epochs, and the remaining epochs used normal mode.
The images in all experiments are resized to an input dimension of 640×640.
For speed testing, the model is embedded in a Jetson Nano 4 GB, representing an IoT device, which connects directly to an Eyesec 4K webcam in live-stream mode.
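The reported training settings can be collected in one place as a plain dictionary. The key names below follow the Ultralytics training arguments, which is an assumption because the paper does not name its training interface; mapping "mosaic for 190 of 200 epochs" to `close_mosaic=10` is likewise an interpretation. The learning rate is left out because its value is truncated in the text.

```python
# Training settings as reported in the text; key names are assumed, not confirmed.
train_config = {
    "epochs": 200,
    "batch": 16,
    "imgsz": 640,
    "optimizer": "SGD",
    "momentum": 0.937,
    "close_mosaic": 10,   # assumed: mosaic on for 190 epochs, off for the remainder
    # Learning rate intentionally omitted: its value is truncated in the source text.
}

mosaic_epochs = train_config["epochs"] - train_config["close_mosaic"]
print(mosaic_epochs)
```

Keeping the settings in one structure makes it easier to repeat the run for each ablation variant with only the model definition changed.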
EXPERIMENT AND RESULTS
This section reports the evaluation results of the proposed model on two fall detection datasets, measured through intersection over union.
This experiment also compares the mean average precision (mAP) performance with other lightweight detection models within the scope of the YOLO family.
In addition, efficiency comparisons are conducted by measuring the number of parameters, computational complexity, and data processing speed.
Furthermore, a comprehensive analysis of the model is presented, which examines the impact of the proposed modules.
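For readers less familiar with the metric, the sketch below shows one common way to compute average precision for a single class and then average over classes. It uses all-point interpolation of the precision-recall curve; evaluation toolkits differ in interpolation details, so this is an illustration rather than the exact procedure behind the reported numbers, and the true-positive flags are made-up toy values.

```python
def average_precision(tp_flags, num_gt):
    """Area under the precision-recall curve (all-point interpolation).

    tp_flags: detections sorted by descending confidence, True for a true positive.
    num_gt:   number of ground-truth objects of the class.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for flag in tp_flags:
        tp += int(flag)
        fp += int(not flag)
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows (upper envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(per_class_ap):
    """Mean of the per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap)


# mAP@0.5:0.95 repeats the whole computation at each of these IoU thresholds
# (a detection counts as a true positive only if its IoU reaches the threshold)
# and averages the results.
iou_thresholds = [0.5 + 0.05 * i for i in range(10)]

ap_fall = average_precision([True, False, True], num_gt=2)    # toy flags
ap_stand = average_precision([True, True], num_gt=2)          # toy flags
print(mean_ap([ap_fall, ap_stand]))
```

A single false positive ranked between two correct detections already lowers the class AP, which is why confidence ranking matters as much as the raw number of hits.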
Ablation study
The ablation study investigates how the proposed modules improve the performance of the YOLOv8 nano version.
The intended network YOLOv8h-LB-CSE was compared at each step with modified block structures to see the impact of modifications.
As shown in Table 1, applying the YOLOv8h-LB-CSE modifications to the original YOLOv8n yields a notable decrease in parameters by 60.34% and in FLOPs by 22.
In addition, there is an improvement in model performance of 1.17% and 0.69% on the Le2i and Fallen datasets, respectively.
The researchers also reconstructed the channel dimensions of the original YOLOv8 nano, limiting them to a maximum of 128 channels.
This modified model is called YOLOv8h (Hypernano) because there is a significant decrease in learnable parameters by 69.84% and in the number of operations by 26.
Moreover, a light bottleneck module was added to the YOLOv8-Hypernano structure, which helps improve accuracy without significantly sacrificing computation cost.
The light bottleneck module structure is designed to maintain efficiency and feature extraction capabilities, and the combined model is called YOLOv8h-LB.
Table 1. Ablation experiments with different improvement strategies. It adds the proposed modules until they reach the entire proposed network
Models                          YOLOv8n     YOLOv8h   YOLOv8h-LB   YOLOv8h-LB-CSE
GFLOPS
Parameters                      3,011,238   907,238   1,185,598    1,194,440
mAP@0.5:0.95 on Le2i dataset
mAP@0.5 on Fallen dataset
Furthermore, the YOLOv8h with LB and CSE combines a light bottleneck with the proposed enhancement module, designed to improve network performance by recalibrating the input features.
These findings indicate that the enhanced YOLOv8 provides superior detection efficacy in fall detection.
It also benefits from the lightweight incorporation of modules, leading to reduced model complexity.
Including the light bottleneck and squeeze-and-excitation modules allows the model to capture essential information and recalibrate features effectively, improving overall accuracy without adding significant computational cost.
This enhancement makes YOLOv8 particularly suitable for deployment in real-world applications with limited computational resources.
Reducing the number of parameters and FLOPs makes the model more efficient and faster, which is crucial for real-time fall detection systems.
Furthermore, carefully reconstructing the channel dimensions ensures that the model remains compact while maintaining high performance, making it an ideal solution for edge computing devices.
Evaluation on datasets
This study conducted a visual analysis to illustrate the detection performance of the modified YOLOv8 under various conditions, as shown in Figure 5.
Each set of test images consists of two components: falling and standing categories.
The left part presents the original photo, while the right part illustrates the heatmap results of the modified YOLOv8 algorithm.
This visualization utilizes the Eigen-CAM approach, highlighting the most important features in red pixels.
The heatmap of the falling category is shown in Figure 5, demonstrating that the proposed model emphasizes valuable information on the body of the fallen person.
As a result, this model can effectively recognize the falling position.
For the standing category, the heatmap indicates that the model focuses on the correct prediction, showing that the heatmap area is inclined vertically.
It also highlights the shoulders and feet as the main indicators of the standing posture.
Figure 5. Heatmap observation of the proposed detector. It tests on (a) Le2i and (b) Fallen datasets. The target object is detected through green boxes
This study investigates the mean average precision of each prediction against each class label.
The confusion matrix on the Le2i dataset is presented in Figure 6.
The fall class exhibits the highest accuracy, with a correct classification rate of 0.96.
However, there is a 0.04 misclassification rate, where instances of falls are incorrectly classified as "standing."
Conversely, the standing class has a correct classification rate of 0.89 but obtains a 0.11 misclassification rate, with instances that should be classified as standing being incorrectly identified as fall.
This analysis highlights the model's strengths and areas for improvement in distinguishing between fall and standing events.
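The rates quoted for the confusion matrix are normalized per true class, so each class's correct and incorrect rates sum to one. The sketch below shows that normalization; the raw counts are hypothetical values chosen only to illustrate the computation and are not taken from the paper.

```python
def normalize_by_true_class(counts):
    """counts[true][pred] -> rates that sum to 1 for every true class."""
    rates = {}
    for true_label, row in counts.items():
        total = sum(row.values())
        rates[true_label] = {pred: n / total for pred, n in row.items()}
    return rates


# Hypothetical raw counts (not from the paper), used only for illustration.
counts = {
    "fall": {"fall": 48, "standing": 2},
    "standing": {"fall": 11, "standing": 89},
}
rates = normalize_by_true_class(counts)
for label, row in rates.items():
    print(label, row)
```

Reading the matrix this way separates how often each class is recognized from how many samples of that class the test split happens to contain.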
Figure 6 also illustrates the confusion matrix for the Fallen dataset, which includes three classes: fallen, sitting, and standing.
The standing class exhibits the highest correct classification rate at 0.
However, there are misclassification scores of 0.02 for standing instances incorrectly classified as sitting, and background misclassification of 0.
For the fallen class, the correct prediction rate is 0.66, but there are fallen misclassifications of 0.06 predicted as sitting and background misclassifications of 0.
The sitting class obtains the lowest correct prediction rate of 0.60, with misclassifications of 0.07, 0.10, and 0.23 as fallen, standing, and background, respectively.
This analysis indicates areas where the model's accuracy can improve, particularly distinguishing between similar postures.
This work compares the performance of the proposed model with the efficient YOLO families.
The comparison shows that our detector is superior to competitors such as YOLOv3 tiny, YOLOv5n, YOLOv6n, YOLOv7 tiny, YOLOv8n, and YOLOv10n.
Performance evaluation shows that YOLOv8h-LB-CSE achieves the best precision, with a mAP of 0.603 at IoU 0.5:0.95.
The proposed model outperforms the original YOLOv8n, differing by a mAP of 5.
It is even superior to the newer lightweight YOLOv10n.
This observation also compares the performance of YOLOv8h-LB with several attention modules, most of which improve precision.
An enhancement block is assigned to increase the feature extraction ability without significantly adding to the computation cost.
The proposed model with CSE obtains a mAP 5.98% higher than YOLOv8h-LB-SE, which uses the squeeze-and-excitation attention module.
This structure only focuses on channel-wise enhancement.
Furthermore, the proposed model also performs better than CBAM attention, by a difference of 2% mAP.
CBAM uses a configuration similar to CSE but with over-in-channel extraction.
Compared to the CAN and ELA attention modules, our network shows higher mAP.
Although both of these attentions involve context discovery from the spatial regions of the map, they are not robust enough to discriminate between falling and standing.
GCNET and DAN performed well in the object detection task but could not outperform CSE.
Figure 6.
Confusion matrices of model prediction.
It evaluates on .
Le2i and .
Fallen datasets On the other hand.
Table 2 shows a comparison of YOLOv8h-LB-CSE performance with other detectors on the Fallen dataset.
The proposed network achieves mAP@0.
5 of 0.
732, indicating that our model outperforms YOLOv8n by 0.
688% mAP and surpasses YOLOv7 tiny by 3.
On this dataset, the proposed model also compares the precision with other YOLO lightweight models.
Although YOLOv6n outperforms our detector, other lightweight detectors underperform.
YOLOv6n achieves higher precision than the proposed model by 0.
003, but the model generates more parameters and computation.
Moreover, this satisfactory performance also outperforms the mAP of YOLOv5n by 0.
A comparison with the state-ofthe-art network.
YOLOv10n, shows that our detector is superior by 0.
The Hypernano version shows a lower performance than the full proposed network.
This result indicates that the proposed gain module improves performance in recognizing falling, sitting, and standing activities, thereby demonstrating the practical implications of our work.
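The mAP@0.5 and precision figures discussed above rest on matching predicted boxes to ground-truth boxes at an intersection-over-union (IoU) threshold of 0.5. The following is a simplified sketch of that matching step only. The box format `(x1, y1, x2, y2)`, the function names, and the sample boxes are assumptions for illustration; a full mAP computation additionally ranks predictions by confidence and integrates the precision-recall curve per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions that match a same-class ground truth.

    Each item is (label, box). A ground truth is matched at most once.
    """
    matched = set()
    true_positives = 0
    for label, box in predictions:
        best, best_idx = 0.0, None
        for idx, (gt_label, gt_box) in enumerate(ground_truths):
            if idx in matched or gt_label != label:
                continue
            score = iou(box, gt_box)
            if score > best:
                best, best_idx = score, idx
        if best_idx is not None and best >= threshold:
            matched.add(best_idx)
            true_positives += 1
    return true_positives / len(predictions) if predictions else 0.0

# Illustrative boxes only, not taken from either dataset.
preds = [("falling", (0, 0, 10, 10)), ("standing", (20, 20, 30, 30))]
truth = [("falling", (0, 0, 10, 8)), ("standing", (26, 26, 36, 36))]
print(iou(preds[0][1], truth[0][1]))      # overlap of the "falling" pair
print(precision_at_iou(preds, truth))     # one of two predictions matches
```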
Evaluation of model efficiency
The design of the proposed model considers the properties that benefit the application scenario.
The proposed model is designed with attention to several important aspects, such as a low number of giga floating-point operations per second (GFLOPS) and a minimal number of parameters compared to other models.
This research prioritizes efficiency: a more efficient model can perform many computational tasks with fewer operations and fewer trainable weights.
This issue is directly related to the number of GFLOPS and the total number of parameters.
The comparative experiments show that YOLOv8h-LB-CSE is the cheapest model.
The proposed model is very lightweight compared to the efficient YOLO detectors, as shown in Tables 2 and 3.
The original version of YOLOv8n has 2.5 times more parameters than our proposed model.
Our detector also uses 1.46 times fewer operations.
Furthermore, YOLOv3 tiny and YOLOv7 tiny use far more parameters and operations.
This requires significant device memory and slows data processing.
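The 2.5 times and 1.46 times ratios quoted above can be checked with a few lines of arithmetic. The sketch assumes that the value 3,011,433 in the Table 2 parameter column belongs to YOLOv8n, and takes 1,194,440 parameters and 5.6 GFLOPS for the proposed model from the abstract and conclusion. The YOLOv8n operation count is not quoted in the text, so the last line is only the value implied by the stated ratio.

```python
PARAMS_YOLOV8N = 3_011_433    # assumed to be the YOLOv8n entry of Table 2
PARAMS_PROPOSED = 1_194_440   # YOLOv8h-LB-CSE, from the abstract and conclusion
GFLOPS_PROPOSED = 5.6         # YOLOv8h-LB-CSE, from the abstract and conclusion
OPS_RATIO = 1.46              # ratio stated in the text

param_ratio = PARAMS_YOLOV8N / PARAMS_PROPOSED
implied_gflops_yolov8n = GFLOPS_PROPOSED * OPS_RATIO  # inferred, not reported

print(f"parameter ratio: {param_ratio:.2f}x")
print(f"implied YOLOv8n cost: {implied_gflops_yolov8n:.1f} GFLOPS")
```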
The speed of data processing determines the reliability of a method when implemented on a device.
In order to support the smart IoT system, this work evaluates the proposed model speed on an NVIDIA Jetson Nano with a VRAM of 4 GB.
This device is commonly used as an edge device for intelligent systems.
Based on the graph in Figure 7, the proposed model with light bottleneck and CSE achieved a speed of 10.64 FPS when tested on the device.
Compared to the lightweight model YOLOv8h, it is reduced by 25.
On the other hand, YOLOv8h-LB-CSE is 40.1% faster than YOLOv3-tiny.
Another comparison is that the proposed model is 8.5% and 9.8% slower than YOLOv8n and YOLOv6n, respectively.
A comparison with the most popular YOLO versions, such as YOLOv5n, shows that our model is slower than this competitor, which is our model's weakness.
In parameter count and computational complexity, the proposed model is more efficient than the efficient YOLO detectors.
However, it requires more processing memory.
This is due to depth-wise operations that apply branching operations for each channel.
Hence, this operation requires more memory than regular convolution.
On the other hand, the speed of 10.64 FPS achieved by our detector is feasible for smooth operation on edge devices that support IoT intelligent systems.
The priority is fall detection performance that minimizes fall activity recognition errors.
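Speed figures such as 10.64 FPS and the percentage differences above are typically obtained by timing repeated inference and comparing the averages. A minimal sketch of such a measurement follows. The `infer` callable, the warm-up count, and the baseline used for the percentage are assumptions, since the text does not describe the benchmarking procedure.

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Average frames per second of `infer` over `frames` after a warm-up."""
    for frame in frames[:warmup]:
        infer(frame)                       # warm-up runs are not timed
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def percent_faster(fps_a, fps_b):
    """Speed of A relative to B in percent of B's speed (negative: slower)."""
    return (fps_a - fps_b) / fps_b * 100.0

# Stand-in workload: `sum` over a list plays the role of a detector here.
frames = [list(range(1000)) for _ in range(50)]
fps = measure_fps(sum, frames)
print(f"{fps:.1f} FPS")
print(f"{percent_faster(12.0, 10.0):.1f}% faster")
```

On a real edge device the same loop would wrap the detector's forward pass, and the frames would come from the camera or a recorded clip.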
Table 2. Comparison of the proposed detector with other lightweight YOLO detectors and attention mechanism methods on the Fallen dataset. It also compares the number of parameters and computational complexity of the models
Columns: Models, GFLOPS, Parameters, mAP@0.5, mAP@0.5:0.95
Models: YOLOv3 tiny, YOLOv5n, YOLOv6n, YOLOv7 tiny, YOLOv8n, YOLOv10n, YOLOv8h, YOLOv8h-LB, YOLOv8h-LB-CSE
Parameters: 12,133,670; 2,509,049; 4,238,441; 3,011,433; 2,695,196; 907,433
mAP@0.5: 0.690; 0.727
mAP@0.5:0.95: 0.320; 0.405
Table 3. Comparison of the proposed detector with other lightweight YOLO detectors and attention mechanism methods on the Le2i dataset. It also compares the number of parameters and computational complexity of the models
Columns: Models, GFLOPS, Parameters, mAP@0.5, mAP@0.5:0.95
Models: YOLOv3 tiny, YOLOv5n, YOLOv6n, YOLOv7 tiny, YOLOv8n, YOLOv8h, YOLOv10n, YOLOv8h-LB, YOLOv8h-LB-SE, YOLOv8h-LB-CBAM, YOLOv8h-LB-CAN, YOLOv8h-LB-ELA, YOLOv8h-LB-DAN, YOLOv8h-LB-GCNET, YOLOv8h-LB-ECA, YOLOv8h-LB-CSE
Parameters: 12,133,156; 2,508,854; 4,238,342; 6,017,694; 3,011,238; 907,238; 2,695,196; 917,222; 1,182,600; 1,184,392; 1,185,598; 1,184,550; 1,219,400; 1,184,575; 1,182,249
mAP@0.5: 0.910; 0.907; 0.907; 0.916
mAP@0.5:0.95: 0.603
Figure 7.
Speed comparison of the proposed model with other lightweight YOLO models.
The proposed model achieves 10.64 FPS, faster than YOLOv3-tiny.
Practical scenario testing and future research
Practical applications demand that vision algorithms operate efficiently on embedded devices and deliver high accuracy.
Our model was tested in a real-world scenario to evaluate the reliability of our proposed system and implemented as an IoT-based intelligent system for monitoring falls.
Live video streams were captured from an RGB webcam and processed on a Jetson Nano, which served as the computational platform for running the model.
The IoT setup was installed on a high wall corner, simulating an intelligent video surveillance environment.
The model was trained using the Fallen dataset, which classifies actions into three categories (falling, sitting, and standing), to ensure its effectiveness in real-world detection.
The results demonstrate that the proposed fall detection system operates efficiently, accurately recognizing these actions, as illustrated in Figure 8.
The visualizations reveal that our system effectively detects falls, indicated by red bounding boxes around the falling individual.
The system demonstrates high accuracy in fall detection and correctly predicts sitting and standing.
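The monitoring setup described above (camera frames, on-device detection, red boxes for falls) can be outlined as a small processing loop. This is a sketch under assumptions: `Detection`, `monitor`, the confidence threshold, and every color except red for falling are hypothetical, and `detect` stands in for the trained model rather than reproducing its interface.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # one of "falling", "sitting", "standing"
    confidence: float
    box: tuple          # (x1, y1, x2, y2) in pixels

# Display colors in BGR order. Only red for "falling" comes from the text;
# the other two colors are placeholders.
BOX_COLORS = {
    "falling": (0, 0, 255),
    "sitting": (0, 255, 255),
    "standing": (0, 255, 0),
}

def monitor(frames, detect, min_confidence=0.5):
    """Run `detect` on each frame and yield (index, detection, color, alert)."""
    for index, frame in enumerate(frames):
        for det in detect(frame):
            if det.confidence < min_confidence:
                continue                     # drop low-confidence boxes
            color = BOX_COLORS.get(det.label, (255, 255, 255))
            yield index, det, color, det.label == "falling"

# Stand-in detector: returns a fixed detection per symbolic "frame".
def fake_detect(frame):
    if frame == "fall":
        return [Detection("falling", 0.9, (10, 40, 120, 90))]
    return [Detection("standing", 0.3, (10, 10, 60, 120))]

events = list(monitor(["stand", "fall"], fake_detect))
for index, det, color, alert in events:
    print(index, det.label, color, "ALERT" if alert else "")
```

In a deployment, the `alert` flag is the point at which an IoT notification would be sent.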
However, during the movement transition process, the model occasionally misclassifies actions.
Figure 8 (bottom row) presents some of these prediction errors.
They are caused by the limited variety of motion data for transition phases.
Additionally, the model lacks temporal awareness: it does not account for the sequential relationships in time-series data, which further contributes to the inaccuracies in these scenes.
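One simple way to see what temporal awareness would add is a majority vote over recent per-frame labels. The sketch below is only an illustration of that idea and is not part of the proposed network or of the planned future work; the function name and window length are hypothetical.

```python
from collections import Counter, deque

def smooth_labels(labels, window=5):
    """Replace each label with the majority label of the last `window` frames."""
    recent = deque(maxlen=window)
    smoothed = []
    for label in labels:
        recent.append(label)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed

# A single spurious "falling" frame during a transition is outvoted.
noisy = ["standing"] * 4 + ["falling"] + ["standing"] * 3
print(smooth_labels(noisy))

# A sustained fall is still reported, a couple of frames later.
real_fall = ["standing"] * 3 + ["falling"] * 5
print(smooth_labels(real_fall))
```

The trade-off is latency: a longer window suppresses more transition errors but delays the moment a genuine fall is reported.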
Our work introduces a lightweight CNN architecture that has few trainable parameters and reduces computational overhead.
However, using depthwise convolutions slows data processing because their branching operations increase memory usage.
Future work will optimize these branching operations to improve system speed.
One potential approach is reducing reliance on depthwise convolutions by incorporating grouped convolution operations, which can alleviate memory bottlenecks.
Improving data processing speed will enhance the model's applicability in real-world scenarios, particularly in fall detection systems.
Additionally, future work will address high-frequency features by incorporating attention modules and revisiting error functions, improving detection performance and reducing misclassifications related to fall events.
Strengthening the relationship between features will also mitigate the loss of critical features caused by excessive convolution operations, further boosting the model's accuracy and robustness.
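The trade-off between regular, depthwise, and grouped convolutions mentioned above can be made concrete by counting weights. The sketch uses the standard weight-count formula for a bias-free convolution; the channel count of 64, the 3x3 kernel, and the group count of 4 are arbitrary values chosen for illustration and are not taken from the proposed architecture.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias.

    groups=1 is a regular convolution; groups=c_in with c_out=c_in is a
    depthwise convolution; values in between give a grouped convolution.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return (c_in // groups) * c_out * k * k

c, k = 64, 3
regular = conv_params(c, c, k)                 # every filter sees all channels
depthwise = conv_params(c, c, k, groups=c)     # one filter per channel
pointwise = conv_params(c, c, 1)               # 1x1 mixing after depthwise
grouped = conv_params(c, c, k, groups=4)       # four independent branches

print("regular:            ", regular)
print("depthwise separable:", depthwise + pointwise)
print("grouped (g=4):      ", grouped)
```

A depthwise layer has the fewest weights but splits the work into one small branch per channel, whereas a grouped convolution uses a handful of larger branches, which is the middle ground the future-work discussion points to.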
Figure 8.
Visualization of fall detection results in real case scenarios.
These scenes were performed in a laboratory environment.
CONCLUSION
This research introduces a lightweight network that improves on YOLOv8n, providing a new efficient method for human fall detection designed to address the computational challenges of using conventional YOLOv8 in fall detection scenarios.
The study presents YOLOv8-Hypernano (YOLOv8h), which enhances the model's efficiency by reducing the number of channels while maintaining performance.
It combines spatial and channel attention modules in the backbone, improving the focus on human subjects by more accurately detecting motion patterns.
It adds a consecutive selective enhancement (CSE) module to improve the efficiency and effectiveness of feature extraction while reducing computational costs.
The neck structure is also modified with a lightweight bottleneck network that carefully reconstructs feature maps in depth.
It avoids excessive operations, attends to accurate human motion patterns, and maintains efficiency in feature extraction.
Moreover, YOLOv8-Hypernano with CSE outperforms other advanced lightweight algorithms such as YOLOv3-tiny, YOLOv5-nano, YOLOv6-nano, YOLOv7-tiny, and YOLOv8-nano.
The model evaluation results show that the proposed detector achieves mAP scores of 0.603 and 0.732 on the Fallen and Le2i datasets.
The model has 1,194,440 parameters and a computational cost of 5.6 GFLOPS.
Visualization results indicate that the proposed detector works optimally and is suitable for implementation in an IoT system.
Further work is required to improve the performance of the detector head and refine the high-level features.
FUNDING INFORMATION
This research is funded by Fundamental Research of UNSRAT Excellence Cluster 1 (RDUU K.
Sam Ratulangi University 2024 with contract number 192/UN12.27/LT/2024.
REFERENCES