Journal of Applied Engineering and Technological Science Vol 6.
2025: 790-801
A COST-EFFECTIVE REAL-TIME HUMAN ACTIVITY RECOGNITION
SYSTEM USING SUPERVISED LEARNING ALGORITHMS AND
WEARABLE ACCELERATION SENSORS
Nguyen Thi Thu1, Phung Cong Phi Khanh2, Trong-Minh Hoang3, Duc-Tan Tran4*, Nguyen Ngoc Linh5, Nguyen Canh Minh6*
1 Faculty of Electronic Engineering, Hanoi University of Industry, Ha Noi, Viet Nam
2 Faculty of Technology Education, Hanoi National University of Education, Ha Noi, Viet Nam
3 Posts and Telecommunications Institute of Technology, Ha Noi, Viet Nam
4 Faculty of Electrical and Electronic Engineering, Phenikaa University, Ha Noi, Viet Nam
5 International School, Viet Nam National University, Ha Noi, Viet Nam
6 Faculty of Electrical and Electronic Engineering, University of Transport and Communications, Ha Noi, Viet Nam
tranduc@phenikaa-uni.vn, ncminh@utc.
Received: 19 December 2024.
Revised: 03 April 2025.
Accepted: 13 April 2025
*Corresponding Author
ABSTRACT
Human activity recognition (HAR) plays a vital role in health monitoring by providing detailed insights into daily movements.
This study aims to enhance HAR by developing a lightweight and efficient machine learning model that balances accuracy, real-time performance, and affordability.
Using acceleration data from a wearable inertial sensor, we extracted a novel feature set optimized for computational efficiency. The proposed model was evaluated on a benchmark dataset, achieving an accuracy of 98.
in classifying six essential daily activities: walking, walking upstairs, walking downstairs, laying, sitting, and standing.
These results demonstrate the model's potential for real-time health monitoring applications, offering a cost-effective and deployable solution for wearable-based activity recognition.
Keywords: Accelerometer, Classification, Feature Extraction, Human Activity Recognition, Wearable.
Introduction
Human activity recognition has seen remarkable advancements in recent decades, driven by its broad usefulness across various domains.
An increasing number of real-world issues, including those in healthcare, fall detection for the elderly, industrial applications, and security, require solutions based on activity recognition (Roitberg et al., 2014; Chernbumroong et al.; Peetoom et al., 2015; Jain & Kanhangad, 2015; Pierleoni et al., 2025; Pham Van Thanh et al.).
By gathering data on user behavior and offering various modes of interaction, activity recognition technology enables systems to actively assist users in their tasks (Ziaeefard et al.).
As technology advances, sensors become increasingly varied to enhance identification rates and adapt to diverse environments.
Three primary approaches are used to address the challenge of human activity recognition: wearable sensors, computer vision-based approaches, and environment interactive sensor-based approaches (Marquis-Faulkes et al., 2005; Lin & Ling, 2007; Hazelhoff et al., 2008; Shieh and Huang, 2009; Wang et al.).
In vision-based techniques, cameras or video are used to observe changes in the surrounding environment and user behavior.
However, there are restrictions on the use of surveillance cameras, such as poor lighting, high costs, and privacy concerns.
To monitor user behavior, an environment interactive sensor-based technique embeds sensors in objects or subjects within the surrounding environment.
Similar to the computer vision-based method, this system requires pre-installation and deployment (Sundholm et al., 2014; Rahman et al.; Jamil et al., 2015; Hevesi et al., 2014; Torres).
The wearable approach has created a wealth of new opportunities for detecting human activity (Sepahvand et al., 2025; Zhang et al., 2024; Khan et al.; Zhu & Sheng).
Wearable sensors are suitable for any indoor or outdoor venue without requiring any setup.
The goal of this research is to use body-worn sensors to recognize human activities and states.
Existing HAR methods often suffer from limited generalizability and robustness when applied to real-world environments.
Many approaches rely on predefined datasets collected in controlled settings, leading to models that struggle with unseen variations in human movement.
Additionally, computational efficiency remains a concern, particularly for resource-constrained devices such as wearables and embedded systems.
While deep learning models have shown promise, their reliance on large-scale labeled data and high computational power limits their practical deployment in real-time applications.
This study aims to address these limitations by proposing an adaptive HAR framework that enhances recognition accuracy while optimizing computational efficiency.
This study focused on using a single 3-axis accelerometer integrated into a low-cost system to minimize the impact on human movement.
Additionally, we analyzed data from multiple subjects to determine whether the data clustered according to different activities.
Our work makes several contributions. The first main contribution is the proposal and demonstration of a simple feature set that achieves good classification performance.
The second contribution is the development of a real-time HAR system.
Literature Review
Human activity recognition (HAR) and monitoring technologies have gained significant attention in recent years due to their applications in healthcare, industrial settings, and assisted living.
Various approaches leveraging inertial measurement units (IMUs), wearable sensors, and machine learning techniques have been explored to enhance recognition accuracy.
Human Activity Recognition in Industrial and Assisted Living Contexts
Human activity recognition plays a crucial role in industrial human-robot interaction and assisted living applications.
Roitberg et al. presented a HAR system for industrial environments, where precise recognition of worker activities is essential for safety.
Similarly, Chernbumroong et al. focused on elderly activity classification to support assisted living, using machine learning techniques to distinguish various daily activities.
Peetoom et al. reviewed monitoring technologies for independent elderly individuals, emphasizing the importance of unobtrusive and reliable sensing solutions.
Their findings highlighted the potential of wearable and ambient sensors in improving the quality of life for aging populations.
Sensor-based Approaches for HAR
Jain & Kanhangad explored the use of accelerometer and orientation sensor data in smartphones for personal authentication, demonstrating the potential of IMU sensors in HAR.
Pierleoni et al. developed a high-reliability wearable device for fall detection in the elderly, which combined accelerometer-based features with a robust classification model.
Similarly, Pham Van Thanh et al. proposed a real-time fall detection system using a 3-degree-of-freedom accelerometer, highlighting its high accuracy.
Ziaeefard et al. provided a comprehensive review of semantic HAR, discussing how contextual information improves activity classification accuracy.
Marquis-Faulkes et al. gathered user requirements for fall detection systems, emphasizing usability and acceptance among elderly users.
Additional research, such as Lin & Ling and Hazelhoff et al., investigated video-based fall detection, demonstrating the effectiveness of vision-based methods for intelligent homecare applications.
Wearable Sensors and Smart Environment-based HAR
Smartphone and wearable sensor-based HAR have been widely studied.
Wang et al. compared different inertial sensor configurations in smartphones for activity recognition, emphasizing the trade-offs between sensor placement and recognition accuracy.
Sundholm et al. introduced a smart-mat-based system for exercise recognition, showcasing alternative sensing modalities beyond IMUs.
Rahman et al. proposed DoppleSleep, a contactless sleep monitoring system using Doppler radar, demonstrating the feasibility of non-invasive HAR solutions.
Hevesi et al. used thermal sensor arrays for household activity recognition, showcasing a cost-effective alternative to vision-based approaches.
Similarly, Torres explored distributed smart camera networks for fall detection and localization, demonstrating the role of networked sensing in elderly care.
Machine Learning and Deep Learning in HAR
The use of machine learning and deep learning has significantly improved HAR.
Sepahvand et al. provided a state-of-the-art survey on IMU-based HAR, covering feature extraction, classification models, and real-world applications.
Zhang et al. proposed a multi-level network for HAR using wearable sensors, demonstrating improvements in classification accuracy using hierarchical architectures.
Earlier works such as Khan et al. and Zhu & Sheng explored accelerometer-based HAR and gesture recognition for robot-assisted living, respectively.
Bulling et al. provided a tutorial on using body-worn inertial sensors for HAR, covering key challenges such as sensor placement, feature engineering, and computational efficiency.
Banos et al. examined the impact of window size on HAR performance, showing how temporal segmentation affects classification accuracy.
Kwapisz et al. utilized smartphone accelerometers for activity recognition, pioneering mobile-based HAR.
Catal et al. explored ensemble classifiers for HAR, demonstrating improved accuracy using classifier fusion techniques.
Public Datasets and Benchmarking
Walse et al. compared machine learning algorithms for HAR using the WISDM dataset, identifying optimal classifiers for different activity types.
Vavoulas et al. introduced the MobiAct dataset, enabling standardized benchmarking for HAR research.
The availability of such datasets has facilitated the development and evaluation of robust HAR.
Overall, the literature highlights significant progress in HAR technologies, particularly in wearable sensor-based activity recognition, fall detection, and machine learning-driven recognition.
Future research should focus on improving model generalization, enhancing real-time deployment, and integrating multi-modal sensing for more robust activity recognition.
Material and Methods
Human Activity Recognition Model
Machine learning models are commonly categorized into supervised learning, unsupervised learning, semi-supervised learning, and sometimes reinforcement learning, based on their learning styles.
A supervised learning algorithm learns a mapping function from known (input, result) pairs to predict the outcome.
Unsupervised learning, on the other hand, uses only input data without any labels or outcomes.
Semi-supervised learning uses a large amount of data, only part of which is labeled.
In reinforcement learning, the system selects the best course of action for the present situation in order to maximize performance.
To increase accuracy, this investigation uses supervised learning. The selection of supervised learning is based on several key factors that enhance the reliability and accuracy of our estimations: (1) our study benefits from a well-structured dataset with sufficient labeled samples; (2) supervised learning provides a higher degree of predictive accuracy compared to unsupervised or semi-supervised methods; and (3) the complexity of the task favors supervised learning over other approaches.
As illustrated in Fig. 1, the entire approach for human activity recognition involves three stages: data collection, feature analysis, and activity recognition.
Fig. 1. Human Activity Recognition Using Acceleration Sensor
The streamed acceleration data in three axes (Ax, Ay, and Az) would be divided into L-second segments.
Subsequently, a feature vector would be computed from each of these segments.
Selecting features to feed into the classifier would be the next stage.
Lastly, the features chosen in the previous stage would be used to train classification models.
The trained classifiers would then be used to recognize activities based on the learned model.
Each task's specifics are covered in the sections that follow.
Data Acquisition
To facilitate classification tasks, the suggested system combines a wireless data transmitter/receiver, a low-cost IMU, and an ADXL345 sensor (at a sampling rate of 50 Hz) into the wearable.
Data will be transferred from this device to an Android-powered smartphone.
This technique can greatly reduce the inconvenience that users experience when recording data.
A total of 10 subjects (5 males, 5 females, aged between 18 and 28 years) participated in the study.
All participants provided informed consent before participating in the study.
The research protocol was approved by the Ethics Committee, adhering to ethical guidelines for human data collection and privacy protection.
Table 1 below provides definitions of common human activities from the existing UCI benchmark data.
After building the model, as shown in Fig. 1, and designing the system with hardware components, as shown in the Results section, we applied the algorithm to this device.
The embedded device's microcontroller served as its central processing unit.
Consequently, memory requirements and algorithmic complexity had to be taken into account during real-time recognition.
Table 1 - Labels in HAR
Activity | Description
Standing | Upright on the feet without significant movement
Sitting | Seated with the body weight supported primarily by the buttocks and thighs
Laying | An individual is in a horizontal position
Walking | Moving at a regular and fairly slow pace by lifting and setting down each foot in turn
Walking Upstairs | Coordinated steps where one foot is placed on a higher step followed by the other foot
Walking Downstairs | Moving down stairs by placing one foot on a lower step followed by the other foot
Feature Analysis
Window Size: Many segmentation approaches are available, including energy-based segmentation, rest-position segmentation, sliding window segmentation, segmentation using one modality of sensors to segment data from another modality, and external context sources (Bulling et al.).
The sliding window method of data segmentation was used for this work due to its ease of use and compatibility with real-time applications (Banos et al.).
It has been demonstrated that this method works well for identifying both static activities (like standing and sitting) and periodic activities (like walking).
This method divides the sensor signals into fixed-size time intervals.
These come in two varieties: overlapping windows and non-overlapping windows.
In the former case, consecutive time windows share part of their samples.
The system's recognition performance is greatly affected by the choice of window size.
Therefore, experimenting with various window sizes was necessary to obtain the best value.
Kwapisz et al. demonstrated that a window size of 20 seconds yielded less encouraging findings compared to segmenting the data into 10-second windows without overlap.
To find the optimal size, we evaluated a large range of sizes (up to 25 seconds) with different overlapping intervals.
The findings revealed that a window with a fixed length of 10 seconds and 50% overlap produced good results.
This window was chosen because a window that is too short would not provide adequate information to describe an activity.
On the other hand, a window that is too long might include more than one type of activity in a single data frame.
This window duration was chosen so that each window captures every step of an activity, maintaining the integrity of every activity.
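The segmentation step can be illustrated with a minimal sketch, assuming the 50 Hz sampling rate, 10-second window, and 50% overlap reported in the text. The function name `sliding_windows`, its parameters, and the constant test stream are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one reading: (ax, ay, az) in g


def sliding_windows(
    samples: Sequence[Sample],
    fs_hz: int = 50,          # sampling rate stated in the paper
    window_s: float = 10.0,   # window length stated in the paper
    overlap: float = 0.5,     # 50% overlap stated in the paper
) -> List[Sequence[Sample]]:
    """Split a sample stream into fixed-size, overlapping windows.

    Trailing samples that do not fill a whole window are dropped
    (an assumption; the paper does not say how the tail is handled).
    """
    size = int(round(window_s * fs_hz))                   # samples per window
    step = max(1, int(round(size * (1.0 - overlap))))     # hop between windows
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


# Hypothetical 30-second recording of a motionless sensor (gravity on z).
stream = [(0.0, 0.0, 1.0)] * (30 * 50)
windows = sliding_windows(stream)
```

With these settings each window holds 500 samples and a new window starts every 250 samples, so on a device a fresh classification could be produced every 5 seconds once the first window has filled.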
Table 2 below provides a breakdown of activity observations for testing from the UCI benchmark data.
Table 2 - Benchmarking data frame (columns: Activity, Data frame; rows: Standing, Sitting, Laying, Walking, Walking Upstairs, Walking Downstairs, Total)
Feature selection: The features used to characterize the activities must be carefully chosen to achieve high classification efficiency.
Prior to feature extraction, feature selection was performed.
We examined a portion of each activity's data to choose features.
It became evident that the values of the static states (Standing, Sitting) fell within a narrow range.
Therefore, it would be straightforward to differentiate between static states and separate them from dynamic ones using the two properties of mean and median, which represent data concentration.
Additionally, compared to static states, the range of values for dynamic states like walking was wider.
The range was therefore selected to capture the difference between the highest and lowest values in these states.
The standard deviation (SD) was additionally used to measure the dispersion of the data around the mean, in order to capture the difference from other states.
It is also evident that Standing and Walking had higher ax values than Sitting and Walking, indicating that these activities could be characterized by RMS (root mean square).
We also exploit the acceleration data to evaluate the importance of time-domain features, as shown in Fig. 2.
Fig. 2. Feature Ranking Using Acceleration Sensor
The following 23 features were selected: mean_x, mean_y, mean_z, std_x, std_y, std_z, min_y, max_y, median_z, range_y, rms_x, rms_z, energy_x, energy_z, iqr_y, entropy_x, entropy_z, sma, correlation_yz, skew_y, skew_z, kurtosis_x, kurtosis_y.
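As an illustration of how such time-domain features can be computed for one window, the sketch below covers only the simpler statistics (mean, standard deviation, median, minimum, maximum, range, RMS, and signal magnitude area). It computes them for all three axes, which is a superset of the per-axis selection listed above. The function names are hypothetical, and the use of the population standard deviation and of the mean of |ax| + |ay| + |az| as the SMA definition are assumptions, since the paper does not give its exact formulas.

```python
import math
import statistics
from typing import Dict, Sequence


def axis_features(values: Sequence[float], axis: str) -> Dict[str, float]:
    """Basic time-domain statistics of one acceleration axis in one window."""
    return {
        f"mean_{axis}": statistics.fmean(values),
        f"std_{axis}": statistics.pstdev(values),  # population SD (assumption)
        f"median_{axis}": statistics.median(values),
        f"min_{axis}": min(values),
        f"max_{axis}": max(values),
        f"range_{axis}": max(values) - min(values),
        f"rms_{axis}": math.sqrt(sum(v * v for v in values) / len(values)),
    }


def window_features(
    ax: Sequence[float], ay: Sequence[float], az: Sequence[float]
) -> Dict[str, float]:
    """Feature dictionary for one window of three-axis acceleration data."""
    feats: Dict[str, float] = {}
    for axis, values in (("x", ax), ("y", ay), ("z", az)):
        feats.update(axis_features(values, axis))
    # Signal magnitude area: mean of the summed absolute values (assumed definition).
    feats["sma"] = sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(ax, ay, az)) / len(ax)
    return feats


# Hypothetical four-sample window, just to exercise the functions.
feats = window_features([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0], [1.0, 1.0, 1.0, 1.0])
```

Selecting the subset used by the classifier is then a matter of picking the relevant keys, for example `feats["range_y"]` or `feats["rms_z"]`.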
Table 3 - Definition of features in x, y, and z axes (Meaning column: Mean value; Standard deviation; Median absolute deviation; Largest value in array; Smallest value in array; Interquartile range; Signal entropy; Signal magnitude area; Correlation coefficient between two signals; Skewness of the frequency domain signal; Kurtosis of the frequency domain signal)
After deciding to employ the 23 aforementioned features, we split the dataset into two segments: a training set comprising 60% and a test set comprising 40%.
Next, we use the selected features to explore the training dataset and build the model.
Estimation Method
The classifier is crucial for activity classification.
In addition to having a short training time and high accuracy, classifiers frequently have to satisfy real-time requirements.
The feature set that was extracted in the previous phase will be used as input for the training and classification process.
Classification techniques are widely used in machine learning.
Using the Scikit-learn library, classifiers such as Gradient Boosted Decision Tree (GBDT), Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbor (KNN) will be used to evaluate the performance of the classification model.
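A minimal sketch of such a comparison with Scikit-learn is given below. It uses a synthetic feature matrix (23 features, six classes) as a stand-in for the real data, a 60/40 train/test split as described in the paper, and default or arbitrarily chosen hyperparameters. The stratified split, the random seeds, and all hyperparameter values are assumptions; the paper does not report its settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features: 23 features, 6 activity classes.
X, y = make_classification(
    n_samples=600, n_features=23, n_informative=10, n_redundant=0,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

# 60% training / 40% test, as in the paper; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Hyperparameters are illustrative, not the authors' values.
models = {
    "GBDT": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each classifier on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
```

Because the data here are synthetic, the resulting accuracies say nothing about the paper's results; the sketch only shows the shape of the training and evaluation loop.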
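There are different ways to measure this performance; the most widely used is the confusion matrix.
Accordingly, a few common measurements are used, such as accuracy, sensitivity, PPV (positive predictive value), and NPV (negative predictive value).
The formulas for these measurements are shown below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{PPV} = \frac{TP}{TP + FP}$$
$$\text{NPV} = \frac{TN}{TN + FN}$$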
When an activity takes place and the model accurately predicts it, this is known as a True Positive (TP).
When an activity doesn't occur but the model mistakenly predicts that it did, this is known as a False Positive (FP).
When an activity occurs but the model is unable to anticipate it, this is known as a False Negative (FN).
When an action doesn't occur and the model accurately predicts that it didn't, the result is a True Negative (TN).
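The sketch below shows one way to obtain TP, FP, FN, and TN for each class from a multi-class confusion matrix (one class against the rest) and to derive accuracy, sensitivity, PPV, and NPV from them, together with micro- and macro-averaged values. It assumes rows are observed classes and columns are predicted classes. The function names and the three-class matrix are hypothetical and do not come from the paper.

```python
from typing import Dict, List

Counts = Dict[str, int]


def one_vs_rest_counts(cm: List[List[int]]) -> List[Counts]:
    """Per-class TP/FP/FN/TN; cm[i][j] = observed class i, predicted class j."""
    total = sum(sum(row) for row in cm)
    counts = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                   # observed c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp    # predicted c, observed otherwise
        counts.append({"TP": tp, "FP": fp, "FN": fn, "TN": total - tp - fn - fp})
    return counts


def ratio(num: float, den: float) -> float:
    """Safe division: undefined ratios are returned as NaN."""
    return num / den if den else float("nan")


def indicators(k: Counts) -> Dict[str, float]:
    """Accuracy, sensitivity, PPV, and NPV from one set of counts."""
    tp, fp, fn, tn = k["TP"], k["FP"], k["FN"], k["TN"]
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def micro_average(counts: List[Counts]) -> Dict[str, float]:
    """Pool the counts over all classes, then compute the indicators once."""
    pooled = {key: sum(k[key] for k in counts) for key in ("TP", "FP", "FN", "TN")}
    return indicators(pooled)


def macro_average(counts: List[Counts]) -> Dict[str, float]:
    """Compute the indicators per class, then take their unweighted mean."""
    per_class = [indicators(k) for k in counts]
    return {name: sum(p[name] for p in per_class) / len(per_class) for name in per_class[0]}


# Hypothetical 3-class confusion matrix (rows observed, columns predicted).
cm = [[8, 2, 0], [1, 9, 0], [0, 0, 10]]
counts = one_vs_rest_counts(cm)
micro = micro_average(counts)
macro = macro_average(counts)
```

The micro-average weights every window equally, so large classes dominate it, while the macro-average weights every class equally regardless of its size.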
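When assessing classification on a dataset with significant size differences between classes, two metrics are suitable: the micro-average and the macro-average.
The micro-average is defined by the following formulas, with $TP_c$, $FP_c$, $FN_c$, and $TN_c$ being the TP, FP, FN, and TN of class $c$, respectively:
$$\text{micro-average accuracy} = \frac{\sum_c (TP_c + TN_c)}{\sum_c (TP_c + TN_c + FP_c + FN_c)}$$
$$\text{micro-average sensitivity} = \frac{\sum_c TP_c}{\sum_c (TP_c + FN_c)}$$
$$\text{micro-average PPV} = \frac{\sum_c TP_c}{\sum_c (TP_c + FP_c)}$$
$$\text{micro-average NPV} = \frac{\sum_c TN_c}{\sum_c (TN_c + FN_c)}$$
The macro-average is the average of the per-class values, where $C$ is the number of classes:
$$\text{macro-average accuracy} = \frac{1}{C} \sum_c \text{accuracy}_c$$
$$\text{macro-average sensitivity} = \frac{1}{C} \sum_c \text{sensitivity}_c$$
$$\text{macro-average PPV} = \frac{1}{C} \sum_c \text{PPV}_c$$
$$\text{macro-average NPV} = \frac{1}{C} \sum_c \text{NPV}_c$$
Results and discussion
Results
We utilized GBDT, SVM, RF, and KNN for classification.
The detailed results of RF are presented as follows: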
Table 4 is the confusion matrix that provides a detailed overview of how well the classification model performed for each behavior.
Overall, the model performs well in classifying laying activities, with all instances correctly predicted.
However, there are some misclassifications observed in distinguishing between walking activities (upstairs and downstairs), as well as between sitting and standing.
Table 4 - Confusion matrix for RF model (rows: observed behavior; columns: predicted behavior; classes: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying, plus Total)
Table 5 provides insights into how well the RF model performs for each activity.
Overall, the RF model demonstrates high accuracy and sensitivity across most activities, with perfect scores achieved for laying activities.
However, there are slight variations in performance for activities such as walking downstairs, where sensitivity is comparatively lower.
Table 5 - Performance indicator for each activity using RF model (columns: Accuracy (%), Sensitivity (%), PPV (%), NPV (%); rows: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying)
Table 6 - Performance indicator for each activity using RF model (columns: Micro-average (%), Macro-average (%); rows: Accuracy, Sensitivity, PPV, NPV)
Fig. 3 provides a graph comparing the performance of different machine learning models (GBDT, SVM, RF, KNN, and RNN) on various activities: Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, and Laying.
The y-axis represents accuracy, ranging from 93% to 100%.
We observe that SVM consistently performs the best across all activities, demonstrating robustness and high accuracy.
RNN exhibits high variability, with excellent performance in Laying and Walking-related activities but significant drops in Sitting and Standing.
RF and GBDT maintain relatively stable and high performance across all activities but show noticeable dips in Sitting and Standing.
KNN generally has lower accuracy compared to other models, especially for Sitting and Standing, but performs well in walking-related activities and Laying.
Fig. 3. Performance of Models Using the Accuracy Indicator
Fig. 4 illustrates the performance of models using the sensitivity indicator.
Once again, SVM demonstrates consistently high performance across all activities, suggesting it is well-suited for this type of task, likely due to its effectiveness in high-dimensional spaces and its ability to find optimal hyperplanes for classification.
KNN's high accuracy in dynamic activities like Walking Downstairs and Laying can be attributed to its ability to capture temporal dependencies, making it effective for recognizing activities with distinct sequential patterns.
However, the significant drop in accuracy for Sitting and Standing across all models indicates that these activities present a greater challenge for machine learning algorithms.
This challenge may arise from the subtler and less dynamic nature of these activities, which can be harder to distinguish based on the features used.
Further research could explore additional feature engineering or the incorporation of context-aware models to improve performance in these activities.
Fig. 4. Performance of Models Using the Sensitivity Indicator
Tables 7 and 8 display the evaluation metrics, including Accuracy, Sensitivity, Positive Predictive Value (PPV), and Negative Predictive Value (NPV), for five models.
These metrics offer a comprehensive understanding of each model's performance in classifying physical activities.
SVM consistently outperforms other models across all metrics, demonstrating its robustness and reliability in activity classification tasks.
Its high accuracy, sensitivity, PPV, and NPV indicate that it is well-suited for effectively distinguishing between different activities.
Table 7 - Performance comparison among different models using micro-average methods (columns: classification models, including GBDT, SVM, KNN, and RNN, values in %; rows: Accuracy, Sensitivity, PPV, NPV)
Table 8 - Performance comparison among different models using macro-average methods (columns: classification models, including GBDT, SVM, KNN, and RNN, values in %; rows: Accuracy, Sensitivity, PPV, NPV)
Discussions
Sitting and standing remain particularly challenging to classify due to their high similarity to each other, as both activities exhibit minimal acceleration and angular velocity changes compared to dynamic movements like walking or running.
The current feature set may not sufficiently capture subtle differences, suggesting the need for additional postural stability features or higher-order statistical analysis.
Moreover, sensor placement plays a critical role, as IMUs positioned on the wrist or ankle may not effectively detect posture transitions compared to placements on the chest or lower back, which better align with the body's center of mass.
Incorporating multi-sensor fusion, such as combining IMUs with pressure or depth sensors, could improve classification accuracy.
Additionally, data augmentation techniques and hybrid approaches, such as integrating rule-based heuristics with deep learning models, may further enhance performance.
Addressing these challenges would lead to a more robust and reliable classification of static postures, improving the model's real-world applicability.
Undoubtedly, feature selection is crucial to the recognition process.
Numerous studies select a sizable number of features, such as forty-three in (Kwapisz et al., 2011; Catal et al.) and sixty-four in (Vavoulas et al.), in order to enhance the rate of activity detection.
Nevertheless, our study achieved comparatively high recognition efficiency with a simple and efficient set of 23 statistical features and a suitably chosen sliding window size.
Moreover, the classification procedure in our work is performed in real time on the microcontroller (see Fig. 5).
The classified results are displayed on the terminal, as depicted in Fig. 6.
Fig. 5. Our proposed real-time system
Fig. 6. Real-time observation of HAR on the Smartphone
While our algorithm may be suitable for older people or patients, it was not designed for use by people performing dangerous jobs, such as firefighters, as these subjects' activities frequently involve more intense settings, including massive fires or thick smoke.
Additionally, it appears that the requirement to fix the sensor's location on the waist limits the device's usability.
Conclusions
This study establishes a recognition system with low cost, minimal computation time, and real-time response.
The findings of this study confirm that a simple set of five features can effectively classify everyday behaviors, even when performed by different individuals.
Test results reveal that, with a 10-second window size, the Random Forest (RF) classifier achieves higher overall accuracy compared to GBDT, SVM, and KNN.
For future research, we aim to further develop and validate these initial findings by conducting experiments on more complex activities to assess the accuracy of the proposed model.
We can also integrate data from the time and frequency domains, or extend feature extraction beyond statistical variables in the time domain to encompass other features like those in the frequency domain.
Furthermore, experiments can explore machine learning models such as deep learning methods to find the best solution for the classification problem.
Moreover, we intend to develop and expand systems that automatically classify more complex behaviors, providing more effective support and care for patients or the elderly in specific circumstances.
References