SINERGI Vol.
No.
October 2025: 661-676 http://publikasi.
id/index.
php/sinergi http://doi.
org/10.
22441/sinergi.
Development of a Machine Learning Model for the Classification of Healthy and Diabetic Subjects using Electromyography Signal

Muhammad Fathi Yakan Zulkifli1, Noorhamizah Mohamed Nasir1*, Muhammad Amin Ab Ghani2, Andi Andriansyah3, Mohammad Suhaimi Selomah1, Tay Gaik Tay1, Danial Md Nor1

1 Department of Electronic Engineering, Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, Malaysia
2 Faculty of Technical and Vocational Education, Universiti Tun Hussein Onn (UTHM), Malaysia
3 Electrical Engineering Department, Faculty of Engineering, Universitas Mercu Buana,
Indonesia

Abstract

Diabetes can lead to complications like Diabetic Peripheral Neuropathy (DPN), which impacts muscle and nerve function. Electromyography (EMG) is a standard diagnostic tool for detecting DPN, but its complex signals make analysis time-consuming, delaying detection and treatment. This study aims to develop and compare machine learning models for classifying healthy and diabetic individuals using EMG data collected during dorsiflexion. The Muscle Sensor V3 recorded EMG signals, which were then transformed into time-domain features, namely Root Mean Square (RMS), Mean Absolute Value (MAV), Standard Deviation (SD), and Variance (VAR), for classification purposes. Machine learning models, including K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Artificial Neural Network (ANN), were optimized using Particle Swarm Optimization (PSO).
The analysis revealed that healthy individuals exhibited higher EMG amplitudes than those with diabetes.
Among the models, ANN achieved the highest classification accuracy ([...].44%) compared to SVM ([...]) and KNN ([...].78%).
These results demonstrate the effectiveness of ANN as a reliable classifier for distinguishing between healthy and diabetic individuals, offering a more efficient and accurate approach to EMG data analysis for potential clinical applications.
Keywords: Artificial Neural Network (ANN); Diabetes; Electromyography (EMG); K-Nearest Neighbour (KNN); Particle Swarm Optimisation (PSO); Support Vector Machine (SVM)
Article History:
Received: November 5, 2024
Revised: December 16, 2024
Accepted: January 9, 2025
Published: September 2, 2025

Corresponding Author:
Noorhamizah Mohamed Nasir
Department of Electronic Engineering, Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia (UTHM)
Email: hamizahn@uthm.
This is an open access article under the CC BY-SA license.

INTRODUCTION

Diabetes is one of the most common chronic diseases, and its prevalence grows yearly. It leads to serious problems, such as Diabetic Peripheral Neuropathy (DPN), which affects as many as half of those who have diabetes. DPN can damage nerves and blood vessels in the lower legs, resulting in plantar foot ulcers.
These ulcers, if infected, can progress and potentially spread to the bone or surrounding tissues, leading to severe complications.
Furthermore, DPN can disrupt the essential dorsiflexion movement, which involves lifting the foot upward at the ankle joint during walking. This can lead to gait abnormalities and significantly increase the risk of falls and injuries.
Therefore, the early detection of DPN is crucial for individuals with diabetes to maintain a high quality of life.
This can be achieved by adopting a healthy lifestyle, which includes eating a healthy diet and exercising regularly.

Electromyography (EMG) has emerged as the industry standard for detecting nerve damage in muscles. By measuring and recording the electrical activity of skeletal muscles, EMG provides valuable insights into muscle conditions, including strength and weakness. However, analysis of EMG data can be complex and time-consuming, especially when dealing with large and intricate datasets. These factors cause delays in detecting and treating peripheral neuropathy. Therefore, more advanced methods are needed to improve the efficiency and accuracy of EMG data analysis. In response to this challenge, machine learning classification techniques are one appropriate alternative.
Machine learning, a subfield of Artificial Intelligence (AI), focuses on developing algorithms and statistical models that enable systems to learn, predict, classify, and make decisions. This approach has rapidly advanced classification techniques within the health sciences, enhancing disease detection capabilities. By leveraging these models, healthcare professionals can obtain more detailed and accurate information about muscle function, which can help in the detection and treatment of DPN.

This study contributes to the development and comparison of machine learning models for classifying healthy individuals and those with diabetes, utilizing time-domain features extracted from EMG data recorded during dorsiflexion. Since the choice of machine learning models depends on several factors, such as the complexity and size of the data, three commonly used machine learning models, namely K-Nearest Neighbours (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM), were employed and compared to determine their suitability for the EMG data and the study method.
RELATED WORKS
EMG is a technique that records the electrical activity in muscles and generates complex signals influenced by various factors, such as muscle size, location, and activity level. For example, variations in muscle characteristics, such as size and location, can affect the amplitude and morphology of the signals. Additionally, the level of muscle activity, which includes factors such as intensity and coordination, also affects the characteristics of the signals.
These factors collectively contribute to the complexity of the recorded signals, making their analysis a challenging task.
The complexity of EMG signals can make accurate analysis difficult, especially in settings where specialized knowledge and expertise are not readily available. Traditional methods of analyzing EMG data often involve manual processing, which can be time-consuming and error-prone. For example, some methods involve visual inspection of EMG signals to identify abnormal patterns, which can be subjective. Other methods calculate parameters from EMG signals, such as amplitude, frequency, and duration, which is time-consuming and requires knowledge of signal processing and analysis. These obstacles create a barrier to accurate and efficient detection of muscle and nerve disorders, which is crucial for effective treatment. Thus, there is a need for more efficient and precise methods to analyze EMG data.

Machine learning, a subset of Artificial Intelligence (AI), enables computers to make more accurate predictions and classifications. Moreover, machine learning models can operate without human intervention, using learned knowledge to understand the complexity of situations and adapt accordingly. By processing input data with labeled targets, machine learning models have become a powerful tool for model fitting and data processing in the era of big data. In the healthcare industry, where massive amounts of data are generated, machine learning has demonstrated its effectiveness in generating predictions and facilitating informed decision-making.

Machine learning is broadly classified into two categories: supervised and unsupervised learning. The primary difference between these categories lies in the presence or absence of labeled data within the dataset. Unsupervised learning is typically used for unlabelled data, tackling pattern recognition problems by identifying and grouping data based on shared features. Dimensionality reduction techniques such as Principal Component Analysis (PCA) and clustering models like K-means are commonly used in unsupervised learning. However, the number of categories or clusters and their significance are not predetermined. Instead, these factors need to be determined by analyzing the data itself, often through exploratory data analysis and visualizations.
The goal is to gain insights into the underlying structure of the data without requiring predefined labels or categories.
p-ISSN: 1410-2331 e-ISSN: 2460-1217

Supervised learning requires labeled training datasets with input and output values, as it learns the mapping between input and target values to predict or classify the target value for new input data. These techniques utilize labeled training data to learn patterns and relationships, enabling accurate predictions and classification of new, unseen data. Using supervised learning, various methods have been developed for data classification, such as K-Nearest Neighbors (KNN), Artificial Neural Networks (ANN), and Support Vector Machines (SVM) [..., 29, ...].
In the context of this study, supervised learning is the most suitable approach for achieving accurate classification.
By using relevant medical measurements as input features and assigning class labels such as "healthy" and "diabetic" as the target categories, the supervised learning model can learn and generalize from the labeled data to accurately classify individuals.
This approach not only supports precise classification results but also holds the potential to improve Diabetic Peripheral Neuropathy (DPN) detection and treatment strategies. Applying supervised learning in this context can enhance patient outcomes and drive significant advancements in healthcare.
MATERIAL AND METHOD
The research procedure begins with the development of necessary hardware and software to record and analyze muscle activity signals.
Experiments were first conducted to ensure that the sensors functioned correctly. Once the sensors were confirmed to be working, the experimental protocol was drawn up in order to obtain ethical approval. Before seeking ethical approval, it was essential to establish the necessary hardware and experimental protocol for data collection so that the proposal could be presented to the ethics committee.
This action involved setting up the required sensors, data acquisition devices, and other equipment and developing a detailed experimental protocol that outlined the study design, procedures, and data collection methods.
Ethical approval is crucial to ensure that research and data collection are conducted safely and ethically, with appropriate measures taken to protect the rights and welfare of research subjects.
After obtaining ethical approval, a search was conducted to identify suitable subjects who met the inclusion and exclusion criteria to participate in the study.
This process resulted in the recruitment of 40 subjects who met the criteria, and all of them willingly participated in the study.
Regrettably, this limited number of samples is insufficient for the subsequent classification. To address this challenge, synthetic data were generated from the actual data. This approach involved two cycles of generating synthetic data, resulting in a total of 120 samples for classification. Then, the data underwent a feature extraction phase using four features, namely Variance (VAR), Mean Absolute Value (MAV), Standard Deviation (SD), and Root Mean Square (RMS), to facilitate the classification process. Classification uses three machine learning models: KNN, ANN, and SVM.
The classification process begins by loading the input-output EMG data into the MATLAB workspace.
Each output target sample is labeled as '1' for a healthy sample or '0' for a diabetic sample.
To avoid potential bias in the classification analysis, the input-output data were defined and then randomly arranged. The data were then split randomly, with 70.00% allocated to the training set and 15.00% each to the validation and testing sets. The training set was used to train the classification models, the validation set was employed for fine-tuning and model validation during the training process, and the testing set was utilized to evaluate the models' performance. The hyperparameters of each algorithm are set before training the models. PSO is used to search efficiently for the optimal hyperparameter values. Once the hyperparameters have been optimized, the classification models (KNN, ANN, and SVM) are trained using the training set, incorporating the optimized hyperparameters. Then, the validation set is used to evaluate each model's performance after the hyperparameters have been tuned. The model with the best performance on the validation set is then selected for the final evaluation on the test set.
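The PSO search described above can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: the function name `pso_minimize`, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the toy objective `toy_error` (a stand-in for a validation error over two hyperparameters) are all assumptions chosen for illustration.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; all default settings are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the search bounds, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]          # personal best scores
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_error(params):
    """Hypothetical stand-in for a validation error, minimal at (3, -1)."""
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

best_params, best_error = pso_minimize(toy_error, [(-10.0, 10.0), (-10.0, 10.0)])
```

In a hyperparameter search, `objective` would train a classifier with the candidate values and return its validation error; discrete hyperparameters such as K would need rounding, which this sketch does not handle.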
To evaluate the performance of the models, accuracy, sensitivity, and specificity metrics were utilized. Accuracy measures how often the model correctly classifies the data. Sensitivity measures how well the model identifies positive instances, and specificity measures how often the model correctly identifies negative instances. The discussion and conclusion sections interpret the results and draw conclusions about the models' performance.
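The three metrics defined above follow directly from the confusion-matrix counts. The sketch below is illustrative only: the function name `classification_metrics` and the example labels are made up, and treating the diabetic label ('0') as the positive class is an assumption, since the text does not state which class is positive.

```python
def classification_metrics(y_true, y_pred, positive=0):
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumption: `positive` defaults to 0 (diabetic); the source does not
    say which class is treated as positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),   # share of all samples classified correctly
        "sensitivity": tp / (tp + fn),        # share of positives that were found
        "specificity": tn / (tn + fp),        # share of negatives that were found
    }

# Made-up labels: 1 = healthy, 0 = diabetic.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

The function assumes both classes occur in `y_true`; otherwise one of the denominators is zero.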
Experimental Protocol

The experimental protocol is outlined in Figure 1. Subjects first had to meet specific inclusion and exclusion criteria to be considered eligible for participation.

Figure 1. Block diagram of experiment protocol (blocks: subject inclusion and exclusion criteria, electrode location and contraction calibration, signal recording)

Electrodes were then attached to the correct locations, following the guidelines provided to ensure accurate data collection and to prevent any side effects on the subjects.
Before recording the signal, the sensor was calibrated to ensure that the contractions being recorded accurately reflected the subject's muscle activity.
This step helped to ensure that the data collected was reliable and valid.
Once the sensor had been calibrated, data were recorded according to the established procedure.
Before the day of data collection, the subjects were informed about the purpose of this study during an initial meeting or through a phone call. If the subject agreed to participate, they then selected a date, with their home serving as the data collection site.
Before participating in this study, the subjects were screened to determine if they met the inclusion and exclusion criteria.
Subjects in this study were either healthy (without diabetes) or had diabetes, and were male or female, between the ages of 18 and 65. The selection of gender and age range was based on a previous study, which indicated no correlation between gender or adult age and the level of muscle activity during dorsiflexion. This decision was made to ensure that gender and age did not significantly impact classification accuracy and were not confounding factors in assessing the effects of diabetic neuropathy.
The exclusion criteria of this study were defined to ensure that the study accurately focused on assessing the effects of diabetes.
Subjects with a history of peripheral nervous system disorders, Parkinson's disease, stroke, significant muscle atrophy in their lower limbs, ulcers, or gout were excluded. By excluding individuals with these conditions, the study minimized potential confounding factors and isolated the specific impact of diabetes on the recorded EMG activity. When subjects met the criteria, they were briefed about the experiment procedure and received a written leaflet with study information. The leaflet provides additional information about the study, including risks and other relevant details, to help subjects make an informed decision about their participation. By signing the consent form, the subjects indicated that they had read and understood the information provided and voluntarily gave their consent to participate in the study.
Contraction Calibration

Electrode pads are typically attached to the skin over the muscle of interest. The skin surface was cleaned to reduce resistance before placing the electrodes. The tibialis anterior (TA) muscle has been the focus of most research in this area. It is suitable for this study because diabetic neuropathy often affects the lower leg, and the TA muscle is the primary dorsiflexor of the foot. Dorsiflexion is important in gait because it allows the foot to clear the ground during the swing phase. Additionally, the TA muscle relies heavily on a well-functioning nerve supply for optimal performance, making it more susceptible to nerve damage.

As shown in Figure 2, with the subject seated on a chair, the green electrode is placed a third of the way between the end of the fibula and the end of the medial malleolus to capture EMG signals, and the red electrode is placed on the muscle near the ankle joint to act as ground. The yellow electrode is placed on the bony part of the ankle, which is an inactive section of the body, to serve as a reference, following SENIAM guidelines. When using an EMG sensor to collect muscle signals, contraction calibration is carried out to ensure that the signals recorded by the EMG sensor are accurate and reliable.

Figure 2. Electrode location on the TA muscle (labels: EMG signal electrode, ground, reference)

In EMG, the amplitude is relative and must be related to a reference contraction or calibration contraction. The amplitude of the EMG signal represents the intensity of physiological activities in the motor unit during muscular contraction. Therefore, the calibration procedure must be carefully designed to ensure accuracy, strictly adhering to established protocols and equipment calibration.
After placing the electrode pad on the muscle, the subject was asked to perform free movement for 30 seconds to observe the response of the signal with muscle contraction.
Figure 3 illustrates the muscle signal during contraction calibration, and Table 1 shows the muscle condition at different time points during a 30-second contraction calibration process.
When the muscle is in a resting condition at time 0 seconds, the signal has a lower amplitude.
At 3 seconds, the muscle contracts, resulting in a higher amplitude signal, which aligns with the expected results. These patterns of muscle contraction and signal response persist throughout the entire 30-second duration, as shown in both Table 1 and Figure 3. If the signal does not respond during muscle contraction, it is essential to troubleshoot the system to identify the cause of the problem, including examining the hardware.

Figure 3. Signal of the muscle during contraction

Recording Data

After finishing the contraction calibration, the subject was given a five-minute break. The subject was then instructed to slowly lift the forefoot and toes towards the shin while pressing the heel into the ground, as shown in Figure 4. The toes are lifted as high as possible for one minute, then lowered gradually back to the floor. The recorded signal from this experiment was saved before being uploaded to facilitate the extraction and classification process.
Recruitment of Subjects

A total of 40 volunteer subjects from the Kemaman district were enrolled in the research study, consisting of twenty healthy individuals aged 51.9 ± 6.5 years and twenty subjects with diabetes, aged 54.1 ± 8.[...] years, who had been living with diabetes for a duration of 17.1 ± 12.1 years. The subject data are shown in Table 2. The Muscle Sensor V3 was used to record muscle activity, as it can detect the presence of muscle damage. The TA muscle was chosen as the reference for the signal because it plays a crucial role in dorsiflexion during gait movement. The study collected EMG signals from healthy individuals and those with diabetes, as diabetes can cause nerve damage that affects muscle function. Each subject generated a sample, resulting in a total of 40 samples.
Figure 4. Setup of the equipment using Muscle Sensor V3. The green electrode is placed over the Tibialis Anterior (TA) muscle on the dominant leg. The subject performs dorsiflexion for one minute. The subject then lowers the foot to rest.
Table 2. Demographic data for the subjects (columns: Male/Female, Age (Years), BMI (kg/m²), Duration of diabetes (Years), HbA1c (%); rows: Healthy, Diabetics)

Table 1. Muscle condition during contraction calibration in 30 seconds (column: Time (s))

The Generation of Synthetic Data

Although 40 samples were collected from 40 subjects, this sample size was not sufficient to carry out accurate classification, mainly because it had to be divided into three subsets (training, testing, and validation), as noted by previous studies. A synthetic method was employed to address this limitation, which involved generating additional samples using a combination of existing data and computer simulation.
This approach increases the dataset size and reduces the impact of the limited sample size, ensuring greater accuracy and generalizability of the results.
The synthetic data in this study was generated from the actual data using Random White Gaussian Noise (RWGN).
Figure 5 shows the 1-D EMG signal of a diabetic subject, with amplitude (in mV) plotted over time (in seconds).
RWGN is often characterized using the concept of Signal-to-Noise Ratio (SNR), as SNR measures how well a signal can be distinguished from background noise.
A high SNR indicates that the signal is strong relative to the noise, while a low SNR indicates that the noise is strong relative to the signal.
Therefore, selecting an appropriate SNR is crucial for accurately reflecting the original signal. This study selected an SNR of 30 dB, balancing the importance of accurately reflecting the original signal against the need to minimize the impact of noise, as suggested in the literature. The process of creating synthetic data began by loading the collected data into the MATLAB workspace.

Figure 5. 1-D Signal of a diabetes subject

In the synthetic process, noise was added to each sample of actual EMG data for two cycles, resulting in two synthetic samples for each actual sample. As a result, the 40 actual samples generated 80 synthetic samples, giving a total of 120 samples to support the classification procedure. The result of this process was then stored in a .mat file to be used for subsequent processing (feature extraction).
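The augmentation step can be sketched as follows, assuming the noise power is scaled to the measured power of each recording so that the result has the stated 30 dB SNR. The authors' MATLAB routine is not shown in the text, so the function names `add_white_gaussian_noise` and `augment`, the seeding, and the toy recordings are hypothetical.

```python
import math
import random

def add_white_gaussian_noise(signal, snr_db=30.0, rng=None):
    """Return a noisy copy of `signal` with approximately the requested SNR.

    Assumption: noise power is derived from the measured signal power,
    P_noise = P_signal / 10**(snr_db / 10).
    """
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)          # std. dev. of zero-mean Gaussian noise
    return [x + rng.gauss(0.0, sigma) for x in signal]

def augment(recordings, cycles=2, snr_db=30.0, seed=0):
    """Keep the actual recordings and add `cycles` noisy copies of each."""
    rng = random.Random(seed)
    out = list(recordings)
    for _ in range(cycles):
        out.extend(add_white_gaussian_noise(r, snr_db, rng) for r in recordings)
    return out

# Toy stand-ins for 40 recorded EMG signals (not real data).
toy_rng = random.Random(1)
actual = [[toy_rng.uniform(-1.0, 1.0) for _ in range(5000)] for _ in range(40)]
dataset = augment(actual)                   # 40 actual + 80 synthetic recordings
```

With 40 recordings and two cycles, `dataset` holds 120 signals, matching the sample count reported above.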
The models were evaluated using training, validation, and testing splits within the same dataset, rather than being tested on an entirely separate external dataset. This can lead to overoptimistic performance metrics, since the synthetic data share statistical properties with the training data. Without validation on external datasets, the study's conclusions about model accuracy, sensitivity, and specificity may not generalize well to different populations or settings.
Feature Extraction

Several studies have demonstrated that time-domain techniques can be utilized to detect muscular effort and fatigue. This is because the time-domain parameters of EMG signals are easily measurable, and no transformation of the signal is required. While each feature of the signal has a unique character, using multiple features as input to the classifier can improve the accuracy of recognizing the EMG patterns. Besides, a combination of features can capture more information about the EMG signal and provide more accurate classifications. This approach is a common practice in EMG signal processing, as it enables a more comprehensive representation of the signal and can enhance the performance of the classification system. The most used features for the time domain are Mean Absolute Value (MAV), Root Mean Square (RMS), Variance (VAR), and Standard Deviation (SD), which were employed in these studies.
MAV is the average of the absolute values of a set of numbers. RMS is the square root of the average of the squares of a set of numbers. VAR measures the spread of a set of numbers, and SD is the square root of the variance. SD is another measure of spread, but it is expressed in the same units as the original data.
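The four definitions above translate into a few lines of code. The following is a minimal sketch, not the authors' MATLAB script: the function name `time_domain_features` is hypothetical, and the use of the sample variance (dividing by n - 1) is an assumption, since the text does not state which normalization was used.

```python
import math

def time_domain_features(x):
    """Compute MAV, RMS, VAR, and SD for one signal segment.

    Assumption: VAR is the sample variance (divides by n - 1).
    """
    n = len(x)
    mav = sum(abs(v) for v in x) / n                   # mean absolute value
    rms = math.sqrt(sum(v * v for v in x) / n)         # root mean square
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)    # spread around the mean
    sd = math.sqrt(var)                                # same units as the signal
    return {"MAV": mav, "RMS": rms, "VAR": var, "SD": sd}

# Toy segment (not real EMG data).
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Each recording then contributes one four-element feature vector to the classifier input.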
Previous studies predominantly focus on time-domain or frequency-domain features. The potential of hybrid feature sets or advanced feature extraction techniques, such as wavelet transforms or deep feature learning, remains underexplored.

Machine Learning Classification

The classification flow process for KNN, ANN, and SVM is illustrated in Figure 6. The classification begins by loading the extracted input-output EMG data into the MATLAB workspace. The input consists of four features (MAV, RMS, VAR, and SD), and the output target is binary, with '1' representing the healthy label and '0' representing the diabetes label. To avoid potential bias in classification analysis, the input-output data were first defined, and the arrangement was then randomized. To evaluate the performance of the machine learning model accurately on new, unseen data, it was necessary to split the collected data into three subsets: training, validation, and testing. The training set was used to train the model and learn its parameters. The validation set was used to fine-tune the model's parameters and prevent overfitting. The testing set was used to evaluate the model's performance on data that it had not seen before. A commonly employed split ratio in practice is 70.00%, 15.00%, and 15.00% for training, testing, and validation, respectively. The hyperparameters of each classification model are first set and optimized using PSO. Once the hyperparameters had been optimized, the classification models (KNN, ANN, and SVM) were trained. Accuracy, sensitivity, and specificity were employed to assess the performance of the models.
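The shuffle-and-split step described in this section can be sketched as below. The function name `shuffle_and_split`, the fixed seed, and the integer stand-ins for samples are assumptions for illustration; the study itself performed this step in MATLAB.

```python
import random

def shuffle_and_split(samples, train=0.70, val=0.15, seed=0):
    """Randomly partition samples into training, validation, and testing sets."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)                         # random arrangement to avoid ordering bias
    n_train = round(train * len(samples))
    n_val = round(val * len(samples))
    train_set = [samples[i] for i in idx[:n_train]]
    val_set = [samples[i] for i in idx[n_train:n_train + n_val]]
    test_set = [samples[i] for i in idx[n_train + n_val:]]   # remainder
    return train_set, val_set, test_set

# 120 samples (40 actual + 80 synthetic), represented here by integers.
train_set, val_set, test_set = shuffle_and_split(list(range(120)))
```

For 120 samples this yields 84 training, 18 validation, and 18 testing samples.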
Architectures of the Machine Learning Models (KNN, ANN, SVM)

The capacity of a machine learning model to analyse data, identify patterns, and provide precise classifications is determined by its architecture. When utilizing EMG signals to categorize healthy and diabetic people, the architecture is essential in determining the model's generalisability and interpretability. Figure 7 illustrates the KNN classification process for two classes, specifically for the cases when K = 1 and K = 3. Figure 7(a) displays the closest known (-) sample to sample X, which is used to categorize sample X. The category of sample X is therefore assigned based on the class of the closest neighbour (-).
Figure 6. Flowchart of the classification process (steps: Start; load data in MATLAB workspace for classification (KNN, ANN, and SVM); randomise and split into training, testing, and validation sets (70%, 15%, and 15%, respectively); set the hyperparameters; PSO optimises the hyperparameters; train the model; display results of accuracy, sensitivity, and specificity; End)
In Figure 7(b), two nearest (+) samples and one (-) sample are considered for categorizing sample X. The majority class among these three neighbours (+) is used to classify sample X. This approach employs a majority voting mechanism, so the classification reflects the consensus of the closest neighbours.
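The K = 1 and K = 3 cases can be reproduced with a small sketch of majority-vote KNN. The function name `knn_predict`, the use of Euclidean distance, and the toy coordinates are assumptions chosen so that the nearest neighbour is (-) while the next two are (+), mirroring the situation described for Figure 7.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    Assumption: Euclidean distance (math.dist); other metrics could be swapped in.
    """
    dists = sorted((math.dist(p, x), label) for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy layout around sample X at the origin (not data from the study).
train_X = [(1.0, 0.0), (1.5, 0.0), (0.0, 1.5), (3.0, 3.0)]
train_y = ["-", "+", "+", "-"]
sample_x = (0.0, 0.0)

label_k1 = knn_predict(train_X, train_y, sample_x, k=1)   # nearest neighbour only
label_k3 = knn_predict(train_X, train_y, sample_x, k=3)   # majority of three
```

With K = 1 the single nearest (-) point decides the class, whereas with K = 3 the two (+) points outvote it.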
The architecture of an ANN typically consists of an input layer, a hidden layer, and an output layer, as shown in Figure 8.
The input layer receives data from external sources, then transmits it to the hidden layer.
The hidden layer processes the data and sends it to the output layer, using weights assigned to each node.
Figure 7. The KNN illustration in classifying different numbers of K

Table 3. Hyperparameters for KNN optimization
Hyperparameter | Type and value
Distance metric | Euclidean, Manhattan, and Minkowski
Number of neighbours (K) | 1 to 10

Figure 8. Structure of ANN

The SVM classification model constructs hyperplanes in high-dimensional space to separate cases with different class labels, as shown in Figure 9. The classification of Class A and Class B demonstrates how SVM constructs a hyperplane to separate instances according to their classes. This visual representation offers insight into the process of maximizing the margin, or distance, between the two classes. By maximizing the margin, SVM aims to minimize the risk of misclassification and enhance the classifier's ability to classify unseen data.
KNN Hyperparameter Optimization

The set of hyperparameters for KNN is shown in Table 3, which is incorporated into the KNN classification MATLAB script. The hyperparameters used for optimization in KNN were the distance metric and the number of neighbors (K). The distance metric is a function that measures the distance between two data points in the feature space. The distance metrics used in this study were Euclidean distance, Manhattan distance, and Minkowski distance. K determines the number of nearest neighbours to consider when classifying a new data point. The optimal value of K depends on the characteristics of the data and the problem being solved. However, as a general guideline, the value of K can range from 1 to the square root of the number of samples in the training dataset. For this study, the dataset comprised 120 samples, with a training set size of 70.00%, so the training set consisted of 84 samples. Since the square root of 84 is approximately 9.1652, a reasonable range for the value of K in this case is between 1 and 10.
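The arithmetic behind the K range can be checked in a few lines. This is only a worked illustration of the square-root guideline; rounding the bound up to the next integer is an assumption consistent with the stated range of 1 to 10, and the variable names are hypothetical.

```python
import math

n_samples = 120                               # 40 actual + 80 synthetic samples
n_train = round(0.70 * n_samples)             # 70% training share
k_upper = math.ceil(math.sqrt(n_train))       # sqrt(84) is about 9.165, rounded up
k_candidates = list(range(1, k_upper + 1))    # values of K offered to the optimizer
```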
ANN Hyperparameter Optimization

In this study, a feedforward neural network was selected due to its suitability for a binary classification problem with two output classes ('1' for healthy and '0' for diabetes). By leveraging its architecture, a feedforward neural network was able to discern these two classes by learning the underlying patterns present in the data. Additionally, to address the challenges posed by relatively small datasets and the need for fast convergence and improved accuracy, the Levenberg-Marquardt backpropagation algorithm was employed. This combined approach ensured robust and accurate classification results in the study. Table 4 shows the set of hyperparameters for the ANN, which are integrated into the ANN classification MATLAB script (Appendix). The hyperparameters used for optimization in this research were the number of hidden layers and neurons, as well as the learning rate. The number of hidden layers and neurons is a crucial hyperparameter of an ANN, as it significantly affects the model's ability to learn complex patterns in the data. For optimal performance, this study chose 1 to 2 hidden layers, considering the relatively small size of the dataset.
This choice was made to avoid overfitting and ensure efficient utilization of available data.
Table 4. Hyperparameters for ANN optimization
Hyperparameter             Type and value
Number of hidden layers    1 to 2
Number of neurons          1 to 10
Learning rate              0.1, 0.01, 0.001, and 0.
Figure 9. Hyperplane in 2-dimensional space
Besides, the number of neurons in each hidden layer was set to range from 1 to 10.
This selection was based on the understanding that a moderate number of neurons is generally suitable for simple problems or datasets of limited size. It allowed the neural network to capture meaningful patterns while avoiding the unnecessary complexity of an excessive neuron count.
Additionally, the learning rate is another crucial hyperparameter that significantly impacts the performance of the ANN during training.
The learning rate governs the magnitude of weight updates made during the training process.
For a fixed learning rate, a common practice is to start with a value that is not too small, such as 0.1, and then lower it exponentially to obtain smaller constant values: 0.01, 0.001, and 0.
This gradual reduction in the learning rate enables more refined adjustments to the network's weights, promoting convergence and enhancing the training process.
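To illustrate how the learning rate scales each weight update, the following minimal sketch runs plain gradient descent on a one-parameter toy loss. This is an assumption made for illustration: the study trained its network with Levenberg-Marquardt backpropagation in MATLAB, and the function `descend` and the loss `w**2` are hypothetical.

```python
def descend(learning_rate, steps=50, w0=1.0):
    """Run plain gradient descent on loss(w) = w**2 and return the final weight."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of w**2 with respect to w
        w = w - learning_rate * grad   # the step size is proportional to the learning rate
    return w

# Compare three of the fixed learning rates discussed in the text.
final_weights = {lr: descend(lr) for lr in (0.1, 0.01, 0.001)}
for lr, w in final_weights.items():
    print(f"learning rate {lr}: weight after 50 steps = {w:.4f}")
```

On this toy loss the largest rate moves closest to the minimum at zero within the same number of steps, while the smaller rates make finer but slower adjustments.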
SVM Hyperparameter Optimization
The set of hyperparameters for the Support Vector Machine (SVM) is shown in Table 5.
These hyperparameters are included in the MATLAB script for SVM classification.
Several hyperparameters must be considered, including the type of kernel function, the regularization parameter (C), and the kernel parameter (gamma) for the Radial Basis Function (RBF).
An important aspect of SVM optimization is the choice of the kernel function, which determines how input data is mapped into a higher-dimensional space. The commonly used kernel functions in SVM are linear, polynomial, and RBF.
The regularization parameter is crucial in balancing model complexity and generalization. The optimal value of C typically falls within the range of 0.01 to 100.
However, it may be necessary to experiment with different values to find the best value for a specific dataset.
A previous study suggested that the following values of C are good starting points for experimentation: 0.01, 0.1, 1, 10, and 100.
Table 5. Hyperparameters for SVM optimization
Hyperparameter    Type and value
Type of kernel    Linear, Polynomial, and Radial Basis Function
C                 0.01, 0.1, 1, 10, and 100
Gamma             0.0001, 0.001, 0.01, 0.1, 1, and 10
The gamma parameter controls the width of the RBF kernel.
Typically, the optimal range for gamma is between 0.0001 and 10.
However, the optimal values of these hyperparameters may vary depending on the specific problem being addressed and may require experimentation to determine. Some suggested values are 0.001, 0.01, 0.1, 1, and 10 [59, 60, 61].
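The SVM search space in Table 5 can be enumerated with a short sketch. It is written in Python for illustration only, and it assumes that gamma is varied only for the RBF kernel, as the text describes gamma as the RBF kernel parameter. The function name `svm_search_space` and the dictionary keys are hypothetical.

```python
from itertools import product

# Candidate values taken from Table 5.
kernels = ["linear", "polynomial", "rbf"]
c_values = [0.01, 0.1, 1, 10, 100]
gamma_values = [0.0001, 0.001, 0.01, 0.1, 1, 10]

def svm_search_space():
    """List every candidate configuration; gamma is attached to RBF only (assumption)."""
    configs = []
    for kernel, c in product(kernels, c_values):
        if kernel == "rbf":
            for gamma in gamma_values:
                configs.append({"kernel": kernel, "C": c, "gamma": gamma})
        else:
            configs.append({"kernel": kernel, "C": c})
    return configs

space = svm_search_space()
print(len(space))  # 5 linear + 5 polynomial + 30 RBF configurations
```

Under this assumption the space holds 40 configurations, which is small enough to search exhaustively and gives a reference point for what an optimizer such as PSO has to explore.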
Performance Measurement
Training, testing, and validation are stages in machine learning, while accuracy, sensitivity, and specificity are metrics used to evaluate a model's performance at these stages.
Accuracy, sensitivity, and specificity are commonly used to evaluate the performance of models, as they are particularly relevant in the medical profession.
These metrics are used to evaluate the quality of a classification and are often used to compare the performance of different models in performing the same task.
The calculation of accuracy, sensitivity, and specificity requires a confusion matrix.
Using the MATLAB software, the confusion matrix can be calculated automatically.
Table 6 shows the design of the confusion matrix.
In this study, the positive class consisted of EMG data from healthy individuals without diabetes, and the negative class consisted of EMG data from individuals with diabetes. According to the confusion matrix, accuracy, sensitivity, and specificity are calculated as follows:
Accuracy: Accuracy is the frequency with which the model correctly classifies the outcome.
It is defined as the ratio of the number of correct classifications to the total number of classifications made:
\[ \text{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP} \times 100\% \]
Specificity: Specificity measures the proportion of actual negative cases (here, individuals with diabetes) that the model correctly identifies.
It is defined as the ratio of true negative cases to the total number of actual negative cases:
\[ \text{Specificity} = \frac{TN}{TN + FP} \times 100\% \]
Table 6. Confusion matrix design
                       Actual Positive        Actual Negative
Prediction Positive    TP (True Positive)     FP (False Positive)
Prediction Negative    FN (False Negative)    TN (True Negative)
Sensitivity: Sensitivity measures the proportion of actual positive cases (here, healthy individuals) that the model correctly identifies.
It is defined as the ratio of true positive cases to the total number of actual positive cases:
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \times 100\% \]
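The three metrics can be computed directly from the four confusion-matrix counts. The sketch below is written in Python for illustration only (the study used MATLAB), and the counts passed in are hypothetical values chosen for easy arithmetic, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) in percent from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total    # correct classifications over all classifications
    sensitivity = 100.0 * tp / (tp + fn)    # correctly identified actual positives
    specificity = 100.0 * tn / (tn + fp)    # correctly identified actual negatives
    return accuracy, sensitivity, specificity

# Hypothetical counts for illustration only.
acc, sens, spec = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.2f}%  sensitivity={sens:.2f}%  specificity={spec:.2f}%")
```

The function assumes each class has at least one sample; a class with no samples would make the corresponding denominator zero.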
To understand the performance of machine learning models, it is essential to evaluate their ability to generalize from the training data to unseen data.
In machine learning, overfitting occurs when a model learns the training data too well, resulting in high accuracy on the training dataset but low accuracy on the testing dataset.
On the other hand, underfitting occurs when a model fails to learn the training data sufficiently, resulting in low accuracy on both the training and testing datasets.
RESULTS AND DISCUSSION
During the dorsiflexion movement, EMG signal recordings were obtained from the lower part of the TA muscle using the Muscle Sensor V3 device for a duration of one minute.
The muscle activity signals for both classes, the healthy individuals and the diabetes group, are separately averaged as depicted in Figure 10.
As seen in Figure 10, the maximum amplitude values for the healthy and diabetic signals are 278 mV and 149 mV, respectively.
This vast difference in amplitude is because patients with diabetes usually display consistently lower and more variable motor unit discharge frequency than healthy individuals, indicating the presence of neuromuscular disease and potential peripheral neuropathy associated with diabetes.
Based on the analysis of the EMG signal, it is evident that healthy individuals exhibit higher amplitudes than those with diabetes.
This observation aligns with the understanding that individuals with diabetes may experience nerve damage, which can significantly impact the amplitude of the EMG signals.
The variations in muscle activation patterns and the presence of nerve damage contribute to the lower amplitudes observed in the EMG signals from individuals with diabetes compared to those from healthy individuals. This is supported by the findings of other relevant EMG studies, which also show higher amplitude in healthy subjects than in those with diabetes. Taken together, the evidence from this study supports the use of the Muscle Sensor V3 for measuring muscle electrical activity.
Figure 10. Comparison between healthy and diabetic signals.
In this study, three machine learning models were employed for classifying EMG data described by four extracted features: MAV, RMS, VAR, and SD.
Their performance was evaluated by measuring their accuracy.
Evaluating the models on the testing dataset gives a more realistic estimate of how well they can perform in real-world scenarios.
Furthermore, accuracy is used for comparing different machine learning models because it summarizes how well a model performs across all classes.
Table 7 presents a comparison of the sensitivity, specificity, and accuracy for KNN, ANN, and SVM in the testing set.
Table 7. Comparison of the performance of KNN, ANN, and SVM in the testing set
Model    Sensitivity (%)    Specificity (%)    Accuracy (%)
KNN
ANN
SVM
The results highlight the superior performance of the ANN model, which outperforms both the KNN and SVM models in terms of sensitivity, specificity, and overall accuracy. Its ability to accurately identify healthy and diabetic samples, combined with its high accuracy rate, supports its suitability for the given classification task.
There could be several reasons why ANN performed better than KNN and SVM in this study.
Firstly, the choice of features may have favored ANN over KNN and SVM.
These techniques can produce a large number of features, which could have been a challenge for KNN and SVM.
ANN is known to perform well with high-dimensional feature spaces, which aligns with the use of RMS, MAV, SD, and VAR as feature extraction techniques.
KNN, on the other hand, may struggle when the number of features is large.
SVM performance depends heavily on the selection and tuning of the kernel function and its parameters, especially in complex datasets with high-dimensional features.
Secondly, an ANN is a machine learning model capable of modeling complex relationships between input and output variables.
It can learn from data and make accurate classifications by adjusting the weights and biases of its neurons during the training process.
ANN is beneficial for handling large datasets and dealing with noisy or incomplete data.
It is the most complex of the three models, whereas KNN and SVM are comparatively straightforward.
KNN is a non-parametric model that classifies a data point according to the majority class of its nearest neighbours in the feature space.
SVM constructs a hyperplane or a set of hyperplanes in a high-dimensional space to separate the classes.
For KNN, the best hyperparameters identified in the research were Euclidean distance and K = 1.
However, the relatively low accuracy observed in the testing set compared to the training set is probably due to the choice of K = 1.
In the KNN algorithm, when K = 1, the model classifies each data point solely based on its closest neighbor, without considering any other neighbors. This can result in overfitting, where the model becomes overly reliant on the training data and fails to generalize effectively to new data.
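The effect of K = 1 can be seen on a small synthetic example. The sketch below is written in Python for illustration only; the points, labels, and function names are hypothetical and are not the study's EMG data. One training point carries a deliberately "noisy" label inside the other class's cluster.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Synthetic 2-D points: two clusters plus one mislabeled point near the class-0 cluster.
train_x = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
           (3.0, 3.0), (3.1, 2.9), (2.9, 3.2),
           (1.1, 1.05)]
train_y = [0, 0, 0, 1, 1, 1, 1]

def training_accuracy(k):
    """Fraction of training points that the model classifies correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(train_x, train_y))
    return hits / len(train_x)

new_point = (1.08, 1.04)  # unseen point inside the class-0 cluster
print(training_accuracy(1), knn_predict(train_x, train_y, new_point, 1))
print(training_accuracy(3), knn_predict(train_x, train_y, new_point, 3))
```

With K = 1 every training point is its own nearest neighbour, so the training accuracy is perfect, yet the unseen point inherits the noisy label. With K = 3 the vote of the surrounding class-0 points outweighs the single noisy neighbour.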
The results of the ANN indicate that it was able to effectively classify diabetes and healthy samples with hyperparameters of 1 hidden layer, nine neurons, and a learning rate of 0.
The model achieved high levels of accuracy, specificity, and sensitivity in all three stages (training, testing, and validation), indicating that it was able to generalize well to new, unseen data.
In the case of SVM, the model performance was optimised by identifying the best hyperparameters, which were found to be a linear kernel function and a C value of 0.
The results demonstrated that the SVM model also achieved high levels of accuracy, specificity, and sensitivity, showcasing its effectiveness in accurately classifying the data.
However, the ANN model achieved even better performance, with slightly higher levels of accuracy, specificity, and sensitivity. Lastly, it is essential to acknowledge that the hyperparameters and training process for each model may have varied based on the study dataset, potentially influencing their respective performance. ANN, KNN, and SVM have hyperparameters that need to be carefully tuned to achieve optimal performance before any measurements are made.
In this study, PSO was used to optimize the hyperparameters for each model. PSO may have found a set of ANN hyperparameters that was better suited to the task at hand, which could have contributed to ANN's superior performance.
The primary objective of this study is to utilize machine learning techniques to automatically classify individuals into two categories, healthy and diabetic, based on a predefined set of features: Mean Absolute Value (MAV), Root Mean Square (RMS), Variance (VAR), and Standard Deviation (SD).
The study assessed the effectiveness of three models (ANN, SVM, and KNN) in classifying individuals with diabetes and healthy individuals based on EMG data described by these four features.
The results showed that the ANN model outperformed the other two models, achieving a testing accuracy of 94.44%, which indicates its ability to differentiate between the two groups. The SVM model achieved an accuracy of 88.89%, while the KNN model had the lowest accuracy of 77.78%, indicating its difficulty in accurately classifying the data.
These results are consistent with previous findings in the literature, where ANN models have often demonstrated higher performance in biomedical signal classification tasks.
For example, Sadeghi et al. reported that ANN outperformed traditional classifiers in real-world problem settings, while Sarker emphasized the robustness of neural networks in complex pattern recognition tasks.
Similarly, a study by Burns et al. found that ANN-based classifiers demonstrated superior accuracy in upper limb EMG signal classification compared with conventional machine learning models.
CONCLUSION
In conclusion, this study developed and evaluated several machine learning models for classifying Electromyography (EMG) data obtained from healthy individuals and those with diabetes. The Muscle Sensor V3 was used to record the tibialis anterior (TA) muscle EMG during dorsiflexion movement.
In the study, healthy EMG signals exhibited higher maximum amplitudes (278 mV) compared to diabetic signals (149 mV).
This marked difference is consistent with neuropathy-induced dysfunction in people with diabetes.
These results agree with previous research and suggest that the Muscle Sensor V3 is a valuable technology for assessing neuropathy-induced muscle dysfunction, with potential clinical application in monitoring diabetic patients.
The acquired EMG data were converted into numerical features through the feature extraction process, with a specific emphasis on time-domain features, i.e., Root Mean Square (RMS), Mean Absolute Value (MAV), Standard Deviation (SD), and Variance (VAR).
This fundamental step condenses the signal while preserving its relevant information, producing valid data structures for classification. In addition, by transforming the input variables in this way, the accuracy and performance of the classification models may be significantly enhanced, as demonstrated in this study. Therefore, the selection and validation of feature extraction techniques are of great significance in ensuring that the extracted features sufficiently reflect the relevant information for the classification task.
The time-domain feature set extracted in this study may be used as the basis for ML models that distinguish between normal subjects and diabetes patients.
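The four time-domain features can be computed from a window of EMG samples as in the sketch below. It is written in Python for illustration only (the study used MATLAB). The sample values are hypothetical, and the use of the sample variance (dividing by n - 1) is an assumption, since the text does not state which normalization was used.

```python
import math

def time_domain_features(window):
    """Compute MAV, RMS, VAR, and SD for one window of EMG samples."""
    n = len(window)
    mean = sum(window) / n
    mav = sum(abs(x) for x in window) / n                  # Mean Absolute Value
    rms = math.sqrt(sum(x * x for x in window) / n)        # Root Mean Square
    var = sum((x - mean) ** 2 for x in window) / (n - 1)   # sample variance (assumed n - 1)
    sd = math.sqrt(var)                                    # Standard Deviation
    return {"MAV": mav, "RMS": rms, "VAR": var, "SD": sd}

# Hypothetical samples for illustration only.
features = time_domain_features([1.0, -2.0, 3.0, -4.0])
print(features)
```

Each recording window is reduced to a four-element feature vector, which is the form of input a KNN, SVM, or ANN classifier would then receive.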
Hyperparameter optimization is one of the most critical stages for achieving good performance in a machine learning model. Hyperparameter identification is crucial, as the hyperparameter set can significantly impact the model's ability to distinguish accurately between normal subjects and patients with diabetes.
Particle Swarm Optimisation (PSO) was used in this work, mainly because of its ability to automatically search a large space of hyperparameters, which reduces the time and effort needed to find the best-performing combination.
PSO was applied to tune the hyperparameters of the machine learning models, thereby improving their performance and the accuracy of classifying people with and without diabetes.
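The search mechanism of PSO can be sketched as follows. This is a minimal, generic global-best PSO written in Python for illustration only, not the authors' MATLAB implementation. The inertia and acceleration coefficients, the swarm size, and the continuous toy objective `toy_error` are assumptions; in the study the objective would be a classifier's validation error over its hyperparameters.

```python
import random

def pso(objective, bounds, n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal best and the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep inside bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_error(x):
    """Hypothetical stand-in for a validation error, minimized at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

best, best_val = pso(toy_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

For discrete hyperparameters such as K or the number of neurons, one common adaptation is to round each particle's position to the nearest allowed value before evaluating the objective; whether the study did this is not stated.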
Finally, the experiment compared the performance of the K-Nearest Neighbour (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM) models in classifying EMG data from healthy individuals and individuals with diabetes.
The ANN model achieved the highest accuracy (94.44%) in distinguishing between normal subjects and diabetes patients, followed by the SVM and KNN models, which achieved accuracies of 88.89% and 77.78%, respectively, with the approach used in this study.
The impact of this finding is that ANN-based classifiers may be utilized in clinical settings to aid clinicians in accurately and timely diagnosing and monitoring neuropathy in the diabetic population.
By providing early detection and continuous monitoring, these classifiers can contribute to timely interventions and personalized treatment plans, ultimately improving patient outcomes and quality of life.
ACKNOWLEDGMENT
This research was made possible thanks to funding from the Tier 1 Grant Scheme (Code:
Research Management Centre at Universiti Tun Hussein Onn Malaysia.
Communication of this research is made possible through monetary assistance from Universiti Tun Hussein Onn Malaysia and the UTHM Publisher's Office via Publication Fund E15216.
Ethical Declarations
Ethical approval, obtained before collecting data from human subjects, is crucial to ensure that subjects are treated respectfully, their privacy is protected, and the experimental protocol is applied appropriately.
The University's Research Ethics Subcommittee reviewed and approved the study, and this study referred to the University of Malaya Medical Centre (UMMC) guideline.
REFERENCES