Teja E. Tju, et al.: Hand Sign Virtual Reality Data... (October 2.
Hand Sign Virtual Reality Data Processing Using Padding Technique
Teja E. Tju1, Julaiha P. Anggraini1, and Muhammad U. Shalih1
1Faculty of Information Technology, Universitas Budi Luhur, Jakarta Selatan, Indonesia
Corresponding author: Teja E. Tju, e-mail: teja.endraengtju@budiluhur.
ABSTRACT This study focuses on addressing the challenges of processing hand sign data in Virtual Reality environments, particularly the variability in data length during gesture recording.
To optimize machine learning models for gesture recognition, various padding techniques were implemented.
The data was gathered using the Meta Quest 2 device, consisting of 1,000 samples representing 10 American Sign Language hand sign movements.
The research applied different padding techniques, including pre- and post-zero padding as well as replication padding, to standardize sequence lengths.
Long Short-Term Memory networks were utilized for modeling, with the data split into 80% for training and 20% for validation.
An additional 100 unseen samples were used for testing.
Among the techniques, pre-replication padding produced the best results in terms of accuracy, precision, recall, and F1 score on the test dataset.
Both pre- and post-zero padding also demonstrated strong performance but were outperformed by replication padding.
This study highlights the importance of padding techniques in optimizing the accuracy and generalizability of machine learning models for hand sign recognition in Virtual Reality.
The findings offer valuable insights for developing more robust and efficient gesture recognition systems in interactive Virtual Reality environments, enhancing user experiences and system reliability.
Future work could explore extending these techniques to other Virtual Reality interactions.
KEYWORDS Recurrent Neural Networks (RNN), Sequential Data, Signal Processing.
I. INTRODUCTION
The development of Virtual Reality (VR) technology has opened new opportunities in various fields, including education, rehabilitation, and the development of applications for individuals with special needs.
One promising application is hand sign interpretation for communication with individuals who have speech disabilities.
However, processing hand sign data from VR devices faces several challenges, particularly the variation in data length generated during the recording process.
Therefore, in-depth research is needed to optimize padding-based data processing methods and thereby enhance the accuracy and efficiency of hand sign interpretation.
Previous research has attempted to use VR technology to assist individuals with speech disabilities through hand sign interpretation.
Some studies have used image datasets with American Sign Language (ASL) and Malaysian Sign Language (MSL).
Other research has utilized triboelectric gloves that produce voltage graph datasets.
Padding techniques are generally applied in studies with sequential or graphical datasets, such as sign language recognition, speech emotion recognition, and padding modules with neural network modeling, as well as traffic flow prediction using Long Short-Term Memory (LSTM) models.
VOLUME 06, No 02, 2024 DOI: 10.52985/insyst.
The novelty of this research lies in its application of padding techniques to VR hand sign data, specifically addressing the variation in data length generated during the data collection process using sequential primary data recorded directly from VR devices.
While our previous similar studies have focused solely on post-zero padding with 28 parameters, requiring more complex RNN models, this research employs 22 parameters, allowing for a simpler model architecture without sacrificing effectiveness.
This approach highlights the trade-off between model complexity and parameter count, demonstrating that a streamlined model can still achieve efficient performance.
The innovative use of padding techniques in the context of VR hand sign data, which is relatively new, offers a targeted solution for enhancing VR applications and supporting individuals with speech disabilities.
II. METHODS
The research process is divided into four main stages, as illustrated in Figure 1: Data Collection, Implementing Padding Technique, Machine Learning Modeling, and Evaluation and Testing.
These stages ensure a comprehensive approach to addressing the challenges of hand sign interpretation in a VR environment.
Figure 1. Overview of research stages.
DATA COLLECTION
Primary data collection was conducted using the VR device Meta Quest 2, as shown in Figure 2.
Ten types of hand sign movements were collected based on ASL and selected for their ease of use and compatibility with VR devices.
These signs, illustrated in Figure 3, were chosen because they are common and straightforward, ensuring the VR system can accurately capture them.
Each sign was recorded with 100 samples to provide sufficient data.
Data collection was performed using an application developed with Unity Editor, as shown in Figure 4.
Each recorded data point consists of 11 parameters from each of the left and right hands: trigger touch, trigger pressed, grip pressed, thumb touch, position (X, Y, Z), and quaternion (W, X, Y, Z).
Figure 2. Meta Quest 2: Immersive, all-in-one VR device.
Figure 3. 10 hand sign movements.
Figure 4. VR data recording application.
IMPLEMENTING PADDING TECHNIQUES
Several padding methods were studied to understand how to implement them in the context of VR hand gesture data.
Commonly used padding techniques, such as zero and replication padding, were applied to equalize the data length.
Specifically, six variations were explored: pre-zero, post-zero, pre- and post-zero, pre-replication, post-replication, and pre- and post-replication padding.
Figure 5 illustrates these padding techniques.
Figure 5. Comparison of padding techniques.
The selected padding methods were applied to the collected data and compared to identify the most effective approach for managing data length variation.
This process ensured consistency and integrity in the machine learning workflow.
The goal of evaluating different padding techniques was to improve the accuracy and efficiency of machine-learning models for interpreting VR hand gestures.
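The six padding variants described in this section can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `pad_sequence` and its arguments are hypothetical, "replication" is assumed to mean repeating the first or last recorded frame, and "pre- and post-" padding is assumed to split the missing length as evenly as possible between both ends.

```python
import numpy as np

def pad_sequence(seq, target_len, mode="zero", position="post"):
    """Pad a (time_steps, features) array along the time axis to target_len.

    mode:     "zero" or "replication" (assumed: repeat the edge frame)
    position: "pre", "post", or "both" (assumed: near-even split)
    """
    seq = np.asarray(seq, dtype=np.float32)
    missing = target_len - seq.shape[0]
    if missing < 0:
        raise ValueError("sequence is longer than target_len")

    if position == "pre":
        before, after = missing, 0
    elif position == "post":
        before, after = 0, missing
    elif position == "both":
        before = missing // 2
        after = missing - before
    else:
        raise ValueError(f"unknown position: {position}")

    # Pad only the time axis; the feature axis is left untouched.
    pad_width = ((before, after), (0, 0))
    if mode == "zero":
        return np.pad(seq, pad_width, mode="constant", constant_values=0.0)
    if mode == "replication":
        return np.pad(seq, pad_width, mode="edge")
    raise ValueError(f"unknown mode: {mode}")
```

For example, `pad_sequence(recording, 113, mode="replication", position="pre")` would correspond to pre-replication padding under these assumptions.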
MACHINE LEARNING MODELING
After the data has been processed using padding techniques, machine learning models are developed and trained.
The data is divided into training data (80%) and validation data (20%).
Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) cells are utilized for modeling.
LSTMs are a specific type of RNN that effectively handles sequential data and captures long-range dependencies while mitigating the vanishing gradient problem common in traditional RNNs.
The model is trained on the training dataset and its performance is evaluated using the validation dataset.
Further testing is conducted with new, unseen data to assess the model's effectiveness in real-world scenarios.
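One way to obtain the 80/20 division is sketched below. The paper does not state how the split was drawn, so the per-class (stratified) sampling, the fixed seed, and the helper name `stratified_split` are all assumptions made for illustration.

```python
import numpy as np

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with the same class balance in both parts."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the validation share.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)

# 10 hand signs with 100 samples each, as in the collected dataset.
labels = np.repeat(np.arange(10), 100)
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 800 200
```

Splitting per class keeps 80 training and 20 validation samples for every sign, which matches the balanced dataset described in the data collection stage.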
EVALUATION AND TESTING
To thoroughly evaluate the performance of the trained model, several evaluation metrics were employed, including accuracy, precision, recall, F1 score, and confusion matrix analysis.
These metrics provided a comprehensive view of the model's performance, identifying areas where it excelled and where further improvements were needed.
To further validate the robustness of the model, an additional 100 new data samples were collected.
These samples, which the model had not previously encountered, were used to test its performance on unseen data.
This step was crucial in determining the model's real-world applicability and ensuring that it could generalize effectively beyond the initial dataset.
The evaluation process, therefore, not only confirmed the model's effectiveness but also guided subsequent refinement and optimization efforts, ensuring a reliable and efficient solution for interpreting VR hand gestures.
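The metrics named above can all be derived from a confusion matrix, as the following sketch shows. Macro-averaging over the classes is an assumption, since the paper does not state which averaging scheme was used; the function names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def macro_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class actually occurred
    # Classes that were never predicted (or never occurred) score 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }
```

With 10 balanced classes, as in this dataset, macro and micro averages of recall coincide with accuracy, but precision and F1 can still differ between the two schemes.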
III. RESULT AND DISCUSSION
The data collection phase successfully yielded 1,000 samples, evenly distributed across 10 selected ASL hand signs (https://github.com/umaruta4/SignLanguage_MTC_Data/tree/main/new_american_sign_languag).
Each sign contributed an equal number of samples, ensuring a balanced dataset for further analysis.
Figure 6 presents an example from the collected data, highlighting the 11 parameters that define each hand sign.
The graphs in Figure 6 illustrate the recordings of a specific hand sign movement. The horizontal axis of these graphs denotes the n-th Unity sampling, while the vertical axis values correspond to the various sensor readings.
The data is organized into graphs labeled from Sensor 0 to Sensor 21, with a detailed explanation of the 11 parameters provided in Table 1.
The application of various padding strategies played a significant role in addressing the challenge of varying data lengths in VR hand gesture datasets.
The variations of zero and replication padding techniques were systematically applied to the dataset.
The results of these padding implementations, as depicted in Figure 7, show how the raw data in Figure 6 was transformed into a consistent format across all samples.
This uniformity was essential in preserving the data's structural integrity and ensuring that the machine learning algorithms could process the data without being influenced by inconsistencies in sequence length.
TABLE I
SENSOR DATA PARAMETERS
Left Hand    Right Hand    Parameter          Vertical Axis Value
Sensor 0     Sensor 11     Trigger Touch      Boolean: 0 or 1
Sensor 1     Sensor 12     Trigger Pressed    Boolean: 0 or 1
Sensor 2     Sensor 13     Grip Pressed       Boolean: 0 or 1
Sensor 3     Sensor 14     Thumb Touch        Boolean: 0 or 1
Sensor 4     Sensor 15     Position X         Meter
Sensor 5     Sensor 16     Position Y         Meter
Sensor 6     Sensor 17     Position Z         Meter
Sensor 7     Sensor 18     Quaternion W       Scalar
Sensor 8     Sensor 19     Quaternion X       Vector
Sensor 9     Sensor 20     Quaternion Y       Vector
Sensor 10    Sensor 21     Quaternion Z       Vector
Figure 6. Example of ASL "Good" data with 22 parameters.
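The sensor numbering in Table I follows a regular pattern: indices 0 to 10 belong to the left hand, indices 11 to 21 to the right hand, and both hands list the same 11 parameters in the same order. The sketch below builds readable column names from that pattern; the name format (for example `left_trigger_touch`) is a hypothetical convention, not taken from the recording application.

```python
# Parameter order as listed in Table I (identical for both hands).
PARAMETERS = [
    "trigger_touch", "trigger_pressed", "grip_pressed", "thumb_touch",
    "position_x", "position_y", "position_z",
    "quaternion_w", "quaternion_x", "quaternion_y", "quaternion_z",
]
HANDS = ["left", "right"]

# Sensor i maps to hand i // 11 and parameter i % 11.
SENSOR_NAMES = [
    f"{HANDS[i // len(PARAMETERS)]}_{PARAMETERS[i % len(PARAMETERS)]}"
    for i in range(len(HANDS) * len(PARAMETERS))
]

print(len(SENSOR_NAMES))   # 22
print(SENSOR_NAMES[15])    # right_position_x
```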
Figure 7. Examples of VR hand gesture data after applying various padding techniques.
The LSTM-based neural network model, shown in Figure 8, was designed to process the sequential hand gesture data captured in the VR environment.
After applying padding techniques, the sequence length was standardized to match the maximum Unity sampling length, which in this case was set to 113 time steps.
Each time step in the sequence contains 22 feature dimensions, corresponding to the 22 parameters recorded from both hands during gesture performance.
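As a shape check, the sketch below stacks a few variable-length recordings into a single array whose time axis matches the longest recording. The recordings are random stand-ins with assumed lengths of 113, 80, and 95 steps, and pre-replication padding (repeating the first frame) is used only as an example; the helper name is hypothetical.

```python
import numpy as np

def stack_pre_replicated(recordings):
    """Pad every (time_steps, 22) recording at the front to the maximum length."""
    max_len = max(r.shape[0] for r in recordings)
    padded = [
        # mode="edge" repeats the first frame for the missing leading steps.
        np.pad(r, ((max_len - r.shape[0], 0), (0, 0)), mode="edge")
        for r in recordings
    ]
    return np.stack(padded)

rng = np.random.default_rng(0)
recordings = [rng.normal(size=(n, 22)) for n in (113, 80, 95)]
batch = stack_pre_replicated(recordings)
print(batch.shape)  # (3, 113, 22)
```

The resulting layout is (samples, time steps, features), which is the input format the LSTM model expects.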
Figure 8. LSTM-based Neural Network model for VR dataset.
The model architecture begins with a masking layer that handles padded values by ignoring them while learning, ensuring that the model only processes relevant data.
This layer maintains the input shape of (None, 113, 22), where 113 represents the standardized sequence length and 22 denotes the feature dimensions.
The None indicates a variable batch size.
The model's core is the LSTM layer, specifically designed to capture the temporal dependencies in the sequential gesture data.
The input shape for this layer remains (None, 113, 22), and an additional input (None, 113) represents the mask applied to the sequence.
It outputs a reduced representation with 64 features, highlighting the most significant aspects of the data across the 113 time steps.
Finally, the dense layer aggregates the information extracted by the LSTM layer, outputting a 10-dimensional vector in which each dimension corresponds to one of the 10 hand sign movements used during training.
This structure ensures the model can effectively classify the input sequences into the correct hand gesture categories.
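To make the data flow through the masking, LSTM, and dense layers concrete, the following NumPy sketch runs a forward pass with random, untrained weights. It is an illustration of the shapes only, not the authors' model: the gate ordering, the softmax output, and the masking rule (a time step is skipped when all 22 features equal 0) are assumptions. Under that rule, replicated frames are non-zero and would not be skipped.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_lstm(x, W, U, b, mask_value=0.0):
    """Final hidden state of an LSTM that skips fully masked time steps."""
    batch, steps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    mask = np.any(x != mask_value, axis=-1)       # shape (batch, steps)
    for t in range(steps):
        z = x[:, t] @ W + h @ U + b               # shape (batch, 4 * units)
        i, f, g, o = np.split(z, 4, axis=1)       # input, forget, candidate, output
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        keep = mask[:, t][:, None]
        c = np.where(keep, c_new, c)              # masked steps leave the state unchanged
        h = np.where(keep, h_new, h)
    return h

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
features, units, classes, steps = 22, 64, 10, 113
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
W_out = rng.normal(scale=0.1, size=(units, classes))
b_out = np.zeros(classes)

x = np.zeros((2, steps, features))
x[0, :40] = rng.normal(size=(40, features))    # post-zero padded sample
x[1, -70:] = rng.normal(size=(70, features))   # pre-zero padded sample

h = masked_lstm(x, W, U, b)                    # shape (2, 64)
probs = softmax(h @ W_out + b_out)             # shape (2, 10)
print(h.shape, probs.shape)
```

Because masked steps leave the state untouched, the hidden state for a zero-padded sample equals the one obtained from the unpadded recording, which is the behavior the masking layer is meant to provide.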
The LSTM-based model was trained on 800 VR hand gesture samples, with 200 samples reserved for validation.
Figures 9 and 10 show the model's performance using different padding techniques.
For the zero padding technique (Figure 9), the model demonstrated alignment between training and validation accuracy, indicating effective learning with minimal overfitting.
The final accuracy confirmed the model's capability to interpret zero-padded data consistently.
Pre-zero padding achieved 0.58 validation accuracy, indicating baseline effectiveness but also difficulty in preserving the data.
Post-zero padding yielded 0.55 accuracy, with minimal impact from the padding position but some inconsistency in data representation.
Pre- and post-zero padding was the top-performing zero padding method, with 0.99 accuracy, providing balanced data representation.
Replication padding (Figure 10) also produced promising results, with consistent accuracy and loss curves.
This method allowed the model to generalize well from the training data, highlighting the importance of selecting appropriate padding techniques.
Pre-replication padding excelled with 0.97 accuracy, effectively preserving sequence structure for better model learning.
Post-replication padding reached 0.73 accuracy, performing better than pre-zero and post-zero padding but less effectively than pre-replication padding.
Pre- and post-replication padding showed strong results with 0. accuracy, demonstrating robustness in maintaining temporal structure.
Figure 9. Training and validation accuracy and loss with zero padding.
Figure 10. Training and validation accuracy and loss with replication padding.
The trained LSTM-based model was comprehensively evaluated using metrics like accuracy, precision, recall, F1 score, and confusion matrix analysis.
These metrics provided an overall assessment of the model's performance, helping to identify its general effectiveness in interpreting VR hand gestures.
Figure 11 shows the confusion matrix for the validation dataset, illustrating how well the model predicted each hand sign after training.
The matrix indicates strong performance in some categories but also reveals specific hand signs where the model's predictions were less accurate, suggesting potential areas for further refinement.
An additional 100 unseen data samples were tested to assess the model's robustness.
Figure 12 presents the confusion matrix for this test dataset, reflecting the model's ability to generalize to new data.
These matrices help to understand the model's performance across various categories.
Table 2 summarizes key performance metrics, including accuracy, precision, recall, and F1 score, for both the validation and test datasets.
The table also includes an overall ranking of the padding techniques based on these metrics, offering insights into which methods were most effective in ensuring accurate and consistent model performance.
Figure 11. Confusion Matrix for validation dataset.
Figure 12. Confusion Matrix for test dataset (new data).
TABLE II
AGGREGATE PERFORMANCE METRICS AND RANKING OF PADDING TECHNIQUES
Columns: Padding Technique; Accuracy (Val., Test); Precision (Val., Test); Recall (Val., Test); F1 score (Val., Test); Overall Rank (Val., Test).
Rows: Pre-Replication Padding; Pre- and Post-Zero Padding; Pre- and Post-Replication Padding; Post-Replication Padding; Pre-Zero Padding; Post-Zero Padding.
The analysis of different padding techniques reveals notable variations in performance metrics, including accuracy, precision, recall, and F1 score, for both validation and test datasets.
These metrics provide a comprehensive view of how each padding technique impacts model performance.
Pre-replication padding shows strong performance across all metrics, achieving high scores in both validation and test datasets.
It ranks second in validation and first in the test dataset, indicating its robust ability to generalize and maintain a well-balanced model.
The consistently high accuracy, precision, recall, and F1 score suggest that this padding technique minimizes misclassifications effectively and performs reliably across different data splits.
In comparison, pre- and post-zero padding achieves perfect precision and high recall in the validation dataset, reflecting its strong performance in identifying positive cases within this controlled environment.
However, the technique's performance slightly drops in the test dataset.
This decline may be due to the inherent differences between the validation and test data distributions, which can impact how well the model generalizes to new data.
Despite this drop, it remains highly effective and ranks first in validation and second in the test dataset.
Pre- and post-replication padding ranks third in both datasets, showing stable but not exceptional performance.
Although it provides balanced metrics, its scores are lower compared to the top two techniques.
This suggests that while it performs reliably, it does not reach the high levels of accuracy and balance achieved by pre-replication and pre- and post-zero padding.
Post-replication padding ranks fourth, with lower accuracy and F1 score compared to the higher-ranked techniques.
This indicates a higher rate of misclassifications and less effective performance overall.
The lower metrics suggest that this technique is less capable of managing classification tasks with the same efficiency as the top methods.
Pre-zero and post-zero padding exhibit the lowest performance, ranking fifth and sixth, respectively.
These techniques show poorer accuracy, precision, and recall, leading to higher misclassification rates.
Their lower metrics reflect their limited effectiveness in correctly identifying positive cases and achieving balanced classification results.
Choosing the appropriate padding technique is crucial for optimizing model accuracy and generalization.
The top methods, pre-replication and pre- and post-zero padding, offer robust performance and balanced metrics, making them suitable for effective model deployment.
Conversely, the lower-ranked techniques highlight areas where model performance could be improved, suggesting their lesser suitability for achieving optimal results.
IV. CONCLUSION
This research highlights the significant influence of padding techniques on the performance of RNN models in interpreting VR hand gesture data.
Our findings reveal that selecting an appropriate padding method can lead to substantial improvements in model accuracy, precision, recall, and F1 score, even when utilizing simpler RNN architectures.
Specifically, techniques like pre-replication padding and pre- and post-zero padding demonstrate superior effectiveness.
Pre-replication padding consistently delivers high performance across all evaluation metrics, maintaining robust accuracy and generalization on both validation and test datasets.
Meanwhile, pre- and post-zero padding shows excellent results in the validation phase but exhibits a slight reduction in performance during testing, indicating a potential sensitivity to unseen data.
These results highlight the critical role of selecting appropriate padding techniques to optimize model performance in sequence-based data processing.
They demonstrate that even with simpler RNN models, the use of strategic padding can substantially enhance learning efficiency and improve the model's ability to generalize from training data to real-world applications.
This emphasizes the need for thoughtful preprocessing choices in the design of sequence models to achieve robust and effective outcomes.
Looking ahead, future research could explore advanced padding strategies to optimize model performance.
Investigating the interaction between innovative padding methods and different RNN architectures could unlock opportunities for greater accuracy and efficiency, leading to more effective classification systems in VR and other domains.
These padding techniques enhance machine learning models' performance and flexibility in complex VR and real-life scenarios by standardizing data input, improving robustness, optimizing computational efficiency, and enabling cross-domain applications.
They can significantly improve applications like sign language to speech conversion by ensuring consistent and accurate data processing, enabling real-time translation of hand gestures captured in VR into speech.
This is crucial for developing assistive technologies that empower individuals with speech impairments.
As VR technology evolves, these padding strategies will be essential for creating more sophisticated, responsive, and adaptable systems for real-world interactions.
Continued refinement and innovation in these techniques will drive the next generation of immersive and accessible technologies.
ACKNOWLEDGMENTS
The authors wish to express their profound gratitude to the Directorate of Research, Technology, and Community Service, Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia, for their generous financial support through the 2024 Fiscal Year Research Program.
We are equally grateful to Universitas Budi Luhur for providing essential infrastructure, institutional backing, and an enabling environment that greatly contributed to the successful completion of this study.
The support, both financial and technical, has been invaluable in ensuring the research's success.
AUTHORS CONTRIBUTION
Teja Endra Eng Tju: Conceptualization, methodology, formal analysis, supervision, and writing - original draft.
Julaiha Probo Anggraini: Writing - review and editing, and project administration.
Muhammad Umar Shalih: Data curation, investigation, software, and validation.
All authors have read and approved the final version of the manuscript.
COPYRIGHT
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
REFERENCES