International Journal of Electrical and Computer Engineering (IJECE)
Vol. No. October 2025, pp.
ISSN: 2088-8708, DOI: 10.11591/ijece.

Classifying the suitability of educational videos for attention deficit hyperactivity disorder students with deep neural

Alshefaa Emam1, Eman Karam Elsyed2,3, Mai Kamel Galab1
Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
Department of Mathematics, Faculty of Science, Al-Azhar University (Girl. Cairo, Egypt
Department of Computer Science, College of Information Technology, Misr University for Science and Technology (MUST), Giza, Egypt

ABSTRACT
This paper presents a comprehensive deep learning-based system for evaluating the suitability of educational videos for students with attention deficit hyperactivity disorder (ADHD). Current methods frequently ignore instructional elements that are necessary for improving the learning experience of students with ADHD, such as instructor hand movements, video length, object variety, and audio-visual quality. We address two key issues. First, we present the ADHD online instructor (AOI) dataset, a dedicated benchmark for assessing instructor hand movement in videos, which fills the absence of a reference dataset for classifying educational videos relevant to ADHD. Second, we develop an enhanced multitask deep learning model that increases classification accuracy through task-specific enhancements and optimized architectures, meeting the need for a robust model that can distinguish between suitable and unsuitable instructional content. Comprehensive tests using pretrained convolutional neural network (CNN) models indicate that the enhanced VGG16 model outperforms baseline methods, achieving the highest accuracy of 97.
The results highlight the value of integrating deep learning methods with structured benchmark datasets, opening the path to more resilient and flexible instructional materials designed for students with ADHD.

Article history: Received Jan 29, 2025; Revised Jun 24, 2025; Accepted Jul 3, 2025

Keywords: ADHD online instructor dataset; Attention deficit hyperactivity; Convolutional neural network; Deep learning; Video classification

This is an open access article under the CC BY-SA license.

Corresponding Author:
Alshefaa Emam
Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University
Benha, Egypt
Email: shefaaemam97@gmail.
Journal homepage: http://ijece.

INTRODUCTION
With online learning growing at a rapid pace, the United Nations 2030 Agenda emphasizes the urgent need for high-quality, inclusive, and accessible education, including for students with attention deficit hyperactivity disorder (ADHD). However, these students frequently find online educational videos difficult to follow, mostly because of the length, complexity, and general quality of the videos. Students with ADHD have special learning needs that are often overlooked by traditional teaching techniques and digital tools, which lowers the efficacy of these tools and the students' engagement.

The effects of many components of instructional videos on students with ADHD, such as object variety, video length, audio and visual quality, and the teacher's hand movement, have been the subject of extensive research. Hand movements are among the most important nonverbal cues that can improve comprehension for students with ADHD. Research shows that using gestures in the classroom increases the likelihood that students with ADHD will learn and participate. This suggests that watching videos with these cues may help students learn more effectively and pay attention for longer periods of time. Additionally, the length of videos is important for maintaining the attention of students with ADHD.
Videos longer than fifteen minutes have been shown to reduce viewer attention and memory recall. Thus, setting sensible time limits for videos is crucial to maximizing the educational process. Additionally, improving attention and engagement requires high-quality audio and visual content. Poor-quality content can hinder understanding and be distracting, especially for children who have trouble focusing.

The contributions of the paper can be summarized in the following points:
- A new system for evaluating whether educational videos are suitable for students with ADHD is presented.
- The ADHD online instructor (AOI) dataset was created because no dataset existed for this purpose. It includes 10,600 images that highlight important elements such as the instructor's hand movements.
- A five-phase content suitability evaluation system was created to make sure that content meets the needs of students with ADHD. This system analyses and categorizes video content using the enhanced VGG16 model in combination with YOLO and the OpenCV library.

The paper is organized as follows: Section 2 examines the work related to this paper. Section 3 describes the proposed ADHD system. Section 4 examines the results of the proposed system. Section 5 summarizes the findings of this work and outlines future directions for the proposed system.

RELATED WORK
The literature on deep learning-based video classification for ADHD students is reviewed in this section. It addresses five main issues: attempts to categorize videos according to their suitability and quality; difficulties ADHD students face when learning online; the application of deep learning to video analysis; methods such as feature extraction and data augmentation; and comparisons with current approaches. These fields show the gaps in our existing understanding and give our research context.
The impact of visual and auditory components on learning outcomes is highlighted by studies examining the function of sensory processing in educational settings. For example, studies show that videos with a moderate amount of visual complexity might increase learners' engagement by giving them enough stimulation without being too demanding. However, that work does not thoroughly examine ADHD-specific issues such as attention management. Another study similarly discusses how sensory processing problems affect learning and concentration more generally, but it does not offer any practical solutions for children with ADHD. Additional studies examine how people with ADHD struggle to integrate audiovisual information, highlighting how problems with sensory integration can impair their interest and learning. This is extended by a study that investigates the difficulties faced by children with ADHD who wear bilateral hearing aids, highlighting the necessity of specialized teaching strategies. Despite these insights, research on sensory processing has not been widely incorporated into practical instructional frameworks.

Research has also been done on the benefits of immersive technology, such as augmented reality (AR), in video-based learning. By offering immersive and engaging experiences, AR can improve understanding and engagement for children with ADHD, as one study shows. However, this study mostly concentrates on the advantages of AR without discussing the difficulties of incorporating these technologies into current educational institutions or their scalability for wider implementation. Although these studies highlight variables that support students with ADHD, they frequently focus on these components separately, ignoring a comprehensive approach.

Numerous studies have examined gestures as a teaching tool, emphasizing how crucial they are for improving understanding and communication in classroom environments.
One study proposes a deep learning-based hand gesture detection system that enhances teacher-student interactions by accurately classifying gestures in real-time classroom settings. However, the study is limited by its focus on technological feasibility without evaluating its impact on long-term learning outcomes. Freitas and colleagues investigate several forms of gestures and their roles in classroom training, stressing their usefulness in multimodal teaching. Yet, this study overlooks the limitations of standardizing gesture use across varied teaching styles. A more comprehensive review emphasizes the benefits of gestures for engagement, memory retention, and comprehension while urging more study on how best to employ them in instructional tactics. To improve 3D spatial cognition, another work introduces "Air Anatomy," a novel method of teaching spatial anatomy in medical education through hand movements. Despite being novel, this approach is specialized and might not transfer well to other fields. With an emphasis on linguistic development, a further study examines how deliberate gestures improve speaking and listening abilities. Nevertheless, there is no discussion of the study's relevance to non-linguistic subjects or to situations involving ADHD.

Int J Elec & Comp Eng, Vol. No. October 2025: 4889-4898

The difficulties that students with ADHD encounter in conventional classroom settings highlight the necessity of focused interventions. One study finds that teachers' limited knowledge of ADHD affects classroom management and instructional efficacy, and its authors advocate for increased awareness and training to improve teaching methods. However, multimedia content that is tailored to the learning needs of students with ADHD is crucial because these students frequently struggle with standard teaching techniques.
Visual materials have the potential to improve comprehension and attention, but most studies only look at their individual components, which limits our knowledge of how well they work together. Furthermore, the existing research does not provide practical frameworks or recommendations to assist instructors in choosing relevant resources for students with ADHD.

This paper fills these gaps by putting forth an AI-based classification system that assesses educational videos according to how suitable they are for students with ADHD. The proposed system provides objective and consistent evaluations of ADHD-friendly features by automating the evaluation process using deep learning techniques and incorporating knowledge from AR-based teaching. This system gives teachers access to high-quality, customized video content, which ultimately enhances the educational experiences of students with ADHD. Through systematic and verified assessment, the ADHD video classifier aims to function as a comprehensive solution that serves the various needs of these students.

PROPOSED ADHD SYSTEM
The main phases of the proposed ADHD video classifier are data preprocessing, training, and decision, as shown in Figure 1. The subsections that follow provide specifics on each phase. When determining whether educational videos are suitable for students with ADHD, several elements need to be carefully considered during the pre-processing phase. As shorter videos are better at holding the attention of students with ADHD, the procedure starts with uploading the video and evaluating its length. Videos that exceed a specific duration are excluded. After the length of the video has been determined, its quality is evaluated by examining both the audio and the visuals to make sure there is no background noise and that the picture is clear.
If the video satisfies these standards for quality, a sample segment is taken out for additional analysis. Frames from various sections of the video are sampled, and their content richness and engagement are assessed. To assess the video's complexity and make sure it contains a diversity of objects, object detection algorithms are run on the key frames. The video is considered unsuitable if any frame fails this test. The final factor is hand movement: the instructor's hand must be moving more often than it is still, and videos with insufficient motion are disqualified. A video cannot be deemed suitable for ADHD students unless it passes all these tests. This thorough screening process ensures that only the most appropriate and effective content is chosen.

Figure 1. Workflow for ADHD video classification using multi-task models

Dataset description
The AOI dataset, which is the subject of this research, is used to categorize the suitability of online instructional videos for students with ADHD. The dataset consists of 10,600 images: 5,300 are from videos that were judged suitable for students with ADHD, and the remaining 5,300 are from videos that were judged unsuitable. The process of creating the dataset started with gathering 100 YouTube videos, of which 50 were deemed suitable for ADHD students and the other 50 unsuitable. Frames were extracted from these videos: 14,321 frames from suitable videos and 28,241 frames from unsuitable videos were obtained. From these, 10,600 images were carefully chosen, with an emphasis on frames in which the instructor's hand was moving, to create a balanced dataset. This selection captures the crucial elements that affect understanding and engagement for ADHD students.
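The screening workflow of Figure 1 can be read as a short-circuit sequence of five checks, where the first failed check disqualifies the video. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `VideoReport` fields, the `screen_video` function, the 15-minute limit (borrowed from the attention finding cited in the introduction), and the `min_objects` threshold are all hypothetical, because the paper does not state its exact cut-offs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VideoReport:
    """Pre-computed measurements for one video (hypothetical structure)."""
    duration_s: float                 # total video length in seconds
    audio_ok: bool                    # result of the audio quality test
    visual_ok: bool                   # result of the visual quality test
    objects_per_key_frame: List[int]  # distinct detected objects per key frame
    moving_frames: int                # frames in which the hand is moving
    still_frames: int                 # frames in which the hand is still


def screen_video(report: VideoReport,
                 max_duration_s: float = 15 * 60,  # assumed limit
                 min_objects: int = 2              # assumed threshold
                 ) -> Tuple[bool, str]:
    """Apply the five checks in order and stop at the first failure."""
    # Phase 1: video length
    if report.duration_s > max_duration_s:
        return False, "video too long"
    # Phase 2: audio quality
    if not report.audio_ok:
        return False, "poor audio quality"
    # Phase 3: visual quality
    if not report.visual_ok:
        return False, "poor visual quality"
    # Phase 4: object variety; every key frame must pass
    if not report.objects_per_key_frame or any(
            n < min_objects for n in report.objects_per_key_frame):
        return False, "insufficient object variety"
    # Phase 5: the hand must move more often than it is still
    if report.moving_frames <= report.still_frames:
        return False, "insufficient hand movement"
    return True, "suitable"


if __name__ == "__main__":
    demo = VideoReport(480.0, True, True, [3, 4, 2], 120, 60)
    print(screen_video(demo))
```

Ordering the cheap checks (length, quality) before the expensive ones (object detection, hand movement) means a long or noisy video is rejected without running any model.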
With the help of the AOI dataset, a system that serves students with ADHD as well as non-ADHD students can be created, improving the efficacy and quality of online learning materials for a range of student populations.

Data preprocessing phase
The hundred instructional videos in the ADHD video classifier dataset were hand-picked from YouTube and are categorized as "suitable" or "unsuitable" for students with ADHD. All videos were converted to frames, yielding 14,321 frames from suitable videos and 28,241 frames from unsuitable videos. All images in the training, validation, and test sets are normalized so that pixel values are scaled to a constant range, typically between 0 and 1. This normalization helps accelerate the convergence of model training. Figure 2 illustrates the content in each category: Figure 2(a) displays images from suitable videos, while Figure 2(b) displays images from unsuitable videos.

The dataset was split into training, validation, and test sets. Seventy percent of the images from both categories are in the training set, fifteen percent are in the validation set, and the remaining fifteen percent are in the test set. This distribution supports a systematic approach to model training and evaluation. To guarantee balanced representation, 5,300 frames from each category were selected, for a total of 10,600 frames used for model training. For every category, the dataset was divided into three subsets: 795 images for testing, 9,010 images for training, and 795 images for validation, as shown in Figure 3.

Figure 2. A few samples of (a) suitable and (b) unsuitable images in the ADHD dataset

Figure 3.
Illustrative structures of the data distribution in the training set, validation set, and test set

Training phase
Video content was evaluated for ADHD suitability using a variety of methods and libraries. MoviePy and PyDub assessed audio quality, while OpenCV handled video length, visual quality, frame interval computations, and key frame extraction. Wget downloaded the YOLO COCO names file, Shutil compressed files, and the OS library handled directories. OpenCV and NumPy were used for object detection, frame analysis, and loading the YOLO model. These tools expedited the analysis and cover each phase of the workflow: object detection, frame extraction, data compression, visual and audio quality tests, and video length verification. This ensures a thorough and effective analysis of the video material.

The VGG16 architecture was used as the foundation network for the training phase of the ADHD video classification model because of its established performance in image classification tasks. This architecture was pre-trained on the ImageNet dataset. Initially freezing the convolutional layers of VGG16 allowed the training process to concentrate solely on the newly added fully connected (FC) layers, which were specifically designed for categorizing video content related to ADHD. This strategy allowed the new layers to adapt to the characteristics of the ADHD dataset while reusing the powerful features learned from the ImageNet dataset. To perform fine-tuning, the last four layers of the convolutional base were then unfrozen. Adjusting these higher-level, task-specific features, rather than relying only on the general features learned from ImageNet, improved the model's performance on the ADHD classification task. The model was trained for ten epochs with a batch size of 64 and a learning rate of 1e-4.
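The freeze-then-fine-tune strategy described above can be sketched with the Keras API. This is an illustration under assumptions rather than the authors' code: the 224x224x3 input size, the 256 units in the first FC layer, the early-stopping patience, and the checkpoint file name are hypothetical values chosen for the example, and "last four layers" is interpreted here as the last four layers of the VGG16 base (three block-5 convolutions plus the final pooling layer). The two-unit SoftMax output, categorical cross-entropy loss, Adam optimizer, and 1e-4 learning rate follow the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint


def build_classifier(input_shape=(224, 224, 3),  # assumed input size
                     hidden_units=256,           # assumed FC width
                     weights="imagenet"):
    """VGG16 base with a new two-layer FC head for suitable/unsuitable."""
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # stage 1: freeze every layer of the base

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = layers.Flatten()(x)
    x = layers.Dense(hidden_units, activation="relu")(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base


def unfreeze_last_layers(base, n=4):
    """Stage 2: make only the last n layers of the base trainable."""
    base.trainable = True
    for layer in base.layers[:-n]:
        layer.trainable = False


def compile_model(model, lr=1e-4):
    """Compile (or recompile after changing which layers are trainable)."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )


def make_callbacks(path="best_adhd_vgg16.keras", patience=3):
    """Keep the best model by validation accuracy and stop when it stalls."""
    return [
        ModelCheckpoint(path, monitor="val_accuracy", save_best_only=True),
        EarlyStopping(monitor="val_accuracy", patience=patience,
                      restore_best_weights=True),
    ]


# Typical use (train_ds and val_ds are placeholders for batched datasets
# of normalized images with one-hot labels):
#   model, base = build_classifier()
#   compile_model(model)
#   model.fit(train_ds, validation_data=val_ds, epochs=10,
#             callbacks=make_callbacks())
#   unfreeze_last_layers(base, n=4)
#   compile_model(model)  # recompile so the change takes effect
#   model.fit(train_ds, validation_data=val_ds, epochs=10,
#             callbacks=make_callbacks())
```

Recompiling after `unfreeze_last_layers` matters because Keras fixes the set of trainable weights at compile time.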
Training techniques such as model checkpoint and early stopping were applied to improve performance. Model checkpoint saved the model version with the highest validation accuracy, ensuring that the best-performing model was kept for testing. Early stopping, which terminates training as soon as validation accuracy ceases to improve, promoted the model's generalization to new data and prevented overtraining, so overfitting was avoided. The VGG16-based ADHD classifier was trained using the hyperparameters listed in Table 1.

The original VGG16 model and the enhanced version created especially for categorizing educational videos for students with ADHD are contrasted in this section. The model was changed to adjust it for binary classification (suitable vs. unsuitable) according to the particulars of the dataset. The main differences in model architecture, training techniques, and the performance gains attained following these adjustments are highlighted in Table 2.

Table 1. Values assigned to the hyper-parameters of the adapted VGG16 model
| Hyper-parameter | Corresponding setting |
|---|---|
| Input size | |
| Count of units in the first FC layer | |
| The first FC layer's activation function | ReLU |
| Count of units in the second FC layer | 2 |
| The second FC layer's activation function | SoftMax |
| Loss function | Categorical cross-entropy |
| Batch size (BS) | 64 |
| Optimization function | Adam |
| Initial learning rate (LR) | 1e-4 |

Table 2. Comparison of original and modified VGG16 models for ADHD video classification
| Aspect | Original VGG16 | Modified VGG16 | Reason for modification |
|---|---|---|---|
| Purpose | General image classification (1,000 categories) | Binary classification of ADHD video | Task-specific focus for ADHD |
| Classifier | Three fully connected layers | Two fully connected layers (second layer: 2 units, SoftMax) | Simplified for binary |
| Flatten layer | Present after convolutional | Added after convolutional layers | Compatibility with modified |
| Layer freezing | None | Convolutional layers frozen | Retains ImageNet features, reduces overfitting |
| Fine-tuning | Not applied | Fine-tuned last four convolutional layers | Adapts to ADHD-specific data |
| Optimization | SGD with variable learning rate | Adam with learning rate of 1e-4 | Better suited for smaller datasets |
| Overfitting | None | Early stopping and model checkpoint | Reduces overfitting with limited |
| Performance | Accuracy: 87% | Accuracy: 97. | Enhanced performance for ADHD classification task |

EXPERIMENTAL RESULTS AND DISCUSSION
The results and findings from the assessment of the ADHD video classifier are presented in this section. The data preprocessing and training phases of the analysis are covered in a way that emphasizes the procedures and efficacy of the system. The objective is to evaluate the system's efficacy in identifying instructional videos that are suitable or unsuitable for students with ADHD and to present a summary of the methodologies and findings.

Results of training phase
The model's performance on the test set was evaluated using several key metrics, which are summarized in Table 3. The evaluation covered recall, accuracy, balanced accuracy, and the confusion matrix, which together show how well the model classified video content related to ADHD. The specificity score shed more light on how well the model could identify negative instances. In the definitions below, TP stands for true positives (actual and predicted are both suitable), TN for true negatives (actual and predicted are both unsuitable), FP for false positives (actual is unsuitable but predicted is suitable), and FN for false negatives (actual is suitable but predicted is unsuitable). Sensitivity is another name for recall.

\[
\begin{aligned}
\text{Accuracy (\%)} &= \frac{TP + TN}{TP + TN + FP + FN} \times 100 \\
\text{Precision (\%)} &= \frac{TP}{TP + FP} \times 100 \\
\text{Recall (\%)} &= \frac{TP}{TP + FN} \times 100 \\
\text{F1-Score (\%)} &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{Specificity (\%)} &= \frac{TN}{TN + FP} \times 100 \\
\text{Balanced Accuracy (\%)} &= \frac{\text{Sensitivity} + \text{Specificity}}{2}
\end{aligned}
\]
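As a worked illustration of these definitions, the function below computes all six metrics from the four confusion-matrix counts. It is a minimal sketch: the function name and the example counts are hypothetical and are not the paper's results, and "suitable" is taken as the positive class.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return the six metrics, in percent, from confusion-matrix counts."""

    def pct(numerator: float, denominator: float) -> float:
        # Guard against empty classes instead of dividing by zero.
        return 100.0 * numerator / denominator if denominator else 0.0

    accuracy = pct(tp + tn, tp + tn + fp + fn)
    precision = pct(tp, tp + fp)
    recall = pct(tp, tp + fn)        # also called sensitivity
    specificity = pct(tn, tn + fp)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.2f}%")
```

On a balanced test set such as the one described here, accuracy and balanced accuracy coincide, so a gap between them would point to a class-count mismatch.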
The confusion matrix, shown in Figure 4, provides a visual representation of the model's performance by displaying the distribution of true positives, true negatives, false positives, and false negatives. This matrix shows the model's capacity to distinguish between suitable and unsuitable video content for ADHD. The training and evaluation results show how well the modified VGG16 model can identify video content suitable for students with ADHD. By carefully combining the freezing and fine-tuning of the VGG16 layers with training safeguards such as model checkpoint and early stopping, a very accurate and reliable model has been obtained. Furthermore, the accuracy and loss curves for the training and test sets over the epochs are displayed in Figure 5.

Table 3. Performance results of the ADHD classifier on the test set
| Performance metric | Value |
|---|---|
| Accuracy | |
| Average Sensitivity (Recall) | |
| Average Precision | |
| Average F1-Score | |
| Average Specificity | |
| Balanced Accuracy | |

Figure 4. Confusion matrix of the ADHD video classifier

Figure 5. Training and testing loss and accuracy over epochs

Results comparison with existing methods
Apart from the video evaluation, we also evaluated the efficacy of four well-known deep learning models on our dataset: VGG16, ResNet50, InceptionV3, and EfficientNetB0. The results are compiled in Table 4. VGG16 was chosen because of its proven ability to classify images with high accuracy: 93% in training and 97.73% in validation. ResNet50, which is known for its use of skip connections to alleviate the vanishing gradient problem, achieved a validation accuracy of 78.67% and trailed VGG16.

Table 4.
Comparison results with other deep learning models
| Model | Training accuracy (%) | Training loss | Validation accuracy (%) | Validation loss |
|---|---|---|---|---|
| Enhanced VGG16 | | | | |
| ResNet50 | | | | |
| InceptionV3 | | | | |
| EfficientNetB0 | | | | |

Table 5 compares two recent studies with the proposed model. Importantly, the analysis of hand movement as a visual signal for categorization tasks is a shared basis among all three studies. The first study used hand movement in radiographic imaging to estimate joint damage in rheumatoid arthritis, and the second study used hand gesture recognition to interpret everyday activities in medical and educational contexts. In contrast, our study uses hand and visual cues to determine whether instructional videos are suitable for students with ADHD. This common basis strengthens the validity of the comparison and highlights how important it is to choose models that can accurately capture subtle hand movements. The first study, which used a hybrid RANet SVM QNN model on a small dataset of 600 samples, reported an accuracy of 91.48%. The accuracy of the second study, which used a 1D CNN on a dataset of 2,700 samples, was 89. On the other hand, our proposed model, which was based on VGG16 and trained on a much larger dataset of 10,600 instructional video frames, achieved a significantly higher accuracy of 97. This shows the efficiency and scalability of our approach. Furthermore, our system adds usefulness beyond classification because, in contrast to previous studies, it is specifically designed for assessing instructional content related to ADHD.

Table 5. Enhanced model performance comparison with different dataset categories
| Paper ID | Model used | Dataset size | Accuracy (%) | Notes |
|---|---|---|---|---|
| First study | RANet SVM QNN | 600 | 91.48% and 0. | Early stopping at epoch 5, limited dataset size. |
| Second study | 1D CNN | 2,700 | | Applied to hand gestures used in routines in education and medicine. |
| Proposed | VGG16 | 10,600 | | High scalability and precision with a focus on video classification that is ADHD-friendly. |
CONCLUSION AND FUTURE WORK
Students with ADHD may find online learning materials highly challenging to use, especially because the materials lack supportive components that help with understanding and maintaining attention. One of the main causes of these difficulties is the lack of tools for efficiently evaluating content suited to the requirements of students with ADHD. This work presents a thorough five-phase system with the goal of filling this gap. The evaluation criteria include instructor hand movement, video length, audio-visual quality, and diversity of information. According to preliminary findings, the hand movement model successfully differentiates between instructional videos that improve learning for students with ADHD and those that do not, achieving an accuracy of 97. This system's creation represents a substantial achievement in the field of educational technology. It gives parents, teachers, and content producers practical tools that let them choose or produce videos that are best suited for students with ADHD. The model's strong performance highlights the value of systematic review in evaluating online material, especially regarding the components that are most pertinent to maintaining focus and increasing engagement.

Future work could improve phase one by creating an adaptive system that modifies the duration of videos according to students' interest and engagement, guaranteeing a succinct and effective presentation. By taking lighting, sound, and resolution variations into consideration, the audio-visual quality assessment in phases two and three can be enhanced, improving performance in a variety of environments. Phase four might be extended to assess more instructional materials in order to improve the diversity of the content.
In phase five, increased attention to small changes in instructors' hand movements can enhance identification accuracy. Lastly, using AI-driven student engagement analysis may provide insights for producing more successful ADHD-focused content. When combined, these enhancements would promote a learning environment that is more tailored and flexible.

FUNDING INFORMATION
Authors state no funding involved.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author: Alshefaa Emam; Eman Karam Elsyed; Mai Kamel Galab
C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review & Editing; Vi: Visualization; Su: Supervision; P: Project administration; Fu: Funding acquisition

CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.

DATA AVAILABILITY
Data available at this link: https://w. com/datasets/alshefaaemam/adhd-online-instructoraoi-dataset.

REFERENCES