JOIV : Int. Inform. Visualization, 8 : IT for Global Goals: Building a Sustainable Tomorrow - November 2024 1671-1677

INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
journal homepage : w. org/index. php/joiv

Uncovering the Most Effective Pedagogical Techniques for Math Education Using Machine Learning

Said Elnaffar a,*, Mohamed Fawey b
a Computer Science Department, Canadian University Dubai, Al Safa Street, Dubai, 117781, United Arab Emirates
b College of Computing and I. Arab Academy for Science, Technology and Maritime Transport, Egypt
Corresponding author: *said. elnaffar@cud.

Abstract: Many new math educators report that their first years in the teaching field are extremely challenging. They struggle to discover and apply the most effective teaching techniques and behaviors, often without the support of more experienced colleagues. In this study, we use machine learning to find the strategies that novice teachers can adopt to enhance their teaching effectiveness. The core aim is to uncover the relationship between teachers' performance, as assessed by student evaluations, and their pedagogical strategies. These strategies are derived from the final decision tree model, which is trained on a large dataset of empirical data from schools. The data consists of input from 72 math teachers of grades 7-9 and their students in Dubai, used to train two decision tree models: a classification tree and a regression tree. The structure of these trees is analyzed to identify and rank the effectiveness of nine teaching techniques (Visualization, Practice, Math Rules, Gamification, Collaboration, Problem-Solving, Case Studies, Assessments, and Language Switching) and four behavioral methods (Inspiration, Engagement, Entertainment, and Bonding) in relation to the Student Evaluation Index (SEI), which is derived from student feedback.
Results indicate that techniques such as "gamification" and "inspirational behavior" are consistently associated with higher SEI scores across different tree models. However, factors such as the demographics and culture of both students and teachers may need to be considered when generalizing these findings to other regions of the world.

Keywords: Education; machine learning; decision tree; professional development.

Manuscript received 8 Jun.; revised 14 Sep.; accepted 21 Oct. Date of publication 30 Nov.
International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

I. INTRODUCTION

The UNESCO Institute of Statistics stated in 2016 that the world must recruit 68.8 million new teachers to provide every future child with primary and secondary education. Current strategies to assist new teachers require many man-hours from both new and experienced educators.

Understanding the best teaching techniques is considered critical in intermediate education. This is especially true for fundamental subjects such as mathematics, which pave the way for later subjects. Research shows that teaching mathematics is a challenging task. This can be attributed to many teachers not knowing or not using suitable techniques to teach mathematics at a high level. Additionally, Fadlelmula et al. state that there exist social and psychological barriers, including fear of failure, lack of confidence, and lack of supportive colleagues. These obstacles may affect teachers' performance, leading to students not mastering essential mathematics concepts. This could prevent students from pursuing STEM (science, technology, engineering, and mathematics) careers later in life, careers that benefit not only society but also the students' own potential livelihood.

Support networks and continuing education programs for educators are inconsistently effective in helping teachers develop further. Hammond et al.
note some deficiencies of current professional development (PD) for teachers and analyze some common features of successful PD programs. The research conducted in our paper explores a specific direction of PD: optimizing its content, which includes pedagogical techniques and behaviors.

Researchers and practitioners from different fields seek solutions via machine learning techniques, including sentiment analysis, security and phishing attack detection, health-related issues such as calorie-burn prediction and ECG heartbeat classification, prediction of travel insurance purchases, and more.

Nevertheless, the advantages of traditional teaching methods are counterbalanced by their disadvantages. Teachers frequently focus on extensive note-taking and rote memorization, which can hinder a deep comprehension of fundamental concepts. This approach, aimed at delivering maximum information, often leads to a loss of student interest and understanding. Additionally, traditional classrooms often lack real-life application examples, resulting in students feeling disengaged from the material being taught.

Applications of machine learning (ML) have proliferated to many fields, including education. Some common applications include decoding students' problems, task automation, and performance evaluation. Machine learning carries the benefit of being able to capture and distill trends in vast quantities of data. As such, it may be able to improve or augment traditional professional development programs due to the vast amount of information it can generalize. Some researchers tried to predict the dropout of college students in MOOCs based on weighted multi-features, while others proposed solutions for class imbalance as they tried to predict whether students graduate on time. A machine learning model can reduce the hours required to provide new teachers with invaluable information to streamline their PD.
This work seeks to convert real-life teacher experience into data represented by decision tree models, which are then used to assist future teachers. The main contribution of our work is the identification of the most effective techniques that new teachers can use to enhance their teaching performance. These techniques are recommended by the final decision tree model trained on a substantial volume of empirical data from schools.

In the following subsections we give background on traditional and contemporary teaching techniques, supervised machine learning, decision trees, and the Principal Component Analysis (PCA) method.

Contemporary Teaching Techniques

Many investigators have found in novel technological ideas a means to support education. For example, during challenging times such as the COVID-19 pandemic, some researchers turned to technology to help young students learn Discrete Mathematics by building a platform called LearnwithEmaa. Over the course of a month, students interacted with an agent named Emma and reported significant improvements in their mathematical understanding and exam performance.

The modern or contemporary teaching approach emphasizes activities and places the learner at the center of curriculum planning and instruction. In this constructivist method, learners actively engage in the process to expand their knowledge and hone their skills. Meanwhile, the mentor or instructor guides and supports students in focusing on the subject through interactive activities. This modern pedagogy fosters cooperation, reduces competition among students, and creates a healthy learning environment.

In recent years, the breadth of knowledge in science and technology has significantly increased, along with humanity's ability to adapt to new information in these fields. Consequently, there is a substantial demand for imaginative and creative minds to explore uncharted areas across various professions.
Education in the 21st century should be designed to prepare students for this era, which is driven by technological advancement and the development of individuals, societies, and nations. Modern teaching methods should be employed to educate future generations, providing them with the necessary information to fully capitalize on emerging opportunities.

The coupling between teaching and learning on one side and artificial intelligence methods on the other has increased remarkably in the research community in recent years. For example, prior work has discussed the role of digital technologies in teaching mathematics. Similarly, Wardat et al. discussed the challenges, practices, and new perspectives of using AI in mathematics education. A systematic literature review of the use of technology in the pedagogy of mathematics was conducted by Aliyu et al.

Traditional Teaching Techniques

Classical teaching methods have their share of advantages, such as:
Cost-effectiveness: Conventional methods, including in-person teaching, are more affordable than contemporary educational techniques. This affordability allows them to be adopted in rural areas without the burden of excessive costs.
Efficiency in Learning: Conventional educational settings focus on conveying a large volume of information swiftly, within a minimal timeframe.
Suitability for Certain Subjects: Subjects like physics, chemistry, and mathematics often benefit from direct instruction using a blackboard, as concepts in these fields can be more easily grasped visually on a chalkboard.

Supervised Machine Learning

Machine learning (ML) involves developing algorithms that indirectly solve problems by detecting patterns in data.
ML algorithms or models can solve complex problems without explicit programming instructions that would directly determine the conclusions or solutions. Notable use cases include object detection, natural language processing, data forecasting, and robotic path planning, among others.

No Need for Advanced Technology: Conventional teaching methods do not require educators to be skilled in modern technological tools, such as incorporating computers into classroom instruction.

PCA transforms the data into new variables known as principal axes, which are derived from the dataset. The principal axis that exhibits the most variance is deemed to contain the most information. By selecting the axes with the highest variance and discarding less informative ones, PCA simplifies the dataset. This leads to the elimination of certain input features that contribute minimal new information. Implementing PCA can enhance model accuracy and reduce the time required for model training and inference. However, it also makes the task of extracting feature importance more challenging, as the original features are transformed and combined into new principal axes.

Fig. 1 A regression decision tree that predicts student performance given pedagogical techniques adopted.

ML attempts to perform actions based on previous experience, which is represented by large datasets. Large numbers of data samples are required for training, and these usually include input features and output labels. This is called Supervised Learning. There also exist Unsupervised ML methods that only require inputs and draw conclusions without knowledge of predefined answers. Supervised Learning is a subset of ML techniques that uses examples with input features and expected output values.
For example, consider the simple function in Equation 1:

Y = f(X)    (1)

II. MATERIALS AND METHOD

Research Goals

In this research we aim to find answers to the following questions:
• Is it possible to predict public-school educators' effectiveness based on student satisfaction and academic achievement?
• Can teaching techniques or behaviors be ranked to guide educators in Dubai on which to adopt for maximizing student satisfaction?

Related work on teaching evaluation and on prediction models applied to student performance indicates that machine learning evaluation of teaching methods can potentially be informative.

The function f that binds the input features (X) to the desired outcome (Y) represents a machine learning model. Supervised methods compare the predicted output of f() to the expected output to generate an error value. This error value, depending on the ML algorithm, is used to update the parameters of the function to produce the desired behavior.

Decision Trees as a Classification Method

The decision tree method is a recursive algorithm for splitting data, applicable to both regression and classification. Despite its age and its simplicity compared to techniques like neural networks, it is still a potent machine learning strategy. Asanbe et al. show that decision trees can perform well in the similar context of evaluating teacher performance from demographic and education-experience data.

A decision tree is a visual representation of decision-making scenarios that, when repeated, result in complex branching patterns (Fig. 1). These trees are used to derive insights from large datasets by establishing decision rules. The core principle involves dividing a dataset using the most suitable decision rule for that set. The resulting subsets are then recursively divided until all subsets are internally consistent or meet another stopping criterion. This process creates a tree-like structure that can be navigated for predictive tasks.
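The recursive splitting described above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation (the paper relies on scikit-learn): the function names, the two-feature rows, and the "low"/"high" labels are all hypothetical, and Gini impurity is assumed as the split criterion because the paper reports using the Gini index for its classification tree.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the subset is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split, or None."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split until a subset is pure, the depth limit is hit, or no split exists."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or depth >= max_depth:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    j, t, _ = split
    left = [i for i, row in enumerate(rows) if row[j] <= t]
    right = [i for i, row in enumerate(rows) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf by following the decision rules."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Made-up Likert scores: column 0 = Gamification, column 1 = Practice.
toy_rows = [[1, 4], [2, 5], [2, 3], [4, 2], [5, 4], [4, 5]]
toy_labels = ["low", "low", "low", "high", "high", "high"]
toy_tree = build_tree(toy_rows, toy_labels)
print(toy_tree)
```

In this toy data the first column separates the two classes perfectly, so the tree needs a single rule. Real survey data is noisier, which is why depth and leaf-size limits matter.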
Decision trees are widely available in various data science programs, platforms, and machine learning libraries, making them quick to implement.

Principal Component Analysis (PCA) as a Dimensionality Reduction Method

Principal Component Analysis (PCA) is a method used to reduce the dimensionality of a dataset by identifying and eliminating correlations among features. It operates by computing the covariance of all input features to understand their interrelationships. Features that exhibit high correlations may be considered redundant and thus suitable for removal.

Data Collection

A teacher self-assessment survey and a matching student evaluation of teachers were developed and distributed, on a voluntary basis, to government school educators in Dubai to gather data. The teacher survey gathers demographic information about teachers, such as location, school, and relevant years of experience, as well as information on the 13 techniques and behavioral methods described later in this section. The students' teacher evaluation survey consists of demographic information and the 3 student evaluation questions.

To ensure the privacy of teachers and students, the survey results must not impact their careers or learning experiences. Neither survey gathered names or other identifying details. Teachers were asked to create a unique 4-character ID using letters from the English alphabet and the digits 0-9, as shown in the following example:

A89L

Teachers distributed codes such as this to their respective students. Teachers and students entered this code into their surveys, which allowed the dataset to associate teacher responses with their respective classes while protecting teacher and student anonymity.

Target Population

The target population is grades 7-9 mathematics teachers at government schools in Dubai. The following teachers were excluded from the study:
• Teachers who were not currently teaching in these grades but had previously done so.
Leaving these data samples as they are would severely bias the results, because teachers who received few student responses could enjoy a very high score that may not be representative of their entire class. We explain how we resolved this issue shortly.

In this research, student scores were pre-processed by first calculating the average of the three student survey question scores. These averaged scores were then further averaged across each teacher's class to create a single value, known as the Student Evaluation Index (SEI). As a result, the decision tree algorithm's output class is represented by this unified SEI value. This dual averaging approach simplifies the decision tree's task and can help eliminate outliers caused by student bias or other noise. However, it also reduces the dataset's granularity, potentially omitting important information that could impact accuracy.

For teachers whose student scores were dropped, a proxy for SEI is introduced: the average of the teacher's own technique and behavior scores. While this allows us to retain all the teachers, which increases the number of data samples for the decision tree models, it can potentially generate incorrect predictions and feature importance rankings. Further discussion of the effects can be found in the next section.

A duplicate dataset is generated in which the SEI is grouped into 0.25 intervals to facilitate the use of the classification decision tree. For instance, an SEI of 4.3 is rounded to 4.25, the nearest 0.25 interval. The classification decision tree then uses 4.25 as the output label for this data point. For the regression decision tree, the SEI remains unchanged.

Because all input features are scored on a Likert scale of 1 to 5, there is no need to normalize the data. However, standardized copies of both the classification and regression datasets are made, removing the mean and scaling to unit variance, for use with PCA.
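The preprocessing steps above (double averaging, the self-reported proxy, and quarter-point binning) can be illustrated with a short sketch. The function names and the `min_responses` cut-off are assumptions made for illustration only; the text does not state how many student responses were required before a teacher's student scores were kept.

```python
def student_score(answers):
    """Average one student's three Likert answers (each 1 to 5)."""
    return sum(answers) / len(answers)


def student_evaluation_index(class_answers):
    """Second averaging step: mean of the per-student averages for one teacher's class."""
    per_student = [student_score(a) for a in class_answers]
    return sum(per_student) / len(per_student)


def proxy_sei(self_reported_scores):
    """Proxy used when student data is dropped: mean of the teacher's own 13 scores."""
    return sum(self_reported_scores) / len(self_reported_scores)


def sei_target(class_answers, self_reported_scores, min_responses=5):
    """Pick the SEI source. `min_responses` is a hypothetical threshold, not from the paper."""
    if len(class_answers) >= min_responses:
        return student_evaluation_index(class_answers)
    return proxy_sei(self_reported_scores)


def bin_sei(sei, step=0.25):
    """Round an SEI to the nearest multiple of `step` for the classification tree.

    Note: Python's round() sends exact halves to the nearest even integer.
    """
    return round(sei / step) * step


print(bin_sei(4.3))
```

The regression tree would consume the unbinned value returned by `sei_target`, while the classification tree would consume `bin_sei(sei_target(...))`.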
Furthermore, datasets that retain 95% of the variance, as determined by PCA, are generated. After the teacher and student survey preprocessing, this resulted in datasets containing 72 teacher feature inputs and 72 SEI outputs. Further discussion of the results follows.

• Teachers who were not actively teaching in the [...]
• Part-time or student teachers.
• Teachers who did not teach mathematics full time.
• Private school teachers.

There were no restrictions on which students could take the student survey, as long as they were members of their mathematics teacher's class and were enrolled in grades 7-9.

Data Composition

The decision tree's features include scores for 4 behavioral methods and 9 teaching techniques, scored on a Likert scale from 1 to 5 (almost always). These behaviors and techniques, slightly modified from research on widely used mathematics pedagogies, cover both traditional and modern approaches. Each teacher self-reported these techniques in their survey.

The teaching techniques are as follows:
Visualization involves using visual tools to aid teaching.
Practice entails giving students homework and practical exercises.
Mathematical Rules means teaching through theoretical principles, rules, proofs, and laws.
Gamification uses games to communicate ideas and teach subjects.
Collaboration encourages students to work together in pairs or groups.
Solving Problems/Discovery involves discovering knowledge through storytelling or hands-on activities.
Case Studies/Scenarios present students with real-life or business-related applications or concepts.
Assessments evaluate students' knowledge through quizzes and feedback.
Language Switching involves using students' native language to ensure clear and accurate communication.

The behavioral methods are as follows:
Inspiration implies moving students using intriguing facts, curiosity, fun ideas, and odd examples.
Engagement implies accommodating students with different backgrounds by adapting lesson plans.
Entertainment implies injecting joyful activities in class.
Bonding implies establishing a close and empathetic relationship between students and their teacher.

We distributed a survey to students that consisted of three questions scored on a Likert scale of 1 (strongly disagree) to 5 (strongly agree):
Post-class, I retain a clear grasp of the mathematical concepts and recall my teacher's instructions.
The mathematics lessons in my class are engaging, and my teacher motivates me to deepen my knowledge of mathematics.
I am in favor of future mathematics courses being conducted in a comparable manner.

III. RESULTS AND DISCUSSION

Python 3 and Scikit-Learn were used to implement the PCA and decision tree algorithms, while Pydotplus and Matplotlib were employed to visualize the resulting tree.

Classification Tree

To find the best classification tree, the dataset is split into 70% and 80% training data, leaving 30% and 20% of the data for testing, respectively. Two additional datasets split with these ratios are generated: one normally scaled, and the other containing the features that PCA determines to hold >95% of the variance. GridSearchCV was employed to identify the optimal hyperparameters for each classification tree based on the various data splits. The GridSearchCV function used 5-fold cross-validation. We observed that the best-performing tree was the one trained on the 80-20 PCA data, with a minimum of 1 sample per leaf and a maximum tree depth of [...]. Impurity and information gain were measured using the Gini index metric. The classifier generated a feature importance ranking, which could be directly extracted. Results are detailed in the Results and Discussion section.

Preprocessing

The original dataset contains 72 teacher surveys. However, the number of student responses varied widely between teachers. For example, some teachers received many responses while other teachers received very few. As such, certain teachers' associated student surveys were ignored to mitigate the effects of low sample size.
TABLE I
SUMMARY OF SCORING METRICS FOR THE CLASSIFICATION AND REGRESSION TREES BASED ON DIFFERENT DATASETS AND DATASET SPLITS
Columns: Scoring; Dataset Split; Original; Scaled; PCA. Rows: Classification (Accuracy); Regression (MSE & R2).

The selected regression tree used a minimum of 3 samples per leaf. The Mean Squared Error (MSE) was used as the information gain metric for the regression tree, while the R-squared (R2) score measured its performance. The regression tree also generated a feature importance ranking, which could be directly extracted. Results are shown in the next section.

Classification Tree's Performance

The best-performing classification tree, trained on the 80-20 PCA dataset, had an accuracy of 67%. PCA determined that the following features contain low variance: Visualization and Language. The feature rankings derived from the tree's structure identified the following features as the most influential on SEI: Gamification, Engagement, Inspiration, Collaboration, and Entertainment. Results can be found in Table I and feature importance can be found in Fig. 2.

Regression Tree's Performance

The best-performing regression tree, trained on 70-30 PCA data, yielded an R2 score of 0.661 and an MSE of 0.114. Consequently, the regression prediction was typically about 0.338 points (the square root of the MSE) away from the "correct" Student Evaluation Index score. PCA determined that the following features did not contribute much information: Visualization, Language, and Inspiration. The top 4 most important features are: Gamification, Entertainment, Collaboration, and Case Studies. The remaining techniques and behaviors were deemed to add no extra information to the tree. Results can be found in Table I and feature importance can be found in Fig. 3.

Fig. 2 Feature importance ranking of the best performing classification tree, PCA 80-20. Note: after PCA, 5 additional features were deemed not to affect the decision tree's prediction ability.

IV. CONCLUSION

The accuracy of the classification tree when it came to predicting the SEI was only 67%.
The subpar accuracy may be due to the coarse categorization of SEI, as detailed in the Preprocessing subsection. Grouping the SEI into quarter-point increments can cause predictions that land in an adjacent bin to be classified as incorrect, even though they are nearly correct. Qualitatively, such a prediction can still be considered correct; quantitatively, it is counted as an error, so accuracy suffers. As such, there likely exists a subset of predictions that can be considered qualitatively accurate despite being quantitatively inaccurate.

Furthermore, the augmentation of student response data for teachers with extremely low response rates can affect the results as well. Because the augmentation averages the teaching technique and behavior scores for that teacher, teachers with high self-reported scores will automatically have a high SEI. This may not be true of real-life student responses, as they do not necessarily vary linearly with teacher scores.

The regression tree's performance was also mediocre, with an R2 score of 0.661 and an MSE of 0.114. As mentioned before, the classification tree aggressively bins SEI scores into groups, whereas the regression tree does not. However, this does not necessarily increase the accuracy of the regression tree, as the predicted SEI values can still lie beyond what is considered quantitatively accurate. The MSE of 0.114 translates into a root-mean-square SEI error of about 0.338 points for any given prediction, or about 1.35 quarter-point intervals. Once again, this can be considered decently accurate qualitatively in the context of teacher evaluation.

Fig. 3 Feature importance ranking of the best performing regression tree, PCA 70-30. Note: after PCA, 6 further features are not used at all in the tree and the decision largely rests with the Gamification technique.

Regression Tree

Once again, to find the best regression tree, the dataset is split into 70% and 80% training data, leaving 30% and 20% of the data for testing.
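To make the error figures discussed in this section concrete, the following sketch converts the reported MSE into a root-mean-square error and shows how an "adjacent bin" tolerance changes a classification accuracy. The helper names and the toy label lists are made up for illustration; only the MSE of 0.114 and the 0.25 bin width come from the text. Note that the square root of the MSE is the RMSE, which is an upper bound on the mean absolute error rather than the mean absolute error itself.

```python
import math


def rmse_from_mse(mse):
    """Root-mean-square error, in the same units as the SEI."""
    return math.sqrt(mse)


def accuracy_within(y_true, y_pred, tolerance=0.0):
    """Share of predictions whose distance from the true label is at most `tolerance`."""
    eps = 1e-9  # guards against floating-point noise at the boundary
    hits = sum(abs(t - p) <= tolerance + eps for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


rmse = rmse_from_mse(0.114)        # reported regression MSE
intervals = rmse / 0.25            # expressed in quarter-point SEI bins
print(round(rmse, 3), round(intervals, 2))

# Hypothetical binned labels: exact-match accuracy versus adjacent-bin accuracy.
toy_true = [4.25, 4.0, 4.5, 3.75]
toy_pred = [4.25, 4.25, 4.5, 4.5]
print(accuracy_within(toy_true, toy_pred))
print(accuracy_within(toy_true, toy_pred, tolerance=0.25))
```

Reporting an adjacent-bin accuracy alongside the exact-match figure would be one way to quantify the "qualitatively accurate" predictions the text describes.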
Two additional datasets are created from these splits: one normally scaled, and the other containing the features that PCA determines to hold >95% of the variance. Using 5-fold cross-validation, GridSearchCV identified the optimal regression tree, which used a 70-30 PCA split. This configuration resulted in a maximum tree depth of 4 and a minimum of 3 samples per leaf.

This strongly suggests that educators may need to focus on educational games to improve SEI. Remember, SEI reflects not only student performance but student satisfaction as well. Games of all types are inherently a form of fun or entertainment. It intuitively follows that they are popular with almost everyone, regardless of age. In summary, the research conducted suggests that the quickest way for educators to improve the performance and satisfaction of their students is to make better use of games in their teaching and to improve their behavioral techniques.

It should be noted that the R2 score in the context of decision trees is not an absolute measure of accuracy. However, an R2 of 0.661 indicates that the regression tree can model this dataset decently.

Ranking Features based on Importance

Ranking features based on their importance gave intriguing insights (see Fig. 2 and Table II). Despite the dataset undergoing PCA, the best classification tree only used 6 of the remaining 11 features. Additionally, Language (using the student's native language) can be logically deduced to be one of the important factors in student satisfaction and performance. However, PCA deems it to have little effect on SEI.

The classification tree ranks Gamification as the most important feature by a large margin. Using games to teach greatly influences the SEI, with a score of 0.[...].

Limitations

One potential limitation with respect to predicted SEI is the two rounds of averaging during the data preprocessing. This causes many potentially useful trends in the data to be simplified or hidden.
Further research with non-augmented student data could potentially yield different results as well.

Another limitation is the relatively small number of survey participants. The performance of both trees could be more satisfactory given more teacher-student samples to draw from. The survey format was intended for quick completion to avoid loss of interest or diligence by the participants. For educators specifically, certain teaching or behavioral methods could have been misunderstood because of the short explanations.

Finally, the choice of decision tree addresses both research questions put forth in this paper. However, small changes in training data can radically affect the tree structure, which informs the feature importance ranking, as can be seen with the use of PCA. Additionally, decision trees are suboptimal for regressing continuous values and are prone to overfitting, which can affect the generalizability of SEI prediction.

TABLE II
FEATURE IMPORTANCE SCORES OF THE BEST DATASETS AND DATASET SPLITS FOR EACH TREE
Columns: Split Ratio; Best Dataset; Techniques (Visualization, Practice, Math Rules, Gamification, Collaboration, Discovery, Case Studies, Assessment, Language); Behaviors (Inspiration, Engagement, Entertainment, Bonding). Rows: Classification (PCA); Regression (PCA).

This is slightly more than quadruple the importance of the next feature, Engagement, at 0.[...]. Of lesser importance are Inspiration, Collaboration, Entertainment, and Practice. From the feature ranking derived from the best classification tree, we may conclude that teachers can achieve the quickest improvement in SEI by using games to teach and by actively engaging with students. However, due to the mediocre accuracy of the classification tree, the suggested priorities could be suboptimal, out of order, or of incorrect magnitude.

The best regression tree's feature importance rankings lead to interesting conclusions as well and can be seen in Fig.
3 and Table II. Because the PCA dataset is used, the importance of each feature is more concentrated, as with the classification tree. Despite the removal of 3 features by PCA, the best regression tree deems only 4 of the remaining 10 features to contain any useful information. The regression tree's feature importance ranking is heavily skewed, like that of the classification tree, towards Gamification, at a score of 0.86. This means that 86% of the information gained in the regression tree can be attributed to the Gamification technique. This is more than 8 times the importance of the next feature.

Summary of Findings

The research conducted in this paper demonstrates the ability to predict teacher performance based on teachers' use of teaching techniques and different behavioral methods. Additionally, it explores and demonstrates the possibility of discovering which techniques and behaviors can be prioritized for better teacher training. This second aim is crucial for many school systems, especially in Dubai, to better prepare students for the future. It is imperative that modern classrooms take advantage of modern technology to optimize the learning experience for teachers and students alike. Through the optimization of teacher performance and the enhancement of their experience, students can anticipate an increase in their own satisfaction as well as improved uptake and retention of knowledge.

Future Work

One potential avenue for future investigation is the use of decision tree variations, such as ensemble learning with Random Forests or gradient boosting. This would retain the inherent feature importance ranking functionality while potentially improving SEI prediction.

Moreover, geographical, demographic, and cultural elements may influence both the prediction of the Student Evaluation Index (SEI) and the identification of the most effective pedagogical strategies and behaviors.
Conducting a comparable investigation in a different nation might confirm universally accepted educational standards or reveal tactics and practices that are significant within a specific regional context.

Finally, the current feature importance implementation returns a ranking for the entire population of teachers. Potential future work can include a personalized feature importance ranking, which would exclude features that the teacher already uses to a high degree.

REFERENCES