Infinity Journal of Mathematics Education
Volume 14, No. 4, 2025
p-ISSN 2089-6867, e-ISSN 2460-9285
https://doi.org/10.22460/infinity.

Measuring changes in students' informal statistical reasoning skills through the ethno-flipped classroom model: Stacking and racking analysis

Rahmi Ramadhani1, Soeharto2, Fitria Arifiyanti3, Zsolt Lavicza4*, Edi Syahputra5, Elmanani Simamora5

1 Department of Informatics, Universitas Potensi Utama, North Sumatra, Indonesia
2 Research Center for Education, National Research and Innovation Agency (BRIN), Indonesia
3 Department of Physics Education, Universitas Pendidikan Indonesia, West Java, Indonesia
4 Department of STEM Education, Linz School of Education, Johannes Kepler University Linz, Austria
5 Department of Mathematics Education, Universitas Negeri Medan, North Sumatra, Indonesia
*Correspondence: zsolt.lavicza@jku.

Received: Apr 14, 2025 | Revised: Jul 22, 2025 | Accepted: Jul 29, 2025 | Published Online: Oct 10, 2025

Abstract

Statistical reasoning is widely recognized as a fundamental skill for preparing students to navigate a data-driven global society. However, research on the development of informal statistical reasoning, particularly when facilitated through an ethno-flipped classroom model, remains limited. Furthermore, few studies employ longitudinal approaches, such as stacking and racking, to assess continuous changes in students' reasoning development over time. Most existing research relies on pre-post comparisons and lacks comprehensive analysis at both the person and item levels. This study investigates changes in students' informal statistical reasoning using the Partial Credit Rasch Model, incorporating stacking and racking analysis. A total of 152 twelfth-grade students participated in a 12-week ethno-flipped classroom intervention. The informal statistical reasoning test, consisting of five open-ended items, demonstrated sufficient validity for use in both the instructional intervention and the Rasch-based analysis.
Item validity was assessed as a key parameter within the Rasch measurement framework. The findings revealed significant improvements in student proficiency and a decrease in item difficulty. Notably, 66 students reached Level 5, reflecting integrated process reasoning. These results support the effectiveness of the ethno-flipped classroom model in enhancing informal statistical reasoning. This study contributes to the design of contextual, adaptive instruction and offers a robust Rasch-based framework for monitoring reasoning development longitudinally.

Keywords: Ethno-flipped classroom, Informal statistical reasoning, Racking analysis, Stacking analysis

How to Cite: Ramadhani, Soeharto, Arifiyanti, Lavicza, Syahputra, & Simamora. Measuring changes in students' informal statistical reasoning skills through the ethno-flipped classroom model: Stacking and racking analysis. Infinity Journal, 14, 949-972. https://doi.org/10.22460/infinity.

This is an open access article under the CC BY-SA license.

INTRODUCTION

In global mathematics education, there is widespread recognition that mathematical reasoning is a critical skill, closely tied to the execution of mathematical procedures (Schleicher, 2023; Smit et al., 2.) and essential for understanding concepts across other scientific disciplines (Kollosche, 2021; Valenta et al., 2.). In recent years, the field has placed increasing emphasis on statistical literacy, particularly statistical reasoning, as a vital competency for navigating real-world, data-rich environments. Statistical reasoning equips individuals with the ability to make sense of data, draw inferences, and solve practical problems using statistical information (Fielding et al., 2025; Hasanah et al., 2.). Within this broader domain, informal statistical reasoning plays a foundational role, particularly in early learning contexts.
It refers to the ability to interpret statistical information in informal settings and to formulate questions, model data, and draw reasonable conclusions, all without relying on formal inferential procedures (Hjelte et al., 2020; Makar, 2013; Pfannkuch, 2.). As a subdomain of statistical reasoning, informal statistical reasoning is essential not only for academic progression but also for developing critical data literacy in everyday decision-making (Setianingsih & Rahmah, 2020; Sukoriyanto & Khurin'in, 2.).

The growing importance of informal statistical reasoning in mathematics education is reflected in various international curriculum frameworks. For instance, the GAISE report by the American Statistical Association (Franklin et al., 2.) outlines four key components of the statistical reasoning process (posing questions, collecting data, analyzing data, and interpreting results) that align with NCTM's principles (National Council of Teachers of Mathematics, 2.). Several countries, including the UK (Department for Education, 2.), China (Ministry of Education of People's Republic of China, 2.), Singapore (Ministry of Education Singapore, 2.), and Indonesia (Ministry of Education Culture Research and Technology, 2.), have integrated informal statistical reasoning into their national curricula to enhance students' ability to analyze data and solve real-life problems.

Informal statistical reasoning refers to a skill that emphasizes how students use their logic to gather mathematical information, which involves obtaining information from random statistical data (Pfannkuch, 2.). It is a type of reasoning that is closely related to probabilities and helps to draw conclusions and make decisions in situations that are not expressed in terms of recognized probabilities (Sariningsih & Herdiman, 2.).
This form of reasoning allows students to understand statistical data informally and to form conclusions and judgments based on informally acquired data from specific observational studies, experiments, or sample surveys (Hjelte et al., 2020; Pfannkuch, 2.). Developing informal statistical reasoning is essential for building foundational skills in statistical inference, enabling students to observe, compare, and draw conclusions from data distributions. These skills support generalizing data findings and are assessed through five reasoning levels: level 1 (idiosyncratic reasoning), level 2 (verbal reasoning), level 3 (transitional reasoning), level 4 (procedural reasoning), and level 5 (integrated reasoning). Informal statistical reasoning has four key indicators: describing, organizing, representing, and interpreting data (Jones et al., 2000; Makar & Fielding-Wells, 2011; Makar & Rubin, 2009; Mooney).

Informal statistical reasoning skills are inherently connected to data interpretation and can be meaningfully contextualized within the framework of ethnomathematics. This approach was demonstrated by Ramadhani, Saragih, and Napitupulu, who investigated students' informal statistical reasoning through assessments embedded in ethnomathematical contexts. Their findings revealed that students exhibited varying levels of informal statistical reasoning, which were associated with their initial mathematical abilities. These outcomes suggest that incorporating ethnomathematics-based contexts into assessment tools holds potential for effectively mapping students' reasoning abilities. However, the study did not involve the implementation of specific instructional interventions, focusing solely on identifying existing ability levels. In contrast, other studies assessing informal statistical reasoning have not integrated ethnomathematical contexts into their instruments.
The limited use of ethnomathematics contexts in assessing informal statistical reasoning represents a key novelty of this study. Specifically, this research incorporates ethnomathematics-based contexts into statistical problems designed to evaluate students' informal statistical reasoning skills. While previous studies have explored students' reasoning levels, they did not involve specific instructional interventions using cultural contexts, thereby highlighting a gap that this study seeks to address.

Several researchers have demonstrated that integrating ethnomathematics into mathematics education enhances student engagement and provides meaningful learning experiences (Furuto, 2016, 2018; Hidetoshi & Rothman, 2021; Kim & Chae, 2016; Lipka et al., 2.). Beyond cultural contextualization, instructional design models such as the flipped classroom have gained popularity for enabling flexible, interactive, and student-centered learning (Attard & Holmes, 2.). Encouraging peer collaboration within this model further increases students' confidence, interest, motivation, and adaptability when using technology to learn mathematics (Abeysekera & Dawson, 2.).

The ethnomathematics context in this study was integrated with the flipped classroom model. Such an integration was previously implemented by Ramadhani et al. It resulted in a versatile learning environment that operates in two phases, out-of-class and in-class, designed to foster meaningful learning by incorporating contexts, activities, cultural elements, values, and characteristics embedded in students' sociocultural experiences. Engaging with problems that reflect students' cultural backgrounds and daily activities enhances their ability to recognize, simplify, visualize, analyze, and draw predictions from data in contexts relevant to their lives.
While previous research demonstrated the potential of this integration, the current study introduces a further innovation by applying the ethno-flipped classroom model specifically to assess students' informal statistical reasoning, a focus that has not been sufficiently addressed in earlier studies.

To further elaborate, the ethno-flipped classroom model is an innovative pedagogical framework that combines the flexibility and technology integration of the flipped classroom (Attard & Holmes, 2020; Ramadhani et al., 2023) with the cultural relevance of ethnomathematics (D'Ambrosio & de Campos Almeida, 2017; Prahmana et al., 2021; Rosa & Orey, 2.). By embedding learning in students' cultural contexts, it supports adaptive, student-centered learning experiences. Collaborative and investigative learning through tiered group work enhances character development, self-regulated learning, and sociocultural values such as cooperation and perseverance (Ramadhani, Syahputra, et al., 2022; Ramadhani et al., 2023). Studies show that implementing this model using contexts like Lemang Batok or Nias traditions improves numeracy and informal statistical reasoning (Ramadhani et al., 2024; Ramadhani et al., 2.), confirming the model's value in promoting critical thinking and statistical literacy in real-world cultural contexts.

While the pedagogical model offers promise, rigorous evaluation of its impact on students' reasoning skills remains limited in the literature. This study therefore draws on previous intervention-based research that seeks to assess learning gains meaningfully. Numerous studies have examined improvements in statistical reasoning skills before and after interventions (Conway et al., 2019; Ganesan & Leong, 2.).
However, only a few have explored these changes in depth to identify meaningful differences between pre-test and post-test results. Ramadhani, Saragih, Maulida, et al. examined changes in statistical reasoning skills using a TinkerPlots-assisted ethnomathematics approach. Laliyo et al. investigated shifts in students' understanding of hydrological concepts through the implementation of an inquiry-based model. Ling et al. analyzed changes in students' initial mathematical abilities using educational games as a learning medium. Hu et al. assessed variations in self-efficacy in science learning through both cross-sectional and longitudinal analyses. Meanwhile, Amin et al. explored developments in students' readiness for self-directed learning.

Most studies merely confirm the presence of an effect without explaining the magnitude or nature of the improvement in student ability (Laliyo et al., 2.). Many focus solely on identifying student performance at a fixed point rather than investigating how specific instructional interventions lead to progressive development over time (Ilie et al., 2.). As a result, they overlook the dynamic nature of learning and fail to capture longitudinal shifts in student understanding. These limitations highlight a core concern in the psychometric community about the accuracy of measuring changes over time. Central to this issue is the use of raw scores in Classical Test Theory (CTT), which has been widely criticized for introducing measurement bias (Sumintono, 2.). Raw scores lack the precision required for valid interpretation. Specific concerns include: (1) gain scores tend to correlate negatively with pre-test performance (Pentecost & Barbera, 2.); (2) they often show low reliability (Willoughby & Metz, 2.); and (3) scale inconsistency reduces measurement accuracy (Linn & Slinde). To address these limitations, more advanced psychometric approaches are needed.
Stacking and racking analyses, two analytic techniques within the Rasch Measurement framework, offer a more rigorous and accurate way to evaluate learning gains. Both are grounded in Item Response Theory (IRT), which focuses on modeling the probability of a specific response based on both item characteristics and person ability. These techniques allow researchers to simultaneously examine changes in person ability and item difficulty across time points, making them particularly suitable for longitudinal studies in educational settings (Amin et al., 2013; Sunjaya et al., 2.). Stacking arranges each student's data vertically, one row for the pre-test and one for the post-test, to track individual changes across identical items (Combrinck et al., 2.). Racking, in contrast, places the data horizontally, allowing each test item to appear twice while each student appears once, enabling analysis of changes in item difficulty over time (Wright, 2.). These techniques help evaluate shifts in student abilities, such as informal statistical reasoning and self-regulated learning, by comparing item positions and difficulty levels on the Wright map.

Building on this framework, stacking and racking techniques significantly enhance the reliability and interpretability of assessments by: (1) providing detailed insights into changes in both student ability and item difficulty; (2) improving the precision of skill measurement beyond what conventional methods allow; (3) accurately monitoring learning progress over time; and (4) ensuring the equitability and observability of ability shifts within individuals and groups (Uzun & Öğretmen, 2.).

This study aims to examine changes in students' informal statistical reasoning skills by comparing their performance on pre-tests and post-tests following the implementation of the ethno-flipped classroom model.
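The difference between the stacked and racked layouts described above can be made concrete with a small sketch. The student codes, item scores, and the helper names `stack` and `rack` below are hypothetical illustrations, not data or software from the study; the sketch only shows how the same pre-test and post-test responses would be arranged before being passed to a Rasch program.

```python
# Hypothetical partial-credit scores for three students on two items.
# These values are illustrative only and are not data from the study.
pre_test = {
    "S01": {"ISR1": 1, "ISR2": 0},
    "S02": {"ISR1": 2, "ISR2": 1},
    "S03": {"ISR1": 0, "ISR2": 0},
}
post_test = {
    "S01": {"ISR1": 3, "ISR2": 2},
    "S02": {"ISR1": 4, "ISR2": 3},
    "S03": {"ISR1": 2, "ISR2": 1},
}


def stack(pre, post):
    """Stacking: every student contributes two rows (Time 1 and Time 2),
    while each item keeps a single column."""
    rows = []
    for label, scores in (("T1", pre), ("T2", post)):
        for student, items in scores.items():
            rows.append({"person": f"{student}_{label}", **items})
    return rows


def rack(pre, post):
    """Racking: every student contributes one row, while each item
    appears twice (one column per time point)."""
    rows = []
    for student in pre:
        row = {"person": student}
        for label, scores in (("T1", pre), ("T2", post)):
            for item, score in scores[student].items():
                row[f"{item}_{label}"] = score
        rows.append(row)
    return rows


stacked = stack(pre_test, post_test)
racked = rack(pre_test, post_test)

for row in stacked:
    print(row)
for row in racked:
    print(row)
```

With three students and two items, the stacked layout has six person rows and two item columns, so person measures can be compared across time on a common item scale. The racked layout has three person rows and four item columns, so item difficulties can be compared across time on a common person scale.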
The analysis considers both students' proficiency levels and the difficulty of test items to identify trends and significant differences in their reasoning abilities when solving statistical problems within ethnomathematical contexts. To guide this investigation, the following research questions were formulated.

RQ1: Does the data collected from a sample of high school students fit the Rasch Model?
RQ2: Are there any changes in students' informal statistical reasoning skills after the intervention using an ethno-flipped classroom model, based on person ability?
RQ3: Are there any changes in students' informal statistical reasoning skills after the intervention using an ethno-flipped classroom model, based on test item difficulty?
RQ4: Are there any differences between the students' informal statistical reasoning skills at Time 1 (pre-test) and Time 2 (post-test)?

METHOD

Research Design

This study applied quantitative research with a one-group pre-test/post-test design. The research data were analyzed using the stacking and racking analysis techniques of Rasch Model Measurement. The stacking analysis technique was used to measure changes in students' informal statistical reasoning ability after the intervention of the ethno-flipped classroom model in terms of the level of ability per student. The racking analysis technique, in contrast, was used to measure changes in the students' informal statistical reasoning skills in terms of difficulty levels per test item.

Participants

Participants in this study were final-year students at senior high school level on Nias Island, Indonesia. The study sample was selected using stratified random sampling. The total sample of this study was 235 students, but only 152 of the 12th-grade students were included in the stacking and racking analysis (.% female and 44% male), with an age range of 17-18 years, from six high schools on Nias Island, Indonesia.
The research sample participated in a 14-week ethno-flipped classroom learning experience divided into two learning phases (six out-of-class learning phases and six in-class learning phases), one initial assessment phase (Time 1, pre-test), and one final assessment phase (Time 2, post-test). The topic taught was statistics, consisting of three sub-topics: measures of data presentation, measures of central tendency, and measures of dispersion.

The decision to include only 152 students was based on data completeness. A total of 82 students did not fully participate in the 12-week intervention and failed to complete either the pre-test or the post-test. Consequently, students who were absent for one of the assessments or did not participate consistently throughout the intervention were excluded from the analysis. This exclusion was necessary to avoid bias and distortion in estimating person ability and item difficulty due to incomplete data. Data completeness is a critical requirement in longitudinal studies utilizing stacking and racking analysis, as accurate and valid measurement at both time points (Time 1, pre-test, and Time 2, post-test) is essential for meaningful comparisons.

Research Instrument

The instrument used in this study was an informal statistical reasoning test. The test that we designed and developed refers to the statistical learning indicators adopted from Jones et al. and Mooney, namely describing data, organizing and reducing data, representing data, and analyzing and interpreting data. Describing data involves the direct reading and reporting of information presented in charts, tables, or other graphical displays. Organizing and reducing data refers to the processes of classifying, structuring, or summarizing data, often through the use of measures of central tendency and variability.
Data representation entails the graphical display of data to facilitate understanding. Analyzing and interpreting data involves identifying patterns or trends and making predictions or inferences based on the information presented in graphical formats. The test was developed based on the level of informal statistical reasoning ability using a five-item test. Table 1 presents one of the informal statistical reasoning ability test items.

Table 1. Informal statistical reasoning ability test item

Test item: One of the kinship systems of the Nias community is derived from the marriage procession. In Nias culture, marriage is an obligation and responsibility of the parents ("No ibu'a wo'ymy nia khy ndraono nia"). The implementation of the marriage process is also carried out in cooperation, following the Amaedola held by the Nias community, namely "Aoha noro nilului wahea, aoha noro nilului waoso, alisi tafadaya-daya, hulu tafewolo-wolo". The following data are given in a histogram of the age at first marriage of Nias people in Gunungsitoli city in 2021 (data source: https://gunungsitolikota.). The data displayed are in rounded units of thousands. If there are 114 thousand Nias people who have their first marriage in the age range of 10-40 years, and the ratio of Nias people who have their first marriage in the age range of 33-40 years to those in the age range of 17-24 years is 1:3, analyze how many people in Gunungsitoli city had their first marriage between the ages of 17 and 24!

Solution: It is known that the number of Nias people who have their first marriage in the age range of 10-40 years is 114 thousand. The next information obtained is that the ratio of Nias people who have their first marriage in the age range of 33-40 years to those in the age range of 17-24 years is 1:3. (Students extract the idiosyncratic or relevant information from the graph.) Suppose the number of Nias people who had their first marriage in the age range of 33-40 years is $x$; then:
$114 = 27 + 3x + 55 + x$
$114 = 82 + 4x$
$4x = 114 - 82$
$4x = 32$
$x = 8$
(Students produce idiosyncratic information that complements the graph presented.) Thus, the number of Nias people who have their first marriage in the age range of 17 to 24 years is $3x = 3(8) = 24$ thousand. (Students complete the idiosyncratic information that has been generated to complete the graph presented correctly.)

Table 1 illustrates that the informal statistical reasoning ability test incorporates problems framed within the cultural context of Nias. This integration aligns with the implementation of the ethno-flipped classroom model, which emphasizes presenting tasks rooted in local ethnomathematical contexts. Specifically, the use of ethnomathematical elements corresponds to the second component of the ethno-flipped classroom model, namely Cultural Experience (see Figure 1).

Developing informal statistical reasoning assessments that integrate ethnomathematics contexts presents several key challenges. First, it requires balancing cultural authenticity with the psychometric validity of test items: the cultural context must support rather than hinder students' reasoning processes. Second, there is a potential for construct-irrelevant variance, where differences in students' familiarity with cultural content may distort the measurement of their true informal statistical reasoning ability. Third, it is necessary to ensure that the ethnomathematics context aligns consistently with established indicators of informal statistical reasoning abilities, maintaining both cultural and educational coherence.
Procedure

The research was conducted from September to November 2022 and began by administering a pre-test (Time 1) to all research samples. An ethical approval letter was obtained from Universitas Potensi Utama. The model intervention followed the syntax of the ethno-flipped classroom model for all research samples for 12 weeks and used two learning phases (out-of-class learning and in-class learning for each learning cycle). The intervention was conducted in three learning cycles, where one cycle covered one subtopic. The post-test (Time 2) was administered at the end of the third cycle. The first cycle of learning began with the out-of-class learning phase by focusing on the preparation of the learning environment, technology adaptation, and the initial introduction of statistical problems in the context of ethnomathematics, in this case using the context of ethnomathematics artifacts of the Nias tribe. After learning with the ethno-flipped classroom model, the final phase was carried out by providing the final assessment (post-test, Time 2). The test given in the post-test session (Time 2) was the same as the test given in the pre-test session (Time 1). The initial test data (pre-test) were collected during the first week of the activity, while the final data (post-test) were gathered at the end of the final week.

The lesson began by setting up the learning environment through a pre-arranged learning management system (LMS), where all instructional modules are easily accessible. This process is part of the initial phase outlined in the flipped classroom model of out-of-class learning. During this initial out-of-class phase, students explored the material available on the LMS and engaged in discussions with their peers (Syntax .).
The insights gained from these discussions are then used for further exploration in the subsequent in-class learning phase, which follows the initial phase of environment, activity, and material preparation.

In the in-class learning phase, students were presented with informal problems rooted in the sociocultural context of the Nias tribe for discussion (Syntax .). Prior to these discussions, the teacher divided the students into learning groups based on their proficiency levels (Syntax .). The day before the learning intervention, the teacher conducted an initial assessment of the students' informal statistical reasoning skills through a pre-test. The results of this initial assessment formed the basis for grouping students. In addition, students engaged in collaborative study groups to address informal statistical problems, beginning in the in-class learning phase and continuing in the out-of-class learning phase through the discussion tool built into the LMS. During the out-of-class phase, students extended the collaborative session by expanding the initial discussion results (Syntax .). This expansion process continued into the in-class phase through additional elaboration activities, with the resulting elaborations serving as validation and confirmation.

The validation and confirmation tasks (Syntax .) were performed through graded collaborative activities. Representatives of low-ability groups confirmed the earlier elaboration results with the medium- and high-ability groups. Similarly, middle-ability student groups verified and validated their elaborations with those of high-ability student groups. At this stage, representatives from these student groups reached consensus on the results. In addition, representatives from the high-ability groups presented agreed-upon results to the teacher for validation purposes. The learning activity culminated in a final assessment designed to evaluate students' proficiency after the learning intervention (Syntax .).
Figure 1 illustrates the sequence of implementation of the learning process using the ethno-flipped classroom model.

Figure 1. Sequence of implementation of the learning process using the ethno-flipped classroom model

Data Analysis

The data were analyzed using Rasch Measurement with stacking and racking analysis techniques, assisted by the WINSTEPS application. The analysis began by checking the validity of the informal statistical reasoning test instrument used in this study. Fit validity analysis was conducted using Rasch Model Measurement against three criteria: an Outfit Mean Square (MNSQ) value between 0.5 and 1.5, an Outfit Z-Standard (ZSTD) value between -2.0 and 2.0, and a Point Measure Correlation (Pt. Measure Corr.) value between 0.4 and 0.85 (Sumintono & Widhiarso, 2.). Further analysis was performed to strengthen the results of the fit validity analysis by using a dimensionality test. Under the dimensionality criterion, the instrument is considered able to measure the range of the variable, that is, the ability of the research subjects to answer the items, if the Raw Variance Explained by Measures is above 40% (Ekinci et al., 2023; Soeharto et al., 2024; Sukarelawan et al., 2.). The test instrument can be used in research if the Cronbach's alpha reliability value is in the interval 0.70 ≤ KR20 ≤ 0.80 or has a reliability measurement interpretation value of "Good" (Bond & Fox, 2.). The next stage applied stacking and racking analysis to check for changes based on persons and items. Finally, the Wilcoxon signed-rank test in SPSS version 25 was used to measure the difference in students' informal statistical reasoning skills.
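The three fit validity criteria stated in the Data Analysis paragraph can be written as a simple screening check. The following is a minimal sketch, assuming the ranges are inclusive at both ends (the text says only "between"); the class `ItemFit`, the functions `fit_flags` and `meets_all_criteria`, and the item statistics are hypothetical and are not WINSTEPS output or values from this study.

```python
from dataclasses import dataclass


@dataclass
class ItemFit:
    """Fit statistics for one item, as reported by a Rasch program."""
    code: str
    outfit_mnsq: float
    outfit_zstd: float
    pt_measure_corr: float


def fit_flags(item: ItemFit) -> dict:
    """Check each criterion separately.

    Ranges follow the criteria quoted in the text; treating the
    endpoints as inclusive is an assumption of this sketch.
    """
    return {
        "mnsq_ok": 0.5 <= item.outfit_mnsq <= 1.5,
        "zstd_ok": -2.0 <= item.outfit_zstd <= 2.0,
        "corr_ok": 0.4 <= item.pt_measure_corr <= 0.85,
    }


def meets_all_criteria(item: ItemFit) -> bool:
    """True only when all three criteria are satisfied."""
    return all(fit_flags(item).values())


# Hypothetical item statistics, for illustration only.
items = [
    ItemFit("ITEM_A", outfit_mnsq=1.02, outfit_zstd=0.3, pt_measure_corr=0.62),
    ItemFit("ITEM_B", outfit_mnsq=1.10, outfit_zstd=-2.4, pt_measure_corr=0.55),
    ItemFit("ITEM_C", outfit_mnsq=1.70, outfit_zstd=2.6, pt_measure_corr=0.35),
]

report = {item.code: fit_flags(item) for item in items}
for code, flags in report.items():
    print(code, flags)
```

Reporting the three flags separately, rather than a single pass or fail value, makes it visible when an item misses only one criterion, which is the situation in which a researcher may still judge the item acceptable.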
RESULTS AND DISCUSSION

Results

RQ1: Fit Validity Based on Rasch Model Parameters

Table 2. Construct validation test results on the informal statistical reasoning (ISR) ability test instrument using the Rasch analysis
Columns: No.; Item Code; Total Score; JMLE Measure; Outfit MNSQ; Outfit ZSTD; PT Measure Corr. Rows: ISR1, ISR2, ISR3, ISR4, ISR5.

The results of the fit validity analysis of the informal statistical reasoning test in Table 2 show that all items meet the fit validity criteria. Although the Outfit ZSTD value of item ISR1 lies outside the stated range, this value is still acceptable, and the item is concluded to remain in the valid category (Bond & Fox, 2.). The Cronbach's alpha reliability was categorized as excellent (see Table 3). Further analysis showed that the person separation index for the construct validation of the informal statistical reasoning instrument met the requirement stated by Planinic et al. and by Soeharto and Csapó, namely that person separation must be more than 2 logits, where the greater the separation index, the higher the quality of the test. The results of the construct validation analysis of the informal statistical reasoning test indicate that the instrument is capable of effectively measuring the range of abilities among the research participants in responding to the test items. This is supported by the raw variance explained by the measures of item dimensionality, which exceeded 40%.

Table 3. Instrument reliability and model fit
Columns: Measure; Mean; Mean Outfit MNSQ; Mean Outfit ZSTD; Separation; Reliability; Cronbach's Alpha; Raw variance of Unidimensionality. Rows: Persons, Items.

The test item coded ISR5 is a question with a very hard difficulty level.
Furthermore, item ISR3 has a difficulty level of Hard, item ISR4 has a difficulty level of Medium, and items ISR1 and ISR2 have a difficulty level of Easy.

RQ2: Changes in Students' Informal Statistical Reasoning Skills After Intervention Using Ethno-Flipped Classroom Model Based on Person Ability

Changes in students' informal statistical reasoning skills were examined using pre-test and post-test scores collected from four learning classes, comprising a total sample of 152 students. The pre-test (Time 1) and post-test (Time 2) data were analyzed using the stacking analysis technique, which is incorporated within the Rasch model measurement framework. This analysis yielded logit values that were used to assess shifts in informal statistical reasoning skills before and after the implementation of the ethno-flipped classroom model, as illustrated in Figure 2.

Figure 2. Wright map of changes in informal statistical reasoning ability in the pre-test and post-test conditions in terms of students' individual ability levels

Referring to Figure 2, the student with code 094F had the smallest change in logit value compared to the other students. This student had a negative logit value in the pre-test condition (Time 1) that increased to a positive logit value in the post-test condition (Time 2). In contrast, the student with code 084M had the largest change in logit value 7. Figure 2 also shows that the student with code 119F had the lowest logit value in both the pre-test and the post-test conditions. The students with codes 043F and 038M had the highest pre-test scores.
The post-test showed a different picture: students 013M, 044F, and 103F had the highest post-test scores. This finding indicates that students who received the ethno-flipped classroom intervention showed positive changes. The positive change is reinforced by the logit value person (LVP), which shows the change in the level of students' informal statistical reasoning ability from the pre-test (Time 1) to the post-test (Time 2), as shown in Tables 4 and 5.

Table 4. Logit value person in pretest condition
Columns: Demographic; Very High (Level 5); High (Level 4); Moderate (Level 3); Low (Level 2); Very Low, LVP < Mean Logit - SD (Level 1); Total. The categories are bounded by the mean logit and its standard deviation (SD).
Rows: Gender (Male, Female); Total.

Table 5. Logit value person in posttest condition
Columns: Demographic; Very High (Level 5); High (Level 4); Moderate (Level 3); Low (Level 2); Very Low, LVP < Mean Logit - SD (Level 1); Total. The categories are bounded by the mean logit and its standard deviation (SD).
Rows: Gender (Male, Female); Total.

In the pre-test (Time 1), 48% of the students were at Level 2 (Verbal). By gender, female students dominated Level 2 (Verbal) at 51.72%, while 43.08% of male students were at Level 2 (Verbal). After the ethno-flipped classroom intervention, the dominant level shifted from Level 2 (Verbal) to Level 5 (Integrated Process) in the post-test (Time 2): 43.42% of the students were at Level 5 (Integrated Process), 39.47% were at Level 4 (Procedural), and only 1.97% remained at Level 2 (Verbal). By gender, male students dominated the move to Level 5 (Integrated Process) at 49.23%, whereas female students dominated the move to Level 4 (Procedural), which is 44.

Further analysis at the individual level revealed that the student with the lowest pre-test and post-test logit scores (code 119F), originally at Level 1, moved up to Level 2 (Verbal). The students with the highest pre-test logit scores (codes 043F and 038M), originally at Level 4 (Procedural), moved to Level 5 (Integrated Process). Similarly, the students with the highest post-test logit scores (codes 013M, 044F, and 103F), originally at Level 3 (Transitional), moved up to Level 5 (Integrated Process). The student with the largest change in ability (code 084M) rose from Level 2 (Verbal) to Level 5 (Integrated Process). Although student 094F showed the smallest increase, this student still moved up from Level 3 (Transitional) to Level 4 (Procedural). Figures 3 and 4 illustrate the students' levels of informal statistical reasoning in the pre-test (Time 1) and post-test (Time 2).

Figure 3. Level of informal statistical reasoning skills of the students in the pre-test condition (Time 1)

Figure 4. Level of informal statistical reasoning skills of the students in the post-test condition (Time 2)

RQ3: Changes in Students' Informal Statistical Reasoning Skills After Intervention Using Ethno-Flipped Classroom Model Based on Test Difficulty

Changes in students' informal statistical reasoning skills were analyzed using Racking Analysis.
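As a rough illustration of how the two analyses differ in data layout, the sketch below arranges pre-test and post-test responses in the usual stacked form (each student contributes two rows, one per time point, so person measures can be compared) and the usual racked form (each student contributes one row and each item appears twice, so item difficulties can be compared). The student identifiers, the scores, and the function names are hypothetical and are not taken from the study data; the P and O suffixes follow the item codes used in this paper (for example 5P and 5O). The Rasch estimation itself is not shown.

```python
# Hypothetical partial-credit scores for two students on five items (ISR1..ISR5).
pre = {"S01": [1, 0, 1, 0, 0], "S02": [2, 1, 1, 1, 0]}
post = {"S01": [3, 2, 2, 2, 1], "S02": [3, 3, 2, 2, 2]}


def stack(pre_scores, post_scores):
    """Stacked layout: each student appears twice (Time 1 and Time 2 rows)."""
    rows = {}
    for student, scores in pre_scores.items():
        rows[f"{student}_T1"] = list(scores)
    for student, scores in post_scores.items():
        rows[f"{student}_T2"] = list(scores)
    return rows


def rack(pre_scores, post_scores, n_items):
    """Racked layout: each student appears once, each item appears twice."""
    columns = [f"{i}P" for i in range(1, n_items + 1)]   # pre-test item codes
    columns += [f"{i}O" for i in range(1, n_items + 1)]  # post-test item codes
    rows = {
        student: list(pre_scores[student]) + list(post_scores[student])
        for student in pre_scores
        if student in post_scores  # keep only students measured at both times
    }
    return columns, rows


stacked = stack(pre, post)
columns, racked = rack(pre, post, n_items=5)
print(len(stacked), "stacked rows;", len(columns), "racked columns")
```

In the stacked file the items are shared across time points, so differences show up in the person measures; in the racked file the persons are shared, so differences show up in the item measures.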
The Racking Analysis technique aims to identify changes in item difficulty that reflect changes in students' informal statistical reasoning abilities, as presented in Figure 5.

Figure 5. Wright map of changes in informal statistical reasoning ability in the pre-test and post-test conditions in terms of test item difficulty levels

Item ISR5 was the most difficult item for students in the pre-test, and its logit value changed in the post-test. Although its difficulty changed between the pre-test (code 5P) and the post-test (code 5O), ISR5 remained the most difficult of the test items. Likewise, item ISR1 was the easiest item for students to solve in the pre-test (code 1P, -0.39 logit), and in the post-test (code 1O, -4.97 logit) it showed the greatest change in difficulty, -4.58 logit. The Wright map in Figure 5 shows that all students in the field test were able to change their informal statistical reasoning, as seen from the change in item difficulty. The average change in item difficulty between the pre-test and post-test conditions was negative, that is, -3. This negative difference indicates that the items became much easier to complete in the post-test than in the pre-test, suggesting that the ethno-flipped classroom intervention had a positive impact on students' ability to solve the informal statistical problems given.

RQ4: Differences Between Students' Informal Statistical Reasoning Skills in Time 1 Condition (Pretest) and Time 2 Condition (Posttest)
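The comparison in this subsection is a paired, non-parametric one. As a minimal sketch of how a Wilcoxon signed-rank statistic can be obtained for paired pre-test and post-test scores, the code below ranks the absolute differences (averaging tied ranks), sums the ranks of the positive and negative differences, and applies a normal approximation with a tie correction and no continuity correction. The function name and the example scores are hypothetical, and this is not necessarily the procedure or software used by the authors.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus, z, two-sided p) for paired samples.

    Uses the normal approximation with a tie correction; zero differences
    are dropped before ranking.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal approximation
    return w_plus, w_minus, z, p


# Hypothetical raw scores for five students (not the study data).
pre_scores = [1, 2, 3, 4, 5]
post_scores = [3, 5, 4, 8, 10]
w_plus, w_minus, z, p = wilcoxon_signed_rank(pre_scores, post_scores)
print(w_plus, w_minus, round(z, 3), round(p, 4))
```

When every student improves, all ranks fall on the positive side, so W_minus is zero and W_plus equals n(n + 1)/2.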
The normality and homogeneity tests show that the pre-test and post-test data are homogeneous but not normally distributed. Based on these results, a non-parametric Wilcoxon signed-rank test was performed to check for differences before and after the intervention. The reported probability (0.00) was below the significance level. The results indicated a significant difference between the pre-test and post-test scores of informal statistical reasoning skills.

Discussion

Under the ethno-flipped classroom model, all students experienced positive changes and were able to complete the informal statistical reasoning items well. The data confirmed a positive change in the average logit value from the pre-test to the post-test. Student scores in the post-test condition also had a positive average logit value. The magnitude of the positive change in logit value indicates that all students improved their informal statistical reasoning skills after receiving the ethno-flipped classroom intervention.

Changes in students' informal statistical reasoning skills can also be seen from the racking analysis of changes in item difficulty between the pre-test and post-test conditions, as shown in Figure 5. Based on this analysis, it can be concluded that students experienced positive changes in informal statistical reasoning skills. These changes appear in students' logit scores, which increased from the pre-test (Time 1) to the post-test (Time 2). The pattern of change in students' logit scores can be seen in Figure 2, where the positions of students' abilities shift on the Wright map: in the post-test (Time 2), students moved to higher logit positions (towards positive logit values) than in the pre-test (Time 1). The increase in students' logit values was also reinforced by the change in item difficulty shown in Figure 5. The nature of this change indicates that the students' informal statistical reasoning ability has improved, and that the ethno-flipped classroom model has a positive impact on students in solving informal statistical problems. The intervention changed item difficulty from the pre-test to the post-test condition. It gave students confidence to solve informal statistical problems better in the post-test, thereby influencing both the item difficulty levels and the mastery of informal statistical reasoning indicators.

In line with these findings, the consistent increase in students' logit scores and the shift in item difficulty levels suggest that the ethno-flipped classroom model effectively facilitates deeper engagement with statistical concepts embedded in culturally relevant contexts.
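The item difficulty shift reported under RQ3 is the post-test estimate minus the pre-test estimate from the racked analysis, so a negative value means the item became easier. As a small worked check using the ISR1 values given in the Results (-0.39 logit for code 1P and -4.97 logit for code 1O), the sketch below reproduces the reported change; the function names are illustrative only.

```python
def difficulty_shift(pre_logit, post_logit):
    """Post-test minus pre-test item difficulty, rounded to two decimals."""
    return round(post_logit - pre_logit, 2)


def describe(shift):
    """Interpret the sign of a racked item difficulty shift."""
    if shift < 0:
        return "easier"
    if shift > 0:
        return "harder"
    return "unchanged"


# ISR1 estimates as reported under RQ3: code 1P (pre-test), code 1O (post-test).
isr1_shift = difficulty_shift(-0.39, -4.97)
print(isr1_shift, describe(isr1_shift))
```

The same function can be applied to the other racked item pairs once their estimates are available.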
By incorporating ethnomathematical elements into learning activities, the model exposed students to familiar yet meaningful real-life statistical problems that promoted intuitive understanding and improved reasoning. This approach aligns with constructivist principles, where students build new knowledge through active experiences, especially when those experiences are rooted in their cultural backgrounds (Ramadhani et al., 2023. The increase in student ability logit values also suggests that students were not only recalling procedures but were developing informal statistical reasoning skills through contextual interpretation, argumentation, and data-based decision making. This supports the integration of culturally responsive pedagogy in 21st-century mathematics instruction.

In addition to score improvements and item difficulty shifts, the Wright Map and Person-Item Distribution also reveal meaningful cognitive progress. Moreover, the change in the difficulty levels of test items, as visualized in Figure 5, supports the idea that the learning environment created by the ethno-flipped classroom model did not merely prepare students to perform well on a fixed set of problems (Andrade & Coutinho, 2. Instead, it nurtured transferable reasoning skills that could be applied across varying problems. The changes in item difficulty distribution indicate that students' post-test abilities were no longer hindered by the cognitive demands of the tasks. This implies increased confidence and competence in addressing tasks involving variation, distribution, and contextual interpretation, which are core components of informal statistical reasoning. Therefore, the model's impact is visible not only in score improvement but also in the shift of cognitive engagement and learning depth, emphasizing its relevance for culturally responsive and meaningful mathematics education.

Beyond the score comparisons, the Rasch analysis using the Wright Map and Person-Item Distribution showed alignment between students' increasing abilities and the hierarchy of item difficulties (Andrich, 2017; Bond et al., 2. This alignment suggests that the ethno-flipped classroom model did not merely enhance surface-level understanding, but also supported the progressive development of students' cognitive strategies for solving informal statistical tasks. Students began demonstrating the ability to compare distributions, reason with informal data trends, and justify conclusions using contextual references drawn from ethnomathematical problems. These are hallmarks of informal statistical reasoning, which requires interpretation beyond computation. The fact that more students occupied higher logit positions in the post-test reinforces the claim that this model fostered not only knowledge retention but also reasoning sophistication in culturally grounded contexts. Such findings reinforce the transformative potential of integrating pedagogy and culture in statistical education.

Complementing the quantitative data, qualitative observations during in-class activities revealed that students were more engaged and collaborative when solving statistical problems based on their cultural environments. Through group discussions, contextual reflections, and peer teaching during class sessions, students exhibited increased autonomy and confidence in interpreting statistical data. This transformation suggests that the ethno-flipped classroom promotes both cognitive and affective gains, reinforcing students' sense of identity while also equipping them with critical reasoning skills. Such outcomes are especially important in multicultural classrooms, where the recognition of students' cultural assets can transform learning environments into spaces that are inclusive, responsive, and academically rigorous.
Ultimately, the study demonstrates that combining flipped learning with ethnomathematical contexts is not merely a pedagogical variation, but a transformative approach that bridges culture and cognition in mathematics education.

CONCLUSION

The stacking and racking analyses within the Rasch model show positive changes in the informal statistical reasoning skills of students taught with the ethno-flipped classroom model. Stacking analysis offers insight into individual-level changes in ability by identifying whether a person's performance has improved, declined, or remained consistent over time, based on the assumption that item difficulty remains stable across measurement occasions. Racking analysis complements this by examining change in terms of item difficulty. Together, stacking and racking analyses within the Rasch model offer a viable approach for measuring change, serving as a complement to traditional assessment techniques. Measuring changes at both the individual level and the item difficulty level allows researchers to identify competent students and those who are struggling. Overall, the stacking and racking analyses indicate that the learning intervention implemented through the ethno-flipped classroom model effectively enhanced the informal statistical reasoning skills of senior high school students. The intervention was associated with positive changes in students' informal statistical reasoning skills between the pre-test (Time 1) and the post-test (Time 2).

Despite the promising findings, this study has several limitations. First, the cultural context applied in developing informal statistical reasoning assessments was limited to the Nias cultural setting. This specificity may limit the generalizability of the findings to other cultural contexts within ethnomathematics-based learning.
Second, the sample was reduced from 235 to 152 participants because some students did not complete the full learning cycle; they were excluded to maintain data validity in the stacking and racking analysis. This reduced sample may affect the representativeness of the results. Third, the primary focus of this study was limited to analyzing changes in students' informal statistical reasoning abilities through stacking and racking analysis, without further statistical testing or qualitative triangulation to validate or deepen the interpretation of these changes. Future research is encouraged to expand the cultural contexts involved, increase sample diversity, and integrate mixed-method approaches, combining quantitative analysis with qualitative data triangulation to strengthen the understanding of changes in informal statistical reasoning skills over time.

Acknowledgments

The authors sincerely extend their gratitude to Universitas Potensi Utama, Universitas Negeri Medan, Universitas Pendidikan Indonesia, the Research Center for Education at the National Research and Innovation Agency (BRIN), and Johannes Kepler University for their invaluable contributions and support to this collaborative research. The authors also express their profound appreciation to the editors and reviewers for their insightful and constructive feedback, which has greatly enhanced the quality of this manuscript. This work was supported by Johannes Kepler University Open Access Publishing Fund and the Federal State Upper Austria.

Declarations

Author Contribution: RR: Conceptualization, Visualization, Writing - Original Draft, and Writing - Review & Editing. S: Formal Analysis, Methodology, and Writing - Review & Editing. FA: Formal Analysis, Methodology, and Writing - Review & Editing. ZL: Formal Analysis, Validation, and Supervision. EE: Validation, and Supervision. ES: Validation, and Supervision.
Funding Statement: This research was supported by Johannes Kepler University Open Access Publishing Fund and the Federal State Upper Austria.
Conflict of Interest: The authors declare no conflict of interest.
Additional Information: Additional information is available for this paper.

REFERENCES