Journal of Natural Science and Integration
P-ISSN: 2620-4. E-ISSN: 2620-5092
Vol. No. October 2025, pp 258-272
Available online at: http://ejournal. uin-suska. id/index. php/JNSI
DOI: 10. 24014/jnsi.

Exploring Students' Creative Thinking Skills: A Design of Reliable Instrument in term of Assessing Creative Thinking on Temperature and Heat Concepts

Ahmad Maqruf1, Andi Suhandi1*, Achmad Samsudin1, Mustafa Sözbilir2, Hasan Özgür Kapıcı3, Hadi Nasbey4, Eki Nugraha5
1 Department of Physics Education, Universitas Pendidikan Indonesia, Indonesia
2 Department of Mathematics and Science Education, Ataturk University, Turkiye
3 Department of Mathematics and Science Education, Bogazici University, Turkiye
4 Department of Physics Education, Universitas Negeri Jakarta, Indonesia
5 Department of Computer Science Education, Universitas Pendidikan Indonesia, Indonesia
*Correspondence Author: andi_sh@upi.

ABSTRACT
This study aims to develop and validate an instrument for assessing creative thinking skills on the topic of temperature and heat among high school students. A quantitative research design was employed, with data analyzed using the Rasch model. The study involved 136 grade XI students (51 male and 85 female) from two high schools in West Bandung Regency. The Rasch analysis yielded a Cronbach's alpha of 0.78 and an item reliability value of 0.[…], although the wording of three items requires minor revision to improve linguistic clarity. The percentage distribution of creative thinking indicators was as follows: fluency ([…].34%), flexibility ([…].57%), originality ([…].13%), and elaboration ([…].76%). These findings reveal that students' creative thinking is predominantly characterized by fluency, suggesting that they are relatively proficient at generating multiple ideas for a given problem, likely due to a good understanding of the material.
Nevertheless, further instructional efforts are needed to foster higher levels of creative thinking, particularly in flexibility, originality, and elaboration.

Keywords: creative thinking skills test, temperature and heat concepts, Rasch analysis

INTRODUCTION
Education in the 21st century has undergone significant transformation to meet the demands of rapid technological and societal change. These developments require continuous innovation to enhance the quality of education and ensure its relevance to current and future challenges (Permana et al., 2023; Ramadhanti & Azhar, 2022; Rani et al., 2[…]). The primary focus of today's educational process is the development of essential life skills, both cognitive and interpersonal, that are aligned with the competencies needed in this century (Haug & Mork, 2021; Suwistika et al., 2[…]). Within the cognitive domain, higher-order thinking skills such as critical and creative thinking are of paramount importance (Alvarez Huerta et al., 2[…]). Consequently, learning designs should intentionally cultivate students' creative thinking abilities (Albar & Southcott, 2[…]). To evaluate these skills effectively, valid and reliable instruments are required to measure the development of students' creative thinking. Such instruments provide objective and accurate assessments that help educators design more effective teaching strategies and monitor students' progress in developing creative competencies, an essential preparation for facing the increasingly complex challenges of the modern world.

Creative thinking has become a crucial skill for problem solving and innovation, influencing human progress and sustainability (Khoiri et al., 2019; Malik et al., 2[…]).
It reflects the ability to organize thoughts to produce novel and useful ideas, and is typically evaluated through dimensions such as fluency, flexibility, originality, and elaboration (Sahida & Zarvianti, 2019; Torrance, 2[…]). Fluency refers to the ability to generate a large number of relevant ideas quickly; flexibility involves proposing ideas from diverse perspectives (Wartono et al., 2[…]); originality represents the capacity to produce uncommon or unique ideas (Saregar et al., 2[…]); and elaboration denotes the ability to refine, expand, and implement ideas with detailed steps (Asriadi & Istiyono, 2020; Rosha & Hidayat, 2[…]). An assessment instrument built on these indicators is therefore essential for accurately measuring students' creative thinking skills.

One psychometric model that provides a robust framework for developing such instruments is the Rasch model. Introduced by Georg Rasch in 1960, this model examines item difficulty and individual ability simultaneously, allowing both to be placed on the same measurement scale (Astutik et al., 2020; Sa'diyah et al., 2[…]). The Rasch model applies probability theory to convert raw scores into interval data on a logit scale, providing a linear measure of student ability and item difficulty. It has become a powerful tool in educational measurement for supporting construct validity and reliability in assessment development.

In physics education, creative thinking plays an important role in helping students understand and apply abstract scientific concepts. However, students often perceive physics, particularly temperature and heat, as a set of formulas to memorize rather than as concepts to understand (Kusairi, 2013; Nafi'ah et al., 2[…]). This misconception leads to difficulties in solving problems that require conceptual reasoning (Kamila et al., 2020; Sundari & Sarkity, 2[…]).
Many students struggle with fundamental concepts such as thermal equilibrium and the distinction between heat and temperature (Kamar et al., 2016; Musa'adah & Kusairi, 2[…]). Because these phenomena are not directly observable, students' understanding is often shaped by their daily experiences rather than scientific reasoning (Aminudin et al., 2019; Budiarti et al., 2[…]).

The Indonesian government has introduced a curriculum aimed at developing students holistically across intellectual, creative, moral, and social dimensions to meet global challenges. According to the Ministry of Education and Culture, physics is not only a study of natural phenomena but also a means to foster self-awareness as part of nature, emphasizing critical, creative, and ethical behavior in alignment with the Pancasila Student Profile (Kemendikbud, 2[…]). This approach aligns with UNESCO's four pillars of education: learning to know, learning to do, learning to live together, and learning to be. Research by Trilling and Fadel ([…]) also emphasizes that 21st-century learners need skills such as communication, critical thinking, problem-solving, collaboration, and adaptability to thrive in diverse and technology-driven environments. Despite this, many students still lack competencies in technological literacy, project management, and leadership, underscoring the need to emphasize 21st-century skills in science education.

Collaborative learning and authentic assessment are vital to achieving these educational goals. However, studies have shown that the development of valid instruments for measuring 21st-century skills, especially creative thinking in physics, remains limited in Indonesia (Utari et al., […]). Most assessment tools focus primarily on memory and low-level cognitive processes (Nafi'ah et al., 2019; Herpiana et al., 2[…]).
Teachers often face challenges in designing assessment instruments that evaluate creativity, leading to a lack of opportunities for students to engage in problem-solving activities that stimulate innovative thinking. Consequently, there is a gap between the educational demand for fostering creativity and the availability of tools that can measure it.

Therefore, this study aims to develop and validate a creative thinking skills test instrument focusing on temperature and heat concepts within the context of physics learning. The instrument is designed to assess students' ability to think fluently, flexibly, originally, and elaborately when solving physics-related problems. The validation process, conducted using the Rasch model, is expected to ensure that the developed instrument meets the psychometric standards of validity and reliability required for assessing creative thinking in science education.

Figure 1. Framework for Creative Thinking Test on Heat and Temperature Concepts (diagram labels: Creative Thinking Skills; Design Creative Thinking Skills Test; Assessing Students' Creative Thinking Skills; Merdeka Curricula)

METHODOLOGY
Design Study
This study employed a quantitative descriptive research design to systematically analyze and interpret data with the aim of producing accurate, reliable, and high-quality findings (Duckett, […]). The research was conducted in four stages: developing the creative thinking test instrument, administering it to students, analyzing the data, and interpreting the results (Amiruddin et al., 2[…]). The instrument was constructed around four indicators (fluency, flexibility, originality, and elaboration) and was distributed via Google Forms to high school students in West Bandung Regency. Data were processed using Excel for preliminary analysis, WINSTEPS for Rasch model analysis, and Notepad for data coding.
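As a rough illustration of what a Rasch analysis places on the logit scale, the sketch below evaluates the dichotomous Rasch model, in which the probability of success depends only on the difference between person ability and item difficulty. This is a minimal sketch under stated assumptions, not the authors' WINSTEPS procedure: the function name `rasch_probability` and all logit values are hypothetical, and the instrument in this study may well have been scored polytomously rather than right/wrong.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model.

    Both arguments are expressed in logits. The model depends only on
    the difference (ability - difficulty).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values for illustration only (not taken from the study).
item_difficulty = 1.0
for ability in (-1.0, 0.0, 1.0, 2.0):
    p = rasch_probability(ability, item_difficulty)
    print(f"ability={ability:+.1f} logits  P(success)={p:.2f}")
```

When ability equals difficulty the probability is exactly 0.5, which is why persons and items can be read against each other on a single scale.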
The Rasch model was employed to determine item difficulty, person ability, and instrument reliability. The results were then interpreted in light of classroom conditions to ensure contextual and educational relevance.

Participant
The participants in this study were 136 students (51 males and 85 females) from two senior high schools located in West Bandung Regency, West Java, Indonesia. All participants were Grade XI students enrolled under the Merdeka Curriculum, which emphasizes student-centered learning and the development of 21st-century competencies. Participants were selected based on their enrollment in physics courses covering the topic of temperature and heat. The research sites were chosen to represent schools with similar academic and learning characteristics. The geographical location of the study area is presented in Figure 2.

Figure 2. Distribution of Participant

Instrument
This study utilized six descriptive test items designed to assess students' creative thinking skills in the context of physics learning. The questions focused on three key concepts: heat and temperature change, heat and physical change ([…]), and heat transfer. Each item was developed to measure one or more of the four dimensions of creative thinking: fluency, flexibility, originality, and elaboration. The instrument underwent expert validation by a panel of five experts, consisting of two university physics lecturers and three high school physics teachers. These experts were selected based on their strong understanding of physics concepts, extensive teaching experience, and proven competence in evaluating educational materials in accordance with national educational standards.
The validation process assessed several criteria, including the alignment of test items with the indicators of creative thinking, the clarity and readability of the language used, and the relevance of the items to the concepts of temperature and heat. Subsequently, the validity and reliability of the test items were examined using Rasch model analysis to ensure psychometric soundness and measurement precision. Examples of the finalized test items are presented in Figure 3.

Figure 3. Examples of Creative Thinking Skills Test

Data Analysis
The data analyzed in this study consisted of expert validation results and students' responses to the test items. The analysis was carried out using the Rasch model, which allows both item and person parameters to be examined on a common measurement scale. The Rasch analysis provided detailed information on item validity, person reliability, and item reliability, as well as logit values for both items and respondents. The validity of the Creative Thinking Skills Instrument on the topic of temperature and heat is a critical indicator of whether the developed cognitive assessment tool effectively measures students' creative thinking abilities. The interpretation of the analysis results followed the guidelines proposed by Sumintono and Widhiarso ([…]), focusing on key psychometric indicators such as item fit, person fit, and reliability indices. The criteria for interpreting instrument validity are presented in Table 1.

Table 1.
Interpretation of Instrument Test Validity

Interpretation    Raw variance explained by measure (V)
Unfulfilled       V < 20%
Fulfilled         20% ≤ V ≤ 40%
In Accordance     40% < V ≤ 60%
Special           V > 60%

Item and person reliability indicate the level of confidence that an instrument will produce the same results when used repeatedly, that is, the consistency of the test instrument. Person and item reliability are interpreted as follows (Sumintono & Widhiarso, 2[…]) (Table 2).

Table 2. Interpretation of Item and Person Reliability

Interpretation    Value of Person Reliability and Item Reliability (r)
Weak              r < 0.67
Moderate          0.67 ≤ r < 0.80
Good              0.80 ≤ r < 0.90
Very Good         0.90 ≤ r < 0.[…]
Excellent         r ≥ 0.[…]

Furthermore, item logits indicate how difficult or easy a test question is, while person logits indicate the level of students' creative thinking skills in response to the test instrument. Item and person logits are interpreted as follows (Sumintono & Widhiarso, 2[…]) (Table 3).

Table 3. Interpretation of Item and Person Logit

Interpretation    Value of Item Logit (A)
Very Easy         A < -0.82
Easy              -0.82 ≤ A ≤ 0.00
Difficult         0.00 < A ≤ 0.[…]
Very Difficult    A > 0.[…]

Interpretation    Value of Person Logit (B)
Low               B < 0.96
Moderate          0.96 ≤ B < 1.[…]
High              B > 1.[…]

RESULT AND DISCUSSION
The results of this study are presented in several subsections to provide a detailed explanation and to address the research questions comprehensively. Data analysis and interpretation were conducted using the Rasch model, which reduces human error and provides objective and consistent measurement results.
The Rasch analysis allows for a more precise evaluation of test items and respondent performance, helping to ensure that the developed instrument meets psychometric standards for educational assessment.

One of the key evaluations in determining the validity of the developed test instrument is unidimensionality analysis (Darmana et al., 2021; Nurdini et al., 2[…]). This analysis confirms whether the instrument measures a single underlying construct, in this case students' creative thinking skills. A unidimensional instrument ensures that all items contribute meaningfully to assessing the same latent trait rather than multiple unrelated abilities. The results of the unidimensionality analysis for the Creative Thinking Skills Instrument on temperature and heat concepts are presented in Figure 4.

Figure 4. Instrument Validity Results

Based on Figure 4, the raw variance explained by the measures is 40.8% (orange box), which falls within the 40% to 60% range (see Table 1). This suggests that the instrument meets the criterion for good unidimensionality and can therefore be considered appropriate for further research. In the Rasch model, unidimensionality is assessed using Principal Component Analysis (PCA) of residuals, which examines the extent to which the developed instrument measures a single latent construct as intended (Amiruddin et al., 2023; Hagell, 2014; Soeharto, 2[…]). Another important indicator in this analysis is the unexplained variance, particularly in the first contrast, which represents the strength of a secondary dimension within the data. The acceptable threshold for this value is below 15%, indicating minimal multidimensionality. The obtained value of […]0% (red box) meets this criterion, and the values of subsequent contrasts are also below 15%.
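The two decision rules used in this unidimensionality check can be sketched as follows. This is only an illustration of the thresholds stated in Table 1 and in the paragraph above; the function names are hypothetical, and the first-contrast value passed in the example is an invented placeholder rather than the study's result.

```python
def classify_raw_variance(percent: float) -> str:
    """Map raw variance explained by measures (%) to the Table 1 label."""
    if percent < 20:
        return "Unfulfilled"
    if percent <= 40:
        return "Fulfilled"
    if percent <= 60:
        return "In Accordance"
    return "Special"


def contrast_below_limit(percent: float, limit: float = 15.0) -> bool:
    """True when the unexplained variance in a contrast is below the limit."""
    return percent < limit


# 40.8% is the value reported for the instrument.
print(classify_raw_variance(40.8))  # prints: In Accordance

# Hypothetical first-contrast value, used only to show the check.
print(contrast_below_limit(12.5))  # prints: True
```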
These findings confirm that the instrument demonstrates satisfactory construct validity, meaning that all items consistently measure students' creative thinking skills within the domain of temperature and heat concepts. Furthermore, the item and person analysis provides complementary information about the reliability of the instrument, including item reliability, person reliability, and the relationship between item and person measures. The detailed results of these analyses are presented in Figure 5.

Figure 5. Item Person Reliability of The Instrument

Figure 5 presents detailed information on the relationship between items and respondents ([…]) based on Cronbach's alpha values. The person reliability value (red box) is 0.73, which falls under the fair category, while the item reliability value (yellow box), 0.[…], indicates an excellent level of reliability. These results imply that students' response consistency is acceptable and that the developed test items are of very high quality. Thus, based on the Cronbach's alpha coefficient, the overall relationship between item and person reliability is well established (Robinson, 2018; Surucu & Maslakci, 2020; Taber, 2[…]). The overall reliability index of 0.78 (red box) is categorized as good, indicating that the instrument is sufficiently consistent for assessing students' creative thinking skills. This also suggests that students' creative thinking abilities vary notably, allowing their performance levels to be differentiated. Consequently, the instrument is suitable for profiling students' creative thinking skills within the context of temperature and heat concepts.
Furthermore, this analysis reveals both the distribution of students' creative thinking ability levels and the difficulty level of each test item, reinforcing the validity and reliability results obtained previously (Cheung et al., 2023; Tennant & Conaghan, 2007; White & Ronfeldt, 2[…]). Therefore, a more detailed examination of item and person logit values is required to clarify the findings and to identify how well the test items represent the intended measurement construct. The results of this analysis are presented in Figure 6.

Figure 6. Item Person Logit Values

Based on the logit values in Figure 6, students' creative thinking skills and the difficulty level of the questions can be analyzed in depth (see Table 3). For item logits, the most difficult question is Q4D with a logit of 1.[…] (very difficult), while the easiest question is Q2A with a logit of -1.[…]. Although item Q4D is very difficult, a quarter of respondents with high creative thinking skills were able to answer it well. The student with the highest creative thinking ability is 069F with a logit value of 2.[…], while the student with the lowest creative thinking ability is 074F with a logit value of -0.[…]. This indicates that the reliability of the questions also affects the students' creative thinking skills. The mapping of students' creative thinking skills to the questions is shown in Figure 7.

Figure 7. The Output of Variable (Wright) Maps

Based on Figure 7, the distribution of students' creative thinking skills and the difficulty levels of the test items can be clearly observed. Students represented within the blue box demonstrate very high proficiency, indicating their ability to correctly answer all items. In contrast, item Q4D (black box)
is identified as the most difficult question within the instrument. Meanwhile, the student labeled 074F falls into the very low ability category, as this participant responded correctly only to items Q2A and Q2B, which were classified as the easiest items. Overall, students' creative thinking skills were assessed through four aspects: fluency, flexibility, originality, and elaboration.

Figure 7 also visualizes the interaction between items (red boxes) and persons (green boxes). The Item section lists each question (coded as Q1 to Q6), while the Person section includes participant identifiers, distinguished by gender through the codes M (male) and F (female). The response distribution follows an approximately normal curve, with most students clustering around the median level of ability (yellow box). Students positioned in the blue box represent those with the highest creative thinking ability, while those in the red box represent individuals with limited ability to respond accurately to the given questions. The Wright map (variable map) output provides an effective visualization of the probability of each student successfully answering a given item, allowing researchers to interpret both ability and item difficulty within the same measurement continuum (Darman et al., 2024; Prasetya & Pratama, 2023; Sumintono, 2[…]).

Furthermore, to examine potential gender-related bias in the developed instrument, a Differential Item Functioning (DIF) analysis was conducted, as illustrated in Figure 8.

Figure 8. DIF Between Gender Differences in Answering Questions (axes: DIF Measure by Item)

Differential Item Functioning (DIF) refers to differences in the probability of correctly answering an item between two distinct groups (Kucam & Gyllerolu, 2023; Samsudin et al., 2[…]). In Figure 8, the DIF measure (Diff.
) represents the level of item difficulty, where a higher value indicates a more challenging item. The black line represents the female group (F), while the red line represents the male group (M). The sequence of questions is displayed along the x-axis. The increasing trend in the graph shows that item difficulty rises progressively across the sequence, indicating that the instrument differentiates between varying levels of student ability. However, gender bias was identified in several items where the F or M curves deviate above or below the green reference line (normal line). For instance, in item Q3A, the blue line appears above both the green and red lines, suggesting that female students are more likely to answer this item correctly than males. Conversely, for items Q5 and Q6, the red line appears above the green and blue lines, indicating that these questions are easier for male students. Therefore, minor revisions are recommended to reduce gender bias and enhance fairness across items.

This analysis also contributes to identifying outlier items or individuals that do not align with the expected measurement model. Three main indicators are used to assess item and person fit: the Mean Square (MNSQ), which is considered acceptable when 0.5 < MNSQ < 1.[…]; the Z-Standardized (ZSTD) value, which is acceptable when -2.0 < ZSTD < 2.0; and the Point Measure Correlation (Pt Mean Corr), which is acceptable when 0.4 < Pt Mean Corr < 0.85 (Sumintono & Widhiarso, 2[…]). The results of this fit analysis are presented in Figure 9.

Figure 9. Item Fit Order

Figure 9 presents the item fit analysis for the Creative Thinking Skills Instrument in measuring students' creative thinking abilities.
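The three fit criteria listed above can be sketched as a simple per-item check. This is a hedged illustration, not the WINSTEPS output itself: the `ItemFit` class and `fit_flags` function are hypothetical names, the MNSQ upper bound of 1.5 is assumed here as a commonly used value, and only the ZSTD of -2.46 for Q4D comes from the study; the MNSQ and correlation values in the example are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class ItemFit:
    name: str
    mnsq: float             # outfit mean square
    zstd: float             # standardized fit statistic
    pt_measure_corr: float  # point-measure correlation


def fit_flags(item: ItemFit, mnsq_upper: float = 1.5) -> dict:
    """Return, per criterion, whether the item falls inside the accepted range.

    mnsq_upper defaults to 1.5, which is an assumption of this sketch.
    """
    return {
        "mnsq": 0.5 < item.mnsq < mnsq_upper,
        "zstd": -2.0 < item.zstd < 2.0,
        "pt_measure_corr": 0.4 < item.pt_measure_corr < 0.85,
    }


# ZSTD = -2.46 is the reported value for Q4D; the other two numbers are
# hypothetical placeholders chosen only to make the example runnable.
q4d = ItemFit(name="Q4D", mnsq=0.80, zstd=-2.46, pt_measure_corr=0.60)
print(q4d.name, fit_flags(q4d))
```

Reporting the three flags separately, rather than a single pass/fail verdict, mirrors how an item can miss the ZSTD range while still meeting the other two criteria.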
Based on the ZSTD criterion, several items are classified as misfitting: Q6, Q3A, Q4D, and Q1D. Specifically, items Q3A and Q6 exceed the upper threshold of 2.0, while items Q4D and Q1D fall below the lower threshold, with values of -2.46 and -5.48, respectively. Nevertheless, when evaluated using the MNSQ and Point Measure Correlation (Pt Mean Corr) criteria, all items meet the acceptable fit standards, indicating that the overall instrument remains psychometrically acceptable and suitable for assessing students' creative thinking skills.

To further interpret students' performance, the percentage of achievement in each dimension of creative thinking was calculated, providing a more comprehensive view of students' ability. The percentages for each category of creative thinking skills (fluency, flexibility, originality, and elaboration) related to temperature and heat concepts are presented in Figure 10. This analysis allows for a clearer understanding of the strengths and weaknesses of students' creative thinking profiles and supports the validity and reliability findings discussed above.

Figure 10. Percentage of Each West Bandung Regency Student Creative Thinking Skills Category (categories: Elaboration, Originality, Flexibility, Fluency; axis: Percentage (%))

Figure 10 illustrates the percentage distribution of students' creative thinking skills across four categories in West Bandung Regency. The percentages from highest to lowest are: fluency ([…].34%), flexibility ([…].57%), elaboration ([…].76%), and originality ([…].13%). These results indicate that fluency is the strongest aspect of students' creative thinking regarding temperature and heat concepts. This suggests that physics students in West Bandung Regency are proficient at generating multiple ideas in response to the given questions, likely due to a solid understanding of the underlying material.
These findings are consistent with previous studies, which reported that students' creative thinking skills tend to be dominated by fluency (Nurdiana, 2020; Sa'diyah et al., […]; Syukri et al., 2[…]). However, some student responses still rely heavily on theoretical knowledge provided by teachers and show limited connection to experiential observations of their environment. To address this, learning activities on temperature and heat could be enhanced by incorporating a 21st-century skills approach, particularly one emphasizing creative thinking.

The results of this study also indicate that the developed instrument is valid and reliable, effectively capturing the profile of students' creative thinking skills. Nevertheless, minor revisions are necessary for certain items that demonstrated suboptimal performance to further improve accuracy and effectiveness. Conducting additional evaluations and retesting will support broader applicability and provide deeper insights into students' creative thinking abilities.

It should be noted that this study relies solely on quantitative data derived from percentages of creative thinking skills, without incorporating qualitative data that could offer richer insights into how students develop and apply their creativity. Additionally, the sample was limited to students from West Bandung Regency, which may restrict the generalizability of the findings to other regions or educational levels. Therefore, future research is recommended to include more diverse samples and to integrate qualitative data collection methods, enabling a more comprehensive understanding of students' creative thinking skills across different contexts.

CONCLUSION
This study successfully developed and validated a creative thinking skills test instrument on the topic of temperature and heat for high school students.
The Rasch analysis results indicate that the instrument is valid for measuring a single dimension of creative thinking, encompassing fluency, flexibility, originality, and elaboration within the physics context. The analysis also revealed a clear relationship between items and respondents, with a Cronbach's alpha of 0.78, classified as good, indicating satisfactory internal consistency. The instrument demonstrated excellent item reliability, supported by acceptable item fit across the test. Among the participants, the highest creative thinking ability was observed in student 069F, whose logit value exceeded that of the most difficult item, Q4D, while the lowest ability was recorded for student 074F, corresponding to the easiest item, Q2A. Some gender bias was identified in items Q3A, Q5, and Q6, suggesting that minor revisions are required to enhance fairness. It is recommended that this instrument be further developed and refined through testing on a larger and more diverse sample to ensure broader applicability. Physics teachers can use this tool to identify, monitor, and map students' creative thinking skills, enabling more targeted instructional strategies. Further research is also needed to examine the validity and reliability of this instrument across other physics concepts and at different educational levels to expand its applicability.

ACKNOWLEDGMENTS
The authors would like to sincerely express their gratitude to the Riau Provincial Government, especially the Riau Provincial Education Office, for providing financial support and opportunities through the Master's Degree Scholarship Program at Universitas Pendidikan Indonesia. Their support was instrumental in facilitating the completion of this research.

REFERENCES