Electronic Journal of Education, Social Economics and Technology
Vol. No. 2, . , pp. Article ID: 1062
ISSN 2723-6250
DOI: https://doi.org/10.33122/ejeset.

Research Article

Development of HOTS Assessment Tools to Measure Achievement in Acid-Base Learning Outcomes

Rauzatul Jannah, Isna Rezkia Lukman*, Fiqih Choiruddin, Naila Nurul Izzah, Ulfa Zahra, and Halimatus Sakdiah

Faculty of Teachers Training and Education, Universitas Malikussaleh, Aceh Utara, Indonesia, 24351

*Corresponding Author: rezkia.lukman@unimal.id | Phone: 6281220237241

ABSTRACT

This study aims to develop a valid and reliable assessment instrument to measure students' higher-order thinking skills (HOTS) in the context of acid-base chemistry. The research employed a Research and Development (R&D) approach using the 4D model, which includes the stages of Define, Design, Develop, and Disseminate. The initial product was validated through expert judgment, evaluated for feasibility by chemistry teachers, and subjected to field testing. The study involved 200 twelfth-grade science students from SMAN Bireuen. Data collection techniques included interviews, questionnaires, and tests. Data collection instruments included interview guidelines, item validation sheets, feasibility evaluation forms, and multiple-choice questions with open-ended justifications. The material validation assessment obtained an average score of 47.5, exceeding the threshold of 46.194, and was categorized as "very good." Similarly, the feasibility assessment achieved an average score of 72.5, exceeding the threshold of 67.188, and also fell within the "very good" category. A total of 30 test items from the trial were analyzed in terms of validity, reliability, difficulty level, discrimination index, and distractor effectiveness. It can therefore be concluded that the assessment instrument is appropriate for measuring higher-order thinking skills in the topic of acid-base chemistry.

Keywords: acid-base, HOTS

INTRODUCTION

Learning is defined as a change in behavior among both learners and educators, resulting from the process of delivering and receiving knowledge through various instructional methods. The effectiveness of the learning process can be evaluated based on the achievement of predetermined learning objectives (Fatmawati, 2021; Sirait, 2021; Sutikno, 2.). To assess the achievement of these objectives, an evaluation of students' learning outcomes is necessary. Assessment within the learning system serves as a means to evaluate the effectiveness of both the learning process and its outcomes for students (Mendiknas, 2007; Pemerintah Indonesia, 2.).

Assessment plays a pivotal role in the educational process, serving as a key benchmark for evaluating learning outcomes. Broadly speaking, assessment functions as a tool for educational evaluation, enabling stakeholders to understand the current status and effectiveness of the learning process. The degree to which educational objectives and targeted competencies are achieved is measured through systematic evaluation activities. These activities involve the structured collection, analysis, and interpretation of data to determine the extent of learning achievement and to inform future instructional decisions (Agustianti et al., 2022; Calista & Yefterson, 2022; Pitaloka et al.). Assessment also has the ability to stimulate students to optimize their potential in the learning process.

Accurate assessment plays a crucial role in enhancing students' Higher-Order Thinking Skills (HOTS), as it provides meaningful feedback and encourages deeper cognitive engagement in the learning process (Armanto et al., 2021; Nisa et al., 2022; Rorimpandey et al., 2.). HOTS occupy the upper levels of Bloom's cognitive taxonomy and are designed to stimulate learners to apply knowledge and skills in novel contexts (Halimah, 2021; Istiyono et al., 2014; Rochman & Hartoyo, 2.).
HOTS encompass the abilities to analyze, evaluate, and create, as outlined by Anderson. HOTS assessments are designed to stimulate learners to apply their existing knowledge and skills to novel situations (Sari et al., 2023; Setyowati et al., 2.). Rather than relying solely on rote memorization, students are encouraged to connect previously learned concepts with real-world problems encountered in everyday life. This approach fosters meaningful and relevant learning experiences while simultaneously cultivating advanced cognitive abilities that are essential in today's globalized and rapidly evolving world.

Based on interviews conducted with chemistry teachers of grade XI science students at SMAN 1 Bireuen (a top-performing school), it was found that assessment practices have not yet been fully aligned with appropriate pedagogical principles. The teachers continue to rely predominantly on conventional multiple-choice tests, and cognitive process assessments are often based on the teacher's subjective judgment. Specifically, students are awarded high scores if they appear to engage actively in classroom activities, without a comprehensive evaluation of the cognitive dimensions that should be systematically measured.

To address the identified issues, a two-tier multiple-choice diagnostic test was developed. This type of test consists of five answer options accompanied by a rationale for each selected response. A two-tier multiple-choice item includes two levels: the first level presents a set of options comprising the correct answer and distractors, while the second level requires respondents to select a justification that reflects their conceptual understanding or reasoning behind the initial choice (Jamhari, 2021; Rintayati et al., 2.).
This format has been recognized as a more effective assessment tool than conventional multiple-choice questions. A previous study by Samaduri reported positive outcomes using this type of instrument to identify students' alternative conceptions. Desiriah & Setyarsih concluded that HOTS assessment instruments in the form of reasoned multiple-choice items effectively measure students' abilities in analysis, evaluation, and creation, and that the instrument met the necessary criteria for evaluating HOTS competencies.

The primary objective of this study is to evaluate the feasibility of a HOTS assessment instrument designed to measure learning outcomes of eleventh-grade science students (Class XI IPA) on acid-base chemistry topics. Accordingly, the study seeks to answer the following research question: How feasible is the HOTS-based assessment instrument in effectively measuring students' learning achievement?

RESEARCH METHOD

This study employed a qualitative approach with the primary aim of outlining the procedural steps necessary for developing an educational product. The research was conducted at SMA Negeri 1 Bireuen, located in Bireuen Regency, Aceh, involving a total of 200 student participants. The study adopted a research and development (R&D) methodology. The HOTS (Higher-Order Thinking Skills) assessment instrument was developed using the 4D model of instructional development. The product development process consisted of four key phases:

Define: This stage involves a front-end analysis, together with learner analysis, formulation of learning objectives, conceptual analysis, and task analysis.

Design: At this stage, the development process focuses on determining the type of instrument to be used, constructing a blueprint (test specification table), and designing the instrument accordingly.

Develop: This phase encompasses expert validation and field testing. The initial product is first validated by subject matter experts and revised based on their feedback. The revised version is then reviewed by external reviewers to finalize the product. After this, pilot testing is conducted to evaluate its practicality.

Disseminate: In this final phase, the completed product is disseminated for broader use by relevant stakeholders. The dissemination process aims to facilitate large-scale implementation and adoption.

The instruments employed in this study included a small-scale exploratory interview to gather initial data from educators. A content expert validation sheet was designed to improve the initial version of the instrument; it served as a tool for collecting evaluations, feedback, and recommendations from experts, which were then used to revise the instrument. A product feasibility sheet was completed by reviewers to evaluate the overall quality of the product, covering content, construction, language clarity, validity, and practicality. Additionally, a multiple-choice reasoning test was developed to assess students' learning outcomes in chemistry. A field trial was conducted to determine the quality of the test items by evaluating their validity, reliability, difficulty level, discrimination index, and distractor effectiveness.

RESULTS AND DISCUSSION

Results

The research data were obtained using the 4-D model and comprise both questionnaire responses and test results. The results of the material expert validation and product feasibility assessments are presented in Table 1 and Table 2.

Table 1. Assessment of Material Aspects

Aspects         Average Score   Category
Learning                        Excellent
Content                         Excellent
Language                        Excellent
Presentation                    Good
Total Score                     Excellent

Based on Table 1, the average score from the assessment by subject matter experts across all aspects was 47.5, which falls within the "very good" category (listed as Excellent in Table 1). This indicates that the quality of the HOTS assessment instrument for acid-base topics is rated as very good.

Table 2. Assessment of Feasibility Aspects

Aspects         Average Score   Category
Substance                       Excellent
Construction                    Excellent
Language                        Excellent
Validity                        Excellent
Practicality                    Excellent
Total Score                     Excellent

Based on Table 2, the average score from the reviewers' assessments across all aspects was 72.5, which falls under the "very good" category (listed as Excellent in Table 2), indicating that the HOTS assessment instrument is deemed suitable for use. The instrument then proceeded to the trial phase to evaluate the quality of the test items, including their validity, reliability, difficulty level, discriminating power, and distractor effectiveness.

Validity
An item is considered valid if its correlation coefficient (r) is greater than or equal to the critical value from the r table (r ≥ r_table); otherwise, it is deemed invalid (r < r_table). Based on this criterion, 29 items were found to be valid, while 1 item was categorized as invalid.

Reliability
The reliability analysis was conducted using the Cronbach's Alpha (α) formula. The obtained reliability coefficient fell within the range of very high reliability.

Difficulty Level
The analysis showed that one item was categorized as easy, while the remaining 29 items were classified as having a moderate level of difficulty.
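The item statistics named in this section (item validity, reliability, difficulty level, and discriminating power) can be illustrated with a short sketch. It is not the authors' procedure: it assumes each response is scored 0 or 1, uses an item-total Pearson correlation for validity, takes the critical value `r_table` as a number supplied by the user, and uses an upper-lower group comparison with a hypothetical `fraction` parameter for discriminating power. All function names are illustrative.

```python
from statistics import mean, pvariance


def column(scores, j):
    """Scores of every student on item j (scores is a students x items 0/1 matrix)."""
    return [row[j] for row in scores]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable relationship
    return sxy / (sxx * syy) ** 0.5


def item_validity(scores, r_table):
    """Return (r, is_valid) per item; an item is valid when r >= r_table.

    r_table is an assumption: the caller looks it up for their sample size.
    """
    totals = [sum(row) for row in scores]
    result = []
    for j in range(len(scores[0])):
        r = pearson(column(scores, j), totals)
        result.append((r, r >= r_table))
    return result


def cronbach_alpha(scores):
    """Cronbach's alpha; assumes at least two items and non-constant totals."""
    k = len(scores[0])
    item_var = sum(pvariance(column(scores, j)) for j in range(k))
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


def difficulty_index(scores):
    """Proportion of students answering each item correctly."""
    return [mean(column(scores, j)) for j in range(len(scores[0]))]


def discrimination_index(scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct for each item.

    The 27% split is a common convention, assumed here for illustration.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    g = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:g], ranked[-g:]
    return [
        (sum(column(upper, j)) - sum(column(lower, j))) / g
        for j in range(len(scores[0]))
    ]
```

Each function takes the same students-by-items matrix, so the outputs can be tabulated side by side per item. The cut-offs that map these values to labels such as "easy," "moderate," "good," or "poor" are not given in this article and would have to come from the criteria the researcher adopts.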
Discriminating Power
The analysis showed that 11 items fell into the "good" category, 12 items were categorized as "fair," and 7 items were classified as "poor."

Distractor Analysis
The analysis showed that 23 test items were of good quality and deemed suitable for use, while 7 items required revision.

Discussion

Higher-order thinking skills (HOTS) refer to students' ability to apply and evaluate knowledge in complex contexts (Hamzah et al., 2022; Kim How et al., 2.). As such, the development of HOTS is strongly emphasized to foster students' cognitive development. These skills can be cultivated through instructional programs that focus on cognitive reasoning processes, particularly by creating learning conditions that promote practice in solving problems aligned with HOTS (Indriyana & Kuswandono, 2.).

One effective approach to assessing HOTS is the use of reasoned multiple-choice tests. This form of assessment consists of two-tiered questions: the first tier includes a correct answer along with distractors, while the second tier presents alternative conceptions or justifications that reflect the students' reasoning behind their initial choices (Lengkong et al., 2021; Sesli & Kara, 2.).

The evaluation results from both content experts and reviewers indicate that the instruments are not only appropriate for use but also fall within the "excellent" category. The overall mean score from the content expert assessments reached 47.5, exceeding the threshold of 46.194 and thus qualifying as excellent. Similarly, the reviewers' overall mean score was 72.5, above the corresponding threshold of 67.188, and was also categorized as excellent.
Furthermore, the instrument trials, which analyzed validity, reliability, item difficulty, discriminating power, and distractor effectiveness, showed that 23 items were considered good and suitable for use, while 7 items required revision. The trial test consisted of 30 multiple-choice items, each accompanied by a rationale. Every item was constructed based on specific learning indicators, HOTS indicators, knowledge dimensions, and item criteria. For example, in an item related to soil pH, students were expected to differentiate between acidic, alkaline, and neutral soils. The HOTS level was categorized as C4, emphasizing analytical skills. The associated knowledge dimension was procedural, focusing on understanding how to perform tasks based on the concept of acid-base neutralization in soil. Procedural knowledge includes familiarity with techniques and methods, as supported by previous studies (Alexander et al.; De Jong & Ferguson-Hessler, 1996; Dochy & Alexander, 1.). The item criteria required that each question be valid and demonstrate good discriminating power. A higher number of correct responses indicated greater student mastery of specific items, which was attributed to appropriate item difficulty levels.

CONCLUSION

Based on the research objectives and findings, it can be concluded that the assessment instrument developed, consisting of multiple-choice tests with open-ended justifications, is of high quality and deemed highly appropriate for use, as validated by subject matter experts and independent reviewers. The average evaluation scores indicate that the developed HOTS (Higher Order Thinking Skills) assessment instrument falls within the "excellent" category.

RECOMMENDATIONS

The results of this research are expected to provide information, insight, and knowledge that will benefit readers and institutions.
ACKNOWLEDGEMENTS

This paper and the research that underpins it would not have been possible without the outstanding support of our teams at Chemistry Education at Malikussaleh University. We are also thankful for the insightful discussions held during this work. The generosity and expertise of everyone involved have significantly improved this study and helped us avoid many errors. Thank you all so much.

AUTHORS' CONTRIBUTIONS

All authors discussed the results and contributed to the manuscript from the initial draft to the final version.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

REFERENCES