EDUKASIA: Jurnal Pendidikan dan Pembelajaran, Vol. 7, 1 (January-June, 2. , pp. ISSN: 2721-1150. EISSN: 2721-1169. DOI: 10.62775/edukasia.

Psychometric Properties of the Classroom English Proficiency (CEP) Scale for English Language Teachers in Indonesia: A CFA and Rasch Model Analysis

Syahadah Albaqiyatul Karimah1, Devie Yundianto1, Muhammad Aqil Alaauddin1
1 Universitas Nahdlatul Ulama Indonesia, Indonesia

ARTICLE INFO
Keywords: Classroom English Proficiency Scale; Confirmatory Factor Analysis; English Teacher; Psychometric Properties; Rasch Model
Article history: Received 2025-11-19; Revised 2026-01-08; Accepted 2026-02-22

ABSTRACT
This study aimed to adapt and validate the Classroom English Proficiency Scale (CEPS) for English language teachers in Indonesia using Confirmatory Factor Analysis (CFA) and the Rasch Rating Scale Model (RSM). The CFA confirmed that the four latent dimensions (Grammar, Pronunciation, Interaction, and Instruction) fit well within a second-order structure representing a single construct of classroom English proficiency. Rasch analysis further supported the unidimensionality assumption, with the variance explained by measures exceeding the recommended criterion. The scale showed high reliability, and all items met model-fit expectations. The five-point Likert rating scale functioned effectively, with ordered Andrich thresholds and consistent category use. Differential Item Functioning (DIF) analysis indicated that 11 of 12 items were invariant across gender groups; the exception was CEPS_2 ("I can use a wide range of English vocabulary"), which was slightly more difficult for male teachers. Overall, the Indonesian-adapted CEPS demonstrated strong validity, reliability, and fairness, confirming its suitability for assessing English proficiency in classroom contexts. The CEPS can serve as a reliable diagnostic and evaluative tool for English language teachers in teacher education and EFL assessment settings.

This is an open access article under the CC BY-NC-SA license.
Corresponding Author: Syahadah Albaqiyatul Karimah, Universitas Nahdlatul Ulama Indonesia, Indonesia. syahalbakarimah@unusia.
https://jurnaledukasia.

INTRODUCTION

Today, English proficiency is no longer considered an additional skill but a necessity, particularly in the field of education. For teachers, English proficiency is not only about mastering grammar and vocabulary; it also encompasses the ability to deliver lessons clearly, communicate with students, and manage classrooms effectively in English (Numonova, 2. (Zega, 2. (Sahnan & Daulay, 2. Teachers with strong English competence are believed to foster more interactive learning environments and to motivate students to develop greater confidence in using English (Ali et al., 2. (Al-Barakat et al., 2. (Zhang, 2.

Language proficiency is regarded as one of the core competencies that English teachers must possess (Pham, 2. (Ismailov et al., 2. However, general English proficiency alone does not guarantee teachers' readiness for classroom instruction, because teaching requires context-specific language use that supports pedagogical functions (Chambless, 2. (Pham, 2. Teachers with insufficient classroom-related language skills often rely heavily on textbooks and scripted materials, which limits spontaneous interaction and authentic communication with learners (Medgyes P, 2. (Nyström, (Nguyen, 2. (Aramaki, 2.

In Indonesia, Classroom English Proficiency (CEP) has gained scholarly attention in recent years because of its pivotal role in enhancing the effectiveness of English instruction. CEP refers to the level of English proficiency required to teach effectively in classroom contexts. It encompasses essential classroom-related skills such as giving instructions, managing interaction, providing feedback, and scaffolding students' understanding (Freeman, 2. (Wang, 2. (Matsumura & Hinoki, 2. (Walsh, 2. (Richards, 2.
Grounded in the communicative competence framework (Canale & Swain, 1. CEP reflects the integration of linguistic, sociolinguistic, discourse, and strategic competences situated within teaching practices. Empirical studies consistently show that higher levels of teachers' English proficiency are associated with more effective classroom management, clearer lesson delivery, and increased student engagement (Goh & Burns, 2. (Walsh, 2. Nevertheless, teaching effectiveness is also influenced by complementary factors such as teacher confidence, professional experience, and institutional support (Narzillayevna, 2. , underscoring the need for valid and reliable measurement of classroom-specific language proficiency.

Several instruments have been developed to assess CEP, including the English Language Teaching Confidence Scale (ELT-CS) (Chacón, 2. and the Teacher Oral Proficiency in English Scale (TOPE) (De Jong et al., 2. These instruments focus primarily on oral proficiency, fluency, and teachers' confidence in using English for instructional purposes. However, prior research also emphasizes that CEP development is shaped by contextual factors such as teacher education, classroom environment, and pedagogical innovation, and that native-speaker status alone does not ensure effective teaching competence (Kamhi-Stein, 2. (Waddington, 2. (Selvi et al., 2.

Despite the growing recognition of CEP, no measurement scale has been specifically adapted and psychometrically validated for the Indonesian educational context. Recent Indonesian studies highlight the importance of localized validation to ensure cultural relevance and measurement accuracy (Karimah et al., 2. This gap indicates the need for a rigorous psychometric evaluation of CEP instruments that aligns with both local teaching practices and contemporary measurement standards.
Addressing this gap, the present study aims to adapt and psychometrically validate the Classroom English Proficiency Scale (CEPS) for English language teachers in Indonesia using an integrated Confirmatory Factor Analysis (CFA) and Rasch modeling framework. Specifically, this study investigates whether the CEP scale demonstrates adequate validity, reliability, unidimensionality, and appropriate item difficulty to support teacher assessment and professional development in the Indonesian context.

METHODS

1 Participants

This study employed a quantitative research method with a scale development approach to examine English language teachers in Indonesia. A total of 202 participants voluntarily took part in this study. Participants were recruited using a convenience sampling technique. All participants were active English teachers from both formal and non-formal educational institutions, including public and private schools, and represented various educational levels such as primary, secondary, and tertiary. The sample consisted of 165 females (81.7%) and 37 males (18.3%), with ages ranging from 18 to 52 years (M = 25. SD = 5. Prior to participation, all respondents provided informed consent, and their anonymity and confidentiality were strictly maintained.

2 Research Instrument

The instrument used in this study was the Classroom English Proficiency Scale (CEPS), originally developed by (Wang, 2. , which was adapted and modified into an Indonesian version. The CEPS comprises 12 items that represent four key dimensions of teachers' English proficiency in the classroom. The first dimension, Grammatical and Lexical Accuracy and Range
, measures teachers' ability to use correct grammar and a rich vocabulary in classroom communication. The second dimension, Pronunciation, Stress, and Intonation, assesses the clarity of pronunciation, accuracy of word stress, and appropriateness of intonation patterns in spoken English. The third dimension, The Language of Interaction, evaluates teachers' ability to use English effectively for classroom interaction, such as giving feedback, managing discussions, and responding to students. The final dimension, The Language of Instruction, measures teachers' competence in understanding and using English as a medium of instruction to facilitate learning and deliver lesson content clearly and effectively.

All items were presented as statements rated on a five-point Likert scale, ranging from 1 (Strongly Disagree) to 5 (Strongly Agree). A higher score on each dimension indicates a greater perceived level of English proficiency. The back-translation technique was applied to ensure equivalence between the original and the Indonesian version. The development of the CEPS items was based on an extensive review of the literature on language proficiency assessment, and the items were qualitatively validated through expert judgment by two specialists in English language teaching and psychology to ensure content validity.

3 Data Collection Procedures

Data were collected through both online and offline methods. The online survey was distributed via Google Forms, while the offline data were collected by directly visiting schools and educational institutions. On the initial page of the questionnaire, the researchers explained the study's objectives, the assurances of data confidentiality, and the voluntary nature of participation. Participants who agreed to participate proceeded to complete the questionnaire. The data collection process lasted approximately 10 weeks.
Upon completion of data collection, all responses were prepared and organized for statistical analysis.

4 Data Analysis Procedures

Data were analyzed using the Rating Scale Model (RSM) developed by (Andrich, 1. RSM is a measurement model for polytomous data (data with two or more ordinal categories). The model provides estimates of person locations, item difficulties, and overall thresholds (fixed across items), while allowing item difficulties to vary.

To address the need for a valid and reliable instrument for measuring Classroom English Proficiency (CEP) among teachers in Indonesia, this study adopts a quantitative approach with a cross-sectional design. The study aims to analyze the appropriateness and accuracy of the CEP instrument as a measure of English proficiency in classroom teaching, while also validating and further developing the scale to align with the characteristics of Indonesian teachers.

5 Confirmatory Factor Analysis (CFA)

Data analysis was performed using JASP statistical software version 0. Several statistical procedures were employed to examine the psychometric properties of the Classroom English Proficiency Scale (CEPS). First, descriptive statistical analyses were conducted to identify participants' demographic characteristics and to summarize the distribution of scores for each item. Second, to investigate the internal structure of the scale, a Confirmatory Factor Analysis (CFA) was carried out. CFA was selected because the present study adapted the CEPS instrument based on the theoretical framework of the original measure, which specifies the presence of four latent factors.
The proposed four-factor model was tested to determine its fit with the empirical data. Model fit was evaluated using several goodness-of-fit indices, including Chi-square (χ²), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR). The model was considered to have an acceptable fit if it met the recommended thresholds of CFI and TLI > 0. RMSEA < 0.08, and SRMR < 0.08 (Hu & Bentler, 1999; Wang & Wang, 2. Third, reliability analysis was conducted to evaluate the internal consistency of each CEPS subscale. Reliability coefficients were calculated using Cronbach's alpha and McDonald's omega, with values greater than 0.70 indicating acceptable reliability (Bentler, 2. (Bodoff, 2.

6 Rasch Model Analysis

The study also hypothesized that a unidimensional model would fit the data according to the Rasch Model requirements. Data were analyzed using Winsteps software version 3. The Rasch Model was employed for its ability to convert ordinal response scores into linear interval measures (logits) for both person ability and item difficulty (Bond, 2. Specifically, the Rating Scale Model (RSM) was applied, as all CEPS items share the same response format (Chong et al., 2. In this study, the unconditional maximum likelihood estimator was used for RSM, while Maximum Likelihood estimation was used in the CFA procedure.

7 Rating Scale Functioning

The initial stage of Rasch analysis examined the functioning of the rating scale categories to ensure that the Likert response options operated as intended. Evaluation criteria included: (1) each response category having a minimum frequency of 10; (2) Outfit mean-square (MNSQ) values for each category being less than 2; and (3) threshold values (step calibrations) increasing monotonically, indicating that higher response categories consistently represented higher proficiency levels (Linacre,

8 Unidimensionality

A core assumption of the Rasch Model is unidimensionality, meaning that the instrument measures a single dominant construct (Medvedev & Krägeloh, 2. (Santoso et al., 2. This assumption was evaluated using the Principal Component Analysis of Residuals (PCAR). The scale was considered unidimensional if: (1) the Raw Variance Explained by Measures (RVEM) was at least 40% (Holster & Lake, 2. , and (2) the Unexplained Variance in the First Contrast (UVIC) did not exceed 15% (Fan & Bond, 2.

9 Item and Person Fit

Item and person fit were assessed using Mean Square (MNSQ) statistics, which include Infit and Outfit indices (Rahman, 2. Infit MNSQ is sensitive to unexpected response patterns on items whose difficulty levels are close to the respondent's ability (Megbele et al., 2. , whereas Outfit MNSQ is more sensitive to unexpected responses on items that are much easier or harder than the respondent's ability. Items were considered to fit the model if both Infit and Outfit MNSQ values ranged between 0.6 and 1.4, and point-measure correlations were positive (Bond et al., 2.

10 Reliability and Separation

Rasch reliability was estimated separately for items and persons. Item reliability indicates how well items define a stable hierarchy of difficulty, while person reliability shows how effectively the instrument distinguishes among participants' ability levels. Additionally, separation indices were examined to determine how many statistically distinct strata could be identified for both items and persons (Bond et al., 2.
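The relationship between separation and reliability described above can be illustrated with a short sketch. This is not the procedure used in the study (which relied on Winsteps output); it is a minimal Python illustration, with hypothetical function names, of the standard Rasch relations R = G^2 / (1 + G^2) and H = (4G + 1) / 3, where G is the separation index, R the reliability, and H the number of statistically distinct strata.

```python
import math


def reliability_from_separation(g: float) -> float:
    """Reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    if g < 0:
        raise ValueError("separation must be non-negative")
    return g * g / (1.0 + g * g)


def separation_from_reliability(r: float) -> float:
    """Inverse relation: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    if not 0.0 <= r < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(r / (1.0 - r))


def strata(g: float) -> float:
    """Number of statistically distinct strata: H = (4G + 1) / 3."""
    if g < 0:
        raise ValueError("separation must be non-negative")
    return (4.0 * g + 1.0) / 3.0


if __name__ == "__main__":
    # Illustrative separation values; any non-negative number can be used.
    for g in (2.38, 3.76):
        print(g, round(reliability_from_separation(g), 2), round(strata(g), 2))
```

Under these relations, a separation index of 2.38 corresponds to a reliability of about 0.85, and 3.76 to about 0.93. The strata formula gives a somewhat larger count than reading the separation index directly, so the two conventions should not be mixed when comparing studies.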
11 Item-Person Wright Map

The final stage of analysis involved constructing an Item-Person Wright Map to visualize the distribution of person abilities and item difficulties along the same logit scale. This map is useful for evaluating targeting, that is, whether the item difficulty levels appropriately match the range of participant abilities measured by the scale (Bond et al., 2.

12 Differential Item Functioning (DIF)

Differential Item Functioning (DIF) occurs when an item is interpreted differently across groups, resulting in one group being advantaged or disadvantaged (Hambleton & Jones, 1. Rasch modeling emphasizes the importance of items being free from bias (Christensen et al., 2. DIF was considered substantial when the DIF contrast reached 0.5 logits. Furthermore, items were flagged for potential elimination if the Rasch-Welch t-test exceeded ±1.96 and was statistically significant (p < 0. (Linacre, 2.

To enhance clarity and transparency, Table 1 summarizes the key methodological procedures and analytical steps employed in this study. The table provides an overview of the sequential stages of instrument adaptation, data collection, and psychometric evaluation using Confirmatory Factor Analysis (CFA) and Rasch modeling. This summary is intended to facilitate reproducibility and to illustrate how each analytical procedure aligns with the study objectives and validation.

Table 1 Summary of Methodological Procedures and Analytical Steps

Stage | Procedure | Purpose | Method / Software
Instrument adaptation | Translation and back-translation | Linguistic and cultural | Expert judgment
Content validation | Expert review | Content validity | Qualitative evaluation
Data collection | Online and offline | Data acquisition | Google Forms & field
Construct validation | Confirmatory Factor Analysis (CFA) | Factor structure and construct validity | JASP
Reliability analysis | Internal consistency | Scale reliability | Cronbach's Alpha, McDonald's Omega
Item-level analysis | Rasch Rating Model (RSM) | Item difficulty and person ability | Winsteps
Rating scale evaluation | Category functioning | Response scale | Rasch analysis
Unidimensionality | PCAR | Dimensionality | Rasch analysis
Fairness analysis | Differential Item Functioning (DIF) | Gender invariance | Rasch-Welch t-test

FINDINGS AND DISCUSSION

The data analysis begins with descriptive statistics and tests of normality assumptions. Subsequently, the results of a second-order Confirmatory Factor Analysis (CFA) are presented to provide preliminary evidence for the construct validity of the scale. The main section of this chapter then reports the results of an in-depth psychometric analysis using the Rasch Rating Scale Model, aimed at evaluating the quality and measurement precision of the Classroom English Proficiency Scale (CEPS).

1 Descriptive Analysis

Descriptive analysis was conducted on the 12 CEPS items collected from 202 participants. The normality test results indicated that the data were normally distributed. The skewness values for the 12 items ranged from -1.461 (CEPS . to -0.422 (CEPS . , while the kurtosis values ranged from -0. (CEPS . 496 (CEPS . All skewness and kurtosis values were below the absolute cutoff points recommended for multivariate analyses (. for skewness and . for kurtosis). Therefore, the dataset met the assumptions required for further multivariate analyses, including CFA.
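The skewness and kurtosis screening described above can be sketched as follows. This is a minimal illustration in plain Python, not the routine used in the study: it uses population (moment-based) formulas, whereas statistical packages such as JASP may report sample-adjusted estimates that differ slightly. The function names are hypothetical, and the cutoffs are deliberately left as caller-supplied parameters rather than assumed.

```python
from statistics import fmean


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = fmean(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5 (negative = left-skewed)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return _central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal curve)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return _central_moment(values, 4) / m2 ** 2 - 3.0


def within_cutoffs(values, max_abs_skew, max_abs_kurtosis):
    """True if both statistics fall below the caller-supplied absolute cutoffs."""
    return (abs(skewness(values)) < max_abs_skew
            and abs(excess_kurtosis(values)) < max_abs_kurtosis)
```

Each item's responses would be passed as a list of Likert scores (1 to 5). Negative skewness, as reported for the CEPS items, indicates that responses cluster toward the higher categories.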
2 Confirmatory Factor Analysis (CFA)

As an initial step in testing construct validity, a second-order CFA was performed to determine whether the proposed four-factor model, unified by a higher-order factor, adequately fit the empirical data. The model fit indices demonstrated an overall good fit between the hypothesized model and the observed data (see Figure 1). Although the Chi-square value was significant (χ² = 97.432, p < 0. , a common occurrence in larger samples, the other fit indices supported a robust model. The Comparative Fit Index (CFI) was 0.960, and the Tucker-Lewis Index (TLI) was 0.947, both exceeding the recommended threshold of 0. Furthermore, the error indices indicated acceptable model fit: the Root Mean Square Error of Approximation (RMSEA) was 0.069 (90% CI . 048, 0. ), and the Standardized Root Mean Square Residual (SRMR) was 0. An SRMR value below 0.08 suggests a very good model fit.

The factor loading analysis revealed that all 12 items were significant indicators of their respective first-order latent factors (p < 0. , with standardized estimates ranging from 0.623 (CEPS . (CEPS 1 and CEPS . Moreover, the four first-order factors loaded significantly onto a second-order construct, with standardized loadings ranging from 0.969 to 1.000, providing strong evidence of hierarchical construct validity.

From the perspective of Classical Test Theory (CTT), the overall reliability of the CEPS instrument was high, with a Cronbach's alpha (α) coefficient of 0. The internal consistency coefficients for each subscale were as follows: Grammatical and Lexical Accuracy and Range (α = 0. Pronunciation, Stress, and Intonation (α = 0. Language of Interaction (α = 0. , and Language of Instruction (α = These results demonstrate that the CEPS possesses strong internal reliability and consistent measurement across its four dimensions.
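The Cronbach's alpha coefficients reported above follow the standard formula α = k / (k - 1) × (1 - Σ item variances / variance of total scores), where k is the number of items. The sketch below is a minimal plain-Python illustration of that formula and is not the JASP routine used in the study; the function name and the column-wise data layout are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(item_columns):
    """Cronbach's alpha for a list of item columns.

    item_columns[i][p] is the score of respondent p on item i
    (hypothetical layout chosen for this sketch).
    """
    k = len(item_columns)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(item_columns[0])
    if n < 2 or any(len(col) != n for col in item_columns):
        raise ValueError("all items need the same number (>= 2) of respondents")

    # Total score of each respondent across all items.
    totals = [sum(col[p] for col in item_columns) for p in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in item_columns)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)
```

For a three-item subscale, `item_columns` would hold three lists of Likert scores, one per item, each with one entry per teacher. Values above 0.70 are treated as acceptable under the criterion stated in the Methods.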
Figure 1 Four-Dimensional Second-Order Factor Measurement Model

3 Rasch Model Analysis

To obtain a more in-depth psychometric evaluation at the interval level, the analysis was further conducted using the Rasch Rating Scale Model (RSM), implemented with Winsteps version 3.

Unidimensionality

The fundamental assumption of the Rasch model is unidimensionality, which means that the instrument measures a single dominant construct. This assumption was tested using Principal Component Analysis of Residuals (PCAR). The results confirmed that the CEPS instrument meets the unidimensionality assumption. The Raw Variance Explained by Measures (RVEM; . 2%) exceeded the recommended minimum threshold of 40%, indicating that one primary construct accounts for the majority of variance in the data.

Rasch Reliability and Separation

The summary statistics indicated excellent levels of reliability and separation. For the participants (N = 202), the Person Reliability value was 0.85, reflecting a high level of response consistency. The Person Separation Index (PSI) was 2.38, suggesting that the instrument can distinguish participants into at least two to three statistically distinct ability strata. For the items, the Item Reliability was 0.93, indicating that the hierarchy of item difficulty levels was highly stable and replicable. The Item Separation Index of 3.76 shows that the items can effectively differentiate three to four distinct difficulty levels. As confirmation, the classical reliability estimate (Cronbach's Alpha/KR-20) calculated by Winsteps was 0.92, which aligns with the Rasch reliability.

Item Fit and Item-Person Map

The item fit analysis was conducted to identify any items that functioned anomalously within the Rasch model. The criteria used were Infit and Outfit Mean Square (MNSQ) values within the ideal range of 0.6 to 1.4, and Standardized Fit Statistics (ZSTD) within the ideal range of -2.0 to 2.0. These criteria ensured that all items contributed meaningfully to measuring the intended construct.

Table 2 Item Fit Statistic

Item Fit Statistic columns: Measure, MNSQ (Infit, Outfit), PTMEA

Item | Original Item | Item Wording
CEPS_1 | I can lecture with correct English | Saya dapat memberikan kuliah dengan struktur tata bahasa Inggris yang
CEPS_2 | I can use a broad range of English | Saya dapat menggunakan berbagai kosa kata bahasa Inggris yang luas
CEPS_3 | I can use accurate words to express | Saya dapat menggunakan kata-kata yang tepat untuk mengungkapkan ide
CEPS_4 | I can speak English clearly with no systematic in pronunciation. | Saya dapat berbicara bahasa Inggris dengan jelas tanpa kesalahan sistematis dalam
CEPS_5 | I know how to stress content words in | Saya tahu cara memberi tekanan pada kata-kata bermakna (content words) dalam pengucapan
CEPS_6 | I can use naturally to convey meaning. | Saya dapat menggunakan intonasi secara alami untuk menyampaikan
CEPS_7 | I can use English to ask questions or to clues and hints. | Saya dapat menggunakan bahasa Inggris yang sesuai untuk mengajukan pertanyaan atau memberikan petunjuk dan
CEPS_8 | I can use English to respond to students' questions, such as and asking for | Saya dapat menggunakan bahasa Inggris yang sesuai untuk merespons pertanyaan siswa, seperti meminta klarifikasi, memberikan konfirmasi, dan meminta
CEPS_9 | I can give feedback skillfully in English, such as evaluating, and commenting on | Saya dapat memberikan umpan balik dengan terampil dalam bahasa Inggris, seperti mengakui, mengevaluasi, dan mengomentari respons
CEPS_10 | I can explain concepts, terms, or lesson content in clear English. | Saya dapat menjelaskan konsep, istilah, atau materi pelajaran dengan bahasa Inggris yang jelas
CEPS_11 | I can give clear instructions in English when activities, giving homework, and managing the | Saya dapat memberikan instruksi yang jelas dalam bahasa Inggris saat melaksanakan aktivitas, memberikan pekerjaan rumah, dan mengelola
CEPS_12 | I can use English signals (e.g., first, second, to indicate stages of a lesson. | Saya dapat menggunakan penanda bahasa Inggris yang tepat (first, second, next) untuk menunjukkan tahapan dalam suatu pelajaran

Based on the results presented in Table 2, all CEPS items demonstrated a good fit with the Rasch model. Overall, none of the items in the Indonesian-adapted version of the CEPS instrument were eliminated, indicating strong internal consistency and item validity. Among the items, CEPS_4 ("I can speak English clearly without systematic pronunciation errors") was identified as the most difficult item, meaning that it was the least frequently endorsed by participants. Conversely, CEPS_12 ("I can use appropriate English discourse markers such as first, second, and next to indicate lesson stages") was found to be the easiest item, meaning that it was the most frequently endorsed by participants. The hierarchy of item difficulty levels is illustrated further in Figure 2.
Figure 2 Item Person Map CEPS

Evaluation of Rating Scale Function

The functioning of the five-point Likert rating scale was examined using the Rating Scale Model (RSM). The results indicated that all response categories were used by participants, although the lowest category ("Strongly Disagree") showed a relatively low frequency of endorsement. The rating scale met the monotonicity assumption, as both observed averages and Andrich threshold estimates increased progressively across response categories, indicating orderly category functioning. Category fit statistics showed acceptable performance for categories 2 to 5 (Infit and Outfit MNSQ between 0.92 and 1. Although category 1 exhibited misfit due to its low frequency, the overall rating scale functioned effectively and consistently for measuring Classroom English Proficiency.

Measurement Invariance

Differential Item Functioning (DIF) analysis was conducted to examine whether the items functioned equivalently across different subgroups. In this study, DIF was tested based on gender (male and female). The results indicated that 11 out of 12 items functioned fairly and showed no significant DIF. However, one item, CEPS_2 ("I can use a wide range of English vocabulary"), demonstrated statistically significant DIF. The Welch t-test indicated a significant difference (t = 2.94, p = . , which was further confirmed by the Mantel-Haenszel test (χ² = 11.649, p = . The item was found to be more difficult for the male group (DIF measure = 0. than for the female group (DIF measure = -0. The DIF contrast of 04 logits indicates a substantial difference in item difficulty between the two groups.
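The two DIF criteria stated in the Methods (a DIF contrast of at least 0.5 logits and a Rasch-Welch t exceeding ±1.96) can be sketched as a simple check. This is a simplified Python illustration, not the Winsteps computation: it takes the group-specific item measures and their standard errors as inputs, forms the contrast and its joint standard error, and omits the degrees-of-freedom approximation and p-value. The function and field names, and the numbers in the usage example, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DifResult:
    contrast: float  # difference in item difficulty between groups (logits)
    t: float         # contrast divided by its joint standard error
    flagged: bool    # True if both criteria are met


def dif_check(measure_a, se_a, measure_b, se_b,
              min_contrast=0.5, t_critical=1.96):
    """Flag an item when |contrast| >= min_contrast and |t| > t_critical."""
    if se_a <= 0 or se_b <= 0:
        raise ValueError("standard errors must be positive")
    contrast = measure_a - measure_b
    joint_se = math.hypot(se_a, se_b)  # sqrt(se_a^2 + se_b^2)
    t = contrast / joint_se
    flagged = abs(contrast) >= min_contrast and abs(t) > t_critical
    return DifResult(contrast=contrast, t=t, flagged=flagged)


if __name__ == "__main__":
    # Hypothetical measures and standard errors, for illustration only.
    print(dif_check(0.80, 0.25, -0.20, 0.15))
```

A positive contrast here means the item is harder for group A than for group B at the same overall ability level, which is the pattern reported for CEPS_2 with male teachers as group A.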
4 Discussion

This study aimed to adapt and evaluate the psychometric properties of the Classroom English Proficiency Scale (CEPS) using two complementary analytical frameworks: Confirmatory Factor Analysis (CFA) and the Rasch Model. Overall, the findings indicate that CEPS is a psychometrically robust and theoretically sound instrument for assessing classroom-specific English proficiency among English teachers in Indonesia.

The strongest evidence emerges from the construct validity results. The second-order CFA demonstrated good model fit, with all major goodness-of-fit indices meeting recommended thresholds (CFI = 0.960, TLI = 0.947, RMSEA = 0.069, 90% CI . 048, 0. SRMR = 0. These findings confirm that the four theoretically grounded dimensions (Grammatical and Lexical Accuracy and Range; Pronunciation, Stress, and Intonation; Language of Interaction; and Language of Instruction) are coherently represented by a higher-order construct of Classroom English Proficiency. This hierarchical structure supports the conceptualization of CEPS as multidimensional at the subscale level while remaining unidimensional at the global construct level.

This interpretation is further strengthened by the Rasch analysis, which confirmed the unidimensionality assumption through Principal Component Analysis of Residuals. The Raw Variance Explained by Measures (. 2%) exceeded the recommended threshold of 40%, indicating that a single dominant latent trait underlies item responses. Together, the CFA and Rasch results provide converging evidence that CEPS can validly yield both subscale scores and an overall proficiency score, enhancing its flexibility for research and professional development purposes.

In terms of measurement precision,
CEPS demonstrated strong reliability across analytical frameworks. High person reliability (0.85) indicates that the scale can effectively distinguish teachers across different levels of classroom English proficiency, while high item reliability (0.93) suggests that the item difficulty hierarchy is stable and replicable. These results confirm that CEPS possesses sufficient measurement sensitivity for both diagnostic and evaluative applications.

A noteworthy finding concerns the presence of Differential Item Functioning (DIF) in CEPS_2 ("I can use a wide range of English vocabulary") across gender groups. The analysis revealed that this item was more difficult for male teachers than for female teachers with equivalent levels of overall proficiency. From a practical perspective, this finding has important implications for scale use. Rather than warranting immediate item deletion, the observed DIF suggests a need for careful item review. The item may reflect differences in self-perception of lexical breadth rather than actual classroom language competence, potentially influenced by gender-related response tendencies. Therefore, revision rather than removal appears to be the most appropriate course of action. Future iterations of CEPS may benefit from rephrasing the item to include more behaviorally anchored descriptors or contextualized classroom examples, thereby reducing subjective interpretation while preserving its conceptual relevance.

Despite the strong psychometric evidence, several limitations of the present study should be acknowledged. First, the use of convenience sampling restricts the generalizability of the findings to the broader population of Indonesian English teachers. Second, the reliance on self-report data introduces the possibility of response bias, as participants' ratings may not fully reflect actual classroom language performance.
These limitations highlight the need for future validation studies employing probability sampling techniques and incorporating external criteria, such as classroom observations or performance-based assessments. Longitudinal studies are also recommended to examine the stability and sensitivity of CEPS across time and instructional contexts.

CONCLUSION

This study aimed to adapt, validate, and evaluate the psychometric properties of the Classroom English Proficiency Scale (CEPS) for English language teachers in Indonesia using a combined analytical framework of Confirmatory Factor Analysis (CFA) and the Rasch Rating Scale Model (RSM). Overall, the findings provide strong empirical evidence that the Indonesian-adapted CEPS demonstrates satisfactory construct validity, reliability, and measurement precision. Based on the stated research objectives, the main conclusions of this study can be summarized as follows.

The CFA results confirm that the proposed four-factor structure (Grammatical and Lexical Accuracy and Range; Pronunciation, Stress, and Intonation; Language of Interaction; and Language of Instruction) fits the empirical data well and is coherently represented by a higher-order construct of Classroom English Proficiency.

Rasch analysis provides further support for the unidimensionality of the CEPS, with the scale measuring a single dominant latent trait while maintaining theoretically meaningful subscales.

The instrument demonstrates strong reliability and separation indices for both persons (…) and items (…), indicating its effectiveness in distinguishing different levels of classroom English proficiency among teachers.
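The link between the reliability and separation indices mentioned above can be illustrated with a minimal sketch. It assumes the conventional Rasch relations G = sqrt(R / (1 - R)) for separation and H = (4G + 1) / 3 for the number of statistically distinct strata. The function names are illustrative and the input values are hypothetical placeholders, not the results of this study.

```python
import math


def separation_from_reliability(reliability: float) -> float:
    """Convert a Rasch reliability coefficient R into a separation index G.

    Uses the conventional relation G = sqrt(R / (1 - R)).
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in the interval [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def strata_from_separation(separation: float) -> float:
    """Estimate the number of distinct strata, H = (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


if __name__ == "__main__":
    # Hypothetical reliability values, chosen only for illustration.
    for r in (0.80, 0.90, 0.95):
        g = separation_from_reliability(r)
        h = strata_from_separation(g)
        print(f"R = {r:.2f} -> separation G = {g:.2f}, strata H = {h:.2f}")
```

Under these assumptions, a reliability of 0.80 corresponds to a separation of 2.0 and about three distinguishable levels, which is why higher reliability is read as a greater capacity to differentiate teachers or items along the proficiency continuum.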
All items showed acceptable fit to the Rasch model, and the Wright map indicated that item difficulty levels were well aligned with participants' ability levels.

Although one item (CEPS_2) exhibited gender-related Differential Item Functioning (DIF), its impact on the overall psychometric integrity of the scale was limited, supporting the general fairness of the instrument across gender groups.

Taken together, these findings indicate that the Indonesian version of CEPS is a valid, reliable, and practically useful instrument for assessing English teachers' classroom language proficiency. The scale can serve both diagnostic and evaluative functions in teacher education programs, professional development initiatives, and institutional language assessment practices.

Future research is encouraged to replicate the present findings using larger and more diverse samples to further strengthen generalizability. In addition, longitudinal studies are recommended to examine the sensitivity of CEPS in capturing changes in teachers' classroom English proficiency over time, as well as to explore the integration of CEPS with observational or performance-based assessment methods.

Acknowledgments: This research was funded by BIMA Kemendikbudristek (Ministry of Education, Culture, Research, and Technology). The researchers would like to express their deepest gratitude to Kemendikbudristek for the support and funding provided through the BIMA research grant. Thanks also go to Universitas Nahdlatul Ulama Indonesia for supporting the research publication.

REFERENCES