Bulletin of Information Technology (BIT)
Vol 6, No 4, December 2025, pp. 367 - 378
ISSN 2722-0524 (online media)
DOI 10. 47065/bit.
https://journal. org/index. php/BIT

Application of the Naïve Bayes Algorithm for Employee Performance Prediction Based on SIMPEG at TVRI East Kalimantan Station

Ishmah Hanani, Siti Lailiyah*, Yulindawati*
Department of Informatics Engineering, STMIK Widya Cipta Dharma, Samarinda, Indonesia
Email: 1 2243909@wicida. id, 2*lail. 59a@gmail. com, 3* yulindawati@wicida.
Corresponding author: 2243909@wicida.

Abstract - Employee performance evaluation is a crucial aspect of public organizational management, including at the public broadcasting institution TVRI East Kalimantan Station. To date, attendance indicators obtained from the Employee Management Information System (SIMPEG) have often been used as the primary benchmark because the data are objective and well structured. However, a single attendance-based approach risks overlooking more substantive aspects of work achievement. Therefore, this study integrates attendance data with the Employee Performance Targets (SKP) to construct a more representative performance label. The method employed is a classification approach using the Naïve Bayes (GaussianNB) algorithm. The research dataset consists of attendance records (normal attendance, leave, official duty, study assignment, early departure, absence, and total working days) and quantized SKP scores. Performance labels were generated using a composite score (0.30 × attendance percentage + 0.70 × normalized SKP), which was then categorized into three classes: Excellent, Good, and Needs Improvement. The model was trained using SIMPEG and SKP data that had undergone preprocessing, data partitioning, and class balancing. Experimental results show that the model achieved an accuracy of 0.83, with a precision of 0.86, recall of 0.84, and F1-score of 0.83 on the test data.
These results indicate that the model can consistently recognize employee performance patterns across all classes. Practically, this study offers a simple, efficient, and easily implementable predictive framework to support more objective processes of coaching, monitoring, and reward allocation within TVRI East Kalimantan Station.

Keywords: Employee performance; SIMPEG; Employee Performance Targets; Naïve Bayes; Performance prediction

1. INTRODUCTION

Human resource management (HRM) is a strategic factor determining organizational success in both public and private sectors. Competent, disciplined, and adaptive human resources support the achievement of organizational goals through effective processes and measurable outcomes. In public broadcasting institutions such as TVRI East Kalimantan Station, the quality of broadcast services and operational support is strongly influenced by the consistency and productivity of employees. In practice, many departments use attendance discipline indicators as the primary proxy for performance because such data are easily obtained from the Employee Management Information System (SIMPEG), well structured, and relatively objective. The availability of detailed attendance data makes it a convenient and quick metric for evaluation. However, relying solely on this indicator presents several conceptual and practical limitations.

Performance assessments based solely on attendance risk obscuring the true dimensions of work achievement. Employees with perfect attendance do not necessarily meet their performance targets; conversely, those who occasionally take official leave or field duties may still perform well by achieving measurable outputs. To address this gap, the government encourages the use of the Employee Performance Target (Sasaran Kinerja Pegawai, SKP) as a goal-based evaluation instrument.
SKP captures aspects not reflected in attendance records, such as output quality, timeliness, and target realization, and therefore better represents the substantive dimension of performance rather than mere attendance discipline. Based on this perspective, this study asserts that employee performance cannot be evaluated solely through attendance but must also incorporate the work achievement represented by SKP.

In the analytical domain, advances in data mining and machine learning provide lightweight yet effective predictive approaches to support data-driven decision-making. The Naïve Bayes (NB) algorithm is a probabilistic classification method widely recognized for its simplicity, computational efficiency, and competitive performance across various domains. It is also frequently applied in Decision Support Systems (DSS) within the education sector because of its transparent computation process and straightforward implementation. Nonetheless, prior literature indicates that on certain complex datasets Naïve Bayes may underperform compared to nonlinear models such as Neural Networks (NN), which makes the quality and relevance of features crucial. In this study, this practical implication motivates the integration of attendance and SKP data to enrich the learning signal for the model rather than relying on a single source of indicators.

Naïve Bayes is a classification algorithm grounded in Bayes' Theorem, a mathematical approach for estimating the probability of an event based on prior knowledge or observed evidence. Its core characteristic is the assumption of conditional independence among features, meaning that each variable contributes independently to the classification decision. Although this assumption is simple, it enables the algorithm to run efficiently, to be trained easily, and to require minimal computational resources.
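As a compact illustration of the conditional-independence assumption, the decision rule can be written as follows. The notation ($y$ for a performance class, $x_1, \dots, x_n$ for the feature values, $\hat{y}$ for the predicted class) is introduced here for illustration only.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)},
\qquad
\hat{y} = \arg\max_{y}\; P(y)\,\prod_{i=1}^{n} P(x_i \mid y)
```

Because the denominator is the same for every class, only the prior $P(y)$ and the per-feature likelihoods $P(x_i \mid y)$ need to be compared.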
In practical terms, the algorithm estimates the probability of each class from the combination of observed feature values and assigns the class with the highest probability as the predicted outcome. Its probabilistic nature allows predictions to be interpreted quantitatively and transparently, making it particularly suitable for public institutions that demand accountability and explainability in decision-making processes.

Copyright © 2025 Author. Jurnal BIT is licensed under a Creative Commons Attribution 4.0 International License.

Another advantage is that Naïve Bayes remains competitive on small- to medium-scale tabular datasets, such as personnel administration data, without requiring complex training procedures. However, its effectiveness depends strongly on the relevance of the input features: the more informative the attributes, the better the model's ability to identify data patterns. For these reasons, Naïve Bayes is deemed appropriate for this study, which combines attendance (SIMPEG) and SKP data to predict employee performance objectively and efficiently.

Beyond the HR and education domains, the flexibility of Naïve Bayes has been demonstrated across diverse applications. Prior studies on employee performance evaluation and prediction have used Naïve Bayes and its variants for performance appraisal, employee eligibility classification, and other personnel decisions. In the academic sector, Naïve Bayes has also been widely employed to model student performance and predict graduation. However, most HR-oriented studies still focus on attendance-related attributes (normal attendance, leave, official duty, or absence)
or other administrative features, while explicit integration with SKP, either as a key feature or as a basis for constructing performance labels, remains relatively uncommon. Within governmental institutions, several studies highlight the importance of leveraging SIMPEG data to promote transparency and support bureaucratic reform.

Other studies further illustrate the algorithm's versatility. Yusnita et al. applied Naïve Bayes to a student admission system to support selection decisions, while Azahari et al. explored its use in predicting undergraduate study duration. Both studies show that Naïve Bayes can be implemented effectively in educational and academic management contexts. Building upon these findings, the present study positions the application of Naïve Bayes in the public sector, specifically through the integration of SIMPEG and SKP data to assess employee performance more objectively. This research extends such utilization from monitoring to predictive classification of performance categories that can be applied directly to employee development and reward allocation.

The case of TVRI provides a unique context because, as a public broadcasting institution, it must fulfill both governmental obligations and public service demands. Consequently, employee performance evaluation concerns not only internal administrative accountability but also the quality of the public information services delivered. The nature of broadcasting tasks, which require on-time program delivery, production quality, and cross-functional coordination, makes performance indicators more diverse than those in purely administrative sectors. This condition reveals a research gap: few studies have integrated SIMPEG and SKP data within public broadcasting institutions, despite their distinctive operational demands. This gap forms the foundation of the present study.

Based on the above background, this study takes the case of TVRI East Kalimantan Station and establishes the following objectives: (1)
to formulate a composite performance label integrating attendance and SKP components; and (2) to build a Naïve Bayes model to predict the resulting performance label. From a policy perspective, this study adopts a three-class framework: Excellent, Good, and Needs Improvement. Academically, it extends Naïve Bayes, which has more often been reported in educational and industrial contexts, to the public sector by positioning SKP as a key determinant of performance. Practically, this approach can be readily adopted by HR units because it is transparent (the formulas and thresholds are easily explainable to stakeholders), computationally efficient (suitable for limited infrastructure), and flexible in aligning with institutional policies. Therefore, this study offers a more representative and operationally feasible framework for predicting employee performance at TVRI East Kalimantan Station.

2. RESEARCH METHODOLOGY

2.1 Research Stages

This study is applied quantitative research that uses a supervised classification approach to predict employee performance categories with the Naïve Bayes (NB) algorithm at TVRI East Kalimantan Station. The choice of NB is motivated by: (1) its simplicity, speed, and ease of institutional deployment; (2) its competitive performance on tabular data; and (3) its transparent probability computations, which facilitate stakeholder communication. The overall framework follows the Knowledge Discovery in Databases (KDD) process comprising six main stages: (1) data collection from SIMPEG and SKP; (2) attribute selection; (3) preprocessing (cleaning, encoding, normalization); (4) transformation and train-test partitioning; (5) application of the Naïve Bayes classifier; and (6) model evaluation using the confusion matrix, accuracy, precision, recall, and F1-score.
Figure 1. KDD-Based Research Flow

At the same time, the study emphasizes feature quality (combining attendance + SKP) because the literature shows that, for certain data, NB can lag behind nonlinear models such as Neural Networks, so attribute selection is crucial. This approach is in line with common practices that use NB in the domains of education and selection, as demonstrated by Yusnita et al., who used NB in a new student admission system, and Azahari et al., who used NB to predict student study periods. Both studies demonstrate the simplicity and adaptability of NB in processing structured administrative data.

Data sources comprise the SIMPEG (employee management information) system at TVRI East Kalimantan Station and the SKP (Employee Performance Target) evaluation sheets for the same period. The workflow includes data extraction, cleaning and normalization, construction of a composite performance label (Attendance + SKP) with three classes (Excellent, Good, Needs Improvement), feature engineering, training with GaussianNB, and evaluation on a 30% hold-out test set using accuracy, precision, recall, F1-score, and the confusion matrix. This scheme reflects concise, organization-friendly data-mining practice and is consistent with NB implementations in HR/education domains. Policy and data-governance considerations follow public-sector practices of leveraging SIMPEG to support transparency and bureaucratic reform.

Tools and environment: Python with scikit-learn for modeling, pandas for data handling, and standard visualization tools. GaussianNB uses default parameters, as the focus is on composite-label design and feature construction rather than extensive hyperparameter tuning.
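The label-construction step of this workflow can be sketched as follows, as a minimal illustration and not the authors' code. The 0.30/0.70 weights, the SKP quantization (150/100/75/50/25), the division by 150, and the 0.70 lower cutoff are the values reported in this paper. The function names and the `ASSUMED_EXCELLENT_CUTOFF` value are hypothetical: the upper cutoff is not legible in the text at hand, so it is a required argument and the value used in the example is purely illustrative.

```python
# SKP category -> quantized percentage, as reported in the paper.
SKP_PERCENT = {
    "Excellent": 150,
    "Good": 100,
    "Needs Improvement": 75,
    "Below Expectation": 50,
    "Unsatisfactory": 25,
}


def attendance_ratio(hn, total):
    """HADIRNORMAL_HN / TOTAL, clipped to [0, 1]. Rows with TOTAL <= 0 are rejected."""
    if total <= 0:
        raise ValueError("TOTAL must be > 0 to compute the attendance ratio")
    return min(max(hn / total, 0.0), 1.0)


def composite_score(hn, total, skp_category, w_att=0.30, w_skp=0.70):
    """score = 0.30 * persen_hadir + 0.70 * skp_norm."""
    skp_norm = SKP_PERCENT[skp_category] / 150
    return w_att * attendance_ratio(hn, total) + w_skp * skp_norm


def performance_label(score, excellent_cutoff, good_cutoff=0.70):
    """Map a composite score to one of the three classes.

    excellent_cutoff has no default on purpose: its value is an assumption
    that must be supplied by the caller.
    """
    if score >= excellent_cutoff:
        return "Excellent"
    if score >= good_cutoff:
        return "Good"
    return "Needs Improvement"


# Hypothetical example row: 20 normal attendance days out of 22, SKP rated "Good".
ASSUMED_EXCELLENT_CUTOFF = 0.90  # illustrative assumption, not taken from the paper
score = composite_score(hn=20, total=22, skp_category="Good")
print(performance_label(score, ASSUMED_EXCELLENT_CUTOFF))  # -> Good
```

Keeping the cutoffs as arguments makes it straightforward for an HR unit to align the mapping with its own policy thresholds.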
2.2 Population and Sample

The population of this study consists of employees registered in the Personnel Management Information System (SIMPEG) at TVRI East Kalimantan Station who had Employee Performance Targets (SKP) assessments during the observation period. This population is relevant because SIMPEG provides standardized administrative/attendance records, while SKP represents target-based performance achievements.

The research sample consists of all data rows (employee-period) that passed the pre-processing stage, namely rows in which (i) the numerical attendance columns HADIRNORMAL_HN, CUTI_CT, DINASLUAR_DL, TUGASBELAJAR_TB, MENINGGALKANKANTOR_MK, TIDAKMASUK_TM, and TOTAL are present, (ii) the SKP EVALUATION column is available and has been successfully mapped, and (iii) the TOTAL value is > 0 (to avoid division by zero when calculating the attendance percentage). After cleaning, the data are divided into 70% for training and 30% for testing with a stratified split so that the proportions of the three classes (Excellent/Good/Needs Improvement) are relatively balanced in both subsets. Given the tendency for the Good class to dominate, RandomOverSampler is applied to the training data to reduce model bias towards the majority class. The test data are left untouched (no resampling) so that the evaluation reflects performance on unseen data. This total sample approach (census of available rows) is common in SIMPEG-based prediction studies, maximizing available data while maintaining internal validity through strict pre-processing procedures. It is also in line with practices in many Naïve Bayes studies on employee performance.
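The split-and-balance procedure described above can be sketched with the standard library alone. This is a hedged illustration of the logic, not the authors' implementation: the study reports using RandomOverSampler, whereas `stratified_split` and `random_oversample` below are hypothetical stand-ins, and the class counts in the example are synthetic.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.30, seed=0):
    """Return (train_idx, test_idx), sampling the test share within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


def random_oversample(train_idx, labels, seed=0):
    """Duplicate randomly chosen minority-class training rows until every
    class has as many rows as the largest class. Test rows are never touched."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced


# Synthetic labels for illustration only (not the study's data).
labels = ["Good"] * 50 + ["Needs Improvement"] * 30 + ["Excellent"] * 20
train_idx, test_idx = stratified_split(labels, test_size=0.30, seed=42)
balanced_train = random_oversample(train_idx, labels, seed=42)

print(Counter(labels[i] for i in test_idx))        # class proportions preserved
print(Counter(labels[i] for i in balanced_train))  # classes equalized in training
```

Oversampling after the split, and only on the training indices, prevents duplicated rows from leaking into the test set.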
Thus, the evaluation results obtained from the test data can provide an objective picture of the performance of the Naïve Bayes algorithm in predicting employee performance based on attendance and other administrative attributes from SIMPEG.

2.3 Research Variables

Dependent variable (Y): Predikat_Kinerja ∈ {Excellent, Good, Needs Improvement}, derived from a composite score combining attendance discipline and work achievement (SKP). The three-class mapping supports operational coaching/reward decisions and reduces ambiguity in mid-range labels.

Independent variables (X): Combined SIMPEG attendance attributes and SKP indicator: HADIRNORMAL_HN (normal attendance days), CUTI_CT (leave days), DINASLUAR_DL (official duty days), TUGASBELAJAR_TB (training/study days), MENINGGALKANTOR_MK (early-leave events), and TIDAKMASUK_TM (absence days).

Operational definitions and scaling: persen_hadir = HADIRNORMAL_HN / TOTAL, clipped to [0, 1] to prevent data artifacts. SKP mapping → skp_percent: Excellent = 150, Good = 100, Needs Improvement = 75, Below Expectation = 50, Unsatisfactory = 25; then skp_norm = skp_percent / 150 for scale comparability.

Composite score (for label construction): score = 0.30 × persen_hadir + 0.70 × skp_norm.

Three-class policy mapping: Excellent (score ≥ 0.…), Good (0.70 ≤ score < 0.…), Needs Improvement (score < 0.70, including Below Expectation/Unsatisfactory).

Methodological note: skp_percent is used as an input feature (besides the attendance components), while Predikat_Kinerja is the 3-class label predicted by the model. This design ensures the model learns from combined signals (attendance + SKP), a proven approach in HR/education NB applications.

Table 1. Independent Variables (Model Features)

| Variable | Code | Operational Definition | Scale | Unit | Source | Computation/Values |
|---|---|---|---|---|---|---|
| Present | HADIRNORMAL_HN | Days present as normal attendance | Ratio | Days | SIMPEG | Numeric from SIMPEG |
| Leave | CUTI_CT | Leave days in the period | Ratio | Days | SIMPEG | Numeric from SIMPEG |
| Official Duty | DINASLUAR_DL | Official duties outside the office | Ratio | Days | SIMPEG | Numeric from SIMPEG |
| Study/Training | TUGASBELAJAR_TB | Training/study days | Ratio | Days | SIMPEG | Numeric from SIMPEG |
| Leaving the Office | MENINGGALKANTOR_MK | Frequency/events of leaving during work | Ratio | Event/Days | SIMPEG | Numeric from SIMPEG |
| Absent | TIDAKMASUK_TM | Days absent | Ratio | Days | SIMPEG | Numeric from SIMPEG |
| Total Working Days | TOTAL | Total relevant working entries | Ratio | Days | SIMPEG | Numeric from SIMPEG |
| SKP | skp_percent | Quantized SKP score (category mapping) | Ratio | Percent | SKP | 150 = Excellent, 100 = Good, 75 = Needs Improvement, 50 = Below Expectation, 25 = Unsatisfactory |

The first seven variables represent the documented attendance patterns of employees in SIMPEG (present, leave, official duty, study/training, leaving the office, absent, and total working days). These variables were selected because they are available, standardized, and auditable in the personnel administration process. The skp_percent variable is the main feature representing work achievement; it is derived from the Employee Performance Target (SKP) assessment and quantized to the set {150, 100, 75, 50, 25} for consistency between entries. The inclusion of skp_percent confirms that performance is not only about attendance discipline but also about achieving outputs. Methodologically, it adds a meaningful feature that improves the signal for the Naïve Bayes algorithm, while also responding to the supervisor's directive that assessments should not be "attendance-only".

Table 2. Intermediate Variables (for Label Formation; not model features except skp_percent)

| Variable | Code | Operational Definition | Computation/Values | Scale | Unit | Source |
|---|---|---|---|---|---|---|
| Attendance Ratio | persen_hadir | Proportion of attendance | HN / TOTAL, clipped to [0, 1] | Ratio | | SIMPEG |
| Normalized SKP | skp_norm | SKP score in [0, 1] | skp_percent / 150 | Ratio | | SKP |
| Composite Score | score | Composite for label | 0.30 × persen_hadir + 0.70 × skp_norm | Ratio | | Derived |

These three intermediate variables are not used directly as features (except for skp_percent in Table 1) but are used to form the labels that the model predicts. persen_hadir normalizes attendance to [0, 1], while skp_norm normalizes SKP to a comparable scale. The score combines the two signals with weights of 0.30 : 0.70 (leaning towards SKP) because the substance of work performance must stand out without negating the importance of attendance. This composite strategy is commonly adopted in NB/HR studies to maintain the balance of heterogeneous quantitative aspects.

Table 3. Dependent Variable (Performance Label)

| Variable | Code | Operational Definition | Class Rule | Scale | Domain |
|---|---|---|---|---|---|
| Performance Category | Predikat_Kinerja | Category mapped from the composite Attendance + SKP score | Excellent if score ≥ 0.…; Good if 0.70 ≤ score < 0.…; Needs Improvement if score < 0.70 | Nominal | {SB, BP} |

The Performance Predicate label is mapped into three classes to facilitate coaching and reward policies. Good (representing the majority of operations) is kept as a single class, while Below Expectation and Unsatisfactory are merged into Needs Improvement so that the intervention plan is clear and measurable. This three-class approach is in line with managerial needs in agencies and maintains the stability of estimates in the Naïve Bayes model (reducing label sparsity), as recommended in applied classification practices.

Since all features used are numerical and have a continuous distribution, the Gaussian Naïve Bayes (GNB) variant is used as the classification algorithm. This model calculates the probability of each feature for each class based on a normal (Gaussian) distribution, as shown in the following equation.
$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y}^{2}}}\exp\left(-\frac{(x_i - \mu_{y})^{2}}{2\sigma_{y}^{2}}\right)$$

where:
$x_i$ = the $i$-th feature value of an employee (e.g., number of normal attendance days, HN = …)
$y$ = the performance class being evaluated (e.g., Good)
$\mu_{y}$ = mean of feature $x_i$ for all employees belonging to class $y$
$\sigma_{y}$ = standard deviation of feature $x_i$ for all employees belonging to class $y$

This equation measures how characteristic an employee's feature values are of each performance class, under the assumption that the data follow a normal distribution centered at the class mean ($\mu_{y}$), with the spread determined by the standard deviation ($\sigma_{y}$). Values near the mean obtain higher likelihood scores, while those further away obtain lower ones. The model then multiplies the likelihoods of all features and selects the class with the highest overall probability.

Based on the formula above, the likelihood $P(x_i \mid y)$ is calculated for each feature (HN, CT, DL, TB, MK, TM, TOTAL, SKP). For each performance class $y$, all feature probabilities are multiplied, assuming conditional independence, to obtain the joint probability $P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$, which represents the likelihood that an employee belongs to class $y$. The class with the highest posterior probability is then selected as the prediction result. Thus, the model does not guess randomly but systematically compares how well each employee's feature profile fits the "characteristic pattern" of each performance category. All probability computations are handled automatically by the scikit-learn library (GaussianNB). The GaussianNB formulation provides an appropriate representation for continuously distributed data such as those in the SIMPEG and SKP datasets, ensuring stable and interpretable predictive outcomes.

2.4 Naïve Bayes Classification Pipeline

The classification process was designed to be simple, transparent, and easily replicable within institutional environments: (1) input of SIMPEG
and SKP data; (2) preprocessing and normalization; (3) construction of three-class performance labels from the composite Attendance + SKP score; (4) feature assembly; (5) a 70:30 stratified split with oversampling applied to the training set; (6) training of the Gaussian Naïve Bayes classifier; (7) testing on the hold-out test set; and (8) evaluation using accuracy, precision, recall, F1-score, and the confusion matrix.

The choice of Naïve Bayes is based on its computational efficiency, simplicity, and established performance in Decision Support Systems and educational data-mining applications. Particular attention is given to feature quality, as Naïve Bayes may underperform compared to nonlinear models when feature relevance is weak. The integration of SIMPEG and SKP data follows best practices in governmental HR information systems aimed at promoting transparency and bureaucratic accountability.

Model performance is evaluated using standard classification metrics (accuracy, precision, recall, and F1-score), defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

With:
TP (True Positive): number of positive samples correctly predicted.
TN (True Negative): number of negative samples correctly predicted.
FP (False Positive): number of negative samples incorrectly predicted as positive.
FN (False Negative): number of positive samples not detected by the model.

For the multi-class case (three classes), the values of TP, FP, and FN are computed using a one-vs-rest scheme for each class. Subsequently, both the macro average (the unweighted mean across classes) and the weighted average (the mean weighted by class sample size) are reported to provide a balanced evaluation.

This research followed the Naïve Bayes classification workflow as illustrated in Figure 2.
Figure 2. Naïve Bayes Classification Flow

The employee performance classification workflow using the Naïve Bayes algorithm consists of seven main stages:

Data Input and Preprocessing: Retrieve data from SIMPEG and SKP, including attendance attributes (HN, CT, DL, TB, MK, TM, TOTAL) from SIMPEG and SKP evaluations for the corresponding period.

Data Cleaning and Normalization: Convert data types and handle missing or anomalous values; compute attendance ratio = HADIRNORMAL_HN / TOTAL and clip the result to the range [0, 1]. Map the "SKP Evaluation" into skp_percent with the following quantization: Excellent = 150, Good = 100, Needs Improvement = 75, Below Expectation = 50, and Unsatisfactory = 25. Then calculate skp_norm = skp_percent / 150 to ensure scale comparability.

Label Construction (Three Classes): Compute the composite score, score = 0.30 × attendance_ratio + 0.70 × skp_norm, then map it into three categories: Excellent (score ≥ 0.…), Good (0.70 ≤ score < 0.…), and Needs Improvement (score < 0.70, including Below Expectation and Unsatisfactory).

Feature and Label Assembly: Including skp_percent as a feature enriches the representation of achievement signals alongside the attendance attributes.

Data Splitting and Balancing: Apply a 70:30 stratified split between training and test sets. To reduce bias toward the majority class (Good), apply RandomOverSampler only to the training data. The test data remain untouched.

Model Training: The study employs Gaussian Naïve Bayes (GaussianNB) for three-class performance classification. Conceptually, the model learns class-specific feature distributions (mean and variance) from the training data. During prediction, it evaluates how well an employee's feature values fit each class. The overall match from all features is combined with the class priors, and the class with the highest posterior probability is chosen as the prediction result.
This approach is efficient, transparent, and well suited to tabular data such as SIMPEG and SKP, as it requires minimal computation and offers easily interpretable logic for decision-makers. Feature quality is emphasized because simple models like Naïve Bayes rely heavily on relevant information; on some datasets more complex models may perform better, which makes attribute selection a key factor.

Prediction and Evaluation: Generate predictions on the test set and compute accuracy, precision, recall, and F1-score per class, along with macro and weighted averages and the confusion matrix. The desired model performance target is 0.… ≤ accuracy < 1.

3. RESULT AND DISCUSSION

This section presents the research findings and analytical discussion related to the implementation of the Naïve Bayes algorithm for predicting employee performance based on data obtained from SIMPEG at TVRI East Kalimantan Station. The discussion is divided into three subsections: (1) data processing and model training, (2) model evaluation results, and (3) system implementation and practical implications.

3.1 Data Processing and Model Training

Research Dataset

The dataset used in this study integrates administrative records from the Employee Management Information System (SIMPEG) and the Employee Performance Targets (SKP) data for the same evaluation period of employees at TVRI East Kalimantan Station. From SIMPEG, attendance-related attributes consistently recorded by the HR unit were extracted: HADIRNORMAL_HN (normal attendance days), CUTI_CT (leave days), DINASLUAR_DL (official duty days), TUGASBELAJAR_TB (training/study days), MENINGGALKANTOR_MK (early-leave events), TIDAKMASUK_TM (absence days), and TOTAL working days or valid entries.
The SKP data were obtained from performance assessment forms based on target achievement and work quality, then recoded into a numerical variable skp_percent using the following mapping: Excellent = 150, Good = 100, Needs Improvement = 75, Below Expectation = 50, and Unsatisfactory = 25.

The inclusion of these two complementary data sources underscores the main purpose of this study: employee performance should not be evaluated solely on attendance. Attendance reflects discipline and work availability, whereas SKP represents goal attainment and quality of output. Their combination yields a more comprehensive and operational representation of employee performance, suitable for both coaching and recognition purposes. This approach aligns with recommendations for leveraging personnel data to enhance transparency and support bureaucratic reform within public-sector institutions.

Conceptually, the data handled in this study are tabular, consisting of discrete or quasi-continuous numerical values. Such characteristics are well suited to lightweight and explainable applied machine-learning approaches such as Naïve Bayes. Models of this type are easily operationalized in institutional environments because they require few computational resources and provide transparent reasoning that can be clearly communicated to stakeholders.

Data Processing

The preprocessing stage was conducted to ensure that the data were clean, consistent, and scaled comparably before being used to train the model. The following steps were applied:

Data Type Alignment: All attendance-related columns from SIMPEG were converted into numeric types, and non-numeric entries were cleaned. Textual SKP scores were mapped to the numeric variable skp_percent.
Handling Missing or Anomalous Values: Missing values in attendance features were treated using a minimal-distortion principle: assigning rational zeros for non-occurring events, or removing rows that failed to meet fundamental conditions, such as invalid TOTAL values.

Attendance Normalization: The variable attendance_ratio was defined as HN/TOTAL and clipped to the range [0, 1] to prevent artifacts (e.g., division by zero). This normalization made attendance signals comparable across employees and work periods.

SKP Normalization: After categorical mapping to skp_percent, a normalized variable skp_norm = skp_percent / 150 was created to ensure that SKP values fall within the range [0, 1], comparable to the attendance ratio.

Feature and Label Assembly: The feature matrix (X) consists of [HN, CT, DL, TB, MK, TM, TOTAL, skp_percent], and the label (y) represents the three-class performance category (Excellent, Good, Needs Improvement), constructed from the composite score described in the Methodology section.

Quality Gate Filtering: Rows with TOTAL ≤ 0 were removed to avoid division errors, and entries lacking SKP records for the given period were excluded because a valid label could not be constructed.

The decision to include skp_percent as both a predictive feature and a component of the composite label is substantively grounded. Organizationally, performance evaluation emphasizes target achievement as a determinant of overall performance, so SKP signals are logically integrated with attendance patterns. The literature also highlights that feature quality is critical to the performance of Naïve Bayes on tabular data: simple models can underperform compared to nonlinear approaches (e.g., neural networks)
when features lack discriminative strength . , . , . , .

Data Splitting

The dataset was divided into a training set (70%) and a test set (30%) using stratified sampling, ensuring that the class proportions of the labels were preserved in both subsets. The test set contained 36 instances, distributed as follows according to the classification report: Good = 20, Needs Improvement = 12, and Excellent = 4. Stratification was crucial to prevent evaluation bias toward the majority class.

Since the natural class distribution placed Good as the dominant category, a RandomOverSampler was applied only to the training data. This technique balanced the number of examples in each class during model learning, reducing the model's tendency to favor the majority class. The test data were left untouched (no resampling) to ensure that the evaluation metrics truly reflected the model's generalization capability on unseen data. This design follows standard evaluation practices for classification models in HR and educational domains . Ae. , . Ae. , . , .

Model Training

The model employed in this study is the Gaussian Naïve Bayes (GaussianNB) classifier from the scikit-learn library. Gaussian Naïve Bayes learns class-specific summary statistics of the features, namely the mean and variance, from the training data. During prediction, the model calculates how well an employee's feature values fit each class distribution. The compatibility scores across all features are then combined with the class priors, and the class with the highest combined probability is selected as the predicted outcome. The rationale for choosing Naïve Bayes includes:

Efficiency and speed: It can be trained quickly and is well suited to institutional environments with limited computational resources.

Transparency: The decision-making process is easily interpretable by management, as it is based on straightforward statistical summaries and additive feature contributions . , . , .

Competitiveness on tabular data: When features are relevant,
Naïve Bayes performs competitively with more complex models. Incorporating skp_percent alongside attendance patterns enriches the predictive signal.

The GaussianNB model used default parameters, as the research focus was on composite label design, feature construction, and performance validation rather than extensive hyperparameter tuning. Nevertheless, the model could be further enhanced through cross-validation or lightweight parameter optimization in future operational stages if required by the organization.

2 Model Evaluation Result

Model Evaluation

The model evaluation was conducted using precision, recall, F1-score, and support metrics, together with overall accuracy. The test results are presented in Table 4.

Table 4. Model Evaluation Metrics

Class               Precision   Recall   F1-Score   Support
Good                0.79        0.95     0.86       20
Needs Improvement   1.00        0.58     0.74       12
Excellent           0.80        1.00     0.89       4
Accuracy                                 0.83       36
Macro avg           0.86        0.84     0.83       36
Weighted avg        0.86        0.83     0.82       36

The per-class values are those implied by the test-set confusion matrix in Figure 4.

The evaluation used the macro average as the primary reference metric. On the test data, the model achieved an overall accuracy of 0.83, with a macro-averaged precision of 0.86, recall of 0.84, and F1-score of 0.83. These results indicate that the model performs relatively consistently across all three target classes: Good, Needs Improvement, and Excellent.

Macro averaging was considered appropriate because it computes the mean performance across classes with equal weighting, regardless of sample distribution. This is particularly relevant for employee performance datasets where class imbalance naturally occurs: the Good category tends to dominate, while the Excellent and Needs Improvement classes contain fewer samples. By employing macro averages, the evaluation reflects not only the model's performance on the majority class but also its ability to correctly identify minority classes.

The macro-averaged precision of 0.86 indicates that the model's predictions are highly reliable, with a low error rate. In other words, when the model classifies an employee into a particular performance category, there is a high probability that the prediction matches the true class. The macro-averaged recall of 0.84 confirms that most employees across all performance categories were successfully identified, reducing the likelihood of misclassification or missed detection of employees who should belong to a specific class. The balance between precision and recall is reflected in the macro-averaged F1-score of 0.83, which demonstrates the model's stable performance in both predictive accuracy and sensitivity.

From an organizational standpoint, such consistent macro-average metrics are essential because they support fairness in performance evaluation across all employee categories. Employees with Excellent ratings remain accurately identified, those categorized as Good are recognized with high accuracy, and employees requiring improvement are detected with high precision, though with lower recall, despite their smaller representation in the data. With these results, the model meets the established quality threshold (accuracy ≥ 0.79 and < 1), signifying that it is sufficiently robust to serve as a baseline decision-support model for employee performance evaluation.

Bar Chart

Figure 3. Distribution of Naïve Bayes Prediction

Figure 3 shows the overall distribution of model predictions across the full dataset for the Good, Needs Improvement, and Excellent classes. This information is useful for managerial planning, e.g., sizing capacity for developmental programs targeting the Needs Improvement group and designing recognition strategies for the Excellent group. These counts are not used as scientific evaluation metrics; formal evaluation relies on the 30% hold-out test set, but the distribution remains operationally relevant for HR planning.

Confusion Matrix

Figure 4. Naïve Bayes Classification Confusion Matrix

The confusion matrix on the test set exhibits the following patterns.

Row "Good": 19 correctly predicted as Good, 1 upgraded to Excellent, and 0 predicted as Needs Improvement, explaining the high recall for Good (0.95).

Row "Needs Improvement": 7 correctly predicted as Needs Improvement, 5 misclassified as Good, and 0 as Excellent. There is no overpromotion to Excellent (Needs Improvement → Excellent), but leakage into Good lowers recall to 0.58.

Row "Excellent": 4 correctly predicted as Excellent and 0 assigned to other classes, consistent with a recall of 1.00. The precision for Excellent (0.80) indicates that a portion of predictions labeled Excellent originate from other classes (typically Good) with scores near the 0.85 decision boundary.

Operationally, these patterns suggest that the model captures every Excellent case at the cost of occasional over-promotion from Good (high recall, moderate precision) and is highly reliable for Good, while borderline cases between Needs Improvement and Good warrant attention (e.g., review of feature thresholds or targeted coaching criteria).

3 Analysis of Result

The analysis was conducted according to the Naïve Bayes stages described in the methodology.

Preprocessing Stage

Normalization of attendance (attendance_ratio = HN/TOTAL) and SKP (skp_norm = skp_percent / 150) ensured that all features were comparable on a similar scale.
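The normalization and composite-score arithmetic can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the SKP mapping and the 0.30 / 0.70 weights come from the paper, and the class cut-offs (0.85 for Excellent and 0.70 for Good, both treated as inclusive) are assumptions inferred from the decision boundaries mentioned in the text.

```python
# SKP category -> skp_percent, as stated in the paper.
SKP_PERCENT = {
    "Excellent": 150,
    "Good": 100,
    "Needs Improvement": 75,
    "Below Expectation": 50,
    "Unsatisfactory": 25,
}

W_ATTENDANCE = 0.30  # weight of attendance in the composite score
W_SKP = 0.70         # weight of normalized SKP in the composite score

# ASSUMED cut-offs: the text mentions boundaries at 0.85 and 0.70,
# but does not state whether they are inclusive.
T_EXCELLENT = 0.85
T_GOOD = 0.70


def attendance_ratio(hn, total):
    """HN / TOTAL, clipped to [0, 1]; rows with TOTAL <= 0 are rejected."""
    if total <= 0:
        raise ValueError("TOTAL must be positive (quality-gate filter)")
    return min(max(hn / total, 0.0), 1.0)


def skp_norm(skp_category):
    """skp_percent / 150, so the best category maps to 1.0."""
    return SKP_PERCENT[skp_category] / 150


def composite_score(hn, total, skp_category):
    """0.30 * attendance ratio + 0.70 * normalized SKP."""
    return (W_ATTENDANCE * attendance_ratio(hn, total)
            + W_SKP * skp_norm(skp_category))


def performance_label(score):
    """Map a composite score to one of the three classes (assumed cut-offs)."""
    if score >= T_EXCELLENT:
        return "Excellent"
    if score >= T_GOOD:
        return "Good"
    return "Needs Improvement"
```

Under these assumed cut-offs, an employee with full attendance and a Good SKP scores about 0.77 and is labeled Good, whereas the same SKP with 75% attendance scores about 0.69 and falls into Needs Improvement, which illustrates how attendance acts as a supporting signal near the boundary.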
Without normalization, the wider numerical range of SKP values could overshadow the attendance signal. The composite score weighting of 0.30 : 0.70 was chosen for discriminative reasons: SKP provides a stronger signal for differentiating performance levels, while attendance serves as a supporting indicator. This weighting aligns with the organizational context, where attendance is necessary but actual work achievement remains the primary determinant of employee performance.

Label Construction

The three performance classes, Excellent (SB), Good (B), and Needs Improvement (BP), were derived from the composite score. The class distribution revealed that Good dominated, while Excellent and Needs Improvement appeared as minority categories. This imbalance implies that, without adjustment, the model could be biased toward the majority class. Therefore, oversampling was applied only to the training data, enabling the model to learn minority-class patterns without distorting evaluation integrity. The test data were kept in their original distribution so that evaluation metrics would fairly represent generalization performance. The GaussianNB classifier performed well on numerical data; however, the broader dispersion of the Needs Improvement class made boundary separation more challenging.

GaussianNB Training Stage (Mechanism and Impact)

The Gaussian Naïve Bayes model learns the mean and variance of each feature per class, then computes the likelihood of an observation given each class distribution, weighted by the class prior. In this dataset, the Needs Improvement class exhibited a wider spread and overlapped with the Good class, particularly near the composite score threshold of 0.70. Statistically, this overlap reduced the posterior probability of Needs Improvement relative to Good, causing some Needs Improvement cases to be misclassified upward as Good.
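The Gaussian Naïve Bayes mechanism described above can be sketched in plain Python. This is an illustrative re-implementation rather than scikit-learn's GaussianNB or the authors' code: the function names are hypothetical, the one-feature toy data are invented for the example, and the small constant eps added to each variance is a simplification of scikit-learn's var_smoothing behavior.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate, per class, the prior and each feature's mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    n_total = len(X)
    n_features = len(X[0])
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        means = [sum(r[j] for r in rows) / n for j in range(n_features)]
        # Population variance plus a small constant to avoid zero variance.
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / n + eps
            for j in range(n_features)
        ]
        model[label] = (n / n_total, means, variances)
    return model


def log_posteriors(model, x):
    """Unnormalized log posterior: log prior + sum of Gaussian log-likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mu) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Return the class with the highest log posterior."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Toy data (invented): a tight "Good" cluster and a widely spread
# "Needs Improvement" cluster on a single score-like feature.
X_toy = [[0.74], [0.76], [0.78], [0.80], [0.40], [0.50], [0.60], [0.68]]
y_toy = ["Good"] * 4 + ["Needs Improvement"] * 4
toy_model = fit_gaussian_nb(X_toy, y_toy)
```

Calling log_posteriors(toy_model, [0.71]) returns the two class scores for a borderline value, which makes it easy to inspect how the class variances shape the decision between adjacent classes.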
The overlap between the Needs Improvement and Good classes explains the recall disparity mechanistically: the Needs Improvement class obtained a recall of 0.58, while the Good and Excellent classes achieved much higher recall (0.95 and 1.00, respectively). The Excellent class achieved perfect recall because its high SKP scores distinctly separated it from the other classes.

Evaluation and Confusion Matrix Stage

The summarized test results yielded an accuracy of 0.83, macro precision of 0.86, macro recall of 0.84, and macro F1-score of 0.83, indicating balanced model performance across classes despite the inherent data imbalance. The confusion matrix further showed that the Good class achieved a recall of 0.95 (stable and mostly correct predictions), the Excellent class achieved a recall of 1.00 (no instances were missed), and the Needs Improvement class obtained a recall of 0.58, primarily because five of its twelve cases were misclassified upward into the Good category.

To improve performance, particularly for the Needs Improvement class, several enhancement strategies can be considered. These include:

performing threshold tuning within the 0.70-0.85 range to make class boundaries in the overlapping region more decisive;

applying model-level class balancing (increasing the misclassification penalty for the Needs Improvement class) as a complement to data-level oversampling of the training data; because scikit-learn's GaussianNB has no class_weight argument, this would be done through its priors parameter or the sample_weight argument of fit;

enriching the SKP feature with additional indicators such as timeliness, target-achievement gap, or output quality to sharpen the signal of the Needs Improvement class; and

optionally conducting probability calibration prior to threshold application, enabling more consistent score-based decisions.

These findings are consistent with previous literature highlighting the flexibility of Naïve Bayes for structured administrative data. Yusnita et al. demonstrated the effectiveness of NB in supporting student admission decisions through transparent classification mechanisms, while Azahari et al.
emphasized its application for predicting undergraduate study duration. Both studies confirm that Naïve Bayes can yield reliable results when the input data are representative. In line with these insights, integrating attendance and SKP as key features has proven to enhance the model's relevance for employee performance assessment in the public sector.

Furthermore, the results have clear managerial implications. First, the list of employees classified under Needs Improvement can be prioritized for coaching and training programs. Second, employees categorized as Excellent can be identified as candidates for recognition and reward programs. Third, the consistent macro-average metrics provide assurance that the model is not systematically biased toward the majority class, supporting objective and transparent performance monitoring.

For future improvement, k-fold cross-validation can be applied to stabilize performance estimates on relatively small datasets, and enrichment of SKP-related features (e.g., timeliness indicators, target-realization gap, or work quality metrics) can further strengthen the discriminative power of the model. Nonetheless, even without algorithmic modification, the current results demonstrate that Naïve Bayes can provide an accurate, fair, and management-ready representation of employee performance to support data-driven decision-making.

CONCLUSION

This study demonstrates that employee performance prediction is more accurate when attendance information (SIMPEG) is combined with SKP achievement rather than relying on a single indicator. Using a composite score (0.30 × attendance percentage + 0.70 × normalized SKP) and a three-class scheme (Excellent, Good, Needs Improvement), the proposed Gaussian Naïve Bayes model achieved an accuracy of 0.83 on the test set. Macro-averaged metrics yielded precision = 0.86, recall = 0.84, and F1-score = 0.83, indicating balanced performance across all categories.

These findings have practical implications for TVRI East Kalimantan Station. High precision indicates trustworthy predictions; stable recall ensures that most employees in each category are correctly identified; and consistent F1-scores reflect a sound balance between precision and recall. Consequently, the model can serve as a lightweight, transparent baseline decision-support tool.

For future work, we recommend enriching SKP features (e.g., quality indicators, timeliness, target-realization gap) and applying k-fold cross-validation to stabilize performance estimates. These steps may improve the recall of minority classes without sacrificing precision. Overall, this research contributes a practical approach to strengthening data-driven governance through the integration of SIMPEG and SKP, and shows that macro-average evaluation offers a fair and balanced basis for managerial decision-making.

REFERENCES