Cyberspace: Jurnal Pendidikan Teknologi Informasi
Volume 9, Number 2, October 2025, pp. 22-32
ISSN 2598-2079 | ISSN 2597-9671

COMPARATIVE EVALUATION OF NDVI-BASED VEGETATION CLASSIFICATION USING RULE-BASED APPROACH AND RANDOM FOREST MODELS

Sri Azizah Nazhifah1*, Maulyanda2, Andriani Putri3, Usfita Kiftiyani4
1,2,3 Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia
4 Graduate School, Department of Computer Science and Engineering, Kyung Hee University, Seoul, South Korea
E-mail: sriazizah07@usk. id, 2maulyanda@usk. id, 3andrianiputri@usk. kiftiyani@khu.
Corresponding Author: sriazizah07@usk.

Abstract

This study compares vegetation classification performance using NDVI derived from Sentinel-2A and Landsat 8 satellite imagery through two approaches: rule-based classification and machine learning with the Random Forest algorithm. The rule-based approach applies a fixed NDVI threshold of 0.45 to distinguish vegetation from non-vegetation areas. In contrast, the Random Forest model was trained using 70% of the labeled data and tested on the remaining 30%, with NDVI values from both satellite sources as input features. The evaluation results show that the Random Forest model achieved perfect classification accuracy (100%). However, this may be due to using the same labeled dataset for both training and validation, which can lead to overfitting. The rule-based classification, on the other hand, yielded an accuracy of 79.7%. This lower performance is likely caused by several factors, including the resolution differences between Sentinel-2 and Landsat 8 imagery and the subjectivity involved in selecting the NDVI threshold. The manual threshold setting may lead to bias and a higher number of misclassified pixels. Therefore, while rule-based methods are simple and interpretable, they are less robust.
Machine learning approaches, such as Random Forest, offer more flexible and accurate classification when supported by properly separated training and validation datasets.

Keywords: machine learning, satellite imagery, supervised classification, rule-based approach, random forest

Abstrak (translated from Indonesian)

This research focuses on comparing vegetation classification performance using NDVI data from Sentinel-2A and Landsat 8 imagery with two approaches: a rule-based approach and machine learning using the Random Forest (RF) algorithm. In the rule-based approach, an NDVI threshold of 0.45 is used to distinguish vegetation from non-vegetation areas. In the Random Forest method, by contrast, the model is trained on 70% of the data and tested on 30%, with NDVI values from both images as input features. The evaluation results show that the Random Forest model produced perfect accuracy. However, this is likely due to the use of the same labeled data for training and validation, which can lead to overfitting. Meanwhile, the rule-based approach produced an accuracy of 79.7%. This lower performance is caused by several factors, such as the resolution difference between Sentinel-2 and Landsat 8 imagery, which affects NDVI feature extraction, and the fact that the threshold value was set manually based on experience rather than by a scientific method. Thus, the rule-based approach is simple and easy to apply but less accurate. In contrast, machine learning methods such as Random Forest offer more flexible and accurate results, especially when training and validation data are properly separated.
Kata Kunci (keywords): machine learning, satellite imagery, supervised classification, rule-based approach, random forest

Introduction

Machine learning techniques have been applied extensively across diverse domains, including healthcare, social sciences, and natural disasters. One such application uses the XGBoost algorithm to predict the level of tsunami risk for land, divided into five classes: very vulnerable, high, low, very low, and not vulnerable. Machine learning has also been applied to car prediction using various features. In that study, both the XGBoost and Random Forest algorithms were employed to perform car prediction tasks. The results indicate that the Random Forest model outperformed XGBoost in terms of stability and consistency. However, XGBoost demonstrated superior performance in distinguishing between car brands, showing its effectiveness in classification tasks that require fine-grained distinctions. Machine learning approaches in car prediction commonly incorporate multiple vehicle attributes, such as brand, model, year, mileage, fuel type, and odometer reading, to perform tasks ranging from used-car price prediction to vehicle classification. One comparative study reported that the Random Forest model achieved 96.8% accuracy, outperforming XGBoost in terms of RMSE and model stability on tabular datasets of automobile features.

In recent years, decision tree-based classification has gained attention in remote sensing applications because of its ability to handle complex, high-dimensional data. For example, a remote sensing study mapping urban impervious surfaces using fused optical and SAR features found that XGBoost generally achieved higher accuracy and better precision and recall than Random Forest across multiple cities, although RF sometimes matched or exceeded XGBoost in specific cases.
Decision tree classification, a subset of machine learning techniques, has been applied to Sentinel-2A satellite imagery for land cover classification in Langsa City. That study categorized land cover into eight classes: mangrove, water, pond, open area, built-up area, bushes, rice fields, and oil palm. The results indicate that the decision tree model classified image pixels into the respective categories accurately, achieving an overall accuracy of 94%. The decision tree model also shows strong potential for supporting environmental monitoring and land use management, particularly in urban and coastal regions such as Langsa City. Similarly, another study employed Support Vector Machine (SVM) and Random Forest (RF) for land classification, with a case study in Banda Aceh City. Although Random Forest did not perform optimally in that context, both SVM and RF achieved accuracy levels above 80%, indicating their effectiveness for classification tasks in urban settings.

In addition to machine learning techniques, rule-based classification has also been employed in previous work. Rule-based classification is considered effective for classification tasks because threshold values are determined before machine learning is applied. One such experiment used a rule-based approach on clinical data. The results demonstrate that combining rule-based classification with a Decision Tree model can yield promising outcomes, highlighting the potential of hybrid approaches to improve classification accuracy and model interpretability. Another study combined rule-based learning with Multi-Task Learning, proposing the integration of rule-based learning to reduce the misclassification of non-rice regions and thereby enhance overall classification performance.
A similar hybrid methodology that combines rule-based approaches with decision-tree models has recently been applied in medical image analysis. The results indicate that hybrid machine learning frameworks in medical imaging, which integrate rule-based reasoning with decision-tree models, can improve predictive performance while enhancing model interpretability.

In line with improving classification accuracy, this study also uses the Normalized Difference Vegetation Index (NDVI) as a critical preliminary step to monitor land conditions before applying the classification methods. NDVI is a widely used metric for assessing variations in vegetation density across land areas. NDVI highlights vegetation patterns, which supports more accurate and robust classification. One study monitored Land Surface Temperature (LST) together with NDVI using Landsat 8 and MODIS satellite imagery. Its results indicate a steady increase in LST alongside a decrease in vegetation health in areas experiencing urbanization and deforestation, highlighting a self-reinforcing cycle between surface heat buildup and environmental deterioration. Another study determined land cover in hilly terrain using Sentinel-1 SAR imagery as the primary dataset. It applied a rule-based classification approach in which specific thresholds on VV and VH polarization data distinguish between forest, urban areas, water bodies, and agricultural land. The results demonstrated that this method improved land cover mapping accuracy in complex topographic regions.

However, studies that relate a rule-based approach to the Random Forest algorithm are still limited. Therefore, this research aims to compare the performance of the rule-based method and Random Forest in classifying satellite imagery. As a preprocessing step to enhance classification accuracy, NDVI is first calculated from the imagery to emphasize vegetation differences.
Method

Initially, the research identifies an Area of Interest (AOI), chosen for its varied land cover types and its relatively small spatial extent, which keeps the analysis manageable. Sentinel-2A and Landsat 8 datasets are used in this work. The selection of these satellite images is based on several considerations. Sentinel imagery was chosen for its good spatial resolution as an optical remote sensing product, while Landsat was selected for its strong spectral capabilities, which suit medium-scale observations. Moreover, both satellite datasets are freely available and fully integrated with Google Earth Engine (GEE). The selected area covers only one path and row for both Sentinel-2A and Landsat 8 imagery.

Subsequently, the temporal period of analysis is determined, followed by NDVI calculation for each satellite image. A rule-based approach is then applied to label the dataset by defining specific NDVI threshold values. Sample pixels are generated for training and testing purposes. The training dataset is used to build a classification model using the Random Forest algorithm. Finally, the classification results are evaluated by comparing the classification accuracy of the rule-based approach and Random Forest. The steps are illustrated in Fig. 1.

[Figure 1 flowchart: defining AOI; selecting the period; pre-processing data; calculating NDVI; labeling NDVI with a rule-based threshold; sampling 1000 pixels from the stack; splitting the dataset; Random Forest using 50 trees; accuracy of Random Forest and rule-based classification]
Figure 1. Data Processing Pipeline for Satellite Image Classification

The initial step involves defining the study area, which in this case is the city of Banda Aceh, selected as the experimental site.
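The "calculating NDVI" and stacking steps of the pipeline can be illustrated with a minimal sketch in plain Python. It is not the Google Earth Engine code used in the study: the reflectance values are hypothetical, the function names `ndvi` and `stack_ndvi` are introduced here for illustration, and the sketch assumes that the pixels from the two sensors are already co-located despite their different spatial resolutions.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized difference of near-infrared and red reflectance."""
    denom = nir + red
    if denom == 0:
        # Assumption: a zero denominator is treated as "no data" (NaN).
        return math.nan
    return (nir - red) / denom


def stack_ndvi(s2_pixels, l8_pixels):
    """Pair per-pixel NDVI from both sensors into two-feature rows."""
    return [
        (ndvi(s2_nir, s2_red), ndvi(l8_nir, l8_red))
        for (s2_nir, s2_red), (l8_nir, l8_red) in zip(s2_pixels, l8_pixels)
    ]


# Hypothetical (NIR, red) reflectance pairs for three co-located pixels.
sentinel2 = [(0.45, 0.05), (0.30, 0.20), (0.10, 0.10)]
landsat8 = [(0.40, 0.06), (0.28, 0.21), (0.12, 0.11)]

stacked = stack_ndvi(sentinel2, landsat8)
for s2_value, l8_value in stacked:
    print(f"Sentinel-2 NDVI={s2_value:.3f}  Landsat 8 NDVI={l8_value:.3f}")
```

Each row of `stacked` corresponds to one pixel of the two-band NDVI stack, which is the feature vector later passed to the classifier.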
The satellite imagery used in this study includes data from Landsat 8 and Sentinel-2A. Both datasets were filtered to cover the period from January 20, 2025, to July 30, 2025, with a maximum cloud cover threshold of 20%. This filtering ensures the acquisition of recent imagery with minimal cloud contamination, as high cloud cover can significantly affect the accuracy of the classification results.

NDVI is calculated using specific bands for each satellite image. For Sentinel-2 imagery, Band B8 (near-infrared) and Band B4 (red) were used. For Landsat 8 imagery, the NDVI calculation uses Band SR_B5 (near-infrared) and Band SR_B4 (red). The general NDVI equation (1) is applied to each satellite dataset:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{1}$$

For Sentinel-2 imagery, the NIR and red bands correspond to Band 8 and Band 4, respectively, resulting in equation (2):

$$\mathrm{NDVI}_{\text{Sentinel-2A}} = \frac{\mathrm{B8} - \mathrm{B4}}{\mathrm{B8} + \mathrm{B4}} \tag{2}$$

For Landsat 8 surface reflectance imagery, NDVI is calculated using Band SR_B5 (NIR) and SR_B4 (Red) as in equation (3):

$$\mathrm{NDVI}_{\text{Landsat-8}} = \frac{\mathrm{SR\_B5} - \mathrm{SR\_B4}}{\mathrm{SR\_B5} + \mathrm{SR\_B4}} \tag{3}$$

After the NDVI values are obtained from each satellite image, the two images are combined into a single stacked image containing two NDVI bands. This stacking makes it easier to extract features from both datasets simultaneously. Subsequently, the labeling process is conducted, where each image is assigned a different threshold value to distinguish vegetation from non-vegetation pixels. The thresholds were determined empirically through trial and error to optimize classification accuracy. Specifically, for Sentinel-2, an NDVI threshold above 0.3 is applied to identify vegetation pixels, while a stricter rule-based threshold of NDVI greater than 0.45 is used for prediction. This threshold was determined through trial and error, as values below or above 0.45 tend to result in significant pixel misclassification.

Sampling is performed by extracting 1,000 pixel samples from the stacked image containing the NDVI bands and corresponding labels. The spatial geometry of each sample is retained to preserve location information. Subsequently, a random column is generated to partition the sampled dataset into training and testing subsets. Specifically, 70% of the samples are allocated for model training, while the remaining 30% are reserved for testing and evaluation.

The model is trained using the Random Forest algorithm, with the original class labels of the training data as the target variable and NDVI values from both Sentinel-2A and Landsat 8 as input features. This approach enables more accurate and reliable predictions by leveraging complementary information from the two satellite sources. After the Random Forest (RF) model is trained, it is applied to the stacked image to produce a vegetation classification map. The testing data are then classified using the trained RF model to assess its performance. The evaluation involves calculating the confusion matrix and the classification accuracy of the RF model. Additionally, the rule-based classification is evaluated using an NDVI threshold of 0.45. This rule-based classification is compared against the original labels in the testing data by computing the confusion matrix and accuracy, allowing the performance of the rule-based method in vegetation classification to be assessed.

Results

The visualization of the two images shows different results. As shown in Fig. 2, the Sentinel-2A image is clearer, with minimal cloud cover.
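The labeling, random-column split, and rule-based evaluation steps described in the Method section can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the study's Google Earth Engine workflow: the NDVI samples are synthetic, the helper names (`label`, `split_by_random_column`, `confusion_matrix`) are hypothetical, and the Random Forest training step is omitted because it requires a machine learning library.

```python
import random

LABEL_THRESHOLD = 0.3   # NDVI > 0.3 defines the reference vegetation label
RULE_THRESHOLD = 0.45   # NDVI > 0.45 is the rule-based prediction


def label(ndvi_value, threshold):
    """Return 1 (vegetation) if NDVI exceeds the threshold, else 0."""
    return 1 if ndvi_value > threshold else 0


def split_by_random_column(samples, train_fraction=0.7, seed=42):
    """Draw a uniform random value per sample and split on it."""
    rng = random.Random(seed)
    train, test = [], []
    for sample in samples:
        (train if rng.random() < train_fraction else test).append(sample)
    return train, test


def confusion_matrix(actual, predicted):
    """2x2 matrix indexed as matrix[actual][predicted]."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Synthetic NDVI values standing in for the 1,000 sampled pixels.
generator = random.Random(0)
ndvi_samples = [generator.uniform(-0.1, 0.9) for _ in range(1000)]

train, test = split_by_random_column(ndvi_samples)
actual = [label(v, LABEL_THRESHOLD) for v in test]
predicted = [label(v, RULE_THRESHOLD) for v in test]
matrix = confusion_matrix(actual, predicted)
accuracy = (matrix[0][0] + matrix[1][1]) / len(test)
print(matrix, round(accuracy, 3))
```

Because a random-column split assigns each sample independently, the resulting subsets are only approximately 70% and 30% of the samples. The sketch also shows a structural property of the two thresholds: since 0.45 is stricter than 0.3, the only errors the rule can make against these labels are vegetation pixels with NDVI between 0.3 and 0.45 that are predicted as non-vegetation.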
However, it is important to note that the spatial resolution of the two satellite images differs. Sentinel-2 provides higher spatial resolution than Landsat 8, which results in a more detailed and sharper image. This difference in resolution is evident in the visual comparison in Fig. 2.

Figure 2. True color image from Sentinel-2 (a) and true color image from Landsat 8 imagery (b)

The NDVI calculation, which emphasizes the spectral difference between the Near-Infrared (NIR) and Red bands, produces the results illustrated in Fig. 3.

Figure 3. NDVI Results from Sentinel-2A (a) and Landsat 8 Imagery (b)

Fig. 3 presents the NDVI maps derived from Sentinel-2A and Landsat 8 imagery. The left panel (a) shows the NDVI result from Sentinel-2A, while the right panel (b) displays the NDVI from Landsat 8. The NDVI values represent vegetation density, where higher values (displayed in green) indicate healthy and dense vegetation, and lower values (shown in white or light brown) indicate sparse or non-vegetated areas. Sentinel-2A, with its finer spatial resolution, provides a more detailed visualization of vegetation patterns than Landsat 8. Both images were processed to reduce cloud cover and enhance classification accuracy.

Subsequently, a rule-based classification was applied using Sentinel-2 NDVI values. An NDVI threshold of greater than 0.3 was used to define vegetation pixels, while a stricter threshold of greater than 0.45 was used for rule-based prediction. These thresholds were determined through a trial-and-error approach to distinguish vegetated and non-vegetated areas more accurately. The resulting classifications are illustrated in Fig. 4.

Figure 4. Random Forest classification (a) and rule-based classification results using thresholds (b)

As shown in Fig. 4, the classification results from the Random Forest model demonstrate superior performance compared with the rule-based classification, as indicated by a lower number of misclassified pixels. This suggests that the model effectively distinguishes between vegetation and non-vegetation areas, leading to more accurate and reliable classification outcomes.

The evaluation results based on the confusion matrix indicate that the Random Forest (RF) model outperforms the rule-based classification. The confusion matrices for both methods are presented in Tables 1 and 2.

TABLE 1. CONFUSION MATRIX OF RANDOM FOREST

                        Predicted Non-Vegetation   Predicted Vegetation
Actual Non-Vegetation   135                        0
Actual Vegetation       0                          161

TABLE 2. CONFUSION MATRIX OF RULE-BASED CLASSIFICATION

                        Predicted Non-Vegetation   Predicted Vegetation
Actual Non-Vegetation   135                        0
Actual Vegetation       60                         101

Based on Table 1, the Random Forest model shows no classification errors, indicating perfect prediction accuracy. In contrast, Table 2 reveals that 60 pixels were misclassified in the rule-based classification, highlighting a notable limitation of the rule-based approach.

The confusion matrices of the Random Forest and rule-based approaches provide an overview of the distribution of correct and incorrect classifications for each class (vegetation and non-vegetation). Evaluation metrics such as overall accuracy, kappa, precision (user's accuracy), and recall (producer's accuracy) are then calculated from each matrix. These values are summarized in Table 3, facilitating performance comparisons between the Random Forest and rule-based methods. In other words, the evaluation table is a quantitative summary of the information presented by the confusion matrix.

TABLE 3.
COMPARISON OF RANDOM FOREST AND RULE-BASED EVALUATION METRICS BASED ON CONFUSION MATRIX

Metric                         Random Forest (RF)   Rule-Based
Overall Accuracy               1.00                 0.797
Kappa                          -                    -
Precision (Veg., User's)       -                    -
Precision (Non-Veg.)           -                    -
Recall (Veg., Producer's)      -                    -
Recall (Non-Veg.)              -                    -

Table 3 presents a comparison of the evaluation results of the Random Forest (RF) and rule-based approaches based on the confusion matrix. The Overall Accuracy and Kappa values indicate that RF has better classification performance than the rule-based approach. This is supported by the higher Precision (User's Accuracy) and Recall (Producer's Accuracy) values for both vegetation and non-vegetation classes in RF. Thus, RF is more reliable in distinguishing vegetation from non-vegetation pixels, while the rule-based approach tends to produce more classification errors.

Analysis

Based on the confusion matrix in Table 1, the Random Forest (RF) model achieved excellent classification performance, with an accuracy of 1.00, or 100%. This is indicated by the absence of any classification errors: all 135 non-vegetation pixels and 161 vegetation pixels were correctly classified. These results show that the RF model effectively learned the patterns in the NDVI data and produced highly accurate predictions. However, it is possible that the validation pixels were also drawn from the training dataset, which may explain the absence of misclassifications in the Random Forest model. To ensure a more reliable evaluation, validation data should ideally be collected independently, either through field surveys or by using high-resolution satellite imagery for comparison. Validation using high spatial resolution imagery has also been applied in waste monitoring.

A previous study reported strong results by applying NDVI for feature extraction in mangrove forest analysis. Its findings showed that the Random Forest (RF) method produced a high accuracy, exceeding 90%. Although the current research achieved an accuracy below 90% in comparison with that study, this difference may be influenced by cloud cover or insufficient training data.

In contrast, the rule-based classification method, which applies an NDVI threshold of > 0.45 for identifying vegetation, achieved an accuracy of 0.797, or approximately 79.7%. The threshold value of 0.45 was determined through a trial-and-error approach. When values higher or lower than 0.45 were tested, the resulting accuracy dropped. Therefore, a threshold of 0.45 was identified as the optimal value for this specific case. Although all 135 non-vegetation pixels were correctly classified, 60 vegetation pixels were misclassified as non-vegetation. This suggests that the rule-based approach, while simple, may be too rigid and less adaptive to the variations in NDVI values across different vegetation areas, resulting in a significant number of classification errors compared to the RF model.

Indeed, the rule-based approach has inherent limitations, primarily due to its reliance on human expertise and subjective interpretation. This dependency can introduce bias during model construction, as the rules are manually defined and may not generalize well across varying image characteristics or vegetation conditions. In this study, such limitations became evident, as the rule-based classification led to a significant number of misclassified pixels. This outcome suggests that while rule-based methods can be useful for quick or preliminary analysis, they may not be suitable for complex or large-scale classification tasks where data variability is high. In contrast, machine learning models like Random Forest are better equipped to handle such variability by learning patterns directly from the data, resulting in more robust and accurate predictions.
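The metrics discussed in the Analysis can be recomputed from the pixel counts reported in the text. The sketch below is illustrative only: the function name `metrics` is hypothetical, and the rule-based matrix assumes that the rule-based evaluation used the same 296 test pixels as the Random Forest evaluation, so that the count of correctly classified vegetation pixels is 161 - 60 = 101.

```python
def metrics(matrix):
    """Summary metrics from matrix[actual][predicted]; class 1 = vegetation."""
    tn, fp = matrix[0]
    fn, tp = matrix[1]
    total = tn + fp + fn + tp
    observed = (tn + tp) / total
    # Chance agreement from the actual (row) and predicted (column) totals.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total ** 2
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "overall_accuracy": observed,
        "kappa": kappa,
        "precision_veg": ratio(tp, tp + fp),      # user's accuracy
        "precision_nonveg": ratio(tn, tn + fn),
        "recall_veg": ratio(tp, tp + fn),         # producer's accuracy
        "recall_nonveg": ratio(tn, tn + fp),
    }


# Counts taken from the Analysis text; 101 is derived as 161 - 60 (assumption).
random_forest = [[135, 0], [0, 161]]
rule_based = [[135, 0], [60, 101]]

rf_metrics = metrics(random_forest)
rb_metrics = metrics(rule_based)
for name in rf_metrics:
    print(f"{name:18s} RF={rf_metrics[name]:.3f}  rule-based={rb_metrics[name]:.3f}")
```

Under these assumptions the rule-based overall accuracy evaluates to 236/296, which agrees with the reported value of approximately 0.797. The asymmetry between the two classes is also visible: precision for vegetation stays at 1 because no non-vegetation pixel is predicted as vegetation, while recall for vegetation drops because 60 vegetation pixels fall below the 0.45 threshold.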
Conclusion

This study applied two classification approaches for vegetation mapping: a rule-based method and a machine learning technique using Random Forest. The rule-based approach used a manually defined NDVI threshold of 0.45 to distinguish between vegetation and non-vegetation pixels. The Random Forest model was trained using 70% of the data and tested on the remaining 30%, with NDVI values from Sentinel-2 and Landsat 8 as input features. The results showed that the Random Forest model achieved perfect accuracy. However, this was likely influenced by the fact that the validation data were the same as the labeled data, leading to no misclassification. The rule-based method, on the other hand, achieved an accuracy of 79.7%. The lower performance of the rule-based classification can be attributed to several factors, including the resolution differences between the satellite images and the subjectivity involved in selecting the NDVI threshold. Consequently, a considerable number of pixels were misclassified. This issue can be mitigated by implementing more advanced or data-driven thresholding methods to enhance classification reliability.

Future work could focus on expanding the dataset to a longer time period, integrating multi-sensor data (optical and radar), and implementing other algorithms such as SVM or deep learning for performance comparison. Furthermore, more adaptive NDVI threshold determination and validation with field data are needed to improve accuracy, as is the development of operational applications for continuous vegetation monitoring.

References