ISSN 2089-385X (Print) | 2829-6761 (Online)
Vertex, Volume 15, No. 1 (December)
Published by: Institute of Computer Science (IOCS)

Enhancing Rice Disease Identification using Hybrid GLCM-XGBoost with SMOTE Imbalance Handling

Anwar Sadad
Health Informatics, Sunan Gresik University, Gresik, Indonesia

Abstract

Rice (Oryza sativa) is a major food staple that is prone to multiple diseases, which can dramatically decrease harvest yield. Manual disease identification is time consuming and prone to subjective errors. This research seeks to increase the precision of automatic detection of rice plant diseases, namely Brown Spot, Hispa, and Leaf Blast. The suggested method combines the Gray Level Co-occurrence Matrix (GLCM) for texture feature extraction with the Extreme Gradient Boosting (XGBoost) classification algorithm. Furthermore, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to address class imbalance within the dataset of 5,548 images. Preprocessing steps include resizing, grayscale conversion, and Min-Max normalization. Experimental results demonstrate that the model trained on SMOTE-balanced data with optimized XGBoost parameters achieved an accuracy of 98%, outperforming the imbalanced scenario (97%) and previous studies. This research confirms that the combination of GLCM, SMOTE, and XGBoost constitutes a robust and high-precision method for rice disease identification.

Article Info

Article history: Received: Dec 05, 2025; Revised: Dec 17, 2025; Accepted: Dec 30, 2025
Keywords: Disease Classification; GLCM; Rice Disease; SMOTE; XGBoost

Corresponding Author: Anwar Sadad, Health Informatics, Sunan Gresik University, Gresik, Indonesia. Email: anwarsadad@lecturer.

This is an open access article under the CC BY license.

Introduction

Rice (Oryza sativa) is a staple commodity in Indonesia. However, in pursuing food security, Indonesia increasingly faces the threat of pests and diseases (Widyawati et al.). Brown spot, leaf blast, and hispa are among the diseases that are difficult to control and that greatly reduce the quantity and quality of the harvest. To overcome this, mobile surveillance and smart advisory solutions are increasingly being developed (Aida & Wan, 2025; Khalil et al., 2024; Talreja et al.), owing to the need for accurate and early detection so that farmers can take the relevant control measures.

In recent years, the application of Machine Learning has proven highly effective in assisting precision agriculture and disease diagnosis (Nata et al., 2025; Priyangka & Kumara). A systematic review by Seelwal et al. highlights the rapidly growing trend and potential of deep learning applications for rice disease diagnosis. Several studies have employed methods such as Convolutional Neural Networks (CNN) (Barburiceanu et al.), hybrid Vision Transformer approaches (De Silva & Brown), and ensemble learning to automate detection. For instance, Sharma et al. highlighted the efficacy of deep learning in plant disease diagnosis, while Tiwari et al. (2025) and Kulkarni & Shastri (n.d.) successfully demonstrated the potential of multi-model machine learning for automated identification of rice diseases. Similarly, Hutauruk utilized YOLO algorithms for smart detection interfaces. Meanwhile, Anggiratih et al. achieved an accuracy of 79.53% using EfficientNet B3, and Wibisono & Saiful demonstrated the effectiveness of XGBoost in handling agricultural datasets. Furthermore, hybrid approaches combining feature extractors with XGBoost have gained traction; for instance, Sovia et al. successfully enhanced classification performance by integrating CNN with XGBoost.
Although past findings have been promising, a significant research gap remains that must be filled to enhance reliability. The major weakness of most existing studies is that they do not deal optimally with imbalanced datasets. A recent survey by Miftahushudur et al. points out that data imbalance is one of the issues that must be addressed to build robust agricultural models. Imbalanced data may skew the classification model toward the majority class, so that it neglects the minority class, which in most cases is the specific disease of interest. Moreover, although deep learning is popular, hybrid techniques based on statistical texture features are also very efficient and computationally inexpensive (Alabbasi et al., 2025).

Conceptually, this study demonstrates that handling data imbalance in the feature space (via GLCM vectors) offers a more robust strategy for agricultural disease classification than traditional image-level augmentation. While the imbalanced model achieved 97% accuracy, it exhibited bias toward the majority class. Applying SMOTE to GLCM features successfully interpolated information gaps for minority classes, allowing the XGBoost algorithm to establish precise decision boundaries without the computational overhead of generating synthetic images (e.g., via GANs). This confirms that statistical feature-level oversampling is a highly effective, yet computationally efficient, strategy for enhancing diagnostic precision in smart farming.

While Deep Learning (DL) models, particularly Convolutional Neural Networks (CNNs), have shown remarkable performance in large-scale image classification, they are computationally intensive and notoriously data-hungry. Applying CNNs to medium-sized agricultural datasets often leads to overfitting unless massive data augmentation or transfer learning is employed. In contrast, rice leaf diseases, specifically Brown Spot, Hispa, and Blast, manifest primarily through distinct textural pathologies rather than complex shape variations. Consequently, statistical feature extraction methods like the Gray Level Co-occurrence Matrix (GLCM) can capture these fine-grained surface patterns more effectively than the latent features learned by CNNs in early layers.

Furthermore, a major limitation in existing literature is the lack of explicit control over class imbalance. End-to-end DL models often struggle with imbalanced data without complex loss function modifications. This study proposes a hybrid framework that decouples feature extraction from classification, allowing for precise intervention in the feature space using the Synthetic Minority Over-sampling Technique (SMOTE). By combining GLCM's texture descriptiveness with Extreme Gradient Boosting (XGBoost), known for its robust regularization and execution speed, this approach offers a computationally efficient and highly accurate alternative to 'black-box' deep learning models.

To address these issues, the proposed study is a hybrid solution that uses the Synthetic Minority Over-sampling Technique (SMOTE) to deal with data imbalance, the Gray Level Co-occurrence Matrix (GLCM) to extract texture features, and Extreme Gradient Boosting (XGBoost) as the classification method. GLCM is chosen because it has proven effective at capturing fine leaf texture patterns, as verified by recent comparative analyses (Jordy & Ariatmanto, 2025; Ramli & Riadi). XGBoost is selected for its high execution speed. The primary goal of the study is to improve the accuracy of rice disease identification considerably over previous methods while ensuring that the model learns in a balanced way across classes.

Methods

Dataset

The data used in this paper were gathered from the open Kaggle dataset repository ("Rice Leaf Dataset") and individual records.
The dataset contains a total of 5,548 images grouped into four categories: BrownSpot, Hispa, LeafBlast, and Healthy. The class distribution is highly imbalanced, with Healthy as the majority class.

Preprocessing

To transform the data uniformly and speed up computation, the following preprocessing steps were applied:
Scaling and Grayscale Conversion: The images were scaled to 256x256 and converted to grayscale so that the emphasis was on texture variation rather than color variation.
Normalization: Min-Max normalization was used to bring the values into the range 0 to 1 in order to speed up model convergence.

Feature Extraction (GLCM)

Texture features were obtained using the Gray Level Co-occurrence Matrix (GLCM). The GLCM was computed at four orientations (0°, 45°, 90°, and 135°) with a pixel distance of d=1. GLCM is suitable for extracting features of rice leaf pathology, as found in a recent study (Ramli & Riadi). From these matrices, six statistical values were obtained: Contrast, Correlation, Angular Second Moment (ASM), Energy, Homogeneity, and Dissimilarity, which together form the complete feature vector of each image.

Controlling Imbalance (SMOTE)

To reduce bias toward the majority class, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data. SMOTE generates synthetic samples of the minority classes (BrownSpot, Hispa, and LeafBlast) by interpolating between existing samples. This equalized the dataset, giving 1,797 samples per class and a total of 7,188 data points.

Classification and Testing Scenario

The data were divided into training and testing sets (80% and 20%, respectively). The XGBoost algorithm was used for classification. The hyperparameters were optimized through grid search over max_depth, learning rate (eta), min_child_weight, n_estimators, and subsample. Performance was measured using Confusion Matrix analysis, Accuracy, Precision, Recall, and F1-Score.

Results and Discussion

Hyperparameter Tuning Results

The performance of the Extreme Gradient Boosting (XGBoost) model is very sensitive to its hyperparameters. This research performed a set of experiments through grid search to determine the best configuration. Ten scenarios were tested with different values of max_depth, learning rate (eta), min_child_weight, n_estimators, and subsample. The results of the parameter tuning are provided in Table 1. The experimental outcomes suggest that deeper trees (higher max_depth) combined with a moderate learning rate (eta) and a larger number of estimators (n_estimators) were the most stable and most accurate.

Table 1. Results of Hyperparameter Tuning for XGBoost
Columns: Scenario | max_depth | eta (learning_rate) | subsample | min_child_weight | n_estimators | Accuracy (Balanced Data) | Accuracy (Imbalanced Data)

Table 1 reveals that Scenario 8 had the best accuracy, 98%, on the balanced dataset. Its parameters (max_depth: 12, eta: 0.3, together with that scenario's n_estimators setting) were then chosen as the best configuration.

The Effect of SMOTE on Model Performance

A critical analysis was conducted to determine the effectiveness of SMOTE. The findings indicate that dealing with class imbalance is vital. On the imbalanced data, the model reached a maximum accuracy of 97% but was biased toward the majority class. After applying SMOTE, the accuracy was consistently 98%. This agrees with the evidence presented by Miftahushudur et al. that imbalance intervention is an effective way of strengthening agricultural classifiers.
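The feature-space interpolation that SMOTE performs on texture vectors can be illustrated with a minimal sketch. This is a simplified, standard-library stand-in for the core SMOTE step and is not the implementation used in this study, which the text does not specify (in practice a library such as imbalanced-learn would typically be used). The function names, the two-feature toy vectors, and the choice k=2 are all hypothetical and chosen only for illustration.

```python
import math
import random


def nearest_neighbors(samples, index, k):
    """Return indices of the k samples closest (Euclidean) to samples[index]."""
    origin = samples[index]
    others = [i for i in range(len(samples)) if i != index]
    others.sort(key=lambda i: math.dist(origin, samples[i]))
    return others[:k]


def smote_like_oversample(samples, n_new, k=5, seed=0):
    """Create n_new synthetic vectors by interpolating between a sample and
    one of its k nearest same-class neighbours (the core SMOTE idea).
    Assumes the class has at least two samples whenever n_new > 0."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        j = rng.choice(nearest_neighbors(samples, i, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(samples[i], samples[j])]
        )
    return synthetic


def balance(features_by_class, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    target = max(len(v) for v in features_by_class.values())
    balanced = {}
    for label, vectors in features_by_class.items():
        extra = smote_like_oversample(vectors, target - len(vectors), k, seed)
        balanced[label] = list(vectors) + extra
    return balanced


# Made-up two-feature vectors (e.g. [contrast, homogeneity]); NOT the study's data.
toy = {
    "Healthy": [[0.10, 0.90], [0.12, 0.88], [0.11, 0.91],
                [0.09, 0.92], [0.13, 0.87], [0.10, 0.89]],
    "BrownSpot": [[0.60, 0.40], [0.65, 0.38], [0.58, 0.42]],
    "Hispa": [[0.35, 0.70], [0.38, 0.66]],
}
balanced = balance(toy, k=2)
print({label: len(v) for label, v in balanced.items()})
# {'Healthy': 6, 'BrownSpot': 6, 'Hispa': 6}
```

Because each synthetic vector lies on a segment between two real vectors of the same class, the new points stay inside the region already occupied by that class. The same balancing logic, applied to the four classes of this study, corresponds to the 1,797 samples per class (4 x 1,797 = 7,188) reported in the Methods.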
On the imbalanced dataset, the model recorded an accuracy of 97%. However, it showed a bias in favour of the majority class (Healthy). After applying SMOTE under the best configuration, the accuracy consistently improved to 98%. SMOTE created synthetic samples within the feature space of the minority classes (BrownSpot and Hispa), which enabled the XGBoost algorithm to draw clearer decision boundaries and reduce misclassification among the disease categories.

Confusion Matrix Analysis

To examine the classification behavior in detail, a Confusion Matrix analysis was applied to the most effective model (Scenario 8).

Figure 1. Heatmap Confusion Matrix

According to the analysis, the model is very precise. In particular, the LeafBlast class was recognized with high fidelity, with 0 false negatives. Although there were small errors in the Healthy and Hispa classes, the error rate was very low. The mean Precision, Recall, and F1-Score over all classes were each equal to 0.98, which confirms the strength of the model.

Comparison to Previous Studies

The proposed method (GLCM + SMOTE + XGBoost) was compared with the best methods reported in the recent literature. As summarized in Table 2, the proposed approach performs better than a number of Deep Learning (CNN) and Machine Learning models.

Table 2. Comparison of Research Performance
Author | Method
Anggiratih et al. | Deep Learning: EfficientNet B3 with Transfer Learning
Dubey & Choubey | Feature Selection Optimized XGBoost
Kusanti et al. | GLCM & Backpropagation
Khoiruddin & Tena | CNN
Shrivastava & Pradhan | Machine learning: SVM, DC, KNN, NB, DT, RF, LR
Nata et al., 2025 | CNN
Milano et al. | XGB
This study | XGB & GLCM

The result of 98% is better than most Deep Learning methods.
This indicates that, although deep learning is powerful, optimized hybrid feature extraction (GLCM) combined with ensemble learning can be very competitive, a finding echoed by Alabbasi et al. (2025) and Ramli & Riadi.

Conclusion

This study has demonstrated that the quality of agricultural disease identification depends heavily on how class imbalance is managed. The proposed method successfully integrates Gray Level Co-occurrence Matrix (GLCM) feature extraction with the Synthetic Minority Over-sampling Technique (SMOTE) and Extreme Gradient Boosting (XGBoost), achieving an accuracy of 98%.

Conceptually, this research offers a critical update to data imbalance handling strategies in the smart agriculture domain by shifting the focus from image-level augmentation to feature-space oversampling. Unlike traditional methods that rely on geometric transformations or computationally intensive generative models, this study shows that applying SMOTE specifically to statistical texture descriptors (GLCM) effectively reconstructs decision boundaries for minority classes. This strategy reduces the classification bias toward healthy plants without the computational overhead of image synthesis, offering a robust and lightweight solution. Future studies should consider hybridizing GLCM with deep learning embeddings or deploying this lightweight model in IoT-based monitoring systems, as proposed by Aida & Wan (2025).

Reference