Available online at https://journal.
com/index.
php/ijrcs/index
International Journal of Research in Community Service
e-ISSN: 2746-3281 p-ISSN: 2746-3273
Vol. No. 3, pp. 147-155, 2025
Implementation of Machine Learning Model for Pest Classification in Rice Plants
Moch Panji Agung Saputra1*, Deva Putra Setyawan2, Muhammad Bintang Eighista Dwiputra3
Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Sumedang, Indonesia
Master's Program of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jatinangor, Indonesia
Computer Science Study Program, Faculty of Mathematics and Natural Sciences Education, Universitas Pendidikan Indonesia, Bandung, Indonesia
*Corresponding author email: moch16006@unpad.
Abstract
Rice cultivation is a cornerstone of food security in agrarian countries like Indonesia, yet it remains highly vulnerable to pest infestations that can severely impact crop productivity.
Manual identification of pests is time-consuming and error-prone, especially when pest species exhibit similar morphological characteristics.
This study aims to implement and evaluate the performance of four classical machine learning algorithms, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), and Logistic Regression (LR), for classifying rice pests based on image data.
The dataset, derived from Kaggle's "Rice Pest Detection Dataset," includes 12 pest classes and underwent a series of preprocessing steps: grayscale conversion, image resizing to 128x128 pixels, feature extraction using Histogram of Oriented Gradients (HOG), label encoding, and class balancing via SMOTE.
The experimental setup used 80% of the data for training and 20% for testing.
Performance was evaluated using precision, recall, F1-score, and confusion matrices.
Among the four models, SVM achieved the most consistent and robust performance, with F1-scores reaching up to 0.98 in several pest classes and an overall balanced classification across the dataset.
Random Forest followed closely, particularly excelling in distinguishing classes such as Rice Water Weevil and Yellow Rice Borer, achieving F1-scores of 0.99 and 0.96, respectively.
In contrast, KNN showed signs of overfitting, with extreme precision-recall imbalances, while LR was more stable but less accurate in separating visually similar classes like Rice Stem Fly and Thrips.
Visual analysis of correct and incorrect predictions revealed that classes 7 (Rice Stem Fly) and 11 (Thrips) were consistently misclassified across all models, likely due to high visual similarity.
Keywords: Classification, machine learning model, rice leaf pests.
Introduction
Agriculture is a vital sector in meeting the food needs of the population, especially in an agrarian country like Indonesia.
Rice, as a primary crop, plays a crucial role in maintaining national food security.
Rice productivity is often hampered by attacks by various pests, which can significantly reduce crop yields.
Undetected pest attacks often cause severe damage, necessitating rapid and appropriate detection and management to minimize the negative impact on production (Wang et al.).
Along with the advancement of digital technology, the application of artificial intelligence (AI) and machine learning (ML) in agriculture is growing.
This technology offers an efficient and accurate approach to automatically detecting and classifying pests based on image data (Kaushal et al.).
By utilizing visual feature extraction techniques such as Histogram of Oriented Gradients (HOG) and machine learning algorithms, the system can recognize visual patterns in images of leaves or plant parts infected with pests.
This presents a significant opportunity to reduce reliance on manual detection, which requires expertise and considerable time (Ochango et al.).
Various machine learning algorithms have been developed and used for image classification tasks, including in the context of crop pest detection.
Models such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, and Logistic Regression are some of the commonly used methods due to their ability to recognize patterns in complex data.
Each model has its own characteristics and advantages in terms of accuracy, speed, and generalization to new data.
Previous research has shown that the application of machine learning in the classification of plant diseases and pests has attracted the attention of researchers in the field of precision agriculture.
Setiawan et al. studied the application of the Nu-Support Vector Machine (Nu-SVM) algorithm to rice leaf disease classification, specifically in distinguishing between healthy leaves, Brown Spot, and Leaf Blast.
By using Hu Moments features and Sobel edge detection on segmented leaf images, this study achieved only a moderate level of accuracy under 5-fold cross-validation.
Although the results indicate challenges in precise classification, this study emphasizes the importance of more sophisticated image processing and feature extraction methods.
Kasinathan and Uyyala applied a machine vision-based approach to the classification of plant insect pests by combining various feature descriptors such as texture, color, shape, HOG, and GIST.
They used several machine learning algorithms, including base classifiers such as Naive Bayes, SVM, KNN, and MLP, and ensembles such as Random Forest, Bagging, and XGBoost.
The results of the 10-fold cross-validation test show that the classification accuracy increases significantly when using a combination of features and ensemble methods.
Although machine learning has been widely applied in agriculture, studies specifically comparing various machine learning models for rice pest image classification are still relatively limited.
Most studies focus on disease identification or insect classification in general, without emphasizing a comprehensive evaluation of each algorithm's performance in the context of rice pests.
Furthermore, visualization of classification results showing which images are correctly and incorrectly classified is still rare, even though this is crucial for understanding model error characteristics and improving system interpretability.
Therefore, this study aims to implement and compare several classic machine learning models, namely SVM, KNN, Random Forest, and Logistic Regression, in image-based rice pest classification.
Methodology
Data Collection
The data used in this study was obtained from the Kaggle platform under the title "Rice Pest Detection Dataset." This dataset is part of the IP102 dataset, specifically filtered for detecting rice pests.
The dataset consists of images that have undergone several stages of augmentation and preprocessing to improve data quality and enrich image variety.
This dataset comprises 12 classes of rice pests, with the number of images for each class shown in Table 1 below:
Table 1: Number of images per pest class
Label   Pest Name                Number of Images
0       Rice leaf roller
1       Rice leaf caterpillar
2       Paddy stem maggot
3       Asiatic rice borer
4       Yellow rice borer
5       Rice gall midge
6       Brown plant hopper
7       Rice stem fly
8       Rice water weevil
9       Rice leaf hopper
10      Rice shell pest
11      Thrips
Data Preprocessing
Before the data was used in the machine learning model training process, several preprocessing steps were performed to ensure the quality and uniformity of the image data used.
The preprocessing steps implemented in this study included:
Image Conversion to Grayscale
Each image was converted to grayscale format to simplify visual information by reducing the color dimension.
This allowed the model to focus more on texture and shape patterns relevant for pest detection.
Image Resizing to 128x128 Pixels
The images were then resized to 128x128 pixels so that all images had uniform dimensions and could be processed consistently by the machine learning model (Sundhar et al.).
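As an illustration of the grayscale conversion and resizing steps, the following minimal sketch uses only the Python standard library. The luminance weights (0.299, 0.587, 0.114), the nearest-neighbour resampling, and the function names `to_grayscale` and `resize_nearest` are assumptions made for this example; the study does not state which library or interpolation method was used.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of luminance values (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(gray_image, new_height=128, new_width=128):
    """Resize a 2-D list of pixels with nearest-neighbour sampling."""
    old_height, old_width = len(gray_image), len(gray_image[0])
    return [
        [
            gray_image[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


# Tiny 2x2 RGB image standing in for a pest photograph (hypothetical data).
tiny_rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(tiny_rgb)
resized = resize_nearest(gray)  # 128 rows of 128 grayscale values
```

In practice an image library would normally perform both operations; the sketch only makes the arithmetic explicit.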
Feature Extraction Using Histogram of Oriented Gradients (HOG)
Image features were extracted using the HOG method, which aims to capture edge and texture characteristics by dividing the image into small blocks and calculating the gradient orientation of the pixels within them.
HOG is effective in recognizing visual patterns and is often used in object recognition (Ramiady et al.).
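The core idea of HOG can be sketched for a single image cell as follows. This is a simplified illustration, not the study's implementation: it assumes 9 unsigned orientation bins over 0 to 180 degrees, central-difference gradients, and hard bin assignment, and it omits the block normalization and vote interpolation that full HOG implementations apply. The function name `cell_orientation_histogram` is hypothetical.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell.

    Gradients use central differences; border pixels are skipped for brevity.
    Each pixel votes for one bin, weighted by its gradient magnitude.
    """
    histogram = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            histogram[int(angle // bin_width) % n_bins] += magnitude
    return histogram


# A 4x4 cell with a vertical edge: dark on the left, bright on the right.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
hist = cell_orientation_histogram(vertical_edge)
```

For the vertical edge, all gradient energy falls into the first bin (horizontal gradient direction). Concatenating such histograms over all cells of a 128x128 image yields the feature vector passed to the classifiers.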
Encoding Class Labels Using LabelEncoder
The class name or label of each image is converted into numeric form using LabelEncoder so that it can be processed by the classification algorithm.
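Label encoding amounts to a fixed mapping from class names to integers. The sketch below reproduces the idea in plain Python, assigning codes in sorted order of the class names, which is also how scikit-learn's `LabelEncoder` orders its classes. The helper name `fit_label_encoder` and the sample labels are hypothetical, and the resulting integers are illustrative only; they need not match the class numbers used elsewhere in this paper.

```python
def fit_label_encoder(labels):
    """Map each distinct class name to an integer, in sorted order."""
    classes = sorted(set(labels))
    return {name: index for index, name in enumerate(classes)}


# Hypothetical labels for four images.
pest_names = ["Thrips", "Rice stem fly", "Thrips", "Brown plant hopper"]
mapping = fit_label_encoder(pest_names)
encoded = [mapping[name] for name in pest_names]
```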
Addressing Data Imbalance Using SMOTE
To address the uneven distribution of images between classes, the SMOTE (Synthetic Minority Oversampling Technique) method is used.
This technique generates synthetic data for the minority class to create a more balanced distribution between classes, preventing the model from becoming biased toward the majority class (Mohanty et al.).
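The interpolation at the heart of SMOTE can be sketched as follows: a synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The neighbour count `k=2`, the seed, the toy feature vectors, and the function names are assumptions for illustration; the study does not report its SMOTE settings.

```python
import random


def smote_sample(sample, neighbor, rng):
    """Create one synthetic point on the segment between sample and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


def oversample_minority(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic feature vectors from minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        sample = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not sample]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(sample, p)))
        neighbor = rng.choice(others[:k])
        synthetic.append(smote_sample(sample, neighbor, rng))
    return synthetic


# Four hypothetical 2-D feature vectors from a minority class.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = oversample_minority(minority_features, n_new=6)
```

Because every synthetic point is a convex combination of two real samples, it always lies within the region spanned by the existing minority data.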
Splitting the Dataset
The processed data is divided into two parts: 80% training data and 20% testing data.
The training data is used to build the classification model, while the testing data is used to evaluate the model's performance on previously unseen data.
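A minimal sketch of an 80/20 split is given below. It shuffles the sample indices with a fixed seed and holds out the first 20% for testing. The seed value, the function name `split_80_20`, and the toy data are assumptions; whether the study used a stratified or a plain random split is not stated.

```python
import random


def split_80_20(features, labels, test_fraction=0.2, seed=42):
    """Shuffle sample indices and hold out test_fraction of them for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Ten hypothetical one-dimensional feature vectors with alternating labels.
features = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_80_20(features, labels)
```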
Model Development
This research developed classification models using four different machine learning algorithms: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), and Logistic Regression (LR).
Support Vector Machine (SVM)
Support Vector Machine works by finding the best hyperplane that separates data from different classes in a high-dimensional feature space (Ghaddar and Naoum-Sawaya).
$$f(x) = \operatorname{sign}(w^{T}x + b)$$
where $w$ is the weight vector and $b$ is the bias of the separating hyperplane.
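A separating hyperplane classifies a point by the side on which it falls, that is, by the sign of a weighted sum of the features plus a bias. The sketch below shows this decision rule for the binary, linear case. The weights and bias are hypothetical values chosen by hand rather than learned, and the kernel and multi-class strategy used in the study are not specified in the text.

```python
def svm_decision(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0 in a two-feature space.
weights, bias = [1.0, -1.0], 0.0
side_a = svm_decision(weights, bias, [2.0, 1.0])
side_b = svm_decision(weights, bias, [1.0, 3.0])
```

Training an SVM consists of choosing the weights and bias so that the margin between the classes is as large as possible; only the resulting decision rule is shown here.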
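K-Nearest Neighbor (KNN)
K-Nearest Neighbor classifies data based on the majority class of a number of nearest neighbors.
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where $d(x, y)$ is the distance between feature vectors $x$ and $y$ of dimension $n$.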
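The nearest-neighbour vote can be sketched in a few lines. The example assumes Euclidean distance and `k=3`; the value of k used in the study is not reported, and the training points and labels below are invented for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features for two classes labelled 7 and 11.
train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [7, 7, 7, 11, 11]
near_origin = knn_predict(train_X, train_y, [0.5, 0.5])
near_corner = knn_predict(train_X, train_y, [5.0, 6.0])
```

Because the prediction depends entirely on distances in feature space, classes whose HOG features lie close together are easily confused by this method.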
Random Forest (RF)
Random Forest is a decision tree-based ensemble model that uses multiple decision trees and votes for the final result (Wang).
$$\hat{y} = \operatorname{mode}\{h_1(x), h_2(x), \ldots, h_n(x)\}$$
where $h_i(x)$ is the prediction of the $i$-th decision tree.
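The voting step of a Random Forest can be illustrated independently of how the trees are trained. In the sketch below, three hand-written rules stand in for trained decision trees; they are hypothetical and serve only to show how the majority vote produces the final class.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the class predicted by most trees (majority vote)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hand-written stand-ins for trained decision trees (hypothetical rules).
trees = [
    lambda x: 7 if x[0] < 0.5 else 11,
    lambda x: 7 if x[1] < 0.3 else 11,
    lambda x: 7 if x[0] + x[1] < 1.0 else 11,
]
prediction = forest_predict(trees, [0.2, 0.6])  # two of three trees vote 7
```

In an actual Random Forest each tree is trained on a bootstrap sample with random feature subsets, which is what makes the individual votes diverse.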
Logistic Regression (LR)
Logistic Regression is used to predict the probability of data belonging to a particular class based on a linear combination of features.
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(w^{T}x + b)}}$$
This training process is carried out by utilizing features generated from the Histogram of Oriented Gradients (HOG) method as a representation of each pest image, as well as labels that have been encoded numerically using LabelEncoder.
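For Logistic Regression, the probability of class membership is obtained by passing a linear combination of the features through the logistic (sigmoid) function. The sketch below shows the binary case with hypothetical, hand-picked weights; how the study extended the model to 12 classes (for example, one-vs-rest or a multinomial formulation) is not stated.

```python
import math


def predict_probability(weights, bias, x):
    """Probability of the positive class under a binary logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for a two-feature model.
weights, bias = [2.0, -1.0], 0.5
p_boundary = predict_probability(weights, bias, [1.0, 2.5])  # z = 0
p_confident = predict_probability(weights, bias, [3.0, 0.0])  # z = 6.5
```

A point on the decision boundary (z = 0) receives probability 0.5, and the probability approaches 1 as the linear score grows.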
Model Evaluation
The evaluation was conducted using common classification metrics: accuracy, precision, recall, and F1-score.
The formula for each metric is as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \times 100\%$$
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\%$$
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\%$$
$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
Results and Discussion
Model Performance Evaluation
Table 2: Evaluation of Support Vector Machine (per-class Precision, Recall, and F1-score for the 12 pest classes)
The Support Vector Machine (SVM) model demonstrated excellent performance in detecting most classes of rice pests. Classes such as Brown Plant Hopper, Rice Water Weevil, and Paddy Stem Maggot had very high precision, recall, and F1-score values, approaching or reaching 0.98 to 1.00, indicating the model was able to consistently and accurately identify these pests.
However, several classes, such as Rice Leaf Hopper and Rice Shell Pest, had lower recall and F1-score values (as low as 0.70), indicating that the model still had difficulty identifying these pests with balanced precision and sensitivity.
Figure 1: Confusion matrix of SVM
The confusion matrix for the SVM model shows excellent classification performance for most classes.
Classes such as Rice Leaf Roller, Rice Leaf Caterpillar, and Paddy Stem Maggot were predicted almost perfectly, with more than 240 correct predictions each.
However, there were misclassifications in several classes, especially class 7 (Rice Stem Fly), class 8 (Rice Water Weevil), and class 9 (Rice Leaf Hopper), for which a noticeable share of instances was assigned to other classes.
For example, in class 7, only 183 of the total instances were correctly classified, while the rest were spread across other classes such as classes 6 and 8.
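The per-class precision, recall, and F1-score values discussed in this section can be derived directly from a confusion matrix. The sketch below assumes the common convention that rows are true classes and columns are predicted classes. The 3x3 matrix is invented for illustration and is not taken from the study's results.

```python
def per_class_metrics(confusion, cls):
    """Precision, recall and F1 for one class (rows = true, columns = predicted)."""
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                 # true cls, predicted as another
    fp = sum(row[cls] for row in confusion) - tp  # another class, predicted as cls
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(confusion):
    """Share of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical confusion matrix for three classes with ten test samples each.
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
p0, r0, f0 = per_class_metrics(toy_confusion, 0)
p2, r2, f2 = per_class_metrics(toy_confusion, 2)
overall = accuracy(toy_confusion)
```

Class 2 in the toy matrix shows the pattern of high precision with lower recall: nothing is wrongly assigned to it, but one of its own samples is missed.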
Table 3: K-Nearest Neighbor Evaluation (per-class Precision, Recall, and F1-score for the 12 pest classes)
Based on the K-Nearest Neighbor (KNN) model evaluation table, classification performance varies considerably across classes.
Several classes, such as Rice Stem Fly and Rice Shell Pest, exhibit overfitting symptoms, characterized by very high precision values (0.98 and 1.00, respectively) but low recall (0.84 in the case of Rice Stem Fly).
This indicates that although the KNN model is very confident in its predictions, it recognizes only a fraction of the actual samples from these classes.
Furthermore, for the Asiatic Rice Borer class, the model has low precision but high recall, indicating that the model often assigns samples from other classes to this class.
However, several classes perform quite well, such as Paddy Stem Maggot, Brown Plant Hopper, and Rice Water Weevil, which have comparatively high F1-scores.
Figure 2: Confusion matrix of KNN
The confusion matrix for the KNN model indicates that it has lower classification performance than SVM, especially for certain classes.
While the model is able to classify classes such as Paddy Stem Maggot and Rice Leaf Roller quite well, with over 230 correctly classified instances, it appears to struggle to distinguish classes such as Rice Stem Fly, Rice Water Weevil, and Rice Leaf Hopper.
For example, in class 7, only 109 instances were correctly classified out of a much larger total, while over 100 other instances were misclassified into various classes such as 8, 6, and even 3.
This pattern suggests that KNN is susceptible to similarities between visual features due to its distance-based nature.
It also suggests that the model overfits certain training data and fails to generalize well to more complex or visually noisy test data.
Table 4: Random Forest Evaluation (per-class Precision, Recall, and F1-score for the 12 pest classes)
Based on the evaluation results of the Random Forest (RF) model, the classification performance generally shows good consistency, with high precision, recall, and F1-score values for almost all classes.
Several classes, such as Rice Water Weevil and Yellow Rice Borer, achieved very high F1-scores (0.99 and 0.96, respectively), demonstrating the model's ability to detect and classify these categories with high accuracy.
However, weaknesses are still visible in the Rice Leaf Hopper and Rice Shell Pest classes, which achieved lower F1-scores (0.75 for Rice Leaf Hopper).
This indicates that although the model is quite reliable, classification challenges remain for classes with high visual similarity or an unbalanced initial data set, even after balancing with SMOTE.
Unlike KNN, the model shows no indication of overfitting, as precision and recall are relatively balanced.
Figure 3: Confusion matrix of Random Forest
The confusion matrix for the Random Forest model showed quite good and relatively balanced performance in classifying most classes.
The model was able to recognize classes such as Paddy Stem Maggot, Rice Leaf Roller, and Brown Plant Hopper with high accuracy, each with over 230 correct predictions.
However, there was some classification confusion, especially in classes such as Rice Stem Fly and Rice Water Weevil, where the model still assigned some instances to other classes such as 6, 5, and 9.
For example, out of 240 data points in class 7, only around 170 were correctly classified, while the rest were scattered across similar classes.
This shows that although Random Forest is quite robust and stable in handling variations between classes, it still faces challenges in distinguishing classes that share similar visual features.
Table 5: Logistic Regression Evaluation (per-class Precision, Recall, and F1-score for the 12 pest classes)
The Logistic Regression (LR) model demonstrated strong performance in classifying rice pest images, with an overall accuracy of approximately 89%.
Several classes, such as Rice Water Weevil and Brown Plant Hopper, achieved very high, near-perfect precision and recall values, indicating that the model was able to recognize these classes with high accuracy and consistency.
However, upon closer inspection, there were indications of overfitting in several classes, such as Yellow Rice Borer and Paddy Stem Maggot, which had very high recall values that were not accompanied by a comparable change in precision.
This could be the result of the oversampling process using SMOTE, which adds synthetic data and may cause the model to memorize certain patterns.
In addition, several classes such as Rice Shell Pest and Rice Leaf Hopper still show relatively lower performance, with F1-score values of 0.74 and 0.78, respectively, indicating that the model quite often misclassifies samples in these classes.
Figure 4: Confusion matrix of Logistic Regression
The confusion matrix for the Logistic Regression (LR) model shows that several classes such as Rice Leaf Roller, Paddy Stem Maggot, and Brown Plant Hopper are classified very well, as indicated by a high number of correct predictions (above 240 for class 0 and 257 for another of these classes).
However, classification errors become more pronounced in classes such as Rice Stem Fly and Rice Water Weevil.
For example, only 182 instances in class 7 were correctly classified, with the rest spread across several other classes, especially classes 4 and 6.
This indicates that Logistic Regression tends to have difficulty recognizing feature patterns from complex or visually overlapping classes.
However, compared to KNN, this model appears more stable with less extreme errors.
Comparison of wrong and correct predictions
Figure 5: Comparison of correct and incorrect predictions (panels: SVM, KNN)
In the SVM model, one of the pest images in class 7 was predicted correctly, but an error occurred for an image from class 11, which was classified as class 7.
A similar thing also happened in KNN, where the correct prediction was seen in class 10, but the model incorrectly predicted an image from class 7 as class 11.
In Random Forest, the model was able to recognize images from class 10 correctly, but incorrectly classified an image from class 7 as class 8.
Meanwhile, Logistic Regression was able to recognize images from class 7 correctly, but made an error when predicting an image from class 11 as class 1.
Of all the incorrect predictions displayed, it appears that class 7 and class 11 are the two classes that are most often confused between models, indicating that the visual features between the two classes tend to be similar or difficult for the algorithm to distinguish.
This error consistently appears in all models, indicating that class 7 (Rice Stem Fly) and class 11 (Thrips) are quite challenging classes to separate visually, possibly due to the similar shape or texture of the insects in the image.
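One way to identify the most frequently confused class pairs systematically, rather than from individual example images, is to rank the off-diagonal entries of a confusion matrix. The following sketch assumes rows are true classes and columns are predicted classes. The matrix values are invented for illustration and do not come from the study, and the function name `most_confused_pairs` is hypothetical.

```python
def most_confused_pairs(confusion, top_n=3):
    """Return (true_class, predicted_class, count) for the largest errors."""
    pairs = [
        (count, true_cls, pred_cls)
        for true_cls, row in enumerate(confusion)
        for pred_cls, count in enumerate(row)
        if true_cls != pred_cls and count > 0
    ]
    pairs.sort(reverse=True)  # largest misclassification counts first
    return [(t, p, c) for c, t, p in pairs[:top_n]]


# Hypothetical confusion matrix for three classes.
toy_confusion = [
    [50, 3, 0],
    [4, 40, 9],
    [1, 12, 45],
]
worst = most_confused_pairs(toy_confusion, top_n=2)
```

Applied to the confusion matrices of the four models, such a ranking would show whether the same class pairs dominate the errors in every model.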
Conclusion
Based on the evaluation results of the performance metrics (precision, recall, and F1-score) and the confusion matrices, it can be concluded that SVM provides the most consistent and balanced performance, with high F1-scores in almost all classes. Random Forest also shows competitive performance, especially in classes that are easier to recognize, although it is slightly more variable than SVM.
In contrast, KNN and Logistic Regression show weaknesses, especially in classes with similar visual features, such as class 7 (Rice Stem Fly) and class 11 (Thrips), where both models often misclassify samples and show symptoms of overfitting in recall values.
Confusion matrix analysis strengthens this finding by showing that the models tend to have difficulty distinguishing certain classes that have visual similarities, such as classes 7 and 11.
Visualization of correct and incorrect predictions also shows that most prediction errors occur due to similar pest morphology, which requires models with deeper feature extraction capabilities to distinguish them accurately.
References