MUNISI: Military Mathematics and Natural Sciences e-ISSN: 3026-2968 Vol.
No.
June 2025, pp.
Available online at https://jurnal.
id/index.
php/munisi
Comparison Naïve Bayes and SVM to Classify Drought-Infected Rice Plants Based on Morphological Characteristics in Supporting National Food Security
Damaris Christi1*, Angelia Melisa Hutapea2, Fulkan Al Husein1, Nadiza Lediwara3, Sembada Bimorogo3, Rizky Dwi Satrio2
1Department of Mathematics, Faculty of Military Mathematics and Natural Sciences, Republic of Indonesia Defense University, Bogor, Indonesia
2Department of Biology, Faculty of Military Mathematics and Natural Sciences, Republic of Indonesia Defense University, Bogor, Indonesia
3Department of Informatics, Faculty Defense Sciences and Technology, Republic of Indonesia Defense University, Bogor, Indonesia
* Corresponding Author: damarischristi@gmail.
Abstract
Data mining is part of the Knowledge Discovery in Databases (KDD) process.
It is used to classify, predict, and extract other useful information from large data sets.
This study aimed to classify rice plants by treatment (drought stress and control) using data mining, focusing on the variables Leaf Area (LA), Root Length (RL), and Shoot Length (SL).
Each classification algorithm has different characteristics, resulting in varied performance.
After testing both classification algorithms, the accuracy was 71.70% for Naïve Bayes and 73.85% for SVM.
This shows that SVM performs better than Naïve Bayes at determining the treatment of rice plants, which can further support national food security.
Furthermore, it can be concluded that a machine learning approach to classifying rice plants affected by drought is fairly effective, although the maximum score obtained is only 73.85%.
Keywords: Modelling, Naïve Bayes, Rice, SVM
Introduction
Rice (Oryza sativa) is a staple food that sustains nearly half of the global population.
In Indonesia in particular, rice is a daily staple for about 95% of the population.
The increasing global food demand, driven by population growth, exerts significant pressure on agricultural systems to enhance rice productivity and ensure sustainable food availability.
As a water-dependent crop, rice is highly susceptible to environmental stressors, especially drought, which can drastically reduce yields.
Drought stress limits the plant's ability to carry out vital physiological processes, such as nutrient uptake and photosynthesis, thus hindering growth and lowering production.
Given these challenges, understanding how rice plants respond to drought conditions is critical for improving crop resilience and developing better agricultural management practices.
In recent years, technological advancements in agriculture have increasingly relied on data-driven approaches, such as machine learning, to analyze plant responses to environmental factors.
Machine learning models can be employed to classify plant responses to different treatments, helping farmers and agricultural managers make informed decisions to mitigate the effects of stressors like drought.
Specifically, classification algorithms such as Naïve Bayes and Support Vector Machine (SVM) have gained attention in agriculture for their ability to predict outcomes based on plant growth data.
These algorithms use plant physiological traits, such as Leaf Area (LA), Root Length (RL), and Shoot Length (SL), as input features to classify the treatment condition (control or drought) that has been applied to the plants.
The plant materials used in this study were an F9 population consisting of 90 recombinant inbred lines (RILs), which originated from a cross between the rice varieties IR64 and Hawara Bunar, referred to as IRH.
The parent lines have distinct traits in response to drought stress.
IR64 is a high-yielding lowland variety but is vulnerable to drought, whereas Hawara Bunar (HB) is a local upland variety that is well suited to drought-prone environments.
The Naïve Bayes algorithm, rooted in Bayes' Theorem, is a probabilistic classifier that operates under the assumption of feature independence.
This simplicity of design, combined with its effectiveness, makes Naïve Bayes a popular choice in various classification tasks, especially text and sentiment analysis.
Despite the often unrealistic assumption of feature independence, Naïve Bayes frequently delivers strong performance in practice, as it calculates the posterior probability of each class based on the prior probabilities and the likelihood of the observed data.
On the other hand, Support Vector Machine (SVM) is a powerful supervised learning algorithm known for its robustness in handling complex, high-dimensional datasets.
SVM works by finding the optimal hyperplane that separates data points into distinct classes, making it particularly well suited for binary classification tasks where the goal is to separate data into two categories.
Several studies have demonstrated the efficacy of Naïve Bayes and SVM in classification problems across different domains.
Riyadi compared the performance of Naïve Bayes and SVM in classifying online readership, reporting that the SVM algorithm achieved an accuracy of 63.39%, outperforming Naïve Bayes.
Similarly, Narayan conducted a comparative analysis of two classifiers, Support Vector Machine (SVM) and Naïve Bayes, for classifying surface electromyography (sEMG) signals, finding that SVM produced a higher accuracy ([...].8%) than Naïve Bayes.
Another study by Apriyani & Kurniati compared Naïve Bayes and SVM in the classification of diabetes mellitus at Siti Khadijah Islamic Hospital in Palembang, with SVM achieving the highest accuracy of 96.
These findings suggest that while Naïve Bayes is a strong baseline classifier, SVM often provides superior performance in more complex classification tasks.
In the context of rice cultivation, classification models serve as essential tools for predicting how rice plants will respond to varying treatment conditions.
Given the critical role that rice plays in global food security, especially in countries like Indonesia where it serves as a staple food, optimizing crop management strategies is imperative.
One way to achieve this optimization is through the development of machine learning models that can accurately predict the treatment needs of rice plants based on specific physiological characteristics.
These models can utilize input features such as Leaf Area (LA), Root Length (RL), and Shoot Length (SL), which are key indicators of plant health and growth.
By leveraging such features, machine learning algorithms can classify whether rice plants are under drought stress or in a controlled environment.
The present study seeks to investigate which algorithm, Naïve Bayes or SVM, yields the higher classification accuracy for rice plant responses to drought.
This research is based on data collected in 2021 from a study conducted by Satrio et al., involving 90 rice varieties subjected to two treatments: control and drought stress.
The dataset contains observations on three key physiological variables: Leaf Area (LA), Root Length (RL), and Shoot Length (SL).
These variables are used as features in the machine learning models, with the treatment condition (control or drought) serving as the target variable.
The importance of this research lies in its potential to contribute to the growing body of literature on data mining and machine learning in agriculture.
By comparing the performance of Naïve Bayes and SVM in classifying rice plant responses to treatment, this study aims to provide insights into which algorithm is more effective for this specific application.
Additionally, this research highlights the broader applicability of machine learning techniques in agricultural decision-making, offering a foundation for future technological developments aimed at improving crop resilience and productivity.
Moreover, the purpose of this research is to contribute to national food security by improving the understanding of how rice plants, a staple crop for millions of Indonesians, respond to drought conditions.
By utilizing machine learning techniques, specifically the Naïve Bayes and Support Vector Machine (SVM) algorithms, this study aims to accurately classify rice plant treatments and responses under drought stress.
The insights gained from this classification can help optimize agricultural management strategies, ensuring that rice plants receive the most effective treatments during adverse environmental conditions.
This, in turn, enhances rice productivity and resilience, directly supporting Indonesia's efforts to secure a stable food supply amidst growing challenges like climate change and water scarcity.
Through this research, technological advancements in data-driven decision-making for crop management can be fostered, ultimately strengthening the country's food security infrastructure.
Materials and Methods
Research Framework
This study aims to classify rice plant growth based on treatment (drought and control) using data mining techniques.
The stages applied in this research are shown in Figure 1.
Figure 1. Research Framework
The process begins with importing the data using the Read CSV operator in RapidMiner.
Next, missing values are handled with the Replace Missing Values operator, and the data are normalized with the Normalize operator to standardize the variable scales.
The target variable is then converted into a numerical format through the Nominal to Numerical operator.
The dataset is divided into training data ([...]%) and testing data ([...]%) using the Split Data operator.
The process continues with outlier detection using the Outlier Detection operator to ensure the data is clean and ready for modeling.
With these preprocessing steps, the dataset becomes structured and ready for building a machine learning model that predicts the appropriate treatment for rice plants under stress conditions, as shown in Figure 2.
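The preprocessing stages named above (missing-value replacement, normalization, nominal-to-numerical conversion, and a train/test split) can be sketched in plain Python. This is only an illustration, not the RapidMiner process used in the study: mean imputation, z-score scaling, the 75/25 split, the helper names, and the toy measurements are all assumptions made for the example.

```python
import random
import statistics


def replace_missing(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


def z_normalize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def split_rows(rows, train_fraction, seed=0):
    """Shuffle rows reproducibly and split them into training and testing lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy measurements (not the study's data); None marks a missing value.
leaf = [6.0, None, 8.0, 7.0]
root = [50.0, 30.0, None, 40.0]
labels = ["Control", "Drought", "Drought", "Control"]

leaf_clean = z_normalize(replace_missing(leaf))
root_clean = z_normalize(replace_missing(root))
encoded = [0 if label == "Control" else 1 for label in labels]  # nominal to numerical

rows = list(zip(leaf_clean, root_clean, encoded))
train, test = split_rows(rows, train_fraction=0.75)  # hypothetical split ratio
print(len(train), len(test))  # prints: 3 1
```

Scaling each feature separately keeps root length, which has the largest raw values, from dominating distance-based models such as SVM.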
Figure 2. Data Preprocessing with RapidMiner
After the data has been split and converted to numerical form, classification models are built with two machine learning methods, Support Vector Machine (SVM) and Naïve Bayes, as shown in Figure 3.
Figure 3. Application of Classification Model in RapidMiner
Dataset
This research employs direct observation methods to obtain primary data.
The dataset consists of 175 observation points with four explanatory variables: leaf width, root length, plant height, and developmental stage, which are used to predict the target variable, namely the type of treatment.
Table 1 presents a sample of the dataset.
Apart from the variable 'environment', the other three dataset variables are numeric.
Table 1. Sample of the dataset
Genotype    Leaf Width    Root Length    Plant Height    Environment
[...]       [...]         [...]          [...]           Control
[...]       [...]         [...]          [...]           Control
[...]       [...]         [...]          [...]           Control
[...]       [...]         [...]          [...]           Drought
[...]       [...]         [...]          [...]           Drought
Table 1 presents a sample from a dataset of 175 observations, where each observation corresponds to a different genotype of rice plants and includes four variables.
The three numeric variables, leaf width, root length, and plant height, are measured in centimeters.
These variables describe the plant's physical characteristics and are used to explain growth under different environmental conditions.
The fourth variable, "Environment," is categorical and indicates whether the plant was grown under "Control" (normal growth) or "Drought" (water-stressed) conditions.
For example, the first genotype has a leaf width of 6.59 cm, a root length of 52 cm, and a plant height of 64.96 cm, and was grown under "Control" conditions.
In contrast, the fifth genotype has a leaf width of 7.75 cm, a root length of 32 cm, and a plant height of 60.00 cm, and was grown under "Drought" conditions.
The aim is to use the numeric variables (leaf width, root length, and plant height) to predict the environmental condition (either "Control" or "Drought") in which the plant was grown.
This helps in assessing how plant morphology responds to different growing conditions.
The two classification methods used to separate plants in "Drought" and "Control" conditions, Support Vector Machine and Naïve Bayes, are explained next, and their accuracies are then compared.
Support Vector Machine Algorithm
Support Vector Machine (SVM) is a well-known machine learning algorithm used for solving various classification problems.
It operates by selecting a subset of the training samples, the support vectors, such that separating this subset is equivalent to dividing the entire dataset.
The primary objective of SVM is to create an optimal decision boundary between the existing data classes.
SVM aims to find the single hyperplane with the largest margin that divides the classes linearly.
This hyperplane maximizes the margin, defined as the distance between the hyperplane and the nearest data points from each class.
SVM can handle both linear and nonlinear data through techniques such as the soft-margin hyperplane and feature space transformation.
In cases of non-linear classification, SVM employs a kernel technique to transform the data into a higher-dimensional space where the classes become linearly separable.
To maximize the margin, SVM minimizes the following function:
$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}$$
subject to the constraints:
$$y_i\,(w \cdot x_i + b) \ge 1, \quad \forall i$$
where $w$ is the normal vector of the hyperplane, $b$ is its offset, and $y_i$ is the class label (+1 or -1) of $x_i$.
This formulation is illustrated in Figure 4.
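The hard-margin problem, minimizing $\frac{1}{2}\lVert w \rVert^{2}$ subject to $y_i (w \cdot x_i + b) \ge 1$, can be made concrete with a small check in plain Python. The sketch below does not train an SVM. It only evaluates the constraints, the objective, and the margin width $2 / \lVert w \rVert$ for a few candidate hyperplanes. The toy points, the candidate values of $w$ and $b$, and the helper names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def functional_margins(w, b, points, labels):
    """Return y_i * (w . x_i + b) for every training point."""
    return [y * (dot(w, x) + b) for x, y in zip(points, labels)]


def is_feasible(w, b, points, labels):
    """Check the hard-margin constraints y_i * (w . x_i + b) >= 1."""
    return all(m >= 1 - 1e-12 for m in functional_margins(w, b, points, labels))


def objective(w):
    """The quantity being minimized: 0.5 * ||w||^2."""
    return 0.5 * dot(w, w)


def margin_width(w):
    """Distance between the two margin boundaries: 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical, linearly separable toy points (not the study's data).
points = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
labels = [+1, +1, -1, -1]

# Three scalings of the same separating line x1 = 1.
candidates = {
    "wide": ((1.0, 0.0), -1.0),       # feasible, smallest ||w|| among the feasible ones
    "narrow": ((2.0, 0.0), -2.0),     # feasible, but larger ||w|| and narrower margin
    "too_small": ((0.5, 0.0), -0.5),  # smaller ||w||, but violates the constraints
}
for name, (w, b) in candidates.items():
    print(name, is_feasible(w, b, points, labels), objective(w), margin_width(w))
```

Among the feasible candidates, the one with the smaller objective has the wider margin, which is why minimizing $\lVert w \rVert^{2}$ under these constraints is equivalent to maximizing the margin.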
Figure 4. Hyperplane SVM Illustration
Naïve Bayes Algorithm
Naïve Bayes is a machine learning classification algorithm based on Bayes' Theorem, with the "naive" assumption that all features are independent.
Although this assumption rarely holds in real-world situations, the algorithm often performs well across various applications, particularly text classification and sentiment analysis.
The strength of Naïve Bayes lies in its ability to calculate both prior and posterior probabilities, which are then used to make classification decisions.
Bayes' Theorem forms the basis of this approach, providing a way to update the probability of a hypothesis $H$ given new evidence $X$.
Specifically, the posterior probability $P(H \mid X)$ is obtained from the prior probability $P(H)$ and the likelihood $P(X \mid H)$, making it a powerful tool for decision-making under uncertainty.
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
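As a rough illustration of how Bayes' Theorem turns priors and likelihoods into a posterior, the sketch below computes $P(H \mid X)$ for two classes under the naive independence assumption. The source does not state which likelihood model was used, so the Gaussian per-feature likelihood is an assumption here, as are the priors, the means and standard deviations, and the function names.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as the per-feature likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, priors, params):
    """Return P(H | X) for each class H via Bayes' Theorem with independent features."""
    joint = {}
    for cls, prior in priors.items():
        likelihood = 1.0
        for value, (mean, std) in zip(x, params[cls]):
            likelihood *= gaussian_pdf(value, mean, std)  # naive independence
        joint[cls] = likelihood * prior                    # P(X | H) * P(H)
    evidence = sum(joint.values())                         # P(X)
    return {cls: j / evidence for cls, j in joint.items()}


# Hypothetical priors and per-feature (mean, std) pairs, not fitted to the study's data.
priors = {"Control": 0.5, "Drought": 0.5}
params = {
    "Control": [(7.0, 1.0), (50.0, 8.0)],  # (leaf measure, root length)
    "Drought": [(6.0, 1.0), (35.0, 8.0)],
}

post = posterior((6.8, 48.0), priors, params)
print(post)
```

The predicted class is the one with the largest posterior. Because the evidence $P(X)$ is the same for every class, it only rescales the scores so that they sum to one.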
Evaluation Methods
The evaluation stage measures the performance of the model on the test data.
The evaluation results show how well the model can predict the optimal treatment for rice plants.
At this stage, evaluation metrics such as accuracy, precision, and recall are used to assess model performance.
Table 2. Confusion Matrix
Prediction    Actual Control         Actual Drought
Control       True Positive (TP)     False Positive (FP)
Drought       False Negative (FN)    True Negative (TN)
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision}\ (P) = \frac{TP}{TP + FP}$$
$$\text{Recall}\ (R) = \frac{TP}{TP + FN}$$
$$F_1\ \text{score} = \frac{2PR}{P + R}$$
The evaluation process focuses on improving the scores by modifying existing features, adjusting model parameters, and further exploring the properties of the data.
The goal is to identify the most suitable method and achieve the highest possible performance scores.
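The evaluation metrics can be computed directly from the four confusion-matrix counts. The sketch below applies the standard definitions of accuracy, precision, recall, and F1 score to the counts reported for the Naïve Bayes model in the Results (19 true positives, 8 false positives, 7 false negatives, 19 true negatives, with "Control" as the positive class). The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the Naïve Bayes confusion matrix described in the Results.
metrics = classification_metrics(tp=19, fp=8, fn=7, tn=19)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 71.70%
# precision: 70.37%
# recall: 73.08%
# f1: 71.70%
```

These values match the accuracy, precision, and recall reported for Naïve Bayes. The F1 score equals the accuracy here only because the true positive and true negative counts happen to be identical.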
Results and Discussion
Results
The results of the comparison between the two methods, SVM and Naïve Bayes, are presented separately as confusion matrix evaluations.
The accuracy of each model is evaluated and discussed in turn in the following subsections, together with further implementation details and interpretation.
The SVM Algorithm
The first classification model is built using the SVM algorithm.
The performance of the SVM algorithm is presented in Figure 5, with an accuracy of 73.85%, precision of 70%, and recall of 80.77%.
Figure 5 presents the confusion matrix behind these figures.
Figure 5. Confusion Matrix SVM
The confusion matrix illustrates the performance of the Support Vector Machine (SVM) model in classifying rice plants as either under drought conditions or in a control (non-drought) condition.
The matrix compares the predicted classifications with the actual conditions.
Out of the actual control plants, 21 were correctly identified as being in the control condition, while 5 were incorrectly predicted as being in drought conditions.
Conversely, for the actual drought plants, 18 were correctly classified as being in drought, while 9 were mistakenly classified as being in the control condition.
This matrix reveals the strengths and limitations of the SVM model in distinguishing between the two classes.
While the model performs well in many cases, there are still errors, particularly when it misclassifies some drought-affected plants as being in the control group.
From this matrix, additional performance metrics like accuracy, precision, recall, and F1-score can be calculated to provide a more detailed assessment of the model's classification ability.
The Naïve Bayes Algorithm
The next experiment uses the Naïve Bayes algorithm.
The Naïve Bayes approach is based on probabilities, so no parameters are entered manually.
The performance of the Naïve Bayes model is presented in Figure 6, with an accuracy of 71.70%, precision of 70.37%, and recall of 73.08%.
Figure 6 presents the confusion matrix behind these figures.
Figure 6. Confusion Matrix Naïve Bayes
The confusion matrix shows the performance of the Naïve Bayes classification model, which aims to classify rice plants as being in either a "Control" condition (no drought) or a "Drought" condition (experiencing drought).
The matrix has four key values: 19 instances where the model correctly predicted "Control" when the actual condition was "Control" (true positives), 8 instances where the model incorrectly predicted "Control" when the actual condition was "Drought" (false positives), 7 instances where the model incorrectly predicted "Drought" when the actual condition was "Control" (false negatives), and 19 instances where the model correctly predicted "Drought" when the actual condition was "Drought" (true negatives).
Overall, the model performs reasonably well, with correct predictions in most cases, but it has a moderate number of misclassifications, particularly in predicting the "Control" condition for plants actually under "Drought."
This may indicate that the model slightly overestimates the "Control" condition.
Discussion
Classifying rice plants as drought-affected or unaffected based on morphological traits such as leaf area, root length, and shoot length carries several expected benefits.
Early drought detection is anticipated, enabling farmers and researchers to take quicker preventive measures to minimize its impact on crop yields.
Additionally, distinguishing between drought-affected and unaffected plants allows for more efficient management of resources such as water and nutrients, potentially boosting agricultural productivity, particularly in drought-prone areas.
This classification could also offer valuable insights for breeding programs, prioritizing traits like longer roots for the development of drought-tolerant rice varieties.
Clear data on drought-affected plants would support better land management strategies, helping farmers choose suitable areas or adjust irrigation systems.
Furthermore, the system could mitigate production risks by providing more accurate information to guide decisions on irrigation, fertilizer use, and harvest timing.
Ultimately, this approach could enhance the sustainability of agriculture through wiser resource use and more stable yields, even under drought stress, contributing to food security and better adaptation to climate-related water fluctuations.
The results from the SVM and Naïve Bayes algorithms demonstrate their effectiveness in classifying rice plants under drought and control conditions, with accuracy rates of 73.85% and 71.70%, respectively.
The SVM model, with its higher recall (80.77%), shows a stronger ability to correctly identify drought-affected plants, making it useful for early drought detection.
Meanwhile, the Naïve Bayes model exhibits balanced precision (70.37%) and recall (73.08%), offering reliable performance in identifying both control and drought conditions.
These results reveal the models' strengths and limitations in predicting crop stress, which can be applied in agricultural decision-making.
In the context of national food security, these models can play a vital role in managing rice crops, a key food source.
Accurate prediction of drought conditions allows for timely interventions in irrigation, resource management, and disaster response.
By implementing these models in an agricultural monitoring system using remote sensing data or in-field sensors, authorities can continuously monitor crop health and act swiftly in high-risk drought areas.
This AI-driven approach can help enhance food security by reducing crop losses, optimizing resource use, and ensuring better preparedness for drought conditions across regions.
Conclusion
The conclusion drawn from this research is that the machine learning approach to selecting the appropriate treatment for rice plants is quite effective.
Each classification algorithm has distinct characteristics, leading to varying performance.
After testing both classification algorithms, the accuracy results were 71.70% for Naïve Bayes and 73.85% for SVM.
SVM demonstrated the highest performance.
Therefore, the best algorithm for the rice plant treatment selection case, using numerical data models with a machine learning approach, is the Support Vector Machine, as its performance is well suited for classifying two different classes.
Additionally, the workflow developed in this research is a significant achievement, as it can be applied to other types of data in the future.
In general, based on the study's findings, plant height, leaf width, and root length are significant metrics for estimating the appropriate treatment for rice plants.
Acknowledgements
This research was supported by the Republic of Indonesia Defense University, which allowed the use of its facilities for research purposes.
References