Computer Science (CO-SCIENCE) Volume 6 Issue 1 January 2026 Accreditation Sinta 4 No. SK: 230/E/KPT/2022
Optimization of Crop Recommendation Model Using Ensemble Learning Techniques for Multiclass Classification
Siti Marlina1, Titik Misriati2*, Riska Aryanti3
1,2,3 Bina Sarana Informatika University, Jl. Kramat Raya No. RT. 2/RW. Kwitang, Senen District, Jakarta, Indonesia
e-mail: 1siti. smr@bsi. id, 2*titik. tmi@bsi. id, 3riska. rts@bsi.
(*) Corresponding Author
Article Info: Received: 24-09-2025 | Revised: 10-11-2025 | Accepted: 20-11-2025
Abstract - Crop recommendation systems play a crucial role in modern agriculture by helping farmers make data-driven decisions to maximize yield, optimize resource use, and ensure sustainable farming practices.
By analyzing environmental and soil parameters, these systems can suggest the most suitable crops for specific conditions, reducing the risks of crop failure and improving overall productivity.
This study evaluates the performance of five ensemble learning algorithms (Random Forest, Extra Trees, CatBoost, XGBoost, and LightGBM) for multiclass classification in a crop recommendation system.
All models achieved high accuracy above 98%, with Random Forest demonstrating the best and most stable performance.
The feature importance analysis revealed that climatic factors, particularly rainfall and humidity, contributed the most to prediction outcomes, followed by macronutrients such as potassium, phosphorus, and nitrogen.
In contrast, temperature and soil pH showed relatively lower influence.
These findings highlight the dominance of climatic factors over soil chemical properties and demonstrate the capability of ensemble learning methods to capture complex data patterns.
Random Forest is recommended as the primary model to support more effective land management and crop cultivation strategies.
Keywords: Ensemble Learning, Classification, Crop Recommendation, Random Forest Algorithm, Multiclass
INTRODUCTION
The global demand for sustainable agriculture continues to grow as climate change, population growth, and food security concerns place pressure on agricultural productivity (Food and Agriculture Organization (FAO)).
Optimizing crop selection plays a crucial role in improving yield, resource utilization, and resilience to environmental variability.
Traditional crop recommendation systems, which rely heavily on expert knowledge and rule-based approaches, often lack adaptability to diverse environmental and soil conditions (Prity et al.).
Consequently, data-driven approaches have gained significant attention in recent years (Patel & B. Patel).
Machine learning (ML) methods have been widely applied in precision agriculture, enabling the integration of soil characteristics, climate data, and historical yield information to recommend suitable crops.
Among them, ensemble learning techniques (Sudianto & Cahyadi), such as Random Forest (Meiriyama & Sudiadi), Extremely Randomized Trees (Hussein & R. Zeebaree), gradient boosting frameworks like XGBoost (Arumugam S. et al.) and LightGBM (Nguyen et al.), as well as CatBoost (Srinivasu et al.), have demonstrated strong predictive performance across various agricultural tasks (Hasan et al.).
These models combine multiple base learners to reduce variance (bagging) or bias (boosting), thereby improving classification accuracy in complex, multiclass scenarios.
Building on this foundation, recent studies highlight the effectiveness of ensemble models in agricultural recommendation systems.
For instance, an ensemble-based framework was developed to recommend suitable crops, achieving superior accuracy compared to single models (Hasan et al.).
Similarly, previous research employed Random Forest and XGBoost regressors for crop selection, demonstrating robust predictive capability (Rahman et al.).
Ensemble approaches also extend beyond agriculture, with successful applications in structural engineering (Daniel), healthcare (Elshewey et al.), and financial risk assessment (Ying et al.), underscoring their versatility and adaptability.
Despite these advancements, several challenges remain in optimizing ensemble models for multiclass classification in agriculture.
Hyperparameter sensitivity, data imbalance, and limited model interpretability continue to hinder practical adoption (Haddouchi & Berrado; Fajar et al.).
Furthermore, the heterogeneity of soil and climatic conditions across regions demands models that generalize well while maintaining high predictive accuracy.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Copyright 2026 The Author.
E-ISSN: 2774-9711 | P-ISSN: 2808-9065
Therefore, this study proposes the optimization and performance evaluation of crop recommendation models using ensemble learning techniques to tackle the challenges of multiclass classification.
By systematically evaluating and fine-tuning algorithms such as Random Forest, XGBoost, LightGBM, and CatBoost, this research aims to enhance classification accuracy, interpretability, and robustness.
The novelty of this study lies not only in its comprehensive assessment of ensemble learning algorithms for multiclass agricultural data but also in its introduction of a comparative framework that integrates crop-specific feature-response interactions.
This framework enables a deeper understanding of how climatic and soil variables collectively influence model performance, offering a more transparent and interpretable basis for precision-agriculture decision-making.
RESEARCH METHOD
The research method is designed to develop and optimize a multiclass classification-based crop recommendation model using an ensemble learning approach, as illustrated in Figure 1.
The stages of the study include data collection, preprocessing, model development, and model evaluation.
Source: Research Results
Figure 1. Research Methodology
Dataset
Agricultural data were collected from the publicly available Crop Recommendation Dataset on Kaggle, originally compiled to support machine learning applications in agriculture.
The dataset consists of 2,200 records, each representing a unique set of environmental and soil parameters associated with a specific crop.
The features include nitrogen (N), phosphorus (P), potassium (K), temperature (°C), humidity (%), pH, and rainfall (mm).
The target variable, crop type, contains 22 distinct crop classes, covering a wide variety of grains, pulses, fruits, and vegetables such as rice, maize, chickpea, banana, and mango.
This stage is essential to ensure the availability of representative data (Hasan et al.).
To evaluate model performance effectively, the dataset was randomly divided into training and testing subsets.
An 80:20 split ratio was applied: 80% of the data (1,760 samples) was used for model training and 20% (440 samples) was reserved for testing.
This division ensured that each crop class was proportionally represented in both subsets, maintaining class balance for fair model evaluation.
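A proportional (stratified) split of this kind can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' actual code: the function name `stratified_split`, the random seed, and the synthetic label list (22 classes with 100 records each) are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that every class is split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Hypothetical stand-in for the target column: 22 classes, 100 records each (assumed).
labels = [f"crop_{c:02d}" for c in range(22) for _ in range(100)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1760 440
```

Splitting inside each class, rather than over the whole dataset at once, is what guarantees the proportional representation described above.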
Preprocessing
The preprocessing stage was carried out to ensure that the dataset was clean, consistent, and suitable for model training.
The process included data cleaning to eliminate duplicates, inconsistencies, and noise; normalization to standardize feature scales and improve model stability; and handling of missing values through imputation techniques (mean, median, mode, or predictive methods) or exclusion of records with excessive missing values.
In addition, data transformation such as one-hot encoding was applied to categorical features to ensure compatibility with machine learning algorithms.
These steps were essential to improve dataset quality and ensure higher accuracy and reliability of the resulting model (Shahid et al.).
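The cleaning, imputation, and scaling steps can be illustrated with a short standard-library sketch. The helper name `preprocess`, the choice of median imputation and z-score scaling, and the toy records are assumptions for illustration; the paper does not state which imputation or scaling variant was used.

```python
from statistics import mean, median, pstdev

def preprocess(rows):
    """Deduplicate, impute missing values with the column median, then z-score scale."""
    # 1. Drop exact duplicate records while preserving their order.
    unique_rows = list(dict.fromkeys(tuple(r) for r in rows))
    n_cols = len(unique_rows[0])
    columns = []
    for j in range(n_cols):
        col = [r[j] for r in unique_rows]
        # 2. Median imputation for missing entries (represented as None).
        fill = median(v for v in col if v is not None)
        col = [fill if v is None else v for v in col]
        # 3. Standardize to zero mean and unit variance (constant columns become 0.0).
        mu, sigma = mean(col), pstdev(col)
        columns.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return [list(r) for r in zip(*columns)]

# Toy (N, rainfall) records; the values are illustrative, not taken from the dataset.
raw = [(90, 200.0), (90, 200.0), (20, None), (40, 100.0), (60, 150.0)]
clean = preprocess(raw)
print(len(clean))  # 4 rows remain after removing the duplicate
```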
http://jurnal. id/index. php/co-science
Model Training
The crop recommendation model was developed using several popular ensemble learning algorithms: Random Forest, which applies bagging based on decision trees; Extra Trees, which is similar to Random Forest but introduces more randomness in node splitting to further reduce variance; CatBoost, a boosting method optimized for handling categorical data; XGBoost, a widely used and efficient gradient boosting algorithm for tabular data; and LightGBM, a high-efficiency boosting technique that employs histogram-based learning.
Employing multiple algorithms allows for performance comparison, thereby facilitating the selection of the most suitable model for multiclass classification tasks.
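The difference in node splitting between Random Forest and Extra Trees can be made concrete with a small sketch of the threshold-selection step for a single feature. This is a simplified illustration, not a full tree implementation: the function names and the toy humidity readings are hypothetical, and real libraries additionally sample a random subset of features at each node.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)

def best_threshold(values, labels):
    """Random Forest style: search midpoints between sorted values for the lowest impurity."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

def random_threshold(values, rng):
    """Extra Trees style: draw the threshold uniformly between the feature's min and max."""
    return rng.uniform(min(values), max(values))

# Illustrative humidity readings for two crops (values are made up for the example).
humidity = [20.0, 25.0, 30.0, 80.0, 85.0, 90.0]
crop = ["lentil", "lentil", "lentil", "rice", "rice", "rice"]

t_best = best_threshold(humidity, crop)
t_rand = random_threshold(humidity, random.Random(0))
print(t_best, split_impurity(humidity, crop, t_best))  # 55.0 0.0
```

The random threshold is cheaper to compute and decorrelates the trees more strongly, which is why averaging many such trees tends to lower variance.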
Model Evaluation
After training, the models were evaluated using multiclass classification metrics such as accuracy, precision, recall, F1-score, the confusion matrix, and Macro-F1 for handling imbalanced data.
This evaluation ensured that the models were not only accurate for majority classes but also fair in representing minority classes (Ramzan et al.).
With this method, the study is expected to produce a crop recommendation system that is accurate, adaptive, and capable of assisting farmers in determining the most suitable crop types for their land conditions.
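To show why Macro-F1 is informative when classes are imbalanced, the following sketch computes it from scratch. The function name `macro_f1` and the toy labels are assumptions for illustration; the averaging itself follows the standard definition (the unweighted mean of per-class F1 scores).

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy imbalanced example: 8 "rice" samples, 2 "mango" samples, one mango misclassified.
y_true = ["rice"] * 8 + ["mango"] * 2
y_pred = ["rice"] * 8 + ["rice", "mango"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, round(macro_f1(y_true, y_pred), 4))  # 0.9 0.8039
```

In this toy case accuracy stays at 0.9 while Macro-F1 drops to about 0.80, because the error on the minority class counts as much as the majority class.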
RESULTS AND DISCUSSION
Variable distribution analysis was carried out to understand the characteristics of the data in developing the crop recommendation model, as shown in Figure 2.
The results indicate that soil nutrients exhibit diverse distribution patterns: Nitrogen (N) is concentrated in the range of 20 to 40, Phosphorus (P) is relatively evenly distributed up to 80 with some anomalies above 100, and Potassium (K) is dominated by low values with high outliers reaching up to 200.
Environmental variables show a more regular distribution: temperature is approximately normal with an average of 25 to 27°C, humidity is high within the range of 60 to 100%, pH is nearly normal at 6 to 7, and rainfall is widely spread between 50 and 150 mm with some extreme values exceeding 250 mm.
These findings emphasize that understanding the initial distribution is crucial for addressing non-normality and outliers, thereby improving the performance of ensemble learning models in providing optimal crop recommendations.
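One common way to flag outliers such as the high Potassium values is the interquartile-range (IQR) rule. The sketch below is an assumption about how such a check could be done; the paper does not state which outlier criterion, if any, was applied, and the sample values are invented for the example.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Illustrative Potassium (K) readings: mostly low values with one extreme value.
potassium = [17, 20, 22, 25, 30, 35, 40, 45, 50, 205]
print(iqr_outliers(potassium))  # [205]
```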
Source: Research Results
Figure 2. Variable Distribution
Correlation analysis among variables using Pearson's coefficient showed that most features have weak correlations, indicating that each variable may provide unique information for the crop recommendation model, as depicted in Figure 3.
The strongest correlation was observed between Phosphorus (P) and Potassium (K) with a value of 0.74, suggesting a tendency to increase together.
In contrast, Nitrogen (N) exhibited low and negative correlations with P and K (-0.23 and -0.14, respectively).
Environmental variables such as temperature, humidity, pH, and rainfall did not show strong relationships, with the highest correlation being only 0.21 between temperature and humidity.
These findings highlight the generally low multicollinearity among features (the P-K pair being the main exception), which is advantageous in machine learning as it reduces information redundancy and enhances the model's ability to capture complex patterns.
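For reference, Pearson's coefficient between two features can be computed as in the sketch below. The function name `pearson` and the sample vectors are illustrative assumptions and do not reproduce the values reported in Figure 3.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up feature vectors: the first pair rises together, the second moves in opposite directions.
p_values = [1.0, 2.0, 3.0]
k_values = [2.0, 4.0, 6.0]
n_values = [6.0, 4.0, 2.0]

print(pearson(p_values, k_values))  # 1.0
print(pearson(p_values, n_values))  # -1.0
```

Values near 0 indicate little linear relationship, so a heatmap dominated by small coefficients suggests that the features carry largely non-redundant information.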
Source: Research Results
Figure 3. Feature Correlation Heatmap
Variable distribution analysis by crop type using boxplots revealed differences in nutrient requirements and environmental conditions across commodities, as presented in Figure 4.
Nitrogen (N) content was higher in paddy, maize, cotton, and coffee, while Phosphorus (P) and Potassium (K) were highest in grape and apple.
Environmental factors also showed variation: tropical crops such as papaya, coconut, and mango thrive at higher temperatures, whereas apple grows better in cooler climates.
High humidity was required by rice, coconut, and coffee, while pomegranate and lentil were more tolerant of low humidity.
Soil pH was generally stable within 5.5 to 7, except for chickpea and lentil, which can grow in near-alkaline conditions.
Rainfall requirements varied greatly, with rice, coconut, and coffee needing more than 200 mm, whereas mungbean, mothbeans, and pomegranate grow well with less than 100 mm.
These findings confirm that each crop has distinct ecological and nutritional needs, underscoring the importance of developing machine learning-based recommendation models capable of recognizing the unique characteristics of each plant.
Source: Research Results
Figure 4. Distribution of Soil Nutrients and Environmental Factors by Crop Type
Feature importance analysis using the mutual information method revealed that environmental variables were more dominant than soil nutrients in crop classification, as illustrated in Figure 5.
Humidity and rainfall emerged as the most influential factors, followed by Potassium (K) and Phosphorus (P).
Temperature also played an important role in distinguishing tropical and subtropical crops, whereas Nitrogen (N) and pH had relatively lower influence.
These findings confirm that agroclimatic factors play a greater role than soil chemical properties in determining crop suitability, consistent with previous studies highlighting the importance of climatic conditions in land mapping using machine learning.
Source: Research Results
Figure 5. Feature Importance by Mutual Information
The feature importance analysis using Mutual Information showed that agroclimatic variables, particularly humidity and rainfall, had the greatest influence on crop classification, followed by Potassium (K) as the main soil nutrient factor.
Phosphorus (P) ranked at a medium level, while temperature and Nitrogen (N) contributed less.
Soil pH demonstrated the smallest influence.
Overall, climatic factors proved to be more dominant than soil chemical properties, consistent with previous studies highlighting the importance of agroclimatic conditions in land suitability mapping and crop recommendation using machine learning.
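The idea behind a mutual information ranking can be sketched with a simple histogram-based estimate. This is an illustrative approximation under stated assumptions: the paper does not specify the estimator used, and the equal-width binning, the function names, and the toy rainfall and pH values are all hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """I(X;Y) in nats for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def bin_feature(values, n_bins=4):
    """Discretize a continuous feature into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Toy data: rainfall separates the two crops, pH is distributed identically in both.
crop = ["rice"] * 4 + ["mothbeans"] * 4
rainfall = [220, 240, 230, 225, 60, 50, 70, 55]
ph = [6.0, 6.5, 7.0, 6.5, 6.0, 6.5, 7.0, 6.5]

mi_rain = mutual_information(bin_feature(rainfall), crop)
mi_ph = mutual_information(bin_feature(ph), crop)
print(round(mi_rain, 4), round(mi_ph, 4))  # 0.6931 0.0
```

A feature that fully determines the class reaches the class entropy (here ln 2), while a feature distributed identically across classes scores zero, which mirrors the ranking logic described above.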
Table 1. Evaluation Results of Ensemble Learning Models
Columns: Model, Accuracy, Precision, Recall, F1-Score
Rows: Random Forest, Extra Trees, CatBoost, XGBoost, LightGBM
Source: Research Results
The evaluation results show that all five ensemble learning algorithms (Random Forest, Extra Trees, CatBoost, XGBoost, and LightGBM) achieved excellent performance with accuracy above 98%, as shown in Table 1.
Random Forest emerged as the best model, with Accuracy and Precision of approximately 0.993, Recall of 0.9931, and F1-Score of 0.9931, demonstrating very low classification error.
Its superiority is supported by the bagging mechanism and random feature selection, which enhance generalization.
Extra Trees and CatBoost followed with accuracy close to 0.989, while XGBoost and LightGBM scored slightly lower.
The performance differences were mainly influenced by the algorithms' sensitivity to parameters and data structure.
Overall, although all models were competitive, Random Forest proved to be the best-performing and most stable, making it the recommended primary model for multiclass classification in crop recommendation systems.
Source: Research Results
Figure 6. Comparison of Machine Learning Model Performance
The performance comparison of the five ensemble learning models demonstrates that all algorithms (Random Forest, Extra Trees, CatBoost, XGBoost, and LightGBM) achieved high scores on all evaluation metrics, as depicted in Figure 6.
Random Forest consistently showed a slight advantage in Accuracy, Precision, and Recall, highlighting the effectiveness of the bagging mechanism in enhancing generalization.
Extra Trees and CatBoost exhibited performance nearly equivalent to Random Forest, while XGBoost and LightGBM remained competitive though slightly lower.
Overall, all models proved reliable for multiclass classification, but Random Forest emerged as the most optimal choice for the crop recommendation system.
Table 2. Feature Importance Analysis using Random Forest Model
Columns: Feature, Importance
Source: Research Results
The feature importance analysis using Random Forest, as presented in Table 2, revealed that climatic factors, particularly rainfall and humidity, exert the greatest influence on crop classification, followed by the soil nutrient potassium (K).
Phosphorus (P) and nitrogen (N) contributed at a moderate level, while temperature and soil pH played the least significant roles.
These findings confirm that climatic factors are more dominant than soil properties in determining crop suitability and further demonstrate the capability of Random Forest to capture complex patterns, thereby enabling more accurate recommendations.
CONCLUSION
The findings of this study demonstrate that climatic factors, particularly humidity and rainfall, exert the greatest influence on crop classification, followed by soil nutrients such as potassium (K) and phosphorus (P).
In contrast, nitrogen (N), temperature, and pH show relatively minor effects.
Among the ensemble learning algorithms tested, Random Forest achieved the best and most stable performance, with accuracy, precision, recall, and F1-score values reaching approximately 0.993, outperforming the other models, which scored 0.986 or higher.
These results emphasize that climatic conditions should be prioritized in data collection and modeling for crop recommendation systems.
Furthermore, the optimized Random Forest model demonstrates high reliability and interpretability, providing a practical foundation for data-driven decision-making in precision agriculture, including land-use planning, resource optimization, and sustainable crop cultivation.
Future research should focus on improving the generalization and adaptability of ensemble models across diverse climatic and geographical regions by incorporating larger, multi-regional datasets.
Integrating real-time climatic data, remote sensing imagery, and Internet of Things (IoT)-based agricultural monitoring can enhance model responsiveness and scalability.
Additionally, exploring hybrid ensemble frameworks that combine Random Forest with deep learning or metaheuristic optimization techniques could further improve predictive accuracy.
Continued investigation into explainable AI methods is also essential to ensure that future crop recommendation systems remain transparent, user-friendly, and applicable for smart, sustainable agricultural decision support.
ACKNOWLEDGEMENT
This research was funded by the Bina Sarana Informatika University Foundation under the 2025 Foundation Research Grant Scheme.
The authors would like to express their sincere gratitude to the foundation for its support and trust in the implementation of this study.
This support has been instrumental in advancing research efforts that contribute to the development of knowledge and the application of data-driven technologies.
REFERENCES