ISSN 2087-3336 (Print) | 2721-4729 (Online)
TEKNOSAINS: Jurnal Sains, Teknologi dan Informatika, Vol. No. 1, 2026
http://jurnal.id/index.php/tekno | https://doi.org/10.37373/tekno.

A comparative study of tree-based machine learning algorithms for artificial lift optimization

Geovanny Branchiny Imasuly*, Marcia V. Rikumahu
Department of Petroleum Engineering, Pattimura University, Maluku, Indonesia, 97233, Jln. Ir. Putuhena
*Corresponding author: gimasuly@gmail.com

Submitted: 23/4/2025 | Revised: 15/6/2025 | Accepted: 14/7/2025

ABSTRACT
The selection of an appropriate artificial lift method is critical in the oil and gas industry to ensure production continuity as reservoir pressure declines. However, current selection processes still largely rely on technical expertise and conventional heuristic approaches, which are often insufficient for handling the complexity of reservoir characteristics and dynamic operational conditions. This study evaluates the performance of three tree-based machine learning algorithms, Decision Tree, Random Forest, and Gradient Boosting, in predicting the optimal artificial lift method. Historical field data, including fluid flow rate, temperature, API gravity, and artificial lift method labels, were used to train the models. The data underwent preprocessing steps such as data cleaning, encoding, and splitting into training and testing sets before being modeled using the Scikit-learn library. The performance of the three models was evaluated using standard classification metrics, including accuracy, precision, recall, and F1-score. The results indicate that the Decision Tree algorithm achieved an accuracy of approximately 81%, Random Forest yielded the highest accuracy at around 94% (with a validation accuracy of 93.67%), and Gradient Boosting performed the least effectively with an accuracy of about 64%.
Feature importance and SHAP analysis revealed that temperature was the most influential variable in selecting the artificial lift method, followed by API gravity and fluid flow rate. In conclusion, Random Forest was the most effective model, offering the best combination of accuracy and stability in predicting the optimal artificial lift method.

Keywords: Artificial lift, machine learning, decision tree, random forest, gradient boosting

TEKNOSAINS: Jurnal Sains, Teknologi & Informatika is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Introduction
Artificial lift is one of the essential techniques in the oil and gas industry for maintaining production rates when reservoir pressure declines. The appropriate selection of artificial lift methods is crucial to ensuring the continuity of hydrocarbon production and operational efficiency, particularly in fields with complex and heterogeneous reservoir conditions. However, in practice, the selection process still heavily relies on technical expertise and heuristic approaches, which tend to be subjective and not always optimal. With the advancement of digital technologies, data-driven approaches have been increasingly adopted in decision-making processes within the energy sector, including in oil and gas production management. Machine learning (ML), particularly decision tree-based algorithms, has shown great potential in uncovering hidden patterns and making accurate predictions based on historical well operation data. Algorithms such as Decision Tree, Random Forest, and Gradient Boosting offer advantages in terms of interpretability and the ability to handle high-dimensional data and nonlinear relationships among input parameters.
Previous studies have demonstrated that machine learning approaches can significantly enhance prediction accuracy in various petroleum engineering applications, ranging from reservoir classification to production rate forecasting. However, comprehensive evaluations comparing the effectiveness of different decision tree algorithms in the context of artificial lift selection remain limited. Yet, choosing the appropriate model can have a direct impact on production performance, cost efficiency, and well longevity. This study aims to address this gap by comparing the performance of three widely used ML algorithms, Decision Tree, Random Forest, and Gradient Boosting, in predicting the optimal artificial lift method. The dataset includes critical parameters such as flow rate, temperature, API gravity, and the artificial lift method label applied in the field. Model performance is evaluated using standard metrics, including the confusion matrix, precision, recall, and F1-score, to gain a comprehensive understanding of model accuracy. In addition, a feature importance analysis is conducted to identify the most influential parameters in the model's decision-making process. The findings of this study are expected to provide practical insights for data-driven decision-making in the oil and gas industry and to highlight the potential of machine learning as a reliable, adaptive, and efficient tool for well production optimization.

Method
This study employs a quantitative experimental approach aimed at comparing the performance of three decision tree-based machine learning algorithms, Decision Tree, Random Forest, and Gradient Boosting, in predicting the optimal artificial lift method. The research process is conducted sequentially through the stages of data collection, preprocessing, model training, performance evaluation, and feature analysis. Figure 1 illustrates the flow of the research method.

Figure 1.
Flow of research

Data Collection
The dataset analyzed in this study consists of historical operational data from oil and gas wells, encompassing key variables such as fluid flow rate (barrels per day), reservoir temperature (°F), API gravity, and the corresponding artificial lift method applied (ESP, Gas Lift, Hydraulic Piston Pump, Jet Pump, Progressive Cavity Pump, and Sucker Rod Pump). The data were compiled from an internal production information system and underwent anonymization to ensure the confidentiality of operational records. A total of 2,000 data entries were utilized in this research, which were subsequently divided into two subsets: 70% for model training and 30% for model testing.

Data Preprocessing
The data collected may contain errors, missing values, redundant information, and other inconsistencies, which can hinder the performance of machine learning models. Therefore, the data must undergo a cleansing procedure known as preprocessing. Preprocessing in machine learning typically involves five key steps:
1. Importing the dataset
2. Importing the libraries
3. Encoding categorical data
4. Data visualization
5. Splitting the dataset into a training set and a test set
This entire preprocessing workflow can be automated using machine learning algorithms, mathematical modeling, and statistical techniques.

Algorithm and Model Training
This study employs supervised learning algorithms to classify data into specific categories. Three tree-based algorithms from the Scikit-learn library were used in the model training process, as outlined below:
Decision Tree Classifier. The model was constructed using DecisionTreeClassifier() from sklearn.tree. This algorithm builds a decision tree by splitting the data based on the most informative features.
Random Forest Classifier. To improve accuracy, the RandomForestClassifier() from sklearn.ensemble was employed. This algorithm constructs multiple decision trees and combines their outputs through a majority voting mechanism.
Gradient Boosting Classifier. The GradientBoostingClassifier() from sklearn.ensemble was used to build the model iteratively, focusing on correcting the errors made by previous iterations.

Model Evaluation Metrics
Model evaluation is a crucial component in the model development process. The following are commonly used evaluation metrics:

Confusion Matrix
The confusion matrix is one of the most widely used metrics for evaluating the performance of classifiers, particularly in multiclass classification tasks. An illustration of the confusion matrix for binary classification (two classes) is presented in Figure 2.

Figure 2. An illustration of a confusion matrix

The confusion matrix is recommended to obtain more detailed information for evaluating a model's accuracy in correctly and incorrectly predicting each class. The terms used in the confusion matrix are defined below, based on a 2 × 2 matrix for classes A and B:
• True positive (TP): the number of instances where the machine learning model correctly classifies samples as class A.
• False negative (FN): the number of instances where the machine learning model incorrectly classifies samples as not belonging to class A, when they actually do.
• False positive (FP): the number of instances where the machine learning model incorrectly classifies samples as class A, when they actually belong to class B.
• True negative (TN): the number of instances where the machine learning model correctly classifies samples as class B.
Based on this matrix, several accuracy metrics can be calculated, including precision, recall, and F1-score.
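The four counts above can be read directly off a confusion matrix computed with scikit-learn. The following is a minimal sketch with toy labels, not the paper's field data:

```python
# Minimal sketch: reading TP/FN/FP/TN off a binary (class A vs. class B)
# confusion matrix with scikit-learn. Labels here are toy values.
from sklearn.metrics import confusion_matrix

y_true = ["A", "A", "A", "B", "B", "B"]
y_pred = ["A", "A", "B", "B", "B", "A"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=["A", "B"])
tp, fn = cm[0, 0], cm[0, 1]  # true A predicted as A / as B
fp, tn = cm[1, 0], cm[1, 1]  # true B predicted as A / as B
print(tp, fn, fp, tn)  # → 2 1 1 2
```

For the six-class problem studied here, the same call returns a 6 × 6 matrix, and the per-class counts are obtained one-vs-rest.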
Precision
Precision is defined as the ratio of true positive samples correctly predicted for a particular class to the total number of samples predicted as belonging to that class. A high precision value indicates that the majority of samples predicted by the model to belong to a specific class truly originate from that class.

Recall
Recall refers to the proportion of correctly identified true positive instances compared to the total actual positive instances for a particular class. A high recall value means that most of the actual positive samples are correctly recognized, with only a small number being mistakenly classified into other categories.

F1-score
The F1-score gives the harmonic mean of precision and recall. During model selection, the model with the highest F1-score is chosen from among several classification algorithms:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Accuracy
Evaluation was conducted on the test data to measure the generalization capability of each model:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Feature Importance Analysis
After training the models, a feature importance analysis was conducted to identify the relative contribution of each input feature to the prediction outcomes. This technique was applied to the Random Forest and Gradient Boosting models, both of which inherently support feature importance estimation. The insights gained from this analysis help to understand the key factors influencing the selection of artificial lift methods in a data-driven manner.

Results and Discussion
Preliminary Data Exploration
The first step involved presenting the frequency distribution of the various Artificial Lift (AL) methods used in the oil industry. Artificial lift is a technique employed to enhance oil production from wells that lack sufficient reservoir pressure to naturally flow fluids to the surface. The six types of AL methods shown in the diagram are Jet Pump, Sucker Rod Pump (SRP), Gas Lift, Electric Submersible Pump (ESP),
Progressive Cavity Pump (PCP), and Hydraulic Pumping System (HPS), as illustrated in Figure 3.

Figure 3. Frequency distribution of AL

Based on Figure 3, the Jet Pump is the most widely used artificial lift method, with approximately 369 instances, highlighting its popularity across various operating conditions, particularly for handling challenging fluids. The Sucker Rod Pump (SRP) follows closely with a nearly equal number of occurrences and is commonly employed for heavy oil and medium-depth wells. Gas Lift is used in over 344 cases, indicating its effectiveness in wells with high liquid production. Both Electrical Submersible Pumps (ESP) and Progressive Cavity Pumps (PCP) show similar usage frequencies, around 316 instances each, demonstrating their continued relevance depending on well conditions. Meanwhile, the hydraulic pumping method (HPP) is the least utilized, with 286 cases, likely due to the associated costs and system complexity.

Figure 4. Cumulative fluid production by AL

Figure 4 presents the total fluid production (bpd) for each commonly used artificial lift method in the oil and gas industry. SRP and Jet Pump recorded the highest production volumes, each exceeding 5 million bpd, demonstrating their effectiveness in handling low-pressure wells or heavy oil. Gas Lift follows closely with nearly 4.8 million bpd, noted for its flexibility and minimal mechanical components. ESP and HPP show similar performance at around 4 million bpd, while PCP records the lowest production at approximately 3.7 million bpd, likely due to technical limitations under certain fluid conditions. This comparison indicates that the selection of an AL method depends not only on its usage frequency (Figure 3) but also on its production performance.
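The two views in Figures 3 and 4 can be derived from the raw well table with simple aggregations. The sketch below assumes a pandas DataFrame with hypothetical column names ("Label", "Flow-Rate") and uses toy values, not the actual 2,000-record dataset:

```python
# Sketch of the aggregations behind Figures 3-4: usage frequency per
# method and cumulative fluid production per method. Column names and
# values are illustrative assumptions, not the paper's data.
import pandas as pd

df = pd.DataFrame({
    "Label": ["Jet Pump", "SRP", "Gas Lift", "Jet Pump", "ESP", "SRP"],
    "Flow-Rate": [1200.0, 950.0, 1100.0, 800.0, 700.0, 640.0],  # bpd
})

freq = df["Label"].value_counts()               # frequency per method (Fig. 3)
prod = df.groupby("Label")["Flow-Rate"].sum()   # cumulative production (Fig. 4)
print(freq["Jet Pump"], prod["SRP"])  # → 2 1590.0
```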
Therefore, integrating both parameters is essential in developing a data mining-based decision support system.

Descriptive Statistics
A summary of descriptive statistics is presented for three key variables in the oil and gas production system, namely Flow Rate (bpd), Temperature (°F), and API Gravity, each based on 2,000 data samples. These statistics provide an initial overview of the distribution, spread, and characteristics of the values for each parameter, as shown in Table 1.

Table 1. Numerical summary of Flow-Rate (bpd), Temperature (°F), and API Gravity

LabelEncoder
LabelEncoder is used to perform label encoding on the categorical variable 'LABEL' in the dataset, which represents the types of artificial lift methods, as follows:
• 'ESP' → 0
• 'Gas-Lift' → 1
• 'HPP' (Hydraulic Piston Pump) → 2
• 'Jet Pump' → 3
• 'PCP' (Progressive Cavity Pump) → 4
• 'SRP' (Sucker Rod Pump) → 5

Table 2. Label-encoded data: Flow-Rate (bpd), Temperature (°F), API Gravity, Label

Table 2 presents the results of label encoding performed on the categorical variable "Label," which represents the type of artificial lift method. This process is a continuation of the data transformation step, where each string category is converted into discrete numerical values ranging from 0 to 5. The encoded data enables compatibility with machine learning algorithms that require numerical inputs. This encoding step forms the foundation of an intelligent decision system architecture that is accurate, fast, and customizable.

Classification Report and Confusion Matrix
Following the encoding process, the dataset was partitioned into training and testing sets, with 70% of the data allocated for training and the remaining 30% for testing. The classification models were developed using three tree-based machine learning algorithms: Decision Tree, Random Forest, and Gradient Boosting.
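The encoding, 70/30 split, and model training described above can be sketched end to end as follows. The feature and label arrays here are synthetic stand-ins for the field data; note that LabelEncoder sorts classes alphabetically, which reproduces the ESP → 0 … SRP → 5 mapping of Table 2:

```python
# End-to-end sketch of the modeling pipeline: label encoding, a 70/30
# train/test split, and the three tree-based classifiers. The data are
# synthetic stand-ins, so the printed accuracies are not meaningful.
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(42)
X = rng.random((200, 3))  # columns: Flow-Rate, Temperature, API Gravity
labels = rng.choice(["ESP", "Gas-Lift", "HPP", "Jet Pump", "PCP", "SRP"], 200)

le = LabelEncoder()            # alphabetical: 'ESP' -> 0, ..., 'SRP' -> 5
y = le.fit_transform(labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)  # 70% train / 30% test

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 2))
```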
To evaluate the performance of these multiclass classification models, several metrics were utilized, including precision, recall, F1-score, and support for each class, along with the overall accuracy. These metrics are comprehensively presented in the classification report, which provides a detailed view of how well each model performs across the six classes: ESP, PCP, SRP, Jet Pump, Gas Lift, and Hydraulic Pump. In addition, the confusion matrix was employed to visualize the distribution of correct and incorrect predictions across all classes, allowing for the identification of common misclassification patterns. This analysis enables a thorough comparison of the three models based on both quantitative performance metrics and classification behavior. The following section discusses in detail the performance of each model based on the results of the classification report and the corresponding confusion matrices.

Decision tree classifier
The classification results of the Decision Tree classifier model are presented in Table 3, which summarizes the evaluation metrics including precision, recall, F1-score, and support (number of samples).

Table 3. Decision tree classification report (precision, recall, F1-score, support)

The performance evaluation of the Decision Tree model is presented in Table 3. The evaluation metrics include precision, recall, and F1-score for each of the six classes, along with an overall accuracy of 82% based on a test set of 600 samples. The macro and weighted average scores are both 0.82, indicating that the model performs relatively consistently across all classes. Class 0 achieved the highest precision, but with a recall of only 0.79, suggesting that while the model is selective, it does not capture all instances of this class effectively. In contrast, Class 4 shows the highest recall but a lower precision
, indicating a tendency toward overprediction of this class. The lowest performance was observed in Class 2, with an F1-score of just 0.76, possibly due to an imbalanced data distribution or less representative features. Meanwhile, Classes 1, 3, and 5 demonstrated stable performance, with F1-scores ranging upward from 0.83.

Random forest classifier
The classification results of the Random Forest classifier are presented in Table 4, which summarizes the performance evaluation metrics based on precision, recall, and F1-score.

Table 4. Random forest classification report (precision, recall, F1-score, support)

The Random Forest model demonstrated excellent performance, achieving an overall accuracy of 94%, a significant improvement over the Decision Tree model's 81%. The average precision, recall, and F1-score (both macro and weighted) were consistently 0.94, indicating balanced performance across all classes. Compared to the Decision Tree, the Random Forest significantly reduced misclassification errors, particularly in classes 2 and 5, and exhibited greater predictive stability, attributed to its ensemble approach that effectively mitigates overfitting.

Gradient boosting classifier
The classification results of the Gradient Boosting classifier are presented in Table 5, which summarizes the performance evaluation metrics based on precision, recall, and F1-score.

Table 5. Gradient boosting classification report (precision, recall, F1-score, support)

The Gradient Boosting model reached an overall accuracy of 64%, lower than the accuracies of the Decision Tree (81%) and Random Forest (94%) models. The macro and weighted averages for precision, recall, and F1-score all hovered around 0.64, highlighting issues related to class imbalance and misclassification among certain classes. The highest F1-score appeared in class 3 (Jet Pump), though the difference compared to other, lower-performing classes was relatively minor. The confusion matrices are illustrated in Figures 5-7.

Confusion Matrix Decision Tree
Figure 5. Confusion matrix decision tree

Figure 5 presents the confusion matrix of the Decision Tree model used for six-class classification, demonstrating a reasonably good overall accuracy of 82%. The model performs relatively well in identifying Jet Pump and ESP. However, it frequently misclassifies samples from Gas-Lift, PCP, and SRP. For instance, many SRP samples are incorrectly predicted as HPP or PCP, while Gas-Lift is often confused with PCP. This suggests that the model struggles to differentiate between artificial lift types with overlapping operational characteristics, indicating limitations in its ability to generalize across classes with similar features.

Confusion Matrix Random Forest
Figure 6. Confusion matrix random forest

Figure 6 presents the confusion matrix of the Random Forest model in classifying six types of artificial lift. The results demonstrate excellent classification performance, with the majority of predictions concentrated along the main diagonal, indicating correct classifications. The model accurately recognizes all classes, particularly Jet Pump, Gas-Lift, and PCP. Misclassification errors are minimal and evenly distributed, suggesting that the Random Forest model effectively distinguishes between class features, outperforming the previous model.

Confusion Matrix Gradient Boosting
Figure 7. Confusion matrix gradient boosting
Figure 7 presents the confusion matrix of the Gradient Boosting model for classifying six types of artificial lift. The results indicate reasonably good performance, with most predictions concentrated along the main diagonal, particularly for class 3 (Jet Pump) and class 5 (SRP). However, notable misclassifications remain, especially for class 4 (PCP), which is frequently confused with class 1 (Gas-Lift), and class 2 (HPP), which is often misclassified as other classes. These findings suggest that while the model achieves a decent level of accuracy, there is still room for improvement in distinguishing classes with similar characteristics.

Model Accuracy Comparison
This section compares the accuracy of the Decision Tree, Random Forest, and Gradient Boosting models on both the training and validation datasets. This evaluation aims to assess the generalization capability of each model.

Table 6. Comparison of model accuracy on training and validation data
Model | Training Accuracy | Validation Accuracy
Decision Tree | 100% | 82%
Random Forest | 100% | 93.67%
Gradient Boosting | 90.43% | 64.00%

Table 6 shows that the Decision Tree achieved 100% accuracy on the training data but dropped to 82% on the validation data, indicating overfitting. In contrast, Random Forest maintained stable performance with a validation accuracy of 93.67%, demonstrating good generalization ability due to its ensemble approach. Gradient Boosting recorded a training accuracy of 90.43% and a validation accuracy of 64.00%, indicating suboptimal performance with the default configuration. Additionally, Figure 8 compares the accuracy of the three models in classifying artificial lift methods based on operational features.

Figure 8. Comparison of training and validation accuracy among different models

Figure 8 also illustrates the comparison between training and validation accuracy, indicating that the Decision Tree and Gradient Boosting models exhibit signs of overfitting, as evidenced by a substantial gap between training and validation performance.
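The train-versus-validation comparison behind this overfitting diagnosis can be sketched as follows. The data are synthetic, and the unconstrained Decision Tree is used because it memorizes its training set, which is exactly the gap the table highlights:

```python
# Sketch of the overfitting check behind Table 6: compare accuracy on
# the training split with accuracy on the held-out validation split.
# Synthetic data; a large train-validation gap signals overfitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = rng.integers(0, 6, 300)  # six artificial-lift classes

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.30, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = model.score(X_tr, y_tr)  # unconstrained tree memorizes: 1.00
val_acc = model.score(X_val, y_val)
print(f"train={train_acc:.2f} val={val_acc:.2f} gap={train_acc - val_acc:.2f}")
```

Constraining tree depth or switching to an ensemble (as Random Forest does) is the usual remedy when this gap is large.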
The Decision Tree achieved perfect training accuracy (1.00) but only reached 0.82 on the validation set, while Gradient Boosting showed a more pronounced drop, from 0.90 to 0.64. In contrast, the Random Forest model demonstrated the most balanced performance, with a training accuracy of 1.00 and a high validation accuracy of 0.94, indicating strong generalization capability and superior model stability.

Model Performance
Table 7 lists the additional performance indicators used to assess the three primary classification algorithms, Decision Tree, Random Forest, and Gradient Boosting: ROC AUC (Receiver Operating Characteristic Area Under the Curve) and PR AUC (Precision-Recall Area Under the Curve). These metrics offer a more comprehensive evaluation of model effectiveness, especially when dealing with multiclass scenarios and imbalanced data distributions.

Table 7. Model performance evaluated using ROC AUC and PR AUC metrics
Model | Mean ROC AUC | Mean PR AUC
Decision Tree | 0.891 | 0.702
Random Forest | 0.994 | 0.970
Gradient Boosting | 0.901 | 0.702

ROC AUC (Receiver Operating Characteristic AUC)
Random Forest demonstrated the best classification performance with an average ROC AUC of 0.994, approaching a perfect score. Gradient Boosting recorded a ROC AUC of 0.901, indicating reliable performance. Meanwhile, the Decision Tree achieved an ROC AUC of 0.891, reflecting moderate performance with a tendency toward overfitting.

Figure 9. Receiver operating characteristic

Based on Figure 9, the ROC curve illustrates the model's ability to distinguish between classes, with a curve approaching the top-left corner indicating strong classification performance. Random Forest demonstrated the best and most consistent ROC performance across all classes, reflecting superior classification accuracy and generalization capability.
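Mean multiclass ROC AUC and PR AUC of the kind reported in Table 7 can be computed one-vs-rest with scikit-learn. The sketch below uses synthetic data, and the macro averaging over classes is an assumption about how the "mean" in the table was taken:

```python
# Sketch of one-vs-rest ROC AUC / PR AUC computation for a multiclass
# classifier, as in Table 7. Synthetic data; macro averaging assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=600, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)  # shape (n_samples, 6)

# ROC AUC supports multiclass probabilities directly (one-vs-rest).
mean_roc_auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")

# PR AUC has no multiclass shortcut: binarize labels, then macro-average.
y_bin = label_binarize(y_te, classes=np.arange(6))
mean_pr_auc = average_precision_score(y_bin, proba, average="macro")
print(round(mean_roc_auc, 3), round(mean_pr_auc, 3))
```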
In contrast, the Decision Tree and Gradient Boosting models exhibited more variable performance, particularly for Class 2 and Class 5, where a noticeable decline in class separability was observed.

PR AUC (Precision-Recall AUC)
Random Forest demonstrated the best performance with a PR AUC value of 0.970, indicating high efficiency in minimizing false positives and false negatives, critical aspects in classification systems for operational decision-making. Meanwhile, both Decision Tree and Gradient Boosting recorded the same PR AUC of 0.702, reflecting limited precision and sensitivity, which is consistent with the low F1-scores observed in these two models.

Figure 10. Precision-recall AUC

Figure 10, the precision-recall curve, illustrates the balance between the model's precision and recall. The Random Forest algorithm demonstrated the best performance, maintaining a stable curve even at high recall values, indicating that precision was consistently preserved. In contrast, Decision Tree and Gradient Boosting exhibited significant declines in precision at high recall levels. Notably, for classes 4 and 5, Gradient Boosting showed the steepest decrease, confirming its limitations on this dataset.

Feature Importance
Figure 11. Feature importance comparison across models

Figure 11 presents a comparison of the feature importance scores of the classification models Decision Tree, Random Forest, and Gradient Boosting across three main input parameters: Flow Rate (bpd), Temperature (°F), and API Gravity. Feature importance reflects the extent to which each input variable contributes to the decision-making process of the model in classifying the type of artificial lift.

Figure 12.
Permutation importance comparison across models

Figure 12 presents the evaluation results of permutation feature importance for the three classification models, Decision Tree, Random Forest, and Gradient Boosting, with respect to three key features: Flow Rate (bpd), Temperature (°F), and API Gravity. Permutation importance measures the average decrease in model accuracy when the values of a specific feature are randomly shuffled, thereby providing insights into the direct contribution of each feature to the model's performance.

Figure 13. SHAP sensitivity plot for random forest classifier

Figure 13 presents the SHAP (SHapley Additive exPlanations) visualization results, illustrating the contribution of each feature to the predictions made by the Random Forest classifier in classifying artificial lift methods. The plot displays the mean absolute SHAP values, reflecting the relative influence of each feature on the model's output. The feature Temperature (°F) demonstrates the highest contribution, particularly for Class 3 (Jet Pump) and Class 5 (SRP), indicating the sensitivity of the Jet Pump method to thermal conditions. API Gravity shows a significant impact on Class 1 (Gas Lift) and Class 0 (ESP), aligning with the physical principle of AL selection based on fluid density. Meanwhile, Flow Rate (bpd) exhibits substantial influence on Class 0 (ESP), Class 2 (HPP), and Class 4 (PCP), highlighting the critical role of production volume in the model's decision-making process. The SHAP values also indicate that feature contributions are class-specific, reinforcing the findings from the previous feature importance analysis.

Conclusion
Based on the evaluation and comparison of the performance of the three primary classification algorithms, Decision Tree,
Random Forest, and Gradient Boosting, in identifying artificial lift methods using parameters such as flow rate, temperature, and API gravity, the results indicate that Random Forest outperforms the others, achieving a validation accuracy of 93.67%, an ROC AUC of 0.994, and a PR AUC of 0.970. While the Decision Tree algorithm is suitable as a baseline model, it is prone to overfitting, and Gradient Boosting exhibits unstable performance. Feature importance and SHAP analysis confirm that temperature plays a dominant role, particularly in predicting the Jet Pump method, followed by API gravity and flow rate. Overall, Random Forest offers the best combination of accuracy, stability, and interpretability, making it the preferred choice for a decision support system in selecting artificial lift methods.

References