JUSIKOM PRIMA (Jurnal Sistem Informasi dan Ilmu Komputer Prima)
Vol. 6 No. August 2022
E-ISSN : 2580-2879

Laptop Price Prediction with Machine Learning Using Regression Algorithm

Astri Dahlia Siburian, Daniel Ryan Hamonangan Sitompul, Stiven Hamonangan Sinurat, Andreas Situmorang, Ruben, Dennis Jusuf Ziegel, Evta Indra*
Prodi Sistem Informasi, Fakultas Teknologi dan Ilmu Komputer, Universitas Prima Indonesia
Jl. Sampul No. Sei Putih Barat, Medan Petisah
E-mail : * evtaindra@unprimdn.

ABSTRACT- Since the COVID-19 pandemic, many activities have been carried out in a Work From Home (WFH) manner. According to data from the Central Statistics Agency (BPS) of East Java, in 2021, 32.37% of large and medium-sized enterprises (UMB) chose to work partially from home, and overall WFH is 2. (BPS East Java, 2021). With this percentage of 32.37%, many people need a work device (in this case, a laptop) that can boost their productivity during WFH. WFH workers need laptops with specifications that match their needs in order to stay productive. To avoid buying overpriced laptops, a way to predict laptop prices from the desired specifications is needed. This study presents a Machine Learning workflow that runs from Data Acquisition, through Data Cleaning and Feature Engineering in the pre-processing stage and Exploratory Data Analysis, to modeling based on regression algorithms. After the models are built, the highest accuracy obtained is 77%, achieved by the XGBoost algorithm. With this high accuracy value, the model created can predict laptop prices with a minimum accuracy above 80%.

Keywords : Machine Learning, Regression Algorithm, AutoML, Price Prediction, Laptop

INTRODUCTION
Since the COVID-19 pandemic, many activities have been carried out in a Work From Home (WFH) manner. According to data from the Central Statistics Agency (BPS) of East Java, in 2021, 32.37% of large and medium-sized enterprises (UMB) chose to work partially from home, and overall WFH was 24%.
With this percentage of 32.37%, many people need a work device (in this case, a laptop) that can boost their productivity during WFH. To encourage work productivity, every WFH worker must have a laptop with specifications that suit their needs, and to avoid buying laptops at inappropriate prices, a suitable way is needed to predict the price of a laptop based on the specified specifications. Many studies have addressed price prediction, and they generally use a regression algorithm. One study presents a method for predicting used car prices using a machine learning model with a regression algorithm configured with hyper-parameter tuning, while another presents a method for predicting house prices using a machine learning model with a regression algorithm, namely XGBoost. To overcome the problems mentioned, a Machine Learning method is needed to predict the price of a laptop based on parameters, namely laptop specifications. The authors present several machine learning models using regression algorithms in this study. After modeling, the algorithm with the highest accuracy value will be used to predict the laptop's price, and it will also be configured with AutoML to obtain a better accuracy value. With this method, the model is expected to predict the price of a laptop with a minimum accuracy above 80%.

METHODOLOGY
This study uses the Google Colaboratory tool to create the regression algorithm models. The modeling workflow can be seen in Figure 1. The research begins with retrieving the dataset from the Kaggle website. After the dataset is downloaded, the pre-processing stage is carried out, which consists of Data Cleaning and Feature Engineering. After the pre-processing stage is complete, the Exploratory Data Analysis (EDA) stage is carried out. The last stage is Model Building, which ranges from building models with a default configuration to building a model with the help of AutoML.
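The pre-processing stage outlined above can be illustrated with a minimal sketch. It is not the authors' code (their syntax is shown in Figures 3 and 4); it uses plain Python instead of a data-frame library so that it runs without dependencies. The sample rows, the raw column names, the lower-casing of column names, and the names `clean_row`, `engineer_features`, `preprocess`, `cpu_brand`, and `weight_kg` are all assumptions made for illustration; only `laptop_id`, `cpu_freq`, and `ram(GB)` follow names that appear in Table 1.

```python
import re

# Hypothetical raw rows: column names and values are invented for illustration
# and only imitate the general shape of a laptop specification table.
RAW_ROWS = [
    {"Laptop_ID": 1, "Company": "BrandA", "Cpu": "Intel Core i5 2.1GHz",
     "Ram": "8GB", "Weight": "1.5kg", "Price": 900.0},
    {"Laptop_ID": 2, "Company": "BrandB", "Cpu": "AMD Ryzen 5 3.0GHz",
     "Ram": "16GB", "Weight": "2.2kg", "Price": 1100.0},
]


def clean_row(row):
    """Data Cleaning: lower-case the column names (an assumption) and drop the id."""
    return {k.lower(): v for k, v in row.items() if k.lower() != "laptop_id"}


def engineer_features(row):
    """Feature Engineering: split the CPU string and make RAM and weight numeric."""
    out = dict(row)
    cpu = out.pop("cpu")
    # The first token of the CPU string is taken as the brand.
    out["cpu_brand"] = cpu.split()[0]
    # The clock speed is the number directly in front of "GHz", if present.
    match = re.search(r"(\d+(?:\.\d+)?)\s*GHz", cpu, flags=re.IGNORECASE)
    out["cpu_freq"] = float(match.group(1)) if match else None
    # Strip the unit suffixes so the values can be used as numeric features.
    out["ram(GB)"] = int(out.pop("ram").upper().replace("GB", ""))
    # "weight_kg" is a hypothetical name chosen for this sketch.
    out["weight_kg"] = float(out.pop("weight").lower().replace("kg", ""))
    return out


def preprocess(rows):
    """Apply cleaning first, then feature engineering, to every row."""
    return [engineer_features(clean_row(r)) for r in rows]


if __name__ == "__main__":
    for processed in preprocess(RAW_ROWS):
        print(processed)
```

Keeping cleaning and feature engineering as separate functions mirrors the two pre-processing steps in Figure 1 and lets each step be checked on its own before the data reaches EDA and model building.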
Figure 1. Research Methodology (stages: Data Acquisition; Pre-Processing, consisting of Data Cleaning and Feature Engineering; Exploratory Data Analysis (EDA); Model Building; Model Tuning with AutoML)

1 Data Acquisition
In this study, the dataset comes from Muhammet Varli's "Laptop Price" repository on the Kaggle website. This dataset has 12 columns containing up to 1303 rows of data. Each column is a general laptop specification, such as the brand, CPU, VGA/GPU, and price. Details of this "Laptop Price" dataset can be seen in Figure 2.

Figure 2. Dataset Detail

2 Data Cleaning
Data Cleaning increases the usability of the dataset. For example, in this study, all column names in the dataset are renamed. The syntax for performing Data Cleaning can be seen in Figure 3.

Figure 3. Data Cleaning

3 Feature Engineering
Feature Engineering serves to create more features from the existing dataset. In this study, a laptop specification column such as CPU is broken down into several new columns, namely CPU Brand and CPU Clock. The syntax for performing Feature Engineering can be seen in Figure 4, and the overall results can be seen in Table 1.

Figure 4. Feature Engineering Syntax

Table 1. Entire Feature Engineering Process (Column / After Data Cleaning)
laptop_id: column dropped
Addition of columns resolution, screentype, touchscreen
Addition of column cpu_freq
Removal of GB from the data and renaming of the column to ram(GB)
Addition of columns memory_type and memory_size
Removal of Kg from the data and renaming of the column to weight

4 Exploratory Data Analysis
Exploratory Data Analysis (EDA) is an approach to analyzing a set of data to infer its main characteristics. Generally, EDA is done by visualizing the data with statistical charts.
This study conducted EDA to see the correlation between columns and the average laptop price by brand. A visualization of the average laptop price by brand can be seen in Figure 5.

Figure 5. Average Laptop Price by Brand

5 Dataset Splitting
Before model building, the dataset is split into two parts: training data and testing data. In this study, the split ratio is 7:3, i.e., 70% of the dataset is used as training data and 30% as testing data. The dataset splitting process can be seen in Figure 6.

Figure 6. Dataset Splitting Process

6 Model Building
In this study, three regression algorithms are used to create machine learning models that can predict laptop prices, namely Random Forest, Gradient Boosting, and XGBoost.

1 Random Forest Regressor
Random Forest is an Ensemble Learning-based Machine Learning algorithm that operates by creating multiple Decision Trees during training. For regression, the average of the predictions from all trees is used. In this study, the Random Forest model was created using the configuration estimator = 100, max_depth = 100, and max_features = 15. The details can be seen in Figure 7.

Figure 7. Random Forest Model Building Process

2 Gradient Boosting Regressor
Gradient Boosting Regressor (GBR) is an algorithm that combines several decision trees to perform classification and regression tasks. Gradient Boosting generally works by building a sequence of decision trees in which each tree focuses on the residual errors left by the previous trees. In this study, the GBR model is built with the default configuration. The details can be seen in Figure 8.

Figure 8. GBR Model Building Process

3 XGBoost Regressor
XGBoost Regressor is an optimized Gradient Boosting implementation designed to be efficient, flexible, and portable. XGBoost provides parallel tree boosting that can solve data science problems quickly and accurately. In this study, the XGBoost Regressor model was created with an AutoML configuration, in which the Optuna AutoML library automatically determines the model parameters. The details can be seen in Figure 9.

Figure 9. XGBoost Model Building Process

RESULT AND DISCUSSION
1 Model Accuracy
After the models are built, they are compared with each other. In this study, the Random Forest, Gradient Boosting, and XGBoost models are compared on their R2 value and Root Mean Squared Error (RMSE). The model comparison can be seen in Table 2.

Table 2. Model Comparison (columns: Model, R2 Score)

From the table above, it can be concluded that XGBoost has the highest R2 value and the lowest RMSE. XGBoost can therefore be said to perform better than the other algorithms used to predict laptop prices.

2 Feature Importance Model
Once built, the models can also indicate which laptop specifications matter most, which helps employees when buying a laptop. Using the three algorithms, this study finds that Random Access Memory (RAM) is an essential feature in a laptop for work purposes. A visualization of the Feature Importances can be seen in Figure 10.

BIBLIOGRAPHY