Yogi F.
Indrawan, et.
: Comparative Analysis of Large Red Chili Price A (April 2.
Comparative Analysis of Large Red Chili Price Forecasting Models in Malang Regency Using Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA)
Yogi F. Indrawan1, Aisyah Larasati1, Agus R. Purnama2, and Nikmatus Sholikha
1Department of Mechanical and Industrial Engineering, Faculty of Engineering, Universitas Negeri Malang, Malang, Indonesia
2Department of Industrial Engineering, Faculty of Engineering, Universitas Nahdlatul Ulama Sidoarjo, Sidoarjo, Indonesia
Corresponding author: Aisyah Larasati (e-mail: aisyah. ft@um.)
ABSTRACT Large red chili is a strategic food commodity with high demand, yet its price often fluctuates due to factors such as weather, harvest seasons, and market dynamics.
In Malang Regency, these fluctuations impact inflation and economic stability, necessitating an accurate forecasting model to support decision-making.
This study aims to develop a price forecasting model using Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) methods and compare their performance using daily time series data on large red chili prices from January 2022 to August 2024, obtained from the Representative Office of Bank Indonesia in Malang.
The data underwent preprocessing, where LSTM data was transformed using MinMaxScaler, while ARIMA data was differenced to meet stationarity assumptions, then split into 80% training and 20% testing data, with optimal parameters obtained through Grid Search for both models.
The results show that the LSTM model with three layers (150, 150, 150 units) and a dropout of 0.2 achieved an RMSE of 2.326 and MAPE of 3.65%, whereas the best ARIMA configuration (4,1,3) achieved an RMSE of 2.455 and MAPE of 3.80%.
Although both models performed competitively and yielded promising results, LSTM demonstrated superior accuracy in forecasting large red chili prices in dynamic market conditions.
KEYWORDS ARIMA, Large Red Chili, LSTM, Time Series Data
INTRODUCTION
Food commodities play a crucial role in the economy at both national and community levels.
One strategic commodity that frequently experiences price fluctuations is large red chili.
The price of this commodity tends to change significantly in a short period, especially during certain events such as Ramadan and Eid al-Fitr.
The imbalance between increasing demand and limited supply, particularly during the rainy season, often leads to extreme price surges.
In Malang Regency, large red chili prices exhibit high volatility, with sharp daily price changes throughout the year.
According to data from the Malang Representative Office of Bank Indonesia for 2022 and 2023, large red chili prices experienced drastic fluctuations, with price spikes occurring within days, making it a significant factor contributing to inflation in the region.
VOLUME 07.
No 01, 2025 DOI: 10.
52985/insyst.
To mitigate the impact of price instability, a forecasting method that can accurately predict price movements is needed.
Time series forecasting is widely used for analyzing price patterns based on historical data.
There are two main approaches to time series forecasting: statistical methods and machine learning methods.
One of the most commonly used statistical models is the Autoregressive Integrated Moving Average (ARIMA), which has been extensively applied in various studies due to its ability to capture trend and seasonal patterns in historical data.
Meanwhile, advancements in artificial intelligence enable the use of machine learning models such as Long Short-Term Memory (LSTM), a type of Recurrent Neural Network (RNN) that is designed to capture long-term dependencies in data and is often used for analyzing complex and nonlinear data.
ARIMA is a statistical forecasting model that is effective for predicting time series data with seasonal patterns and short-term trends.
This model is known to have flexible properties and a fairly good level of accuracy in projecting future trends.
However, ARIMA has limitations in handling data with long-term dependencies or complex nonlinear patterns, which often appear in commodity price data.
Meanwhile, the LSTM method is a development of RNN that can handle long-term dependencies in time series data through "gate" functions, which allow selective storage of information.
LSTM has also been widely used in various studies involving prediction using time series data.
For example, LSTM has been used to predict rubber prices with very good results, where the MAPE value is 25% on the train data and 1.09% on the test data.
This value is categorized as very good because MAPE below 10% indicates a high level of accuracy.
In addition, LSTM has also been compared with Multilayer Perceptron (MLP) in rice price forecasting, and the results show that LSTM is more accurate, as evidenced by the smaller difference between actual and predicted prices, as well as a lower RMSE value compared to MLP.
These results can strengthen the position of LSTM as a superior method in handling data predictions with complex and fluctuating patterns.
Although LSTM shows advantages in handling complex data compared to simple RNN models, this method still has limitations, especially when dealing with large datasets or trend instability.
Previous studies indicate that ARIMA is advantageous for forecasting data with stable seasonal patterns.
Meanwhile, LSTM provides more accurate predictions for data with complex and dynamic patterns.
This study focuses on comparing the Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) algorithms in forecasting large red chili prices in Malang Regency.
These two algorithms will be compared in time series analysis using historical data to determine which algorithm provides the most accurate prediction results.
II. METHOD
This study compares the LSTM and ARIMA forecasting methods for predicting large red chili prices in Malang Regency using a quantitative approach based on supervised learning.
Secondary data was obtained from the Representative Office of Bank Indonesia Malang, recorded daily from January 1, 2022, to August 31, 2024.
The research process includes problem identification, data collection, model development, and comparison of forecasting models to obtain the best prediction results.
The research flowchart used in this study is depicted in Figure 1.
Figure 1.
Research Flowchart.
The parameter settings used in the LSTM model are presented in Table 1.
These settings include the number of layers, unit capacity, dropout rate, and optimizer type.
These parameters are designed to ensure the model can recognize time series data patterns effectively while preventing overfitting.
TABLE I
LSTM PARAMETERS SETTINGS
Parameter | Value | Description
Layers | 3 | Number of layers used to capture data
Units | 50–150 | Model capacity in recognizing patterns from data.
Dropout | 0.2–0.3 | Prevents overfitting and improves model
Optimizer | Adam | Improve convergence speed and model
Additionally, parameter settings were also applied to the ARIMA model to ensure optimal performance.
Table 2 shows the parameter settings for the ARIMA model, covering p, d, and q, which govern the relationship with past values, data stationarity, and the residuals used for prediction, respectively.
TABLE II
ARIMA PARAMETER SETTINGS
Parameter | Value | Description
p | 0–5 | Controls the number of lags in the target variable used to predict future values.
d | 0–1 | Helps eliminate trends in the data to make it stationary.
q | 0–5 | Controls the number of lagged residuals used in prediction.
Model evaluation in this study was carried out using two main evaluation metrics, namely Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE).
These two metrics are used to measure how closely the prediction results match the actual data.
RMSE is a metric obtained by taking the square root of the Mean Squared Error (MSE), which calculates the average of the squared errors between the predicted value and the actual value.
RMSE is used because it provides an interpretation in the same units as the original data, making it easier to understand in measuring model accuracy.
The lower the RMSE value, the higher the level of model accuracy in making predictions.
The RMSE formula is shown in Equations 1 and 2.
\[ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{1} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(A_i - P_i\right)^2} \tag{2} \]
In addition to RMSE, another metric used is MAPE.
MAPE calculates the average absolute error as a percentage of the actual values.
In other words, MAPE indicates how much the model's prediction error deviates from the actual value in percentage terms.
The main advantage of MAPE is its ability to provide a more intuitive interpretation, as it is expressed in percentage form, making it easier to understand how far the predictions deviate from the actual values.
A lower MAPE value indicates that the model has a lower error rate and is more accurate in making predictions.
The MAPE formula is shown in Equation 3.
\[ \mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{A_i - P_i}{A_i}\right| \tag{3} \]
where A_i is the actual value, P_i is the prediction value, and n is the amount of data.
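The two metrics can be computed directly from their definitions. The following is a minimal sketch in Python with NumPy, not the code used in the study; the function names and the four sample prices are made up for illustration, and the MAPE function assumes that no actual value is zero.

```python
import numpy as np


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (assumes no actual value is zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - predicted) / actual)))


# Made-up prices in Rupiah per kilogram, for illustration only.
actual_prices = [30000.0, 29500.0, 29000.0, 28600.0]
predicted_prices = [29000.0, 29500.0, 30000.0, 28600.0]

print(f"RMSE: {rmse(actual_prices, predicted_prices):.2f}")
print(f"MAPE: {mape(actual_prices, predicted_prices):.2f}%")
```

Because RMSE squares each error before averaging, a few large misses raise it more than they raise MAPE, which is one reason the two metrics are reported together.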
RESULT
DATA COLLECTION
The data collection in this study utilizes internal data on large red chili prices obtained from the Representative Office of Bank Indonesia in Malang.
The dataset consists of historical data spanning from early 2022 to the end of August 2024, recording the daily price of large red chili in Indonesian Rupiah per kilogram, totaling 974 data points.
As historical data obtained from an official source, this dataset is considered representative and relevant for forecasting purposes in this study, particularly in analyzing the price fluctuation patterns of large red chili in Malang Regency.
Based on the data in Table 3, the price of large red chili fluctuates considerably.
For example, on August 27, 2024, the price of large red chili was recorded at Rp30,000 per kilogram, but experienced a gradual decline to Rp28,600 on August 31, 2024.
This shows that the price of large red chili can change from day to day.
TABLE III
LARGE RED CHILI PRICE DATA
Date
2022-01-01
2022-01-02
2022-01-03
2022-01-04
2022-01-05
2024-08-27
2024-08-28
2024-08-29
2024-08-30
2024-08-31
Price
DATA PREPROCESSING
Data preprocessing in this study is a crucial step to ensure data quality before being used in the forecasting process of large red chili prices using the ARIMA and LSTM models.
The preprocessing steps include checking for missing values, performing descriptive statistical analysis, transforming data, and splitting data.
The first step is checking for missing values, which aims to identify any missing data in the dataset.
Missing values can affect the model's prediction accuracy, so they need to be handled if found.
In this study, the check was performed using the df.isna().sum() function, and the results showed that there were no missing values in the dataset.
Therefore, the data could be used directly without requiring imputation or data removal.
Next, descriptive statistical analysis was conducted to understand the characteristics of large red chili price data.
The descriptive statistics calculated included mean, median, standard deviation, minimum and maximum values, and data The analysis results showed that large red chili prices exhibited significant fluctuations, with a wide price range between a minimum value of Rp19,400 and a maximum value of Rp95,000.
A high standard deviation indicated a significant level of price variation, which could impact the performance of the forecasting model.
This analysis serves as the basis for determining whether additional transformations are needed before using the data in modeling.
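The missing-value check and the descriptive statistics described above could look roughly like the sketch below. It assumes pandas and uses a small made-up frame in place of the real dataset; the column names date and price are assumptions, not names taken from the study.

```python
import pandas as pd

# Hypothetical stand-in for the price dataset; values are illustrative only.
df = pd.DataFrame(
    {
        "date": pd.date_range("2022-01-01", periods=5, freq="D"),
        "price": [30000.0, 29500.0, 31000.0, 28600.0, 30000.0],
    }
)

# Count missing values per column (the check named in the text).
missing_per_column = df.isna().sum()

# Count, mean, standard deviation, min, quartiles, and max in one call.
summary = df["price"].describe()
median_price = df["price"].median()

print(missing_per_column)
print(summary)
print("median:", median_price)
```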
To ensure that the data is suitable for each model's requirements, data transformation was performed.
In the LSTM model, data was normalized using the MinMaxScaler method, which scales large red chili prices to a range of 0-1.
This normalization is essential because deep learning models like LSTM are more sensitive to data scale, making this transformation helpful in accelerating model convergence during training.
Meanwhile, in the ARIMA model, differencing was applied, a process of calculating the difference between consecutive data points to eliminate trends and achieve stationarity.
Data stationarity is a primary requirement in the ARIMA model to better capture patterns.
The stationarity test was conducted using the Augmented Dickey-Fuller (ADF) Test, which indicated that the initial data was non-stationary, necessitating first-order differencing to achieve stationarity.
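The two transformations can be illustrated with plain NumPy. This is a sketch under stated assumptions: the scaling reproduces the arithmetic of a min-max scaler with a 0-1 range rather than calling scikit-learn, the ADF test itself is not shown, and the price array is illustrative (only the minimum of 19,400 and the maximum of 95,000 come from the text).

```python
import numpy as np

# Illustrative prices; only the min (19400) and max (95000) appear in the text.
prices = np.array([19400.0, 30000.0, 57200.0, 95000.0, 28600.0])

# Min-max scaling to the range 0-1, as used for the LSTM input.
p_min, p_max = prices.min(), prices.max()
scaled = (prices - p_min) / (p_max - p_min)

# Inverse transform: map scaled predictions back to Rupiah.
restored = scaled * (p_max - p_min) + p_min

# First-order differencing (d = 1), as used for the ARIMA input.
diffed = np.diff(prices)

# Undo the differencing with a cumulative sum anchored at the first price.
rebuilt = np.concatenate(([prices[0]], prices[0] + np.cumsum(diffed)))

print(scaled)
print(diffed)
```

Both transformations are reversible, which matters because forecasts are reported in Rupiah rather than in scaled or differenced units.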
The final step in preprocessing was data splitting.
The dataset was divided into 80% training data and 20% testing data, which is a common proportion in time-series analysis to ensure the model has sufficient data for learning without losing generalization ability.
In the LSTM model, an approach using 60 previous values to predict the 61st value was implemented, based on research indicating that this lag size is optimal for capturing long-term patterns.
Meanwhile, in the ARIMA model, the stationary data was also split in the same proportion to ensure the model could learn historical patterns and be tested on data that had not been used during training.
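The split and the 60-step windowing can be sketched as follows. The helper names are hypothetical, the series is a placeholder of the same length as the dataset (974 points), and the split is assumed to be chronological (no shuffling), which is the usual choice for time series but is not stated explicitly in the text.

```python
import numpy as np


def chronological_split(series, train_fraction=0.8):
    """Split a series into leading train and trailing test parts, without shuffling."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, lookback=60):
    """Build (X, y) pairs where each row of X holds `lookback` values and y is the next one."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y


# Placeholder series with the same length as the dataset (974 daily points).
series = np.arange(974, dtype=float)
train, test = chronological_split(series)
X_train, y_train = make_windows(train, lookback=60)

print(len(train), len(test))           # sizes of the two parts
print(X_train.shape, y_train.shape)    # windows available for LSTM training
```

Under these assumptions the training part holds 779 points, which yields 719 windows of 60 values each.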
MODELLING LSTM
The development of the LSTM model involves a series of processes, including modeling, parameter selection, training, testing, and forecasting the price of large red chili in Malang Regency.
This model is designed to capture price patterns in historical data and generate more accurate predictions by leveraging LSTM's advantages in handling complex time-series data.
The LSTM modeling process begins with defining the model structure using three LSTM layers with varying units: 50, 100, and 150.
Each LSTM layer is designed to capture data patterns more effectively, while dropout layers are added to prevent overfitting, a condition where the model becomes too adapted to the training data and loses its ability to predict new data accurately.
This model employs the Adam optimizer, chosen for its ability to adaptively adjust the learning rate, thereby accelerating the model's convergence process.
Meanwhile, the Mean Squared Error (MSE) loss function is used to measure the average squared difference between actual and predicted values, which is suitable for regression problems such as price forecasting.
TABLE IV
BEST PARAMETER COMBINATION OF LSTM
Units_1 | Units_2 | Units_3 | Dropout | Score
Best P1
Best P2
Best P3
Best P4
Best P5
For parameter selection, the optimal configuration is determined using the Grid Search method through Keras Tuner.
Grid Search works by testing various parameter combinations to find the configuration that yields the lowest validation error.
Based on the search performed using Grid Search, there are a total of 54 combinations.
Table 4 presents the five best parameter combinations with the lowest error rate, which reflects the best performance in modeling the data.
The best configuration, with 150 units in each layer and a dropout rate of 0.2, was selected to proceed to the training stage.
This selection was based on the evaluation results, which showed that "Best P1" had the lowest error score of 0.
compared to the other models.
The low error value indicates that the model with these parameters has a higher accuracy in learning the data patterns.
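The figure of 54 combinations is consistent with three unit choices per layer and two dropout rates (3 x 3 x 3 x 2 = 54). The sketch below only enumerates such a grid and does not call Keras Tuner; the unit choices of 50, 100, and 150 and the dictionary keys are assumptions based on the ranges given in the text.

```python
from itertools import product

unit_choices = [50, 100, 150]   # assumed step of 50 within the 50-150 range
dropout_choices = [0.2, 0.3]    # the two dropout rates mentioned in the text

# One entry per candidate configuration of the three-layer LSTM.
grid = [
    {"units_1": u1, "units_2": u2, "units_3": u3, "dropout": rate}
    for u1, u2, u3, rate in product(
        unit_choices, unit_choices, unit_choices, dropout_choices
    )
]

print(len(grid))
```

Each entry in the grid would be trained and scored on validation data, and the entry with the lowest validation error would be kept.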
Once the optimal parameters are obtained, the model is trained using the preprocessed dataset.
The training process is conducted for a maximum of 100 epochs, with 20% of the training data used for validation.
To prevent overfitting, early stopping is implemented, which halts training early if there is no decrease in validation loss for ten consecutive epochs.
Early stopping helps the model avoid excessive training, which could degrade performance when tested with new data.
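The stopping rule itself is simple to state in code. The following sketch mimics the rule described above (at most 100 epochs, stop after ten consecutive epochs without a lower validation loss) on a made-up loss curve; it is not the Keras callback and the function name is hypothetical.

```python
def run_with_early_stopping(validation_losses, max_epochs=100, patience=10):
    """Return (epochs_run, best_epoch, best_loss) for a per-epoch validation loss sequence."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(validation_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # ten epochs in a row without improvement
    return epochs_run, best_epoch, best_loss


# Made-up curve: improves for five epochs, then plateaus at a worse value.
losses = [0.05, 0.04, 0.03, 0.02, 0.01] + [0.02] * 95
print(run_with_early_stopping(losses))
```

With this curve the loop stops at epoch 15, ten epochs after the best loss was seen at epoch 5.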
During the training process, the model learns to recognize large red chili price fluctuation patterns based on historical data.
After training is complete, the model undergoes testing to evaluate its predictive capabilities.
The visualization results indicate that the LSTM model can follow actual price patterns fairly well, particularly in capturing major trends and significant price changes.
The testing process results are visualized in Figure 2.
Figure 2.
LSTM Model Testing Results.
The test result graph displays three main lines: the blue line represents training data, the orange line represents actual prices, and the green line represents predicted prices.
Although the model generally replicates price patterns well, there are minor deviations at certain time intervals, indicating some discrepancies between predicted and actual values.
However, these differences are not significant and do not affect the overall performance of the model in understanding large red chili price patterns.
The next step is the forecasting process, where the LSTM model is used to predict large red chili prices for the next year.
This process is conducted using a Python loop to generate price predictions iteratively based on previous values.
The forecasting visualization results are shown in Figure 3.
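An iterative forecasting loop of this kind can be sketched as below. The trained LSTM is replaced by a hypothetical stand-in predictor (the mean of the last three values), so the numbers carry no meaning; the point is only the mechanism, in which each prediction is appended to the input window and the oldest value is dropped.

```python
import numpy as np


def recursive_forecast(history, predict_next, horizon, lookback=60):
    """Forecast `horizon` steps ahead, feeding each prediction back into the window."""
    window = list(np.asarray(history, dtype=float)[-lookback:])
    forecasts = []
    for _ in range(horizon):
        next_value = float(predict_next(np.array(window)))
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return np.array(forecasts)


def toy_predictor(window):
    """Hypothetical stand-in for the trained model: mean of the last three values."""
    return window[-3:].mean()


history = np.linspace(28000.0, 30000.0, 60)   # made-up recent prices
forecast = recursive_forecast(history, toy_predictor, horizon=365)
print(forecast[:3])
```

Because every step after the first depends on earlier predictions rather than observed prices, errors can accumulate over a one-year horizon, which is worth keeping in mind when reading long-range forecasts.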
The next stage is the testing process, which aims to evaluate the model's performance using test data, consisting of 20% of the total dataset.
The model performs predictions iteratively, where each prediction result is updated with actual data to improve the accuracy of subsequent predictions.
The visualization of the testing results shows that the ARIMA model effectively captures price fluctuation patterns, with the model's predicted values closely aligning with actual prices.
The testing process results are visualized in Figure 4.
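The testing scheme described here, where each prediction is followed by an update with the observed value, is often called walk-forward or rolling one-step evaluation. A minimal sketch follows; the function names are hypothetical and a naive last-value predictor stands in for the fitted ARIMA model.

```python
def walk_forward(train, test, forecast_one_step):
    """Predict each test point one step ahead, then append the observed value to the history."""
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(forecast_one_step(history))
        history.append(actual)  # update with the actual value, not the prediction
    return predictions


def naive_last_value(history):
    """Hypothetical stand-in for a fitted model: repeat the latest observation."""
    return history[-1]


predictions = walk_forward([1.0, 2.0, 3.0], [4.0, 5.0], naive_last_value)
print(predictions)
```

Unlike a multi-step forecast, this scheme never feeds predictions back into the model, so test errors measured this way reflect one-day-ahead accuracy.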
Figure 3.
LSTM Model Forecasting Results.
The forecasting results indicate that the model predicts a gradual upward price trend, suggesting that it successfully captures the price growth pattern in historical data.
These predictions are visualized in a graph, where the dashed red line represents the model's forecasted results based on previous data patterns.
As a final step, the model is evaluated using two key metrics: RMSE and MAPE.
The evaluation results show that the LSTM model performs exceptionally well, with an RMSE of 2.326 and a MAPE of 3.65%.
A MAPE value below 10% indicates a low prediction error rate.
MODELLING ARIMA
The development of the ARIMA model involves several key processes, including modeling, parameter selection, training, testing, and forecasting the price of large red chili in Malang Regency.
This model is used to capture price fluctuation patterns based on historical data and generate accurate predictions by considering the autoregressive (AR), differencing (I), and moving average (MA) components.
The ARIMA modeling process begins by determining the optimal parameter combination consisting of p (autoregressive), d (differencing), and q (moving average).
The differencing value d = 1 was pre-determined through the differencing process to ensure data stationarity.
Therefore, parameter search focuses on the values of p and q within the range of 0 to 5 using the Grid Search method with the auto_arima function from the pmdarima library.
During the parameter selection stage, various combinations of p and q values are explored while considering the Akaike Information Criterion (AIC) as a performance indicator for the model.
The lower the AIC value, the better the model is at capturing data patterns.
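With p and q each ranging from 0 to 5 and d fixed at 1, the search space contains 6 x 6 = 36 candidate orders, and the order with the lowest AIC is kept. The sketch below shows only this selection logic; it does not call pmdarima or statsmodels, and toy_aic is a contrived placeholder score (not a real AIC) chosen so that the output is easy to recognize.

```python
from itertools import product


def select_order_by_aic(aic_of, p_values=range(6), q_values=range(6), d=1):
    """Score every (p, d, q) candidate with `aic_of` and return the lowest-AIC order."""
    candidates = [(p, d, q) for p, q in product(p_values, q_values)]
    scores = {order: aic_of(order) for order in candidates}
    best_order = min(scores, key=scores.get)
    return best_order, scores


def toy_aic(order):
    """Contrived placeholder, NOT a real AIC: smallest at p = 4, q = 3 by construction."""
    p, _, q = order
    return 100.0 + (p - 4) ** 2 + (q - 3) ** 2


best_order, all_scores = select_order_by_aic(toy_aic)
print(len(all_scores), best_order)
```

In a real run, aic_of would fit an ARIMA model of the given order on the training data and return the AIC reported by the fitting library.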
Once the optimal parameters are obtained, the model is trained using the training dataset.
The ARIMA model is developed using the statsmodels library, where the fit() function is used to adjust the model parameters according to the historical data patterns in the training dataset.
The training results serve as the foundation for the model to understand temporal relationships and trends in historical data before being tested with new data.
Figure 4.
ARIMA Model Testing Results.
The test result graph displays three main lines: the blue line represents training data, the orange line represents actual prices, and the green line represents predicted prices.
Although there are minor deviations, the model generally performs well in predicting price movement patterns.
After testing is complete, the model is used for the forecasting process, where predictions for large red chili prices are made for the next year.
This forecasting is performed using the trained model to estimate prices based on the patterns learned from historical data.
The visualization results of the forecasting process are shown in Figure 5.
Figure 5.
ARIMA Model Forecasting Results.
The forecasting results indicate a much more stable price pattern with minimal fluctuations.
Although there are slight increases and decreases in certain periods, the price movement predicted by ARIMA tends to be flat and does not show significant trend changes.
The evaluation results indicate that the ARIMA model has an RMSE of 2.455 and a MAPE of 3.80%.
A MAPE value below 10% indicates that the model can be considered to have excellent forecasting ability.
IV. DISCUSSION
BEST LSTM PARAMETERS
In the LSTM model, parameter tuning was conducted using Grid Search with Keras Tuner, where the model was tested with various parameter combinations to determine the optimal configuration.
The model consists of three LSTM layers with the number of units ranging from 50 to 150 and dropout rates of 0.2 and 0.3 to prevent overfitting.
From a total of 54 tested combinations, the best configuration was found to have 150 units in each layer and a dropout rate of 0.2.
This configuration resulted in a loss value of 0.
indicating that the model effectively captured data patterns with relatively low error.
These results align with previous studies, which suggest that a higher number of units tends to yield more accurate models, as it enables the model to recognize more complex patterns.
Additionally, a dropout rate of 0.2 was found to be optimal, in line with prior research indicating that a dropout rate that is too high can cause the model to lose critical information, while a rate that is too low may be insufficient to prevent overfitting.
BEST ARIMA PARAMETERS
For the ARIMA model, the best parameters were determined using the auto_arima method, which automatically performs a Grid Search while considering the Akaike Information Criterion (AIC) as a performance benchmark.
The parameter search ranged from 0 to 5 for both p and q values, while d was fixed at 1 to ensure data stationarity.
Out of the 36 parameter combinations tested, the best ARIMA model obtained was ARIMA(4,1,3), with an AIC value of 14159.019, indicating that this combination was the best for capturing large red chili price patterns.
A model with p = 4 indicates that it considers four previous periods in making predictions, while q = 3 means that it uses three previous error values to improve prediction accuracy.
These results are supported by previous research, which suggests that higher p and q values generally enhance model performance, though excessively high values can lead to overfitting and increased computational complexity .
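Written out, an ARIMA(4,1,3) model applies four autoregressive terms and three moving-average terms to the first-differenced series. The form below is the standard textbook one; the symbols (y_t for the price on day t, y'_t for the differenced series, c for a constant, phi_i and theta_j for the coefficients, and epsilon_t for the error term) are notation assumed here, and whether the fitted model includes a constant is not stated in the text.

```latex
\[ y'_t = y_t - y_{t-1} \]
\[ y'_t = c + \sum_{i=1}^{4} \phi_i \, y'_{t-i} + \sum_{j=1}^{3} \theta_j \, \varepsilon_{t-j} + \varepsilon_t \]
```

The predicted price is then recovered by adding the predicted difference to the previous price.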
COMPARISON OF MODEL TESTING RESULTS
After training both LSTM and ARIMA models with their respective optimal parameters, testing was conducted using the test dataset.
Evaluation results using RMSE and MAPE showed that LSTM achieved an RMSE of 2.326 and a MAPE of 3.65%, while ARIMA had an RMSE of 2.455 and a MAPE of 3.80%.
Both MAPE values are categorized as excellent because a MAPE below 10% indicates a high level of accuracy.
These results indicate that LSTM slightly outperforms ARIMA, as it has lower error values in both actual price units (RMSE) and relative error percentage (MAPE).
However, the difference between the two models is not substantial, suggesting that both effectively capture large red chili price patterns, despite utilizing different approaches.
LSTM, based on artificial neural networks, handles non-linear patterns, whereas ARIMA, a statistical method, is more effective for stationary and linear data.
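The size of the gap can be made concrete with a short calculation on the reported values. This is simple arithmetic on the figures given above, with variable names chosen for illustration.

```python
# Reported test metrics from the text.
lstm = {"rmse": 2.326, "mape": 3.65}
arima = {"rmse": 2.455, "mape": 3.80}

# Relative RMSE reduction of LSTM with respect to ARIMA, in percent.
rmse_gap_pct = 100.0 * (arima["rmse"] - lstm["rmse"]) / arima["rmse"]

# Difference in MAPE, in percentage points.
mape_gap_points = arima["mape"] - lstm["mape"]

print(f"RMSE is lower by {rmse_gap_pct:.2f}% relative to ARIMA")
print(f"MAPE is lower by {mape_gap_points:.2f} percentage points")
```

The RMSE gap is roughly 5% and the MAPE gap is 0.15 percentage points, which supports reading the two models as close competitors on the test data.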
COMPARISON OF FORECASTING MODEL RESULTS
Both models were used to predict large red chili prices for the following year, as shown in Figure 3 for LSTM and Figure 5 for ARIMA.
The forecasting results indicate that LSTM predicts a gradual increase in price trends, while ARIMA provides a more stable price prediction with minor fluctuations.
A portion of the forecasted prices generated by both models is presented in Table 5.
TABLE V
FORECAST RESULTS ONE YEAR AHEAD
Date | Forecasting LSTM | Forecasting ARIMA
01/09/2024 | Rp 28. | Rp 28.
02/09/2024 | Rp 29. | Rp 28.
03/09/2024 | Rp 29. | Rp 28.
04/09/2024 | Rp 29. | Rp 28.
05/09/2024 | Rp 29. | Rp 28.
06/09/2024 | Rp 29. | Rp 29.
07/09/2024 | Rp 30. | Rp 29.
25/08/2025 | Rp 75. | Rp 29.
26/08/2025 | Rp 75. | Rp 29.
27/08/2025 | Rp 75. | Rp 29.
28/08/2025 | Rp 74. | Rp 29.
29/08/2025 | Rp 74. | Rp 29.
30/08/2025 | Rp 74. | Rp 29.
31/08/2025 | Rp 74. | Rp 29.
On the first day of forecasting, September 1, 2024, the LSTM model predicted a price of Rp 28,919 and showed an increasing trend, reaching Rp 80,110 by the end of 2024 before slightly declining to Rp 74,348 on August 31, 2025.
This result indicates that LSTM successfully recognized an upward price trend based on historical data, reflecting its ability to capture complex and non-linear patterns.
Conversely, the ARIMA model predicted an initial price of Rp 28,437 on September 1, 2024, but its forecasted price movement remained relatively flat and stable, ranging only between Rp 28,341 and Rp 29,108 throughout the following year.
This suggests that ARIMA is more effective in capturing seasonal and linear patterns but lacks the capability to handle more complex price fluctuations.
CONCLUSION
Based on the research findings, the LSTM and ARIMA models were successfully developed to forecast large red chili prices in Malang Regency using time-series data from 2022 to August 2024.
The LSTM model was built through data preprocessing with MinMaxScaler, data splitting into 80% training and 20% testing, and parameter optimization using Grid Search.
The best configuration obtained consisted of 150 units in each LSTM layer and a dropout rate of 0.2, with the lowest loss value recorded at 0.
Evaluation results indicated that the LSTM model achieved an RMSE of 2.326 and a MAPE of 3.65%, demonstrating its ability to capture non-linear patterns and price fluctuations with high accuracy.
Meanwhile, the ARIMA model was developed using a statistical approach through stationarity analysis with the ADF test, data transformation via differencing, and optimal parameter selection using Grid Search with auto_arima.
The best ARIMA model identified was ARIMA(4,1,3), with the lowest AIC value of 14159.019.
The evaluation results showed that the ARIMA model performed well, with an RMSE of 2.455 and a MAPE of 3.80%, indicating its capability in capturing linear patterns in historical data, although it was less effective in identifying non-linear patterns or more complex price fluctuations.
The comparative evaluation results showed that LSTM outperformed ARIMA, as indicated by its lower RMSE and MAPE values, making it more effective in capturing complex patterns and fluctuating price changes.
However, the difference between the two models was not significant, suggesting that both LSTM and ARIMA demonstrated strong predictive capabilities despite using different approaches.
LSTM is more suitable for data with dynamic and fluctuating patterns, whereas ARIMA remains a reliable method for data with more stable patterns.
Therefore, in the context of forecasting large red chili prices in Malang Regency, LSTM can be considered the primary model, while ARIMA can serve as a benchmark or alternative model for time-series data.
AUTHORS CONTRIBUTION
Yogi Fradika Indrawan: Formal Analysis, Investigation, Methodology, Original Draft Writing.
Aisyah Larasati: Conceptualization, Methodology, Supervision, Review Writing & Editing.
Agus Rachmad Purnama: Validation, Visualization, Review Writing & Editing.
Nikmatus Sholikha: Visualization, Review Writing & Editing.
COPYRIGHT
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
REFERENCES