Institut Riset dan Publikasi Indonesia (IRPI) MALCOM: Indonesian Journal of Machine Learning and Computer Science Journal Homepage: https://journal.
id/index.
php/malcom Vol.
5 Iss.
3 July 2025, pp: 1000-1011
ISSN(P): 2797-2313 | ISSN(E): 2775-8575
Predictive Sales Analysis in Coffee Shops Using the Random Forest Algorithm Shella Norma Windrasari1*.
Hendro Margono2.
Yudistira Ardi Nugraha Setyawan Putra3 Department of Human Resource Development.
Faculty Graduate School.
Airlangga University.
Indonesia Department of Information and Library Science.
Faculty of Social and Political Sciences.
Airlangga University.
Indonesia E-Mail: 1 shella.
win-2024@pasca.
id, 2hendro.
margono@fisip.
nugraha-2023@pasca.
Received Apr 19th 2025.
Revised Jun 22nd 2025.
Accepted Jul 22nd 2025.
Available Online Jul 31st 2025.
Published Jul 31st 2025 Corresponding Author: Shella Norma Windrasari Copyright © 2025 by Authors.
Published by Institut Riset dan Publikasi Indonesia (IRPI)
Abstract
The coffee shop industry has experienced significant growth, evolving into a highly competitive marketplace demanding specialty coffee and personalized experiences.
While data-driven strategies are crucial for optimizing operations, many owners still struggle to effectively leverage their sales data to understand dynamic customer behavior and enhance decision-making.
Addressing this gap, this study explores the application of machine learning (ML) techniques, specifically the Random Forest Regressor model, to predict sales performance within the coffee shop business. By analyzing factors such as transaction timing, store location, product type, and day of the week, this research aims to uncover patterns that can enhance inventory management and customer engagement.
The Random Forest model was evaluated through cross-validation, yielding a mean MSE (Mean Squared Error) of 80.97, which indicates moderate predictive accuracy and represents an improvement over traditional forecasting methods commonly employed in the industry.
Feature importance analysis revealed that Premium Beans is the most influential predictor, followed by seasonal trends, time of day, and weekend sales patterns.
These findings underscore the importance of incorporating temporal and contextual factors into forecasting models.
Despite these promising results, the model's performance exhibited variability, suggesting room for further refinement through better feature selection, the inclusion of external variables (e.g., weather, local events), and advanced hyperparameter tuning.
This study highlights the potential of ML in enhancing operational efficiency and decision-making in the coffee shop industry, while also pointing to opportunities for future research to optimize prediction models and drive profitability.
Keywords: Data Analysis, Coffee Shop, Machine Learning, Random Forest, Sales Prediction
INTRODUCTION
In recent years, the coffee shop industry has experienced significant growth, evolving into a highly competitive marketplace driven by the popularity of specialty coffee and the demand for unique, personalized experiences. For example, in regions such as Hong Kong, keen competition has emerged due to the increasing importance of this segment in the foodservice industry.
This expansion has transformed coffee shops from mere caffeine stops into vibrant community hubs where people gather to work. However, this rapid growth has intensified competition, making it crucial for business owners to adopt effective, data-driven strategies to navigate various operational decisions, from product selection to promotional timing and stock placement, all while adapting to shifting consumer preferences and market trends.
Consequently, coffee shop operators increasingly recognize the need to leverage data analytics to gain insights into customer behavior, which can drive sales and enhance loyalty.
Despite the clear advantages of data-driven strategies, many coffee shop owners still struggle to deeply understand their customers' purchasing patterns.
The primary challenge lies in analyzing data across dimensions such as time, location, and product category.
Customer preferences can vary widely depending on the time of day, with certain items being more popular during morning rush hours and others in the afternoon.
Geographical factors also play a role, as urban customers may have different tastes than those in suburban or rural areas.
DOI: https://doi.org/10.57152/malcom.
Additionally, the diverse product offerings in coffee shops complicate the analysis, making it difficult to determine which items resonate most with customers at different times and locations.
This ambiguity can lead to suboptimal decisions regarding inventory management and promotional strategies, ultimately impacting profitability and customer satisfaction.
The rapid pace of change in consumer trends, fueled by social media and online reviews, further complicates matters, requiring coffee shop owners not only to keep up with these shifts but also to anticipate future trends.
A robust analytical framework is essential for processing and interpreting data in real-time, enabling agile decision-making in this dynamic environment.
While traditional analytical methods and rule-based systems provide foundational insights, they often struggle to capture the intricate, non-linear patterns inherent in complex customer behaviors and rapidly shifting market trends.
To overcome these limitations and further enhance analytical capabilities, the integration of machine learning (ML) techniques has become increasingly relevant in the retail and food & beverage (F&B) sectors.
By leveraging advanced algorithms such as Random Forest and K-Nearest Neighbors (KNN), businesses can gain deeper, predictive insights into customer behavior and market trends that are beyond the scope of simpler, predefined rules.
These algorithms are particularly effective for tasks like precise sales prediction and nuanced transaction time segmentation, allowing companies to make informed decisions that align with evolving consumer demands.
Previous studies have demonstrated the efficacy of these ML approaches in accurately forecasting sales and optimizing inventory management, often outperforming traditional methods.
Moreover, the implementation of ML in physical store operations has shown promising results, streamlining processes and enhancing customer experiences, ultimately driving profitability in an increasingly competitive landscape.
Machine learning, a subset of artificial intelligence, involves the use of algorithms and statistical models to enable computers to perform specific tasks without explicit instructions, making it ideally suited for identifying complex, unforeseen correlations within vast datasets.
In the context of retail and F&B, ML can analyze vast amounts of data to identify patterns, predict outcomes, and make recommendations.
This capability is particularly valuable in an industry characterized by rapidly changing consumer preferences and market dynamics, where static, rule-based approaches quickly become outdated. By employing ML, businesses can enhance their operational efficiency, improve customer engagement, and ultimately increase their bottom line.
However, despite its proven advantages in broader retail contexts, the application of ML specifically in the context of coffee shops, which are characterized by an extensive variety of products and highly dynamic visiting hours that challenge conventional analysis, remains relatively limited and presents a significant research gap.
The purpose of this study is to explore the use of ML in predicting sales performance within the dynamic coffee shop business environment.
Specifically, we aim to examine how factors such as transaction timing, store location, product type, and day of the week influence customer purchasing behavior.
As the coffee shop industry continues its rapid growth, leveraging data-driven insights becomes essential for improving decision-making processes related to product offerings and inventory planning.
While many coffee shop owners still rely on intuition or past experiences, these methods often fall short in dynamic market conditions where customer preferences can shift throughout the day or week.
This study seeks to bridge that gap by addressing the following research questions: (1) How can ML be effectively applied within the coffee shop business environment to enhance sales performance? (2) What specific factors (such as transaction timing, store location, product type, and day of the week) most significantly influence customer purchasing behavior in coffee shops when analyzed with ML techniques? (3) How accurately does a predictive model, specifically Random Forest, perform in forecasting coffee shop sales based on historical transaction data, considering the inherent variability in customer preferences and operational complexities?
Despite the recognized advantages of machine learning, its application in coffee shops remains limited. This research acknowledges the potential challenges in applying ML, such as data sparsity from diverse product offerings and the complexities of real-time implementation in a fast-paced retail setting.
However, by addressing these nuances, this study presents a unique opportunity to yield valuable insights into customer behavior and preferences, ultimately offering data-driven strategies that can enhance profitability and customer satisfaction in this evolving industry.
MATERIALS AND METHOD
This study adopts a quantitative approach using supervised ML techniques to predict sales performance in a coffee shop environment as shown in Figure 1.
The research is based on a dataset consisting of historical transaction records collected over a six-month period.
The dataset includes key features such as transaction date, transaction time, product category, store location, unit price, and transaction quantity.
The methodological process begins with data preprocessing, where data types are standardized and categorical string values are cleaned for consistency and accuracy.
As part of feature engineering, additional time-related features such as hour of transaction, day of the week, and weekend indicators are derived.
A new column for total sales is also calculated by multiplying unit price with transaction quantity.
These steps enhance the dataset to better capture time-based purchasing patterns.
Exploratory Data Analysis (EDA) is conducted to identify preliminary trends, such as product popularity across different times and days, as well as peak hours for transactions.
These insights are crucial for informing the predictive modeling process and aligning business strategies with customer behavior.
For the predictive modeling phase, the study utilizes the Random Forest Regressor algorithm to forecast sales. The model is trained on input features including time of day, day of the week, store location, and product type, while the target variable is total sales.
Model performance is evaluated using 5-fold cross-validation, and Mean Squared Error (MSE) is used as the primary evaluation metric to assess the accuracy of the predictions.
A lower MSE indicates better model performance.
This methodology provides a robust analytical framework for understanding customer purchasing behavior in coffee shops and aims to generate actionable insights to support data-driven decision-making, particularly in areas such as inventory management, product planning, and promotional timing.
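The evaluation procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes scikit-learn and pandas, uses synthetic rows in place of the real transactions, and the column names `hour` and `day_of_week`, the one-hot encoding, and the hyperparameters (`n_estimators=100`, shuffled folds) are assumptions not stated in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the transaction table (hypothetical values).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "hour": rng.integers(6, 21, size=n),
    "day_of_week": rng.integers(0, 7, size=n),
    "store_location": rng.choice(
        ["Astoria", "Hell's Kitchen", "Lower Manhattan"], size=n),
    "product_category": rng.choice(["Coffee", "Tea", "Bakery"], size=n),
    "unit_price": rng.uniform(2.0, 6.0, size=n).round(2),
    "transaction_qty": rng.integers(1, 4, size=n),
})
df["total_sales"] = df["unit_price"] * df["transaction_qty"]

# One-hot encode the categorical inputs; the target is total sales.
features = ["hour", "day_of_week", "store_location", "product_category"]
X = pd.get_dummies(df[features], columns=["store_location", "product_category"])
y = df["total_sales"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scikit-learn reports negated MSE so that higher is better; flip the sign back.
fold_mse = -cross_val_score(model, X, y, cv=cv,
                            scoring="neg_mean_squared_error")
mean_mse = fold_mse.mean()
print("MSE per fold:", np.round(fold_mse, 2))
print("Mean MSE:", round(mean_mse, 2))
```

Reporting the per-fold values alongside the mean makes the spread across folds visible, which matters when the folds differ as much as the results section reports.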
Figure 1. Research Methodology
Type and Research Approach
This study adopts a quantitative research approach with an emphasis on applied experimentation in the field of data analysis and machine learning.
The primary objective is to develop a sales prediction model based on historical transaction data from a coffee shop.
Utilizing the Random Forest Regressor algorithm, the research aims to identify purchasing behavior patterns influenced by transaction timing, store location, and product types.
The resulting model is expected to provide valuable insights to support data-driven decision-making, particularly in areas such as product planning, inventory management, and the strategic timing of promotions.
Data Collection
The dataset utilized in this study comprises a comprehensive record of transactions from a coffee shop, encompassing a total of 149,116 entries collected over a six-month period from January to June 2023.
This dataset serves as a rich source of information, capturing various attributes critical for analysis, including transaction ID, transaction date, transaction time, transaction quantity, store ID, store location, product ID, unit price, product category, product type, and product detail.
All columns in the dataset are complete with no null values, providing a robust foundation for further analysis.
However, certain fields, such as transaction_time, which is currently stored as an object data type, require data type adjustments.
Additionally, some columns exhibit high variance, which may reduce analytical efficiency.
These issues will be addressed during the data preprocessing phase to ensure consistency and improve model performance.
The dataset's temporal granularity allows for in-depth exploration of customer purchasing behavior in relation to time-based factors.
Therefore, the selection of relevant features plays a critical role in supporting the subsequent stages of feature engineering, EDA, and predictive modeling.
Careful handling of these features ensures that the insights generated are both meaningful and actionable for enhancing operational decision-making within the coffee shop business environment.
Table 1 provides basic information on the dataset used.
Table 1. Dataset Ground Information
Columns: Column | Unique Value | Non-Null Count | Data Type
Rows: transaction_id, transaction_date, transaction_time, transaction_qty, store_id, store_location, product_id, unit_price, product_category, product_type, product_detail
Data Preprocessing
Data preprocessing is a crucial step in preparing raw data for analysis, ensuring that the dataset is clean, well-structured, and suitable for generating reliable insights.
In this study, several preprocessing techniques were applied to address different data quality issues and enhance the dataset's analytical value, as summarized in Table 2.
Table 2. Data Preprocessing Summary
Case | Solution | Action-to Column
False Data Type | Reformatting | 'transaction_time'
Unused Column | Feature Selection | 'transaction_id', 'store_id', 'product_id', 'product_type', 'product_detail'
Over Variance | Scaling | 'transaction_qty', 'product_category'
Depth Analytical Purpose | Feature Engineering | 'transaction_date', 'transaction_time', 'transaction_qty', 'unit_price'
The first issue identified was a false data type in the 'transaction_time' column.
Originally, the time values were stored as strings, which limited their usability for time-based computations and aggregations.
To resolve this, a reformatting process was carried out to convert the data type into a standardized time format, enabling more accurate time-series analysis and easier feature extraction, such as the hour or minute of a transaction, as illustrated in Figure 2.
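A minimal sketch of this conversion, assuming pandas and clock times stored as `HH:MM:SS` strings (the exact string format of the source data is an assumption, and the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows; the real column holds clock times stored as strings.
df = pd.DataFrame({
    "transaction_date": ["2023-01-01", "2023-01-01", "2023-06-30"],
    "transaction_time": ["07:06:11", "10:15:00", "19:42:37"],
})

# Parse both columns; an explicit format makes malformed values fail loudly.
df["transaction_date"] = pd.to_datetime(df["transaction_date"], format="%Y-%m-%d")
parsed_time = pd.to_datetime(df["transaction_time"], format="%H:%M:%S")

# Keep the components that later steps need.
df["hour"] = parsed_time.dt.hour
df["minute"] = parsed_time.dt.minute

print(df.dtypes)
print(df[["transaction_time", "hour", "minute"]])
```

Passing an explicit `format` is a deliberate choice here: it avoids silent misparsing when a few rows deviate from the expected layout.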
Figure 2. Illustrating Process for Cleaning and Feature Engineering From 'transaction_datetime' Column
The second step involved feature selection to eliminate columns that were either redundant or not directly relevant to the analysis objectives.
Columns such as 'transaction_id', 'store_id', 'product_id', 'product_type', and 'product_detail' were removed.
These attributes, while useful for operational tracking or transactional identification, did not contribute meaningful patterns for the scope of this analysis, and their removal helped reduce dimensionality and potential noise in the model.
Table 3. Example from 'product_category' Column Data Distribution Before Mapping
Columns: Value | Distribution (%)
Values: Coffee, Tea, Bakery, Drinking Chocolate, Flavours, Coffee Beans, Loose Tea, Branded, Package Chocolate
Third, the dataset exhibited signs of over variance in certain features, particularly 'transaction_qty' and 'product_category', as shown in Table 3.
High variance in these variables could potentially bias learning algorithms or lead to skewed model performance.
To address this, scaling techniques were applied to normalize the values, ensuring that features contributed equally to the model and were measured on a comparable scale as in Table 4.
This step was essential for improving the performance of algorithms sensitive to the magnitude of input values.
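For 'product_category', the before and after views in Tables 3 and 4 amount to grouping nine raw labels into six broader ones. A minimal sketch of such a grouping, assuming a plain dictionary lookup in pandas (the mechanism is an assumption; the label spellings follow the paper's tables and the sample rows are hypothetical):

```python
import pandas as pd

# Grouping taken from Table 4; spellings follow the paper's tables.
CATEGORY_MAP = {
    "Coffee": "Coffee",
    "Coffee Beans": "Coffee",
    "Tea": "Tea",
    "Loose Tea": "Tea",
    "Bakery": "Bakery",
    "Drinking Chocolate": "Chocolate",
    "Package Chocolate": "Chocolate",
    "Flavours": "Condiment",
    "Branded": "Merchandise",
}

# Hypothetical sample rows.
df = pd.DataFrame({
    "product_category": ["Coffee", "Loose Tea", "Flavours", "Branded", "Coffee Beans"],
})

mapped = df["product_category"].map(CATEGORY_MAP)
# Unmapped labels become NaN; fail early rather than silently losing rows.
if mapped.isna().any():
    raise ValueError("unmapped product_category values found")
df["product_category"] = mapped

# Share of each consolidated category, in percent.
distribution = df["product_category"].value_counts(normalize=True).mul(100).round(2)
print(distribution)
```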
Table 4. Example from 'product_category' Column Data Distribution After Mapping
Columns: Value After Mapping | Components | Distribution (%)
Coffee | Coffee, Coffee Beans
Tea | Tea, Loose Tea
Bakery | Bakery
Chocolate | Drinking Chocolate, Package Chocolate
Condiment | Flavours
Merchandise | Branded
Finally, to support in-depth analytical objectives, feature engineering was conducted by deriving new variables from existing ones.
For instance, 'transaction_date' and 'transaction_time' were transformed into new features such as day of the week, hour of transaction, and part-of-day categories (e.g., morning).
Additionally, 'transaction_qty' and 'unit_price' were used to compute the total transaction value, which serves as a key metric in analyzing purchasing behavior.
These engineered features provided richer context and improved the dataset's ability to reveal underlying patterns and trends.
The preprocessing phase not only addressed technical inconsistencies and redundancies in the dataset but also enhanced its overall analytical potential through thoughtful transformation and feature enrichment.
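The derived features described in this section can be sketched as below. This assumes pandas; the part-of-day cut points (before 12:00, 12:00 to 16:59, 17:00 onward) and the output column names are illustrative assumptions, since the paper does not state them, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows: a Monday, a Saturday, and a Sunday.
df = pd.DataFrame({
    "transaction_date": pd.to_datetime(["2023-01-02", "2023-01-07", "2023-01-08"]),
    "transaction_time": ["08:30:00", "13:05:00", "18:45:00"],
    "transaction_qty": [2, 1, 3],
    "unit_price": [3.00, 4.50, 2.50],
})

hour = pd.to_datetime(df["transaction_time"], format="%H:%M:%S").dt.hour
df["hour"] = hour
df["day_of_week"] = df["transaction_date"].dt.day_name()
# dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday.
df["is_weekend"] = df["transaction_date"].dt.dayofweek >= 5

# Part-of-day bins; these cut points are illustrative assumptions.
df["time_of_day"] = pd.cut(hour, bins=[0, 12, 17, 24], right=False,
                           labels=["Morning", "Afternoon", "Evening"])

# Total transaction value = unit price x quantity.
df["total_sales"] = df["unit_price"] * df["transaction_qty"]
print(df)
```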
Exploratory Data Analysis (EDA)
EDA is a critical phase in the data analysis process that involves examining datasets to uncover patterns, trends, and insights that may not be immediately apparent.
This phase is essential for understanding the underlying structure of the data and for generating hypotheses that can be tested in subsequent analyses.
The primary goal of EDA is to provide a comprehensive overview of the data, allowing analysts to identify relationships, anomalies, and areas of interest that warrant further investigation.
During exploratory analysis, various techniques such as data visualization, summary statistics, and correlation analysis are employed to gain a deeper understanding of the dataset.
This process not only helps in identifying key trends and patterns but also assists in detecting potential issues with the data, such as missing values or outliers.
By thoroughly exploring the data, analysts can make informed decisions about the appropriate modeling techniques and methodologies to apply in later stages of analysis.
Based on Figure 3, in this section we focus on two key aspects of exploratory analysis: sales performance and product popularity analysis.
These analyses provide valuable insights into customer behavior and sales performance, ultimately guiding strategic decisions for the business.
Predictive Modelling
Predictive modeling is a statistical technique that uses historical data to forecast future outcomes.
By leveraging various algorithms and ML techniques, predictive models can identify patterns and relationships within the data, enabling businesses to make informed decisions based on anticipated future trends.
The primary goal of predictive modeling is to provide actionable insights that can enhance strategic planning, optimize operations, and improve customer engagement.
The specific objective of predictive modeling in the context of sales analysis is to predict the product category that is likely to be sold based on various relevant factors. By utilizing historical sales data, this model aims to provide insights that can assist in strategic decision-making, such as inventory management and marketing planning.
The key processes for predictive modeling can be seen in Figure 4.
Figure 3. Focus Key Aspect for EDA
Figure 4. Key Process for Predictive Modelling
RESULTS AND DISCUSSION
This chapter presents the key findings derived from the data analysis process, which comprises two main components: EDA and Predictive Modeling.
Each section is designed to address specific research objectives and to uncover meaningful insights that support data-driven decision-making.
The first part of this chapter focuses on Exploratory Analysis, an essential step aimed at understanding the structure and dynamics of the dataset.
This phase involved a thorough examination of sales data to uncover hidden patterns and trends that are not immediately apparent.
By employing techniques such as data visualization, summary statistics, and correlation analysis, this study explored various aspects of sales performance.
Key areas of focus include analyzing sales trends over time and across locations, visualizing product popularity by hour and day, and identifying peak transaction times.
While these descriptive trends provide valuable initial insights into customer behavior and purchasing patterns, a deeper statistical validation will be necessary in subsequent sections to confirm whether observed patterns are significant or merely coincidental.
The results are presented in a logical order to form a coherent narrative, with a focus on factual data. Tables and figures are used to illustrate findings, ensuring no redundant data is presented across different visual aids and text.
Subtitles are employed to further clarify the descriptions within this section. The EDA by sales performance is shown in Figure 5.
Figure 5 illustrating "Total Revenue by Month" reveals significant insights into the sales performance from January to June 2023.
Beginning with a modest revenue of $81,678 in January, the figures exhibited a slight decline in February, dropping to $76,145.
However, March marked a pivotal turnaround with sales increasing to $98,835.
This upward trend continued into April, where total revenue reached $118,941, suggesting that the market conditions or sales strategies may have begun to improve.
In May, revenue surged dramatically to $156,728, reflecting an impressive growth rate that likely resulted from effective promotional efforts or seasonal demand.
This momentum carried into June, where total revenue peaked at $166,486.
Notably, throughout this six-month period, the average monthly revenue stood at $116,469, represented by the dashed red line on the graph.
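The reported average and the month-over-month changes can be checked directly from the six monthly totals quoted above. The short sketch below uses only those figures; the variable names are illustrative.

```python
# Monthly totals as reported for Figure 5 (USD).
revenue = {
    "Jan": 81_678, "Feb": 76_145, "Mar": 98_835,
    "Apr": 118_941, "May": 156_728, "Jun": 166_486,
}

average = sum(revenue.values()) / len(revenue)

months = list(revenue)
# Month-over-month change in percent, each month relative to the previous one.
growth = {
    months[i]: (revenue[months[i]] - revenue[months[i - 1]])
    / revenue[months[i - 1]] * 100
    for i in range(1, len(months))
}
above_average = [m for m, v in revenue.items() if v > average]

print(f"Average monthly revenue: ${average:,.0f}")
for month, pct in growth.items():
    print(f"{month}: {pct:+.1f}%")
print("Above average:", above_average)
```

The six totals sum to $698,813, which gives the stated average of about $116,469. On these figures April ($118,941) also sits slightly above the average, and the largest relative jump occurs from April to May.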
This average serves as a crucial benchmark, indicating that the sales figures in May and June not only exceeded this average but also highlighted an overall recovery and robust growth.
A more detailed breakdown of this growth can be observed in Figure 6, which analyzes total revenue by store location (Astoria, Hell's Kitchen, and Lower Manhattan) over the same six-month period.
While all three stores exhibited relatively modest performance in January and February, with revenues generally hovering between $25,000 and $30,000, a notable acceleration in sales began in March.
Hell's Kitchen, in particular, demonstrated the most pronounced growth, rapidly outpacing the other locations.
This upward trajectory persisted through May and June, where Hell's Kitchen consistently recorded the highest monthly revenue, surpassing $50,000 by the end of the period.
Astoria and Lower Manhattan also showed positive growth trends, with Astoria maintaining a slight edge in performance over Lower Manhattan.
These findings suggest that while the general upward trend is consistent across locations, Hell's Kitchen may benefit from stronger market dynamics or more effective local strategies.
Furthermore, the following analysis explores product popularity across different times of day and days of the week, offering insights into customer purchasing behavior and preferences that may underlie the observed revenue trends.
Figure 5. EDA By Sales Performance
Figure 6. EDA By Sales Performance Per Store
The analysis of product category sales across the first half of 2023 reveals distinct trends in customer preferences, as shown in Figure 7.
Coffee consistently emerges as the top-selling category throughout the period, with transaction quantities steadily increasing from 10,589 in January to 21,875 in June.
This notable rise highlights coffee's continued popularity and suggests it as a primary driver of revenue for the business.
Similarly, tea also demonstrates strong and consistent sales, experiencing a gradual increase from 8,201 transactions in January to 16,699 in June.
The growth in both coffee and tea aligns with broader consumer trends toward beverages, indicating robust demand in this segment.
In contrast, categories like bakery and chocolate show more moderate increases in absolute terms.
Bakery products, while consistently popular, grow from 2,690 transactions in January to 5,431 in June, signaling a steady but less dramatic rise.
Chocolate, on the other hand, exhibits a more balanced performance, with a slight upward trend in sales, reaching 4,232 transactions in June from 2,072 in January.
Condiments and merchandise, however, lag behind, with sales remaining relatively low across the months.
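To put the category figures on a common footing, the January and June quantities quoted above can be compared both as absolute gains and as growth ratios. The sketch below uses only those reported numbers.

```python
# January and June transaction quantities as reported for Figure 7.
jan = {"Coffee": 10_589, "Tea": 8_201, "Bakery": 2_690, "Chocolate": 2_072}
jun = {"Coffee": 21_875, "Tea": 16_699, "Bakery": 5_431, "Chocolate": 4_232}

absolute = {cat: jun[cat] - jan[cat] for cat in jan}
ratios = {cat: jun[cat] / jan[cat] for cat in jan}

for cat in jan:
    print(f"{cat:<10} +{absolute[cat]:>6,} units  x{ratios[cat]:.2f}")
```

In relative terms all four categories roughly doubled over the period (each ratio falls between 2.0 and 2.1), so the differences between them lie in absolute volume rather than in growth rate.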
Next, an in-depth look at the hourly sales data reveals distinct patterns in product category demand throughout the day, as shown in Figure 8.
Coffee and tea stand out as the most popular products across nearly all hours, with significant peaks observed during the morning hours.
Coffee, in particular, shows a steady increase in sales, reaching a high of 7,344 transactions at 10 AM, before gradually tapering off in the evening.
Tea follows a similar pattern, with peak sales occurring at 10 AM (…,444 transactions) and maintaining a strong presence until mid-afternoon.
These trends suggest that coffee and tea are essential products for customers during the early part of the day, likely driven by their role in morning routines. Bakery products, though not as dominant as coffee or tea, show a clear peak in sales between 8 AM and 10 AM, with transaction quantities consistently above 2,500 units during this period.
This indicates a high demand for bakery items in the morning, possibly as customers pair these products with their morning coffee or tea.
On the other hand, chocolate sales show a more moderate, consistent demand throughout the day, with no significant peaks, but a steady presence from 6 AM through 3 PM.
While chocolate maintains a lower volume than beverages or bakery items, it remains a steady choice for customers.
Condiments and merchandise once again show the least variation across hours, with sales being consistently low throughout the day.
Condiments experience a slight increase in the morning but remain relatively stable, while merchandise sales are minimal, even during peak hours, suggesting limited customer interest in this category during the day.
As noted in the previous analysis, condiments and merchandise demonstrate consistently low sales throughout the day, with minimal variation across hours.
Specifically, condiment sales show a slight increase in the morning but remain relatively stable thereafter, while merchandise consistently underperforms, even during peak hours.
These trends highlight a potential opportunity for product strategy. By leveraging predictive modeling, the goal is to investigate sales performance factors and develop targeted strategies to increase sales.
Figure 7. EDA for Product Popularity Within Months
Figure 8. Product Sold Pattern
The performance of the Random Forest Regressor model was evaluated using 5-fold cross-validation, yielding a mean MSE of approximately 80.97 (as detailed in Table 5).
This result suggests the model performs reasonably well, especially when considered against a simple linear regression baseline, which typically exhibits a higher MSE in this context due to its inability to capture complex, non-linear relationships inherent in sales data.
While a lower MSE is always desirable, this value indicates a moderate level of predictive accuracy for the Random Forest model.
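Because MSE is expressed in squared units, its square root (RMSE) is often easier to interpret. The sketch below assumes the target is total sales in dollars, as the revenue figures in this paper suggest; the interpretation of the error in dollars rests on that assumption.

```python
import math

mean_mse = 80.97  # mean cross-validated MSE reported in the abstract
rmse = math.sqrt(mean_mse)  # back on the scale of the target variable
print(f"RMSE = {rmse:.2f}")
```

Under that assumption, the typical prediction error is on the order of nine dollars, which can be compared against typical sales values to judge whether "moderate" is an apt description.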
However, a notable concern is the relatively high variance in MSE across the different folds (ranging from 43.83 to about 132).
This variability highlights that the model's performance can fluctuate significantly depending on the specific data subsets used for training and validation. According to prior research, each fold in k-fold cross-validation represents a different combination of training and validation data, which can result in different accuracy values for each fold.
Such fluctuations likely stem from several factors, including pronounced seasonality (where sales patterns change throughout the year), diverse geographical influences affecting consumer preferences, or even idiosyncratic local events that impact specific store locations.
This high variance suggests the model might be sensitive to these uncaptured or under-represented factors, thereby limiting its generalizability across diverse conditions.
To mitigate these fluctuations and improve overall model robustness, future refinements could involve incorporating additional external variables (e.g., weather data, local marketing campaigns, competitor activities), conducting more granular feature engineering to account for temporal and spatial nuances, and exploring advanced ensemble techniques or hyperparameter tuning strategies aimed at reducing variance.
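One way to make both the baseline comparison and the fold-to-fold spread explicit is to score a linear regression and a Random Forest on identical folds and report the mean, standard deviation, and range of the fold MSEs. The sketch below is illustrative only: it assumes scikit-learn, and the data are synthetic with a deliberately non-linear mid-morning peak, so the numbers it prints say nothing about the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
n = 600
hour = rng.integers(6, 21, size=n)
is_weekend = rng.integers(0, 2, size=n)
# Synthetic target with a mid-morning peak, a shape a straight line cannot follow.
y = 20 * np.exp(-((hour - 10) ** 2) / 8) + 3 * is_weekend + rng.normal(0, 1, size=n)
X = np.column_stack([hour, is_weekend])

# The same fold assignment is reused for both models so the comparison is fair.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    results[name] = mse
    print(f"{name}: mean={mse.mean():.2f} std={mse.std():.2f} "
          f"min={mse.min():.2f} max={mse.max():.2f}")
```

Reporting the standard deviation next to the mean gives a compact way to track whether later refinements actually reduce the variability across folds.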
Table 5. Random Forest Regressor Prediction Performance
Columns: Fold | MSE | Mean MSE
Figure 9. Sales Performance Factors Based on Random Forest Regressor
In terms of feature importance, visualized in Figure 9, the model identifies several crucial predictors for accurately forecasting sales.
The most important feature is product_type_Premium Beans, with an importance score of 0.32, indicating its significant role in the model's predictions.
The prominence of premium bean products is likely attributed to their higher price point and a perceived greater consumer demand for quality or exclusive items.
While this aligns with general business knowledge in the coffee industry, further investigation through pricing data or customer surveys would provide empirical evidence to fully explain why these products tend to generate more revenue and exhibit distinct sales patterns compared to regular offerings.
Additionally, the feature month has an importance score of 0.15, highlighting the strong role of seasonality in sales patterns.
This suggests that sales are significantly influenced by the time of year, with particular months (perhaps during holidays or special promotions) seeing spikes or drops in customer demand. For instance, sales may be higher during colder months when customers are more likely to purchase warm beverages, or during certain cultural or seasonal events that affect coffee consumption.
This finding underscores the importance of temporal features in forecasting, and further implies that incorporating more detailed seasonal features, such as specific holidays or regional events, could significantly enhance the model's predictive accuracy.
The time_of_day feature, with an importance score of 0.08, also plays a notable role in predicting sales, indicating that customer behavior is influenced by the hour.
This is consistent with the understanding that certain products are more popular at different times: morning customers may gravitate toward coffee and pastries, while afternoon or evening customers might prefer lighter beverages or snacks.
The timing of a customer's visit appears to be a significant factor in shaping their purchasing decisions, and this can vary depending on store type, location, and individual customer preferences.
Moreover, the is_weekend_Weekend feature, with an importance score of 0.07, suggests that sales patterns are distinctly different on weekends compared to weekdays.
This aligns with common consumer behavior, where weekend shoppers often have more leisure time and higher spending tendencies, particularly in retail and food service environments.
The model's sensitivity to weekends reinforces the notion that business strategies, such as promotions, inventory management, or staffing schedules, should actively consider weekend dynamics, as consumer spending habits often vary considerably from weekdays.
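The temporal features discussed above (month, time_of_day, and the weekend indicator) can all be derived from a transaction timestamp. The following sketch shows one possible derivation using only the Python standard library. The helper temporal_features is hypothetical, and encoding time_of_day as the raw hour is an assumption, since the exact encoding is not restated here.

```python
from datetime import datetime


def temporal_features(ts):
    """Derive month, hour of day, and a weekend flag from a transaction timestamp."""
    return {
        "month": ts.month,
        "time_of_day": ts.hour,  # assumption: encoded as the hour (0-23)
        # datetime.weekday(): Monday=0 ... Saturday=5, Sunday=6
        "is_weekend": int(ts.weekday() >= 5),
    }


saturday_morning = temporal_features(datetime(2023, 6, 3, 8, 30))
tuesday_evening = temporal_features(datetime(2023, 6, 6, 18, 5))
print(saturday_morning)
print(tuesday_evening)
```

A categorical encoding (for example morning, afternoon, evening bins) would be an equally plausible choice for time_of_day; the raw hour is used here only to keep the sketch short.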
Despite the model's relatively strong performance in capturing these temporal patterns, several opportunities for improvement and limitations remain.
The observed high variance in MSE (as discussed previously) suggests that while the Random Forest model is powerful, it might be susceptible to capturing noise in the training data, potentially leading to overfitting on specific subsets.
This issue could be compounded by limitations in data granularity, particularly if the available dataset lacks detailed information on individual customer demographics or highly specific contextual events.
To address these limitations and enhance the model's generalizability and reproducibility, future work could involve more sophisticated feature engineering based on deeper domain knowledge, the incorporation of additional external variables such as local events, weather conditions, or real-time social media trends, and exploring advanced regularization techniques or alternative ensemble methods to mitigate overfitting.
Furthermore, conducting sensitivity analyses to understand how variations in input data affect predictions would bolster the study's credibility and provide more actionable insights for real-world application.
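The fold-to-fold variability and the regularization idea discussed above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data: the hyperparameter values (max_depth, min_samples_leaf), the shuffled KFold splitter, and the data-generating process are all assumptions, not the configuration used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data, invented purely for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0.0, 1.0, 300)

# Shallower trees and larger leaves limit how closely each tree can fit noise.
model = RandomForestRegressor(
    n_estimators=100,
    max_depth=6,          # assumed value
    min_samples_leaf=5,   # assumed value
    random_state=42,
)

# Five folds, matching the five-fold evaluation reported in the text.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
fold_mse = -neg_mse  # scikit-learn reports negated MSE so that higher is better

print("per-fold MSE:", np.round(fold_mse, 2))
print("mean MSE:", round(fold_mse.mean(), 2))
print("std of MSE:", round(fold_mse.std(), 2))
```

Reporting the standard deviation next to the mean makes the spread across folds explicit. For sales data ordered in time, a time-aware splitter such as scikit-learn's TimeSeriesSplit may be a more faithful evaluation than shuffled folds.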
CONCLUSION
The evaluation of the Random Forest Regressor model, based on cross-validation, reveals it performs reasonably well, with a mean MSE of 80.97.
While this indicates decent predictive accuracy, the model still exhibits moderate error, and there is clear room for improvement.
The variance in MSE across the five folds (ranging from 43.83 to about 132) suggests the model's performance fluctuates significantly depending on the data subsets, highlighting its sensitivity to varying conditions like seasonality and geographical factors.
These factors, including sales patterns that change across different times of the year or varying consumer behavior based on store locations, contribute to this performance variability.
Therefore, improving the model's generalizability across diverse scenarios will be a key area of focus in future work.
In terms of feature importance, the model identifies key predictors that strongly influence sales.
The most influential feature is product_type_Premium Beans, with an importance score of 0.32, suggesting premium products significantly drive sales predictions, likely due to their higher price points and exclusive demand.
Seasonality, captured by the month feature (importance score 0.15), plays a critical role, reinforcing the significance of time-based patterns in sales.
This finding suggests the model could benefit from further refinement by incorporating more granular seasonal features, such as major holidays (e.g., Christmas, Eid al-Fitr) or significant local events (e.g., city festivals, university breaks), to improve forecasting accuracy.
Additionally, time_of_day (0.08) and is_weekend_Weekend (0.07) are identified as significant predictors, indicating consumer behavior is influenced by both the time of day and whether it's a weekend.
These insights align with established retail patterns, where certain products are more popular during specific times (e.g., coffee in the morning, lighter snacks in the afternoon), and spending habits differ between weekdays and weekends.
Thus, the model's sensitivity to these temporal factors suggests businesses should tailor their inventory management, staffing, and promotional strategies accordingly to optimize sales.
Despite the model's promising results, this study acknowledges several broader constraints.
These include potential limitations in data quality, such as missing values or inconsistencies in transaction records, and possible sampling biases if the historical data used does not fully represent all coffee shop locations or customer segments.
Furthermore, real-world implementation challenges, such as the need for robust IT infrastructure for real-time data processing and the integration of ML outputs into existing operational workflows, were not explicitly within the scope of this methodological study.
Therefore, there are several avenues for further enhancement.
The observed high variance in MSE implies that better feature selection or the inclusion of additional, contextually relevant external variables, such as local weather conditions (e.g., temperature, rainfall impacting outdoor seating or beverage choice), major public holidays, nearby office working patterns, or even competitor promotions detected via social media trends, could help refine predictions and reduce fluctuations in performance.
Furthermore, advanced hyperparameter tuning and exploring other modeling techniques, such as time-series specific models (e.g., ARIMA, Prophet) or deep learning approaches, could enhance the model's stability and predictive power, making it more robust for practical application in the dynamic coffee shop industry.
REFERENCES