Institut Riset dan Publikasi Indonesia (IRPI)

MALCOM: Indonesian Journal of Machine Learning and Computer Science
Journal Homepage: https://journal.irpi.or.id/index.php/malcom
Vol. 5 Iss. 3 July 2025, pp: 766-775
ISSN(P): 2797-2313 | ISSN(E): 2775-8575

Optimization of Customer Segmentation in the Retail Industry
Using the K-Medoid Algorithm
Endy Wulan Agustin1*, Kurnia Uthami2, Arvan Izzatul Ulfa3,
Lusiana Efrizoni4, Rahmaddeni5
1,2,3,4,5
Department of Informatic Engineering,
Universitas Sains dan Teknologi Indonesia, Indonesia

E-Mail: (1) 2210031802077@sar.ac.id, (2) 2210031802070@sar.ac.id,
(3) 2210031802104@sar.ac.id, (4) lusiana@stmik-amik-riau.ac.id, (5) rahmaddeni@usti.ac.id

Received Feb 18th 2025; Revised Apr 3rd 2025; Accepted Apr 16th 2025; Available Online Jun 19th 2025, Published Jun 22nd 2025
Corresponding Author: Endy Wulan Agustin
Copyright ©2025 by Authors, Published by Institut Riset dan Publikasi Indonesia (IRPI)

Abstract
The retail industry faces significant challenges in understanding increasingly complex customer behavior due to massive
data growth. One major obstacle is suboptimal customer segmentation, leading to ineffective marketing strategies. This
study aims to optimize customer segmentation by implementing the K-Medoid algorithm, which excels in handling
outliers and producing more stable clusters compared to K-Means. The dataset consists of over 10,000 customer
transactions from a major retail company in Indonesia. The research process includes data collection and preprocessing,
K-Medoid algorithm implementation, and performance evaluation using the silhouette score. The results indicate that the
K-Medoid algorithm achieves more accurate customer segmentation, with a silhouette score of 0.39. The generated
clusters exhibit greater homogeneity, enabling companies to design more targeted marketing strategies, such as specific
discount offers and tailored loyalty programs. Based on these findings, the K-Medoid algorithm is recommended to
enhance customer management effectiveness in the retail industry. This study contributes to selecting a more suitable
algorithm for customer segmentation in the era of big data and opens opportunities for further exploration of hybrid
algorithms and additional evaluation metrics.
Keywords: Customer Segmentation, K-Medoid, Optimization, Retail Industry, Silhouette Score

1. INTRODUCTION
A retail business sells goods and services directly to consumers, typically by breaking bulk products
down into smaller units [1]. One of the key factors in determining the success of a business is the
presence of customers [2]. Competition in the business world is increasing, driving companies to optimize
sales and retain customers. As valuable assets, customers must be managed well to ensure business
sustainability and growth. Customer segmentation aims to understand customer purchasing behavior,
enabling companies to design and implement more effective and targeted marketing strategies [3]. Market
segmentation divides consumers with different needs, characteristics, and behaviors in a particular
market into homogeneous groups, each of which can be addressed as a target market with its own marketing
mix strategy [4]. Marketing strategies play a crucial role in business competition between companies. In
addition to focusing on product-oriented marketing, companies must also prioritize customer-oriented
approaches [5]. Cluster analysis is a method for grouping instances (samples) into several groups,
subsets, or clusters based on their "similarity" to other instances [6]. Maintaining product sales amid
tight market competition is crucial. Therefore, business sales analysis is essential to understand
long-term customer relationships, manage sales fluctuations, and plan consistent marketing strategies
[7]. In managing customer relationships, companies need to understand the characteristics of each
customer in order to design appropriate management strategies [8]. In developing marketing strategies,
companies can draw on information technology, one example of which is data mining [5].
The K-Medoids algorithm is another classical partition-based clustering method. Compared with K-Means,
this algorithm optimizes the selection of the cluster centers, overcomes the sensitivity to isolated
points, and has higher clustering accuracy [9]. As a clustering technique, the K-Medoids algorithm groups
objects into clusters based on their similarity [10]. One of the advantages of this algorithm is its
robustness against outliers, which helps reduce the influence of noise in the clustering process [5].
K-Medoids clustering is used to perform segmentation based on customer characteristics, enabling more
accurate and stable customer grouping [11]. Clustering is a data mining technique that groups data into
clusters with similar characteristics [12]. The K-Medoids algorithm belongs to the class of partitional
clustering methods and is a variant of the K-Means method. It improves on K-Means with better
capabilities in handling data containing outliers [13].
DOI: https://doi.org/10.57152/malcom.v5i3.1977
Information technology makes it easier to work on various documents, reports and other
correspondence so that with information technology everything can be completed as effectively and
efficiently as possible [14]. The need for information has now become a necessity for both individuals and
organizations [15]. The information obtained can fulfill various needs and serve as the key to development in
various aspects such as technology, economy, health, and the environment [16].
According to a study by Anggi Ayu Dwi Sulistyawati and Mujiono Sadikin (2021), the optimal
number of clusters is 3, with a maximum Silhouette Index value of 0.375 and a minimum Davies-Bouldin
Index value of 1.030 [5]. Meanwhile, research conducted by Romadansyah Siagian et al. (2022) showed that
K-Medoids outperformed K-Means with a ratio value of 0.337575 compared to 0.3380724 for K-Means,
making K-Medoids the preferred method for obtaining the optimal clusters [13]. Additionally, a study
by Nita Mirantika et al. (2023) utilized the K-Medoids algorithm to determine the optimal number of clusters
using the silhouette coefficient method, yielding three clusters [8]. The study conducted by Pertiwi T, Afdal
M, and Novita R found that customers in one segment frequently make purchases with a relatively large
amount of money, whereas customers in clusters 2, 3, 4, and 5 are dormant customers who rarely make
transactions and spend relatively small amounts of money [17].
This study differs from previous research in several aspects. Some prior studies only compared the K-Medoid algorithm with K-Means without evaluating its impact on marketing strategies. Additionally, this
study utilizes a large-scale retail transaction dataset (>10,000 transactions), providing more representative
segmentation results compared to studies with smaller datasets. Performance evaluation is conducted using
three key metrics: Silhouette Score, Davies-Bouldin Index, and Purity Score, offering a more comprehensive
understanding of the algorithm's effectiveness.
2. MATERIALS AND METHOD
The research method consists of a sequence of steps that must be carried out in accordance with the
main problem so that the work does not deviate from the specified problem boundaries [18]. These steps
are outlined in the research framework, which represents the sequence of steps involved in conducting
the research process [19].
2.1. Research Approach
This research uses a quantitative approach with an experimental method based on data mining. The K-Medoids algorithm is used to perform customer segmentation based on transaction patterns in the retail
industry.
2.2. Data Sources
Data has become an important and valuable asset in the era of information technology because it is
essential for strategy formulation and decision-making [20]. The dataset used in this research is customer
transaction data from the retail industry over a certain period. The data is obtained from the company's
transaction management system or relevant secondary sources. The dataset includes information such as:
Customer ID, Transaction Date, Product Category, Quantity, Transaction Amount, Store Location.
2.3. Research Stage
There are five stages in the research process, namely data collection, data preprocessing, K-Medoid
algorithm implementation, evaluation and validation of results, and interpretation and analysis of results, as
shown in Figure 1.
Data collection is conducted from retail transaction systems or other credible sources, ensuring that
the gathered data is stored in a suitable format for further analysis. Once collected, the data undergoes
preprocessing to enhance its quality before clustering. This stage includes data cleaning, which involves
removing duplicates, handling missing values, and eliminating anomalies. Additionally, data transformation
is performed to convert the data into an appropriate format, followed by data normalization to rescale
numeric variables and ensure uniformity.
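The cleaning step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `clean_transactions`, the sample rows, and the choice to drop (rather than impute) rows with missing values are hypothetical and not taken from the study, which does not state how missing values or anomalies were handled.

```python
def clean_transactions(rows):
    """Drop exact duplicate rows and rows with a missing numeric field."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if row["Quantity"] is None or row["Transaction_Amount"] is None:
            continue  # assumption: rows with missing values are dropped, not imputed
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows; column names follow the dataset description.
rows = [
    {"Customer_ID": "c001", "Quantity": 7, "Transaction_Amount": 350.18},
    {"Customer_ID": "c001", "Quantity": 7, "Transaction_Amount": 350.18},    # duplicate
    {"Customer_ID": "c002", "Quantity": None, "Transaction_Amount": 120.00},  # missing value
    {"Customer_ID": "c003", "Quantity": 4, "Transaction_Amount": 417.64},
]
cleaned = clean_transactions(rows)
print([r["Customer_ID"] for r in cleaned])
```

In a pandas-based workflow, `DataFrame.drop_duplicates()` and `DataFrame.dropna()` cover the same two operations.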
After preprocessing, the K-Medoids algorithm is implemented through several steps. First, the optimal
number of clusters (K) is determined using methods such as the Elbow method or the Silhouette Score. Initial
medoids are then randomly selected, and the distance between each data point and the medoids is calculated.
Based on these distances, the data points are assigned to clusters, and the medoids are updated iteratively
until no significant changes occur. The clustering results are then evaluated using validation metrics to ensure
accuracy and reliability.
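The assign-and-update loop described above can be illustrated with a small pure-Python sketch. It implements the simple alternating heuristic (assign each point to its nearest medoid, then re-pick each cluster's medoid), not the full PAM swap search, and it is not the authors' code. The function names, the `init` parameter (added here only to make the example reproducible), and the toy points are assumptions.

```python
import math
import random


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, init=None, max_iter=100, seed=42):
    """Alternating K-Medoids; returns (medoid indices, cluster labels)."""
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(len(points)), k)  # random start
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total distance
            # to all other members of its cluster.
            new_medoids.append(
                min(members, key=lambda i: sum(math.dist(points[i], points[j]) for j in members))
            )
        if new_medoids == medoids:
            break  # no change: converged
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Toy data: two well-separated groups; both initial medoids start in the first group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(points, k=2, init=[0, 1])
print(medoids, labels)
```

Because each medoid is always an actual data point, a single extreme transaction cannot drag a cluster center toward itself the way it can shift a K-Means centroid. The alternating heuristic can still stop in a poor local optimum for an unlucky start, which is one reason to compare several initializations.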

Figure 1. Research Stage
The evaluation and validation process involves various techniques to assess the effectiveness of
clustering. The Silhouette Score measures how well data points fit within their clusters, while the Davies-Bouldin Index evaluates how compact and well separated the clusters are. If labeled data is available, the Purity Score is
used to determine segmentation quality. Finally, the clustering results are interpreted to identify customer
characteristics in each segment. This analysis helps businesses develop more effective marketing strategies
tailored to different customer groups in the retail industry.
3. RESULTS AND DISCUSSION
In this study, the two main features used for segmentation are Quantity (the number of products
purchased) and Transaction_Amount (the amount of money spent in each transaction).
3.1. Research Result
3.1.1 Data Collection
The data used comes from over 10,000 customer transactions at a major retail company in Indonesia.
It was obtained from the company's transaction management system or relevant secondary sources and then
stored in a format suitable for further analysis, ensuring accuracy and consistency. Common data
collection techniques include
direct extraction from point-of-sale (POS) systems, integration with customer relationship management
(CRM) software, and data aggregation from multiple retail branches. This structured approach allows for a
comprehensive understanding of sales trends and customer behavior.
Table 1 shows a sample of the sales transaction data. Each row in the table represents a single
transaction, with columns indicating the transaction serial number, unique customer ID, transaction date,
purchased product category (such as Furniture, Groceries, Beauty, and Clothing), the number of units
bought, the transaction amount, and the store location where the transaction took place (e.g., Bandung,
Medan, Jakarta, Yogyakarta, and Surabaya). This table is useful for sales analysis, helping to understand
customer purchasing patterns and product performance across different locations.
Table 1. Dataset
No    Customer_ID  Transaction_Date  Product_Category  Quantity  Transaction_Amount  Store_Location
0     6d8f43b2     2024-03-29        Furniture         7         350.18              Bandung
1     94d982ed     2024-04-19        Beauty            4         417.64              Bandung
2     e161afe2     2024-08-04        Groceries         8         216.46              Medan
...
9997  665c53e6     2024-05-24        Clothing          1         54.60               Yogyakarta
9998  036a641b     2024-07-09        Furniture         1         183.11              Jakarta
9999  f3fea7a3     2024-02-12        Groceries         8         345.14              Surabaya

3.1.2 Data Preprocessing
The tables in this subsection display only the first five rows of the data.
1. Data Cleaning
Table 2 shows the results after performing the data cleaning.
Table 2. Data Cleaning
No  Quantity  Transaction_Amount
0   7         350.18
1   4         417.64
2   8         216.46
3   4         447.29
4   7         67.81

Table 2 displays the results of the data cleaning process performed on the DataFrame. The table
consists of Quantity and Transaction_Amount columns. The Quantity column represents the number
of product units sold in each transaction. On the other hand, the Transaction_Amount column reflects
the total value of each transaction, providing relevant financial information. With the data cleaned and
neatly organized, this table is ready for further analysis, enabling a better understanding of sales
patterns and product performance.
2. Data Transformation
Table 3 shows the results after performing the data transformation.
Table 3. Data Transformation
No  Quantity   Transaction_Amount
0   0.778152   0.672082
1   -0.385875  1.147494
2   1.166161   -0.270287
3   -0.385875  1.356447
4   0.778152   -1.317872

Table 3 displays the results of the data transformation process applied to the DataFrame. The table
consists of Quantity and Transaction_Amount columns. Here, the Quantity column shows values that
have been normalized or standardized, allowing for better comparisons across data with different
scales. The values in the Transaction_Amount column have also undergone transformation, reflecting
adjustments that may be necessary for further analysis. Applying these transformation techniques
helps reduce bias and improve analytical accuracy, enabling a deeper understanding of patterns and
relationships in sales data. This final output provides a more solid foundation for statistical analysis or
predictive modeling.
3. Data Normalization
Table 4 shows the results after performing the data normalization.
Table 4. Data Normalization
No  Quantity   Transaction_Amount
0   0.778152   0.672082
1   -0.385875  1.147494
2   1.166161   -0.270287
3   -0.385875  1.356447
4   0.778152   -1.317872

Table 4 shows the results of the data normalization process applied to the DataFrame. The table
consists of Quantity and Transaction_Amount columns. The values in these columns have been
normalized, meaning the data has been adjusted to a common scale. Because the values include negative
numbers, they correspond to z-scores rather than a 0 to 1 range. This normalization is essential to
reduce the impact of different variable scales,
making analysis and comparisons easier. The normalized data allows analytical models, such as
regression or machine learning algorithms, to function more effectively since all features are now
within the same range. With normalized data, the analysis of sales patterns and relationships between
variables becomes clearer and more accurate.
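The two scaling options mentioned above differ in a way that is easy to see on a handful of values. The sketch below is illustrative only: the helper names `z_score` and `min_max` are hypothetical, and the input is just the five Quantity values from Table 2, so the results will not match Table 4, which was presumably computed with the mean and standard deviation of the full dataset.

```python
import statistics


def z_score(values):
    """Standardize to mean 0 and standard deviation 1 (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumes the values are not all identical
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


quantity = [7, 4, 8, 4, 7]  # first five Quantity values from Table 2
z = z_score(quantity)
scaled = min_max(quantity)
print([round(v, 4) for v in z])
print(scaled)
```

Z-scores can be negative (values below the mean), whereas min-max scaled values always lie between 0 and 1. Either choice keeps Transaction_Amount, with its much larger raw range, from dominating the distance calculation used in clustering.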
3.1.3 K-Medoid Algorithm Implementation
from sklearn.metrics import silhouette_score
from sklearn_extra.cluster import KMedoids

# Determine the optimal number of clusters using the Silhouette Score.
# df_scaled holds the normalized Quantity and Transaction_Amount columns.
best_k = 2
best_score = -1
best_model = None
for k in range(2, 10):  # try from 2 to 9 clusters
    model = KMedoids(n_clusters=k, random_state=42, max_iter=100)
    labels = model.fit_predict(df_scaled)
    score = silhouette_score(df_scaled, labels)
    if score > best_score:  # keep the best-scoring model so far
        best_score = score
        best_k = k
        best_model = model
print(f"Best K: {best_k}")
Output:

Best K: 5
This code aims to determine the optimal number of clusters in cluster analysis using the K-Medoids
method, evaluated with the Silhouette Score. Initially, the variable best_k is set to 2, best_score is set to -1,
and best_model is set to None. Then, a loop is executed to test the number of clusters from 2 to 9. In each
iteration, a K-Medoids model is created with the current number of clusters being tested, and the normalized
data (stored in the df_scaled variable) is processed to obtain cluster labels. The Silhouette Score is calculated
for each model, and if the score is higher than the previous best_score, it is updated accordingly. After all
iterations are completed, the results indicate that the best number of clusters found is 5, suggesting an optimal
cluster configuration for the data analysis.
3.1.4 Evaluation and Validation Results
Silhouette Score measures how well objects within a cluster are grouped, with values ranging from -1
to 1, where higher values indicate better-separated and more cohesive clusters. Davies-Bouldin Index
assesses clustering quality based on the distance between clusters and the compactness within clusters, where
lower values indicate better clustering. Purity Score is used to measure how pure the formed clusters are
based on assigned labels, with higher values indicating that clusters are more homogeneous with respect to a specific
category. These evaluation results help determine whether the clustering model is optimal or needs further
adjustments.
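For reference, the three metrics are commonly defined as follows. The notation is introduced here for illustration and does not appear in the study: N is the number of data points, k the number of clusters, a(i) the mean distance from point i to the other points in its own cluster, b(i) the smallest mean distance from point i to the points of any other cluster, \sigma_i the mean distance of the points in cluster i to its center c_i, C_i the set of points in cluster i, and T_t the set of points with reference label t.

```latex
% Silhouette coefficient of point i and its average over the dataset
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S = \frac{1}{N} \sum_{i=1}^{N} s(i)

% Davies-Bouldin Index: average worst-case similarity of each cluster to another
\mathrm{DB} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i}
    \frac{\sigma_i + \sigma_j}{d(c_i, c_j)}

% Purity: share of points that carry the majority label of their cluster
\mathrm{Purity} = \frac{1}{N} \sum_{i=1}^{k} \max_{t} \lvert C_i \cap T_t \rvert
```

S lies in [-1, 1] and is better when higher, DB is non-negative and better when lower, and Purity lies in (0, 1] and is better when higher.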
1. Silhouette Score
First, silhouette_score is applied to the normalized data (df_scaled) using the cluster labels generated
by the previously determined best model (best_model.labels_). The Silhouette Score, which reflects
how well each data point is grouped, is calculated and stored in the variable silhouette_avg. Then, the
result is printed in a format that displays the number of clusters used (best_k) and the Silhouette Score
with four decimal places. The output shows that the Silhouette Score is approximately 0.3943,
indicating a moderate level of separation between the formed clusters, though there is room for
improvement in data grouping.
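To make the metric concrete, the following pure-Python sketch computes the mean silhouette coefficient with Euclidean distance. It is a simplified stand-in written for illustration, not the scikit-learn implementation the study calls; the function name `mean_silhouette` and the toy points are assumptions, and it expects at least two clusters.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # neighbours grouped together
bad = mean_silhouette(points, [0, 1, 0, 1])   # neighbours split apart
print(f"{good:.4f} {bad:.4f}")
```

On this toy data the sensible grouping scores close to 1 and the mismatched grouping scores below 0, which gives a sense of scale for the 0.3943 reported here.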

2. Davies-Bouldin Index
First, the cluster labels generated by the best model are stored in the labels variable. Then, the
davies_bouldin_score function is applied to the normalized data (df_scaled) and the cluster labels to
calculate the Davies-Bouldin Index, which is stored in the db_index variable. This index measures
cluster separation and compactness: the lower the value, the better the clustering quality. The output
shows that the Davies-Bouldin Index is approximately 0.8445, indicating a fairly good level of
separation between the formed clusters. However, this value also suggests that there is still potential
for further improving the cluster structure.
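A compact way to see what the index measures is to compute it by hand. The sketch below uses the centroid-based form of the definition with Euclidean distance; it is an illustration rather than the library routine used in the study, and the function name `davies_bouldin` and the toy points are assumptions.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin Index with centroid-based scatter (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {
        lab: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for lab, pts in clusters.items()
    }
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case (largest) similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
db_index = davies_bouldin(points, [0, 0, 1, 1])
print(f"{db_index:.4f}")
```

Here each cluster has scatter 0.5 and the centroids are 10 apart, so the index is (0.5 + 0.5) / 10 = 0.1. A value such as 0.8445 means the combined scatter of the closest pair of clusters is of the same order as the distance between their centers.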

3. Purity Score
Running the code yields a Purity Score of 0.2123, which means that only about 21.23% of the data
points belong to the majority category of their cluster. This value indicates that the clusters have a
high degree of category mixing and do not align well with the reference categories. To improve the Purity
Score, enhancements can be made to the clustering process, such as selecting more relevant features,
fine-tuning the algorithm’s parameters, or using alternative clustering methods that better suit the
characteristics of the data.
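Purity is straightforward to compute once each data point has both a cluster label and a reference category. The sketch below is illustrative: the study does not state which labels served as the reference, so the use of product categories here, the function name `purity`, and the sample values are assumptions.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Share of points that carry the majority category of their cluster."""
    by_cluster = {}
    for c, t in zip(cluster_labels, true_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    # For each cluster, count only the points in its most frequent category.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_labels)


# Hypothetical example: cluster 0 is mostly Furniture, cluster 1 is all Groceries.
clusters = [0, 0, 0, 1, 1, 1]
categories = ["Furniture", "Furniture", "Beauty", "Groceries", "Groceries", "Groceries"]
score = purity(clusters, categories)
print(f"{score:.4f}")
```

In this example five of the six points match their cluster's majority category, giving a purity of about 0.83. A low purity is expected whenever the reference categories are unrelated to the features used for clustering, so it does not by itself show that the clusters are poor with respect to those features.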

3.1.5 Interpretation and Analysis Results
The comparison results can be seen in Table 5.
Table 5. The Comparison Result
Metric                Definition                                 Value  Interpretation
Silhouette Score      Measures how well data points fit within   0.39   A positive value indicates that most data points
                      their own cluster compared to other               are well-clustered, but there is still room for
                      clusters. Values range from -1 to 1.              improvement.
Davies-Bouldin Index  Measures the ratio of intra-cluster        0.84   A value close to 0 suggests that clusters are
                      distance to inter-cluster distance. Lower         well-separated and compact. The observed value
                      values indicate better clustering.                indicates some overlap between clusters, leading
                                                                        to suboptimal separation.
Purity Score          Measures how much data in a cluster        0.21   A low value indicates that the clusters are less
                      comes from the same category. Values              homogeneous and contain a mix of categories,
                      range from 0 to 1.                                with only a small portion of data in each cluster
                                                                        coming from the same category.

The visualization of the clustering results can be seen in Figure 2.

Figure 2. Visualization of the Clustering
Figure 2 illustrates the clustering produced by the K-Medoids algorithm. The X-axis represents
Quantity (the number of products purchased), while the Y-axis indicates the transaction amount associated
with each data point. Each data point is color-coded by cluster, showing how similar data points are
grouped. The red points labeled "Medoids" act as cluster centers; each is the data point with the minimal
total distance to all other points within its cluster. The distribution of data suggests the presence of
distinct groups, with some points clustered near the Y-axis, that is, at low purchase quantities.
This visualization helps in understanding patterns within the dataset by showcasing how data points
are grouped and how transaction behaviors vary across clusters. The K-Medoids algorithm identifies
meaningful clusters without making assumptions about data distribution, making it a robust method for
segmentation. Further evaluation using metrics like the Silhouette Score, Davies-Bouldin Index, and Purity
Score allows for a deeper analysis of the clustering results, ensuring that the segmentation accurately
represents customer behavior.
1. Customer Segmentation Results
After applying the K-Medoids algorithm to the customer transaction dataset, segmentation results
were obtained with an optimal number of 5 clusters based on the highest Silhouette Score value. Customer
segmentation results play a crucial role in data analysis as they provide deeper insights into customer patterns
and characteristics. This segmentation allows businesses to understand how customers interact with products
or services and how their purchasing behaviors can be categorized into specific groups. By applying the K-Medoids method, customers can be grouped based on similarities in transaction behavior, product
preferences, and purchase frequency. These segmentation results not only help identify potential customer
groups but also enable companies to make data-driven decisions in developing more effective marketing
strategies.
Additionally, the obtained segmentation results can be used to optimize promotional strategies, loyalty
programs, and service personalization based on the needs of each customer cluster. By understanding how
each cluster behaves, businesses can allocate resources more efficiently and enhance the overall customer
experience.
2. Characteristics of Each Cluster
Each cluster generated in customer segmentation has unique characteristics that distinguish it from
others. These characteristics may include various aspects such as purchase frequency, transaction amounts,
product preferences, and consumption patterns over specific time periods. For example, one cluster may
consist of customers who make frequent purchases in small amounts, while another cluster may include
customers who rarely transact but make large purchases.
Analyzing the characteristics of each cluster is essential to identifying the most suitable marketing
strategies. For instance, customers in a high-transaction cluster may be offered loyalty programs or exclusive
deals to enhance retention, whereas customers in a low-purchase-frequency cluster may receive discount-based marketing strategies or special promotions to increase their engagement. By understanding these
characteristics, businesses can implement more targeted approaches to reach customers and improve the
effectiveness of their business strategies.
Table 6. Customer Segmentation Results Using K-Medoids
Cluster  Number of Customers  Average Transaction  Purchase Pattern
1        2,500                High                 Frequent
2        1,800                Medium               Irregular
3        2,200                Low                  Rare
4        1,500                Very Low             Very Rare
5        2,000                Low                  Dormant

The customer segmentation results using the K-Medoid method in the table indicate that customers
can be grouped into several clusters based on specific characteristics, such as transaction volume, total
spending, and visit frequency. These clusters reflect different customer behavior patterns, where some groups
exhibit high transaction levels and active engagement, while others show lower activity. Understanding these
differences allows businesses to tailor their marketing strategies more effectively.
Customers in clusters with high transactions and spending can be considered loyal customers who
contribute significantly to revenue. Therefore, strategies such as loyalty programs or exclusive offers can be
implemented to retain them. Meanwhile, clusters with less active customers may require a different approach,
such as more aggressive promotional campaigns or personalized product recommendations to increase their
engagement. Additionally, clusters with customers who show high potential but are not yet fully active could
be prime targets for engagement-enhancing strategies through discounts or more intensive communication.
Overall, customer segmentation using K-Medoid provides deeper insights into customer purchasing
patterns and enables companies to optimize their resource allocation. With this data-driven approach,
businesses can enhance marketing efficiency, strengthen customer loyalty, and ultimately drive more
sustainable business growth.
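A per-cluster summary of this kind can be derived directly from the cluster labels. The sketch below is a hypothetical illustration: the function name `profile_clusters` and the cluster labels are made up, while the (Quantity, Transaction_Amount) pairs are the six sample rows shown in Table 1. It does not reproduce Table 6.

```python
from collections import defaultdict


def profile_clusters(rows, labels):
    """Size and average Quantity / Transaction_Amount for each cluster."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    profile = {}
    for lab, members in sorted(groups.items()):
        profile[lab] = {
            "n": len(members),
            "avg_quantity": sum(q for q, _ in members) / len(members),
            "avg_amount": sum(a for _, a in members) / len(members),
        }
    return profile


# (Quantity, Transaction_Amount) pairs from Table 1; the labels are hypothetical.
rows = [(7, 350.18), (4, 417.64), (8, 216.46), (1, 54.60), (1, 183.11), (8, 345.14)]
labels = [0, 0, 1, 2, 2, 1]
profile = profile_clusters(rows, labels)
for lab, stats in profile.items():
    print(lab, stats["n"], round(stats["avg_quantity"], 2), round(stats["avg_amount"], 2))
```

Qualitative labels such as "High" or "Very Low" can then be assigned by comparing each cluster's averages with the overall averages. Purchase frequency would additionally require counting transactions per Customer_ID, which the two clustering features do not capture.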
3. Cluster Visualization
The clustering results are visualized in a scatter plot illustrating the distribution of customers based on
two main variables: the number of products purchased (Quantity) and the total transaction amount
(Transaction Amount).
a. Cluster 1 shows a concentration of customers with high transactions and frequent shopping.
b. Clusters 2 and 3 indicate customers with medium and low transactions.
c. Clusters 4 and 5 are dominated by dormant customers who rarely make transactions.
4. Scatter Plot of Customer Segmentation
The scatter plot in Figure 3 visualizes the clustering results of the K-Medoids algorithm. Each point
represents a customer, with colors indicating different clusters, and the medoids are marked to highlight
the center of each cluster. The X-axis shows the number of products purchased and the Y-axis shows the
transaction amount. Customers are categorized based on their purchasing patterns and transaction value,
which helps in developing targeted marketing strategies.
The first cluster, represented by the color red, consists of High-Value Customers. These customers
have high transaction amounts and frequently shop, making them the most valuable to the company.

Effective marketing strategies for this group include exclusive offers, premium loyalty programs, and
personalized promotions to maintain their engagement. Meanwhile, the blue cluster represents Medium-Value
Customers, who have moderate transaction values with irregular purchasing patterns. This group has the
potential to be upgraded to high-value customers through marketing strategies such as seasonal discounts and
recommendation-based promotions.

Figure 3. Customer Segmentation Visualization Using K-Medoids
Next, the green cluster represents Low-Value Customers, who have low transaction amounts and make
purchases infrequently. They typically buy only under specific circumstances, such as during promotions.
Therefore, suitable marketing strategies include incentive-based campaigns and more personalized product
recommendations. The purple cluster, labeled as Very Low-Value Customers, consists of customers who are
nearly inactive, with very rare transactions and minimal purchase amounts. This group may not have strong
brand engagement and only make occasional purchases. Strategies for re-engaging them include email
marketing, reactivation campaigns, and attractive discount offers.
Finally, the orange cluster categorizes Dormant Customers, who rarely make transactions and tend to
be inactive. To regain their interest in shopping, more aggressive marketing strategies are required, such as
customer reactivation programs, cashback offers, or special incentives for their first transaction after a long
period of inactivity. With this clear segmentation based on color, businesses can effectively tailor strategies
for each customer group, ultimately improving customer retention and profitability.
5. Analysis and Business Implications
Based on the segmentation results, it can be concluded that:
a. Cluster 1 consists of the most valuable customers who should receive special attention through loyalty
programs and personalized promotions.
b. Clusters 2 and 3 can be targeted with discount offers or incentives to increase their purchase
frequency.
c. Clusters 4 and 5 comprise customers who rarely transact. Marketing strategies such as email
marketing or customer retention campaigns can be used to re-engage them.

6. Comparison with Previous Studies
Previous research by Sulistyawati and Sadikin (2021) found an optimal number of 3 clusters, with a
Silhouette Index of 0.375. In this study, the optimal number of clusters obtained is 5, with a comparable
Silhouette Score of 0.39, based on evaluation using the Silhouette Score method.
7. Recommended Marketing Strategies Based on Clusters
a. Cluster 1: Exclusive offers, premium membership, and loyalty programs.
b. Clusters 2 & 3: Seasonal discounts, product recommendations based on purchase history.
c. Clusters 4 & 5: Re-engagement campaigns, email marketing, and special discounts for dormant
customers.

3.2. Discussion
Based on the analysis conducted, the evaluation of clustering performance should consider the
combination of three key metrics: Silhouette Score, Davies-Bouldin Index, and Purity Score. These metrics
provide insights into cluster cohesion, separation, and homogeneity. The Silhouette Score in this study is
0.39, indicating that most data points are reasonably well-clustered, though there is still room for
improvement. Ideally, a value closer to 1 would signify better clustering performance. The Davies-Bouldin

Index is recorded at 0.84, suggesting some degree of overlap between clusters. Since lower values indicate
better cluster separation, this result implies that the clustering process could be further optimized. Lastly, the
Purity Score is 0.21, revealing that the clusters are less homogeneous and contain mixed categories. A higher
score would suggest better consistency within each cluster.
When comparing these results with previous studies that utilized the same clustering algorithm,
variations in performance metrics become apparent. For instance, a study by Kaufman and Rousseeuw (1990)
on the K-Medoids algorithm demonstrated a generally higher Silhouette Score in specific applications,
suggesting that better feature selection or preprocessing could enhance clustering quality. Similarly, research
by Arbelaitz et al. (2013) compared multiple clustering evaluation metrics and highlighted that a lower
Davies-Bouldin Index typically leads to more distinct cluster separations. This indicates that additional
refinements in feature engineering or parameter tuning might improve the clustering outcome in the current
study. Furthermore, studies focusing on Purity Score, such as those by Manning et al. (2008), emphasize that
improving feature representation can significantly enhance cluster homogeneity, which is a key challenge in
this study.
Among the three metrics, the Silhouette Score is the most relevant for assessing overall cluster
quality, as it directly reflects how well data points are assigned to their respective clusters. If the primary
objective is to achieve optimal cluster separation, future improvements should focus on increasing the
Silhouette Score while also considering the Davies-Bouldin Index to ensure well-separated clusters.
Additionally, working to enhance the Purity Score through refined feature selection or alternative clustering
techniques could lead to better overall clustering performance.
To enhance the clustering results in this study, several recommendations can be considered. First,
selecting models that yield a higher Silhouette Score would improve clustering quality. Second, ensuring
that the Davies-Bouldin Index is minimized would help reduce cluster overlap and improve separation.
Lastly, refining the feature selection process and considering alternative clustering techniques, such as
hierarchical clustering or DBSCAN, may contribute to a higher Purity Score, leading to more homogeneous
and meaningful cluster formations. By integrating these considerations with insights from previous research,
the study's clustering methodology can be refined for better segmentation accuracy.
4. CONCLUSION
Based on the results obtained from the study on optimizing customer segmentation in the retail
industry using the K-Medoid algorithm, the following conclusions can be drawn. The Silhouette Score obtained was 0.394, indicating that most of the
data points are well-clustered, although there is still room for improvement. A value closer to 1 would
indicate better cluster separation. The Davies-Bouldin Index value was 0.844, suggesting some overlap
between clusters, meaning the separation of clusters is not optimal. Lower values would be preferable. The
Purity Score obtained was 0.212, indicating that the clusters formed are less homogeneous and contain a mix
of categories.
Based on the evaluation results, it is recommended to look for models with a higher Silhouette Score
for better cluster separation, to consider the Davies-Bouldin Index value to ensure that clusters are
well-separated, and to work on improving the Purity Score through better feature selection or alternative
clustering techniques.
REFERENCES
[1] S. Ghaida Muthmainah and A. Id Hadiana, “Comparative Analysis of K-Means and K-Medoids
Clustering in Retail Store Product Grouping,” International Journal of Quantitative Research and
Modeling, vol. 5, no. 3, pp. 280–294, 2024.
[2] M. Galih Pradana, R. Dwi Amalia, and K. W. Gusti, “Optimalisasi Segmentasi Pelanggan
Menggunakan Hierarchical Clustering,” Jurnal Teknologi Informasi, vol. 7, no. 2, 2023.
[3] T. A. Pertiwi, M. Afdal, and R. Novita, “Penerapan Algoritma K-Medoids dan FP-Growth dalam
Penentuan Pola Kombinasi Produk Berdasarkan Hasil Segmentasi Pelanggan,” Technology and
Science (BITS), vol. 6, no. 2, 2024, doi: 10.47065/bits.v6i2.5268.
[4] J. Hutagalung, M. Syahril, and S. Sobirin, “Implementation of K-Medoids Clustering Method for
Indihome Service Package Market Segmentation,” Journal of Computer Networks, Architecture and
High Performance Computing, vol. 4, no. 2, pp. 137–147, Jul. 2022, doi: 10.47709/cnahpc.v4i2.1458.
[5] A. A. D. Sulistyawati and M. Sadikin, “Penerapan Algoritma K-Medoids Untuk Menentukan
Segmentasi Pelanggan,” SISTEMASI, vol. 10, no. 3, p. 516, Sep. 2021, doi:
10.32520/stmsi.v10i3.1332.
[6] T. L. Afandi, B. Warsito, and R. Santoso, “Implementasi K-Medoids Dan Model Weighted-Length
Recency Frequency Monetary (W-Lrfm) Untuk Segmentasi Pelanggan Dilengkapi Gui R,” Jurnal
Gaussian, vol. 11, no. 3, pp. 429–438, Jan. 2023, doi: 10.14710/j.gauss.11.3.429-438.

[7] R. Azhar, U. Mahdiyah, D. Swanjaya, U. Nusantara, and P. Kediri, “Prosiding SEMNAS INOTEK
(Seminar Nasional Inovasi Teknologi) Analisis Segmentasi Pelanggan Dengan Metode K-Medoids
dan Simple Additive Weighting (SAW) Untuk Menentukan Strategi Pemasaran,” Online, 2024.
[8] N. Mirantika, T. S. Syamfithriani, and R. Trisudarmo, “Implementasi Algoritma K-Medoids
Clustering Untuk Menentukan Segmentasi Pelanggan,” vol. 17, pp. 2614–5405, doi: 10.25134/nuansa.
[9] Z. Wu, L. Jin, J. Zhao, L. Jing, and L. Chen, “Research on Segmenting E-Commerce Customer
through an Improved K-Medoids Clustering Algorithm,” Comput Intell Neurosci, vol. 2022, 2022, doi:
10.1155/2022/9930613.
[10] A. Madani, A. Rahmah, F. Nurunnisa, and A. Elia, “SENTIMAS: Seminar Nasional Penelitian dan
Pengabdian Masyarakat Customer Segmentation at BC HNI 2 Pekanbaru by Applying the K-Medoids
Algorithm and Recency, Frequency, Monetary (RFM) Model Segmentasi Pelanggan pada BC HNI 2
Pekanbaru dengan Menerapkan Algoritma K-Medoids dan Model Recency, Frequency, Monetery
(RFM).” [Online]. Available: https://journal.irpi.or.id/index.php/sentimas
[11] P. A. Windjaya, B. Siregar, and K. Kunci, “(RFMTS) Menggunakan Algoritma K-Medoids
Clustering,” Multidisciplinary Scientific Journal, vol. 2.
[12] S. Ika Murpratiwi, I. Gusti Agung Indrawan, and A. Aranta, “Analisis Pemilihan Cluster Optimal
Dalam Segmentasi Pelanggan Toko Retail,” Jurnal Pendidikan Teknologi dan Kejuruan, vol. 18, no.
2, 2021.
[13] R. Siagian, P. Sirait, and A. Halim, “SISTEMASI: Jurnal Sistem Informasi Penerapan Algoritma
K-Means dan K-Medoids untuk Segmentasi Pelanggan pada Data Transaksi E-Commerce The
Implementation of K-Means and K-Medoids Algorithm for Customer Segmentation on E-commerce
Data Transactions.” [Online]. Available: http://sistemasi.ftik.unisi.ac.id
[14] I. S. Afari, “K-Medoids Customer Segmentation Algorithm by Utilizing Customer Relationship
Management,” Journal of Computer Scine and Information Technology, pp. 89–93, Apr. 2023, doi:
10.35134/jcsitech.v9i2.69.
[15] D. Ispandi, “Membangun Sistem Informasi Manajemen Laboratorium Komputer (SILABKOM)
STMIK-AMIK Riau.”
[16] R. Rahmaddeni, M. K. Anam, Y. Irawan, S. Susanti, and M. Jamaris, “Comparison of Support Vector
Machine and XGBSVM in Analyzing Public Opinion on Covid-19 Vaccination,” ILKOM Jurnal
Ilmiah, vol. 14, no. 1, pp. 32–38, Apr. 2022, doi: 10.33096/ilkom.v14i1.1090.32-38.
[17] T. A. Pertiwi, M. Afdal, and R. Novita, “Penerapan Algoritma K-Medoids dan FP-Growth dalam
Penentuan Pola Kombinasi Produk Berdasarkan Hasil Segmentasi Pelanggan,” Technology and
Science (BITS), vol. 6, no. 2, 2024, doi: 10.47065/bits.v6i2.5268.
[18] E. Hermika and S. Zuhri Harahap, “Application Of Data Mining In Selecting Superior Products Using
The K-Means And K-Medoids Algorithm Methods”.
[19] Y. Diana et al., “Analisa Penjualan Menggunakan Algoritma K-Medoids Untuk Mengoptimalkan
Penjualan Barang,” JOISIE Journal Of Information System And Informatics Engineering, vol. 7, no. 1,
pp. 97–103, 2023.
[20] S. Ika Murpratiwi, I. Gusti Agung Indrawan, and A. Aranta, “Analisis Pemilihan Cluster Optimal
Dalam Segmentasi Pelanggan Toko Retail,” Jurnal Pendidikan Teknologi dan Kejuruan, vol. 18, no.
2, 2021.
