OPEN ACCESS
ISSN 2356-5462
http://socj.
id/ijoict/ Intl.
Journal on ICT Vol.
No.
Dec 2024.
doi: 10.
21108/ijoict.
Regional Mapping Based on Tourism Destinations in West Java: K-Medoid Clustering Analysis
Nafis Almajid 1, Prima Dina Atika 2, Khairunnisa Fadhilla Ramdhania 3*
1,2,3 Department of Informatics, Universitas Bhayangkara Jakarta Marga Mulya, Bekasi, West Java, Indonesia
* khairunnisa.fadhilla@dsn.
Abstract The growth of the tourism sector in West Java demands an optimal development strategy.
This study aims to cluster regions in West Java based on the characteristics of their tourist destinations using the K-Medoid algorithm.
This algorithm was chosen because of its superiority in producing optimal clusters and robustness to outliers.
Data on tourist destination characteristics in West Java were analyzed using the K-Medoid algorithm, with the Elbow method used to determine the optimal number of clusters. The evaluation was conducted using the Davies-Bouldin Index.
As a result, three clusters with different characteristics were formed.
The first cluster, "Medium potential and medium achievement", consists of 1 region with unoptimized potential for campsite tourism.
The second cluster, "High potential and moderate achievement", consists of 25 regions with a diversity of attractions and a high number of visits.
Finally, the third cluster, "Medium potential and high achievement", consists of 1 region with popular historical and cultural attractions and high visitation.
The model evaluation showed a Davies-Bouldin Index score of 0.08, indicating good clustering quality. This research is expected to provide insights for the government and related stakeholders to formulate targeted tourism development policies in West Java.
The K-Medoid algorithm helps identify certain patterns, providing deeper insights into regional differences in terms of tourism.
Keywords: Tourism Destinations, Tourism, West Java, regional analysis, K-Medoid
I. INTRODUCTION
The tourism sector has experienced significant growth in recent times due to the emergence of more tourist destinations in various locations.
Information about these destinations is widely available across various media, which makes them easier for visitors to find.
The Indonesian Central Statistics Agency (BPS) publishes data on the development of tourism in Indonesia every quarter.
This data is presented in both raw and infographic formats; the infographic form is intended to help the public absorb information about the state of tourism in the country.
In the first quarter of 2024, the number of foreign tourist arrivals increased by 24.85% compared to the first quarter of 2023, and the number of domestic tourist visits rose by 19.78% over the same period.
However, there has not yet been a comprehensive regional grouping based on similarities in the characteristics of tourist destinations.
The arrival of foreign tourists to Indonesia can increase foreign exchange earnings and stimulate economic growth in tourist areas.
For foreigners wishing to visit Indonesia, West Java is one of the best places to visit.
Received on 19 Nov 2024.
Revised on 16 Dec 2024.
Accepted and Published on 11 Mar 2025.
NAFIS ALMAJID ET AL.
REGIONAL MAPPING BASED ON TOURISM DESTINATIONS IN WEST JAVA: K-MEDOID CLUSTERING ANALYSIS
The K-Medoid algorithm can be used to cluster tourism data based on its characteristics.
The results of this clustering allow the government and relevant stakeholders to formulate more accurate and effective policies tailored to the specific characteristics and needs of each group.
II. LITERATURE REVIEW
According to Tourism Law No. 10 of 2009, nature, flora, fauna, ancient relics, history, art, and culture are blessings from God Almighty.
Therefore, they can be utilized as assets in tourism development and serve as resources that are systematically and sustainably managed while preserving religious, cultural, and environmental values.
This is aimed at achieving national development goals with a community-oriented approach.
Clustering is a reliable data mining method and a valid tool for addressing complex problems in computer science and statistics.
Clustering works by partitioning the objects in a dataset into homogeneous clusters.
The clustering algorithms most commonly used in research include K-Means and K-Medoid.
The advantage of K-Medoid is its ability to produce optimal clusters that are robust to outliers, particularly when compared to the K-Means algorithm.
Given these considerations, the researchers conducted a study entitled "Regional Mapping Based on Tourist Destinations in West Java: A K-Medoid Clustering Analysis."
K-Medoid is a clustering technique that uses representative objects (medoids) as the cluster center points.
The algorithm is iterative and greedy.
As in the K-Means algorithm, the initial representative objects are selected randomly.
The iterative process of replacing representative objects with non-representative objects continues until the quality of the resulting grouping cannot be improved by any further replacement. This quality is measured by the deviation between the new total distance and the previous total distance.
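The robustness claim above can be illustrated with a minimal sketch. The numbers below are invented for illustration and are not taken from the study's dataset; the names `visits`, `mean_center`, and `total_distance` are hypothetical. The sketch compares a mean-based center, as K-Means would use, with a medoid, which must be an actual data point.

```python
# Toy 1-D illustration (values are made up, not taken from the study's data).
visits = [10.0, 12.0, 11.0, 13.0, 500.0]  # 500.0 plays the role of an outlier

# K-Means-style center: the arithmetic mean, which the outlier pulls upward.
mean_center = sum(visits) / len(visits)


def total_distance(candidate, points):
    """Sum of absolute distances from one candidate center to every point."""
    return sum(abs(candidate - p) for p in points)


# K-Medoid-style center: the actual data point with the smallest total distance.
medoid = min(visits, key=lambda c: total_distance(c, visits))

print(mean_center)  # 109.2
print(medoid)       # 12.0
```

In this toy case the mean lands far from every ordinary observation, while the medoid stays inside the main group, which is the behavior the comparison with K-Means refers to.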
III. RESEARCH METHOD
This study adopts the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework.
CRISP-DM was first introduced in the late 1990s by four major companies in Europe: Integral Solution Ltd. (a provider of commercial data mining solutions), NCR (a database provider), DaimlerChrysler (an automotive manufacturer), and OHRA (an insurance company).
The CRISP-DM model offers several advantages, including an organized and systematic structure, comprehensive coverage of each stage in the knowledge extraction process, flexibility in application across different projects, and efficiency in developing useful models.
Business Understanding
At this step, the objective is to form regional clusters within West Java Province based on tourist destinations using the K-Medoid algorithm.
The results of this study can then be used to provide recommendations to the government and local authorities regarding marketing strategies and evaluations.
Data Understanding
The data used in this study comprise visitor counts and facility data for campsites, museums, and homestays, all downloaded from the official Open Data Jabar website in .csv format.
The dataset consists of six files, with a total of 1,158 records.
Data Preparation
Data processing was carried out in Google Colab using the Python 3.12 programming language.
The data were cleaned by removing or correcting inconsistencies to prepare the dataset for the Modeling stage.
The steps involved in Data Cleaning are as follows:
INTL.
JOURNAL ON ICT VOL.
NO.
DEC 2024
Dropping rows of data that are not relevant.
Performing data reduction through feature engineering.
Integrating the data by merging all of the datasets used.
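The three cleaning steps above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the column names (`region`, `year`, `visitors`), the two inline CSV snippets, and the province-level row used as an "irrelevant" record are all hypothetical stand-ins, not the actual Open Data Jabar files or the authors' code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature stand-ins for two of the downloaded .csv files.
camping_csv = """region,year,visitors
KABUPATEN BOGOR,2022,1200
KABUPATEN BOGOR,2023,1500
KABUPATEN SUKABUMI,2023,300
JAWA BARAT,2023,3000
"""
museum_csv = """region,year,visitors
KABUPATEN BOGOR,2023,800
KABUPATEN SUKABUMI,2023,100
"""


def load_totals(text, drop_regions=("JAWA BARAT",)):
    """Read one file, drop irrelevant rows, and total the visitors per region."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        if row["region"] in drop_regions:  # step 1: drop rows that are not relevant
            continue
        totals[row["region"]] += int(row["visitors"])  # step 2: reduce to one feature
    return dict(totals)


camping = load_totals(camping_csv)
museum = load_totals(museum_csv)

# Step 3: integrate the files into one record per region, one feature per file.
merged = {
    region: {
        "camping_visitors": camping.get(region, 0),
        "museum_visitors": museum.get(region, 0),
    }
    for region in sorted(set(camping) | set(museum))
}
print(merged)
```

The result is one feature vector per region, which is the shape a distance-based method such as K-Medoid needs as input.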
Modelling
This study uses the K-Medoid algorithm in its modeling process.
The K-Medoid algorithm uses a medoid as the center of each cluster to form groups of numerical data points that exhibit high similarity.
Modeling with the K-Medoid algorithm aims to identify patterns and group data into clusters based on similar characteristics. The steps in the K-Medoid method are:
1. Determine the number of clusters (K) using the Elbow Method as an estimate of the optimal number of clusters, with the following equation:

$$SSE = \sum_{i=1}^{n} (x_i - C_k)^2$$

2. Calculate the distance between each data point and its assigned cluster center using the Euclidean distance formula, given by:

$$d(x_i, y_j) = \sqrt{\sum_{k=1}^{n} (x_{i,k} - y_{j,k})^2}$$

3. Randomly select an object from each cluster as the new medoid.
4. Calculate the Euclidean distance of each object in the cluster to the candidate new medoid.
5. Compute the total new distance and subtract the total old distance to obtain the total deviation ($S$). If $S$ is less than 0, swap the object with the cluster data to form a new set of $k$ objects as the new medoids.
6. Repeat steps 3 to 5 until no change occurs in the medoids, resulting in the final clusters and their members.
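The swap loop in the steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: for reproducibility it replaces the random choice of a candidate medoid with an exhaustive scan over all non-medoid points, and the toy 2-D `points` and the helper names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def total_distance(points, medoid_idx):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoid_idx) for p in points)


def k_medoids(points, initial_medoid_idx):
    """Swap medoids with non-medoids while the total deviation S is negative."""
    medoid_idx = list(initial_medoid_idx)
    best = total_distance(points, medoid_idx)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoid_idx)):
            for cand in range(len(points)):
                if cand in medoid_idx:
                    continue
                trial = medoid_idx[:pos] + [cand] + medoid_idx[pos + 1:]
                trial_total = total_distance(points, trial)
                s = trial_total - best  # total deviation: new total minus old total
                if s < 0:  # accept the swap only if it lowers the total distance
                    medoid_idx, best = trial, trial_total
                    improved = True
    # Final assignment: each point goes to its nearest medoid.
    labels = [
        min(range(len(medoid_idx)), key=lambda j: euclidean(p, points[medoid_idx[j]]))
        for p in points
    ]
    return medoid_idx, labels


# Toy data with two obvious groups; both initial medoids start in the first group.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, labels = k_medoids(points, [0, 1])
print(medoids, labels)
```

Because every accepted swap strictly lowers the total distance and there are finitely many medoid sets, the loop stops once no swap yields a negative deviation, which corresponds to the "no change in the medoids" condition in step 6.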
Evaluation
At this step, the modeling process is evaluated for how well it meets the business objectives using the Davies-Bouldin Index, followed by an assessment based on the obtained results.
The Davies-Bouldin Index measures the effectiveness of the clustering process by evaluating how well the clusters are formed based on the attributes or features contained in the data.
The formula for the Davies-Bouldin Index is as follows:

$$DBI = \frac{1}{k} \sum_{i=1}^{k} \max_{i \neq j} R_{i,j}$$

IV. RESULTS
This study uses the K-Medoid algorithm for the modeling process.
Producing the clusters involves several key steps.
First, the number of clusters (K) to be used is determined.
Next, medoids are selected randomly, and the Euclidean distance of each data point to each medoid is calculated. Then, an object within each cluster is chosen as the new medoid, and the Euclidean distance between each data point and the new medoid is recalculated.
The clusters are determined based on the minimum value of the total new distance minus the total old distance.
This process is repeated iteratively until the medoids no longer change.
Elbow Method
This study employs the elbow method to determine the optimal number of clusters.
The method runs iterations from $K = 1$ to $K = n$ to identify the "elbow" point on a graph of the Within-Cluster Sum of Squares (WCSS) value.
The WCSS value represents the total squared distance between each data point and its respective cluster center, as determined by the value of $K$.
The "elbow" point on the graph indicates the most suitable number of clusters for the analyzed data.
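A minimal sketch of how a WCSS curve for the elbow method could be produced is given below. It is an assumption-laden illustration: the toy `points` are invented, and the medoids for each K are found by brute force over all subsets of data points, which is only feasible for very small data and is not the procedure used in the study.

```python
import math
from itertools import combinations


def wcss(points, medoids):
    """Total squared Euclidean distance from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) ** 2 for m in medoids) for p in points)


# Toy data with three visually separate groups (illustrative only).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 9), (2, 9)]

curve = {}
for k in range(1, 5):
    # Brute-force search: try every set of k data points as medoids.
    curve[k] = min(wcss(points, combo) for combo in combinations(points, k))

for k, value in curve.items():
    print(k, round(value, 2))
```

On data like this, the drop in WCSS is large up to the true number of groups and small afterwards; that bend is what is read off the chart as the elbow.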
Fig. 1. Elbow Method Visualization.
The elbow method graph in Fig. 1 shows that an increase in the number of clusters (K) correlates with a decrease in the distance between data points and their respective medoids.
For instance, when $K = 2$, the WCSS value is 1,510,059, whereas when $K = 3$ it decreases to 811,420, indicating that the distance between data points and their medoids shrinks as the number of clusters increases.
The term "elbow method" comes from the way the optimal K value is determined: by visually identifying the "elbow" point, which resembles a right angle, on the resulting line chart.
This point can be observed clearly in Fig. 1.
K-Medoid Modelling
The modeling process begins by determining the number of clusters.
The next stage is medoid initialization, where several data points are selected randomly from the dataset.
In this study, three medoids were randomly selected from the previously processed dataset; the selected data are given in TABLE I.

TABLE I
MEDOID DETERMINATION

Regency/City    Majalengka    Bogor    Tasikmalaya
Medoid          0             1        2

The medoids randomly selected from the processed dataset are used as the medoids for Iteration 1.
The distances from each data point to these medoids are then calculated using the Euclidean distance formula, starting from the first data point to medoids 0, 1, and 2, and continuing to the nth data point for medoids 0, 1, and 2.
For example, the distance can be calculated for the first data point, Kabupaten Bogor, and the second data point, Kabupaten Sukabumi, to medoids 0, 1, and 2.
Specifically, the calculation for the first data point, Kabupaten Bogor, to Kabupaten Majalengka as medoid 0 ($C_0$) using the Euclidean distance formula serves as an illustrative example:

$$d(x_1, y_0) = \sqrt{(x_{1,1} - y_{0,1})^2 + (x_{1,2} - y_{0,2})^2 + \cdots + (x_{1,27} - y_{0,27})^2} = 30{,}784.5215$$
The remaining distances are calculated in the same way using the Euclidean distance formula.
For the first data point, Kabupaten Bogor, the distances are 30,784.5215 to medoid $C_0$, 0 to medoid $C_1$, and 2,482.154105 to medoid $C_2$.
For the second data point, the distances are 286,043.7964 to $C_0$, 284,523.702 to $C_1$, and 283,654.9796 to $C_2$.
The Euclidean distance calculation is then performed for data points 4 through 27.
The next step involves selecting the minimum value among the distances to $C_0$, $C_1$, and $C_2$ for each data point.
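The distance-then-minimum step can be sketched as below. The three-component vectors and the names `Region A` and `Region B` are hypothetical; the study's data points have 27 components, and its actual feature values are not reproduced here.

```python
import math

# Hypothetical 3-feature vectors standing in for the study's 27-feature rows.
medoids = {
    0: (120.0, 30.0, 5.0),
    1: (900.0, 45.0, 12.0),
    2: (400.0, 10.0, 8.0),
}
data = {
    "Region A": (900.0, 45.0, 12.0),  # identical to medoid 1, so its distance is 0
    "Region B": (150.0, 28.0, 4.0),
}

assignments = {}
for name, x in data.items():
    # Euclidean distance from this data point to every medoid.
    distances = {j: math.dist(x, y) for j, y in medoids.items()}
    # The data point joins the cluster of the medoid with the smallest distance.
    assignments[name] = min(distances, key=distances.get)
    print(name, assignments[name], {j: round(d, 3) for j, d in distances.items()})
```

A data point that is itself a medoid always has distance 0 to its own cluster, which mirrors the 0 reported for Kabupaten Bogor in the worked example.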
In the second iteration, the first step is to randomly select a non-medoid data point as a candidate medoid and repeat the Euclidean distance calculation, following the same procedure as in the first iteration.
The selected non-medoid data points are given in TABLE II.
TABLE II
DETERMINING THE MEDOID FOR THE SECOND ITERATION
Regency/City Majalengka Bogor Bogor Baru Medoid
The newly selected medoids are then used to perform calculations identical to those conducted in the first iteration.
The next step is to calculate the total cost, or total deviation, obtained by subtracting the total Euclidean distance of the previous iteration from the total Euclidean distance of the current iteration.
This value guides the decision for the subsequent iteration.
The goal is to reach an iteration whose total cost is closest to zero. Here, the total deviation is -297,359.3712.
This total cost, obtained by subtracting the result of the first iteration from that of the second, is still well below 0.
Therefore, further iterations, starting from the third, are necessary to bring the total deviation closer to 0, that is, to convergence.
In this study, the calculation converged in the fourth iteration, with the results detailed in TABLE III.
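The total deviation rule described above reduces to a single subtraction and a sign check, sketched here. The distance lists are hypothetical and the function name `total_deviation` is introduced only for illustration.

```python
def total_deviation(old_distances, new_distances):
    """S = total distance of the current iteration minus that of the previous one."""
    return sum(new_distances) - sum(old_distances)


# Hypothetical nearest-medoid distances for the same five data points.
old = [0.0, 12.5, 30.0, 7.5, 40.0]  # with the medoids of the previous iteration
new = [3.0, 10.0, 0.0, 7.5, 22.0]   # with the candidate medoids

s = total_deviation(old, new)
accept_swap = s < 0  # a negative S means the candidate medoids fit the data better
print(s, accept_swap)  # -47.5 True
```

A strongly negative value, like the one reported for the second iteration, signals that the medoids are still improving; iterations stop once no candidate produces a negative deviation.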
TABLE III
LAST ITERATION RESULT
Regency/City (rows): Bogor, Sukabumi, Cianjur, Bandung, Garut, Tasikmalaya, Ciamis, Kuningan, Cirebon, Majalengka, Sumedang, Indramayu, Subang, Purwakarta, Karawang, Bekasi, Bandung Barat, Pangandaran, Bogor City, Sukabumi City, Bandung City, Cirebon City, Bekasi City, Depok City, Cimahi City, Tasikmalaya City, Banjar City.
[Distance columns of the final iteration: values not recoverable.]
Clustering Result
The results of the modeling process can be analyzed further through Exploratory Data Analysis (EDA) to determine the categories of the resulting clusters.
Based on these findings, the cluster categories are defined as follows:
Cluster 0: Moderate Potential and Moderate Performance. This cluster is characterized by underutilized potential in camping tourism and museum tourism, as indicated by relatively low visitor numbers for these attractions.
Significant efforts are needed in infrastructure development, promotion, and diversification of tourist attractions.
However, there is growth potential, as evidenced by a relatively high number of domestic visitors, particularly for homestay accommodations.
Cluster 1: High Potential and Moderate Performance. This cluster exhibits a diversity of tourist attractions and moderate visitor numbers.
Efforts should focus on improving service quality, implementing innovative marketing strategies, and ensuring sustainable environmental management.
Cluster 2: Moderate Potential and High Performance. This cluster features popular historical and cultural attractions with high visitor numbers and relatively significant economic impacts, particularly in the Meeting, Incentive, Convention, and Exhibition (MICE) sector. To maximize its potential, efforts should include expanding market segmentation and fostering innovation and creativity.
Next, a function is created to map cluster values to their respective categories: Cluster 0 is categorized as "Moderate Potential and Moderate Performance," Cluster 1 as "High Potential and Moderate Performance," and Cluster 2 as "Moderate Potential and High Performance." This function is then applied to the cluster column, and the results are stored in a new column named 'Cluster Description'. The labeled dataset is shown in TABLE IV.
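The mapping function described above can be sketched in plain Python. The three category strings and the cluster membership of the three example regions come from the text; the names `CLUSTER_LABELS`, `describe_cluster`, and `rows`, and the use of a list of dictionaries instead of a data frame, are assumptions made for this illustration.

```python
CLUSTER_LABELS = {
    0: "Moderate Potential and Moderate Performance",
    1: "High Potential and Moderate Performance",
    2: "Moderate Potential and High Performance",
}


def describe_cluster(cluster):
    """Map a numeric cluster value to its descriptive category."""
    return CLUSTER_LABELS[cluster]


# Three example rows; a real run would hold all 27 regions.
rows = [
    {"Regency/City": "Sukabumi", "cluster": 0},
    {"Regency/City": "Bogor", "cluster": 1},
    {"Regency/City": "Bogor City", "cluster": 2},
]

# Apply the function to the cluster column and store the result in a new column.
for row in rows:
    row["Cluster Description"] = describe_cluster(row["cluster"])

for row in rows:
    print(row["Regency/City"], "->", row["Cluster Description"])
```

If the data were held in a pandas DataFrame, the same effect could be obtained by passing the dictionary to `Series.map` on the cluster column.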
TABLE IV
DATASET RESULTS WITH LABELS

Regency/City        Cluster Label
Bogor               High Potential and Moderate Performance
Sukabumi            Moderate Potential and Moderate Performance
Cianjur             High Potential and Moderate Performance
Bandung             High Potential and Moderate Performance
Garut               High Potential and Moderate Performance
Tasikmalaya         High Potential and Moderate Performance
Ciamis              High Potential and Moderate Performance
Kuningan            High Potential and Moderate Performance
Cirebon             High Potential and Moderate Performance
Majalengka          High Potential and Moderate Performance
Sumedang            High Potential and Moderate Performance
Indramayu           High Potential and Moderate Performance
Subang              High Potential and Moderate Performance
Purwakarta          High Potential and Moderate Performance
Karawang            High Potential and Moderate Performance
Bekasi              High Potential and Moderate Performance
Bandung Barat       High Potential and Moderate Performance
Pangandaran         High Potential and Moderate Performance
Bogor City          Moderate Potential and High Performance
Sukabumi City       High Potential and Moderate Performance
Bandung City        High Potential and Moderate Performance
Cirebon City        High Potential and Moderate Performance
Bekasi City         High Potential and Moderate Performance
Depok City          High Potential and Moderate Performance
Cimahi City         High Potential and Moderate Performance
Tasikmalaya City    High Potential and Moderate Performance
Banjar City         High Potential and Moderate Performance

Evaluation
This evaluation is conducted by measuring the proximity between data points, the distance of data points to their respective cluster centers, and the distances between clusters.
To measure the quality of the resulting model, this study employs the Davies-Bouldin Index (DBI) as the evaluation metric.
The DBI evaluates the performance of the clustering model by measuring the degree of separation between clusters.
A lower DBI score, closer to 0, indicates better-separated clusters, reflecting a more optimal model.
The steps for calculating the DBI are as follows:
One method for measuring the similarity or closeness of data points within a cluster to its center is by calculating the Sum of Squares Within Cluster (SSW), also known as the cohesion value.
This calculation is performed using the following equation.
$$SSW = \sum_{i=1}^{k} d(x_i, y_j)$$

For example, the distance of the first data point to the cluster medoid is calculated as:

$$d(x_i, y_j) = \sqrt{\sum_{k=1}^{n} (x_{i,k} - y_{j,k})^2}$$

$$d(x_1, y_0) = \sqrt{(x_{1,1} - y_{0,1})^2 + (x_{1,2} - y_{0,2})^2 + \cdots + (x_{1,27} - y_{0,27})^2} = 284523$$

The calculation above is then applied to all data points in the dataset.
The next step is to determine the SSW value for each cluster by calculating the average distance of all cluster members from their respective cluster medoid:

$$SSW_0 = 0, \qquad SSW_1 = 21075, \qquad SSW_2 = 0$$
The next step is to calculate the Sum of Squares Between Clusters (SSB).
The SSB value reflects the degree of separation between clusters and is computed with the Euclidean distance formula:

$$SSB_{i,j} = d(y_i, y_j)$$

For example:

$$SSB_{0,1} = \sqrt{(y_{0,1} - y_{1,1})^2 + (y_{0,2} - y_{1,2})^2 + \cdots + (y_{0,27} - y_{1,27})^2} = 284523$$

The calculation above is performed for each pair of clusters.
Because this study uses three clusters, the calculation is repeated for all cluster pairs, yielding the results shown in TABLE 5.
TABLE 5
SSB CALCULATION RESULTS
SSB
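The SSW and SSB quantities can be sketched as follows, reading SSW as the average member-to-medoid distance within a cluster and SSB as the distance between two medoids, as the text describes. The two toy clusters and the names `clusters`, `ssw`, and `ssb` are hypothetical.

```python
import math

# Toy clusters: each has a medoid and the list of its members (medoid included).
clusters = {
    0: {"medoid": (0.0, 0.0), "members": [(0.0, 0.0), (3.0, 4.0)]},
    1: {"medoid": (10.0, 0.0), "members": [(10.0, 0.0)]},
}

# SSW: average distance of the cluster members to their own medoid (cohesion).
ssw = {
    c: sum(math.dist(x, info["medoid"]) for x in info["members"]) / len(info["members"])
    for c, info in clusters.items()
}

# SSB: distance between the medoids of each pair of clusters (separation).
ssb = {
    (i, j): math.dist(clusters[i]["medoid"], clusters[j]["medoid"])
    for i in clusters
    for j in clusters
    if i < j
}

print(ssw)  # {0: 2.5, 1: 0.0}
print(ssb)  # {(0, 1): 10.0}
```

A cluster whose only member is its own medoid has an SSW of 0, which matches the values reported for the two single-region clusters in this study.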
Once the SSW and SSB values are obtained, the next step is to calculate the ratio that compares the variability within clusters with the variability between clusters.
A smaller ratio indicates better clustering, as it reflects more homogeneous clusters that are clearly separated.
The ratio is calculated using the formula:

$$R_{i,j} = \frac{SSW_i + SSW_j}{SSB_{i,j}}$$

Thus, the following results are obtained:

$$R_{1,2} = 0.\ldots, \qquad R_{1,3} = \frac{0}{86509} = 0, \qquad R_{2,3} = 0.\ldots$$

The calculated ratio values are then used to compute the DBI value using the Davies-Bouldin Index formula:

$$DBI = 0.08$$
The obtained DBI score can be considered a favorable result, as it is close to 0.
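The step from the ratios to the final index can be sketched as below: for each cluster, take its worst (largest) ratio against any other cluster, then average those maxima. The input values are hypothetical round numbers, not the study's SSW and SSB values, and `davies_bouldin` is a name introduced for this illustration.

```python
def davies_bouldin(ssw, ssb):
    """ssw: list of per-cluster SSW values; ssb: dict {(i, j): SSB} with i < j."""
    k = len(ssw)

    def ratio(i, j):
        # R_ij = (SSW_i + SSW_j) / SSB_ij, symmetric in i and j.
        return (ssw[i] + ssw[j]) / ssb[(min(i, j), max(i, j))]

    # For each cluster keep its largest ratio, then average over the k clusters.
    worst = [max(ratio(i, j) for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


# Hypothetical cohesion and separation values for three clusters.
ssw = [1.0, 2.0, 1.0]
ssb = {(0, 1): 10.0, (0, 2): 20.0, (1, 2): 5.0}

dbi = davies_bouldin(ssw, ssb)
print(round(dbi, 4))  # 0.5
```

Tight clusters (small SSW) that sit far apart (large SSB) drive every ratio, and therefore the index, toward 0.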
The evaluation is then repeated to confirm that $k = 3$ is the optimal number of clusters, computing the Davies-Bouldin Index for several candidate numbers of clusters.
TABLE V
DBI EVALUATION RESULTS
k = n                   k=2   k=3   k=4   k=5   k=6   k=7   k=8   k=9   k=10
Davies Bouldin Index
It can be observed from TABLE V that, among the tested numbers of clusters, $k = 3$ has the best DBI value.
Therefore, it can be confirmed that $k = 3$ is the optimal number of clusters in this study.
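Comparing the index across several values of k can be sketched as follows. This is a toy setup, not a reproduction of the study: the nine 2-D `points` are invented, the medoids for each k are found by brute force over all subsets (a stand-in for the K-Medoid iterations that only works on tiny data), and only k = 2 to 4 is scanned.

```python
import math
from itertools import combinations


def assign(points, medoids):
    """Index of the nearest medoid for every point."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
            for p in points]


def best_medoids(points, k):
    """Brute-force stand-in for K-Medoid: the k-subset with the lowest total distance."""
    return min(combinations(points, k),
               key=lambda ms: sum(min(math.dist(p, m) for m in ms) for p in points))


def dbi(points, medoids):
    """Davies-Bouldin Index using medoids as cluster centers."""
    labels = assign(points, medoids)
    k = len(medoids)
    ssw = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        ssw.append(sum(math.dist(p, medoids[j]) for p in members) / len(members))
    worst = [max((ssw[i] + ssw[j]) / math.dist(medoids[i], medoids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


# Three well-separated toy groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (0, 10), (0, 11), (1, 10)]

scores = {k: dbi(points, best_medoids(points, k)) for k in range(2, 5)}
best_k = min(scores, key=scores.get)
print(best_k)  # 3
```

On this toy data the lowest index occurs at the number of groups built into the data, which is the same reading applied to the DBI values tabulated for the candidate numbers of clusters.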
V. CONCLUSION
Modeling with the K-Medoids algorithm successfully identified patterns in the dataset, grouping the regions of West Java into 3 clusters.
Kabupaten Sukabumi is a member of cluster 0, while Kabupaten Bogor, Kabupaten Cianjur, Kabupaten Bandung, Kabupaten Garut, Kabupaten Tasikmalaya, Kabupaten Ciamis, Kabupaten Kuningan, Kabupaten Cirebon, Kabupaten Majalengka, Kabupaten Sumedang, Kabupaten Indramayu, Kabupaten Subang, Kabupaten Purwakarta, Kabupaten Karawang, Kabupaten Bekasi, Kabupaten Bandung Barat, Kabupaten Pangandaran, Kota Sukabumi, Kota Bandung, Kota Cirebon, Kota Bekasi, Kota Depok, Kota Cimahi, Kota Tasikmalaya, and Kota Banjar are members of cluster 1.
Lastly, Kota Bogor is a member of cluster 2.
The clustering results also received a good DBI evaluation score of 0.08.
Several directions remain for future work. First, other clustering algorithms such as K-Prototype, Agglomerative Clustering, or DBSCAN can be used to compare the accuracy of each model on this data.
Second, further research on regional clustering based on tourist destinations in West Java, particularly in examining the results and influencing factors, requires synergy and collaboration across various scientific disciplines.
Third, a real-time system can be implemented using the developed model.
REFERENCES