Zeta - Math Journal
Volume 10 No. 2, November 2025, pp. 81-91
E-ISSN: 2579-5864 P-ISSN: 2459-9948
DOI: https://doi.org/10.31102/zeta.

A District/City Profiling Based on Poverty Indicators in East Nusa Tenggara Using the Centroid Linkage Algorithm

Andre Tri Rian Dani1*, Yossy Candra2, Fachrian Bimantoro Putra3, Meirinda Fauziyah4

Program Studi Statistika, Jurusan Matematika, FMIPA, Universitas Mulawarman, Samarinda, Indonesia
Statistisi, Sekretariat Jenderal, Kementerian Agama, Jakarta, Indonesia
BNI Staff Banking, Future Relationship Manager, BNI, Jakarta, Indonesia

*Corresponding Author's Email: andreatririandani@fmipa.

ABSTRACT
Poverty is a complex, multidimensional phenomenon that significantly affects human life, and it remains a policy concern at the regional, national, and international levels. This study approaches the problem statistically through cluster analysis, which groups objects according to their level of similarity. The algorithm used is Centroid Linkage, chosen for its advantages in the grouping process, with the squared Euclidean distance as the dissimilarity measure. The data are district/city poverty indicators in East Nusa Tenggara Province. The analysis yields two optimal clusters, each with its own distinguishing characteristics. The results are intended as a reference for formulating poverty alleviation policies.

Keywords: Cluster Analysis, Centroid Linkage, Poverty, Squared Euclidean

Article info:
Received: May 3, 2025
Revised: June 6, 2025
Accepted: October 12, 2025
Available online: October 20, 2025

How to cite this article:
Dani, Candra, Putra, & Fauziyah. A District/City Profiling Based on Poverty Indicators in East Nusa Tenggara Using the Centroid Linkage Algorithm. Zeta - Math Journal, 10(2), 81-91. https://doi.org/10.31102/zeta.
© 2025 The Author. Published by the Mathematics Study Program, Universitas Islam Madura, Indonesia. This article is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial, and no modifications or adaptations are made.

1. Introduction

Cluster analysis is a technique for grouping data according to their similarities. It uses the information in the data to assign objects to clusters so that objects within the same cluster are highly similar (Hikmah et al.), while objects in different clusters differ clearly (Han, Kamber, & Pei). This article uses cluster analysis to group the districts/cities of East Nusa Tenggara Province according to poverty indicators.

The algorithm employed is Centroid Linkage. It was selected because it is reported to yield better cluster outcomes than the Single Linkage and Complete Linkage algorithms. The reason is that Centroid Linkage merges clusters based on the average of all points in each cluster (Widodo et al.), so the resulting clusters are more centralized and have lower variation. In addition, the algorithm is more stable in the presence of outliers, efficient in the grouping process, and produces dendrograms that are easier to interpret. Several prior studies have used the Centroid Linkage algorithm, as summarized in Table 1.

Table 1. Previous Research
Article Title | Author | Analysis Results
Cluster Analysis with Outlier Data using Centroid Linkage and K-Means Clustering for Grouping HIV/AIDS Indicators in Indonesia | Silvi, 2018 | The number of clusters formed was 7, with the Centroid Linkage algorithm producing more homogeneous groupings than K-Means.
Grouping Provinces Based on Internet Network Quality Using the Centroid Linkage Method | Dzikrullah, 2022 | The cluster analysis yielded 4 distinct clusters. The provinces of Papua and West Papua belong to clusters that tend towards low internet network quality.
Implementation of the Centroid Linkage and K-Medoids Algorithms in Grouping Districts/Cities in South Sulawesi based on Education Indicators | Raja, Tinungki, Sirajang, 2024 | The Centroid Linkage algorithm produces 2 clusters with the minimum ratio of within-cluster to between-cluster standard deviation, so it is the best algorithm.

In this study, cluster analysis with the Centroid Linkage algorithm is applied to the problem of poverty. Poverty is a complex and crucial problem faced by many countries, including Indonesia. It is multidimensional: it is measured by material deprivation but also includes other aspects such as education, health, and access to essential services. Because poverty is such a crucial problem, it must be handled seriously and comprehensively. This study aims to move beyond single-dimension poverty metrics by clustering East Nusa Tenggara's districts and cities with the centroid-linkage method, thereby capturing the full spectrum of income, education, health, and infrastructure deprivation. The resulting profile of districts/cities based on poverty indicators can serve as reference material for formulating poverty alleviation policies.

2. Literature Review

2.1 Data Mining
Data mining identifies interesting information hidden in large data sets stored in databases, data warehouses, or other storage sources (Tan, Steinbach, & Kumar). It is commonly referred to as knowledge discovery in databases (KDD) (Nanang & Susanti). KDD is a systematic process of gathering and examining historical data in order to identify patterns, relationships, or regularities within large datasets, with the goal of generating valuable insights (Santosa). The process comprises three primary stages: initial data exploration, model construction and validation, and deployment of the resulting model (Prasetyo).

2.2 Cluster Analysis
Cluster analysis is a data mining technique that divides a collection of objects into clusters based on the similarity of their attributes. The objective is to group objects with similar characteristics together and to place dissimilar objects in separate clusters. The quality of a clustering increases with the similarity within each cluster and with the dissimilarity between clusters. According to Hair et al., a cluster solution is considered good if it fulfills the following conditions:
1. High homogeneity among members of the same cluster (within-cluster).
2. High heterogeneity between one cluster and another (between-cluster).
There are two main families of methods in cluster analysis: non-hierarchical methods and hierarchical methods (Dewi, Syaripuddin, & Hayati).

2.3 Hierarchical Method
Hierarchical methods group objects using either agglomerative or divisive techniques. They form a multilevel, tree-like structure because grouping is carried out in stages or levels.
The results of hierarchical grouping are usually presented as a dendrogram. The dendrogram is an advantage of the hierarchical method because it visualizes the steps of the cluster analysis: it shows how clusters are formed and the distance coefficient at each step (Simamora).

2.4 Centroid Linkage Algorithm
Centroid Linkage is a cluster analysis method in which each cluster is represented by the average of all objects within it. The distance between two clusters is the distance between their centroids, where the centroid of a cluster is the arithmetic mean of the observations in that cluster on each variable. Whenever objects are merged, a new centroid is computed, so the centroid keeps changing until a stable cluster is formed (Sokal & Michener). An advantage of this method is that outliers have less influence than in other methods, so it is suitable for data containing outliers (Karlita). The method is also effective at reducing within-cluster variance, because each centroid is based on the location of the central points formed in the previous stage (Xu et al.).

The distance between two clusters is defined in Equation (1):

$$d_{UV} = d(\bar{x}_1, \bar{x}_2) \tag{1}$$

The centroid of the new cluster is computed using Equation (2):

$$\bar{x} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} \tag{2}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the centroids of the two clusters and $N_1$ and $N_2$ are the numbers of objects in them.

Among hierarchical clustering algorithms, centroid linkage relies on the Euclidean distance (Xu et al.).
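The merging rule in Equations (1) and (2) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the squared Euclidean distance between centroids, stops once a requested number of clusters remains, and all function names and the toy data are hypothetical.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def merged_centroid(n1, c1, n2, c2):
    """Size-weighted mean of two centroids, following Equation (2)."""
    total = n1 + n2
    return [(n1 * p + n2 * q) / total for p, q in zip(c1, c2)]


def centroid_linkage(points, n_clusters):
    """Merge the two closest centroids until n_clusters clusters remain.

    Returns a list of clusters, each a sorted list of row indices.
    """
    # Every observation starts as its own cluster: (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        # Pair of clusters whose centroids are closest, following Equation (1).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: squared_euclidean(
                clusters[ab[0]][1], clusters[ab[1]][1]
            ),
        )
        members_a, centroid_a = clusters[a]
        members_b, centroid_b = clusters[b]
        new_centroid = merged_centroid(
            len(members_a), centroid_a, len(members_b), centroid_b
        )
        merged = (sorted(members_a + members_b), new_centroid)
        # Remove the two merged clusters (higher index first), add the new one.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return [members for members, _ in clusters]


# Hypothetical toy data: two tight groups and one distant point.
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
       [10.0, 10.0], [10.0, 11.0], [30.0, 30.0]]
groups = centroid_linkage(toy, 3)
print(sorted(groups))  # -> [[0, 1, 2], [3, 4], [5]]
```

The distant point stays in its own cluster until the very end, which mirrors how an isolated unit shows up as a late-merging branch in a dendrogram.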
2.5 Similarity Measures
Several distance measures can be used, one of which is the Euclidean distance. Euclidean distance is an efficient way to quantify similarity or dissimilarity between objects based on their attributes, particularly for numerical data, and it is commonly used for grouping objects. In addition, Euclidean distance is reported to produce a strong clustering structure (Aditya, Sari, & Padilah), to outperform other distance measures, and to be particularly well suited to small datasets (Mohibullah, Hossain, & Hasan). Its use assumes that the observed variables are uncorrelated and measured in the same units (Mongi). In this research, the distance measure used is the squared Euclidean distance. The squared Euclidean distance between two observations is calculated using Equation (3):

$$d_{ij} = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2, \quad i, j = 1, 2, 3, \ldots, n \tag{3}$$

where
$d_{ij}$: squared Euclidean distance between observations i and j
$x_{ik}$: the value of the i-th observation on the k-th variable
$x_{jk}$: the value of the j-th observation on the k-th variable
and p is the number of variables and n the number of observations.

2.6 Silhouette Coefficient (SC)
The silhouette coefficient assesses the quality and robustness of clusters by quantifying how well each object is assigned to its cluster (Shoolihah, Furqon, & Widodo). It combines two notions: cohesion, which measures how close an object is to the other objects in its own cluster, and separation, which measures how far an object is from the other clusters. The steps for computing the silhouette coefficient are as follows (Handoyo, Mangkudjaja, & Nasution):
1. Calculate the average distance from observation i to all other observations in the same cluster A using Equation (4):

$$a(i) = \frac{1}{|A| - 1} \sum_{j \in A,\, j \neq i} d(i, j) \tag{4}$$

2. Calculate the average distance from observation i to all observations in each other cluster C, then take the minimum value, using Equation (5):

$$d(i, C) = \frac{1}{|C|} \sum_{j \in C} d(i, j), \qquad b(i) = \min_{C \neq A} d(i, C) \tag{5}$$

3. Calculate the silhouette coefficient value using Equation (6):

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{6}$$

The silhouette coefficient ranges from -1 to 1. A value approaching 1 indicates that an observation is well placed in its cluster, while a value close to -1 indicates that the observation has probably been assigned to the wrong cluster.

3. Research Methods

3.1 Data Sources and Research Variables
The Central Statistics Agency (CSA; Badan Pusat Statistik, BPS) is the institution tasked with providing official statistical data in Indonesia. The data for this research come from the official 2022 publication of the Central Statistics Agency of East Nusa Tenggara (NTT) Province. The unit of observation is the regency/city. The variables used in this research are:
X1: Human Development Index (HDI)
X2: Percentage of Poor Population (%)
X3: Open Unemployment Rate (%)
X4: Average Years of Schooling (Years)
X5: Percentage of population aged 15 years and over who are studying at university (%)
X6: Percentage of population aged 15 years and over who are illiterate (%)
X7: Economic Growth Rate (%)
X8: Life Expectancy (Years)
These eight variables serve as poverty indicators that can be used for profiling.

3.2 Data Analysis
The steps taken in data analysis are as follows:
1. Carry out descriptive statistical analysis to obtain a general overview of the research data.
2. Check for multicollinearity using the Pearson correlation.
3. Group the data using the centroid linkage algorithm in the following stages:
   a. Standardize the data using the Z-score.
   b. Calculate the squared Euclidean distance for all pairs of observations using Equation (3).
   c. Merge the observations that have the smallest distance, then calculate the average of the merged observations using Equation (2), thus forming a new data set.
   d. Repeat the distance and merging steps until only one cluster remains.
4. Calculate the silhouette coefficient using Equations (4), (5), and (6) to find the optimal grouping. The numbers of clusters tested were 2, 3, 4, and 5.
5. Profile and interpret the cluster results.

4. Results and Discussion

This section presents the descriptive statistics, the multicollinearity check, the grouping using the Centroid Linkage algorithm, and the final profiling of districts/cities in East Nusa Tenggara Province.

Descriptive Statistics
Descriptive statistics summarize, present, and describe data in a form that is easy to read and understand. The goal is to provide useful information about the data and assist in decision-making. In this article, the descriptive statistics used are the minimum, maximum, and mean values, which are presented in Table 2.

Table 2. Descriptive Statistics (rows: variables X1 to X8; columns: Minimum, Maximum, Mean)

Multicollinearity Check
Multicollinearity is a condition in which there is a strong linear relationship between two or more variables. In cluster analysis, multicollinearity can distort the grouping results and make interpretation difficult. It can be identified by visualizing the Pearson correlations as a heatmap, shown in Figure 1.

Figure 1. Heatmap correlation visualization (axes: Variabel1, Variabel2; color scale: Correlation)

Based on Figure 1, the correlation heatmap shows no red cells, so it can be concluded that there is no strong linear relationship between the research variables.
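The multicollinearity check rests on pairwise Pearson correlations. As a minimal sketch of that computation in plain Python, the block below uses made-up columns rather than the study's data; the function names and the 0.8 cut-off for a "strong" correlation are assumptions for illustration, since the article reads strength off the heatmap colors and states no numeric threshold.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def strongly_correlated(columns, threshold=0.8):
    """List variable pairs whose |r| exceeds the (assumed) threshold."""
    names = list(columns)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if abs(pearson(columns[p], columns[q])) > threshold
    ]


# Hypothetical columns, not the study's data.
toy = {
    "X1": [1.0, 2.0, 3.0, 4.0],
    "X2": [2.0, 4.1, 5.9, 8.0],
    "X3": [1.0, -1.0, 1.0, -1.0],
}
flagged = strongly_correlated(toy)
print(flagged)  # -> [('X1', 'X2')]
```

With real data, an empty list would be the numerical counterpart of a heatmap without strongly colored cells.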
Since no pair of variables is strongly correlated, there is no significant multicollinearity among them.

Centroid Linkage Algorithm
Centroid linkage is a hierarchical clustering technique. Hierarchical clustering builds a hierarchy of clusters, and centroid linkage decides which two clusters to merge based on the distance between their centroids, the central points of each cluster. The grouping results obtained with the centroid linkage algorithm are visualized as a dendrogram in Figure 2.

Figure 2. Dendrogram of Grouping Results (cluster dendrogram, hclust (*, "centroid"); axis: Height, 0 to 60)

Based on Figure 2, Kupang City (Kota Kupang) lies far from the other regencies/cities. This separation may be influenced by various factors, including its status as the dominant government and economic center, better infrastructure, quality education and health facilities, and cultural and entertainment attractions. Geographic conditions and government policies can also play a role in determining population distribution.

Silhouette Coefficient Value
The quality of the groupings is measured using the silhouette coefficient for 2, 3, 4, and 5 clusters, as shown in Table 3.

Table 3. Silhouette Test (columns: Number of Clusters, Silhouette Score)

Table 3 shows that the maximum silhouette coefficient value is 0.785, indicating the presence of 2 clusters. Based on the variables analyzed using the centroid linkage algorithm, the districts/cities in East Nusa Tenggara Province are therefore grouped into 2 clusters.

Profiling Grouping Results
To illustrate how the 22 districts/cities in East Nusa Tenggara coalesce into the optimal number of clusters determined earlier, we plotted the hierarchical tree generated by the centroid-linkage algorithm. The resulting dendrogram in Figure 3 arranges the units along the horizontal axis, while the vertical axis indicates the rescaled distance (or "height") at which successive agglomerations occur. By reading the branch cut at the chosen height, one can observe which districts merge first, which remain isolated the longest, and how the final cluster structure is formed.

Figure 3. Dendrogram of Optimal Grouping Results (cluster dendrogram, hclust (*, "centroid"); axis: Height, 10 to 60)

Furthermore, the grouping based on the optimal number of clusters can be seen in the cluster plot in Figure 4.

Figure 4. Cluster Plot of Optimal Grouping Results

Complete details of each regency/city in Figures 3 and 4 are given in Table 4.

Table 4. Optimal Cluster Grouping
Clusters | Number of Members | Regency/City
Cluster 1 | 1 | Kota Kupang
Cluster 2 | 21 | Flores Timur, Ngada, Manggarai, Manggarai Barat, Nagekeo, Timor Tengah Utara, Ende, Alor, Manggarai Timur, Sumba Timur, Lembata, Rote Ndao, Kupang, Malaka, Belu, Sikka, Sumba Tengah, Sumba Barat, Sabu Raijua, Timor Tengah Selatan, Sumba Barat Daya

The analysis of the two clusters reveals distinct characteristics for each cluster, as determined by the variables examined. Table 5 displays the attributes of these clusters, using mean values.

Table 5. Characteristics of Cluster Results (rows: variables X1 to X8; columns: mean for Cluster 1, mean for Cluster 2)

Table 5 shows that cluster 1 represents a city characterized by a high Human Development Index (HDI), a low open unemployment rate, high average years of schooling, a notable percentage of the population aged 15 years and above pursuing higher education, and a long life expectancy. Cluster 2, on the other hand, groups regencies with a high percentage of poor population, a significant percentage of people aged 15 years and above who are illiterate, and a high rate of economic growth.

5. Conclusion

The analysis of the grouping of districts/cities in East Nusa Tenggara Province based on poverty indicators yielded the following conclusions:
1. Silhouette-coefficient diagnostics confirm that the 22 districts/cities of East Nusa Tenggara are best separated into two clusters when centroid-linkage agglomeration is applied to the eight poverty-related indicators.
2. The districts/cities in East Nusa Tenggara Province were categorized into 2 clusters based on poverty indicators:
   • Cluster 1 (High-HDI): Represented solely by Kota Kupang, this cluster is distinguished by the province's highest Human Development Index (≈ 79.), very low open unemployment (≈ 0.8 %), long life expectancy (≈ 69.), and the best educational metrics: average years of schooling ≈ 11.6 and tertiary-enrolment share ≈ 15.2 %.
   • Cluster 2 (Lagging Regencies): Comprising the remaining 21 districts, this group shows markedly lower HDI (≈ 63.), shorter schooling (≈ 7.4 years), higher illiteracy among adults, and more than double the unemployment rate (≈ 5.9 %). Despite a relatively high economic-growth rate, the average poverty headcount remains elevated (≈ 22.3 %).

Declaration of AI and AI-assisted technologies in the writing process
The authors used several AI-assisted tools to improve the quality and accuracy of the manuscript. Specifically, Google Translate was used to support translation between Bahasa Indonesia and English; QuillBot and Grammarly were used for paraphrasing, language refinement, and grammatical improvement; and Mendeley Reference Manager was used for citation organization and reference formatting. All outputs generated or suggested by these tools were thoroughly reviewed, revised, and verified by the authors. The authors take full responsibility for the content of this manuscript.

CRediT Authorship Contribution Statement
Author contributions (CRediT):
Andrea Tri Rian Dani, Meirinda Fauziyah: Conceptualization, Methodology, Data curation, Formal analysis, Software, Visualization, Project administration, Validation, Writing - original draft, Writing - review & editing, Supervision.
Yossy Candra: Conceptualization, Validation, Writing - review & editing.
Fachrian Bimantoro Putra: Data curation, Investigation, Resources, Software, Formal analysis, Project administration.

Declaration of Competing Interest
The authors declare no competing financial or personal interests.

Acknowledgments
This research was funded in part by DIPA BLU-PNBP FMIPA, Mulawarman University, Samarinda, Indonesia [No: 1700/UN17.7/LT/2.
Data Availability
Primary data come from official BPS publications and are publicly accessible. Derived data and analysis scripts are available from the corresponding author upon reasonable request.

Funding
This work was supported by Mulawarman University.

Ethical Approval
Not applicable. The study did not involve human or animal subjects and did not access personal data.

References