Cyberspace: Jurnal Pendidikan Teknologi Informasi
Volume 7, Number 2, October 2023, pp. 118-125
ISSN 2598-2079 | ISSN 2597-9671

PRINCIPAL COMPONENT K-MEANS SOFT CONSTRAINT BASED ON WELL-BEING INDICATORS IN ACEH PROVINCE

Winny Dian Safitri
Department of Economics, Faculty of Islamic Economics and Business, Ar-Raniry State Islamic University, Banda Aceh, 23111
E-mail: winny.diansafitri@ar-raniry.

Abstract

The success of government policies can be seen in the success of development, which is reflected in the well-being of the community. This research was conducted to group districts/cities by the similarity of their well-being indicators in Aceh Province in 2022. The data used are the Aceh well-being indicator data for 2022, consisting of 29 variables. The analysis method used is the principal component k-means soft constraint method. Background information can be used to guide the clustering algorithm by creating soft constraints, for which k-means is found to be the most appropriate algorithm. The results of this study indicate four district/city clusters in Aceh Province. The first cluster has adequate kindergarten and elementary school facilities, while its school enrollment rate needs to be improved. The second cluster is superior in the Gross Enrollment Rate (GER) and the population of university graduates, but is still very lacking in school facilities. The third cluster is the center of well-being in Aceh, so it is the cluster with the best well-being level. The fourth cluster is very good on the school participation rate indicator, but it must increase early childhood school participation.

Keywords: Well-being, Clustering, Principal Component, K-means, Constraint

Abstrak

The success of government policy can be judged by the success of development, as reflected in the well-being of the community. This study was conducted to group districts/cities by the similarity of their well-being indicators, particularly in the education sector, for each district/city in Aceh Province in 2022. The data used in this study are the 2022 Aceh well-being indicator data, consisting of 29 variables. The analysis method used is principal component K-means with constraints. Background information can serve as input to the clustering algorithm by creating soft constraints, for which it was found to be the most suitable algorithm. The results show four district/city clusters in Aceh Province. The first cluster has adequate kindergarten and elementary school facilities, while its school enrollment rate needs to be raised. The second cluster is stronger in the Gross Enrollment Rate (GER) and the population of university graduates, but is still severely lacking in school facilities. The third cluster is the center of well-being in Aceh and therefore has the highest level of well-being. The fourth cluster performs very well on the school participation rate indicator, but must raise early childhood school participation.

Kata Kunci (Keywords): Well-being, Cluster, Principal Component, K-means, Constraint

Introduction

Prosperity is a goal for every country in the world. A country is regarded as prosperous if its education is good. Education is one of the benchmarks of development success. In the education system, knowledge is transferred between people. According to Law No. 20 of 2003, education is a planned effort to create a controlled teaching and learning process that includes spiritual abilities, in order to produce independent, moral, and knowledgeable people.
Education will be successful if it has a transparent system and adequate facilities. Since 1990, the United Nations Development Programme (UNDP) has issued an annual report on human development in various countries, the Human Development Report. Proper education is among the approaches used in measuring poverty levels. Countries such as Indonesia should already have a sound education system. The Indonesian Central Bureau of Statistics formulates education indicators that present the development of Indonesian education over time and compare provinces and areas of residence.

Education in Indonesia, notably in Aceh Province, still lags behind, from readiness to the availability of facilities. Based on the National Examination results issued by the Ministry of Education and Culture of the Republic of Indonesia, Aceh is ranked 27th out of 34 provinces in Indonesia. This is a serious concern for education in Aceh. The government's efforts to improve the education system are not yet optimal, so an education strategy grounded in studies of existing data is needed. The results of such studies are to be included in policymaking to achieve an intelligent community through the "Aceh Carong" program promoted by the Aceh government, and to identify the obstacles that each district/city faces in the education system, which differ in their level of difficulty.

Research on the factors that influence education indicates that planning, regulations, human resources, technical matters, coordination, and the procurement of goods and services affect the realization of the education budget in Aceh Province. The factors that influence the success of district/city education in Aceh must differ, given each region's geographic location and its various cultures, so an analysis that can characterize each region is needed. One statistical method for this purpose is cluster analysis, whose primary condition is that there must be no relationship between variables.
When analyzing the data, several problems arose, including high correlation between variables. High correlation can introduce substantial bias into the analysis. Principal Component Analysis (PCA) addresses high correlation between variables by reducing the dimension of a large set of variables, with the aim of producing more accurate clusters. Using PCA to reduce high-dimensional data before k-means has been shown to produce better grouping results. Another problem that can occur when analyzing the data is missing values. To continue the analysis, the missing values must be addressed first. One cluster analysis method that can handle missing values without imputation is the K-means Soft Constraint method. Hence, the Principal Component Analysis K-means Soft Constraint method is used in this research.

Literature Review

Principal Component Analysis

Principal component analysis is a reasonably good method for obtaining coefficient estimates in regression equations with multicollinearity problems. The independent variables in principal component regression are linear combinations of the original variables $Z$, called principal components. The coefficient estimates of this method are obtained by reducing the number of principal components, where the selected subset of principal components must retain the greatest diversity (variance). The elimination of principal components starts from the procedure for selecting the characteristic roots (eigenvalues) of the equation:

$$|A - \lambda I| = 0$$

If the characteristic roots $\lambda_j$ are ordered from the largest value to the smallest, then the effect of the principal component $W_j$ corresponds to $\lambda_j$.
These components explain a progressively smaller proportion of the diversity of the dependent variable $Y$. The principal components $W_j$ are orthogonal to each other and are formed through the relationship:

$$W_j = v_{1j} Z_1 + v_{2j} Z_2 + v_{3j} Z_3 + \dots + v_{pj} Z_p$$

where $p$ is the number of variables used. The vector $v_j$ is obtained from each characteristic root $\lambda_j$ and satisfies the homogeneous system of equations:

$$(A - \lambda_j I)\, v_j = 0$$

where $v_j = (v_{1j}, v_{2j}, v_{3j}, \dots, v_{pj})$.

If the number of principal components produced is $q$, where $q \le p$, then the transformed data (the principal component score data) have $q$ variables. There are three methods commonly used to determine the number of principal components, namely:

1. Cumulative proportion. Suppose the proportion for the $i$-th characteristic root is

$$\frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}$$

The number of principal components $q$ is determined from the cumulative proportion of the characteristic roots. If the cumulative proportion of the first $q$ characteristic roots is 80% or more, then the number of principal components is $q$.

2. Size of the characteristic roots. The selection is based on the variance of the principal components, which is none other than the characteristic roots. According to, after doing a study, the better cutoff is 0.

3. Scree plot. The scree plot is a plot of the characteristic roots $\lambda_k$ against $k$. Using this plot, the number of principal components selected is $k$ if, at point $k$, the plot is steep to the left but not steep to the right.

K-means Soft Constraints

Background information can be used to guide a clustering algorithm by creating soft constraints. A soft constraint is a function that encodes initial information about the members of a group. Constraints are important for several clustering algorithms, and several researchers have shown that they can improve the results of a variety of clustering algorithms. Adding soft constraints to the k-means algorithm is therefore beneficial.
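Returning briefly to the cumulative-proportion rule from the Principal Component Analysis subsection, the following minimal Python sketch shows how $q$ could be picked from a list of characteristic roots. The function names `cumulative_proportions` and `select_q`, the `threshold` parameter, and the example roots are hypothetical and chosen only for illustration; they are not the values obtained in this study.

```python
def cumulative_proportions(eigenvalues):
    """Cumulative share of total variance, largest characteristic root first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    shares = []
    for lam in ordered:
        running += lam
        shares.append(running / total)
    return shares


def select_q(eigenvalues, threshold=0.80):
    """Smallest q whose first q roots reach the cumulative threshold."""
    for q, share in enumerate(cumulative_proportions(eigenvalues), start=1):
        if share >= threshold:
            return q
    return len(eigenvalues)


# Hypothetical characteristic roots (not taken from the study).
example_roots = [4.0, 2.0, 1.0, 0.5, 0.5]
shares = cumulative_proportions(example_roots)
q = select_q(example_roots)
print(shares)  # [0.5, 0.75, 0.875, 0.9375, 1.0]
print(q)       # 3
```

With these illustrative roots, the first two components explain 75% of the total and the first three explain 87.5%, so the 80% rule gives $q = 3$. Lowering `threshold` to 0.70 would give $q = 2$.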
K-means soft constraint is an extension of the k-means algorithm that is robust for grouping a data set without requiring any imputation. Missing values in a data set may limit the use of clustering methods; hence, a method that is robust for clustering data sets containing missing values is required. K-means soft constraint was found to outperform k-means on all of the Glass, Wine, Iris, and Breast Cancer data sets. Besides its robustness to missing values, k-means soft constraint has also performed well on data sets that contain multicollinearity: its use was found to give high accuracy on such data sets. K-means clustering has been described as the most appropriate algorithm for grouping with soft constraints, so soft constraints are used as information in the grouping.

K-means with soft constraints divides the set of variables into two parts, namely the variables with complete data and the variables with incomplete data ($F_m$). Let $sc$ denote a soft constraint, $F_m$ the set of incomplete-data variables, $x_{im}$ the value of the $i$-th object on incomplete-data variable $m$, $x_{jm}$ the value of the $j$-th object on incomplete-data variable $m$, and $f$ a member of the incomplete variables. The soft constraint between $x_{im}$ and $x_{jm}$ is:

$$sc = -\sqrt{\sum_{f \in F_m} (x_{if} - x_{jf})^2}$$

where $sc$ is always negative, which indicates that the two objects belong to different groups.

The k-means soft constraints algorithm adopts the steps of the k-means algorithm in dividing the objects (indexed by $k$) into $c$ suitable groups. The stages of the k-means soft constraints algorithm are:

1. Determine the center of the $c$-th cluster.

2. Determine the members of the $c$-th cluster by assigning the $k$-th object to the cluster center that gives the minimum of

$$\arg\min_{c} \left[ (1 - w)\,\frac{v}{v_{max}} + w\,\frac{cv}{cv_{max}} \right]$$

where the distance $v$ from the $k$-th object to the $c$-th center on the complete-data variables $d$ is

$$v = \sqrt{\sum_{d=1}^{D} (x_{kd} - c_{cd})^2}$$

and
$c_{cd}$ = center of the $c$-th cluster on the $d$-th variable,
$D$ = number of complete-data variables,
$w$ = weighting factor determined subjectively, with $w \in [0, 1]$; in this study using the value of w = 0.
$v_{max}$ = the maximum distance from all objects to the center of the cluster on the complete-data variables,
$cv$ = sum of the squares of the soft constraints, that is, of the $sc$ values,
$cv_{max}$ = sum of the squares of all soft constraints.

3. Repeat steps 1 to 2 until $\max_{c,d} \left| c_{cd}^{(r)} - c_{cd}^{(r-1)} \right| < 10^{-4}$, where $r$ is the iteration number.

Material and Method

The data used in this study are secondary data, namely education-based well-being indicator statistics for 2022 sourced from the Central Statistics Agency of Aceh Province. The software used to conduct the analysis is R version 3. The research variables consist of the participation rate of children aged 3-6 years in the Early Childhood Education program (3 variables), the School Participation Rate (3 variables), the Gross Enrollment Rate (3 variables), the Net Enrollment Rate (3 variables), the percentage of the population aged ten years and over by highest diploma held (6 variables), and the number of schools (11 variables).

The research begins with an overview of the literacy rate in Aceh. The next step is PCA, which reduces the 29 variables to a smaller number of dimensions. The result of the dimension reduction is then grouped using cluster analysis, namely k-means soft constraints. The clusters obtained are then characterized based on the features of each cluster.
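The stages of the k-means soft constraints algorithm described in the literature review can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study (which was carried out in R). The assumptions are: a pair of objects receives a soft constraint only when both are observed on every incomplete-data variable; a constraint counts toward $cv$ when its two objects would share a cluster; the initial centers are supplied by the caller; and `w = 0.5` is an illustrative weight only. All function names and the toy data are hypothetical.

```python
import math


def soft_constraints(incomplete):
    """sc[(i, j)] = -sqrt(sum over f in F_m of (x_if - x_jf)^2).

    Assumption: a pair gets a constraint only when both objects are
    observed (not None) on every incomplete-data variable.
    """
    sc = {}
    for i in range(len(incomplete)):
        for j in range(i + 1, len(incomplete)):
            xi, xj = incomplete[i], incomplete[j]
            if None in xi or None in xj:
                continue
            sc[(i, j)] = -math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    return sc


def distance(x, center):
    """Euclidean distance v on the complete-data variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, center)))


def assign(complete, centers, labels, sc, w):
    """Step 2: each object joins the cluster minimising
    (1 - w) * v / vmax + w * cv / cvmax."""
    vmax = max(distance(x, c) for x in complete for c in centers) or 1.0
    cvmax = sum(s ** 2 for s in sc.values()) or 1.0
    labels = list(labels)
    for k, x in enumerate(complete):
        costs = []
        for c, center in enumerate(centers):
            v = distance(x, center)
            # Assumption: a constraint is violated when its two objects
            # would end up together in cluster c.
            cv = sum(s ** 2 for (i, j), s in sc.items()
                     if (i == k and labels[j] == c) or (j == k and labels[i] == c))
            costs.append((1 - w) * v / vmax + w * cv / cvmax)
        labels[k] = costs.index(min(costs))
    return labels


def update_centers(complete, labels, centers):
    """Step 1: recompute each center as the mean of its members."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [x for x, lab in zip(complete, labels) if lab == c]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(old)  # keep an empty cluster's center unchanged
    return new_centers


def kmeans_soft(complete, incomplete, centers, w=0.5, tol=1e-4, max_iter=100):
    """Repeat steps 1 and 2 until no center coordinate moves by tol or more."""
    sc = soft_constraints(incomplete)
    labels = [None] * len(complete)
    for _ in range(max_iter):
        labels = assign(complete, centers, labels, sc, w)
        new_centers = update_centers(complete, labels, centers)
        shift = max(abs(a - b) for new, old in zip(new_centers, centers)
                    for a, b in zip(new, old))
        centers = new_centers
        if shift < tol:
            break
    return labels, centers


# Toy data (hypothetical): two complete variables, one incomplete variable.
complete = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
incomplete = [[1.0], [1.1], [None], [9.0]]
labels, centers = kmeans_soft(complete, incomplete, centers=[[0.0, 0.0], [5.0, 5.0]])
print(labels)  # [0, 0, 1, 1]
```

In the toy data the third object has a missing value, so it takes part in no soft constraint and is placed by its distance on the complete-data variables alone, without any imputation. The fourth object carries strong (large negative) constraints against the first two, which adds to the cost of placing it in their cluster.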
Result and Discussion

Aceh's well-being-based education indicators have changed significantly from year to year. In light of these changes, the education indicator variables in Aceh are analyzed and the districts/cities in Aceh Province are classified based on their similar characteristics.

The correlation matrix $R$, of size $p \times p = 29 \times 29$ (where $p$ is the number of observed variables), shows several high correlations between independent variables. These indicate multicollinearity, which may be due to different units of measurement. The correlation matrix is shown in Figure 1.

Figure 1. Correlation matrix between variables

Multicollinearity can be overcome with principal component analysis by first standardizing the $X$ variables into $Z$ variables and then selecting the component dimensions, which must have a cumulative diversity of more than 70 percent.

Table 1. Main component selection (columns: Dimension, Eigenvalue, Variance percent, Cumulative variance percent; eight dimensions)

Grouping objects with the Principal Component K-Means Soft Constraint method is based on the distance between objects, but the multicollinearity problem must be dealt with first. Objects separated by a small distance are grouped into one cluster. The following are the district/city clusters in Aceh based on the 2022 education indicators.

Figure 2. Results of the Aceh education cluster in 2022 (mapping visualization of district/city well-being-based education indicators in Aceh)

Based on Figure 2, four district/city clusters were obtained in Aceh Province from the well-being-based education indicator data for 2022, namely:

Members of the first cluster are Aceh Jaya, Pidie Jaya, Nagan Raya, Aceh Barat Daya, Aceh Selatan, Gayo Lues, Subulussalam, Aceh Singkil, Sabang, Lhokseumawe,
Langsa, and Simeulue. The first cluster is characterized by adequate kindergarten and elementary school facilities, while its school enrollment rate needs improvement.

Members of the second cluster are Aceh Barat, Aceh Tengah, Bener Meriah, Aceh Tamiang, and Aceh Tenggara. The second cluster is superior in the Gross Enrollment Rate and the population of university graduates, but is still lacking in school facilities.

The third cluster has a single member, Banda Aceh. Banda Aceh is the center of education in Aceh Province, so this cluster has the best education level.

Members of the fourth cluster are Aceh Besar, Pidie, Bireuen, Aceh Utara, and Aceh Timur. The fourth cluster is very good on the School Participation Rate indicator, but it must increase early childhood school participation.

From the clusters above, it can be seen that more programs are needed to improve the quality of teaching staff and to distribute teachers who meet national education standards evenly across every district/city in Aceh. These efforts are aimed primarily at reducing the gap in education level and quality between districts/cities in Aceh. Disparities in district/city education development in Aceh must be minimized so that every Acehnese has the same opportunity to obtain a proper education up to the highest level. A high level of public education is expected to raise the standard of living, so that just and equitable welfare can be realized in people's lives.
Conclusion

The clustering of districts/cities in Aceh Province using the Principal Component K-Means Soft Constraint method shows that the members of each cluster are strongly influenced by geographic location, that is, by the proximity between districts/cities in Aceh, which shapes the similar characteristics of the Aceh education indicators in 2022. The results of this study indicate four district/city clusters in Aceh Province. The first cluster has adequate kindergarten and elementary school facilities, while its school participation rate needs to be improved. The second cluster is superior in the Gross Enrollment Rate and the population of university graduates, but is still lacking in school facilities. The third cluster is the center of well-being-based education in Aceh, so it is the cluster with the best education level. The fourth cluster is very good on the School Participation Rate indicator, but it must increase early childhood school participation.

References