ISSN: 0852-0682. EISSN: 2460-3945
Forum Geografi (July issue). © Author(s). CC BY-NC-ND Attribution License

Tourist Attraction Popularity Mapping Based on Geotagged Tweets

Totok W. Wibowo*, Ahmad F. Bustomi, Anggito V. Sukamdi
Faculty of Geography, Universitas Gadjah Mada, Yogyakarta, Indonesia 55281
*Corresponding author (e-mail: totok.wahyu@ugm.[...])

Received: 20 April 2019 / Accepted: 17 July 2019 / Published: 01 August 2019

Abstract. The development of tourist attractions is now highly influenced by social media. The speed at which information can be disseminated via the Internet has become an essential factor in enabling distinct tourist attractions to gain high popularity in a relatively short time. This condition was not as prevalent several years ago, when tourism promotion remained limited to certain kinds of media. As a consequence, rapid change in the relative popularity of tourist attractions is inevitable. Against this background, knowledge of tourist attraction hotspots is essential in tourism management, and there is a need for methods that can determine the popularity of tourist attractions quickly and over a relatively large area. This article used tweet data from the microblogging website Twitter as the basis for determining the popularity level of a tourist attraction. Data mining was conducted using Python and the Tweepy module. The tweet data were collected at the end of April and in early May 2017, a period that included several long holiday weekends. A Tweet Proximity Index (TPI) was used to calculate both the density and frequency of tweets based on a defined search radius. A Density Index (DI) was also used as a technique for determining the popularity of tourist attractions. The results from both approaches were then compared to a random survey of people's perceptions of tourist attractions in the study area.
The results show that geotagged tweet data can be used to determine the popularity of a tourist attraction, although only with a medium level of accuracy. The TPI approach used in this study produced an accuracy of 76.47%, while the DI achieved only 58.82%. This medium accuracy indicates that the two approaches are not yet strong enough to be used for decision-making but should be more than adequate as an initial description. Further work is needed to improve the indexing method and to explore other aspects of Twitter data.

Keywords: Twitter, geotagged, hotspot, popularity, tourism.

Abstrak (English translation). The development of tourist attractions today is inseparable from social media. The Internet's ability to spread information allows a tourist attraction to gain high popularity within a short time. This differs from the situation several years ago, when the promotion of tourist attractions was still very limited. Changes in popularity have therefore become unavoidable because data now spread so quickly. On the other hand, knowledge of the popularity of tourist attractions is much needed when setting comprehensive development priorities. A study is thus needed that can map the popularity of tourist attractions quickly and over a wide area. This article uses data from the microblogging site Twitter as the basis for determining the popularity of a tourist attraction. Data mining was performed using the Python language and the Tweepy module. Data were collected during the long holidays at the end of April and the beginning of May 2017, when many tourists were assumed to be on holiday. The Tweet Proximity Index (TPI) was used to calculate tweet density and tweet frequency within a specified search radius. The Density Index (DI) was also used to provide another approach to determining the popularity of tourist attractions. The results of both analyses were compared with a random survey of public perceptions of tourist attractions in the study area. A direct survey was also carried out to assess the accuracy of the analysis. The results show that geolocated tweets can be used to determine the popularity of tourist attractions. TPI yielded a higher accuracy (76.47%) than DI (58.82%). This medium accuracy indicates that the two approaches are not yet strong enough for decision-making, but they are more than adequate as an initial description of the popularity of tourist attractions. Improvements to the index construction method and exploration of other aspects of Twitter data are needed to obtain higher accuracy.

Keywords (translated): Twitter, geotagged, hotspot, popularity, tourism.

Introduction

Indonesia has seen considerable development of social media over recent years. Many factors have contributed to this development, including hardware, software and infrastructure development. Among these factors, information technology infrastructure plays a particularly large role in promoting and supporting the development of social media. One example is the recent implementation of a 4G network in Indonesia. This latest generation of broadband Internet provides far higher speeds than the previous generation (Fauzi et al.). Around the same time, the smartphone has become a ubiquitous item. The competitive price of smartphones, combined with their inbuilt sensors and functionality, has led to their widespread use as an enhanced telecommunication device. Furthermore, the addition of a Global Positioning System (GPS) sensor in smartphones opens up the possibility of recording geospatial data. Users have a choice of many different social media platforms, and it is relatively common for a user to be active on several of them.
Twitter, a microblogging social media website, is a platform with a relatively large number of users in Indonesia. Statista noted that in 2016 there were about 24 million active Twitter users in Indonesia, which means that Indonesia has the third-highest number of active Twitter users in the world after the United States and India. Twitter users also come from many different groups, ranging from government officials, politicians, academics and advertisers to students who are still at school (Huberman et al.). Even the president of the United States has a specific Twitter account called POTUS (President of the United States). In contrast to other social media platforms such as Instagram and Facebook, the Twitter Application Programming Interface (API) is more accessible, which increases the possibility of obtaining more data.

The growing number of users directly results in massive transfers of data between users and the server. The server is also affected by the very high volumes of data being stored, which can even exceed the limits of big data (exabyte scale, 10^18 bytes). The concept of big data has existed since the beginning of computing, when it was first used to identify data that could not be processed efficiently using traditional database methods (Kaisler et al.). Because of its different characteristics, big data requires special handling. There are two main things to consider when handling big data, namely the design of a system that is capable of handling such large volumes of data and the ability to filter it according to specific objectives (Katal et al.).

A notable feature of tweets is the option to add position data, which is supported by the GPS found on smartphones. A tweet that incorporates location information (a geotagged tweet) can be used for spatial visualisation and spatial analysis. Although only 5% of all tweets have position information (Carto), they have undeniably added a new data source for mapping as an outcome of location-based social media (Thatcher), in addition to the data sources mentioned in the literature (Kraak & Ormeling). The current and recent use of geolocated tweet data has been very diverse, ranging from studies on happiness level (Frank et al.), sense of place (Jenkins et al.) and global mobility patterns (Howelka et al., 2014; Yin and Du) to Twitter network analysis (Takhteyev et al.) and rainfall data correlation (Lwin et al.). These studies reveal things that were previously difficult to investigate. Indeed, even the act of obtaining data for a study used to be more difficult. This opportunity is inseparable from the role of technology in transforming humans into active sensors for the purpose of data collection (Miller & Goodchild), which has shifted the data collection paradigm. Whereas data collection in the past was a data-scarce activity, respondents now actively generate data themselves (data-rich).

Tourism was declared a national priority in the 2015-2019 Medium Term Development Plan (RPJM), with the hope that by the end of 2019 there would be 20 million visiting foreign tourists and 275 million local tourists (Setkab). The tourism sector is highly strategic in terms of its role in increasing economic activity and supporting regional development. Ideally, these efforts will be accompanied by improvements in the facilities and infrastructure at each tourist attraction. Managing tourist attractions in an integrated way within one administrative area will support the implementation of such regional development. Information on the popularity of tourist attractions is therefore needed. Ideally, more popular attractions will require more resources than less popular attractions.
In recent years, social media has contributed significantly to the dissemination of tourism information. Some social media accounts are even created specifically for the purpose of tourism promotion. Interactions between social media users have the power to encourage users to visit certain tourist attractions. Moreover, the information presented on social media is not just textual in nature but also features multimedia content. The abundance of multimedia data on social media provides the opportunity to study a variety of things. Nevertheless, the data must still be processed carefully, particularly in the early stages such as data collection. Data analysis can then be applied as needed.

New tourist attractions, such as Breccia Cliff Park, Amaryllis Park and Kalibiru Tourism Village, are notable for having rapidly gained popularity among social media users. It is important to be prepared for such popularity in order to be in a position to maximise the visitor experience. The influence of social media on the popularity of long-established tourist attractions is another interesting case to study. Adaptation is the key for any tourist attraction to retain its popularity and attract visitors. One example is the transition from agriculture and fisheries to total tourism in Karimunjawa (Setiawan et al.). Borobudur, which had setbacks and was abandoned, was able to achieve a high level of popularity through a process of adaptive transformation (Baiquni).

Twitter allows users to access data on its servers through an API that is subject to usage limits. Users' Twitter data, especially geotagged tweets, can be used to map the distribution of the popularity of attractions quickly and efficiently. However, the accuracy of this method in determining popularity still needs to be assessed. This paper examines the usefulness of Twitter data as an indicator for assessing the popularity of tourist attractions.
Literature Review

Big Data

The development of mobile computing hardware has followed Moore's law. The increase in hardware production has also had an impact on the volume of data collected, owing to the fact that almost every electronic device has a mechanism for obtaining data. However, in the current information age, the ability to handle large volumes of data continues to evolve (Tsai et al.). That is why Fisher et al. showed that big data involves data that cannot be handled and processed by most current methods or information systems. The characteristics of big data that are most often discussed are the 3Vs, namely volume, velocity and variety (Laney). These three characteristics explain the "big" in big data. Volume refers to a massive data size, velocity refers to transfer rates and variety refers to the large variety of data structures. However, the 3V concept is no longer sufficient for describing big data (Rijmenam, 2013; Borne). To describe the characteristics of the current trend in big data, several further features need to be added, namely veracity, validity, value, variability, venue, vocabulary and vagueness.

Data mining is the study of the collection, cleaning, processing, analysis and acquisition of meaningful information from a data set (Aggarwal). In practice, there are numerous variations in the problem domain, application, formulation and data. The term data mining is therefore used broadly to describe several aspects of data processing. The abundance of data is a direct result of technological development and computerisation in various aspects of life. The way data are collected must accommodate the purpose for which they will be used. However, the same data may also be reused for different purposes. In this case, data mining can serve as a medium for extracting data from various sources for later management and presentation (Aggarwal).
Raw data are collected, cleaned and transformed into a standard format. The data can then be stored in commercial database systems and processed using various analytical methods to gain insight. Within the entire process, the majority of data mining work is focused on data preparation.

Twitter API

Founded by four people in 2006, Twitter is a microblogging site that allows users to post messages comprising a maximum of 140 characters of text. Despite its simple concept, Twitter has developed into a social media platform that is widely used by various different groups. Within five years of its release, there were 100 million active Twitter users (O'Reilly & Milstein).

Following is the most basic level of user interaction on Twitter. When one account follows a second account, the first account will always receive the latest tweets from the second account. Furthermore, the first account has the option to redistribute specific tweets from other accounts (known as retweeting). Users can also mention other accounts, and this wide range of interactions among users is what distinguishes Twitter from other social media.

Every tweet is stored on Twitter's servers, which are protected by security measures. However, like most web services, Twitter has an API that allows users to download data according to predetermined rules. The Streaming API provides low-latency access to the global stream of tweet data. A streaming client receives a push notification about tweets that match its search criteria, so the Streaming API enables data to be obtained in real time. At the time of the research, Twitter had three types of streaming API, namely:

Public streams: enable the tracking of public data on the Twitter timeline. Used to find out specific topics and for data mining.
User streams: allow searching on a Twitter user account. The result of the search is data that corresponds to the desired account.
Site streams: a multi-user version of user streams. Intended for servers that must connect to Twitter on behalf of multiple users.

Research Method

Data Mining

Data mining was carried out using the Public Streaming Twitter API. Four keys needed to be generated from the Twitter developer page, namely the access token, access token secret, consumer key and consumer secret. The keys are used to authorise streaming access to Twitter's data via OAuth. The scripting was carried out using the Python programming language. Not all tweets were collected in this study, as only geotagged tweets were relevant. The stream therefore had to be limited to a search area, in our case Central Java Province and the Special Region of Yogyakarta (Figure 1). The search area limits were included in the script as one of the query criteria.

Python requires the additional Tweepy library to communicate with the Twitter API. The Tweepy module was installed directly in the Python directory that is associated with QGIS. This keeps that Python installation independent of the other software on the computer. The first part of the script imports several functions from the Tweepy module and then supplies the four previously obtained tokens and keys. The command to stream tweets is written in the next section, followed by authentication and entry of the keyword that forms the basis of the query. The script can be executed through the Console / Terminal / Command Prompt, which is available on any desktop operating system. In addition, some GIS software provides direct access to Python through a GUI console, one example being Quantum GIS (QGIS). In this study, the Python script was executed from QGIS because it can be set to display geospatial data directly.
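The restriction to geotagged tweets inside the search area can be illustrated with a small, library-independent sketch. It is not the script used in the study and deliberately makes no Tweepy calls. It assumes each tweet has already been parsed into a Python dict following Twitter's v1.1 JSON layout, in which a geotagged tweet carries a GeoJSON point under `coordinates` with longitude listed first. The bounding box values, `STUDY_BBOX` and the function names are hypothetical and only roughly cover Central Java and the Special Region of Yogyakarta.

```python
# Hypothetical bounding box (west, south, east, north) in decimal degrees.
# Illustrative values only; they are NOT the limits used in the study.
STUDY_BBOX = (108.5, -8.3, 111.7, -5.7)


def tweet_point(tweet):
    """Return (lon, lat) for a geotagged tweet, or None if it has no point."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]  # GeoJSON order: longitude, then latitude
    return float(lon), float(lat)


def in_bbox(point, bbox=STUDY_BBOX):
    """True if the (lon, lat) point lies inside the bounding box."""
    west, south, east, north = bbox
    lon, lat = point
    return west <= lon <= east and south <= lat <= north


def keep_geotagged(tweets, bbox=STUDY_BBOX):
    """Yield only tweets that have a point location inside the box."""
    for tweet in tweets:
        point = tweet_point(tweet)
        if point is not None and in_bbox(point, bbox):
            yield tweet


if __name__ == "__main__":
    sample = [
        {"id": 1, "coordinates": {"type": "Point", "coordinates": [110.36, -7.80]}},
        {"id": 2, "coordinates": None},  # no location attached
        {"id": 3, "coordinates": {"type": "Point", "coordinates": [106.85, -6.20]}},
    ]
    print([t["id"] for t in keep_geotagged(sample)])
```

A filter of this kind is useful even when the stream itself is limited by location, because a location-limited stream may still return tweets without point coordinates or from outside the requested box.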
Data collection was carried out over two long weekends at the end of April and in early May 2017, because the number of tourists generally increases over both of these holidays. Because the method chosen was streaming, the script kept running for as long as the query was active.

Figure 1. Research area (indicated by light red colour).

Data Visualisation

Visualising point data presents a particular challenge: the display needs to be generalised once the volume of data becomes too high. Even a simple visualisation can be valuable for analysis. In this case, density values can be used to simplify spatial point data. Global density is the simplest density calculation and divides the population by the area of the administrative unit. This visualisation method provides a very effective aggregation of point data. However, the use of arbitrary administrative boundaries tends to lead to subjectivity and to details being missed from the display. The issue of which areas to select to represent the point data is referred to as the Modifiable Areal Unit Problem (MAUP). If not treated carefully, MAUP can lead to bias. The tessellation polygon technique can be used to address this problem by dividing the study area into a grid with predetermined shapes and sizes. An area of 1 square km is assumed to be sufficient to represent the effective service area of an average tourist attraction.

Data Analysis

While visualisation is intended to produce a general picture regardless of the tourist attraction, point analysis is carried out as an approach for determining the popularity of attractions. The radius of gyration is a measure that is often used to determine and quantify the effect of distance reduction on mobility patterns (Gonzalez et al.). However, since the moment of inertia effect does not apply to the creation of a tweet, an alternative approach is needed.
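For reference, the radius of gyration mentioned above is commonly written as the root-mean-square distance of a set of recorded positions from their centre of mass. The notation below is an assumption made for illustration and is not taken from this paper: $\mathbf{r}_i$ is the $i$-th recorded position, $n$ is the number of positions and $\mathbf{r}_{cm}$ is their centre of mass.

```latex
\mathbf{r}_{cm} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{r}_i,
\qquad
r_g = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\lVert \mathbf{r}_i - \mathbf{r}_{cm}\right\rVert^{2}}
```

Because the measure is centred on the mean position of the points themselves, it describes how dispersed a set of points is rather than how close the points lie to a fixed location such as a tourist attraction.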
The first approach emphasises the measurement of the number of tweets and their distance to the measured point. The Tweet Proximity Index (TPI) is used for this purpose. TPI is calculated from two parameters, namely the index of the number of tweets and the average distance index of tweets (Wibowo). Both parameters are calculated within a defined radius from the tourist point. The TPI value ranges from 0 to 2, where 0 indicates no tweets at all and 2 denotes many tweets located at the tourist attraction itself. In this study, we used a search radius of 1 km. Figure 2 illustrates the spatial depiction of TPI at each tourist attraction.

Figure 2. Spatial depiction of TPI calculation.

Point density was used as a second approach to determine popularity. The density index (DI) was calculated from the results of a point density analysis based on the kernel density estimation (KDE) principle. The grid size was set equal to the search radius used for the TPI calculations. Theoretically, the denser the Twitter data at a tourist attraction, the more popular the attraction. Density was measured using the point density algorithm in GIS software, which in principle also takes the density of neighbouring cells into account.

Popularity Assessment

In recent years social media has become a very effective means of disseminating tourism information. Many new attractions have become very popular as a result of uploaded information, which acts as a chain message for social media users. According to official data, there are more than 100 tourist attractions in the study area, although this figure does not include attractions that are popular because of social media. The popularity of tourist attractions was measured through the random dissemination of questionnaires using an online survey form. The items in the questionnaire were divided into four parts, namely identity, Twitter data, tourism data, and social media and tourism. The target respondents were tourists in several tourist locations. The age range was set by selecting respondents who were most likely to have social media accounts and use them actively (ages 15 to 50 years).

Seventeen tourist attractions that were rated popular by 144 respondents, as indicated by a high number of votes, were used as the reference data. The same number of tourist attractions with the greatest TPI and the highest DI was also selected. The accuracy of both approaches was assessed by comparing them with the reference data. Accuracy was expressed as a percentage, denoting the extent to which TPI and DI can predict the correct tourist attractions.

Results and Discussion

Tweet Data

A total of 85,096 tweets were obtained from the data mining before going through a data cleaning process. The aim of the cleaning was to remove data from outside the research area. The amount of Twitter data from within the study area stood at 76,859 (90.32%), with the remainder found within Indonesia but outside the search area. This query imperfection was likely caused by data that did not have location information but were nevertheless captured by the query script.

Figure 3. Frequency of tweets per hour.

Data acquisition generally began in the morning and ended at night. Figure 3 shows two peaks in the number of tweets, which occurred during the day and [...]. Data acquisition fell dramatically between midnight and 4 am, which we can assume is because many of the tourist attractions in the study area were closed during this time.

Twitter data mining using the Streaming API method requires users to remain connected to the Twitter server at all times. If the query encounters a connection problem, the data mining is lost, which is one disadvantage of using the Streaming API method. One way to mitigate this problem is to reduce the amount of data requested per query. In this case, the user must diligently launch a new query each time the previous task finishes running. The duration of a query depends on the desired area: the wider the area, the shorter the query time will be. Conversely, a narrower search area will require a longer query time.

In general, the spatial distribution of the data displays a clustering pattern in locations such as Yogyakarta, Surakarta, Semarang and Magelang (Figure 4). The concentration of data in these four cities dominates the distribution of tweets in the study area. Further examination of the map reveals a linear pattern that is strongly associated with the road network. Some coastal areas have a relatively large volume of Twitter data, such as Cilacap, Bantul, Tegal, Pekalongan and Jepara Regencies. Several areas around the Kendeng Hills, such as Grobogan Regency, Rembang Regency and Blora Regency, have a very small number of tweets compared to other regions. A similar pattern can be seen in the western central zone, which has a hilly and mountainous topography that would certainly hinder Internet infrastructure. Data from various cellular operators in Indonesia confirm this condition, especially for the Kendeng Hills region. By contrast, large cities are widely covered by cellular operator services from various networks, which can stimulate growth in the number of social media users.

Figure 4. The result of Twitter data mining.

Global Density Analysis

Figure 5 shows the global density of tweets calculated by regency boundary. Semarang, Magelang, Yogyakarta and Surakarta dominate when it comes to high tweet density. The tweet density in those areas exceeded 4 tweets/km2. Moreover, the tweet density in Yogyakarta City stood at 421 tweets/km2.
The latter is far in excess of the average tweet density, which was only 18 tweets/km2. Apart from these regencies/cities, only Banyumas Regency, Pekalongan City, Tegal City and Kudus Regency have relatively high density values. Other districts/cities have a density of 1 tweet/km2 or lower. Global density analysis tends to be very subjective and can sometimes be misleading because the data aggregation is rather forced. This visualisation method can nevertheless be used to give a global perspective or to perform a regional analysis.

Figure 5. The global density of each Regency.

Tessellation Polygon Density Analysis

Substituting uniform boundaries for administrative boundaries can provide a more objective assessment of density. In this case, we used a square tessellation polygon with an area of 1 km2. A more uniform division of the unit of analysis allows for a more thorough calculation of density. An area of 1 km2 is assumed to be sufficient to represent the average area of tourism, since activity would only be practical within close proximity. One of the advantages of tessellation polygon visualisation is that it conveys the dramatic difference between neighbouring polygons.

The results of the tweet density calculation based on the tessellation polygon can be seen in Figure 6. Clusters of tweet density can be observed in Yogyakarta City, Semarang City and Surakarta City. The linear patterns along roads found in the initial data are represented well by this visualisation method, unlike the previous visualisation. This is an advantage because it enables a more detailed pattern to be presented, but with a level of information that is simpler than the original data.

Each polygon contains an average of about 13 tweets with a standard deviation of [...]. However, the data range of 4,196 is very wide. This result indicates the emergence of inequality within the study area. It would be interesting to investigate the contributing factors further. For example, in addition to infrastructure factors, as previously thought, the tweet-making behaviour of Twitter users also has an influence on the creation of data patterns.

Tweet Proximity Index (TPI)

Based on the results of the point distance analysis, a TPI was developed that expresses the average distance and the amount of data within a predetermined radius. In general, the TPI value ranges from 0 to 1.35, with a mean of 0.58 and a standard deviation of [...]. This relatively poor result is due to the significant difference in the number of tweets (Figure [...]). In the distance index calculation, the average distance is not computed on a consistent basis because each tourist attraction has a different number of tweets. This condition is advantageous to tourist attractions that have relatively close tweet distances and a small number of tweets.

The distribution of TPI values in the study area is less affected by the density pattern discussed in the previous section. The TPI classes are distributed equally in the west and centre of the study area, despite the relatively low tweet density there. An exception is the Karimun Jawa National Park (TNKJ), although this cannot be included in the calculation of the TPI value as the distance to the nearest tweets is 4 km, which exceeds the search radius limit of only 1 km. This result is not unexpected, as access to cellular networks in the Karimun Islands is not as good as on Java Island, thus limiting the activity of social media users.

Figure [...]. Tweet Proximity Index in the study area.

Figure [...]. Distribution of tweets within a [...] km radius of Malioboro Street and [...].

[Table 1: time since the respondents' latest tweet, in the categories < 24 hr, 1-2 days, 2-7 days, 1-2 weeks and > 1 month; cell values are not recoverable. Source: Questionnaire.]

The upper-right corner of Table 1 provides evidence of the decreasing activity level of Twitter users.
A majority of the respondents who indicated that they had not posted a tweet for more than one month had also posted less than an average of one tweet per day. Overall, 7% of tweets had been made within the previous 24 hours, which was far below the figure for tweets made more than one month ago, which stood at roughly 54%.

Figure 11. (a) Adding location information to a tweet; (b) survey results related to the adding of location information on Twitter; (c) survey results related to the frequency of adding location information on Twitter.

The feature of adding a location to a tweet increases the chances of data being created that include coordinates. The option to add location information is presented every time a user composes a tweet (Figure 11a). However, since it is optional, not all users will opt to show their location. According to the respondents, only 44.1% had ever used this feature within their Twitter account (Figure 11b). A more detailed look at the data shows that of the respondents who had used the location feature in Twitter, only 7.9% always activated it, while 48.3% used it very rarely (Figure 11c). The results of the survey indicate that Twitter data containing location information can be acquired only in limited quantities. This finding is consistent with other studies, which reveal that only 0.71% of all tweets in Indonesia were geotagged (Carley et al.).

The third part of the questionnaire presented questions related to tourism in Central Java and the Special Region of Yogyakarta. Most of the respondents favoured nature tourism activity. The data also show that 30% of the respondents undertake tourism activities more than ten times per year.

The fourth part of the questionnaire looked at the relationship of social media with tourism. Based on the data, many respondents obtained tourism information through social media, followed by information from friends and web pages respectively. As a form of media that benefits from a relatively quick speed of data transfer, social media is indeed an effective and efficient form of promotional media. The past few years have seen the sudden emergence of newly famous tourist spots after they have gone viral on social media. As mentioned above, there is little probability of geotagged tweets being created by the user. However, according to the survey data, 17.7% of respondents answered that they had added location information in a tourism context. Thus, Twitter data still offer potential for use in tourism research.

Popularity

The popularity of tourist attractions was assessed by comparing the results from the questionnaire with the TPI and DI calculations. The assessment involved data from a total of 17 tourist attractions. The tourist attractions were selected based on the respondents' choice of favourite, with a minimum of 2 votes. Appendix 1 presents a comparison of the popularity of attractions based on the three above-mentioned elements. The TPI, despite appearing to be overestimated, turns out to have a greater accuracy than the DI, although the margin is small. The accuracies of the two indexes were 76.47% and 58.82%, respectively. These accuracy values are quite high considering that the data used in the calculation of TPI and DI were not filtered by tweet content. If the raw data processed correspond more closely to the purpose of the mapping, the index analysis can be expected to provide a higher level of accuracy.
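The accuracy figures above can be read as the share of the 17 reference attractions that also appear in an index's own top 17. The sketch below illustrates that reading. The function name and the placeholder attraction labels are hypothetical, and the match counts of 13 and 10 are an inference from the reported percentages rather than counts stated in the paper.

```python
def overlap_accuracy(reference, predicted):
    """Percentage of reference attractions that also appear in predicted.

    Both arguments are lists of unique attraction names of equal length.
    """
    reference_set = set(reference)
    hits = sum(1 for name in predicted if name in reference_set)
    return 100.0 * hits / len(reference_set)


if __name__ == "__main__":
    # Placeholder labels standing in for the 17 surveyed attractions.
    reference = [f"site_{i:02d}" for i in range(17)]
    # Assumed overlaps: 13 shared attractions for TPI, 10 for DI.
    tpi_top = reference[:13] + [f"other_{i}" for i in range(4)]
    di_top = reference[:10] + [f"other_{i}" for i in range(7)]

    print(f"TPI: {overlap_accuracy(reference, tpi_top):.2f}%")
    print(f"DI:  {overlap_accuracy(reference, di_top):.2f}%")
```

Under this reading, the gap between the two indexes corresponds to three attractions out of 17, which shows how sensitive the percentages are to a single match when the reference set is this small.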
Analysis of non-geotagged data also needs to be explored because the volume of such data on the server is much higher than that of the geolocated data. The lack of access to official and easily accessible data on tourist numbers also acts as an impediment to testing accuracy in this study. If data on the number of tourists can be acquired at the same time as the data mining is carried out, then the accuracy assessment can be conducted more precisely. However, a lack of tourist categorisation will make the analysis much more difficult, as the analysis will include a large volume of tourist data.

Conclusion

Geolocated tweet data can be accessed using the Public Streaming API via Python scripts and the Tweepy module. Queries can be performed by specifying a search location or a keyword. The wider the search area, the more quickly data can be queried, as a wider area increases the opportunity for capturing data. The results derived from the query data can be utilised for mapping activities, especially thematic mapping. Many themes can be developed based on tweets from Twitter users.

Two approaches were used in this study to analyse the popularity of tourist attractions, namely the Tweet Proximity Index (TPI) and the density index (DI). Neither approach delivered a satisfactory level of accuracy. Further exploration is needed of both the index construction methods and non-geotagged tweet data. It would also be interesting to study data from Instagram, which currently has the highest percentage of activity among social media platforms in Indonesia.

Acknowledgements

This research was conducted with funding from the Faculty of Geography, Universitas Gadjah Mada, through the Lecturer Research Grant. The authors are grateful for the opportunity to carry out this research through the grant scheme.

References