Advance Sustainable Science, Engineering and Technology

Wildfire Risk Map Based on DBSCAN Clustering and Cluster Density Evaluation

Muchamad Taufiq Anwar, Wiwien Hadikurniawati, Edy Winarno, Aji Supriyanto
Faculty of Information Technology, Universitas Stikubank Semarang, Jl. Tri Lomba Juang No 1, Semarang 50241, Central Java, Indonesia
taufiq@edu.

Abstract. Wildfire risk analysis can be based on historical data of fire hotspot occurrences. Traditional wildfire risk analyses often rely on administrative or grid polygons, which have their own limitations. This research aims to develop a wildfire risk map by applying the DBSCAN clustering method to historical wildfire hotspot occurrence points in order to identify areas with wildfire risk. Each area/cluster was then assigned a risk rank based on its density. The results showed that this method is capable of detecting the major clusters/areas together with their respective wildfire risk, and that the majority of subsequent fire occurrences fell inside the identified clusters/areas.

Keywords: wildfire risk map, DBSCAN, cluster density

Introduction
Wildfire is one of the most notable disasters occurring around the world. Wildfires have caused large economic losses and environmental damage. Preventive actions and management are required to minimize the negative effects of wildfire. One of the main tasks in managing wildfire is detecting the areas that have a high wildfire risk. The determination of areas with a high risk of wildfire, or of any geographical phenomenon (i.e., the spatial "hotspot"), has often relied on administrative or grid polygons, which have their own limitations, as mentioned by Han and Shu. Therefore, the detection of high fire risk areas should use "unsupervised" methods that are independent of administrative or grid polygons.
One of the available options is to apply a clustering algorithm that detects clusters of points and sets a boundary for each cluster. This research aims to develop a wildfire risk map by applying the DBSCAN clustering method to historical wildfire hotspot occurrence points in order to identify areas with wildfire risk. Each identified area/cluster was then assigned a risk rank based on its density (number of hotspot points per km²). The resulting wildfire risk map is useful for wildfire management, so that preventive or mitigating actions can be taken to minimize losses and other negative effects.

Methods
Wildfire Risk Assessment
Research often uses historical wildfire data to determine the wildfire risk of certain areas. Recent research has explored the risk of fire spreading to urban areas in Australia. Other studies used a generalized additive model to estimate wildfire risk in the Mediterranean area and artificial neural networks to predict wildfire risk in South Africa. The historical data can be obtained from satellite imagery, such as that provided by NASA. A risk model may be based on several variables related to wildfire, such as meteorological variables, vegetation indices, and other variables, or only on historical wildfire data. When relying only on historical wildfire data, assessments can be based on the number or the location of wildfire hotspot occurrences. A recent study also explored a rule-based approach to wildfire modeling based on historical wildfire data. Areas with high fire occurrence/risk (i.e., spatial "hotspots") can also be identified using spatial statistics such as Anselin's Moran's I or LISA and Getis-Ord Gi. This approach is sensible since most geographical phenomena show a spatial pattern, including wildfire itself.
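For comparison, the polygon-based statistics mentioned above work on counts aggregated to areal units rather than on the raw points. The sketch below computes global Moran's I for hotspot counts on a regular grid, assuming binary rook-contiguity weights (cells sharing an edge are neighbors). The function name morans_i and the toy grids are hypothetical illustrations, not data or code from this study; the point is only that the statistic is defined over cells and their adjacency, which is why a grid or administrative layer is needed.

```python
from typing import Sequence


def morans_i(grid: Sequence[Sequence[float]]) -> float:
    """Global Moran's I of cell values on a regular grid.

    Assumes binary rook-contiguity weights: w_ij = 1 when cells i and j
    share an edge, 0 otherwise. Values must not all be equal.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(v for row in grid for v in row) / n
    # Deviations z_i = x_i - mean, kept in grid layout for neighbor lookup.
    z = [[grid[r][c] - mean for c in range(cols)] for r in range(rows)]

    cross = 0.0  # sum over i, j of w_ij * z_i * z_j
    w_sum = 0    # W, the sum of all weights (ordered neighbor pairs)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += z[r][c] * z[rr][cc]
                    w_sum += 1

    ss = sum(v * v for row in z for v in row)  # sum of z_i^2
    if ss == 0:
        raise ValueError("Moran's I is undefined when all cells are equal")
    # I = (n / W) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2)
    return n * cross / (w_sum * ss)


# Hypothetical 4 x 4 grids of hotspot counts.
clustered = [[1, 1, 0, 0] for _ in range(4)]                        # fires concentrated on one side
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]  # perfectly dispersed
print(morans_i(clustered))     # 0.666..., positive: similar values are adjacent
print(morans_i(checkerboard))  # -1.0: every neighbor pair is dissimilar
```

Changing the grid resolution changes both the counts and the neighbor structure, and therefore the statistic, which is one face of the scale and shape mismatch problems of polygon-based analysis.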
Even the evidence of spatial autocorrelation itself can be included in a model to improve it. However, these analyses often use administrative or grid polygons and require a large number of polygons in the study area for the analysis to give a good result. Furthermore, the use of polygons has its own drawbacks, namely scale mismatch, shape mismatch, and location mismatch.

DBSCAN Clustering
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm that can detect clusters in spatial point data without relying on polygons. DBSCAN was first introduced by Ester et al. in 1996. DBSCAN has two parameters: eps, the maximum distance (radius) around a point within which other points are counted as its neighbors, and minpts, the minimum number of points that must lie within radius eps of a point for that point to be a core point. DBSCAN works by drawing a circle of radius eps around each point and evaluating which points fall inside that circle. Each point then falls into one of three categories: core point, border point, or noise point. A point is a core point if at least minpts points lie within radius eps of it. A point is a border point if it has fewer than minpts points within radius eps but lies within eps of a core point. Finally, a point that is neither a core point nor a border point is a noise point. Noise points are not members of any cluster. A cluster is thus a set of density-connected core points together with the border points surrounding them. The pseudocode of the DBSCAN algorithm is presented in Algorithm 1.

Algorithm 1: DBSCAN Clustering
DBSCAN(D, eps, MinPts)
    C = 0
    for each unvisited point P in dataset D
        mark P as visited
        N = getNeighbors(P, eps)
        if sizeof(N) < MinPts then
            mark P as NOISE
        else
            C = next cluster
            expandCluster(P, N, C, eps, MinPts)
        end if
    end for

expandCluster(P, N, C, eps, MinPts)
    add P to cluster C
    for each point P' in N
        if P' is not visited
            mark P' as visited
            N' = getNeighbors(P', eps)
            if sizeof(N') >= MinPts
                N = N joined with N'
            end if
        end if
        if P' is not yet a member of any cluster
            add P' to cluster C
        end if
    end for

The research framework of this paper is shown in Figure 1. DBSCAN is used in this research because it has advantages over other clustering algorithms such as K-Means clustering and scan statistics. First, unlike K-Means, DBSCAN does not require the number of clusters k to be set in advance. Fixing k is not suitable for this research, since the number of clusters should emerge from the density of the points in the data, i.e., from an "unsupervised" algorithm. Furthermore, DBSCAN can detect clusters of arbitrary shape, whereas K-Means tends to produce compact, roughly spherical clusters and the scan statistic evaluates circular windows. Such circular shapes are not well suited to geographical data, since geographical phenomena come in many shapes. Recent uses of DBSCAN in geographical clustering include the detection of retail agglomerations and location recommendation in location-based social networks. Recent research has also applied DBSCAN to detect clusters in raster images. Despite years of proposed improvements, the original DBSCAN can still perform well as long as reasonable parameters are chosen.

Figure 1. Research framework.

To calculate the density of a cluster, we need to estimate its area, which first requires determining the cluster's boundary. One of the available options is the convex hull. In mathematics, the convex hull of a set of points is the smallest convex set that contains them, where a set is convex if the line segment between any two of its points stays entirely within the set.
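Algorithm 1 can be sketched in Python as below. This is a minimal, unoptimized illustration (brute-force neighbor search, so O(n²) distance computations) assuming planar coordinates, Euclidean distance, and a neighborhood that counts the point itself. The names get_neighbors, dbscan, and point_types are hypothetical, and this is not the R dbscan package used in this study. point_types applies the core/border/noise definitions given above to the labels that dbscan returns.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def get_neighbors(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, 2, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = no cluster yet
    visited = [False] * len(points)
    cluster = -1
    for p in range(len(points)):
        if visited[p]:
            continue
        visited[p] = True
        neighbors = get_neighbors(points, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1           # "C = next cluster"
        labels[p] = cluster
        # expandCluster: the growing seed list plays the role of "N = N joined with N'".
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            q = seeds[k]
            k += 1
            if not visited[q]:
                visited[q] = True
                q_neighbors = get_neighbors(points, q, eps)
                if len(q_neighbors) >= min_pts:  # q is a core point
                    seeds.extend(q_neighbors)
            if labels[q] is None or labels[q] == NOISE:  # not yet in any cluster
                labels[q] = cluster
    return [NOISE if lab is None else lab for lab in labels]


def point_types(points: Sequence[Point], labels: Sequence[int],
                eps: float, min_pts: int) -> List[str]:
    """Classify each point as 'core', 'border' or 'noise' after clustering."""
    types = []
    for i, lab in enumerate(labels):
        if len(get_neighbors(points, i, eps)) >= min_pts:
            types.append("core")
        elif lab != NOISE:
            types.append("border")  # not dense itself, but within eps of a core point
        else:
            types.append("noise")
    return types


# Hypothetical toy data: two tight groups, one fringe point, one isolated point.
pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (0.25, 0.0),
       (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1), (10.0, 10.0)]
labels = dbscan(pts, eps=0.2, min_pts=4)
print(labels)                         # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
print(point_types(pts, labels, 0.2, 4))
```

On this toy input the point at (0.25, 0) has only three neighbors, so it is not a core point, but it lies within eps of a core point and therefore joins cluster 0 as a border point. Once noise is discarded, the member points of each cluster are what the convex hull is wrapped around.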
The convex hull is easily understood through the analogy of a stretched rubber band enclosing a set of points (e.g., a cluster).

Wildfire data were collected from the Moderate Resolution Imaging Spectroradiometer (MODIS) product provided by the Fire Information for Resource Management System (FIRMS). These data result from NASA's automatic MODIS satellite detection of temperature anomalies indicating the presence of fire. Only detections with a confidence value of 100 were used in this research. The national wildfire archive was then clipped to the study area.

DBSCAN clustering was performed in R using the 'dbscan' function of the 'dbscan' package. The value of eps was first chosen by visually examining the 'knee' of the k-NN distance plot produced by the 'kNNdistplot' function in the same package. Experiments with different values of eps and minpts were carried out on multiple datasets that varied in time span and spatial extent, and the final eps and minpts values used in this research were chosen based on the results of these experiments. DBSCAN labels each point either as a member of a cluster or as noise. Noise points were removed before the convex hull polygons were created. Each polygon has an area attribute (in square degrees), which is generated automatically by the Convex Hull function in QGIS. These areas were then converted into square kilometers using the fact that, at the equator, one degree of longitude/latitude is about 110.57 km; our study area lies close to the equator, spanning latitudes from 02° 25′ N to 01° 15′ S. The density of each cluster was then calculated by dividing the number of points in the cluster by its area (in km²). Finally, each area/cluster was assigned a risk rank based on its density.
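The area and density computation described above can be sketched without QGIS. The sketch assumes each cluster is a list of (longitude, latitude) points with noise already removed, uses Andrew's monotone chain for the convex hull and the shoelace formula for the hull area in square degrees, and then applies the 110.57 km per degree factor from the text. The function names and the toy cluster are hypothetical; this is not the QGIS Convex Hull tool itself.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees
KM_PER_DEGREE = 110.57       # factor used in the text for one degree near the equator


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # > 0 when o -> a -> b makes a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # each chain's last point starts the other chain


def polygon_area(vertices: Sequence[Point]) -> float:
    """Shoelace formula; result is in squared coordinate units (square degrees here)."""
    if len(vertices) < 3:
        return 0.0
    x0, y0 = vertices[0]  # shift to the first vertex to limit round-off
    rel = [(x - x0, y - y0) for x, y in vertices]
    n = len(rel)
    twice = sum(rel[i][0] * rel[(i + 1) % n][1] - rel[(i + 1) % n][0] * rel[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def cluster_density(points: Sequence[Point]) -> Tuple[float, float]:
    """Return (area in km^2, points per km^2) for one cluster, noise already removed."""
    area_deg2 = polygon_area(convex_hull(points))
    area_km2 = area_deg2 * KM_PER_DEGREE ** 2  # 1 square degree ~ 110.57^2 km^2
    if area_km2 == 0:
        raise ValueError("hull has zero area (fewer than 3 non-collinear points)")
    return area_km2, len(points) / area_km2


# Hypothetical cluster: a 0.1 x 0.1 degree square plus one interior point.
cluster = [(101.0, 1.0), (101.1, 1.0), (101.1, 1.1), (101.0, 1.1), (101.05, 1.05)]
area, density = cluster_density(cluster)
print(round(area, 1), round(density, 4))  # 122.3 0.0409
```

One design caveat: 110.57 km is the length of one degree of latitude at the equator, while one degree of longitude there is about 111.32 km, so squaring 110.57 understates areas by roughly 0.7%. Away from the equator the longitude factor shrinks with the cosine of latitude, which is why the conversion needs revisiting for other study areas.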
The ranking is relative: clusters with a density higher than the mean density are classified as high fire risk areas, while clusters with a density lower than the mean density are classified as medium fire risk areas. Low fire risk areas are the areas outside the identified clusters. The resulting clusters and their fire risk ranks were visualized using QGIS and were tested against the test dataset from the last two years. The tools used in this research were RStudio and QGIS.

Results and Discussion
The study area of this research is Riau Province on Sumatra Island, Indonesia. Wildfire data from 2001 to 2017 were filtered to keep only detections with a confidence value of 100 and were split into two datasets: a training dataset of 21252 points from 2001 to 2015 and a test dataset of 326 points from 2016 to 2017. The wildfire locations in the training dataset are shown in Figure 2a. DBSCAN was run on the training dataset with eps = 0.02 degrees (about 2.21 km) and minpts = 5. These values were chosen through experimentation, as they gave good clustering results. The dbscan function of the R package 'dbscan' produced 211 clusters with a total membership of 16142 points (about 76% of the fire data). Minor clusters with fewer than 120 historical fire points were then excluded. The limit of 120 corresponds to one fire case per fire-season month, with 8 fire-season months per year, over the 15 training years (1 × 8 × 15 = 120). Convex hull polygons were created with the QGIS function, which also calculated the area of each cluster. The area of each cluster in km² was obtained by multiplying its area in square degrees by 110.57 × 110.57 km² (one degree at the equator is about 110.57 km). The fire density of each cluster was then calculated by dividing the number of fire points in the cluster by its area in km². The exclusion of minor clusters had also eliminated the low-density clusters with a density lower than 0.5 points per km², i.e., less than one fire per 2 km². The result is 22 "significant" (major) clusters, which have high historical fire counts and usually cover large areas.

These clusters were then assigned a fire risk rank based on their density compared with the mean density of all clusters. Clusters with a density lower than the mean were assigned medium fire risk, while clusters with a density higher than the mean were assigned high fire risk. Low fire risk areas are the areas not included in the final clusters. This resulted in 14 medium fire risk clusters and 8 high fire risk clusters. One of the high fire risk clusters has a very high density and was deemed an outlier relative to the other clusters, so it was assigned a very high fire risk. The resulting clusters and their fire risk ranks are shown in Figure 2b. The area with very high fire risk is located near the top right of the study area.

Figure 2. (a) Locations of fire hotspots within the study area from 2001 to 2015; (b) resulting clusters with their color-coded risk ranks; (c) recent wildfire cases overlaid on the resulting clusters.

The resulting clusters were then tested against the fire data from the last two years (2016 to 2017), which contain 326 fire points, as shown in Figure 2c. Layer overlay and intersection were used to count the points falling within the resulting clusters: 229 of the 326 fire points (70%) fall within the final significant clusters. Detailed information on each cluster is shown in Table 1. The resulting clusters might reflect similarities in wildfire-related variables such as meteorological factors, vegetation, or land type (peat or non-peat). From the spatial hotspot analysis point of view, each cluster produced by DBSCAN may also represent a hotspot (High-High) surrounded by "warm" spots (High-Low).
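The ranking rule and the test-set check can be sketched as follows. Clusters denser than the mean are labeled High and the others Medium (the manual relabeling of the outlier cluster as Very High was a judgment call and is not reproduced), and the hit rate is the share of test points falling inside any cluster polygon. This is a minimal sketch assuming planar (longitude, latitude) coordinates and polygons given as vertex lists; even-odd ray casting stands in for the QGIS overlay and intersection actually used, and all names and toy inputs are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rank_by_mean(densities: Sequence[float]) -> List[str]:
    """'High' for clusters denser than the mean cluster density, else 'Medium'."""
    threshold = mean(densities)
    return ["High" if d > threshold else "Medium" for d in densities]


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting; points exactly on an edge may fall either way."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def hit_rate(test_points: Sequence[Point], polygons: Sequence[Sequence[Point]]) -> float:
    """Share of test points that fall inside at least one cluster polygon."""
    hits = sum(1 for p in test_points
               if any(point_in_polygon(p, poly) for poly in polygons))
    return hits / len(test_points)


# Hypothetical toy data: three cluster densities and one square cluster polygon.
print(rank_by_mean([2.0, 1.0, 0.6]))  # ['High', 'Medium', 'Medium']
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(hit_rate([(0.5, 0.5), (2.0, 2.0), (0.2, 0.8), (-1.0, 0.5)], [square]))  # 0.5
```

With the counts reported above, the same ratio is 229 / 326 ≈ 0.70, the 70% figure.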
The hotspots are analogous to core points, while the warm spots are analogous to border points.

Table 1. The resulting clusters and their attributes.
Area (km²)   Density (points per km²)   Risk rank
1043.89      4.64                       Very High
533.96       2.88                       High
97.33        2.12                       High
603.16       2.07                       High
110.46       1.76                       High
513.08       1.64                       High
109.70       1.48                       High
111.10       1.48                       High
262.43       1.41                       Medium
200.49       1.39                       Medium
169.80       1.26                       Medium
301.52       1.21                       Medium
1388.49      1.21                       Medium
691.11       1.21                       Medium
152.54       1.13                       Medium
270.24       0.95                       Medium
219.24       0.77                       Medium
761.50       0.77                       Medium
966.70       0.71                       Medium
243.95       0.67                       Medium
369.38       0.66                       Medium
930.61       0.51                       Medium

Conclusion and Future Research
Wildfire risk assessment can be based on historical data of fire hotspot occurrences. This research built a model of fire-prone areas by applying DBSCAN clustering to historical wildfire data from 2001 to 2015, followed by a density evaluation to determine the risk rank of each area. The resulting clusters were then tested against recent wildfire data from the last two years, and 70% of the recent wildfire occurrences fell within the resulting clusters, which indicates good performance. The identified clusters call for the authorities to carry out wildfire ignition prevention and other mitigation actions. However, the convex hull algorithm used in this research has a limitation in estimating area, since it cannot produce a proper boundary for complex (e.g., non-convex) cluster shapes. In such cases, other boundary estimation methods such as a concave hull or a buffer distance might work better, although some parameter tuning might be needed. Area estimation (in square kilometers) also needs to be reconsidered for study areas far from the equator. Future research might also explore other or richer risk-ranking methods.
The determination of eps and minpts in DBSCAN, as well as the risk-ranking method, could also be based on experts' judgment or on a standard set by the authorities.

References