Journal of Natural Resources and Environmental Management http://dx. org/10. 29244/jpsl. RESEARCH ARTICLE Paddy Fields Classification Using A 2-Dimensional Scatterplot of Growth Phenological Features from Sentinel-1 Data Kustiyoa,c. Rokhmatuloha. Adhi Harmoko Saputrob. Dony Kushardonoc. Ratih Dewanti Dimyatic. Lilik Budi Prasetyod Department of Physics. Faculty of Mathematics And Natural Sciences. University of Indonesia. UI Depok Campus. Depok, 16424. Indonesia Department of Geography. Faculty of Mathematics And Natural Sciences. University of Indonesia. UI Depok Campus. Depok, 16424. Indonesia Research Center for Geoinformatics. National Research and Innovation Agency (BRIN). KST Soekarno Cibinong. Bogor, 16911. Indonesia Department of Forest Resource Conservation and Ecotourism. Faculty of Forestry and Environement. IPB University. IPB Darmaga Campus. Bogor, 16680. Indonesia Article History Received 29 December 2023 Revised 18 January 2024 Accepted 23 January 2024 ABSTRACT Keywords classification, paddy fields, phenology, practical algorithm. Sentinel-1 SAR Rice plays an essential role in ensuring the food security of Indonesia. Hence, rice . field monitoring using synthetic aperture radar (SAR) satellite data is critical, particularly in tropical This study presents a new algorithm to detect paddy fields in Subang. West Java, using Sentinel-1 SAR with a 12-day revisit acquisition. Three temporal phenological features of paddy growth were used, namely, the minimum and maximum backscatter, as well as their differences. Paddy fields were discriminated from other land covers using a simple thresholding algorithm based on their specific pattern of low minimum, high maximum, and high difference of vertical transmithorizontal receive polarization (VH) backscatter on a 2-dimensional . D) scatter plot. The results showed that the proposed algorithm had an accuracy of 94. 02%, comparable to that of the random forest algorithm and other studies using 3-dimensional . D) parameters. The proposed algorithm reduces the dimensionality from 3D to 2D and is practical for mapping and monitoring paddy fields. In this context, the application of the algorithm to the surrounding regions of Karawang. Indramayu, and Bekasi achieved high accuracy rates of 93. 37%, 92. 87%, and 88. 13%, respectively. Introduction Rice is essential for maintaining Indonesia's food security. Hence, the Indonesian government has established policies outlined in Law No. 41 of 2009 to preserve sustainable agricultural land and fields . Rapid mapping and monitoring of paddy fields are essential to protect existing agricultural land from land Remote sensing approaches that capture EarthAos surface characteristics in spectral, spatial, and temporal terms can be used to classify paddy fields and their changes . This can be achieved by using optical or synthetic aperture radar (SAR) sensors. However, optical sensors are unsuitable for tropical areas because of atmospheric conditions, particularly during the rainy season . SAR sensors produce cloud-free data owing to their ability to penetrate the atmosphere, smoke, and haze . Therefore, the use of SAR satellite data for various land applications is essential in tropical areas, particularly Indonesia. Studies of remote sensing data for paddy field classification have evolved from low-to high-resolution optical and SAR satellites, such as low-resolution . , medium-resolution . Ae. , and high-resolution data . Dong and Xiao . classified paddy fields into three generations based on the literatures ranging from the 1980s to 2015. In the first generation, paddy fields were mapped using reflectance data and image statisticsbased approaches. In the second generation, mapping and improvement were performed using vegetation indices and statistics-based methods. In the third generation, paddy fields were identified using vegetation indices and temporal analysis of the SAR backscatter. Recent developments have begun using paddy phenological growth through remote sensing detection as a key growth approach. Polarimetric information Corresponding Author: Kustiyo kustiyo@ui. of Indonesia. UI Depok Campus. Depok. Indonesia. Department of Physics. Faculty of Mathematics and Natural Sciences. University A 2024 Kustiyo et al. This is an open-access article distributed under the terms of the Creative Commons Attribution (CC BY) license, allowing unrestricted use, distribution, and reproduction in any medium, provided proper credit is given to the original authors. Think twice before printing this journal paper. Save paper, trees, and Earth! from remote-sensing SAR images provides excellent sensitivity for identifying crop phenological stages . An approach using multiple medium-resolution SAR images during the growth stages has been used to classify paddy fields . Park proposed a 1-dimensional . D) paddy field mapping index (PMI) during flooding and harvesting using the normalized difference of the vertical transmit-horizontal receive polarization (VH) backscatter. Furthermore, the three-dimensional . D) phenological features of time-series SAR images are related to growth phenology . Their study included minimum, maximum, and variance backscatter data for a certain period of the paddy life cycle. The 3D phenological features represent complete paddy growth phenologies. flooding, growing, and dynamic changes. On the other hand, the 1D PMI also represents the backscatter dynamics, which was similar to the variance backscatter feature. Therefore, changing the variance with the PMI simplifies the input, because the PMI feature is calculated from two other The present study classified paddy fields using SAR Sentinel-1 remote sensing data based on a simple thresholding algorithm by combining 1D and 3D phenological features and simplifying them into a 2dimensional . D) scatterplot. Three temporal characteristics of paddy growth were examined: flooding phenology . inimum backscatter paramete. , phenology during the growing and harvesting stages . aximum backscatter paramete. , and dynamic land cover change . ifference between maximum and minimum parameter. The minimum and maximum backscatter were replaced by 10% and 90% quantiles, respectively, to eliminate extreme values . Therefore, this study aims to obtain a simple and practical algorithm to classify paddy fields with high accuracy that can be implemented in a larger area to support paddy field mapping and monitoring. Methods Study Area The study area is located in the Subang region of West Java. Indonesia (Figure . Subang covers an area of 2,051. 76 km2 and experiences 100 rainy days per year with an average of 2,352 mm year-1. The topographical areas range from flat in the north to mountainous regions in the south. The altitude varies from 0 to 1,500 m above sea level, and more than 80% of the area has a gentle slope of less than 17A. Furthermore, the primary land use in the area is agriculture, specifically in paddy fields . Technically irrigated paddy fields were cultivated 2Ae3 times per year in the northern area. The central and southern parts of the study area had different slopes. Most paddy fields are located in rain-fed and simple-irrigated systems. The paddy field parcels in the central and southern parts were smaller than those in the northern part . Figure 1. Study area. Data Set This study used Sentinel-1A descending data in the Level 1 Ground Range Detected (GRD) product with Interferometric Wide (IW) swath mode. Sentinel-1A has a spatial resolution of 10 y 10 m and C-band with a nominal wavelength of 5. 6 cm . Temporal data were collected from October 2017 to March 2018 during the initial planting period. This dataset comprises 16 acquisitions at 12-day intervals. The use of data from http://dx. org/10. 29244/jpsl. JPSL, 14. | 429 the first planting period was better than that of the annual data . This study used VH polarization as the most suitable method for detecting and monitoring paddy growth . The training sample data were constructed using systematic random sampling . Sample data were collected at points regularly spaced 500 m apart, followed by the creation of circles with a radius of 25 m. The land cover types for each circular polygon were determined by visual analysis of high-resolution images from Satellite Pour lAoObservation de la Terre (SPOT) 6/7 and Pleiades in 2018. Only samples in the form of homogeneous paddy fields were used, and polygons with mixed land cover were avoided. Field surveys were conducted in circular polygons, particularly for uncertain land-cover types. The analysis was performed using the open-source software Quantum GIS (QGIS). A total of 10,838 points were used as samples, of which 4,138 and 6,700 points were paddy and non-paddy fields, respectively. Processing Flow Diagram The processing flow starts with the pre-processing of the remote sensing data. The task involved single acquisition of data and temporal filter processing, followed by the application of phenology extraction to produce three parameters. The final process involves classification and analysis, as shown in Figure 2. Figure 2. Processing flow diagram. Pre-processing and Phenology Extraction During the pre-processing step, the European Space Agency (ESA) Sentinel-1 toolbox was applied to the Google Earth Engine (GEE) to apply orbital files, remove thermal noise and GRD boundary noise, radiometric calibration to sigma nought backscatter . B), and range-Doppler terrain correction . The next step was to transform the data from sigma to gamma and apply terrain correction . Subsequently, the gammanought backscatter images were resampled to 20 y 20 m and cropped to Subang's area of interest (AOI), followed by stacking the data from October 2017 to March 2018. The resulting image was saved as a single image file comprising 15 layers. Temporal filtering enhances or smoothens temporal data series to obtain meaningful information. In this study, temporal filtering was performed by obtaining the median of three consecutive images. From the 15stacked images, there was a sequence of layers from 1 to 15. Layer 2 was filtered with medians 1, 2, and 3, while layer 3 was filtered with medians 2, 3, and 4. Specifically, layer 1 was filtered with means 1 and 2, whereas layer 15 was filtered with means 14 and 15, because the median of the two images equals the mean. In addition, three phenological parameters of paddy growth were extracted: . the flooding stage before transplanting, . the growing and harvesting stages, and . the dynamic land cover from the flooding to This journal is A Kustiyo et al. JPSL, 14. | 430 harvesting stage. A temporal transformation was applied to extract the backscatter series as a phenological Similar approaches have been used with an optical sensor for other purposes, such as the Landsat-8 cloud-free mosaic, using a statistical-based approach . The spatial-domain Lee filtering algorithm was selected for noise removal, with a radius of three pixels and one number of looks processed using the open-source Orfeo Tool Box (OTB) software . In this context, noise removal was applied to the three phenology parameters instead of the 15 original backscatters to reduce space and computational time. Training Sample and Accuracy Assessment Furthermore, the training samples were divided into training and testing sets. The classification model was developed using the training set, and the product accuracy was calculated using the test set. Subsequently, the classification of paddy fields using the proposed method was compared with the classification of random forest (RF) machine learning to analyze the advantages of the proposed algorithm. The comparison was also observed with previous algorithms for the PMI and 3-phenology parameters using the same classification method of RF and the same sample for consistency comparison. Accuracy assessment was performed using the K-fold cross-validation approach, which showed an accuracy bias . The available samples were randomly divided into four segments and three were consolidated, constituting 75% of the samples for model The remaining segment . %) was used for model validation, and the assessment was performed using metrics such as the overall accuracy, precision, recall. F1-score, and kappa . The Sorensen similarity index was used to evaluate classification results . Classification Using 2D Scatter Plots The classification model in this study was developed using a two-dimensional . D) scatterplot of three phenological growth features. The three features are the minimum and maximum backscatter and their difference, which is the updated PMI. Previous PMI parameters were based on normalized differences. However, this study adopted maximum and minimum backscatter differences to be plotted in a 2D A thresholding method was used to classify each pixel, labeled paddy or non-paddy, according to phenological conditions, using the following equation: X < Tx, and Y > Ty and Z > Tz X = Minimum backscatter. Y = Maximum backscatter. Z = Maximum Oe Minimum backscatter Tx = a threshold for X. Ty = a threshold for Y, and Tz = a threshold for Z The area was classified into paddy fields based on an algorithm during a given period, when the area was initially covered with water (X < T. and then covered with vegetation and bare land (Y > T. The change in the backscatter was sufficient to show the conversion from water to vegetation or bare land (Z > T. The threshold was determined using an iterative process to maximize overall accuracy. In this context, the iteration was terminated when the overall accuracy was optimal. Results and Discussion 2D Scatterplot of Paddy Growth Phenological Parameters Figure 3 shows the phenological parameters of the paddy growth during the period starting from October 2017 to March 2018. The black color in the minimum backscatter image indicates the flooding phase, including permanent water, which is also black (Figure 3. The brightest color in the maximum backscatter image indicates the area covered by permanent plantations or bare land. A less bright color indicates paddy fields during the vegetative and reproductive stages (Figure 3. The bright color in the difference between the maximum and minimum backscatter images indicates a land cover change (Figure 3. The paddy fields were classified based on three phenological parameters according to the following rules: low minimum backscatter . looding phas. , high maximum backscatter . egetative and reproductive stage. , and high change in delta backscatter. The initial threshold for classifying paddy fields was based on three phenological parameters, calculated using a 1D VH backscatter frequency histogram. The initial thresholds for Tx. Tz, and Ty were Ae18. 5 dB, 5. 90 dB, and Ae14. 5 dB, plotted in a 2D scatterplot between the minimum and maximum backscatters as shown in the X-axis and Y-axis, respectively (Figure . Tx is a vertical line separating the X-axis and Ty is a horizontal line separating the Y-axis. Tz is a diagonal line separating high backscatter changes. http://dx. org/10. 29244/jpsl. JPSL, 14. | 431 Figure 3. Phenological parameters of paddy growth from October 2017 to March 2018: . minimum VH backscatter . maximum VH backscatter parameter. difference between the maximum and minimum VH backscatter parameters. Figure 4. 2D paddy field scatterplot pattern between the minimum and maximum backscatter parameters and their possible classes. Seven regions . are bounded by Tx. Ty, and Tz, as shown in Figure 4. The area inside the green box refers to class 1, which was labeled based on a low minimum, high maximum, and high change to represent paddy fields. The remaining areas were described as non-paddy fields. Classes 2 and 3 showed low minimum and maximum values, and the backscatter was stable during this period. The potential land cover classes for numbers two and three included permanent water or fishponds. Classes 4 and 5 showed high minimum values, suggesting areas that were consistently unflooded and maintained vegetated or bare soil land cover. The potential land-cover types for these classes include plantations, forests, permanent bare soil, and Furthermore. Class 6 was related to paddy fields with slight variations. In this context, no pixels were labeled as having either low or high maximum values. The thresholds were recalculated using an iterative process to optimize the accuracy. The iteration process was completed when the threshold remained unchanged. The final thresholds and their deviations for Tx. Ty, and Tz were Ae17. 20 A 0. 24 dB. Ae 50 A 0. 21 dB, and 5. 80 A 0. 14 dB, respectively. This journal is A Kustiyo et al. JPSL, 14. | 432 Classification Result The RGB combination of the three phenological parameters showed that paddy fields were depicted in blue to red shade. Flooding, vegetation, and bare soil occurred from October 2017 to March 2018, showing dynamic changes in the land cover (Figure 5. Most paddy fields in the flat areas in the north were easy to Generally, six main objects can be identified: paddy fields, permanent water, fishponds, plantations, forests, and bare soil or settlements. The homogeneous blue color shows permanent water, such as the sea or reservoir, and the black color shows fishponds. The yellow-to-green color indicates that the area is covered only by vegetation or bare soil. Figure 5b shows the classification results obtained using the proposed method. The red, green, and blue layers were used for dynamic change . igh backscatter chang. , growth or harvesting . igh maximum backscatte. , and flooding phenology . ow minimum backscatte. , respectively. The RGB composites of all the agreed phenologies represented in white are paddy fields. The combination of the two agreed phenologies was magenta, cyan, and yellow, whereas the one agreed-upon was blue, green, and red. Plantation, forest, and settlement were in the same class, with stable high-VH backscatter and high minimum and maximum VH backscatter, acting as the most dominant class in green. The second dominant class in white with a high maximum, low minimum, and high difference was paddy fields. Permanent water was the third most dominant class in magenta and blue, with low minimum and maximum values. The other classes included fishponds in the northern region, irrigation canals, and rivers . Figure 5. RGB combination of three phenological parameters . ed: minimum, green: maximum, blue: difference between maximum and minimu. RGB combination of three classification results . ed: high difference, green: high maximum, blue: low minimu. Table 1 shows the classification accuracy results obtained using a simple threshold for the different numbers of phenological features. All three features, except recall, had the highest accuracy. A lower recall means that the area of the omitted paddy fields was higher, and a high precision showed a low commission error. The precision accuracy was higher than the overall accuracy, which is an advantage of using the SAR data. Paddy fields in mountainous areas are difficult to detect owing to the limitations of SAR images. http://dx. org/10. 29244/jpsl. JPSL, 14. | 433 Table 1. Classification accuracy results using thresholds from different numbers of phenological parameters. Accuracy Overall Precision Recall F1-score Kappa Accuracy (%) from 1 parameter Max-Min Min Max Accuracy (%) from 2 parameters Max-Min. Max Max-Min. Min Min. Max Accuracy (%) of 3-parameter The classification accuracy using the three proposed paddy growth phenology features was 94. This was 31% and 1. 65% higher than the highest accuracy obtained when using two features and a single feature. Considering a single feature, the most accurate was the dynamic change in land cover . aximum-minimum paramete. , with an accuracy of 92. The accuracy of the minimum features of the VH backscatter was approximately 87. however, it could not separate the paddy fields from permanent The accuracy of the maximum features of VH backscatter was the lowest at 40. This maximum feature cannot be used as a single image. however, when combined with dynamic features, the accuracy increased by 1. Testing the Model to Surrounding Regions The algorithm developed in Subang was implemented in the surrounding regions, namely Bekasi. Karawang, and Indramayu, using the same threshold to verify the consistency. Figure 6 shows the classification results of paddy fields in the four regions using the proposed algorithm, which includes an Indonesian paddy field one-map as reference data . The red, light green, and dark green colors represent undetected paddy fields, detected non-paddy fields as paddy fields, and agreement between the classification results and one map of paddy fields, respectively. Figure 6. Overlay paddy field classification results using 2D scatterplot algorithm in four regions and one-map paddy The four regions had similar paddy field characteristics, and most of the paddy fields had technical irrigation. The main water resource was supplied by the Jatiluhur reservoir, specifically in the northern part of Bekasi. Karawang, and Subang, and the western part of Indramayu. Random sampling was applied to generate 20,000 samples within the four regions to assess the accuracy of the paddy field classification. Similar to the previous samples, a circular polygon with a radius of 25 m was generated, and the samples were selected in a homogeneous area. The final number for homogeneous land cover was 16,390, consisting of: 2,864. 4,418. 4,549. and 4,559 samples in Bekasi. Karawang. Subang, and Indramayu, respectively. The results of the classification accuracy assessment of the implementation algorithm for the four regions are presented in Table 2. The accuracy and precision of Karawang and Indramayu were slightly lower compared to Subang. The recall was high, indicating that the number of false-negative samples was lower in This journal is A Kustiyo et al. JPSL, 14. | 434 these two locations than in Subang. This was because of the size of the paddy field parcel in Subang, specifically in the central and southern regions, which were scattered and more difficult to identify from the SAR data. Furthermore, the accuracy values in Bekasi were lower because of land-use change, which was close to Jakarta's capital city. Table 2. Accuracy assessment of the algorithm implemented in four regions. Region Bekasi Karawang Subang Indramayu Overall (%) Precision (%) Recall (%) F1-score (%) Kappa (%) Comparison between Simple Threshold and RF A comparison between the classification results of the 2D scatterplot using these simple thresholds and those of the RF algorithm was similar. The similarity between these two results using the Sorensen similarity index This shows that 96. 83% of the paddy fields matched, and the remaining 3. 17% were dissimilar. The simple threshold algorithm detects that the omitted area is larger than that detected by the RF algorithm. The most inconsistent result was for the fishpond area in the northern region, which was detected as nonpaddy and paddy fields using a simple threshold and an RF classification algorithm, respectively. The simple threshold showed better results for detecting fishponds than the RF algorithm. Comparison with Related Studies The proposed algorithm was compared with those of two previous studies that used three features . and PMI . The comparison was performed using the RF classification with the same training sample, and the scenarios were as follows: . 2D three features . inimum, maximum, and updated PMI). three features . inimum, maximum, and varianc. PMI . ormalized difference of maximum and minimum parameter. updated PMI . ifference between maximum and minimum parameter. The classification accuracy results for the four scenarios are listed in Table 3. Scenario 1 and 2 were similar for all the accuracy types. The updated PMI (Scenario . was more accurate than the PMI (Scenario . The three parameters (Scenario . are independent and must be plotted in 3D. The advantage of the proposed algorithm is the dimensional reduction from 3D to 2D because the third parameter is dependent, thus simplifying the analysis. Table 3. Conflicts related to the management of the L3S are seen in the form, triggers, and handling of conflicts. Scenario Scenario 1 Scenario 2 Scenario 3 Scenario 4 Average (%) Overall Precision Recall F1-score Kappa Deviation (%) Overall Precision Recall F1-score Kappa Conclusion In conclusion, paddy fields can be successfully discriminated from other land cover types using a simple thresholding algorithm. The algorithm was based on the specific pattern of low minimum, high maximum, and high difference in VH backscatter in the 2D scatter plot. The classification result of the proposed algorithm had an accuracy of 94. 02% A 0. 34%, which was comparable to the accuracy of other studies using 3D parameters of 93. 97% A 0. 51% and comparable to the accuracy of the RF algorithm with a Sorensen similarity index of 96. The results showed that the proposed algorithm reduced the dimensionality from 3D to 2D, and was easier to implement for mapping and monitoring paddy fields. The application of the algorithm to the surrounding regions of Karawang. Indramayu, and Bekasi achieved high accuracy rates of 37%, 92. 87%, and 88. 13%, respectively. Therefore, this simple and practical algorithm can be implemented to classify paddy fields in a larger area to support national paddy field mapping and monitoring with high-accuracy mapping, particularly in plain areas. In higher-elevation areas, particularly steep slopes, the method needs to be improved. http://dx. org/10. 29244/jpsl. JPSL, 14. | 435 Author Contributions K: Conceptualization. Methodology. Software. R: Conceptualization. Methodology. Investigation. AHS: Conceptualization. Methodology. Investigation. DK: Conceptualization. Investigation. RDD: Writing - Review & Editing. LBP: Writing - Review & Editing. Conflicts of Interest There are no conflicts to declare. Acknowledgements Funding for this work was provided by the Indonesian Ministry of Research. Technology, and Higher Education through the incentive study program for the Fiscal Year 2019 . etter no. 14/INS-1/PPK/E4/2. The publication was funded by the University of Indonesia (FMIPA UI) through publication support for the doctoral program in the fiscal year 2022/2023. References