SINERGI Vol. No. October 2023: 343-360
http://publikasi. id/index. php/sinergi
http://doi. org/10. 22441/sinergi.

Image Segmentation in Aerial Imagery: A Review

Ade Purwanto*, Dewi Habsari Budiarti, Mukti Wibowo, Irfansyah Yudi Tanasa, Yomi Guno, Aris Surya Yunata, Asyaraf Hidayat, Fithri Nur Purnamastuti, Dede Dirgahayu
National Research and Innovation Agency (BRIN), Indonesia

Abstract
The problem of distinguishing objects has plagued researchers for many years because of low accuracy compared to human eyes. In the last decade, the use of Machine Learning in aerial imagery data processing has multiplied, and the technology behind it has developed exponentially. One of those technologies is image-based object identification, which relies heavily upon data. To reduce the computational load, various data segmentation algorithms were developed. This study reviews the various image segmentation technologies used in aerial imagery for image recognition. Literature dating back as far as 1981, from various journals and conferences worldwide, was reviewed. This review examines specific research questions to analyze image segmentation research over time and the challenges researchers face with each method. Machine Learning has gained popularity among segmentation methods. However, Deep Learning has aggressively taken on an essential role by overcoming many of its limitations. The advanced algorithms used in Deep Learning to process the segmentation may drive more efficient and accurate data processing.

Keywords: Aerial Imagery, CNN, Image Processing, Image Segmentation, Machine Learning, Review.

Article History:
Received: December 2, 2022
Revised: February 3, 2023
Accepted: February 15, 2023
Published: October 2, 2023

Corresponding Author:
Ade Purwanto
National Research and Innovation Agency - BRIN, Indonesia
Email: adep001@brin.
This is an open-access article under the CC BY-SA license.

INTRODUCTION
Image segmentation divides a digital image into several image segments, also called image areas or image objects (pixel sets). The objective is to change the image's representation into something simpler, more intelligible, and easier to analyze. Image segmentation is customarily applied to find objects and boundaries (lines, curves, etc.) in images. More specifically, it is the action of attaching a label to each pixel in an image so that pixels with the same label share specific properties. Segmentation extracts objects of interest for more advanced analysis, such as classification and object recognition. The technique is used to separate the object of interest from the rest of the image so that the analysis becomes more accurate, the processing faster, and the computing lighter.

There are many techniques available for segmenting images. Regarding the nature of the object to be classified, these methods can be grouped into two major fields: semantic and instance segmentation. Semantic segmentation identifies, groups, and labels the pixels that form an object class. Instance segmentation, in contrast, labels each object individually, even if the objects belong to the same class. An example is given in Figure 1. Some of the methods in image segmentation are presented in Figure 2.

Figure 1. Example of segmentation

Table 1. Research Questions

Figure 2. Segmentation Methods

While there are already several review articles on image segmentation, very few of them discuss the specific techniques used for aerial imagery data. Some discussed segmentation for medical imagery, some focused on vegetation science, and others focused on sensor and vision technology. This study attempts to fill that gap by focusing on image segmentation in aerial imagery technology.
An extensive review of numerous sources, with inclusion and exclusion criteria, was done to select suitable references.

METHODS
This systematic review focuses on various technologies for aerial image segmentation, leading up to machine learning. A system was developed to sort previous papers accordingly. It is devised to keep the review focused on image segmentation technology for standard color (RGB) images and to reduce the risk of publication bias. The first step in the system is to frame the research questions. Relevant research publications were then searched in various online databases and digital libraries, and the accumulated publications were sorted according to the inclusion and exclusion criteria.

Research Questions
The methodology chosen for this review is to frame a few research questions. These research questions can be seen in Table 1, following the approach of Rani et al.

No. Research Questions
1. What are the different Machine Learning-based approaches used by researchers for aerial image segmentation? What are their limitations?
2. What are the different techniques used for image pre-processing and feature extraction?
3. Which Machine Learning techniques are used primarily for aerial image segmentation?
4. What are the different metrics used for evaluating the performance of the proposed Machine Learning techniques?
5. How has Machine Learning-based image segmentation developed over time?
6. What are the main challenges in implementing Machine Learning techniques in aerial imagery?

Search Process and Sources of Information
For this review, the authors compiled research publications mainly on segmenting images obtained from air vehicles (e.g., UAV). The earliest reference dates back as far as 1981. This range was chosen to give a clear view of the development of image segmentation from the period before machine learning up to the present, where machine learning is heavily explored and utilized.
An online search for these publications used the keywords "image segmentation", "photogrammetry", "machine learning", "deep learning", "aerial images", "drone", "UAV", and "satellite" in an array of different combinations. Some publications that explored "classification" and "identification" were also included if they mentioned their image segmentation method.

Inclusion and Exclusion Criteria for Article Selection
Despite the limits set for the search, it still returned a large number of publications. To filter out unrelated publications and reduce the risk of bias, the authors defined inclusion and exclusion criteria. The inclusion criteria are defined as:
1. Publications written in the English language. The authors came across a few publications written in Bahasa, Chinese, and Korean; these are excluded purely due to the importance of an internationally recognized language.
2. Publications related to image segmentation techniques for standard color (RGB and monochrome) images. Publications using images from multispectral cameras are excluded to limit the scope of the review.
3. Publications that are viewed as able to answer, or represent an answer to, the research questions.

p-ISSN: 1410-2331 e-ISSN: 2460-1217

Some exclusion criteria have been mentioned briefly in the inclusion criteria. Further exclusion criteria are:
1. Unrelated studies.
2. Duplicate studies.
3. Abstract-only publications.
4. Publications that are unable to answer the research questions.

The flow of the study is shown in Figure 3. Research objectives were determined first, followed by the creation of exclusion and inclusion criteria for the reviewed articles. After that, a database of the selected articles was established, and the data were analyzed. Lastly, the important points of each development step are presented before the study is concluded.
RESULTS AND DISCUSSION
Segmentation is a process that divides an image into parts and groups the elements with similar traits. It aims to separate the object in the image from the background. The classical segmentation approach consists of four primary methods: thresholding, clustering, region-based, and edge-based, which can be described as follows. The thresholding method divides the image pixels into two groups based on a threshold value: the pixels with values above the threshold (first binary level) and the pixels with values below the threshold (second binary level). In the clustering method, the pixels of the image are organized into more than two groups; each cluster contains pixels that are considered similar. The edge-based method detects boundaries, such as points, lines, and edges, by searching for discontinuities in intensity (grey level). In region-based segmentation, areas in the image are extracted based on specific pre-determined criteria (such as grey or color intensity).

Figure 3. Course of the study (flowchart: Research Objective, Create Criteria for Article Database, Survey on Articles Available Based on Criteria, Analysis on Selected Material, Presenting Important Point of Each Selected Article, Drawing Conclusion)

To better understand the development of image segmentation, a thorough survey was done to track the growth of the technology. As early as 1981, Fu and Mui categorized image segmentation into three classes: (1) characteristic feature thresholding, (2) edge detection, and (3) region extraction. Haralick and Shapiro then mentioned that segmentation techniques could be roughly classified as measurement space-guided spatial clustering, single linkage region growing schemes, hybrid linkage region growing schemes, centroid linkage region growing schemes, and spatial split-and-merge schemes. Of all these methods, the hybrid linkage schemes offer the best compromise between having smooth boundaries and having few unwanted region merges.
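As a minimal sketch of two of the classical methods described in this section, the following Python example applies thresholding and a simple edge-based test to a toy grayscale image. The image values, the threshold of 128, and the jump size of 100 are illustrative assumptions and are not taken from any of the reviewed papers.

```python
# Toy 4x4 grayscale image (0-255): dark background with a bright 2x2 object.
# All values below are illustrative assumptions.
image = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 198, 11],
    [12, 11, 10, 10],
]


def threshold(img, t):
    """Thresholding: label 1 for pixels above t, label 0 otherwise."""
    return [[1 if p > t else 0 for p in row] for row in img]


def edges(img, min_jump):
    """Edge-based: mark a pixel when the intensity jump to its right or
    lower neighbour is larger than min_jump (a discontinuity in grey level)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(img[y][x] - img[y][x + 1]) if x + 1 < w else 0
            down = abs(img[y][x] - img[y + 1][x]) if y + 1 < h else 0
            out[y][x] = 1 if max(right, down) > min_jump else 0
    return out


mask = threshold(image, 128)      # object vs. background
edge_map = edges(image, 100)      # pixels next to a strong intensity jump

for row in mask:
    print(row)
```

The thresholding result labels whole objects, while the edge map only marks where intensity changes abruptly; a practical pipeline would usually smooth the image first, since both tests are sensitive to noise.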
Almost a decade later, in 1993, a review was done by Pal and Pal from the Indian Statistical Institute. It was mentioned then that the literature on the segmentation of color images was not as rich as that on grey-tone images. This research group reviewed and summarized some of the available techniques and attempted to cover fuzzy and non-fuzzy techniques, including segmentation for color images and neural network-based methods. The classical and fuzzy-mathematical approaches covered several techniques, such as histogram thresholding, edge detection, and semantic and region relaxation approaches.

In 1994, Skarbek and Koschan reported that image segmentation, i.e., the identification of homogeneous regions in the image, had been the subject of considerable research over the last decades. At that time, the problem of segmenting color images, which convey much more information about objects in scenes, received much less attention from the scientific community. They reported an extensive survey of algorithms for color image segmentation, categorizing them according to a well-defined list of attributes, and gave suggestions for their improvement and descriptions of a few novel approaches.

Subsequent research on segmentation then focused on improving the accuracy, precision, and computational speed of segmentation methods and on reducing the amount of manual interaction. According to Pham et al., accuracy and precision can be improved by incorporating prior information from atlases and by combining discrete and continuous-based segmentation strategies. A parallelizable approach, such as neural networks, could be a promising way to improve computational efficiency.

A fully automatic color image segmentation method (JSEG) was reported by Deng et al. It quantizes the colors in the image into several classes that are used to differentiate regions in the data. These class labels then replace the pixel colors to obtain a class-map image.
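The class-map idea can be illustrated with a short Python sketch. Note the assumption: uniform per-channel binning is used here as a simple stand-in for the color quantization step, which is not the quantizer used by Deng et al.; the function names and the toy image are hypothetical.

```python
def quantize(pixel, levels=2):
    """Map an (R, G, B) pixel to an integer class label by uniform binning.
    Assumption: uniform bins replace the quantizer of the original method."""
    step = 256 // levels
    r, g, b = (min(c // step, levels - 1) for c in pixel)
    return (r * levels + g) * levels + b


def class_map(img, levels=2):
    """Replace every pixel colour by its class label."""
    return [[quantize(p, levels) for p in row] for row in img]


# Toy image: reddish pixels on the left, bluish pixels on the right.
image = [
    [(250, 10, 10), (240, 20, 5), (10, 10, 245)],
    [(245, 15, 12), (12, 8, 250), (5, 20, 240)],
]

labels = class_map(image)
for row in labels:
    print(row)
```

Pixels with similar colours receive the same label, so the later region-marking step can work on a small set of classes instead of the full colour range.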
The steps of this experiment are shown in Figure 4. A color image was quantized before the class-map was obtained. After that, segmentation was carried out by marking the regions to get the final result. Experiments show that JSEG provides good segmentation results on a variety of images, as shown in the example in Figure 4. The algorithm has three parameters that need to be specified by the user:
1. A threshold for the color quantization process, which determines the minimum distance between two quantized colors.
2. The number of scales desired for the image.
3. A threshold for region merging.
These parameters are essential because image characteristics vary between applications.

Figure 4. Schematic of JSEG by Deng et al. and result of JSEG

Another research group discussed that color image segmentation is based on monochrome segmentation applied to different color spaces. Gray-level segmentation strategies can be applied directly to each component of a color space, and the results can then be combined to get a final result. One issue, however, is how to utilize the color data as a whole for each pixel. When the color is separated into three components, it becomes so scattered that it is essentially multispectral data, and the color information that people can understand is lost. Another issue is how to select the color representation for the segmentation. Since each color representation has its strengths and weaknesses, no single one can outperform the others for all segmentation purposes. Cheng et al. reviewed some of the research attempting to address these issues. The HSI method solved them to some extent, although the hue channel is unstable at low saturation. Some researchers introduced physics-based models for color. However, their many restrictions prevent them from being widely used. For monochrome procedures, Cheng et al.
compared the methods used in monochrome segmentation and the characteristics of the color spaces, as shown in Table 2 and Table 3, respectively. From Table 2, it is understood that the thresholding method is the simplest to execute, despite its limitations. Because of this, much research has been done to improve its performance, such as the iterative threshold, the adaptive threshold, and the Otsu method. Yanowitz and Bruckstein pioneered an adaptive thresholding technique for image segmentation that interpolates the image grey levels at points where the gradient is high, which indicate probable object edges. Wang et al. developed iterative thresholding for multi-phase images. The weakness of this method is that it depends on the Lambda constant. Some methods combine the classical approach with a neural network, or combine the region-based method with k-NN.

Table 3 shows the characteristics of each color space, with its benefits and handicaps, which opens up further possibilities for segmentation techniques that exploit the properties of these spaces. However, Cheng concludes that a universal theory applicable to every color image segmentation task does not exist yet. All of the existing methods for color image segmentation are ad hoc. They are heavily application-dependent, so there is no general algorithm or color space. On the other hand, the Fuzzy method is a promising technique. A research group led by Khan evaluated segmentation techniques with a focus on hybrid algorithm solutions. They categorize segmentation techniques into threshold-based, region-based, edge detection-based, fuzzy, and ANN-based.

Table 2. Techniques of Monochrome Segmentation
Technique: Histogram
Methods: Looks for the several peaks that correspond to each region's data
Benefit: Low computational requirement, no prior data needed
Handicap: Poor result for data with no clear peaks; does not consider spatial information

Technique: Region-based
Methods: Pixels are grouped into homogeneous regions
Benefit: If the region is clear, this is the best
Handicap: Need ...

Technique: Feature-space clustering
Methods: Regions form separate clusters in the feature space
Benefit: Easy and straightforward
Handicap: No spatial information. Image dependent, and challenging to determine the cluster number

Technique: Edge
Methods: Locates points with abrupt changes in the gray level
Benefit: Very good for images with good contrast between each region
Handicap: Noise-prone, and does not perform well on images with too many edges

Technique: Fuzzy
Methods: Uses fuzzy operators, properties, mathematics, and inference rules to deal with various problems in segmentation
Benefit: Works well
Handicap: Tends to have heavy computation

Technique: Neural Network
Methods: Applies NN in clustering or classification
Benefit: Parallelizable, simple
Handicap: Needs time to train

Table 3. Color space characteristics

Color space: RGB
Benefit: Suitable for display purposes
Handicap: Performs poorly for segmenting color images because of the high correlation between components

Color space: YIQ, YUV, I1I2I3
Benefit: Some of the RGB correlations can be eliminated in a short amount of time. It performs well in edge detection
Handicap: Encounters the same problem as RGB, though to a lesser degree

Color space: HSI
Benefit: Applicable for varying levels of illumination that create highlights, shading, and shadows in images. It can also discern different objects based on color
Handicap: The non-linear transformation makes it unstable at low saturation, and there exists a non-removable singularity

Color space: Nrgb
Benefit: Each color component can be identified without being hindered by its brightness level, which effectively counters the effect of different illumination
Handicap: At low intensities, the noise level is very high

Color space: CIE spaces
Benefit: Can process color and intensity information independently. It uses geometric distance within CIE space to compare colors and can efficiently measure the difference
Handicap: There exists a non-removable singularity
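The Nrgb row of Table 3 can be made concrete with a short Python sketch. It assumes the usual definition of normalized rgb, r = R/(R+G+B) and likewise for g and b; the pixel values and the choice to return zeros for a black pixel are illustrative assumptions.

```python
def normalized_rgb(pixel):
    """Normalized rgb (Nrgb): each channel divided by R + G + B.
    Assumption: a black pixel, whose chromaticity is undefined, maps to zeros."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (r / total, g / total, b / total)


# Benefit: the same surface under bright and dim light gives the same Nrgb.
bright = normalized_rgb((200, 100, 100))
dim = normalized_rgb((100, 50, 50))
print(bright, dim)

# Handicap: one gray level of noise in R matters far more at low intensity.
dark_shift = abs(normalized_rgb((3, 1, 1))[0] - normalized_rgb((2, 1, 1))[0])
bright_shift = abs(normalized_rgb((201, 100, 100))[0] - normalized_rgb((200, 100, 100))[0])
print(dark_shift, bright_shift)
```

The first pair shows the independence from brightness, and the second pair shows why the noise level becomes a problem for dark pixels: the same one-level disturbance moves the normalized value much further when the channel sum is small.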
On the other hand, Martinez et al. proposed a color index-based thresholding method for background and foreground segmentation. The method is based on color index measures according to the HSV-DT and GMR-T. The color index was modified to give better data. Two fixed threshold methods were proposed to differentiate foreground (green plant) and background in plant images. The first was a simple fixed threshold. The second uses extensive training data to obtain its parameters from actual environmental conditions (e.g., cloudy, water reflectance, luminance variance). Lu et al. proposed an unconstrained optimization of a linear combination of RGB component images to enhance the contrast between plant and background regions, followed by automatic thresholding of the contrast-enhanced images (CEIs). The method was shown to be valid on five plant image datasets acquired under different conditions, together with ground-truth data. They compared ten common index images against the CEIs in terms of image contrast and segmentation accuracy.

A more in-depth assessment of threshold techniques was made by Al-Amri, using the Mean technique, P-tile, HDT, and EMT on satellite images. The result was that HDT and EMT were superior to the other histogram techniques. In contrast, more efficient use of computational resources for threshold techniques was observed using the TOPSO algorithm, Particle Swarm Optimization, and the 2nd Otsu algorithm.

Dunford et al. then analyzed data using pixel-based and object-oriented methods. The data were acquired using a paraglider UAV covering 174 hectares of land to identify vegetation and count the dead wood. Analysis was done on both single images and image mosaics. The data quality varied because of the aircraft movement, which decreased the accuracy of the analysis significantly.
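The combination of a color index with automatic thresholding, as discussed above for plant images, can be sketched in Python as follows. This is a sketch under stated assumptions: the Excess Green index (ExG = 2G - R - B) is used as a fixed linear combination of the RGB components in place of the optimized weights of Lu et al., and the threshold is chosen with a plain implementation of the Otsu criterion; the pixel values are invented for illustration.

```python
def excess_green(pixel):
    """Excess Green index: ExG = 2G - R - B (a fixed linear RGB combination)."""
    r, g, b = pixel
    return 2 * g - r - b


def to_gray_level(exg):
    """Rescale ExG from [-510, 510] to an integer gray level in [0, 255]."""
    return (exg + 510) * 255 // 1020


def otsu_threshold(values, levels=256):
    """Return the level t that maximizes the between-class variance,
    where class 0 holds values <= t and class 1 holds values > t."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Illustrative pixels: three green (plant) and three brownish (soil).
pixels = [(40, 180, 30), (50, 200, 40), (45, 190, 35),
          (120, 100, 80), (130, 110, 90), (125, 105, 85)]

index_image = [to_gray_level(excess_green(p)) for p in pixels]
t = otsu_threshold(index_image)
plant_mask = [v > t for v in index_image]
print(index_image, t, plant_mask)
```

The index step raises the contrast between vegetation and background, so that a single automatically selected threshold is enough to separate them; with real field images the index weights and any pre-filtering would need to be adapted to the lighting conditions.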
Karoui introduced an unsupervised region-based image segmentation method utilizing level set methods and texture statistics, while Zhou brought forward a mean-shift clustering algorithm. For edge detection, Hsiou incorporated a morphological operator with region growing: image enhancement is performed first, followed by the dilation residue method to find the edges. Liu et al., in 2008, utilized a fast region-merging method based on the concept that all pixels with similar features should be segmented into the same region. They focus on the relationship between nearby pixels instead of the features of the whole image. The algorithm, shown in Figure 5, starts with the edge detection process. After that, the filtered image is over-segmented, using a watershed or region-based method, to get the primitive partitions. Based on the initial partitions obtained in the second step, a k-NN graph is constructed in the third stage. This stage also includes the computation of a new region similarity measure, which employs local features along region boundaries. The last stage is the region merging process on the k-NN graph, which uses two rules: the merging rule and the stopping rule. The algorithm used by Liu, as shown in Figure 5, proved able to handle both color and grayscale images to obtain the image's partition. However, the method was memory-expensive and had various parameters that needed to be set manually. The final result of the proposed segmentation algorithm is shown in Figure 6: a total of 7 regions were segmented in the monochrome process, and a garden was successfully segmented from its surroundings. Xuan and Hong then addressed the weaknesses on low-contrast images by improving the noise cancellation and providing more precise image edges.
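The role of a merging rule and a stopping rule in region merging can be illustrated with a simplified Python sketch. It is not the algorithm of Liu et al.: a plain adjacency graph replaces the k-NN graph, the difference of mean gray levels replaces their region similarity measure, and the function name, the toy regions, and the threshold of 20 are hypothetical.

```python
def merge_regions(means, sizes, adjacency, stop_threshold):
    """Greedy merging of adjacent regions.

    means:     {region_id: mean gray level}
    sizes:     {region_id: number of pixels}
    adjacency: set of frozenset({a, b}) pairs of neighbouring regions
    Merging rule:  merge the adjacent pair with the smallest mean difference.
    Stopping rule: stop once that smallest difference exceeds stop_threshold.
    """
    means, sizes = dict(means), dict(sizes)
    adjacency = set(adjacency)
    while adjacency:
        pair = min(adjacency, key=lambda p: abs(means[min(p)] - means[max(p)]))
        a, b = sorted(pair)
        if abs(means[a] - means[b]) > stop_threshold:
            break
        # Merge region b into region a (size-weighted mean).
        total = sizes[a] + sizes[b]
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        del means[b], sizes[b]
        # Redirect the edges of b to a and drop the collapsed edge.
        rewired = set()
        for edge in adjacency:
            new_edge = frozenset(a if r == b else r for r in edge)
            if len(new_edge) == 2:
                rewired.add(new_edge)
        adjacency = rewired
    return means, sizes


# Over-segmented toy input: four strips, two dark and two bright.
initial_means = {0: 10, 1: 14, 2: 200, 3: 205}
initial_sizes = {0: 4, 1: 4, 2: 4, 3: 4}
neighbours = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}

merged_means, merged_sizes = merge_regions(
    initial_means, initial_sizes, neighbours, stop_threshold=20)
print(merged_means, merged_sizes)
```

The four primitive partitions collapse into one dark and one bright region, and merging stops because the remaining pair differs by more than the threshold. The manual choice of that threshold mirrors the parameter-setting burden noted for the reviewed method.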
The development of imaging sensors in the last two decades has provided very high-resolution image data, leading to rapid advancement in data analysis algorithms, including edge detection. The urgent need for accurate geospatial data, including algorithms to extract the important features from the geographical view, has led to object-based image analysis (OBIA), which is constantly growing in line with worldwide demand. This method is formulated to obtain meaningful objects, rather than individual pixels, from remote sensing imagery. Even though OBIA consists of a two-step procedure, image segmentation and object classification, most publications only discuss image segmentation. Because of this, several researchers have written review articles about the OBIA method, and advances in segmentation were extensively reviewed by Kotaridis.

Figure 5. Flow chart of Liu et al.

Figure 6. Final result of Liu et al. in grayscale and color images

OBIA integrates knowledge from various disciplines with the aim of producing and utilizing geographic information. OBIA has also been applied to low spatial resolution imagery.

In the last decade, methods that utilize the growing field of Artificial Intelligence have shown a lot of potential. Deep Learning, a branch of machine learning that improves on earlier neural network systems, has been explored extensively. It adds more layers to the neural network in an attempt to copy the way human brains work. This enables a system to learn from a large amount of data, with improvements in optimization and accuracy. The method has been explored in much research connected to image segmentation for aerial imagery and the medical field. In this method, each pixel in an image is given a label. Researchers have explored this method for plant analysis, monitoring the aftermath of natural disasters,
and spotting areas with high water turbidity. All of these studies use semantic segmentation on images obtained by a UAV or satellite. Other researchers explore different variations of the method, such as MatConvNet, U-Net, and TreeNet. Some of the methods mentioned for semantic segmentation were Fuzzy, Bayesian network, K-nearest neighbor, Support Vector Machine, decision tree, and Random Forest. Deep Learning architectures enhanced semantic segmentation, particularly with the rise of Convolutional Neural Networks (CNN).

Owing to its robustness, Fuzzy C-Means (FCM) clustering was the most utilized Fuzzy method. Later on, Puig et al. proposed to simplify image analysis by reducing user interaction in supervised learning. They apply Gaussian kernel convolution to the RGB channels of the original image, followed by K-means clustering. This method is often described as the soft K-means clustering method. The case study for this method was the assessment of damage to sorghum plantations caused by white grub pests. Puig et al. demonstrated that their method could classify the crop field into three different areas according to the health status of the crop. Popescu and Ichim built a feature selection scheme for image segmentation using the inter-spectral co-occurrence between RGB and HSV channels and the LBP (local binary pattern) approach, which can describe an image's surface texture. This method has two phases: the learning phase and the segmentation phase. ROIs were obtained during the learning phase, and each ROI was segmented using different features.

On the other hand, in the field of CNN, some researchers developed interesting methods, such as super-pixel segmentation. Yuan and Hu applied Simple Linear Iterative Clustering (SLIC) to forest images acquired by a UAV. They then built the forest classifier using an input vector from a 12-dimension descriptor obtained from the superpixels.
However, the dataset was limited to only ten images, so the algorithm is not proven for larger datasets with varying conditions. Bergado et al., on the other hand, attempted automatic extraction of the Spatial-Context Feature (SPF) and land cover classification using a CNN. The data were high-resolution images with sub-decimetre resolution. There are two stages in this method: (1) a CNN was used to learn the spatial context of the image, and (2) an MLP was used for the classification. This experiment shows that CNN gives better results than individual pixel classifiers or other hand-crafted methods such as GLCM and LBP. Other examples of CNN methods have also been reported. Kim and his team combined U-Net with Pyramid Pooling Layers on an RGB aerial image of Seoul City, while Rahnemoonfar et al. applied RNN and CNN networks to UAV image data of the Houston flood.

By 2019, a dual-path and lightweight DCNN was demonstrated by Zhang et al. by analyzing the Vaihingen and Potsdam aerial images in Germany. The dual-path DCNN consists of (1) the spatial approach, which combines a global context with multi-level features to accommodate local and global information, and (2) HED (holistically-nested edge detection), which detects boundaries for feature learning. Ghassemi et al. developed a deep neural network training method using a convolutional encoder-decoder network. Their research object was satellite images. Their result was that residual encoders provide a better generalization capability than an ordinary convolutional network; furthermore, this capability increases with the depth of the encoder. Experimental results also show that adapting a trained network improves its performance significantly, with a lower level of complexity than training a network from scratch. The architecture used by Ghassemi et al.
gave better results than other references on two datasets with many different characteristics between training data and test data.

Wu et al., from the Beijing University of Posts and Telecommunications research group, developed an image semantic segmentation method using the Attention Dilation-LinkNet (AD-LinkNet) neural network. It adopts an encoder-decoder structure with serial-parallel dilated convolution. The goal is to enlarge the receptive field and assemble features at various scales, for objects such as long roads and small pools. A channel-wise attention mechanism is also designed to sharpen contextual information in satellite images, and a pre-trained encoder is used. AD-LinkNet uses three satellite datasets, which are augmented using horizontal, vertical, and diagonal flipping. After that, color augmentation was also carried out to deal with different lighting due to differences in image capture time. They conclude that AD-LinkNet significantly improves segmentation accuracy for surface classification and road extraction.

In the same year, Wurm et al. analyzed the ability of transfer learning by using a fully convolutional network (FCN) that was trained on one dataset and applied to datasets different from the training dataset. Their case study was the mapping of slum areas in urban areas. The FCN was trained using image data from the QuickBird satellite. The FCN was then transferred to the low-resolution dataset from Sentinel-2 and the high-resolution dataset from TerraSAR-X. The experimental results show that the high geometric resolution of the QuickBird image gives the best results, followed by transfer learning to the Sentinel-2 image, which significantly improves the quality and accuracy of the segmentation. However, transfer learning to TerraSAR-X images did not improve the segmentation results and even degraded them.

In 2020, Bhatnagar et al. carried out research using the CNN method
by comparing the segmentation results of Machine Learning (ML) and Deep Learning (DL). The ML method uses Random Forest as a pixel classifier and Graph Cut for image segmentation, while the DL method uses a CNN with a combination of ResNet50 and SegNet architectures. As a result, segmentation using ML has an accuracy close to 85%, while DL has an accuracy close to 90%. However, the DL method needs more labeled data (ground truth) to train the system, more computation time, and higher hardware requirements. Weighing these needs against the difference in accuracy, they concluded that the ML method is more advantageous to use. This is because, to map the bog area, the network needs to be trained for each area, topography, season, and other natural conditions, which is very inconvenient.

On the other hand, Valente et al. proposed an automated method to count the number of plants using high-resolution photos from UAVs, with spinach plants as a case study. The technique combines machine vision, namely the Excess Green Index and the Otsu method, with transfer learning using a CNN to identify and count the plants. Two shots were taken for each data collection area to produce two original images with different spatial resolutions (8 and 16 mm/pixel). This is done to optimize the computation time by reducing the image size. This method is validated using ground truth data covering an area of 1/8. They concluded that the counting accuracy is 95% for a spatial resolution of 8 mm/pixel in 172 m². However, for a land area of 3.2 hectares, this method gives an error of 42.5%.

Ulmas and Liiv also use convolutional machine learning, with a modified U-Net structure, to map land cover on satellite images. The data used are from the BigEarthNet dataset. The conclusion obtained by this team is that CNN can be used for satellite data segmentation.
Some notes on their method are that the land cover categories need to be changed to get high-accuracy results. The modified U-Net model proved to give better results for low-resolution images. The presence of noise and mislabels in the BigEarthNet data needs to be considered because it greatly affects the results.

Lu et al. attempted to retrieve plant physical data (canopy area, width, position) by segmenting the image using Naive Bayes, with an apple orchard as a case study. In their method, the original image is first pre-processed using SLRCM (Shadow Region Luminance Compensation Method) to fix the lighting in the image. The proposed method gives a precision rate of 95.30%, a recall rate of approximately 84%, and an F1-score of approximately 89%. The quality of the segmented image is also better than the results of the usual GMM or K-means algorithm.

There is also research by Gebrehiwot et al. that uses a VGG-based fully convolutional network to map flooded areas using drones, with the finding that a pre-trained and fine-tuned model may result in superior performance, specifically when domain-specific training data are scarce. Yang et al. then introduced a Mask R-CNN-based model trained on Google Earth imagery, gaining superior performance compared to ResNet-100. In the same year, Guo et al. developed a way to extract water from GaoFen-1 data at multiple scales, with results comparable to U-Net and DeepLab-V3. Detailed water edge detection was obtained by Pan et al. by incorporating a modified U-Net with a super-pixel method. Furthermore, in high-resolution imagery, Wieland et al. successfully separated normal water from flood water in various scenarios by combining U-Net and DeepLab-V3 with MobileNet-V3, ResNet-50, and EfficientNet-B4. Meanwhile,
Li and Cheng, in late 2022, combined a CNN and a transformer method to try to enhance the segmentation results of DeepLabV3 and UCTransNet. Tang et al. conducted distinctive research to address the massive need for ground truth data by utilizing mCL LC as the argument to determine various image data and using it in the encoder-decoder structure of a CNN. Chen et al. reviewed satellite imagery data; unfortunately, only nine papers were covered.

Beyond the papers mentioned, the research trend in segmentation is reflected in about 12k Scopus-indexed publications, as presented in Figure 7, obtained by searching for "image segmentation" in the article title, abstract, and keywords. With the keywords "image" and "machine learning", the trend shows an aggressive increase from 2016 in publications that use machine learning as their base image processing method, as shown in Figure 8.

All in all, we have reviewed 149 references comprising reports, articles, conference papers, and books from 1982 to 2022 related to image segmentation, citing 78 articles regarded as important advances in this field. From the reviewed articles, the data in Figure 8 shows that machine learning is gaining popularity among segmentation methods. Specifically, deep learning has achieved an important breakthrough in solving many problems in image segmentation. In our references, CNN is still the most popular method, while OBIA is catching up aggressively, as shown in Figure 9.

In terms of reference type, five books are cited in this article, 17% of the references are conference papers from various fields, more than a hundred are journal articles, and a few are theses and reports, as shown in Figure 10. In terms of application, Earth and Planetary Science holds the largest share of the reviewed papers with 28%, followed by Vegetation Science with 23%. Other fields were Vision,
Medical, Mathematics, Archaeology, and Material Science. Figure 11 displays the proportion of the reviewed articles' fields, while Figure 12 compares the data sources of the aerial imagery articles. Of the aerial data in the reviewed articles, 63% of the images are from satellites and 37% from UAVs.

Figure 7. Publication trend in image segmentation
Figure 8. Publication trend in Machine Learning
Figure 9. Segmentation method reviewed in this study
Figure 10. Type of reference's source
Figure 11. Application field of the reviewed articles
Figure 12. Comparison between UAV and Satellite application of the reviewed articles

Some of the exciting studies on Machine Learning in aerial imagery were done, for example, by Heffels et al. They successfully tackled the popular DroneDeploy dataset with an IoU accuracy of 70%. This number is the highest publicly available validation result. Their work involves the DeepLabv3 Xception65 architecture and a U-Net neural network. Figure 13 shows how ground truth labeling was done. Every object, such as cars, buildings, and vegetation, was labeled and then overlaid on the original picture with a mask transparency of 50%. This ground truth labeling was then used to train the system. Their study highlighted augmentation, batch normalization, the ResNet50 encoder, and the DeepLabv3 Xception65 architecture. An example of the result is in Figure 14, which is an overlaid image of the labels predicted by the Keras CCE ResNet50 U-Net baseline.

Table 4. Summary of Methods

Research question: What are the different Machine Learning-based approaches used by researchers for aerial image segmentation? What are their ...
Findings:
Method: CNN, RCNN, DCNN, GLCM, LBP, U-Net, FCN, MathConvNet, TreeNet, Bayesian-Net, AD-LinkNet, K-nearest neighbor, SVM, Decision Tree, Random Forest, Fuzzy C-Means (FCM), Gaussian Kernels Convolution, SLIC, SPF-CNN, SLRCM.
Limit: No single algorithm can serve as an all-purpose method. Every method has its own benefits and weaknesses. Mostly, the optimum segmentation is achieved by combining different methods. Ground truth data to train the models is scarce.

Research question: What are the different techniques used for pre-processing ...
Findings: Histogram thresholding, region limitation, feature clustering, edge separation, super-pixel.

Research question: Which Machine Learning techniques are ...
Findings: CNN and its variations, U-Net, SVM.

Research question: What are the different metrics used for evaluating the performance of the proposed Machine Learning models?
Findings: Performance is measured by accuracy percentage compared to ground truth data, mostly in mIoU (mean Intersection over Union) and mPA (mean Pixel Accuracy).

Research question: How has Machine Learning-based image segmentation developed over time?
Findings: From monochrome imagery to complex inter-spectral interaction data acquisition, algorithms have become more complex and demanding. However, advances in hardware technology have supported and accelerated this development over time.

Research question: What are the main challenges in implementing Machine Learning techniques in aerial imagery?
Findings:
- To acquire a single algorithm that can serve as an all-purpose method. Every method has its benefits and weaknesses, and every segmentation process has its purposes. Mostly, the optimum expected result is acquired by combining different approaches.
- To acquire enough data to train the model in various environments and requirements so that the algorithm can get the optimum result from a certain dataset.

Figure 13. (Left) Original picture, (Center) labeling, (Right)
overlay label on top of the original picture.
Figure 14. The Drone Deploy dataset overlaid with 65% transparency.
Figure 15. (Top-left) Drone image, (top-right) ground truth labeled image, (bottom-left) DL semantic segmentation using U-Net with ResNet50 model, (bottom-right) DL semantic segmentation using SegNet and ResNet50 model.

Bhatnagar et al., on the other hand, apply a deep-learning algorithm. A CNN architecture was utilized with the combination of ResNet50 and SegNet. The study shows that, among state-of-the-art pixel-based classifiers, the best ML algorithm for the given dataset was the RF classifier, with an accuracy of 92% on Clara Bog drone images. The ResNet50 base model was combined with both U-Net and SegNet, with SegNet reaching about 89%. An example of their images is shown in Figure 15.

CONCLUSION

Image segmentation from aerial imagery taken with satellites, aircraft, and UAVs has been essential for various purposes. For civilian purposes, it plays a big part in disaster mitigation and data gathering for crops and forestry, amongst other things. It is inexpensive when no specific camera, such as a hyperspectral one, is needed. Depending on the area of observation, images from satellites are preferred when the area is immense, while images from UAVs are chosen for a smaller area. However, aerial images have some disadvantages, as they are very susceptible to cloud cover and to the lighting conditions when the images were taken. The time of day can influence whether certain areas fall under the shadows of mountains or clouds. Particular objects, such as a river and a road, can also appear similar from a distance. There is also the problem of low contrast, such as the difference between a field of grass and a forest. For years, only human eyes could discern between them, as the distinction is difficult to define in an algorithm.

Vision technology research to recognize objects started as early as the 1980s. However, a breakthrough became apparent after Machine Learning was introduced. It offers a more straightforward way to solve many problems in image segmentation, as long as an abundance of images and sufficient computational resources are available. A system can then be trained to segment images, extracting features from them and classifying them using a method similar to training a human observer. Even though its early development was plagued by hardware limitations, resulting in a preference for grayscale images, later advances in hardware and algorithms were able to counter some of the problems. Methods such as FCN and CNN, determined by the layers they use in training, provide better results than the conventional methods, though still not without some challenges. The summary of our findings is presented in Table 4.

The deep learning method made a more drastic improvement over time when its performance overcame many of its machine-learning counterparts. It uses many more layers as it exploits the advances of current hardware technology. The challenge is to get enough ground truth data so that the algorithm can be trained sufficiently to extract the optimum information out of a dataset. Deep Learning can potentially be used as a major segmentation method in image recognition in various fields that were once only ventured into with hyperspectral cameras.

REFERENCES