Communications in Science and Technology 1 (2016) 7–14 COMMUNICATIONS IN SCIENCE AND TECHNOLOGY Homepage: cst. A comprehensive review on intelligent surveillance systems Sutrisno Ibrahim* Electrical Engineering Department, College of Engineering, King Saud University, Box 800, Riyadh 11421, Saudi Arabia Article history: Received: 28 April 2016 / Received in revised form: 11 May 2016 / Accepted: 17 May 2016 Abstract Intelligent surveillance systems (ISS) have received growing attention due to the increasing demand for security and safety. An ISS is able to automatically analyze image, video, audio or other types of surveillance data with limited or no human intervention. Recent developments in sensor devices, computer vision, and machine learning play an important role in enabling such intelligent systems. This paper aims to provide a general overview of intelligent surveillance systems and discusses some possible sensor modalities and their fusion scenarios, such as visible camera (CCTV), infrared camera, thermal camera and radar. This paper also discusses the main processing steps in an ISS: background-foreground segmentation, object detection and classification, tracking, and behavioral analysis. Keywords: Intelligent surveillance system (ISS), object detection, human detection, moving object detection, object tracking, object recognition, behavioral analysis, CCTV.

Introduction

Massive numbers of security cameras, along with other sensors, have been deployed to monitor critical infrastructure such as military bases, airports, power plants, banks, campuses, etc. Manual monitoring by human operators is an inefficient or even impractical solution because human resources are expensive and human attention is limited. An intelligent surveillance system (ISS) is envisioned to automatically monitor an environment or infrastructure with little or no human intervention. Such monitoring tasks include automatically detecting and tracking objects (like humans or vehicles)
and performing further analysis and actions. Signal processing, image processing, and artificial intelligence (machine learning) techniques play an important role in developing such intelligent systems. The visible camera, such as CCTV, is the most common modality for surveillance systems. It has long been in use to monitor environments, people, events and activities. Extensive studies have been conducted to automatically analyze data (image or video) from surveillance cameras. Much of this work has been discussed in several focused review papers on background-foreground segmentation, object detection and classification, tracking, and behavioral analysis. Sensor modalities other than the visible camera have also been explored for surveillance systems, such as infrared and thermal cameras, radar (radio detection and ranging), lidar (light detection and ranging), audio sensors, etc. Several review papers have also discussed different techniques for sensor fusion to improve system performance. However, there is still a lack of comprehensive papers providing a general overview of intelligent surveillance systems. The main objective of this paper is to provide a general overview of intelligent surveillance systems and to review the existing methods for each of their processing steps. The rest of this paper is organized as follows: Section II presents an overview of intelligent surveillance systems. Section III discusses some possible sensor modalities and different fusion scenarios. Section IV reviews the existing methods for background-foreground segmentation, object detection, classification, tracking, and behavioral analysis. Section V concludes the paper and highlights future research directions in this field.
Intelligent surveillance system (ISS) overview

An intelligent surveillance system (ISS) is a surveillance system with the intelligent capability to automatically analyze surveillance data and perform necessary actions such as generating an alarm. ISS is an interdisciplinary topic that involves electronics (sensing devices), computer vision and pattern recognition, artificial intelligence (machine learning), networking, communication and other areas. Intelligent surveillance systems are promising for implementation in various environments and applications. Some typical applications are listed as follows:
- Home security and intrusion detection
- Home care and safety
- Public transport areas such as airports, seaports, bus/train terminals
- Public areas such as colleges, campuses, governmental buildings
- Traffic monitoring
- Crowd management and analysis
- Pedestrian detection and autonomous cars
- Remote military surveillance, border monitoring, perimeter surveillance for power plants, companies, etc.

* Corresponding author. Email: suibrahim@ksu. © 2016 KIPMI

Examples of surveillance systems that have previously been studied or developed to have automation or intelligent capabilities include VSAM (video surveillance and monitoring), W4, PRISMATICA (pro-active integrated systems for security management by technological institutional and communication assistance), and ADVISOR (annotated digital video for intelligent surveillance and optimized retrieval). Fig. 1 shows an overview of the PRISMATICA system, which was proposed to improve passenger security and safety in public transport systems. It contains several main components: a camera network (existing CCTV), an intelligent camera system, a transmission system, audio surveillance, an operator, and the main server (MIFSA).

Fig. 1. An overview of the PRISMATICA system.

Another impressive surveillance system is DARPA ARGUS-IS
(autonomous real-time ground ubiquitous surveillance imaging system), with a 1.8-gigapixel video system. ARGUS-IS is able to auto-track every moving object within 40 square kilometers (the size of a small city) using a single platform. Commercially available ISS products include DETEC AS and DETER (detection of events for threat evaluation and recognition). Intelligent surveillance systems may play a significant role in security and safety in the public as well as the private domain. However, building them is highly challenging due to practical issues such as:
- Performance: such as the system accuracy
- Robustness: the system should be robust against real-world issues such as illumination variation, clutter, occlusion, weather change, camouflage, etc.
- Reliability
- Real-time constraint: the system should be fast enough
- Cost effectiveness

Possible Sensor Modalities and Fusion Methods

Visible Camera

The visible (video) camera is the most common sensor modality for surveillance systems. It has long been in use to monitor environments, people, events and activities. It is the most commercially available surveillance sensor, ranging from low-cost IP cameras to high-performance professional CCTV. Security cameras have been placed everywhere: in private homes, on streets, in public buildings, and even on borders between countries. Extensive research has been conducted on visible or video surveillance systems. Different types of visible cameras have been investigated for surveillance, such as color (or RGB), monocular, stereo, and omnidirectional cameras. Valera et al. divided the technological evolution of visual surveillance systems into three generations: analog CCTV systems (1st generation), automated visual surveillance combining computer vision technology with CCTV systems (2nd generation), and automated wide-area surveillance systems (3rd generation).

Infrared (IR) and Thermal Camera

A visible camera works well only in environments with enough illumination or light intensity, for example during daytime.
In environments with low light intensity, or during the night, a visible camera cannot capture the scene well. In this case, there are two possible solutions: using an infrared camera or a thermal camera. An object (like a human) whose temperature contrasts with the surrounding environment is much easier to distinguish in a thermal or infrared image than in a visible camera image. Both cameras capture infrared radiation that is invisible to the human eye, so the terms "infrared camera" and "thermal camera" are often used interchangeably. However, "infrared camera" usually refers to a camera that captures near-infrared (NIR) or short-wavelength infrared (SWIR) emissions to increase visibility. Infrared cameras are suitable for environments with a low illumination level. "Thermal camera" refers to a camera that captures long-wave or far-infrared (FIR) radiation emitted or reflected by objects. A thermal camera is useful if the scene is completely dark. Thermal cameras can be divided into two types: cooled and uncooled. A cooled thermal camera provides higher resolution and image quality, but is generally more expensive and consumes more power. Examples of infrared and thermal cameras are the FLIR cameras, produced by FLIR Systems, and the AXIS Q19 camera series. Fig. 2 shows a scene captured using a visible camera and a thermal camera at the same time.

Fig. 2. A scene captured using visible and thermal cameras.

Radar and Lidar

Range sensing is an interesting sensor modality due to its accuracy, large field of view and robustness with respect to illumination changes. Range sensing includes radar (radio detection and ranging) and lidar (light detection and ranging). Radar uses radio waves for sensing, while lidar uses light or laser. In range data, changes in the background can be easily filtered out by excluding all data outside of the tracking area.
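The range-gating idea just described can be sketched in a few lines; the point coordinates and the rectangular tracking area below are illustrative values, not data from any cited system.

```python
import numpy as np

# Toy lidar scan: four points with (x, y) ground-plane coordinates in meters.
scan = np.array([[1.0, 2.0],
                 [8.5, 3.0],
                 [4.2, 4.8],
                 [12.0, 9.0]])

# Tracking-area bounds (illustrative): anything outside is background and dropped.
x_min, x_max, y_min, y_max = 0.0, 10.0, 0.0, 5.0

inside = ((scan[:, 0] >= x_min) & (scan[:, 0] <= x_max) &
          (scan[:, 1] >= y_min) & (scan[:, 1] <= y_max))
foreground = scan[inside]          # only points inside the tracking area remain
print(foreground.shape[0])         # -> 3 (the point at (12.0, 9.0) is excluded)
```

The same boolean-mask gating applies unchanged to 3D point clouds by adding a z-range test.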
One drawback is that range data is generally less informative than vision data for person or object recognition. Spinello et al. proposed people tracking using 3D lidar. Recently, Benedek also proposed 3D people surveillance using a rotating multi-beam (RMB) lidar, as shown in Fig. 3. Javed et al. developed an automatic target classifier (for targets such as pedestrians and vehicles) using ground surveillance radar, while Kocur et al. used ultra-wideband (UWB) radars for a surveillance robot.

Fig. 3. People surveillance using 3D lidar.

Other Sensors (Audio, Ultrasonic, etc.)

Many other sensor modalities have been explored to improve or assist surveillance systems, such as audio, ultrasonic, passive infrared (PIR), and pressure sensors. Environmental sounds such as breaking glass, a dog's barking, people screaming, fire alarms, gunfire and similar sounds may give a reasonable degree of confidence in deciding between a "secure" and an "insecure" state. Some sensors may be used for alerting: once they detect an object, a visible camera (or other sensor) is activated for more reliable recognition. Bai et al. developed an embedded home surveillance system based on multiple ultrasonic sensors. In their other work, they used pyroelectric infrared (PIR) sensors and pressure sensors as an alert system to save power.

Sensor Fusion

Intuitively, combining multiple sensors provides more accurate information about the targeted object. Multiple sensors might be homogeneous (same modality, such as multiple cameras) or heterogeneous (different modalities). Some sensor modalities are intuitively close and complementary, such as visible, infrared and thermal cameras, since they all capture information from a 2D image perspective. Similarly, both lidar and radar capture information in the range domain (2D or 3D). Sensor fusion may happen at a low level (data fusion), a high level (decision fusion), or in between. In data fusion, each sensor sends its original measurement to the fusion center, and the center makes the decision based on the combined measurements. In decision fusion, each sensor makes its own decision based on its own measurement, and the fusion center makes the final decision based on all individual decisions (for example using majority voting). Each fusion scenario has its own advantages and drawbacks. Challenges in sensor fusion include how to handle different data modalities (visual, audio, radio signal, etc.), data imperfection, conflicting data, sensor topology, etc.

Extensive work has been done on visual surveillance systems using multiple cameras, including multi-camera calibration, computing the topology of camera networks, multi-camera tracking, object re-identification, and multi-camera activity analysis. Robertson et al. combined visible, infrared and thermal cameras for outdoor people detection from a moving platform. Premebida et al. proposed pedestrian detection combining an RGB camera and dense LIDAR data.

Data Processing Techniques for ISS

Foreground-Background Segmentation

Foreground-background segmentation is the first important step in an intelligent surveillance system. The goal is to separate the objects of interest (typically moving objects) from the background environment. It is also commonly referred to as background modeling, background subtraction, or change detection. Many foreground-background segmentation techniques have been proposed, especially for visible/video surveillance, and several review papers focus on these methods. Bouwmans discussed and provided a comprehensive list of most of the available techniques (see Table 1). Fig. 4 shows foreground-background segmentation using different methods: generalized mixture of Gaussians (MOG), non-parametric kernel density estimation, and codebook.

Fig. 4. Foreground-background segmentation: (a) original image, (b) MOG, (c) kernel, (d) codebook.

Table 1. Background modeling methods (reproduced from Bouwmans)

Category                             Method                              Main Contributor (Author)
Basic Background Modeling            Mean                                Lee et al.
                                     Median                              McFarlane et al.
                                     Histogram over time                 Zheng et al.
Statistical Background Modeling      Single Gaussian                     Wren et al.
                                     Mixture of Gaussians                Stauffer and Grimson
                                     Kernel Density Estimation           Elgammal et al.
Fuzzy Background Modeling            Fuzzy Running Average               Sigari et al.
                                     Type-2 Fuzzy Mixture of Gaussians   El Baf et al.
Background Clustering                K-Means                             Butler et al.
                                     Codebook                            Kim et al.
Neural Network Background Modeling   General Regression Neural Network   Culibrk et al.
                                     Self-Organizing Neural Network      Maddalena and Petrosino
Wavelet Background Modeling          Discrete Wavelet Transform          Biswas et al.
Background Estimation                Wiener Filter                       Toyama et al.
                                     Kalman Filter                       Messelodi et al.
                                     Tchebychev Filter                   Chang et al.

Obviously, the applicable foreground-background segmentation techniques depend on the corresponding sensor. Recently, Sobral et al. compared 29 methods using the BMC (Background Models Challenge) dataset. The top five promising methods based on this experimental work are those proposed by Wren et al., Kaewtrakulpong et al., Yao et al., Maddalena et al., and Hofmann et al. Cristani et al. also discussed other sensing modalities (such as audio, infrared and thermal cameras) in their survey paper. Most of the proposed background-foreground segmentation methods employ only a single sensor modality, particularly the visible camera. Obviously, combining different sensor modalities would make the system more robust or simplify the processing. For example, by combining visible camera and range data, the background-foreground segmentation task becomes easier.
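As a minimal illustration of the "Mean" (running average) category in Table 1, the sketch below maintains a per-pixel running-average background and flags pixels that deviate from it; the frame size, learning rate and threshold are illustrative assumptions, not values from any cited method.

```python
import numpy as np

# Running-average background model (the "Mean" entry in Table 1), minimal sketch.
# Each pixel's background estimate B is updated as B <- (1-a)*B + a*frame, and a
# pixel is declared foreground when |frame - B| exceeds a threshold.
class RunningAverageBackground:
    def __init__(self, shape, alpha=0.05, threshold=25.0):
        self.alpha = alpha            # learning rate: how fast the model adapts
        self.threshold = threshold    # intensity difference flagging foreground
        self.background = np.zeros(shape, dtype=np.float64)
        self.initialized = False

    def apply(self, frame):
        frame = frame.astype(np.float64)
        if not self.initialized:
            self.background[:] = frame        # bootstrap from the first frame
            self.initialized = True
        mask = np.abs(frame - self.background) > self.threshold
        # Update the model (foreground pixels could be excluded; kept simple here).
        self.background = (1 - self.alpha) * self.background + self.alpha * frame
        return mask

# Synthetic demo: a static gray scene, then a bright object appears.
scene = np.full((120, 160), 30, dtype=np.uint8)
model = RunningAverageBackground(scene.shape)
for _ in range(20):
    model.apply(scene)                        # learn the empty background
frame = scene.copy()
frame[40:80, 60:100] = 220                    # moving object enters
mask = model.apply(frame)
print(mask[60, 80], mask[10, 10])             # object pixel True, background False
```

The MOG approach generalizes this by keeping several weighted Gaussians per pixel instead of a single mean, which lets it absorb repetitive background motion such as swaying trees.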
Changes in the background can be easily filtered out by excluding all data outside of the observed area in the range data, while the visible/image data is used for finer segmentation.

Object Detection and Classification

The ability to automatically detect and classify objects (such as humans and vehicles) is one of the key components of an intelligent surveillance system (ISS). For a machine, detecting an object like a human is a hard job due to the wide range of possible appearances resulting from changing articulated pose, clothing, lighting and background. Numerous methods have been proposed for people detection based on visual cameras. In their experimental survey, Enzweiler et al. showed an advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and near real-time processing speeds. In a more recent benchmarking effort, Dollar et al. showed that FPDW has the best overall performance, but if computational cost is not a consideration, then MultiFtr+Motion is the best choice. Spinello et al. proposed people detection using a bottom-up top-down detector based on lidar data. The bottom-up detector learns a layered person model from a bank of specialized classifiers for different height levels of people that collectively vote into a continuous space. In the top-down step, the candidates are classified using features computed in voxels of a boosted volume tessellation. Benedek et al. map the 3D lidar point data into a depth image and perform people detection in 2D. Spinello et al. also presented a people detection approach based on RGB-D sensors, which provide both image and range data. Fig. 5 shows an example of their result for people detection. Most of the proposed object detection and classification methods focus on only a couple of object types, for example humans and cars.
In fact, in real settings there are many other objects that should also be considered, for example different types of animals or other subjects that pose a potential threat to security or safety.

Fig. 5. People detection using RGB-D: (a) color image, (b) depth image.

Object Tracking and Re-Identification

After object detection, surveillance systems generally track the object in the spatiotemporal domain. Object tracking in a realistic scene is a challenging problem due to illumination changes, occlusion, clutter, sensor motion, and other issues. A large number of visual tracking algorithms (based on visible cameras) have been proposed in recent years. Object tracking methods based on visual cameras can be classified into five groups: model-based, appearance-based, contour- and mesh-based, feature-based, and hybrid methods. Several review papers focus on the visual tracking problem. Recently, Smeulders et al. performed an experimental survey of 19 online trackers based on the Amsterdam Library of Ordinary Videos (ALOV). Another effort for benchmarking visual object trackers was made by Wu et al. According to the Visual Object Tracking challenge (VOT2014) result, the best tracker (combining accuracy and robustness) is the discriminative scale space tracker (DSST). This tracker extended the minimum output sum of squared errors (MOSSE) tracker with robust scale estimation. Recently, some attempts have been made at people tracking using sensors other than the visible camera, such as radar and lidar. For example, Mitzel et al. used stereo range data for real-time multi-person tracking. They analyzed not only the 2D image, but also the range information from the stereo camera. Fig. 6 shows an example of their results. Javed et al. developed an automatic target classifier (for targets such as pedestrians and vehicles) using ground surveillance radar, while Kocur et al.
used ultra-wideband (UWB) radars for a surveillance robot. Based on lidar data, Spinello et al. proposed 3D people tracking using multi-target multi-hypothesis tracking. For people detection, they employed a bottom-up top-down detector (explained in the previous section). Benedek et al. proposed an approach to real-time 3D people surveillance, with probabilistic foreground modeling, multiple-person tracking and on-line re-identification. The tracker module was also tested in real outdoor scenarios, with multiple occlusions and several re-appearing people during the observation period.

Fig. 6. Automatic people tracking.

Behavioral Analysis

There is increasing interest in automatically analyzing surveillance scenes not only at the "object level" (detecting, tracking), but also further at the "event level". Particular interests are automated human behavior analysis, group behavior analysis, crowd analysis, and event analysis. Some review papers have been devoted to this topic. Human behavior analysis can play a significant role in security by decreasing the time taken to thwart unwanted events and picking them up during the suspicion stage itself. Analysis of human behavior, although crucial, is highly challenging.

A basic component of human behavior analysis is classifying the human behavior. Different ways have been proposed to classify human behavior. Kiryati et al. proposed a simple classification: normal and abnormal. Foroughi et al. expanded the classification into normal, unusual and abnormal. Previously, Park and Aggarwal classified activities as positive, neutral and negative. Humans can be monitored as isolated individuals, groups of people, or crowds. Examples of group events are people fighting, people being followed, people walking together, terrorists launching attacks in groups, etc. Solmaz et al. proposed a method for identifying five crowd behaviors (bottlenecks, fountainheads, lanes, arches, and blocking) in visual scenes. Bremond et al. proposed an activity-monitoring framework for recognizing behaviors involving isolated individuals, groups of people, or crowds, in the context of visual monitoring of metro scenes using multiple cameras. For example, Fig. 7 shows their result in recognizing "fighting behavior" in a metro station. They combined four descriptions to recognize fighting behavior: (A) a group of people gathering around a lying person, (B) the group width varying significantly, (C) people inside a group separating quickly, and (D) the group trajectory changing very fast.

Fig. 7. Recognizing "fighting behavior" in a metro station.

However, it should be noted that current research in behavior analysis still considers simple or simplified scenes; more realistic and complex scenes should be investigated. For example, real "fighting behavior" may also involve the use of a weapon such as a knife or gun, so there will be less contact between the fighting groups or persons. The four descriptions proposed by Bremond et al. above may fail to characterize this kind of fighting behavior.

Conclusion and Future Direction

In this paper, a general overview of intelligent surveillance systems has been presented. Such intelligent systems are promising for implementation in various environments and applications. This paper has also discussed some possible sensor modalities and their fusion scenarios to improve system performance. Numerous techniques have been proposed to tackle the main processing steps: background-foreground segmentation, object detection and classification, tracking, and behavioral analysis. Although several promising results have been obtained, further studies are needed for real implementation in more complex settings.
For example, in the background-foreground segmentation process, different combinations of sensor modalities should be explored to make the system more robust or to simplify the processing. Current studies in behavior analysis still consider simplified scenes, so more realistic and complex scenes should be investigated. With the decreasing price of sensors and processing devices, researchers should also consider investigating and developing low-cost intelligent surveillance systems.

References