International Journal of Electrical and Computer Engineering (IJECE)
Vol. No. August 2017, pp. 1952-1963
ISSN: 2088-8708, DOI: 10.11591/ijece.

Indian Monuments Classification using Support Vector Machine

Malay S. Bhatt1, Tejas P. Patalia2
Rai University, Ahmedabad, India
Dept. of Computer Engineering, Engineering College, Rajkot, India

Article Info
Article history: Received Nov 15, 2016; Revised Jun 4, 2017; Accepted Jun 21, 2017
Keyword: Classification, Edge detection, Generalized co-occurrence, Local binary pattern, Monument, Support vector machine

ABSTRACT
Content-Based Image Retrieval has recently become a widely popular and efficient searching and indexing approach used by knowledge seekers. The use of images by e-commerce sites and by product and service industries is not new. Travel and tourism are the largest service industries in India. Every year people visit tourist places and upload pictures of their visit on social networking sites or share them via mobile devices with friends and relatives. Classification of the monuments is helpful to hoteliers for the development of new hotels with state-of-the-art amenities, to travel service providers, to restaurant owners, to government agencies for security, etc. The proposed system extracts features and classifies the Indian monuments visited by tourists using a linear Support Vector Machine (SVM). The system is divided into 3 main phases: preprocessing, feature vector creation and classification. The extracted features are based on Local Binary Pattern, Histogram, Co-occurrence Matrix and Canny Edge Detection methods. Once the feature vector has been constructed, classification is performed using a linear SVM. A database of 10 popular Indian monuments was generated with 50 images for each class. The proposed system is implemented in MATLAB and achieves an average accuracy of 97% on this database. The proposed system was also tested on other popular benchmark databases.

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author: Malay S. Bhatt, Rai University, Ahmedabad, India. E-mail: malaybhatt202@yahoo.

Journal homepage: http://iaesjournal.com/online/index.php/IJECE

INTRODUCTION
Content Based Image Retrieval (CBIR) has become a significant research issue as plenty of image data have been generated in areas like medicine, fashion design, art galleries, entertainment, education, manufacturing and more. The QBIC system of IBM, Chabot of UC Berkeley, Photobook of Massachusetts Institute of Technology (MIT), VisualSEEK and MARS are popular examples of CBIR software. A text based retrieval system requires manual annotation of images, which is laborious and, most importantly, depends on human perception of the image: different people perceive the same image differently.

For many years, a tremendous amount of multimedia data in the form of images, audio and video has been generated due to the availability of cost-effective electronic devices like cameras, mobile phones and handycams. These multimedia data are shared, uploaded or emailed to relatives and friends staying far away so that they do not feel they have missed the precious moments. Millions of such photographs are uploaded, and it is almost impossible to manually classify these pictures according to the monuments people visited. In January 2013, India was in 3rd position with 62.6 million Facebook members.

The Bollywood and television industries are also considered major sources of multimedia data (movie stills, posters, etc.), as they release more than 2000 movies and a lot of music videos, produce serials and also organize events, and have done so for 100 years. It can be observed that pilot scenes and song sequences contain monuments in the background, and thus movies promote tourism and increase the revenue of the country as a whole. Figure 1 depicts the presence of the monuments in scenes of popular movies.
The travel and tourism industry incorporates heritage, medical, business and sports tourism. The main objectives of this sector are to develop and promote tourism, to maintain the competitiveness of India as a tourist destination, and to improve and expand existing tourism products to ensure employment generation and economic growth. The government also promotes tourism through advertisements and campaigns and takes special interest in preserving the beauty of these monuments. Every year, many foreign delegates and tourists visit India too. As per India Tourism Statistics 2013, 6.97 million foreign tourists arrived with an annual growth rate of 5.9%, and approximately 1145 million domestic tourists with an annual growth rate of 6% were observed.

Table 1. Bollywood Movies and Monuments
Movie Name                 Release Year    Monument
Fanaa                                      Qutub Minar
Jannat 2                                   Qutub Minar
Rang De Basanti                            India Gate
Rang De Basanti                            Golden Temple
Rab Ne Bana Di Jodi                        Golden Temple
Ki & Ka                                    India Gate
Ki & Ka                                    Gateway of India
FAN                                        Red Fort
Tevar                                      Taj Mahal
Jeans                                      Taj Mahal
Jhoom Barabar Jhoom                        Taj Mahal
Leader                                     Taj Mahal
Mere Brother Ki Dulhan                     Taj Mahal
Namastey London                            Taj Mahal
Youngistaan                                Taj Mahal

Figure 1. Scenes from the movies 'Fanaa', 'Mere Brother Ki Dulhan', 'Leader', 'Tevar', 'Rab Ne Bana Di Jodi', 'Rab Ne Bana Di Jodi', 'Jannat 2' and 'Rang De Basanti'.

LITERATURE SURVEY
Bhatt M. and Patalia T. used Generalized Co-occurrence Matrices (GCM) obtained from the HSV color space with 64 gray levels and various inter-pixel distance values (including 6, 9 and 12) as input to a Genetic Programming (GP) system. A GP-evolved spatial descriptor using 15 Generalized Co-occurrence Matrices as terminals was implemented with 7 functions and 2 operators. Each GCM, of size 64x64, is considered as input.
The resulting GP-evolved spatial descriptor was tested on a manually created Indian monuments database having four classes, namely 'Taj Mahal', 'Qutub Minar', 'Golden Temple' and 'India Gate'. The fitness function used in the GP system is a linear SVM with 10-fold cross-validation. They obtained an accuracy of 92%.

Murala et al. proposed the Local Tetra Pattern (LTrP) as a new feature descriptor for Content Based Image Retrieval. LTrP uses four distinct values for encoding information and also uses direction information. They highlighted the advantages of LTrP over Local Binary Pattern (LBP), Local Ternary Pattern (LTP) and Local Derivative Pattern (LDP). Benchmark databases, viz. the Corel 1000 database, the Brodatz texture database and the MIT VisTex database, were used for performance comparison.

Youness et al. proposed content based image retrieval based on 2-D ESPRIT (Estimation of Signal Parameters via Rotational Invariance Technique) and the Gabor filter. For evaluation they used the Brodatz gray scale image dataset, which has 13 texture classes with 16 samples for each class. They achieved an average precision of 80.

Nazarloo et al. applied content based image retrieval to gender classification. The face is one of the most important human biometrics and contains a lot of useful information. For gender classification, they merged the Gabor filter and Local Binary Pattern features of the face. A Self-Organizing Map was used for the classification and achieved an accuracy of 92.

Guo et al. explained the Completed Local Binary Pattern (CLBP), which is a modification of LBP. CLBP is composed of the Local Difference Sign Magnitude Transform (LDSMT) and the center gray level value. They tested the proposed approach on the CUReT and Outex texture databases. It is observed that the sign component preserves the local difference much better than the magnitude component, and thus CLBP outperforms conventional LBP.

Das et al. highlighted the importance of a classified query for content based image recognition based on Niblack's thresholding. They applied binarization to the Red, Blue and Green planes separately using Niblack's thresholding. On the binarized images, the upper mean, lower mean, upper standard deviation and lower standard deviation are calculated. For each plane 4 features are extracted, and thus a feature vector of size 12 is generated. They tested the system on the benchmark databases 'Wang Database' and 'OT-Scene'. The highest precision of 0.838 and recall of 0.838 were achieved using an Artificial Neural Network with a Multi Layer Perceptron for the 'Wang Database', while the highest precision of 0.753 and recall of 0.754 were obtained using a Support Vector Machine for the 'OT-Scene' database.

Streater J. explains the design and implementation of a genetic programming system for the construction of a feature descriptor for skin lesion images. 6 Generalized Co-occurrence Matrices (GCMs) of size 64x64 are constructed using the RGB color space with 5 inter-pixel distances for 100 images. The highest accuracy achieved was 72%. Fisher's Discriminant Ratio and a naive Bayes classifier are also used for comparison with the results obtained by the SVM.

Extraction of information from movies has been a research focus for the last 10 years. Hollywood Movie Database 51, YouTube and the Hollywood datasets are challenging datasets used widely for action recognition. Action recognition from movies is a subset of general human activity recognition. Laptev et al. highlighted the limitations of human action datasets recorded in a controlled environment and also described the difficulties faced during the recognition of real movie actions. They identified similarities and dissimilarities between action recognition from movies and object recognition in still images. The first task, accomplished with an accuracy of 60%, is the automatic annotation of human actions using the movie scripts. The inaccuracy is due to script-video misalignment.
Classification of human actions is the main goal, achieved in the paper with 91.8% accuracy. Experiments are carried out for 8 different action classes. HoG, HoF, Spatio-Temporal Bag of Features (BoF) and a combination of these are used with a non-linear support vector machine to achieve the desired result.

Lei Chen et al. depict a top-down approach based on rules for video editing and audio cues to extract dialogue and action scenes. A finite state machine with an audio-based support vector machine (SVM) classifier is applied for detection of the scene type. The classifier uses three features, namely variance of zero crossing rate, silence ratio and harmonic ratio. The precision and recall rates achieved are 76.56% and 81.

Doudpota et al. focused on the impact and popularity of Bollywood movies in South Asia, the Middle East, the UK, the USA and other parts of the world. They mined song sequences from Bollywood movies. In the first phase, they used Zero Crossing Rate, Spectrum Flux and Short Time Energy as features in a Support Vector Machine for binary classification of the extracted segments into music and non-music. In the second phase, the extracted music segments are further classified into song and non-song sequences using Probabilistic Timed Automata (Song Grammar). An experiment was carried out on 10 Bollywood movies having 74 songs, out of which 69 were successfully extracted. The recall achieved is 93.24%, while the precision is 87.

Vadivel et al. have proposed the integrated color and intensity co-occurrence matrix (ICICM) for content based image retrieval. The ICICM is composed of four other matrices, namely ICICM_CC, ICICM_CI, ICICM_IC and ICICM_II. ICICM_CC captures the color perception of pixel p and the color perception of the neighbourhood of p, while ICICM_CI captures the color perception of pixel p and the intensity perception of the neighbourhood of p. The other two can be described similarly. ICICM is updated based on a weight which is a function of saturation and intensity.
The development of ICICM is based on the properties of the HSV color space. They tested the system with a combination of various color (C) and gray level perception (I) levels on two different datasets. The first database is constructed from general purpose images obtained from International Microcomputer Software Inc. (IMSI), while the other is constructed using a crawler. A web-based application for CBIR based on ICICM is available in the public domain for performing experiments with images in their database as well as with externally uploaded images.

Deselaers et al. performed a quantitative comparison of image features like color histograms, invariant feature histograms, Gabor feature histograms, Tamura texture features, local features and region based features. The correlation among these features is analyzed. They focused on two image retrieval tasks: color photographs (WANG Dataset) and medical radiographs (IRMA Dataset). Low correlation exists among region features, image features, invariant feature histograms and Gabor histograms, so a combination of these produces better image retrieval for color photographs. The invariant feature histogram gives a 15.9% error on the WANG Dataset, while an error rate of 29.2% is identified on the IRMA Dataset. A similar comparison is given for all remaining features. It is clear that the selection of features depends on the task at hand and that a combination of positively correlated features does not improve the classification result.

Desai et al. highlighted the importance of monument classification to archaeologists in the assessment and classification of their findings. Art galleries and museums focus on visual aspects. A CBIR system based on a visual shape feature and a texture feature was developed. Morphological operations were carried out for shape extraction, and the Gray Level Co-occurrence Matrix was used for texture feature extraction.
Five different classes with a total of 500 images were collected, and the performance was compared with the Canny and Sobel edge detection approaches.

Information about Indian movies can be obtained via the Indian Movie Database.

RESEARCH METHOD
Figure 2 provides a flow chart of a content based image retrieval system. An image query is the image file that is given as input to the system. The features of the input are calculated. A query of the extracted features is then generated and compared with the features of all the other image files present in the database. Based on similarity measures, the system retrieves the required image files from the database and presents them as the result.

Figure 2. Content Based Image Retrieval System

Preprocessing
A set of 500 input images is resized to 256x384. The resulting images are converted from the RGB color space into the Hue, Saturation and Value (HSV) color space. The layers of the human retina sense light through rod cells and cone cells. Gray levels are perceived by rod cells at low levels of illumination, while at higher levels of illumination cone cells are also excited. Human color perception corresponds to the HSV color space, whereas the RGB color representation is different and does not follow human perception. Hue (H) indicates the pure color, Saturation (S) indicates the percentage of white added to the pure color, and Value (V) represents intensity. The HSV color space can be represented as a hexacone. When saturation is zero, we get only shades of gray from black to white as the intensity increases. Incident light is composed of many spectral components, but color information is lost when saturation is low, even if the illumination is very high. By changing the saturation from 0 to 1, the perceived color changes from shades of gray to pure color under the given hue and intensity. It is known that the HSV color space has more discriminating power than the RGB color space. Figure 3(a) displays an RGB image, while the corresponding HSV image is shown in Figure 3(b).
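The RGB-to-HSV conversion used in preprocessing can be illustrated with a short sketch. It is written in Python with the standard-library `colorsys` module rather than in MATLAB, so it is not the implementation used in the paper; the function name `rgb_image_to_hsv_planes` and the tiny example image are assumptions made purely for illustration. It shows the property stated above: a pixel with zero saturation is a shade of gray whose brightness is carried by V alone.

```python
import colorsys


def rgb_image_to_hsv_planes(rgb_image):
    """Split an RGB image into separate H, S and V planes.

    rgb_image is a list of rows, each row a list of (r, g, b) tuples
    with 8-bit values (0-255). All returned planes are in [0, 1].
    """
    h_plane, s_plane, v_plane = [], [], []
    for row in rgb_image:
        # colorsys expects channel values scaled to [0, 1].
        hsv_row = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
                   for r, g, b in row]
        h_plane.append([h for h, _, _ in hsv_row])
        s_plane.append([s for _, s, _ in hsv_row])
        v_plane.append([v for _, _, v in hsv_row])
    return h_plane, s_plane, v_plane


# Hypothetical 1x3 image: pure red, mid gray, white.
image = [[(255, 0, 0), (128, 128, 128), (255, 255, 255)]]
h, s, v = rgb_image_to_hsv_planes(image)

# Red is fully saturated; gray and white have zero saturation and
# differ only in V (intensity).
print("S plane:", s)
print("V plane:", v)
```

Each plane can then be processed independently, which is how the per-plane histograms and texture features in the following subsections are organised.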
Figure 3. (a) RGB Image and (b) HSV Image

Generalized co-occurrence matrix properties with different distances and directions are also computed, which results in 20 additional features. The H, S and V planes are also extracted from the input image. The Centre-Symmetric Local Binary Pattern with 16 bins and a histogram with 16 bins are computed on each plane. Generalized co-occurrence matrix properties are also extracted from each plane. Figure 4 shows the pre-processing and feature vector generation. Figure 5 covers the techniques which were merged together to generate the feature vector.

Figure 4. Pre-processing & Feature Vector Generation

Figure 5. Feature Vector (components: LBP, CS-LBP, GCM Properties, Histogram, Edge Property Histogram)

Generalized Co-Occurrence Matrix
The Generalized Co-Occurrence Matrix is useful for extracting the texture of the image. It is represented as a 4-tuple (i, j, d, θ), shown in Figure 6. Here, i and j represent gray levels, d is the distance between pixels p1 and p2, the gray levels of p1 and p2 are i and j respectively, and θ is the angle between pixels p1 and p2.

The Generalized Co-occurrence Matrices (GCM) of size 128x128 are calculated for inter-pixel distances 3, 9, 15 and 64 in the horizontal direction for the H, S and V planes (3 planes x 4 inter-pixel distances). Figure 7(a) describes the working of the GCM for a sample image of size 5x5 with 4 gray levels, while Figure 7(b) contains the calculated GCM in the horizontal direction with inter-pixel distance 1.

Figure 7. (a) A Sample Image and (b) the GCM in the horizontal direction with inter-pixel distance 1

Table 2 shows the generalized co-occurrence matrix properties.

Table 2. Generalized Co-Occurrence Matrix Properties
Property       Description
Contrast       It measures the intensity contrast between a pixel and its neighbour over the whole image.
Correlation    It indicates how correlated a pixel is to its neighbour over the whole image.
Energy         It represents the sum of squared elements in the GLCM. It is also known as uniformity.
Homogeneity    It measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.

\[ \text{Contrast} = \sum_{i,j} |i - j|^{2} \, p(i, j) \]

\[ \text{Correlation} = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j) \, p(i, j)}{\sigma_i \sigma_j} \]

\[ \text{Energy} = \sum_{i,j} p(i, j)^{2} \]

\[ \text{Homogeneity} = \sum_{i,j} \frac{p(i, j)}{1 + |i - j|} \]

Here p(i, j) represents the count at position (i, j) in the GLCM, μ denotes the mean and σ indicates the standard deviation.

Small, medium and large distance values are considered to capture the span of the monument in the horizontal and vertical directions. For example, the 'Red Fort' spans almost the whole image in the horizontal direction (left to right) with Hue close to red, whereas 'Hawa Mahal' spans both the horizontal and vertical directions with Hue close to red. Thus the homogeneity and correlation properties are high for 'Red Fort' in the horizontal direction, while the same properties are high in the horizontal as well as the vertical direction for 'Hawa Mahal'. Table 3 shows the generalized co-occurrence matrices used as features.

Table 3. Generalized Co-Occurrence Matrix Used as Features (columns: Number of Graylevels, Distance, Direction; directions listed: Horizontal, Horizontal, Horizontal, Horizontal, Vertical)

Local Binary Pattern and Centre-Symmetric Local Binary Pattern
The Local Binary Pattern effectively captures texture information from the local neighbourhood. Figure 8 explains the working of the Local Binary Pattern and the Centre-Symmetric Local Binary Pattern.

\[ LBP(x, y) = \sum_{i=0}^{n-1} s(n_c - n_i) \, 2^{i}, \qquad s(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \]

Here, n_c indicates the gray level of the centre pixel of the 8-neighbourhood and n_i indicates the gray level of the i-th pixel of the neighbourhood. The signs of the differences in a neighbourhood are interpreted as an N-bit binary number, resulting in 2^N distinct values for the binary pattern. The LBP features are robust against illumination changes, are very fast to compute, do not require many parameters to be set, and have high discriminative power. In CS-LBP, centre-symmetric pairs of pixels are compared. LBP produces 256 distinct binary patterns, whereas CS-LBP generates 16 distinct binary patterns. Robustness on flat image regions is obtained by thresholding the gray level differences with a small value T. In our proposed system, the histogram of CS-LBP is generated for all 3 planes of the HSV image, resulting in 48 (3 x 16) features, while the histogram of LBP is obtained for all 3 planes of the RGB image, resulting in 768 (3 x 256) features.

Figure 8. Local Binary Pattern and Centre Symmetric Local Binary Pattern

Edge Histogram
Color information is obtained through histograms, area information is added to the feature vector using generalized co-occurrence matrices with different distances and directions, and texture information is obtained using the LBP and CS-LBP histograms. To add structural information (the behavior at the edge points) to the feature descriptor, the Canny edge detector is used with threshold 0.2 so that the most prominent edges are preserved. The Canny edge detector consists of smoothing, finding gradients, non-maxima suppression, double thresholding and edge tracking by hysteresis. For each detected edge point, a 5x5 neighbourhood is considered and the mean and the standard deviation are calculated. The unique values obtained from these statistical properties vary for every image because the detected edge points are not fixed. It is observed that the number of unique values is in the range of 2000-10000.
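The per-edge-point statistics described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in the paper. It assumes a binary edge map is already available (for example from a Canny detector), and the function name `edge_neighbourhood_stats`, the clipping of windows at the image border and the use of the population standard deviation (dividing by the number of pixels) are all assumptions made for the example; the paper does not state how borders or the normalisation are handled.

```python
import math


def edge_neighbourhood_stats(gray, edge_mask, size=5):
    """Return a (mean, std) pair for the size x size window around each edge pixel.

    gray      : list of rows of gray-level values.
    edge_mask : list of rows of truthy/falsy flags, same shape as gray.
    """
    half = size // 2
    rows, cols = len(gray), len(gray[0])
    stats = []
    for r in range(rows):
        for c in range(cols):
            if not edge_mask[r][c]:
                continue
            # Assumption: the window is clipped where it crosses the border.
            window = [gray[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(window) / len(window)
            # Assumption: population variance (divide by N, not N - 1).
            variance = sum((p - mean) ** 2 for p in window) / len(window)
            stats.append((mean, math.sqrt(variance)))
    return stats


# Hypothetical 5x5 image whose gray level equals the column index,
# with a single edge point at the centre.
gray = [[0, 1, 2, 3, 4] for _ in range(5)]
edge_mask = [[False] * 5 for _ in range(5)]
edge_mask[2][2] = True

stats = edge_neighbourhood_stats(gray, edge_mask)
print(stats)
```

Because the number of edge points differs from image to image, the list returned here has a variable length, which is why the values are subsequently summarised in fixed-size histograms.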
Two histograms with bin size 100 are generated for the mean and the standard deviation.

Fitness Function
Here, we have adopted the classification accuracy calculated by a linear SVM classifier on the training set as well as the testing set. We adopted tenfold cross-validation, in which the total dataset is divided randomly into 10 equal-sized parts and ten repetitions are performed of training the SVM on 9/10 of the set and testing on the remaining 1/10. The overall fitness E_r is computed from the average of the tenfold cross-validation accuracies. In our case, the value of n is 10, and Accuracy[i] represents the accuracy of fold i obtained by the SVM. The fitness function is defined as follows:

\[ E_r = \left( 1 - \frac{\sum_{i=1}^{n} \mathrm{Accuracy}[i]}{n} \right) \times 100\,\% \]

RESULTS AND ANALYSIS
Implementation
Our goal is to classify the above mentioned monuments from large repositories of photographs uploaded on social networking websites. We have evaluated our proposed method using 64-bit MATLAB 2013a on a machine with 8 GB of RAM running the Windows 8.1 OS with an i7 5th generation processor.

Datasets
The WANG database is a subset of 1,000 images of the Corel stock photo database which have been manually selected and which form 10 classes of 100 images each. It is shown in Figure 9. The WANG database can be considered similar to common stock photo retrieval tasks, with several images from each category and a potential user having an image from a particular category and looking for similar images. The 10 classes are used for relevance estimation: given a query image, it is assumed that the user is searching for images from the same class, and therefore the remaining 99 images from the same class are considered relevant and the images from all other classes are considered irrelevant. Figure 10 shows the Oliva and Torralba (OT-Scene) database categories (2688 images). The OT-Scene database was also considered for evaluation. Figure 11 displays example images of the Monuments dataset.
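The tenfold cross-validation fitness from the Fitness Function subsection can be sketched in Python as below. It reads the fitness formula as E_r = (1 - mean fold accuracy) x 100%, which is an interpretation of the text. The Python standard library has no SVM, so the classifier is passed in as a function; the `nearest_centroid_score` stand-in is NOT the linear SVM used in the paper and is included only so that the sketch runs. All function names and the toy data are assumptions for illustration.

```python
import random


def kfold_error_rate(samples, labels, train_and_score, n_folds=10, seed=0):
    """E_r = (1 - mean fold accuracy) * 100 over n_folds random folds.

    train_and_score(train_x, train_y, test_x, test_y) must return the
    accuracy in [0, 1] of a classifier trained on the training part.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes to the same fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        accuracies.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx]))
    return (1.0 - sum(accuracies) / n_folds) * 100.0


def nearest_centroid_score(train_x, train_y, test_x, test_y):
    """Stand-in classifier (not an SVM): pick the class with the closest mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        total = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            total[k] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [t / counts[y] for t in sums[y]] for y in sums}

    def predict(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    correct = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_x)


# Hypothetical toy data: two well-separated classes of 20 samples each.
samples = ([[float(i), 0.0] for i in range(20)]
           + [[float(i) + 100.0, 0.0] for i in range(20)])
labels = [0] * 20 + [1] * 20

error_rate = kfold_error_rate(samples, labels, nearest_centroid_score)
print("E_r (%):", error_rate)
```

Passing the classifier in as a function keeps the fold bookkeeping separate from the learning algorithm, so a real linear SVM from any library could be substituted for the stand-in without changing `kfold_error_rate`.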
The most important task is the collection of data, as no direct dataset is available for the task at hand. In our dataset 10 different classes are considered, namely 'Taj Mahal', 'Gate way of India', 'Golden Temple', 'Lotus Temple', 'India Gate', 'Qutub Minar', 'Red Fort', 'Chatra Pati Shivaji Railway Station', 'Hawa Mahal' and 'Victoria Memorial'. For each class, 50 images are collected from websites, giving a total of 500 images. Images of individual tourists, families and tourist groups in various poses are considered for both training and testing. To achieve diversity, front views, near views, far views, left-side views, right-side views and people posing (sitting or standing etc.) in front of the monuments are considered.

Figure 9. The Wang Sample Database

Figure 10. The OT-Scene Database

Figure 11. The Sample Images from the Monuments Database

Experimental Results
The performance of the system is evaluated based on Error Rate, Precision, Recall, Accuracy and F-Score. The confusion matrix for the Indian Monuments dataset is shown in Table 4. Table 5 shows the confusion matrix status for the Wang and Monuments datasets, while Table 6 focuses on the OT-Scene dataset.

\[ \text{Precision} = \frac{tp}{tp + fp} \]

\[ \text{Recall} = \frac{tp}{tp + fn} \]

\[ \text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \]

\[ \text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

Here, tp indicates true positives, fp represents false positives, and fn and tn are false negatives and true negatives respectively. The F-score is the harmonic mean of precision and recall.

Table 4. The Confusion Matrix (Monuments Database). Classes: Taj Mahal, Gateway of India, Golden Temple, Lotus Temple, India Gate, Qutub Minaar, Red Fort, CST, Hawa Mahal, Victoria Memorial.

Table 5. The Confusion Matrix Status. Columns: Precision, Recall, Accuracy, F-Score. Monuments Dataset rows: Taj Mahal, Gateway of India, Golden Temple, Lotus Temple, India Gate, Qutub Minaar, Red Fort, CST, Hawa Mahal, Victoria Memorial, Average. Wang Dataset rows: Tribals, Sea Beach, Gothic Structure, Bus, Dinosaur, Elephant, Roses, Horses, Mountains, Food, Average.

Table 6. The Confusion Matrix Status (OT-Scene Database). Columns: Precision, Recall, Accuracy, F-Score. Rows: Coast & Beach, Open Country, Forest, Mountain, Highway, Street, City Center, Average.

Figure 12 shows the Receiver Operating Characteristic (ROC) curve for the Monuments dataset. Similar curves can be plotted for the other benchmark databases.

Figure 12. ROC Curve of the Monuments Dataset

CONCLUSION
Content Based Image Classification has recently generated successful applications in industries like agriculture, pharmaceuticals, surveillance and many more. The tourism industry of any country plays a vital role in the economic growth of the nation. The presence of monuments in Bollywood movies and its impact on the tourism industry is highlighted, as tourists prefer to visit such places. The feature vector is generated using Histograms, Local Binary Pattern, Generalized Co-Occurrence Matrix and the Canny edge detector. Ten popular Indian monuments were considered and an image database was constructed. The system achieved an average accuracy of 97% with high precision and recall on the Indian monuments dataset. The proposed system also works well on the other benchmark databases.

REFERENCES