Communications in Science and Technology 1
COMMUNICATIONS IN SCIENCE AND TECHNOLOGY
Homepage: cst.

Texture feature extraction for the lung lesion density classification on computed tomography scan image

Hasnely a,*, Hanung Adi Nugroho a, Sunu Wibirama a, Budi Windarta b, Lina Choridah b

a Department of Electrical Engineering and Information Technology, Universitas Gadjah Mada, Jl. Grafika No. Yogyakarta 55281, Indonesia
b Department of Radiology, Universitas Gadjah Mada, Jl. Farmako Sekip Utara, Yogyakarta 55281, Indonesia

Article history: Received: 25 May 2016 / Received in revised form: 28 May 2016 / Accepted: 28 May 2016

Abstract

Radiological examination by computed tomography (CT) scan is a means of early lung cancer detection that helps to minimize the mortality rate. However, assessment and diagnosis by an expert are subjective and depend on the competence and experience of the radiologist. Hence, digital image processing of CT scans is needed as a tool to support the diagnosis of lung cancer. This research proposes a method for detecting the morphological characteristic of lung cancer lesion density using the histogram and GLCM (Gray Level Co-occurrence Matrices). The multilayer perceptron (MLP), the most well-known artificial neural network (ANN) architecture, is used to classify lung cancer lesion density as heterogeneous or homogeneous. Fifty CT scan images of lungs obtained from the Department of Radiology of RSUP Dr. Sardjito Hospital, Yogyakarta, are used as the database. The results show that the proposed method achieved an accuracy of 98%, a sensitivity of 96%, and a specificity of 96%.

Keywords: Classification; Density; CT Scan Image; Lung cancer

Introduction

Cancer is a leading cause of death worldwide, according to WHO (World Health Organization) statistics. In 2012, cancer caused 8.2 million deaths, and lung cancer was recorded as contributing 1.
million deaths, giving this disease the highest mortality rate compared with liver cancer, stomach cancer, colorectal cancer, breast cancer and oesophageal cancer. As reported by the WHO in 2014, 30,866 deaths in Indonesia were caused by lung cancer, with 8,390 female and 22,476 male deaths. Lung cancer thus became the most common cause of cancer death among males in Indonesia. Lung cancer is an abnormal growth of lung cells that develop into cancer cells in one or both lungs, most commonly caused by smoking. It is divided into Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC).

Radiological examination by CT scan is one of the early detection methods for lung cancer, used to make an initial-phase diagnosis in order to minimize the mortality rate. Image interpretation and the assessment of lesion characteristics are subjective and vary with the radiologist's experience. Hence, digital image processing of lung CT scan images is needed, with the expectation that it can provide a second opinion in the diagnosis process. Based on the morphological description of the CT scan image, there are a number of criteria for the diagnosis of primary lung cancer, including ground glass opacity, irregular spiculated margin, density, tumour size, air bronchogram, lobulation, and enhancement. This research focuses on detecting the morphological characteristic of lung cancer lesion density. Lesion density describes the tissue density, which can be further divided into heterogeneous and homogeneous density. Digital image processing is required to detect this characteristic.

* Corresponding author. Tel.: 62-274-552305; fax: 62-274-552305. Email: hasnely.mti13@mail.

A number of studies have addressed texture feature extraction, such as the study carried out by Uyun
on density pattern detection in mammogram images using the GLCM (Gray Level Co-occurrence Matrices) feature extraction method. The results showed strong significance for the determination of breast cancer. Furthermore, Devan, et al. used texture feature extraction with GLCM, GLRLM (Gray Level Run Length Matrices) and entropy to identify the characteristics of three types of lung tissue: normal, fibrosis, and carcinoma. Their results showed that the features used can differentiate the three types of lung tissue. Texture feature extraction using the histogram and GLCM was also conducted by Patil, et al. to identify benign and malignant cancer in X-ray images. Tun, et al. applied Otsu segmentation and GLCM texture feature extraction to identify lung cancer stages I, II, III and IV.

A number of other researchers have used the MLP (Multilayer Perceptron) classification method. One example is the study by Anand, which classified lung tumours as cancer or normal. Ahmad, et al. and Mitrea, et al. also implemented MLP to classify images of colorectal cancer, while Valarmathi, et al. used it to classify mammogram images.

© 2016 KIPMI
Hasnely et al. / Communications in Science and Technology 1, 27-32

In this research, the morphological characteristic of lung cancer lesion density is identified using texture feature extraction followed by lesion density classification. The image processing phases are pre-processing by cropping the RoI (Region of Interest), segmentation using Otsu segmentation, a morphological operation, and texture feature extraction based on the histogram and GLCM. The extracted texture features are then used in the classification phase with the MLP method.
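As a rough illustration of the segmentation and masking steps named in this pipeline, the sketch below picks a threshold by maximising the between-class variance of the grey-level histogram (Otsu's criterion) and then keeps only the RoI pixels above it. It is a minimal NumPy sketch, not the authors' implementation: the function names `otsu_threshold` and `mask_lesion`, the 8-bit integer input, and the assumption that the lesion is brighter than the background are all illustrative.

```python
import numpy as np


def otsu_threshold(image, levels=256):
    """Return the grey level t that maximises the between-class variance.

    Assumes `image` holds non-negative integer grey levels below `levels`.
    Class 1 contains the levels <= t, class 2 the levels > t.
    """
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                 # p(i) = n_i / N
    grey = np.arange(levels)

    w1 = np.cumsum(p)                     # weight of class 1 for every t
    w2 = 1.0 - w1                         # weight of class 2 for every t
    cum_mean = np.cumsum(grey * p)        # partial sums of i * p(i)
    m_total = cum_mean[-1]                # total mean m_T

    # Algebraically equal to w1*(m1 - m_T)^2 + w2*(m2 - m_T)^2,
    # evaluated only where both classes are non-empty.
    denom = w1 * w2
    valid = denom > 0
    sigma_b = np.zeros(levels)
    sigma_b[valid] = (m_total * w1[valid] - cum_mean[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def mask_lesion(roi, threshold):
    """AND the RoI with the binary template: keep pixels above the threshold."""
    template = roi > threshold            # binary image from the segmentation
    return np.where(template, roi, 0)
```

A call such as `mask_lesion(roi, otsu_threshold(roi))` would then return the RoI with background pixels set to zero, under the brightness assumption stated above.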
Materials and Methods

This research uses data from the Department of Radiology of RSUP Dr. Sardjito Hospital, Yogyakarta, consisting of 50 CT scan images of lung cancer. The aim of this research is to identify the morphological characteristic of lesion density from CT scan images in cases of primary lung cancer. Fig. 1 illustrates the block diagram of the research, which includes pre-processing, segmentation, morphological operation, feature extraction and classification.

Fig. 1. Block diagram of the identification of lesion density morphology

Pre-processing

At the pre-processing stage, the image is cropped to the RoI as an initial step so that the analysis focuses on the lesion. This process is conducted manually to facilitate the identification of the morphology of lesion density. The result of RoI cropping is shown in Fig. 2.

Fig. 2. RoI cropping process of image: (a) original image; (b) image resulting from RoI cropping

Otsu Segmentation

Otsu segmentation classifies pixels into two parts, object and background, by calculating the threshold value automatically from the input image. The first step of Otsu's method is to determine the probability of each intensity value i in the histogram, where N is the total number of pixels in the image and n_i is the number of pixels with intensity i:

$$p(i) = \frac{n_i}{N}, \qquad p(i) \ge 0, \qquad \sum_{i=1}^{256} p(i) = 1$$

The weights of the two classes, object and background, are calculated as follows, where L is the number of grey levels:

$$w_1(t) = \sum_{i=1}^{t} p(i), \qquad w_2(t) = \sum_{i=t+1}^{L} p(i) = 1 - w_1(t)$$

The means of the object and the background are calculated as:

$$m_1(t) = \sum_{i=1}^{t} \frac{i\,p(i)}{w_1(t)}, \qquad m_2(t) = \sum_{i=t+1}^{L} \frac{i\,p(i)}{w_2(t)}$$

The between-class variance (BCV), denoted \sigma_B^2, is then computed from these quantities and the total mean m_T. The optimum threshold value is obtained by maximizing the BCV, which also requires less computation time.

$$\sigma_B^2(t) = w_1(t)\,[m_1(t) - m_T]^2 + w_2(t)\,[m_2(t) - m_T]^2$$

$$m_T = \sum_{i=1}^{L} i\,p(i)$$

Fig. 3 shows the segmentation process using Otsu segmentation. The result shows the fixed form of the lesion object separated from the background.

Fig. 3. Process of Otsu segmentation: (a) RoI image; (b) segmented image

Morphological Operation

The output of Otsu segmentation is a binary image that is used as a template to obtain the lesion area by applying a simple morphological operation such as AND, OR or NOT. The AND operation is used in this research. Fig. 4 shows the process of the morphological operation.

Fig. 4. Process of morphological operation: (a) RoI image; (b) segmentation image; (c) morphological operation image

Feature Extraction

Texture extraction methods fall into three groups: statistical, structural and spectral methods. In this research, the statistical method is used, with first-order features based on the histogram and second-order features based on GLCM, as it can identify the density of constituent tissues from the grey-level intensity and achieved the highest performance in a number of previous studies.

Histogram-Based Texture

The simplest method of extracting statistical texture properties is the first-order method based on the histogram. The histogram-based statistical properties of an image texture can be calculated using the following features.

Mean

$$m = \sum_{i=0}^{L-1} i\,p(i)$$

This formula gives the mean brightness of the object. Here, m is the mean value, i is a grey level in the image, p(i) is the probability of occurrence of i, and L is the number of grey levels, so that L - 1 is the highest grey level.

Standard Deviation

$$\sigma = \sqrt{\sum_{i=0}^{L-1} (i - m)^2\,p(i)}$$

Standard deviation is a statistical measure of spread that describes how the data are distributed.

Skewness

$$\mathrm{Skewness} = \sum_{i=0}^{L-1} (i - m)^3\,p(i)$$

Skewness is the degree of asymmetry relative to the mean. It is negative if the histogram curve tends towards the left side of the mean and positive otherwise.

Energy

$$\mathrm{Energy} = \sum_{i=0}^{L-1} [p(i)]^2$$

Energy describes how the pixel intensities are distributed over the range of grey levels, and is commonly known as uniformity.

Entropy

$$\mathrm{Entropy} = -\sum_{i=0}^{L-1} p(i)\,\log(p(i))$$

Entropy represents the level of complexity of an image. A higher value represents higher complexity of the image. Entropy also indicates the quantity of information contained in the data distribution.

Smoothness

$$\mathrm{Smoothness} = 1 - \frac{1}{1 + \sigma^2}$$

The level of smoothness of an image can be measured by the smoothness value. A low smoothness value indicates that the image has rough intensity.

Gray Level Co-occurrence Matrices (GLCM)

The GLCM method was first published by Haralick in 1973 with 28 feature values. GLCM is a second-order texture measurement that considers the relationship between pairs of pixels in the original image. For example, let f(x, y) be an image of size N_x by N_y whose pixels take L grey levels, and let \vec{r} be the spatial direction vector. GLCM_{\vec{r}}(i, j) is defined as the number of pixels with value j \in 1, ..., L occurring at offset \vec{r} from a pixel with value i \in 1, ..., L, which can be stated as:

$$\mathrm{GLCM}_{\vec{r}}(i, j) = \#\{((x_1, y_1), (x_2, y_2)) \in (N_x, N_y) \times (N_x, N_y) \mid f(x_1, y_1) = i,\ f(x_2, y_2) = j,\ \vec{r} = (x_2 - x_1,\ y_2 - y_1)\}$$

Here, the offset \vec{r} encodes the angle and distance between the pixels. For example, Fig. 5 shows four directions for GLCM.

Among the features derived from the GLCM, entropy measures the complexity of the grey levels of an image: its value is low if the GLCM elements are close to 0 or 1 and high if the GLCM elements have relatively equal values.
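The first-order statistics and the co-occurrence count defined in this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the names `histogram_features` and `glcm` are hypothetical, the image is assumed to hold non-negative integer grey levels below `levels`, the entropy uses a base-2 logarithm, and the offset is given in pixel units as `(dx, dy)`.

```python
import numpy as np


def histogram_features(region, levels=256):
    """First-order texture features computed from the grey-level histogram."""
    p = np.bincount(region.ravel(), minlength=levels) / region.size  # p(i)
    i = np.arange(levels)
    mean = np.sum(i * p)
    variance = np.sum((i - mean) ** 2 * p)
    nonzero = p[p > 0]                    # skip empty bins: 0 * log(0) := 0
    return {
        "mean": mean,
        "std": np.sqrt(variance),
        "skewness": np.sum((i - mean) ** 3 * p),
        "energy": np.sum(p ** 2),
        "entropy": -np.sum(nonzero * np.log2(nonzero)),  # base 2 is assumed
        "smoothness": 1.0 - 1.0 / (1.0 + variance),
    }


def glcm(region, dx, dy, levels=256):
    """Count pixel pairs with f(x1, y1) = i, f(x2, y2) = j at offset (dx, dy)."""
    rows, cols = region.shape
    counts = np.zeros((levels, levels), dtype=np.int64)
    for y1 in range(rows):
        for x1 in range(cols):
            x2, y2 = x1 + dx, y1 + dy
            # Only pairs whose second pixel lies inside the image are counted.
            if 0 <= x2 < cols and 0 <= y2 < rows:
                counts[region[y1, x1], region[y2, x2]] += 1
    return counts
```

Under the common convention that row indices grow downwards, distance-1 offsets for 0°, 45°, 90° and 135° would be `(1, 0)`, `(1, -1)`, `(0, -1)` and `(-1, -1)`. Dividing the count matrix by its sum gives joint probabilities, which is a usual normalisation before computing second-order measures.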
Fig. 5. The directions of GLCM (0°, 45°, 90°, and 135°)

Five GLCM features are used: Angular Second Moment (ASM), contrast, Inverse Different Moment (IDM), entropy, and correlation, as explained below.

ASM

$$\mathrm{ASM} = \sum_{i=1}^{L} \sum_{j=1}^{L} (\mathrm{GLCM}(i, j))^2$$

Angular Second Moment (ASM) measures the uniformity, or homogeneity, of the image.

Contrast

$$\mathrm{Contrast} = \sum_{n=1}^{L} n^2 \Big\{ \sum_{|i - j| = n} \mathrm{GLCM}(i, j) \Big\}$$

Contrast measures the spatial frequency of the image and the spread (moment of inertia) of the image matrix elements. It also reflects the variation of grey levels among the image pixels.

IDM

$$\mathrm{IDM} = \sum_{i=1}^{L} \sum_{j=1}^{L} \frac{\mathrm{GLCM}(i, j)}{1 + (i - j)^2}$$

Inverse Different Moment (IDM) measures the homogeneity of an image with similar grey levels. A homogeneous image has a large IDM value.

Entropy

$$\mathrm{Entropy} = -\sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{GLCM}(i, j)\,\log(\mathrm{GLCM}(i, j))$$

Correlation

$$\mathrm{Correlation} = \sum_{i=1}^{L} \sum_{j=1}^{L} \frac{(i\,j)\,\mathrm{GLCM}(i, j) - \mu_i' \mu_j'}{\sigma_i \sigma_j}$$

Correlation measures the linear dependency among pixel pairs and provides information about the linear structure in the image. Here:

$$\mu_i' = \sum_{i=1}^{L} \sum_{j=1}^{L} i\,\mathrm{GLCM}(i, j), \qquad \mu_j' = \sum_{i=1}^{L} \sum_{j=1}^{L} j\,\mathrm{GLCM}(i, j)$$

$$\sigma_i^2 = \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{GLCM}(i, j)\,(i - \mu_i')^2, \qquad \sigma_j^2 = \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{GLCM}(i, j)\,(j - \mu_j')^2$$

Classification

Artificial neural networks can be used for classification and for identifying object patterns. MLP is the artificial neural network architecture most widely used in education and applications. Its ability to learn and to deliver better classification performance has been demonstrated in a number of studies. At the classification stage, this research uses an MLP with three layers: an input layer, a hidden layer, and an output layer. The classification was conducted using the Weka machine learning software. K-fold cross validation was chosen to evaluate the training and testing performance on the features from the dataset. The architecture of the MLP used in this research is illustrated in Fig. 6.

Fig. 6. Architecture of MLP for the classification of lesion density

The classification performance is measured in terms of accuracy, sensitivity and specificity, as expressed below, where TP, TN, FP and FN are true positive, true negative, false positive, and false negative, respectively.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

Results and Discussion

In the experiment, histogram- and GLCM-based texture features were extracted from the 50 images. The average value of each feature, used to differentiate heterogeneous from homogeneous lesions, is shown in Table 1.

Table 1.
Average result of the histogram and GLCM-based texture feature extraction for each feature

| Statistic feature of histogram and GLCM | Average value of | Average value of |
|---|---|---|
| Mean | 92.50337 | 152.49886 |
| Standard Deviation | 99.42738 | 99.76297 |
| Skewness | 3.01751 | -11.26714 |
| Energy | 0.34104 | 0.14217 |
| Entropy | 2.36069 | 2.7899 |
| Smoothness | 0.13258 | 0.13326 |
| ASM 0° | 0.28815 | 0.07602 |
| ASM 45° | 0.27486 | 0.06830 |
| ASM 90° | 0.28558 | 0.07292 |
| ASM 135° | 0.27288 | 0.06902 |
| Contrast 0° | 1708.03977 | 1463.88758 |
| Contrast 45° | 2617.44874 | 2336.75677 |
| Contrast 90° | 1765.57904 | 1619.49413 |
| Contrast 135° | 2478.66052 | 2251.99059 |
| IDM 0° | 0.63694 | 0.53278 |
| IDM 45° | 0.59110 | 0.44343 |
| IDM 90° | 0.61557 | 0.47510 |
| IDM 135° | 0.59197 | 0.45022 |
| Entropy 0° | 3.53509 | 4.34803 |
| Entropy 45° | 3.68478 | 4.59926 |
| Entropy 90° | 3.59275 | 4.49295 |
| Entropy 135° | 3.69508 | 4.58720 |
| Correlation 0° | 0.00009 | 0.0001 |
| Correlation 45° | 0.00009 | 0.0001 |
| Correlation 90° | 0.00009 | 0.0001 |
| Correlation 135° | 0.00009 | 0.0001 |

From the experimental results, the features ranked by how strongly they differ between the two classes are contrast, skewness, mean, entropy, ASM, energy, IDM, standard deviation, smoothness, and correlation. The values of all histogram- and GLCM-based texture features are required in the classification process. The input data consist of 50 images: 25 heterogeneous images and 25 homogeneous images. In total, 26 features are combined for each image. Table 2 shows the accuracy, sensitivity, and specificity classification rates. The MLP-based classification used in this research facilitates the classification of heterogeneous and homogeneous lesions with the highest accuracy. Fig. 7 shows the confusion matrix of the proposed method: TP is the number of images with heterogeneous lesion characteristics, out of 25 cases, recognised as heterogeneous; TN = 25, as all images with homogeneous lesion characteristics were recognised as homogeneous; FN = 1, as only one image with heterogeneous lesion density was recognised as a homogeneous lesion; and FP = 0:
there is no image with homogeneous lesion characteristics recognised as a heterogeneous lesion.

Fig. 7. The confusion matrix of the proposed method

Conclusion

This research proposes a method to identify and classify the lesion density of primary lung cancer using histogram- and GLCM-based texture feature extraction. The combination of histogram- and GLCM-based texture features achieved an accuracy of 98%, a sensitivity of 96%, and a specificity of 96%. These results show quantitatively that the two methods can identify the difference in lesion density between heterogeneous and homogeneous lung cancer lesions. Thus, the method can help radiologists in interpreting the images. Furthermore, it can be recommended as one part of CAD development for lung cancer diagnosis. In future work, other segmentation and feature extraction techniques are suggested for lesion density detection.

Acknowledgements

The authors thank the Department of Radiology of RSUP Dr. Sardjito Hospital, Yogyakarta, for providing the database for this research.

References