JOIV : Int. J. Inform. Visualization, 5(4) - December 2021 409-414
INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
journal homepage : www.joiv.org/index.php/joiv

Intelligence Eye for Blinds and Visually Impaired by Using Region-Based Convolutional Neural Network (R-CNN)

Lee Ruo Yee a, Hazalila Kamaludin a,*, Noor Zuraidin Mohd Safar a, Norfaradilla Wahid a, Noryusliza Abdullah a, Dwiny Meidelfi b
a Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
b Department of Information Technology, Politeknik Negeri Padang, West Sumatera, Indonesia
Corresponding author: *hazalila@uthm.edu.my

Abstract— Intelligence Eye is an Android-based mobile application developed to help blind and visually impaired users detect light and objects. Intelligence Eye uses Region-based Convolutional Neural Networks (R-CNN) to recognize objects in the object recognition module, and vibration feedback is provided according to the light value in the light detection module. Voice guidance is provided in the application to guide the users and announce the result of the object recognition. TensorFlow Lite is used to train the neural network model for object recognition, in conjunction with extensible markup language (XML) and Java in Android Studio for programming. In future work, the functionality of the Intelligence Eye application can be enhanced by increasing the object detection capacity of the object recognition module, adding menu settings for vibration intensity in the light detection module, and supporting multiple languages for the voice guidance.

Keywords— Mobile application; Android; light detection; object recognition.

Manuscript received 10 Feb. 2021; revised 12 Apr. 2021; accepted 19 Oct. 2021. Date of publication 31 Dec. 2021.
International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

I. INTRODUCTION

The World Health Organization (WHO) estimates that there are 285 million visually impaired people worldwide [1]. Most mobile applications are designed for people with normal vision [2]. It is difficult for blind people to use mobile devices through the accessibility features alone. The most versatile applications are not designed for the visually impaired, so they create a lot of confusion [3]. Blind and visually impaired people face several problems in their daily lives. One of the most difficult tasks is identifying things that are useful in daily life [4], for example accessing and using e-book applications, which they must be able to navigate and use in a manner equal to sighted people [5]. They also cannot detect light by themselves, as this cannot be done just by touching, feeling, or smelling. They need to receive the information on screen by hearing.

The Intelligence Eye application combines light detection and object recognition functions, with 80 object model classes, for blind and visually impaired people, who are the main users of this application. All the functions in this application are specially designed for their ease. There are three modules included in Intelligence Eye: the main menu, the light detection module and the object recognition module.

The main menu module allows users to select a function by using swipe gestures or pressing the buttons on screen. There are two buttons in the main menu module: the light detection button and the object recognition button. Users can also choose a function by swiping up or down on the screen. Voice guidance is provided to guide users in selecting a function.

The light detection module allows users to detect the environment light by using the ambient light sensor of the smartphone, and the system vibrates in different patterns according to the light value. When the light value is low, the vibration is slow. When the light value is high, the vibration is fast. When the light value is 0, the vibration stops.

The object recognition module allows users to identify and recognize an object by capturing it with a real-time live preview from the back camera of the smartphone. The COCO (Common Objects in Context) dataset, which contains 80 object classes, is used as the object model class [6]. The recognition result is announced to users by using the Text-to-Speech (TTS) engine.

By using Intelligence Eye, blind and visually impaired people can identify and recognize objects and detect the environment light without help from others. The application guides them by voice. Users can use swipe gestures to control the application. This reduces the time and effort needed to learn to use the application. Following are terms and works related to Intelligence Eye:

1) Blind: According to Cambridge Learner’s Dictionary, blind is defined as unable to see [7]. In Oxford Student’s Dictionary, blind refers to lacking the sense of sight. Blindness is defined as the state or condition of being unable to see because of injury, disease, or a congenital condition [8].

2) Visual Impairment: Visual impairment, otherwise called vision impairment or vision loss, is a diminished capacity to see to a degree that causes issues not fixable by usual means, for example, glasses [9]. Visual impairment is usually defined as a best-corrected visual acuity worse than 20/40 or 20/60 [10].

3) Artificial Intelligence: Based on Cambridge Advanced Learner’s Dictionary [7], intelligence refers to the ability to learn, understand and make judgments or have opinions that are based on reason. Artificial Intelligence (AI) is defined as the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision making, and translation between languages [11]. Machine learning systems can automatically learn programs from data, which makes them more appealing than manually creating the software [12]. Machine learning is an application of Artificial Intelligence which gives machines the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the creation of computer programs that can access data and learn for themselves. TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments [13]. TensorFlow Lite is used for mobile applications. It is a lightweight framework intended to run models on mobile or embedded devices. It can be used in Android Studio to create an application with Artificial Intelligence. It provides an API for users to plug into their application [14].

4) Object Recognition: Object recognition is a computer technology related to computer vision and image processing which identifies instances of semantic objects of a certain type in digital images and videos [15]. It is a technical discipline that looks for ways to automate the work a human visual system can do.

5) Object Recognition Methods: Object recognition methods are usually either machine learning-based approaches or deep learning approaches. For machine learning approaches, a support vector machine (SVM) is used to do the classification. Deep learning approaches are capable of end-to-end object detection without specifically defined features and are usually based on convolutional neural networks (CNNs) [16]. A neural network is a model of computation commonly used in machine learning to solve different tasks. The perceptron, the primitive neuron model, is a two-layer structure [17]. Feedforward networks such as Multi-layer Perceptron and Radial Basis Function networks are the most widely used family of neural networks for pattern recognition [18]. Neural network performance improves as the number of hidden layers increases, up to a certain degree.

6) Region-Based Convolutional Neural Networks (R-CNN): R-CNN combines rectangular region proposals with convolutional neural network features. It is a two-stage detection algorithm. The first stage identifies a subset of regions in an image that might contain an object. The second stage classifies the object in each region [16]. R-CNN has three variants that optimize, speed up or enhance the results of these processes [16].

7) Ambient Light Sensor: The ambient light sensor is a component in the smartphone that detects the amount of nearby light. It provides data on the brightness of the light in the surrounding area. This data is typically used to adjust the screen brightness, saving battery power while maximizing visibility [19]. It detects the light by reading the light intensity.

8) Light Intensity: Light intensity relates to the strength or quantity of light produced by a single source of light [19]. It is the estimate of the wavelength-weighted energy produced by a light source. There are different light levels for different environments. The light level is also known as illuminance. Lux is the unit used to measure the illuminance level.

9) Review on Existing System: Three existing systems, TapTapSee [20], Seeing Assistant Home Lite [21] and Seeing AI [22], are selected. All the systems have their own strengths and weaknesses.

TapTapSee from Net Ideas, LLC is a free Android and iOS image recognition application for blind and visually impaired people with a very simple interface [20]. To take a photo, the user aims the back camera at the object and double-taps the screen. The photo is automatically uploaded to the servers and, when a match is found, the application speaks the item name. Its user interface is clear and easy to use, and it is a good application for blind and visually impaired people.

Seeing Assistant Home Lite is a suite of assistants developed for everyday tasks on iOS and Android mobile devices [21]. It is a mobile application with many features including light and colour detection, barcode scanner, and barcode reader [23]. It mainly uses the back camera of the device to perform its assigned activities, but it frequently provides inconsistent data, particularly in light detection mode.

Seeing AI [22] is a free mobile application designed for the low vision community to recognize people, text, currency, colour, and objects. It is a Microsoft research project that combines cloud and AI to create an intuitive application designed to help the sight-impaired navigate their day. Many useful functions are provided in this application. However, users may sometimes be confused, as too many features in the application increase the chance of mistakes.

10) Comparison of the System: Three existing systems are chosen to be compared with the proposed system, Intelligence Eye. The three systems are TapTapSee [20], Seeing Assistant Home Lite [21] and Seeing AI [22]. Table I shows the comparison between the three existing systems and Intelligence Eye on their characteristics and functions.

The rest of the paper is organized as follows: Section 2 presents the material and method. Section 3 presents the result and discussion of Intelligence Eye. Finally, Section 4 concludes the work, states its limitations and highlights future directions for research on Intelligence Eye.

Microsoft Word 2019, Microsoft Project 2019, Android Studio, TensorFlow Lite, Balsamiq Mockup 3 and yEd Graph Editor.

3) Object-Oriented Design Phase: Based on the requirement specifications stated in the previous phase, the design of Intelligence Eye is prepared in this phase.
The prototype is created to define the information architecture and content of the application. A neural network model is designed by using TensorFlow Lite in this process to fulfil the functionality of object recognition.

TABLE I
COMPARISON OF THE SYSTEM

Characteristics | TapTapSee | Seeing Assistant | Seeing AI | Intelligence Eye
Hardware Used | Camera of Smartphone | Camera of Smartphone, Light Sensor of Smartphone | Camera of Smartphone, Light Sensor of Smartphone | Camera of Smartphone, Light Sensor of Smartphone
Android Based | Yes | Yes | No | Yes
Audio Guidance | No | Yes | Yes | Yes
Swipe Gestures | No | No | No | Yes
Light Detection | No | Yes | Yes | Yes
Object Recognition | Yes | Yes | Yes | Yes

4) Object-Oriented Implementation Phase: In this phase, units, which are the small parts of an application, are developed according to the interface design. A unit test is done for every unit in the initial testing stage. The neural network model is plugged into the Android Studio project for object recognition. Accessibility functions such as voice guidance using TTS are implemented in the application. Then, the units are combined into the system to develop the complete application.

5) Object-Oriented Testing Phase: In this phase, functional and non-functional requirements are tested to make sure that the completed application meets the requirements established during the analysis phase. To debug and improve the user experience, functional testing is carried out and corrections are made once bugs and issues are detected. The Intelligence Eye application is ready to be deployed at the end of the phase, once the bugs and issues are fixed.

II. MATERIAL AND METHOD

A. Object-Oriented Software Development (OOSD)

The object-oriented software development (OOSD) approach is used in software development as a framework for ensuring that the software meets defined needs [24].
There are six phases in the object-oriented life cycle: object-oriented planning, object-oriented analysis, object-oriented design, object-oriented implementation, object-oriented testing and object-oriented maintenance.

6) Object-Oriented Maintenance Phase: In this phase, the developed application is maintained and updated frequently. This would take three months to complete. However, this phase is not included in this project due to the time limitation.

1) Object-Oriented Planning Phase: In the object-oriented planning phase, the problem statement, objectives, project scope, expected results and project significance are identified. The work plan for developing the application is adjusted in this phase to ensure it can be completed within the timeframe. A Gantt chart is created for the Intelligence Eye application. Research on similar work is undertaken using online resources and journal articles. The existing applications are compared with the proposed application to analyse the functions provided in each application and the limitations of the existing applications.

B. System Analysis

1) System Architecture Diagram: In Intelligence Eye, users choose the light detection function or the object recognition function through the user interface, by using swipe gestures or pressing buttons. The light detection function requests permission to use the ambient light sensor and the vibrator of the device. The TTS engine is used to instruct users on how to use the light detection function. The ambient light sensor responds to the system with the light intensity reading. The vibrator responds with vibration feedback according to the light intensity reading. In the object recognition function, the system requests the trained data, which responds to the system when the user wants to recognize an object.
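The object recognition path of the architecture, from recognition results to a spoken announcement, can be sketched in plain Java as follows. This is a minimal illustration and not the application's actual code: the `Detection` type, the `MIN_SCORE` threshold of 0.5 and the wording of the announcement are assumptions made for the example, and the resulting string stands in for the text that would be handed to the TTS engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class AnnouncementSketch {
    /** Hypothetical recognition result: a class label and a confidence score in [0, 1]. */
    static final class Detection {
        final String label;
        final float score;

        Detection(String label, float score) {
            this.label = label;
            this.score = score;
        }
    }

    // Assumed cut-off; the paper does not state a confidence threshold.
    static final float MIN_SCORE = 0.5f;

    /** Returns the highest-scoring detection at or above MIN_SCORE, if any. */
    static Optional<Detection> best(List<Detection> detections) {
        Detection top = null;
        for (Detection d : detections) {
            if (d.score >= MIN_SCORE && (top == null || d.score > top.score)) {
                top = d;
            }
        }
        return Optional.ofNullable(top);
    }

    /** Builds the sentence that would be passed to the TTS engine. */
    static String sentenceFor(List<Detection> detections) {
        return best(detections)
                .map(d -> "Detected " + d.label)
                .orElse("No object recognized");
    }

    public static void main(String[] args) {
        List<Detection> results = new ArrayList<>();
        results.add(new Detection("cup", 0.82f));
        results.add(new Detection("bottle", 0.41f));
        System.out.println(sentenceFor(results));
    }
}
```

Keeping the selection logic separate from the speech call makes it testable without a device; on Android the returned string would simply be passed on to the TTS engine.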
The TTS engine is required for the voice guidance and for announcing the result in the object recognition function.

2) Object-Oriented Analysis Phase: In this phase, the established requirements are gathered and collected. Hardware and software requirements are analysed for the development of the application. Unified Modelling Language (UML) diagrams are used to visually represent the relationships and interactions between the classes [25]. The yEd Graph Editor is used to create the UML diagrams. For hardware requirements, the computer used is a Lenovo Legion Y520 with an Intel Core i5-7300HQ Central Processing Unit (CPU), an NVIDIA GeForce GTX 1050 2GB Graphics Processing Unit (GPU) and 8GB of Random Access Memory (RAM), running the Windows 10 operating system. Six software tools have been used to develop, design, operate and maintain Intelligence Eye, which include

C. Requirement Analysis

1) Functional Requirement: Functional requirements are product features or functions to be implemented by developers to enable users to perform their tasks. The light detection function allows the user to detect the brightness of the environment light. Vibration feedback is given in different patterns according to the light intensity. The object recognition function allows the user to recognize objects in real time by capturing images from the camera of the device. The result is announced after the recognition process.

model is invoked to recognize objects in the scene. The result is then announced by the TTS engine.

2) Non-Functional Requirement: A non-functional requirement specifies criteria that can be used to judge the operation of a system rather than specific behaviours. For usability, users must be able to learn and operate the application easily, since it is designed to be user friendly for blind and visually impaired users. For performance, the response time must be reasonable when operating the application.
Intelligence Eye must also be available at any time. For maintenance, the application must be kept updated and repaired to improve its performance.

E. System Design

1) Prototype: Generally, a prototype is used to evaluate a new design in order to improve precision for system analysts and users. Figure 1 shows the splash screen of Intelligence Eye, while Figure 2 shows the main menu of Intelligence Eye. The light detection interface is shown in Figure 3, while the object recognition interface is shown in Figure 4.

D. Unified Modelling Language (UML)

1) Use Case Diagram: There are a total of nine use cases for this application: choose function, voice guidance, swipe gestures, get light value, voice guidance, vibration feedback, capture image, recognize object in scene and speak result. Users choose a function by using swipe gestures or pressing the buttons on the main menu page of Intelligence Eye. Voice guidance is provided in the main menu to guide the user. In the light detection function, the ambient light sensor is used to get the light value. The vibrator vibrates in different patterns according to the light value to alert the users. Voice guidance is provided to guide users through light detection. In the object recognition function, the user captures an image and the neural network model recognizes the object in the scene. After the recognition process, the TTS engine announces the result.

Fig. 1 Splash screen
Fig. 2 Main menu
Fig. 3 Light detection interface
Fig. 4 Object recognition interface

III. RESULT AND DISCUSSION

2) Sequence Diagram: In Intelligence Eye, users can choose the light detection function from the system interface. Once users choose to detect the light, the ambient light sensor and the vibrator are initiated and the value of the light intensity is read. The light value is sent back to the system and the system vibrates in different patterns according to the light value.
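The mapping from light value to vibration pattern in the light detection sequence can be illustrated with a small sketch. The paper states only that low light gives slow vibration, high light gives fast vibration, and a value of 0 stops the vibration; the linear interpolation, the `MAX_LUX` ceiling and the pause lengths below are assumptions chosen for illustration, not values taken from the application.

```java
class LightVibrationSketch {
    // Assumed constants; the paper does not give concrete thresholds or timings.
    static final float MAX_LUX = 1000f;   // readings above this are treated as "brightest"
    static final long SLOW_PAUSE_MS = 1500; // pause between pulses in very dim light
    static final long FAST_PAUSE_MS = 200;  // pause between pulses in bright light

    /**
     * Returns the pause between vibration pulses for a light reading in lux,
     * or -1 when the reading is 0 (or negative) and vibration should stop.
     */
    static long pauseMillisFor(float lux) {
        if (lux <= 0f) {
            return -1;
        }
        float t = Math.min(lux, MAX_LUX) / MAX_LUX; // 0..1, brighter is closer to 1
        return Math.round(SLOW_PAUSE_MS - t * (SLOW_PAUSE_MS - FAST_PAUSE_MS));
    }

    public static void main(String[] args) {
        float[] samples = {0f, 10f, 500f, 1000f, 5000f};
        for (float lux : samples) {
            System.out.println(lux + " lux -> pause " + pauseMillisFor(lux) + " ms");
        }
    }
}
```

A shorter pause means faster vibration, so the returned value shrinks as the light value grows. On a device, the lux reading would come from the ambient light sensor and the pause would drive the vibrator; both are left out here so the sketch stays runnable as plain Java.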
Users can also choose the object recognition function from the system interface. Users capture images with the camera of the device. Once an image is captured, the neural network model recognizes the object in the scene by comparing it to the trained data. The result is then converted to text and announced by the TTS engine.

A. Implementation of Intelligence Eye Application

Intelligence Eye is developed using Android Studio, a software tool specially designed for Android development. Java and XML are used to develop the application in Android Studio. XML is a markup language used to design the layout of the user interface, while Java is the programming language used for the modules, which are compiled and executed using the Java Development Kit (JDK).

1) Text-to-Speech Engine: The Text-to-Speech engine is implemented in all activity classes for the voice guidance and to speak out the result of a recognized object.

3) Class Diagram: There are six classes in Intelligence Eye: MainActivity, LightSensor, LightDetect, Object, ObjTrain and CameraActivity.

2) onSwipeTouchListener() Class: The onSwipeTouchListener() class is implemented in the Menu activity and the LightDetector activity for the swipe gestures function.

4) Activity Diagram: Intelligence Eye starts with a main menu. Users choose a function from the menu by pressing the buttons or using swipe gestures to continue using the application. Two functions can be chosen in the application: light detection and object recognition. If users choose light detection, the application goes to the light detection activity and the light detection procedure is started. If the “Start” button is pressed, the application starts to detect the light. The application then vibrates in different patterns according to the value of the light intensity until the “Stop” button is pressed.
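The Start/Stop behaviour of the light detection branch can be read as a two-state machine. The sketch below is one hedged way to express it in plain Java; the class name, the `State` enum and the method names are hypothetical and are not the classes listed in the class diagram.

```java
class LightDetectionStateSketch {
    /** Hypothetical states for the light detection activity. */
    enum State { IDLE, DETECTING }

    private State state = State.IDLE;

    /** Called when the "Start" button is pressed. */
    void onStart() {
        state = State.DETECTING;
    }

    /** Called when the "Stop" button is pressed. */
    void onStop() {
        state = State.IDLE;
    }

    State state() {
        return state;
    }

    /**
     * Decides whether a new light reading should trigger vibration:
     * only while detecting, and only for a light value above 0.
     */
    boolean shouldVibrate(float lux) {
        return state == State.DETECTING && lux > 0f;
    }

    public static void main(String[] args) {
        LightDetectionStateSketch activity = new LightDetectionStateSketch();
        System.out.println("before start: " + activity.shouldVibrate(120f));
        activity.onStart();
        System.out.println("after start: " + activity.shouldVibrate(120f));
        activity.onStop();
        System.out.println("after stop: " + activity.shouldVibrate(120f));
    }
}
```

Each sensor reading is checked against the current state, so pressing “Stop” silences the vibration from the next reading onward without needing to cancel anything else.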
If users choose object recognition, the application invokes the camera module. An image is captured and the neural network

3) Splash Screen Activity: The splash screen activity is a module that acts as a start-up screen which appears when the application is launched. The logo of the application is animated, and a welcome message is spoken by the TTS engine to welcome the user. It redirects to the main menu screen after 3 seconds.

4) Menu Activity: The menu activity is the first page shown to the user after the splash screen of Intelligence Eye redirects. It consists of two buttons that redirect to the light detection activity and the object detection activity. Voice guidance is spoken to guide users in controlling the application. Users can choose a function by pressing the buttons or using swipe gestures.

5) LightDetector Activity: The LightDetector activity is the activity for the light detection module of the Intelligence Eye application, which detects the environment light and gives vibration feedback to the user according to the light value. The TTS engine is implemented in this activity to guide the user to swipe up to start the detection and swipe down on the screen to stop the detection. The OnSwipeTouchListener() class is implemented in this activity to detect the swipe gestures of the user.

6) Detector Activity: The Detector activity is the activity for the object recognition module of the Intelligence Eye application, which detects the objects around blind users and gives verbal feedback on the detection results. The TTS engine is implemented in this activity to speak out the names of the objects. The COCO dataset, which has 80 object classes, is used as the model to detect objects in Intelligence Eye. The convolutional neural network is used in conjunction with the TensorFlow Lite API, to which the images are passed for inference.

B. Functional Testing

1) Test Plan: Test plans are developed to test whether the Intelligence Eye application meets its requirements. The test plans for each module of Intelligence Eye are executed and all the results are as expected.

2) Functional Testing for Object Recognition Module: A functional test is carried out on the number of objects detected by the object recognition module. A total of 80 objects are used to test the module. Figure 5 shows the results of animals that can be detected by the object recognition module. Figure 6 and Figure 7 show the results of common objects that can be detected. Figure 8 shows the food that can be detected, while Figure 9 shows the kitchen utensils that can be detected by the object recognition module. Figure 10 shows the results of sport equipment, while Figure 11 shows the outdoor objects that can be detected. Figure 12 shows further common objects that can be detected by the object recognition module.

Fig. 5 Result of animal detected
Fig. 6 Result of common objects detected-1
Fig. 7 Result of common objects detected-2
Fig. 8 Result of food detected
Fig. 9 Result of kitchen utensils detected
Fig. 10 Result of sport equipment detected
Fig. 11 Result of outdoor objects detected
Fig. 12 Result of common objects detected-3

IV. CONCLUSION

The Intelligence Eye application has been developed successfully and has achieved its objective of providing a user-friendly and easy-to-use application for visually impaired users. There are still some limitations in this application. In future work, more languages should be supported by the application to serve users who do not understand English. A settings menu for vibration should be added to the light detection module to customize the vibration intensity. The detection capacity of the object recognition module should be increased to allow users to detect more objects. The performance of the application should be improved to guarantee a better user experience.

ACKNOWLEDGMENT

The authors would like to thank the Ministry of Higher Education (MOHE) Malaysia for supporting this research under Fundamental Research Grant Scheme Vot No. K216, Reference Code FRGS/1/2019/ICT04/UTHM/03/2. The research was also partially sponsored by Universiti Tun Hussein Onn Malaysia.

REFERENCES

[1] World Health Organization, “Visual Impairment and Blindness 2010,” World Heal. Organ., 2012.
[2] J. S. Sierra and J. S. R. De Togores, “Designing mobile apps for visually impaired and blind users: Using touch screen based mobile devices: IPhone/iPad,” in ACHI 2012 - 5th International Conference on Advances in Computer-Human Interactions, 2012.
[3] E. Chen, Y. Lin, C. H. Chen, and I. F. Wang, “BlindNavi: A navigation app for the visually impaired smartphone user,” in Conference on Human Factors in Computing Systems Proceedings, 2015.
[4] R. Tapu, B. Mocanu, A. Bursuc, and T. Zaharia, “A Smartphone-Based Obstacle Detection and Classification System for Assisting Visually Impaired People,” in Proceedings of the IEEE International Conference on Computer Vision, 2013.
[5] N. Hashim, M. Saleh Ba Matraf and A. Hussain, “Identifying the Requirements of Visually Impaired Users for Accessible Mobile E-book Applications,” JOIV : International Journal on Informatics Visualization, vol. 5, no. 2, pp. 99-104, 2021.
[6] T. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, and J. Hays, “Microsoft COCO: Common Objects in Context,” 2015.
[7] C. McIntosh, Cambridge Advanced Learner’s Dictionary, 5th ed. Cambridge, United Kingdom: Cambridge University Press, 2013.
[8] Oxford, Oxford Student’s Dictionary, 8th ed. Cambridge, United Kingdom: Oxford University Press, 2016.
[9] World Health Organization, “Change the Definition of Blindness,” World Heal. Organ., 2015.
[10] H. Hollands and J. C., “The Prevalence of Low Vision and Blindness in Canada,” Eye 20, pp. 341–346, 2016.
[11] M. Deuter and J. Bradbery, Oxford Advanced Learner’s Dictionary. Oxford, United Kingdom: Oxford University Press, 2014.
[12] P. Domingos, “A Few Useful Things to Know about Machine Learning,” 2016.
[13] M. Abadi et al., “TensorFlow: A System for Large-Scale Machine Learning,” 2016.
[14] “TensorFlow Lite.” https://www.tensorflow.org/lite (accessed Oct. 05, 2019).
[15] A. Uçar, Y. Demir, and C. Güzeliş, “Object Recognition and Detection with Deep Learning for Autonomous Driving Applications,” 2017.
[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” 2014.
[17] S. Asht and R. Dass, “Pattern Recognition Techniques: A Review,” 2012.
[18] H. Wu, Y. Zhou, Q. Luo, and M. A. Basset, “Training Feedforward Neural Networks Using Symbiotic Organisms Search Algorithm,” Comput. Intell. Neurosci., 2016.
[19] K. Dhondge, B. Choi, S. Song, and H. Park, “Optical Wireless Authentication for Smart Devices Using an Onboard Ambient Light Sensor,” in 23rd International Conference on Computer Communication and Networks (ICCCN), 2014.
[20] “TapTapSee.” https://taptapseeapp.com/ (accessed Oct. 05, 2019).
[21] “Seeing Assistant Home Lite.” http://seeingassistant.tt.com.pl/ (accessed Oct. 05, 2019).
[22] “Microsoft.” https://www.microsoft.com/en-us/ai/seeing-ai (accessed Oct. 05, 2019).
[23] B. Leporini and M. Buzzi, “Home Automation for an Independent Living: Investigating the Needs of Visually Impaired People,” 2018.
[24] S. K. Dora and P. Dubey, “Software Development Life Cycle (SDLC) Analytical Comparison and Survey on Traditional and Agile Methodology,” Natl. Mon. Ref. J. Res. Sci. Technol., 2013.
[25] F. Alhumaidan, “A Critical Analysis and Treatment of Important UML Diagrams Enhancing Modeling Power,” 2012.