Aptisi Transactions on Technopreneurship (ATT) Vol. 2 No. 2 September 2020 p-ISSN: 2655-8807 e-ISSN: 2656-8888

A Novel Approach for Facial Attendance: AttendXNet

Kawal Arora (1), Ankur Singh Bist (2), Roshan Prakash (3), Saksham Chaurasia (4)
Signy Advanced Technologies, India
Address: Level 39, One Canada Square, Canary Wharf, London E14 5AB
e-mail: kawal@signy.io (1), ankur@signy.io (2), roshan@signy.io (3), saksham@signy.io (4)

To cite this document: Arora, K., Bist, AS., Prakash, R., & Chaurasia, S. (2020). A Novel Approach for Facial Attendance: AttendXNet. Aptisi Transactions On Technopreneurship (ATT), 2(2), 104-111.
DOI: https://doi.org/10.34306/att.v2i2.86

Abstract
Recent advances in facial recognition and verification have made facial attendance feasible for a variety of use cases. In this paper we present a system named AttendXNet. Our method uses ResNet and a multi-layer feed-forward network to achieve state-of-the-art results. We describe an extensive analysis of various deep learning and machine learning techniques. Face anti-spoofing is a major challenge in facial attendance; we use Extended-MobileNet to address it. We also introduce an end-to-end pipeline for implementing an attendance system for various use cases.

Keywords: Feature extraction, Support Vector Machine, Multi-layer Neural Network, Face Anti-spoofing, Faiss

1. Introduction
Attendance is crucial in any organization for maintaining a proper workflow. In schools and colleges, taking lecture attendance normally takes 10 minutes. Extended over a month or a year, manual attendance consumes many hours that are effectively wasted. An automatic attendance system that marks attendance without human intervention is therefore needed. Facial attendance involves face detection and verification, and there are various popular techniques for both.
DeepFace [2], DeepID2 [3], DeepID3 [4], FaceNet [5], Baidu [6], VGGface [7], light-CNN [8], Center Loss [9], L-softmax [10], Range Loss [11], L2-softmax [12], NormFace [13], CoCo Loss [14], vMF loss [15], Marginal loss [16], SphereFace [17], CCL [18], AMS-loss [19], Cosface [20], Arcface [21], DPSSD [22], face recognition with alignment learning [23] and Ring loss [24] are some important methods in this domain. This raises two questions: are the steps involved in face verification the same for different use cases, and is a single selection of deep learning and machine learning models sufficient for all of them? Our proposed model is the result of examining various models. After comparing different models, ML classifiers and distance functions, we identified the combinations that were suitable for facial attendance.

We use the LFW [1] dataset to quantify and compare the performance of our proposed system on face-based attendance. The LFW dataset is created and maintained by researchers at the University of Massachusetts, Amherst. The original database contains four different sets of LFW images and also three different types of "aligned" images. To build a dataset for face anti-spoofing, we recorded videos of about 30 seconds under different lighting conditions and then replayed each video with the phone facing the desktop. This process yields two videos per recording, one real and one fake.

An overview of the rest of the paper is as follows: Section 2 presents related work in this area; Section 3 describes the proposed model architecture; Section 4 presents results and discussion. Finally, Section 5 presents the conclusion and future work.

2. Research Method
2.1 Related Work
Deep learning dominates face recognition and verification in the recent literature.
DeepFace used the AlexNet architecture with softmax loss, trained on the Facebook dataset (4.4M, 4K), and obtained 97.35% accuracy (the pair after each dataset name gives the number of training images and identities). DeepID2 used AlexNet with contrastive loss, trained on CelebFaces+ (0.2M, 10K), and obtained 99.15% accuracy. DeepID3 used VGGNet-10 with contrastive loss, trained on CelebFaces+ (0.2M, 10K), and obtained 99.53% accuracy. FaceNet used GoogleNet-24 with triplet loss, trained on the Google dataset (500M, 10M), and obtained 99.63% accuracy. Baidu used CNN-9 with triplet loss, trained on the Baidu dataset (1.2M, 18K), and obtained 99.77% accuracy. VGGface used VGGNet-16 with triplet loss, trained on VGGface (2.6M, 2.6K), and obtained 98.95% accuracy. Light-CNN used the light CNN architecture with softmax loss, trained on MS-Celeb-1M (8.4M, 100K), and obtained 98.8% accuracy. Center Loss used Lenet+-7 with center loss, trained on CASIA-WebFace, CACD2000 and Celebrity+ (0.7M, 17K), and obtained 99.28% accuracy. L-softmax used VGGNet-18 with L-softmax loss, trained on CASIA-WebFace (0.49M, 10K), and obtained 98.71% accuracy. Range Loss used VGGNet-18 with range loss, trained on MS-Celeb-1M and CASIA-WebFace (5M, 100K), and obtained 99.52% accuracy. L2-softmax used ResNet-101 with L2-softmax loss, trained on MS-Celeb-1M (3.7M, 58K), and obtained 99.78% accuracy. NormFace used ResNet-28 with contrastive loss, trained on CASIA-WebFace (0.49M, 10K), and obtained 99.19% accuracy. CoCo Loss used the CoCo loss function, trained on MS-Celeb-1M (3M, 80K), and obtained 99.86% accuracy. The vMF method used ResNet-27 with vMF loss, trained on MS-Celeb-1M (4.6M, 60K), and obtained 99.58% accuracy. Marginal Loss used ResNet-27 with marginal loss, trained on MS-Celeb-1M (4M, 80K), and obtained 99.48% accuracy. SphereFace used ResNet-64 with A-softmax loss, trained on CASIA-WebFace (0.49M, 10K), and obtained 99.42% accuracy. CCL used ResNet-27 with center invariant loss, trained on CASIA-WebFace (0.49M, 10K), and obtained 99.12% accuracy.
AMS Loss used ResNet-20 with AMS loss, trained on CASIA-WebFace (0.49M, 10K), and obtained 99.12% accuracy. Cosface used ResNet-64 with cosface loss, trained on CASIA-WebFace (0.49M, 10K), and obtained 99.33% accuracy. ArcFace used ResNet-100 with arcface loss, trained on MS-Celeb-1M (3.8M, 85K), and obtained 99.83% accuracy. Ring Loss used ResNet-64 with ring loss, trained on MS-Celeb-1M (3.5M, 31K), and obtained 99.50% accuracy.

3. Findings
The AttendXNet method combines the best-fitting choices for three steps: face alignment, face embedding extraction, and classification based on the embedding. In a real-time scenario the input face image may not be aligned, so we apply face alignment to obtain accurate results. Face alignment is the process of localizing predefined landmarks. We wrote a Python script that outputs facial coordinates such that the eyes lie on a horizontal axis. The second crucial step is extracting a feature vector from the aligned face. We trained ResNet-34 on the LFW dataset and obtained a 128-d feature vector.

Picture 1. Convolutional kernels and size of outputs for ResNet-34 [25]

Picture 2. AttendXNet face embedding

Picture 2 depicts the process of generating a face embedding from an aligned face input. For the attendance task we have to store face embeddings in a database. We use two approaches to register a user: registration from a single image, or from multiple images taken at different angles. Picture 3 explains the process: after feature vectors are extracted from the input samples, they are stored in a database. AttendXNetV1 uses a multi-layer neural network to learn from this feature pool.

Picture 3. AttendXNetV1, version 1 of the module

We also evaluated other machine learning models: support vector machine, k-nearest neighbors (KNN), decision tree and Naïve Bayes classifier. After testing in different IT workspaces and colleges, we found the multi-layer neural network to be the most appropriate for the face verification pipeline.
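The alignment step described in this section, placing the eyes on a horizontal axis, can be illustrated with a minimal sketch. It assumes that the eye centres are already available from a landmark detector; the function names and the sample coordinates are hypothetical and are not taken from the authors' script. The sketch measures the tilt of the line joining the eyes and rotates the landmarks about the midpoint between the eyes by the opposite angle.

```python
import math


def eye_tilt_degrees(left_eye, right_eye):
    """Angle (degrees) between the eye-to-eye line and the horizontal axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_about(point, center, angle_deg):
    """Rotate a 2-D point about `center` by `angle_deg` degrees."""
    theta = math.radians(angle_deg)
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    x = center[0] + dx * math.cos(theta) - dy * math.sin(theta)
    y = center[1] + dx * math.sin(theta) + dy * math.cos(theta)
    return (x, y)


def align_landmarks(landmarks, left_eye, right_eye):
    """Rotate all landmarks so that the two eyes end up on one horizontal line."""
    tilt = eye_tilt_degrees(left_eye, right_eye)
    # Rotate about the midpoint between the eyes by the opposite of the tilt.
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    return [rotate_about(p, center, -tilt) for p in landmarks]


# Hypothetical pixel coordinates for a slightly tilted face.
left_eye, right_eye = (30.0, 40.0), (70.0, 60.0)
aligned_left, aligned_right = align_landmarks(
    [left_eye, right_eye], left_eye, right_eye)
```

In a full pipeline the same rotation would be applied to the image pixels (for example with an affine warp) before the face crop is passed to the embedding network.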
AttendXNetV2 uses Faiss similarity search [26] over the database of stored features. Efficient similarity search is crucial for our problem. Faiss is used in this module, which was then tested in different IT workspaces and colleges. We found Faiss similarity search to be faster than the multi-layer neural network, and the accuracy of AttendXNetV2 is comparable with that of AttendXNetV1.

Picture 4. AttendXNetV2, version 2 of the module

AttendXNetV3 uses edit-distance-based similarity [27] over the database of stored features, as shown in Picture 5. Edit distance is used in this module, which was then tested in different IT workspaces and colleges. We found that edit distance also works well and is accurate compared with Manhattan distance. The accuracy of AttendXNetV3 is lower than that of AttendXNetV1 and AttendXNetV2.

Picture 5: AttendXNetV3, version 3 of the module

There are two major attacks on facial attendance: the print attack and the video attack, in which an intruder spoofs the attendance system using a photo or a video of someone else, respectively. Various gesture-based techniques address this problem, but they do not work well against video attacks, and they require the user to perform actions such as blinking. Deep-learning-based techniques offer better results as well as a good user experience. To implement face anti-spoofing, we first preprocessed the dataset using a random-resize Python script. Picture 6 shows the basic architecture of MobileNet. After several sets of experiments we found that the existing MobileNet architecture did not give satisfactory results. We therefore added three layers to the existing network and tested it on different cases. The accuracy of our Extended-MobileNet model is 98%.
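Returning to the similarity-search step used for verification, the idea of matching a query embedding against the registered feature database can be sketched as an exhaustive nearest-neighbour search. This is a plain-Python stand-in and not the Faiss API; a library such as Faiss would be used to make the same lookup fast at scale. The function names, the 4-d toy vectors (the paper uses 128-d embeddings) and the acceptance threshold of 0.6 are all assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, database, threshold):
    """Return (user_id, distance) of the closest registered embedding.

    `database` maps a user id to a list of embeddings, which covers both
    single-image and multi-image registration. If the closest match is
    farther than `threshold`, the user id is returned as None.
    """
    best_id, best_dist = None, float("inf")
    for user_id, embeddings in database.items():
        for emb in embeddings:
            dist = l2_distance(query, emb)
            if dist < best_dist:
                best_id, best_dist = user_id, dist
    if best_dist <= threshold:
        return best_id, best_dist
    return None, best_dist


# Toy database: "alice" registered with two images, "bob" with one.
database = {
    "alice": [[0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0]],
    "bob": [[1.0, 1.0, 1.0, 1.0]],
}
THRESHOLD = 0.6  # hypothetical value; it would be tuned on validation data

known_id, known_dist = identify([0.05, 0.0, 0.0, 0.0], database, THRESHOLD)
unknown_id, unknown_dist = identify([5.0, 5.0, 5.0, 5.0], database, THRESHOLD)
```

The threshold trades false accepts against false rejects: a query whose nearest registered embedding is still too far away is treated as an unknown face rather than being assigned to the closest user.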
We used an RTX 2080 GPU for the experiments; an equivalent or better GPU is recommended for better results.

Picture 6: MobileNet architecture [27]

4. Result and Discussion

Picture 7: Signy app for onboarding

To test the proposed methods we developed the Signy mobile app for onboarding users. Picture 7 shows the first screen of the app: the admin starts the app once, after which the face detection module detects faces from the live feed. Picture 8 shows the check-in process using Signy. After a face is detected, the same input is sent to two APIs deployed on a server. First, the Extended-MobileNet API extracts features from the input image and classifies it as real or spoof. If someone tries to spoof the system, the image is stored for security purposes. Second, the AttendXNet API takes the face, extracts the 128-d vector and verifies the face according to the architecture of AttendXNetV1, AttendXNetV2 or AttendXNetV3. For the experimental analysis we used all versions of AttendXNet for facial attendance. After using the three variants, we found the multi-layer neural network and Faiss to be the most appropriate methods. We selected Faiss for our final module because it was accurate and relatively fast compared with the other techniques.

Picture 8: User check-in using Signy

To test the proposed methods for group or classroom attendance, we developed an IP-camera-based solution for capturing input, as shown in Picture 9. In the classroom scenario, input is sent to the AttendXNet API at fixed intervals.

Picture 9: Group attendance

The AttendXNet API takes input as shown in Picture 9 and returns confidence, face ID, face count and status.
AttendXNet Output

{"IDs": {
    "confi": [97.6, 96.88, 96.88, 96.29, 98.83, 98.45, 97.3, 96.47, 96.38],
    "id": ["5e283b1901dda02112a3f5a3", "5e283b8501dda02112a3f5a5",
           "5e283acf01dda02112a3f5a1", "5e282a3601dda02112a3f584",
           "5e283a7401dda02112a3f59f", "5e282b1901dda02112a3f58a",
           "5e282a9401dda02112a3f586", "5e282ed201dda02112a3f599",
           "5e27edd184fa810fff221d3f"],
    "locs": [[1032, 237, 1114, 319], [468, 140, 525, 197], [548, 215, 617, 284],
             [946, 105, 1003, 162], [277, 257, 395, 375], [862, 221, 980, 339],
             [146, 145, 202, 202], [1707, 318, 1806, 417], [1347, 220, 1429, 302]]},
 "faceCount": 9,
 "status": "success"}

Output from different frames is consolidated to return the final output as per the client's requirements. The following table presents a comparative analysis of the different approaches used during the experiments.

Table 1: Comparative analysis of accuracy for different users (1000 requests)

5. Conclusion and Future Work
In this paper we proposed AttendXNetV1, AttendXNetV2 and AttendXNetV3, which can effectively perform face verification for attendance. Using ResNet-34 to estimate face embeddings, in combination with the identified classifier and similarity metric, produced a pipeline for real-world application. Extended-MobileNet provides security against print and video attacks. In future, we will improve the face anti-spoofing datasets by collecting more samples under different lighting conditions. Deep learning architectures are evolving at a very fast pace, which will help in designing robust systems. The current work should be useful for industrial and academic purposes.

5.1 Acknowledgement
This project is fully funded by Signy Advanced Technologies, Level 39 One Canada Square, Canary Wharf, London E14 5AB. We want to extend our thanks to Parmesh, Sourabh and Matangi for developing Signy.

References
[1] Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller.
Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October 2007.
[2] Taigman, Yaniv, et al. "Deepface: Closing the gap to human-level performance in face verification." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
[3] W.-S. T. WST. Deeply learned face representations are sparse, selective, and robust. perception, 31:411–438, 2008.
[4] Sun, Yi, et al. "Deepid3: Face recognition with very deep neural networks." arXiv preprint arXiv:1502.00873 (2015).
[5] Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
[6] J. Liu, Y. Deng, T. Bai, Z. Wei, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv preprint arXiv:1506.07310, 2015.
[7] O. M. Parkhi, A. Vedaldi, A. Zisserman, et al. Deep face recognition. In BMVC, volume 1, page 6, 2015.
[8] X. Wu, R. He, Z. Sun, and T. Tan. A light cnn for deep face representation with noisy labels. arXiv preprint arXiv:1511.02683, 2015.
[9] Y. Wu, H. Liu, J. Li, and Y. Fu. Deep face recognition with center invariant loss. In Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pages 408–414. ACM, 2017.
[10] W. Liu, Y. Wen, Z. Yu, and M. Yang. Large-margin softmax loss for convolutional neural networks. In ICML, pages 507–516, 2016.
[11] X. Zhang, Z. Fang, Y. Wen, Z. Li, and Y. Qiao. Range loss for deep face recognition with long-tail. arXiv preprint arXiv:1611.08976, 2016.
[12] R. Ranjan, C. D. Castillo, and R. Chellappa.
L2-constrained softmax loss for discriminative face verification. arXiv preprint arXiv:1703.09507, 2017.
[13] F. Wang, X. Xiang, J. Cheng, and A. L. Yuille. Normface: L2 hypersphere embedding for face verification. arXiv preprint arXiv:1704.06369, 2017.
[14] Y. Liu, H. Li, and X. Wang. Rethinking feature discrimination and polymerization for large-scale recognition. arXiv preprint arXiv:1710.00870, 2017.
[15] M. Hasnat, J. Bohné, J. Milgram, S. Gentric, L. Chen, et al. von Mises-Fisher mixture model-based deep learning: Application to face verification. arXiv preprint arXiv:1706.04264, 2017.
[16] J. Deng, Y. Zhou, and S. Zafeiriou. Marginal loss for deep face recognition. In CVPR Workshops, volume 4, 2017.
[17] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song. Sphereface: Deep hypersphere embedding for face recognition. In CVPR, volume 1, 2017.
[18] X. Qi and L. Zhang. Face recognition via centralized coordinate learning. arXiv preprint arXiv:1801.05678, 2018.
[19] F. Wang, W. Liu, H. Liu, and J. Cheng. Additive margin softmax for face verification. arXiv preprint arXiv:1801.05599, 2018.
[20] H. Wang, Y. Wang, Z. Zhou, X. Ji, Z. Li, D. Gong, J. Zhou, and W. Liu. Cosface: Large margin cosine loss for deep face recognition. arXiv preprint arXiv:1801.09414, 2018.
[21] J. Deng, J. Guo, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698, 2018.
[22] Ranjan, Rajeev, et al. "A fast and accurate system for face detection, identification, and verification." IEEE Transactions on Biometrics, Behavior, and Identity Science 1.2 (2019): 82-96.
[23] Tang, Fenggao, et al. "An End-to-End Face Recognition Method with Alignment Learning." Optik (2020): 164238.
[24] Y. Zheng, D. K. Pal, and M. Savvides. Ring loss: Convex feature normalization for face recognition. In CVPR, June 2018.
[25] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[26] Douze, Matthijs, Jeff Johnson, and Hervé Jegou. "Faiss: A library for efficient similarity search." (2017).
[27] Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).