Hand Gesture Recognition against Complex Background Based on Deep Learning
PENG Yuqing1, ZHAO Xiaosong1, TAO Huifang1, LIU Xianzi1, LI Tiejun2
1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China;
2. School of Mechanical Engineering, Hebei University of Technology, Tianjin 300401, China
PENG Yuqing, ZHAO Xiaosong, TAO Huifang, LIU Xianzi, LI Tiejun. Hand Gesture Recognition against Complex Background Based on Deep Learning[J]. ROBOT (机器人), 2019, 41(4): 534-542. DOI: 10.13973/j.cnki.robot.180568.
Abstract: To address the low recognition rate and poor robustness of hand gesture recognition against complex backgrounds in human-robot interaction, a deep-learning-based gesture recognition algorithm named HGDR-Net (hand gesture detection and recognition network) is proposed. The algorithm consists of two parts: gesture detection and gesture recognition. In the detection stage, an improved YOLO (you only look once) algorithm is used to detect gestures, which addresses the difficulty of extracting the gesture region from a complex background. The improved YOLO is tailored to the characteristics of gesture detection and remedies the original YOLO's poor detection of small objects and low localization accuracy. In the recognition stage, a convolutional neural network (CNN) is used, and spatial pyramid pooling (SPP) is introduced to handle the varying sizes of gesture regions, which solves the multi-scale input problem of the CNN. Finally, offline and real-time data augmentation are combined during training to avoid overfitting and to improve the generalization ability of HGDR-Net. Validation experiments on NUS-Ⅱ and Marcel, two public datasets with complex backgrounds, yield recognition rates of 98.65% and 99.59%, respectively. The results show that the proposed algorithm recognizes gestures accurately against various complex backgrounds, and achieves higher recognition accuracy and stronger robustness than traditional algorithms based on hand-crafted features and other CNN-based algorithms.
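The role of SPP in the recognition stage can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the use of max pooling are assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows how feature maps of different spatial sizes are reduced to a vector of the same length, which is what lets a CNN accept gesture regions of varying size.

```python
import math

import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    For each pyramid level n, the map is divided into an n x n grid of
    bins and the maximum of each bin is taken per channel. The output
    length is C * sum(n * n for n in levels), independent of H and W.
    The levels used here are an illustrative assumption.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i; floor/ceil keeps every bin non-empty.
            r0 = (i * height) // n
            r1 = math.ceil((i + 1) * height / n)
            for j in range(n):
                # Column range of bin j, computed the same way.
                c0 = (j * width) // n
                c1 = math.ceil((j + 1) * width / n)
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = rng.random((8, 13, 9))    # feature map from a small gesture region
    large = rng.random((8, 30, 41))   # feature map from a larger gesture region
    print(spatial_pyramid_pool(small).shape, spatial_pyramid_pool(large).shape)
```

Both calls return a vector of the same length (number of channels times 1 + 4 + 16 bins), so the fully connected layers that follow can have a fixed input size without cropping or warping the detected gesture region.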