Hand Gesture Recognition against Complex Background Based on Deep Learning

彭玉青; 赵晓松; 陶慧芳; 刘宪姿; 李铁军

doi:10.13973/j.cnki.robot.180568

Abstract: To address the low recognition rates and poor robustness of gesture recognition against complex backgrounds in human-robot interaction, a deep-learning-based gesture recognition algorithm named HGDR-Net (hand gesture detection and recognition network) is proposed. The algorithm consists of two parts: gesture detection and gesture recognition. In the detection stage, an improved YOLO (you only look once) algorithm is used to overcome the difficulty of extracting gesture regions from complex backgrounds. The improved YOLO is tailored to the characteristics of gesture detection and addresses the original YOLO's poor detection of small objects and its low localization accuracy. In the recognition stage, a convolutional neural network (CNN) is used, and spatial pyramid pooling (SPP) is introduced to handle the varying sizes of gesture regions, which solves the multi-scale input problem of the CNN. Finally, offline and real-time data augmentation are combined during training to avoid overfitting and improve the generalization ability of HGDR-Net. Validation experiments on NUS-Ⅱ and Marcel, two public datasets with complex backgrounds, yield recognition rates of 98.65% and 99.59%, respectively. The results show that the proposed algorithm accurately recognizes gestures against a variety of complex backgrounds and achieves higher recognition accuracy and stronger robustness than traditional algorithms based on hand-crafted features and other CNN-based algorithms.
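
To illustrate how spatial pyramid pooling can turn gesture regions of different sizes into a fixed-length input, here is a minimal sketch in plain Python. It is not the HGDR-Net implementation: the pyramid levels (1, 2, 4), the use of max pooling, the nested-list feature map, and the helper names `spatial_pyramid_pool` and `make_map` are all assumptions made for illustration, since the abstract does not state the configuration used.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map (nested lists) into a fixed-length list.

    Assumed setup: each level n splits the map into an n x n grid of bins and
    keeps the maximum of each bin, so the output length is C * sum(n * n).
    """
    pooled = []
    for channel in feature_map:
        height, width = len(channel), len(channel[0])
        for n in levels:
            for i in range(n):
                # Floor for the start and ceiling for the end, so the bins
                # cover the whole map and are never empty.
                top = (i * height) // n
                bottom = ((i + 1) * height + n - 1) // n
                for j in range(n):
                    left = (j * width) // n
                    right = ((j + 1) * width + n - 1) // n
                    pooled.append(max(
                        channel[r][c]
                        for r in range(top, bottom)
                        for c in range(left, right)
                    ))
    return pooled


def make_map(channels, height, width):
    """Build a deterministic toy feature map (hypothetical demo data)."""
    return [[[(k + 1) * (r * width + c) for c in range(width)]
             for r in range(height)] for k in range(channels)]


# Two "gesture regions" of different sizes yield vectors of equal length.
small = spatial_pyramid_pool(make_map(3, 5, 7))
large = spatial_pyramid_pool(make_map(3, 13, 9))
print(len(small), len(large))  # prints "63 63"
```

Because the number of bins depends only on the pyramid levels and not on the input height or width, a fully connected layer placed after this step can accept detected regions of any size without cropping or warping them first.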
