A deep-learning-based gesture recognition algorithm, named HGDR-Net (hand gesture detection and recognition network), is proposed to address the low gesture recognition rates and poor robustness against complex backgrounds of existing algorithms in the field of human-robot interaction. The algorithm consists of two stages: gesture detection and gesture recognition. In the detection stage, gestures are located with an improved YOLO (you only look once) algorithm, which tackles the difficulty of extracting gesture regions from complex backgrounds. The improvements exploit the characteristics of gesture detection to remedy the original YOLO algorithm's poor detection performance and low localization accuracy on small objects. In the recognition stage, a convolutional neural network (CNN) is used, and spatial pyramid pooling (SPP) is introduced to handle the varying sizes of gesture regions, thereby solving the CNN's multi-scale input problem. Finally, two data augmentation methods, offline and real-time, are combined during training to avoid over-fitting and to improve the generalization ability of HGDR-Net. Validation experiments on NUS-II and Marcel, two public datasets with complex backgrounds, achieve recognition rates of 98.65% and 99.59%, respectively. The results show that the proposed algorithm accurately recognizes gestures against a variety of complex backgrounds, and offers higher recognition accuracy and stronger robustness than both traditional algorithms based on hand-crafted features and other CNN-based algorithms.
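To illustrate the multi-scale idea mentioned above, the following is a minimal NumPy sketch of spatial pyramid pooling, not the authors' implementation: regardless of the spatial size of the input feature map, each pyramid level partitions it into a fixed grid of bins and max-pools each bin, so the concatenated output always has the same length. The pyramid levels (1, 2, 4) are an assumption for illustration, not taken from the paper.

```python
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid pooling (sketch).

    Maps a C x H x W feature map of any spatial size to a fixed-length
    vector of length C * sum(n * n for n in levels).
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Bin boundaries, mimicking adaptive max pooling over an n x n grid.
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                patch = feature_map[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled.append(patch.max(axis=(1, 2)))  # one value per channel
    return np.concatenate(pooled)

# Two gesture regions of different sizes yield equal-length descriptors,
# which is what lets a fixed-size fully connected layer follow the SPP layer.
a = spp(np.random.rand(256, 13, 13))
b = spp(np.random.rand(256, 7, 9))
assert a.shape == b.shape == (256 * (1 + 4 + 16),)
```

In a detection-then-recognition pipeline like the one described, such a layer would sit between the convolutional feature extractor and the classifier, so cropped gesture regions need not be warped to a single input size.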