Abstract: Low illumination, or even complete darkness, is a serious problem in many application scenarios such as disaster relief and underground space development, and it makes target search and recognition challenging for robots. To address this, a method for human detection and posture recognition in low-illumination scenes is proposed that uses image sequences collected by an infrared depth camera. First, the AlphaPose algorithm based on YOLO v4 is used to detect human bounding boxes and key points. Then, a feature-matching method is proposed to recover missed human bounding boxes and thereby reduce the missed detection rate. Meanwhile, D-S (Dempster-Shafer) evidence theory is used to fuse the detection results for human bounding boxes and key points in order to reduce the detection error rate. Finally, a sequence-based hierarchical recognition method is designed to classify human postures: it extracts torso features of the human body and uses the sequence of torso features over multiple frames to recognize the posture accurately. Experimental results demonstrate that the proposed method performs well in human detection and posture recognition in low-illumination scenes, with posture recognition accuracy reaching 95.36%.
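The fusion step can be illustrated with Dempster's rule of combination, the core operation of D-S evidence theory. The following is only a minimal sketch under stated assumptions: the two-hypothesis frame ("human" / "other"), the function name `dempster_combine`, and the mass values assigned to the bounding-box and key-point detectors are all hypothetical and are not taken from the paper, which does not specify its mass functions in the abstract.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass;
    the masses of one function are expected to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the common hypotheses.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    norm = 1.0 - conflict
    return {focal: mass / norm for focal, mass in combined.items()}


# Hypothetical frame of discernment (an assumption for illustration).
HUMAN = frozenset({"human"})
OTHER = frozenset({"other"})
EITHER = HUMAN | OTHER  # mass left on the whole frame expresses ignorance

# Hypothetical masses from the two detection cues.
box_evidence = {HUMAN: 0.6, OTHER: 0.1, EITHER: 0.3}
keypoint_evidence = {HUMAN: 0.7, OTHER: 0.1, EITHER: 0.2}

fused = dempster_combine(box_evidence, keypoint_evidence)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

With these illustrative numbers, the fused belief in "human" is higher than either source alone, which is the behavior one would want when two moderately confident cues agree. How the actual method derives masses from detector outputs is not stated here.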