Abstract:
This paper addresses limitations in multi-task scene perception, in particular the lack of systematic modeling and analysis of uncertainty in state estimation for tasks such as detection, tracking, mapping, and localization. Scene perception is divided into foreground and background perception tasks: the former covers the detection and tracking of foreground objects, while the latter covers robot localization and mapping. To achieve multi-task perception in complex scenes, both tasks are integrated within a dynamic Bayesian network framework, and the multi-task scene perception problem is formulated as a joint optimization and estimation problem over the system state parameters, with the state of each parameter modeled by Bayesian posterior probability estimation. Starting from the point cloud measurement noise of LiDAR sensors, the uncertainties of ground-truth annotation and self-pose estimation in the object detection and tracking network are analyzed; uncertainty models for point cloud measurements and labels are constructed, together with a tracking model based on prediction confidence. The impact of localization errors on mapping uncertainty and on the target tracking task is also analyzed, and an iterated extended Kalman filter is used to obtain the maximum a posteriori estimate of the pose. The proposed method achieves scene perception in complex, large-scale dynamic environments. Experimental results on the KITTI and UrbanNav datasets demonstrate its effectiveness in mitigating the impact of dynamic targets on environmental mapping in complex scenes, with high accuracy and robustness.