A Multimodal Object Detection Method for Robots in Underground Environments

    Abstract: Robotic perception systems in complex underground environments face two challenges at once: low-light interference and limited computational resources. To address both, a lightweight dual-modal object detection method is proposed. A dual-branch network fuses LiDAR point clouds with RGB images, performing multi-scale feature fusion at the shallow, intermediate, and deep levels. The method introduces a StarFusion module, which uses element-wise multiplication to strengthen cross-modal feature interaction, and combines it with depthwise separable convolutions and channel compression to reduce the model to 2.3 million parameters. To overcome the bottleneck in algorithm validation, a low-light multimodal dataset covering 4 categories of typical underground targets is constructed; its image brightness (25±8.3) and sharpness (18.6±6.9) are significantly lower than those of conventional datasets. Experiments show that the method achieves an mAP50 (mean average precision at an intersection-over-union threshold of 0.5) of 86.1% on the self-built dataset, a 2.6% improvement over the baseline YOLOv8 model, at an inference speed of 20 frames per second. The method was deployed on an exploration robot running on the Jetson Orin NX platform. The results show that the dual-modal complementary mechanism effectively overcomes the perception blind spots of single-sensor systems in low-light conditions, providing a reliable real-time environmental perception solution for autonomous underground operations.
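The two lightweight ingredients named in the abstract, element-wise multiplicative fusion and depthwise separable convolution, can be illustrated with a small sketch. It is not the authors' StarFusion implementation: the function names, the channel counts (64 in, 128 out, 3×3 kernel), and the toy feature vectors are all assumptions chosen for illustration, since the abstract does not give the layer configuration. The sketch only shows why replacing a standard convolution with a depthwise plus pointwise pair reduces the weight count, and what an element-wise product of two modality features looks like.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def star_fuse(rgb_feat, lidar_feat):
    """Element-wise product of two equally sized feature vectors (hypothetical helper)."""
    if len(rgb_feat) != len(lidar_feat):
        raise ValueError("feature shapes must match")
    return [r * l for r, l in zip(rgb_feat, lidar_feat)]


# Assumed layer sizes, not taken from the paper.
std = standard_conv_params(64, 128, 3)        # 64 * 128 * 9 = 73728
dws = depthwise_separable_params(64, 128, 3)  # 64 * 9 + 64 * 128 = 8768

# Toy feature vectors: a zero in either modality suppresses the fused response.
fused = star_fuse([0.5, 2.0, 0.0], [4.0, 1.5, 3.0])  # [2.0, 3.0, 0.0]
```

Under these assumed sizes the depthwise separable layer needs roughly one eighth of the weights of the standard convolution. The multiplicative fusion keeps a strong response only where both modalities respond, which is one plausible reading of how the product strengthens cross-modal interaction.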

     
