Contrastive Learning-Based Underwater Image Enhancement and Detection

    Abstract: Underwater images suffer from color distortion and loss of detail, which seriously degrade the visual perception of underwater robots. To enhance images while also improving detection accuracy, a multi-task learning framework for underwater image enhancement and object detection based on contrastive learning is proposed. The framework both generates visually pleasing images and improves object detection accuracy, so that enhancement is oriented toward the detection task. To address unclear target texture features, the region generation module of the detection task is used to construct positive and negative image patches for contrastive learning, keeping the target region close to the original image in feature space. In addition, gradient information from the detection task guides the enhancement in a direction that benefits object detection. Image enhancement itself is performed by an image translation method based on the cycle-consistent generative adversarial network (CycleGAN), which learns and preserves the features of clear images; it does not require paired underwater images, which lowers the data requirements. Finally, the enhancement algorithm is validated on the EUVP, U45, and UIEB datasets, and the detection algorithm is validated on the RUOD, URPC2020, and RUIE datasets. The experimental results show that, in subjective visual quality, the proposed algorithm effectively corrects color distortion while preserving the structure and texture of the original image and its targets. In objective metrics, the peak signal-to-noise ratio (PSNR) reaches 24.57 dB and the structural similarity (SSIM) reaches 0.88. After image enhancement, detection accuracy improves by an average of 2% with the Faster R-CNN (region-based convolutional neural network) and YOLOv7 (you only look once, version 7) detectors.
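The abstract does not state the exact form of the contrastive loss. As a rough illustration only, the sketch below assumes a standard InfoNCE-style objective over patch features: the feature of a target patch in the enhanced image is the query, the feature of the same region in the original image is the positive, and features of other patches are the negatives. The function names, the toy three-dimensional features, and the temperature of 0.07 are all hypothetical choices for this example, not values taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query patch (assumed form, not the paper's).

    Equivalent to cross-entropy over similarity logits, with the positive
    pair as the correct class.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy features (hypothetical): the enhanced target patch stays close to the
# same region of the original image and far from two background patches.
enhanced_patch = [0.9, 0.1, 0.0]
source_patch = [1.0, 0.0, 0.0]
background_patches = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

loss_aligned = info_nce(enhanced_patch, source_patch, background_patches)
# Swap roles: treating a background patch as the positive gives a large loss.
loss_misaligned = info_nce(
    enhanced_patch, background_patches[0], [source_patch, background_patches[1]]
)
print(loss_aligned, loss_misaligned)
```

Minimizing a loss of this kind pulls the enhanced target patch toward its counterpart in the original image and pushes it away from the other patches, which matches the stated goal of keeping the target region close to the original image in feature space.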

     
