LI Ziwan, FAN Huijie, WANG Qiang, YU Zhencheng, TANG Yandong. Object Tracking Algorithm Based on Correlation-Transformer Dual Feature Fusion[J]. ROBOT, 2024, 46(1): 16-26. DOI: 10.13973/j.cnki.robot.230005

Object Tracking Algorithm Based on Correlation-Transformer Dual Feature Fusion

  • Abstract: Current trackers typically build their feature fusion network with either the cross-correlation operation or the Transformer alone. This ignores the complementary strengths of the two methods, tends to lose semantic information, and can fall into local optima. To address these problems, an object tracking algorithm based on correlation-Transformer dual feature fusion is designed. An improved cross-correlation operation and a Transformer are each used to fuse the template and search-region features, so that the two fusion methods complement each other and the template and search-region features interact fully, yielding effective feature enhancement and thorough fusion. In addition, a similarity matrix is introduced into the cross-correlation operation to enhance the template and search-region features that are associated with the target in the current frame, which makes the matching process of the cross-correlation operation more accurate. The tracker consists of a Swin-Transformer backbone, a cross-correlation and Transformer dual fusion module, and a prediction branch. On the five datasets TrackingNet, LaSOT, NFS, UAV123 and OTB2015, the proposed algorithm achieves robust results, with success rates of 81.8%, 65.7%, 66.2%, 69.4% and 69.8%, respectively, at an average tracking speed of 40 frames/s.
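For readers unfamiliar with the cross-correlation fusion mentioned in the abstract, the following is a minimal sketch of plain depth-wise cross-correlation as commonly used in Siamese trackers. It is not the improved, similarity-matrix-weighted variant proposed in the paper, whose details the abstract does not give. The function name `depthwise_xcorr` and the toy feature maps are illustrative assumptions.

```python
def depthwise_xcorr(template, search):
    """Slide each template channel over the matching search channel.

    template: C x h x w nested lists; search: C x H x W nested lists.
    Returns a C x (H-h+1) x (W-w+1) response map (valid positions only).
    """
    if len(template) != len(search):
        raise ValueError("template and search must have the same channel count")
    response = []
    for t_ch, s_ch in zip(template, search):
        h, w = len(t_ch), len(t_ch[0])
        H, W = len(s_ch), len(s_ch[0])
        ch_map = []
        for y in range(H - h + 1):
            row = []
            for x in range(W - w + 1):
                # Inner product of the template with the search window at (y, x).
                score = sum(
                    t_ch[i][j] * s_ch[y + i][x + j]
                    for i in range(h)
                    for j in range(w)
                )
                row.append(score)
            ch_map.append(row)
        response.append(ch_map)
    return response


# Toy example (assumed data): one channel, a 2x2 template inside a 4x4 search region.
template = [[[1, 2], [3, 4]]]
search = [[
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]]
response = depthwise_xcorr(template, search)

# The highest response marks the best-matching window position.
peak = max(
    (value, (y, x))
    for y, row in enumerate(response[0])
    for x, value in enumerate(row)
)
```

In this toy case the response peaks at the window that exactly contains the template. Because the match is a purely local inner product per channel, it is reasonable to pair it with a global, attention-based fusion, which is the kind of complementarity the abstract describes.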

     
