Abstract:
Existing tracking algorithms mainly build their feature fusion networks around either the cross-correlation operation or a Transformer. This ignores the complementary strengths of the two methods and leaves the tracker prone to losing semantic information and falling into local optima. To address these problems, an object tracking algorithm based on correlation-Transformer dual feature fusion is proposed. An improved cross-correlation operation and a Transformer are each used to fuse template and search-area features; because the strengths of the two fusion methods are complementary, the template and search-area features can interact fully. To enhance and fuse the features effectively, a similarity matrix is introduced into the cross-correlation operation to strengthen the features in the template and search area that are associated with the target in the current frame, which makes the matching process of cross-correlation more accurate. The algorithm consists of a Swin-Transformer backbone, a cross-correlation and Transformer dual fusion module, and a prediction branch. The proposed algorithm achieves robust results on the TrackingNet, LaSOT, NFS, UAV123 and OTB2015 datasets, with success rates of 81.8%, 65.7%, 66.2%, 69.4% and 69.8%, respectively, at an average tracking speed of 40 frames/s.
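The dual fusion idea can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not give the exact formulation, so the cosine similarity matrix, the residual enhancement, the single-head attention without learned projections, the concatenation of the two branches, and all function names (`similarity_enhanced_correlation`, `cross_attention`, `dual_fusion`) are assumptions made here for illustration. Template and search-area features are treated as flattened token matrices of shape `(Nz, C)` and `(Nx, C)`.

```python
import numpy as np


def softmax(a, axis=-1):
    # Subtract the max along the axis for numerical stability.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def similarity_enhanced_correlation(z, x):
    """Correlation branch (hypothetical form).

    z: (Nz, C) template tokens, x: (Nx, C) search-area tokens.
    Returns an (Nx, Nz) pixel-wise correlation response.
    """
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = xn @ zn.T  # (Nx, Nz) cosine similarity matrix

    # Assumed enhancement: each side is reinforced with the features of
    # the other side that it is most similar to (residual form).
    x_enh = x + softmax(sim, axis=1) @ z    # (Nx, C)
    z_enh = z + softmax(sim.T, axis=1) @ x  # (Nz, C)

    # Pixel-wise cross-correlation between the enhanced features.
    return x_enh @ z_enh.T


def cross_attention(z, x):
    """Transformer branch (simplified): single-head scaled dot-product
    cross-attention, search tokens as queries, template tokens as keys
    and values, with no learned projections."""
    scale = np.sqrt(x.shape[1])
    attn = softmax(x @ z.T / scale, axis=1)  # (Nx, Nz)
    return x + attn @ z                      # (Nx, C)


def dual_fusion(z, x):
    """Concatenate both branches per search token (assumed combination)."""
    corr = similarity_enhanced_correlation(z, x)  # (Nx, Nz)
    attn = cross_attention(z, x)                  # (Nx, C)
    return np.concatenate([corr, attn], axis=1)   # (Nx, Nz + C)


rng = np.random.default_rng(0)
z = rng.standard_normal((16, 32))  # e.g. 4x4 template map, 32 channels
x = rng.standard_normal((64, 32))  # e.g. 8x8 search map, 32 channels
fused = dual_fusion(z, x)
print(fused.shape)  # (64, 48)
```

In this sketch the correlation branch yields local matching responses while the attention branch aggregates template context globally, which is one way to read the complementarity described above; a prediction branch would then consume the fused tokens.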