Abstract:
This paper investigates the effect of deep Transformer networks on matching performance by stacking additional Transformer layers, and addresses the linear growth in model size that this stacking causes. We propose DWR-Matcher, a local feature matcher that combines a dynamic weight recycling mechanism with feature enhancement. First, local features are aggregated by a deep Transformer network in which adjacent Transformer layers dynamically recycle weights, reducing the parameter count and easing the storage burden introduced by deeper networks. Second, a feature enhancement module is introduced to prevent feature collapse at large depths: residual connections strengthen the feature representation of each Transformer layer and enrich feature diversity. Finally, experiments are conducted on the HPatches, InLoc, and MegaDepth datasets. On MegaDepth, DWR-Matcher achieves relative pose estimation accuracies of 44.20%, 61.20%, and 74.90% at thresholds of 5°, 10°, and 20°, while reducing the model size by 8.3 MB, demonstrating the strong performance of DWR-Matcher across a variety of challenging scenarios.
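The storage saving described above comes from letting pairs of adjacent Transformer layers share one underlying weight set, so a deep stack stores roughly half as many parameter tensors. A minimal sketch of this parameter-count arithmetic is shown below; the function name, layer count, and per-layer parameter size are all hypothetical, chosen only for illustration:

```python
def stored_params(num_layers: int, params_per_layer: int, recycle: bool = False) -> int:
    """Count stored parameters for a Transformer stack.

    With weight recycling, each pair of adjacent layers reuses a single
    weight set, so only ceil(num_layers / 2) sets are kept in memory.
    """
    stored_sets = (num_layers + 1) // 2 if recycle else num_layers
    return stored_sets * params_per_layer

# Illustrative numbers only (not taken from the paper):
baseline = stored_params(8, 1_000_000)                 # 8 independent layers
recycled = stored_params(8, 1_000_000, recycle=True)   # adjacent layers share weights
print(baseline, recycled)  # prints "8000000 4000000"
```

In this toy setting the recycled stack stores half the parameters of the baseline while keeping the same effective depth, which is the trade-off the abstract refers to.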