Authors
Peng Zou and Yunfei Cai, Nanjing University of Science and Technology, China
Abstract
This paper proposes TriT (Triplet Network Based Tracker), a target tracking algorithm built on a Triplet network, to address visual target tracking in complex scenes. Whereas the Siamese-fc algorithm uses a two-branch feature extraction network, TriT uses three parallel convolutional neural networks to extract features from the target in the first frame, the target in the previous frame, and the search region of the current frame, yielding high-level semantic information for all three. The features of the first-frame target and of the previous-frame target are each convolved with the features of the current search region, which gives the similarity between every position in the search region and each of the two targets and thus produces two similarity score maps. The two low-resolution score maps are then upsampled by interpolation and fused using their APCE (average peak-to-correlation energy) values, and the position of the target in the current frame is located from the fused map. Experiments show that, compared with other real-time tracking algorithms such as Siamese-fc, TriT is markedly more robust and tracks effectively in complex scenes involving illumination change, occlusion, and interference from similar targets. The experimental results also show that the proposed algorithm has good real-time performance.
Keywords
Target Tracking, High Robustness, Triplet Network, Score Maps Fusion
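
The score-map fusion step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact fusion rule, so the code below assumes a weighted sum in which each map's weight is proportional to its APCE value; the function names (apce, fuse_score_maps, locate_peak) are hypothetical and are not taken from the paper. APCE is computed with its usual definition, the squared peak-to-minimum gap divided by the mean squared deviation from the minimum.

```python
import numpy as np


def apce(score_map):
    """Average peak-to-correlation energy of a 2-D score map."""
    f_max = score_map.max()
    f_min = score_map.min()
    energy = np.mean((score_map - f_min) ** 2)
    if energy == 0:
        # A perfectly flat map carries no peak information.
        return 0.0
    return float((f_max - f_min) ** 2 / energy)


def fuse_score_maps(map_first, map_prev):
    """Fuse two equally sized score maps.

    Assumption: weights are proportional to each map's APCE value.
    The paper may use a different rule.
    """
    a_first = apce(map_first)
    a_prev = apce(map_prev)
    total = a_first + a_prev
    # Fall back to equal weights when both maps are flat.
    w_first = 0.5 if total == 0 else a_first / total
    return w_first * map_first + (1.0 - w_first) * map_prev


def locate_peak(fused_map):
    """Return the (row, col) index of the highest fused score."""
    row, col = np.unravel_index(np.argmax(fused_map), fused_map.shape)
    return int(row), int(col)
```

Under this assumed rule, a sharp single-peaked map receives a high APCE and dominates the fusion, while a flat or multi-peaked map (as might result from occlusion or similar distractors) contributes less.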