... This task can be described as follows: given the initial state of the target in the first frame of a tracking sequence, the tracking method must predict the target's state in the remaining frames [16], [17], [18], [19], [20]. Many machine learning methods are widely used in TIR tracking, such as mean-shift [21], [22], [23], sparse representation [24], [25], [26], particle filters [27], [28], multimodal methods [29], [30], [31], [32], [33], [34], correlation filters [35], [36], [37], [38], Siamese networks [39], [40], [41], [42], [43], and convolutional neural networks [44], [45], [46], [47], among others. TIR trackers based on traditional methods (such as mean-shift, sparse representation, and particle filters) often have limited tracking performance because they rely on simple tracking models and hand-crafted features. ...