Coordinate-aware thermal infrared tracking via natural language modeling
dc.contributor.author | Yan, M | |
dc.contributor.author | Zhang, P | |
dc.contributor.author | Zhang, H | |
dc.contributor.author | Hao, R | |
dc.contributor.author | Liu, J | |
dc.contributor.author | Wang, X | |
dc.contributor.author | Liu, L | |
dc.date.accessioned | 2025-01-06T15:00:20Z | |
dc.date.issued | 2024-12-19 | |
dc.date.updated | 2024-12-26T09:50:07Z | |
dc.description.abstract | Thermal infrared (TIR) tracking is pivotal in computer vision tasks due to its all-weather imaging capability. Traditional tracking methods predominantly rely on hand-crafted features, and while deep learning has introduced correlation filtering techniques, these are often constrained by rudimentary correlation operations. Furthermore, transformer-based approaches tend to overlook temporal and coordinate information, which is critical for TIR tracking that lacks texture and color information. In this paper, to address these issues, we apply natural language modeling to TIR tracking and propose a coordinate-aware thermal infrared tracking model called NLMTrack, which enhances the utilization of coordinate and temporal information. NLMTrack applies an encoder that unifies feature extraction and feature fusion, which simplifies the TIR tracking pipeline. To address the challenge of low detail and low contrast in TIR images, on the one hand, we design a multi-level progressive fusion module that enhances the semantic representation and incorporates multi-scale features. On the other hand, the decoder combines the TIR features and the coordinate sequence features using a causal transformer to generate the target sequence step by step. Moreover, we explore an adaptive loss aimed at elevating tracking accuracy and a simple template update strategy to accommodate the target’s appearance variations. Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks. | en_GB |
dc.description.sponsorship | Aeronautical Science Fund, China | en_GB |
dc.description.sponsorship | National Natural Science Foundation of China | en_GB |
dc.identifier.citation | Vol. 267, article 126012 | en_GB |
dc.identifier.doi | https://doi.org/10.1016/j.eswa.2024.126012 | |
dc.identifier.grantnumber | 2024Z071080002 | en_GB |
dc.identifier.grantnumber | 62075031 | en_GB |
dc.identifier.uri | http://hdl.handle.net/10871/139496 | |
dc.identifier | ORCID: 0000-0001-9332-2700 (Wang, Xiaoyang) | |
dc.language.iso | en | en_GB |
dc.publisher | Elsevier | en_GB |
dc.relation.url | https://github.com/ELOESZHANG/NLMTrack | en_GB |
dc.rights | © 2024 The author(s). For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission. | en_GB |
dc.subject | Thermal infrared object tracking | en_GB |
dc.subject | Natural language model | en_GB |
dc.subject | Transformer tracking | en_GB |
dc.title | Coordinate-aware thermal infrared tracking via natural language modeling | en_GB |
dc.type | Article | en_GB |
dc.date.available | 2025-01-06T15:00:20Z | |
dc.identifier.issn | 0957-4174 | |
exeter.article-number | 126012 | |
dc.description | This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record | en_GB |
dc.description | Code availability: The code is publicly available at https://github.com/ELOESZHANG/NLMTrack | en_GB |
dc.description | Data availability: Data will be made available on request. | en_GB |
dc.identifier.journal | Expert Systems with Applications | en_GB |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_GB |
dcterms.dateAccepted | 2024-11-30 | |
dcterms.dateSubmitted | 2024-11-04 | |
rioxxterms.version | AM | en_GB |
rioxxterms.licenseref.startdate | 2024-12-19 | |
rioxxterms.type | Journal Article/Review | en_GB |
refterms.dateFCD | 2025-01-06T14:47:53Z | |
refterms.versionFCD | AM | |
refterms.dateFOA | 2025-01-06T15:05:33Z | |
refterms.panel | B | en_GB |
exeter.rights-retention-statement | No | |