Multi-scale feature fusion for cross-modality person re-identification: the MSJLNet approach

Abstract
Visible-infrared person re-identification (VI-ReID) faces significant challenges due to the modality discrepancy between visible and infrared images. Traditional two-stream networks often struggle to preserve the semantic guidance provided by data augmentation as network depth increases. To address this, we propose the Multi-Scale Joint Learning Network (MSJLNet), which employs a novel four-stream architecture that separates the data-augmented branches from the original branches, focusing on extracting robust, color-agnostic modality features. An Information Purification Module (IPM) with a channel attention mechanism dynamically filters noise and suppresses redundant color information in the augmented branches. A Joint Semantic Learning Module (JSLM) then fuses global detail features with the color-agnostic features, improving the model’s discriminative ability. Extensive experiments on the SYSU-MM01 and RegDB datasets demonstrate MSJLNet’s superior performance: 79.94% Rank-1 accuracy and 74.96% mAP on SYSU-MM01, and 93.14% Rank-1 accuracy and 87.22% mAP on RegDB. The proposed approach offers new insights into cross-modality feature learning. Code is available at https://github.com/1849714926/MSJLNet.
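For intuition, the sketch below shows a squeeze-and-excitation style channel attention block, one plausible form the IPM's channel attention could take for gating color-dominated channels in the augmented branches. The class name `ChannelAttention`, the reduction ratio, and the layer choices are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: reweights feature channels so that
    noisy or redundant (e.g., color-dominated) responses are suppressed.
    Illustrative sketch only; the paper's IPM may differ in structure."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial context per channel
        self.fc = nn.Sequential(             # excitation: per-channel gates in [0, 1]
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # dynamically gate each channel of the feature map

# Quick shape check on a dummy augmented-branch feature map.
if __name__ == "__main__":
    feat = torch.randn(2, 256, 24, 12)
    print(ChannelAttention(256)(feat).shape)  # torch.Size([2, 256, 24, 12])
```

The learned per-channel gates give a concrete mechanism for the "dynamic filtering" the abstract describes: channels whose global statistics correlate with augmentation-induced color noise can be down-weighted before fusion in a JSLM-like module.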
| Original language | English |
|---|---|
| Article number | 146 |
| Journal | Visual Computer |
| Volume | 42 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Jan 2026 |
Keywords
- Channel Augmentation
- Cross-Modality
- Feature Alignment
- Person ReID