TY - JOUR
T1 - QWNet
T2 - A quaternion wavelet network for spatial-frequency aware multi-modal image fusion
AU - Yang, Jietao
AU - Lin, Miaoshan
AU - Huang, Guoheng
AU - Chen, Xuhang
AU - Zhang, Xiaofeng
AU - Yuan, Xiaochen
AU - Pun, Chi Man
AU - Ling, Bingo Wing Kuen
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/4
Y1 - 2026/4
AB - Multi-modal Image Fusion (MMIF) enhances visual tasks by combining the strengths of different image modalities to improve object visibility and texture detail. However, existing methods face two major challenges: first, they lack intrinsic frequency-domain awareness, relying heavily on complex filters and fusion techniques that limit adaptability; second, they rely on simplistic channel combinations that overlook essential inter-channel relationships. To address these issues, we propose QWNet, a novel Quaternion Wavelet Network that harnesses both spatial and frequency information to enhance the network's inductive bias towards local features. By integrating wavelet transforms, we decompose input modalities into high- and low-frequency components, capturing both global structures and fine details. These components are represented as quaternions, enabling the network to model complex inter-channel dependencies often missed by traditional real-valued networks. We also introduce a Bidirectional Adaptive Attention Module (BAAM) for effective multi-modal information interaction and difference enhancement, and a Quaternion Cross-modal Fusion Module (QCFM) to strengthen inter-channel relationships and effectively combine key features from different modalities. Extensive experiments confirm that QWNet outperforms existing methods in fusion quality and in downstream tasks such as semantic segmentation, using only 4.27K parameters at a computational cost of 0.30 GFLOPs. The source code will be available at https://github.com/Mrzhans/QWNet.
KW - Multi-modal image fusion
KW - Quaternion
KW - Spatial-frequency aware
KW - Wavelet transform
UR - https://www.scopus.com/pages/publications/105024915104
U2 - 10.1016/j.neunet.2025.108364
DO - 10.1016/j.neunet.2025.108364
M3 - Article
C2 - 41317631
AN - SCOPUS:105024915104
SN - 0893-6080
VL - 196
JO - Neural Networks
JF - Neural Networks
M1 - 108364
ER -