DINOv3 Guided difference feature fusion for remote sensing image change captioning: a case study on Macao land cover

  • Macao Polytechnic University

Research output: Article (peer-reviewed)

Abstract

Remote sensing image change captioning (RSICC) aims to generate natural-language descriptions of the changes between bi-temporal images. Existing RSICC datasets focus primarily on building changes and lack descriptions of the dynamic processes in complex coastal cities. Furthermore, current algorithms often rely on transformers to compare difference features, and these lack a local spatial inductive bias. To address these issues, we created the Macao Land Cover Change (MLCC) dataset, annotated with standardized directional terms. We also propose the DINOv3 Guided Difference Feature Fusion Change Captioning algorithm (DINO-DFFCC). DINO-DFFCC uses a frozen DINOv3 as the feature encoder to obtain robust semantic features. A Bi-temporal Difference Feature Adaptor (BDFA) aligns the semantic features from DINOv3 with the coarse-grained difference maps extracted by convolution. A Re-parameterized Convolution Difference Feature Fusion module (RCDFF) iteratively fuses semantic and difference information, capturing multi-scale spatial context. Experimental results show that DINO-DFFCC outperforms state-of-the-art (SOTA) methods on the MLCC dataset, achieving a BLEU-4 of 0.4547 and a CIDEr of 1.5125. The dataset and code are available at https://github.com/juncyan/dffcc.git.
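
The abstract does not say how RCDFF re-parameterizes its convolutions, so the following is only a generic sketch of structural re-parameterization, not the authors' module. It assumes a single channel, a hypothetical pair of tiny bi-temporal feature maps, and two parallel branches (a 3x3 and a 1x1 kernel with made-up weights). It illustrates the general idea that the parallel branches can be folded into one 3x3 kernel that produces the same output on an absolute-difference map.

```python
# Illustrative only: generic structural re-parameterization on a difference map.
# All inputs, kernel weights, and helper names here are hypothetical.

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    k = len(kernel)              # assumes an odd, square kernel
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 kernel into the centre tap of a 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical bi-temporal feature maps (before / after).
before = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 0.0]]
after = [[1.0, 2.0, 4.0], [0.0, 5.0, 3.0], [2.0, 1.0, 0.0]]

# Coarse difference map: element-wise absolute difference.
diff = [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(after, before)]

k3 = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]  # made-up 3x3 branch
k1 = [[2.0]]                                               # made-up 1x1 branch

# Training-time form: two parallel branches, outputs summed.
two_branch = add_maps(conv2d_same(diff, k3), conv2d_same(diff, k1))

# Inference-time form: one merged 3x3 kernel.
single = conv2d_same(diff, merge_branches(k3, k1))
```

Because convolution is linear, the two forms agree; a multi-branch block can therefore be trained with extra branches and collapsed to a single convolution for inference.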

Original language: English
Journal: International Journal of Remote Sensing
Publication status: Accepted/In press - 2026
