TY - GEN
T1 - Cross-View Geo-Localization via Learning Correspondence Semantic Similarity Knowledge
AU - Chen, Guanli
AU - Huang, Guoheng
AU - Yuan, Xiaochen
AU - Chen, Xuhang
AU - Zhong, Guo
AU - Pun, Chi Man
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Cross-view geo-localization aims to estimate the accurate geographic location of a ground image by retrieving matching images from a geo-tagged aerial image database. Existing approaches rely on two-branch models with independent branches to learn fine-grained representations of each perspective, neglecting to learn more discriminative representations through interaction between the perspectives. In this paper, we propose the GeoSSK method, which adapts the learning process of the model by learning local semantic similarity information between aerial and ground pairs via a new interaction module. We then transfer the semantic similarity knowledge learned during the interaction process to a student model through knowledge distillation. Specifically, we design a Cross-fusion Interaction Module (CIM) based on cross-attention, which learns local semantic similarity information between perspectives to adjust the learning of the model. Because complex environments contain visual distractions, we adjust the degree of interaction between perspectives using the Contribution Factor (CF) of the local representation to the global representation. In addition, we introduce Semantic Similarity Knowledge Distillation (SSKD) between teacher and student for cross-view geo-localization. The interaction learning model serves as the teacher, transferring its semantic similarity knowledge to the student. We also design an Incorrect Knowledge Filter (IKF) to filter out incorrect knowledge from the teacher. Experimental results demonstrate the effectiveness and competitive performance of GeoSSK.
KW - cross-attention
KW - geo-localization
KW - image retrieval
KW - knowledge distillation
UR - http://www.scopus.com/inward/record.url?scp=85216109735&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-2054-8_17
DO - 10.1007/978-981-96-2054-8_17
M3 - Conference contribution
AN - SCOPUS:85216109735
SN - 9789819620531
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 220
EP - 233
BT - MultiMedia Modeling - 31st International Conference on Multimedia Modeling, MMM 2025, Proceedings
A2 - Ide, Ichiro
A2 - Kompatsiaris, Ioannis
A2 - Xu, Changsheng
A2 - Yanai, Keiji
A2 - Chu, Wei-Ta
A2 - Nitta, Naoko
A2 - Riegler, Michael
A2 - Yamasaki, Toshihiko
PB - Springer Science and Business Media Deutschland GmbH
T2 - 31st International Conference on Multimedia Modeling, MMM 2025
Y2 - 8 January 2025 through 10 January 2025
ER -