Abstract
Cross-view geo-localization refers to locating the same geographic target by retrieving images captured from multiple platforms (UAV, satellite, and street view). The main challenge of this task is the drastic appearance change between viewpoints, which degrades the retrieval performance of a model. Existing networks for cross-view geo-localization suffer from two problems. First, because geographic targets vary widely in scale and perspective, current networks are vulnerable to interference from localized regions when perceiving target information. Second, targets of the same category observed from different viewpoints differ greatly in angle. Therefore, a perceptual feature fusion network (PFFNet) for cross-view geo-localization is proposed to learn location-aware features and establish semantic correlations between viewpoints. For each viewpoint in PFFNet, a shunted contextual embedding network (SCENet) is built as the backbone to extract that viewpoint's contextual information and construct the target location encoding space. The proposed method is compared with state-of-the-art methods on the cross-view geo-localization dataset University-1652. The experimental results show that the proposed perceptual feature fusion network achieves strong adaptive performance on this large-scale dataset.
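The abstract describes a per-viewpoint backbone feeding a shared location encoding space that is ranked by retrieval. The sketch below illustrates that general structure, not the paper's actual method: SCENet is not reproduced here, so a plain torchvision ResNet-50 stands in as a placeholder backbone, and the branch names, embedding dimension, and shared projection head are all assumptions for illustration.

```python
# Minimal sketch of a two-branch cross-view embedding model (PyTorch).
# NOTE: SCENet is a placeholder here (ResNet-50 stand-in); all names and
# dimensions are illustrative assumptions, not the paper's specification.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class CrossViewEmbedder(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # One backbone per viewpoint, mirroring the abstract's idea of
        # extracting each viewpoint's contextual information separately.
        self.uav_backbone = resnet50(weights=None)
        self.sat_backbone = resnet50(weights=None)
        in_dim = self.uav_backbone.fc.in_features
        self.uav_backbone.fc = nn.Identity()
        self.sat_backbone.fc = nn.Identity()
        # Shared projection maps both views into one location encoding space.
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, uav_img, sat_img):
        # L2-normalized embeddings so cosine similarity ranks retrievals.
        z_uav = F.normalize(self.proj(self.uav_backbone(uav_img)), dim=-1)
        z_sat = F.normalize(self.proj(self.sat_backbone(sat_img)), dim=-1)
        return z_uav, z_sat

model = CrossViewEmbedder()
uav = torch.randn(2, 3, 224, 224)   # dummy UAV views
sat = torch.randn(2, 3, 224, 224)   # dummy satellite views
z_uav, z_sat = model(uav, sat)
sim = z_uav @ z_sat.T               # pairwise cosine similarities
print(sim.shape)                    # torch.Size([2, 2])
```

At inference, cross-view retrieval of this kind amounts to ranking gallery embeddings of one view by cosine similarity against a query embedding from the other view, which is how benchmarks such as University-1652 are typically evaluated.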
| Translated title of the contribution | Perceptual Feature Fusion Network for Cross-View Geo-Localization |
| --- | --- |
| Original language | Chinese (Traditional) |
| Pages (from-to) | 255-262 |
| Number of pages | 8 |
| Journal | Computer Engineering and Applications |
| Volume | 60 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - 1 Feb 2024 |
Keywords
- contextual feature space
- cross-view geo-localization
- embedding network
- fine-grained spatial embedding
- location-aware