
DFFormer: Capturing Dynamic Frequency Features to Locate Image Manipulation Through Adaptive Frequency Transformer and Prototype Learning

  • Yan Xiang
  • Kaiqi Zhao
  • Zhenghong Yu
  • Xiaochen Yuan
  • Guoheng Huang
  • Jinyu Tian
  • Jianqing Li
  • Macau University of Science and Technology
  • Shandong University of Political Science and Law
  • Guangdong Polytechnic of Science and Technology
  • Guangdong University of Technology

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

The proliferation of modern image editing tools has raised concerns about image manipulation, particularly its potential to mislead the public and compromise privacy and security. Consequently, detecting and localizing tampered regions has become a critical research challenge. Traditional methods struggle with subtle manipulations, such as splicing, copy-move, and removal, which are often more discernible in the frequency domain than in the spatial domain. Additionally, the size imbalance between the tampered and background regions further complicates detection. To address these challenges, we propose DFFormer, an end-to-end network that leverages frequency feature differences and a dynamic token strategy for precise manipulation localization. DFFormer combines a Convolutional Neural Network (CNN) and a Transformer in a hybrid architecture with three key modules: the Adaptive Frequency Transformer (AFT), the Prototype Learning Module (PLM), and the Cascaded Progressive Token Fusion Head (CPTF-Head). AFT integrates high- and low-frequency components into self-attention via the Parallel Adaptive Frequency Attention (PAFA) block, enhancing tampering feature representation while preserving fine details. PLM employs KNN-based density peak clustering (DPC-KNN) and weighted token aggregation to optimize dynamic token reduction. The CPTF-Head adopts a hierarchical coarse-to-fine strategy to integrate multiscale features, thereby improving localization accuracy and edge refinement. Experiments demonstrate that DFFormer outperforms state-of-the-art models across four benchmark datasets and one real-world dataset, exhibiting superior generalization and robustness.
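To make the PLM's token-reduction step concrete, the following is a minimal NumPy sketch of DPC-KNN clustering followed by weighted token aggregation. It is an illustration of the general DPC-KNN technique (local density from k-nearest-neighbour distances, a delta distance to the nearest denser token, and centre selection by density × delta), not the paper's exact formulation; all function names, the exponential density kernel, and the use of uniform importance scores are assumptions.

```python
import numpy as np

def dpc_knn_cluster(tokens, num_clusters, k=5):
    """Cluster token embeddings with KNN-based density peaks (DPC-KNN).

    tokens: (N, D) array of token features.
    Returns (labels, center_indices). Illustrative sketch only.
    """
    n = tokens.shape[0]
    # Pairwise Euclidean distances between all tokens.
    dist = np.linalg.norm(tokens[:, None] - tokens[None, :], axis=-1)
    # Local density: high when the k nearest neighbours are close
    # (column 0 of the sorted distances is the zero self-distance).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    density = np.exp(-knn_dist.mean(axis=1))
    # delta: distance to the nearest token of strictly higher density.
    order = np.argsort(-density)                 # densest first
    delta = np.empty(n)
    parent = np.empty(n, dtype=int)
    delta[order[0]] = dist[order[0]].max()       # densest token: max distance
    parent[order[0]] = order[0]
    for rank, i in enumerate(order[1:], start=1):
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Cluster centres maximise density * delta (dense AND isolated).
    centers = np.argsort(-(density * delta))[:num_clusters]
    # Propagate labels from centres along the denser-parent chain.
    labels = np.full(n, -1, dtype=int)
    labels[centers] = np.arange(num_clusters)
    for i in order:
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels, centers

def weighted_aggregate(tokens, labels, scores, num_clusters):
    """Merge each cluster into a single token, weighted by importance scores."""
    merged = np.zeros((num_clusters, tokens.shape[1]))
    for c in range(num_clusters):
        mask = labels == c
        w = scores[mask] / scores[mask].sum()
        merged[c] = (w[:, None] * tokens[mask]).sum(axis=0)
    return merged
```

Usage on two well-separated groups of tokens: `dpc_knn_cluster` recovers one cluster per group, and `weighted_aggregate` reduces N tokens to `num_clusters` representative tokens, which is the dynamic token reduction the PLM performs before the fusion head.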

Original language: English
Pages (from-to): 1907-1919
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 2
DOIs
Publication status: Published - 2026

Fingerprint

Dive into the research topics of "DFFormer: Capturing Dynamic Frequency Features to Locate Image Manipulation Through Adaptive Frequency Transformer and Prototype Learning". Together they form a unique fingerprint.

Cite this