TY - JOUR
T1 - DFFormer
T2 - Capturing Dynamic Frequency Features to Locate Image Manipulation Through Adaptive Frequency Transformer and Prototype Learning
AU - Xiang, Yan
AU - Zhao, Kaiqi
AU - Yu, Zhenghong
AU - Yuan, Xiaochen
AU - Huang, Guoheng
AU - Tian, Jinyu
AU - Li, Jianqing
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - The proliferation of modern image editing tools has raised concerns about image manipulation, particularly regarding the potential to mislead the public and compromise privacy and security. Consequently, detecting and localizing tampered regions has become a critical research challenge. Traditional methods struggle with subtle manipulations, such as splicing, copy-move, and removal, which are often more discernible in the frequency domain than in the spatial domain. Additionally, the size imbalance between the tampered and background regions further complicates the detection process. To address these challenges, we propose DFFormer, an end-to-end network that leverages frequency feature differences and a dynamic token strategy for precise manipulation localization. DFFormer combines a Convolutional Neural Network (CNN) and a Transformer in a hybrid architecture with three key modules: the Adaptive Frequency Transformer (AFT), the Prototype Learning Module (PLM), and the Cascaded Progressive Token Fusion Head (CPTF-Head). AFT integrates high- and low-frequency components into self-attention via the Parallel Adaptive Frequency Attention (PAFA) block, enhancing tampering feature representation while preserving fine details. PLM employs KNN-based density peak clustering (DPC-KNN) and weighted token aggregation to optimize dynamic token reduction. The CPTF-Head adopts a hierarchical coarse-to-fine strategy to integrate multiscale features, thereby improving localization accuracy and edge refinement. Experiments demonstrate that DFFormer outperforms state-of-the-art models across four benchmark datasets and one real-world dataset, exhibiting superior generalization and robustness.
AB - The proliferation of modern image editing tools has raised concerns about image manipulation, particularly regarding the potential to mislead the public and compromise privacy and security. Consequently, detecting and localizing tampered regions has become a critical research challenge. Traditional methods struggle with subtle manipulations, such as splicing, copy-move, and removal, which are often more discernible in the frequency domain than in the spatial domain. Additionally, the size imbalance between the tampered and background regions further complicates the detection process. To address these challenges, we propose DFFormer, an end-to-end network that leverages frequency feature differences and a dynamic token strategy for precise manipulation localization. DFFormer combines a Convolutional Neural Network (CNN) and a Transformer in a hybrid architecture with three key modules: the Adaptive Frequency Transformer (AFT), the Prototype Learning Module (PLM), and the Cascaded Progressive Token Fusion Head (CPTF-Head). AFT integrates high- and low-frequency components into self-attention via the Parallel Adaptive Frequency Attention (PAFA) block, enhancing tampering feature representation while preserving fine details. PLM employs KNN-based density peak clustering (DPC-KNN) and weighted token aggregation to optimize dynamic token reduction. The CPTF-Head adopts a hierarchical coarse-to-fine strategy to integrate multiscale features, thereby improving localization accuracy and edge refinement. Experiments demonstrate that DFFormer outperforms state-of-the-art models across four benchmark datasets and one real-world dataset, exhibiting superior generalization and robustness.
KW - Image forensics
KW - adaptive frequency attention
KW - image manipulation localization
KW - prototype learning
KW - token aggregation
UR - https://www.scopus.com/pages/publications/105013986649
U2 - 10.1109/TCSVT.2025.3601659
DO - 10.1109/TCSVT.2025.3601659
M3 - Article
AN - SCOPUS:105013986649
SN - 1051-8215
VL - 36
SP - 1907
EP - 1919
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 2
ER -