
GTFPose: a unified framework with double-chain GCN–transformer fusion for 3D human pose estimation

Research output: Article › peer-reviewed

Abstract

Monocular 3D human pose estimation faces numerous challenges, including depth ambiguity, self-occlusion, and large pose variability. Existing methods typically rely on Graph Convolutional Networks (GCNs) to model local structure or employ Transformers to capture global relationships, yet both approaches suffer from fundamental limitations: GCNs struggle to capture global information, while Transformers are weak at extracting local details. To address these shortcomings and combine their strengths, this study proposes GTFPose, a unified dual-chain architecture. Through an adaptive fusion mechanism, it dynamically balances the GCN and Transformer branches, leveraging the advantages of both models to ensure efficient modeling of local and global contexts. We further observe that Transformers cannot effectively encode both spatial and temporal information through standard positional encodings. To address this, we introduce TJ-RoPE, a novel method that enhances long-term spatiotemporal reasoning by rotating positional embeddings along both the joint and temporal axes. Comprehensive evaluations on the Human3.6M and MPI-INF-3DHP datasets demonstrate that GTFPose surpasses existing methods on the MPJPE and P-MPJPE metrics, setting new records and validating the effectiveness of the dual-chain fusion strategy for accurate and efficient 3D human pose estimation. Our code is available at: https://github.com/pray0915/GTFPose.git.
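The abstract describes TJ-RoPE as rotating positional embeddings along both the joint and temporal axes. A minimal NumPy sketch of one plausible realization is shown below: the channel dimension is split in half, with standard rotary-embedding rotations driven by the frame index on one half and the joint index on the other. The function names, the half-and-half channel split, and the base frequency of 10000 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply a rotary position embedding along the last dim of x.

    x:   (..., d) features with d even, rotated in 2-D pairs.
    pos: integer position indices, broadcastable to x's leading dims.
    """
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # (half,) per-pair frequencies
    angles = pos[..., None] * freqs             # (..., half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation of each (x1_i, x2_i) pair; preserves per-pair norms.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

def tj_rope(x):
    """Hypothetical TJ-RoPE-style encoding for x of shape (T, J, d).

    One half of the channels is rotated by the frame (time) index,
    the other half by the joint index, so attention scores become
    sensitive to relative offsets along both axes.
    """
    T, J, d = x.shape
    t_idx = np.broadcast_to(np.arange(T)[:, None], (T, J))  # time positions
    j_idx = np.broadcast_to(np.arange(J)[None, :], (T, J))  # joint positions
    xt, xj = x[..., :d // 2], x[..., d // 2:]
    return np.concatenate([rope_rotate(xt, t_idx),
                           rope_rotate(xj, j_idx)], axis=-1)

feat = np.random.randn(9, 17, 32)  # 9 frames, 17 joints, 32-dim features
out = tj_rope(feat)
print(out.shape)  # (9, 17, 32)
```

Because each rotation is orthogonal, the encoding leaves feature norms unchanged, which is the usual motivation for RoPE-style schemes over additive positional embeddings.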

Original language: English
Article number: 210
Journal: Visual Computer
Volume: 42
Issue number: 5
DOIs
Publication status: Published - Mar 2026

