Attention-CARU With Texture-Temporal Network for Video Depth Estimation

Research output: Contribution to journal › Article › peer-review

Abstract

Video depth estimation has a wide range of applications, particularly in robot navigation and autonomous driving. RNN-based encoder-decoder architectures are the most commonly used methods for depth feature prediction, but recurrent operators struggle to capture large-scale global context and also suffer from the long-term dependency problem, which often leads to inaccurate depth prediction for objects in complex scenes. To alleviate these issues, this work introduces an attention-based texture-temporal Content-Adaptive Recurrent Unit (CARU) for depth estimation. CARU is an enhanced recurrent unit that captures the texture content of a video sequence and extracts the main features from temporal frames. In addition, a combination of VGGreNet and a Transformer refines the extraction of global and local features to facilitate depth map estimation. To improve the detection of moving objects in the depth map, an advanced loss function is introduced that further penalises the depth estimation error of moving objects. Experiments on the KITTI dataset show that this work achieves competitive performance in depth estimation, demonstrating strong results and applicability to a variety of real-time scenarios.
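The abstract does not give the loss formulation, so as a rough illustration of the idea of penalising moving-object depth error more heavily, the minimal PyTorch-style sketch below weights a per-pixel depth loss by a motion mask. The mask source, the weight `lambda_motion`, and the L1 base loss are all assumptions for illustration, not the paper's actual definition.

```python
import torch

def motion_weighted_depth_loss(pred, target, motion_mask, lambda_motion=2.0):
    """Per-pixel L1 depth loss with extra weight on moving-object pixels.

    pred, target: (B, 1, H, W) predicted and ground-truth depth maps
    motion_mask:  (B, 1, H, W) in [0, 1], near 1 where a pixel belongs to a
                  moving object (e.g. from optical flow or segmentation)
    lambda_motion: extra penalty factor for moving regions (assumed value;
                  the paper's weighting scheme is not specified here)
    """
    # Base per-pixel error (L1 chosen for illustration only).
    err = torch.abs(pred - target)
    # Up-weight the error inside moving-object regions so mistakes on
    # moving objects are penalised more than on the static background.
    weights = 1.0 + lambda_motion * motion_mask
    return (weights * err).mean()
```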

Original language: English
Pages (from-to): 107994-108004
Number of pages: 11
Journal: IEEE Access
Volume: 13
DOIs
Publication status: Published - 2025

Keywords

  • CARU
  • VGGreNet
  • attention mechanism
  • depth estimation
