Abstract
Video depth estimation has a wide range of applications, especially in robot navigation and autonomous driving. RNN-based encoder-decoder architectures are the most commonly used methods for depth feature prediction, but recurrent operators struggle to capture large-scale global context and suffer from the long-term dependency problem, which often leads to inaccurate object depth prediction in complex scenes. To alleviate these issues, this work introduces an attention-based texture-temporal Content-Adaptive Recurrent Unit (CARU) for depth estimation. The CARU performs an enhanced RNN approach that covers the texture content of a video sequence and extracts the main features from temporal frames. In addition, a combination of VGGreNet and a Transformer refines the extraction of global and local features to facilitate depth map estimation. To improve the detection of moving objects in the depth map, an advanced loss function is introduced that further penalises the depth estimation error on moving objects. Experiments on the KITTI dataset show that this work achieves competitive performance in depth estimation, demonstrating strong results and applicability to a variety of real-time scenarios.
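The abstract describes CARU as a content-adaptive recurrent cell that weights each frame's contribution by its texture content. The exact formulation is given in the paper itself; the sketch below is only a minimal GRU-style illustration of the general idea, assuming an additional content gate driven by the per-frame feature vector. The class name `ContentAdaptiveRecurrentUnit` and the dimensions used are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class ContentAdaptiveRecurrentUnit(nn.Module):
    """Illustrative GRU-style cell with a content-adaptive gate.

    NOTE: a simplified sketch, not the authors' exact CARU equations.
    Frames whose features are judged more informative update the hidden
    state more strongly.
    """

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.linear_x = nn.Linear(input_dim, hidden_dim)       # project frame features
        self.linear_h = nn.Linear(hidden_dim, hidden_dim)      # project previous state
        self.update_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.content_gate = nn.Linear(input_dim, hidden_dim)   # content-adaptive weighting

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # Candidate state mixes the current frame with the previous hidden state.
        candidate = torch.tanh(self.linear_x(x) + self.linear_h(h_prev))
        # Standard update gate computed from the concatenated input and state.
        z = torch.sigmoid(self.update_gate(torch.cat([x, h_prev], dim=-1)))
        # Content gate scales the update by how informative the current frame is.
        c = torch.sigmoid(self.content_gate(x))
        z = z * c
        return (1.0 - z) * h_prev + z * candidate


if __name__ == "__main__":
    cell = ContentAdaptiveRecurrentUnit(input_dim=256, hidden_dim=256)
    h = torch.zeros(4, 256)                      # batch of 4 hidden states
    for x in torch.randn(10, 4, 256):            # 10 temporal frame features
        h = cell(x, h)
    print(h.shape)  # torch.Size([4, 256])
```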
| Original language | English |
| --- | --- |
| Pages (from-to) | 107994-108004 |
| Number of pages | 11 |
| Journal | IEEE Access |
| Volume | 13 |
| DOIs | |
| Publication status | Published - 2025 |
Keywords
- CARU
- VGGreNet
- attention mechanism
- depth estimation