An Investigation of CNN-CARU for Image Captioning

Research output: Chapter › peer-reviewed

1 Citation (Scopus)

Abstract

The goal of image captioning is to extract the essential information from an image and describe its content. Such a description can be obtained directly from a human-understandable caption of a similar image (a retrieval-based approach that names the objects and their actions) or produced by an encoder–decoder neural network. The challenge for the learning model is that it must project the visual features into natural language, i.e., produce the description in a different feature domain, and it may therefore misidentify scene or semantic elements. In this chapter, we attempt to address these challenges by introducing a novel image captioning framework that combines generation and retrieval. A CNN-CARU model is introduced, in which the image is first encoded by a CNN-based network, and multiple captions for the target image are then generated by a CARU-based RNN.
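The abstract only names the components of this pipeline; below is a minimal sketch, in PyTorch, of how a CNN encoder could feed a recurrent caption decoder. Because the chapter's CARU cell is not specified here, nn.GRUCell is used as a stand-in, and all layer sizes, module names, and the greedy decoding loop are illustrative assumptions rather than the authors' implementation.

    # Hypothetical sketch of a CNN encoder + recurrent caption decoder.
    # nn.GRUCell stands in for the CARU cell; sizes/names are assumptions.
    import torch
    import torch.nn as nn

    class CNNEncoder(nn.Module):
        """Encode an image into a fixed-length feature vector."""
        def __init__(self, feat_dim=256):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.proj = nn.Linear(64, feat_dim)

        def forward(self, images):                 # images: (B, 3, H, W)
            x = self.conv(images).flatten(1)       # (B, 64)
            return self.proj(x)                    # (B, feat_dim)

    class RecurrentDecoder(nn.Module):
        """Generate a caption token-by-token from the image feature."""
        def __init__(self, vocab_size, feat_dim=256, hid_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hid_dim)
            self.cell = nn.GRUCell(hid_dim, hid_dim)    # stand-in for CARU
            self.init_h = nn.Linear(feat_dim, hid_dim)  # feature -> initial state
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, feats, max_len=20, bos_id=1):
            h = torch.tanh(self.init_h(feats))          # (B, hid_dim)
            tok = torch.full((feats.size(0),), bos_id, dtype=torch.long)
            logits = []
            for _ in range(max_len):                    # greedy decoding loop
                h = self.cell(self.embed(tok), h)
                step = self.out(h)
                logits.append(step)
                tok = step.argmax(dim=-1)
            return torch.stack(logits, dim=1)           # (B, max_len, vocab_size)

    if __name__ == "__main__":
        enc, dec = CNNEncoder(), RecurrentDecoder(vocab_size=1000)
        feats = enc(torch.randn(2, 3, 224, 224))
        print(dec(feats).shape)                         # torch.Size([2, 20, 1000])

In practice the decoder would be trained with teacher forcing on ground-truth captions, and the retrieval component described in the abstract would supply candidate captions alongside the generated ones; neither step is shown in this sketch.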

Original language: English
Title of host publication: Signals and Communication Technology
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 15-23
Number of pages: 9
DOIs
Publication status: Published - 2024

Publication series

Name: Signals and Communication Technology
Part F1810
ISSN (Print): 1860-4862
ISSN (Electronic): 1860-4870
