Enhancing Efficiency and Quality of Image Caption Generation with CARU

Xuefei Huang, Wei Ke, Hao Sheng

Research output: Conference contribution, peer-reviewed

4 Citations (Scopus)


An image caption is a textual explanation automatically generated by a computer according to the content of an image. It involves both image processing and natural language processing, and has thus become an important research topic in pattern recognition. Deep learning has been successful in accomplishing this task, and the quality of captions generated by existing methods is already high. However, given the breadth and variety of image captioning applications, the captions currently generated are still not sufficiently detailed, and training efficiency can also be improved. Therefore, under the encoder-decoder framework of deep learning, using fewer parameters to improve training efficiency while retaining the quality of the generated image descriptions is a major challenge. In this work, we introduce an improved method based on the encoder-decoder structure that adds an attention mechanism and applies the content-adaptive recurrent unit (CARU) as the decoder to generate image captions. Inspired by the GRU, CARU is designed to achieve comparable performance with fewer parameters, and is sensitive to the features in hidden layers. Experimental results on the MS COCO dataset show that the proposed method achieves better performance than the same method using a GRU as the decoder and takes less training time, effectively improving training efficiency.
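The abstract does not state CARU's update equations, so the following is only a rough sketch of a CARU-style recurrent step, written from a general recollection of the published CARU formulation and to be checked against the paper before any use. All names here (`init_caru`, `caru_step`, `count_params`, `gru_param_count`) are hypothetical, the weights are random, and the GRU count assumes the common layout of three gates with two bias vectors each. The attention mechanism and image encoder are not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(n_out, n_in, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    s = 1.0 / math.sqrt(n_out)
    W = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def apply(layer, x):
    """Compute W @ x + b with plain Python lists."""
    W, b = layer
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_caru(input_size, hidden_size, seed=0):
    # Assumption: four linear maps (two on the input, two on the hidden state).
    rng = random.Random(seed)
    return {
        "vn": linear(hidden_size, input_size, rng),
        "hn": linear(hidden_size, hidden_size, rng),
        "vz": linear(hidden_size, input_size, rng),
        "hz": linear(hidden_size, hidden_size, rng),
    }


def caru_step(p, v, h):
    """One recurrent step: input feature v, previous hidden state h."""
    x = apply(p["vn"], v)    # projected input feature
    hn = apply(p["hn"], h)
    vz = apply(p["vz"], v)
    hz = apply(p["hz"], h)
    new_h = []
    for x_i, hn_i, vz_i, hz_i, h_i in zip(x, hn, vz, hz, h):
        n = math.tanh(hn_i + x_i)    # candidate hidden state
        z = sigmoid(hz_i + vz_i)     # update gate
        l = sigmoid(x_i) * z         # gate weighted by the input content
        new_h.append((1.0 - l) * h_i + l * n)
    return new_h


def count_params(p):
    return sum(len(W) * len(W[0]) + len(b) for W, b in p.values())


def gru_param_count(input_size, hidden_size):
    # Assumed GRU layout: three gates, each with input weights,
    # hidden weights, and two bias vectors.
    return 3 * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)


if __name__ == "__main__":
    params = init_caru(input_size=8, hidden_size=16)
    h = caru_step(params, [1.0] * 8, [0.0] * 16)
    print(len(h), count_params(params), gru_param_count(8, 16))
```

Under these assumptions the cell uses four linear maps where a GRU uses six, so for equal input and hidden sizes the recurrent cell carries two thirds of the GRU's parameters. This illustrates the "fewer parameters" claim only; it says nothing about the reported caption quality or training time.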

Title of host publication: Wireless Algorithms, Systems, and Applications - 17th International Conference, WASA 2022, Proceedings
Editors: Lei Wang, Michael Segal, Jenhui Chen, Tie Qiu
Publisher: Springer Science and Business Media Deutschland GmbH
Publication status: Published - 2022
Event: 17th International Conference on Wireless Algorithms, Systems, and Applications, WASA 2022 - Dalian, China
Duration: 24 Nov 2022 to 26 Nov 2022


Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13472 LNCS


Conference: 17th International Conference on Wireless Algorithms, Systems, and Applications, WASA 2022

