Enhancing Efficiency and Quality of Image Caption Generation with CARU

Xuefei Huang, Wei Ke, Hao Sheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Image captioning is the automatic generation, by a computer, of a textual description of the content of an image. It involves both image processing and natural language processing, and has thus become an important research topic in pattern recognition. Deep learning has been successful in accomplishing this task, and the quality of captions generated by existing methods is already high. However, given the breadth and variety of image captioning applications, the generated captions are still not sufficiently detailed, and training efficiency can also be improved. Therefore, under the encoder-decoder framework of deep learning, using fewer parameters to improve training efficiency while retaining the quality of the generated image descriptions is a major challenge. In this work, we introduce an improved method based on the encoder-decoder structure that adds an attention mechanism and applies the content-adaptive recurrent unit (CARU) as the decoder to generate image captions. Inspired by the GRU, CARU is designed to achieve comparable performance with fewer parameters, and is sensitive to the features in hidden layers. Experimental results on the MsCOCO dataset show that the proposed method achieves better performance than the same approach with a GRU decoder and takes less training time, effectively improving training efficiency.

Original language: English
Title of host publication: Wireless Algorithms, Systems, and Applications - 17th International Conference, WASA 2022, Proceedings
Editors: Lei Wang, Michael Segal, Jenhui Chen, Tie Qiu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 450-459
Number of pages: 10
ISBN (Print): 9783031192135
DOIs
Publication status: Published - 2022
Event: 17th International Conference on Wireless Algorithms, Systems, and Applications, WASA 2022 - Dalian, China
Duration: 24 Nov 2022 to 26 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13472 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Wireless Algorithms, Systems, and Applications, WASA 2022
Country/Territory: China
City: Dalian
Period: 24/11/22 to 26/11/22

Keywords

  • Content adaptive recurrent unit
  • Deep learning
  • Feature extraction
  • Image caption generation
  • Training efficiency
