TY - GEN
T1 - Enhancing Efficiency and Quality of Image Caption Generation with CARU
AU - Huang, Xuefei
AU - Ke, Wei
AU - Sheng, Hao
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - An image caption is a textual explanation automatically generated by a computer from the content of an image. The task involves both image processing and natural language processing, and has thus become an important research topic in pattern recognition. Deep learning has been successful in accomplishing this task, and the quality of captions generated by existing methods is already high. However, given the breadth and variety of image captioning applications, currently generated captions are still not sufficiently detailed, and training efficiency can also be improved. Therefore, under the deep learning encoder-decoder framework, using fewer parameters to improve training efficiency while retaining the quality of the generated image descriptions is a major challenge. In this work, we introduce an improved method based on the encoder-decoder structure, adding an attention mechanism and applying the content-adaptive recurrent unit (CARU) as the decoder to generate image captions. Inspired by the GRU, CARU is designed to achieve comparable performance with fewer parameters and to be sensitive to the features in hidden layers. Experimental results on the MS COCO dataset show that the proposed method achieves better performance than using the GRU as the decoder and requires less training time, effectively improving training efficiency.
AB - An image caption is a textual explanation automatically generated by a computer from the content of an image. The task involves both image processing and natural language processing, and has thus become an important research topic in pattern recognition. Deep learning has been successful in accomplishing this task, and the quality of captions generated by existing methods is already high. However, given the breadth and variety of image captioning applications, currently generated captions are still not sufficiently detailed, and training efficiency can also be improved. Therefore, under the deep learning encoder-decoder framework, using fewer parameters to improve training efficiency while retaining the quality of the generated image descriptions is a major challenge. In this work, we introduce an improved method based on the encoder-decoder structure, adding an attention mechanism and applying the content-adaptive recurrent unit (CARU) as the decoder to generate image captions. Inspired by the GRU, CARU is designed to achieve comparable performance with fewer parameters and to be sensitive to the features in hidden layers. Experimental results on the MS COCO dataset show that the proposed method achieves better performance than using the GRU as the decoder and requires less training time, effectively improving training efficiency.
KW - Content adaptive recurrent unit
KW - Deep learning
KW - Feature extraction
KW - Image caption generation
KW - MS COCO
KW - Training efficiency
UR - http://www.scopus.com/inward/record.url?scp=85142872506&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-19214-2_38
DO - 10.1007/978-3-031-19214-2_38
M3 - Conference contribution
AN - SCOPUS:85142872506
SN - 9783031192135
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 450
EP - 459
BT - Wireless Algorithms, Systems, and Applications - 17th International Conference, WASA 2022, Proceedings
A2 - Wang, Lei
A2 - Segal, Michael
A2 - Chen, Jenhui
A2 - Qiu, Tie
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th International Conference on Wireless Algorithms, Systems, and Applications, WASA 2022
Y2 - 24 November 2022 through 26 November 2022
ER -