Image Content Generation with Causal Reasoning

Xiaochuan Li, Baoyu Fan, Runze Zhang, Liang Jin, Di Wang, Zhenhua Guo, Yaqian Zhao, Rengang Li

Research output: Conference article, peer-reviewed

1 citation (Scopus)

Abstract

The emergence of ChatGPT has once again sparked research in generative artificial intelligence (GAI). While people have been amazed by the generated results, they have also noticed the reasoning potential reflected in the generated textual content. However, this current ability for causal reasoning is primarily limited to the domain of language generation, such as in models like GPT-3. In visual modality, there is currently no equivalent research. Considering causal reasoning in visual content generation is significant. This is because visual information contains infinite granularity. Particularly, images can provide more intuitive and specific demonstrations for certain reasoning tasks, especially when compared to coarse-grained text. Hence, we propose a new image generation task called visual question answering with image (VQAI) and establish a dataset of the same name based on the classic Tom and Jerry animated series. Additionally, we develop a new paradigm for image generation to tackle the challenges of this task. Finally, we perform extensive experiments and analyses, including visualizations of the generated content and discussions on the potentials and limitations. The code and data are publicly available under the license of CC BY-NC-SA 4.0 for academic and non-commercial usage at: https://github.com/IEIT-AGI/MIX-Shannon/blob/main/projects/VQAI/lgd vqai.md.

Original language: English
Pages (from-to): 13646-13654
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 38
Issue number: 12
Publication status: Published - 25 Mar 2024
Externally published
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 to 27 Feb 2024
