Image Content Generation with Causal Reasoning

Xiaochuan Li, Baoyu Fan, Runze Zhang, Liang Jin, Di Wang, Zhenhua Guo, Yaqian Zhao, Rengang Li

Research output: Contribution to journalConference articlepeer-review


The emergence of ChatGPT has once again sparked research in generative artificial intelligence (GAI). While people have been amazed by the generated results, they have also noticed the reasoning potential reflected in the generated textual content. However, this current ability for causal reasoning is primarily limited to the domain of language generation, such as in models like GPT-3. In visual modality, there is currently no equivalent research. Considering causal reasoning in visual content generation is significant. This is because visual information contains infinite granularity. Particularly, images can provide more intuitive and specific demonstrations for certain reasoning tasks, especially when compared to coarse-grained text. Hence, we propose a new image generation task called visual question answering with image (VQAI) and establish a dataset of the same name based on the classic Tom and Jerry animated series. Additionally, we develop a new paradigm for image generation to tackle the challenges of this task. Finally, we perform extensive experiments and analyses, including visualizations of the generated content and discussions on the potentials and limitations. The code and data are publicly available under the license of CC BY-NC-SA 4.0 for academic and non-commercial usage at:

Original languageEnglish
Pages (from-to)13646-13654
Number of pages9
JournalProceedings of the AAAI Conference on Artificial Intelligence
Issue number12
Publication statusPublished - 25 Mar 2024
Externally publishedYes
Event38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 202427 Feb 2024


Dive into the research topics of 'Image Content Generation with Causal Reasoning'. Together they form a unique fingerprint.

Cite this