TY - GEN
T1 - Single Cross-domain Semantic Guidance Network for Multimodal Unsupervised Image Translation
AU - Lan, Jiaying
AU - Cheng, Lianglun
AU - Huang, Guoheng
AU - Pun, Chi-Man
AU - Yuan, Xiaochen
AU - Lai, Shangyu
AU - Liu, Hong Rui
AU - Ling, Wing-Kuen
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Multimodal image-to-image translation has received considerable attention due to its flexibility and practicality. Existing methods lack a general and effective style representation and cannot capture different levels of stylistic semantic information from cross-domain images. Moreover, they ignore parallelism in cross-domain image generation, so each generator can serve only specific domains. To address these issues, we propose a novel Single Cross-domain Semantic Guidance Network (SCSG-Net) for coarse-to-fine, semantically controllable multimodal image translation. Images from different domains are mapped into a unified visual semantic latent space by a dual sparse feature pyramid encoder; the generative module then produces the output images by extracting a semantic style representation from the input images in a self-supervised manner, guided by adaptive discrimination. In particular, SCSG-Net meets users' needs for different styles as well as diverse scenarios. Extensive experiments on several benchmark datasets show that our method outperforms other state-of-the-art methods both quantitatively and qualitatively.
KW - Multimodal image translation
KW - Semantic guidance
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85152572411&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-27077-2_13
DO - 10.1007/978-3-031-27077-2_13
M3 - Conference contribution
AN - SCOPUS:85152572411
SN - 9783031270765
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 165
EP - 177
BT - MultiMedia Modeling - 29th International Conference, MMM 2023, Proceedings
A2 - Dang-Nguyen, Duc-Tien
A2 - Gurrin, Cathal
A2 - Smeaton, Alan F.
A2 - Larson, Martha
A2 - Rudinac, Stevan
A2 - Dao, Minh-Son
A2 - Trattner, Christoph
A2 - Chen, Phoebe
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on MultiMedia Modeling, MMM 2023
Y2 - 9 January 2023 through 12 January 2023
ER -