Authors
Yuting Xue¹, Heng Zhou¹˒², Yuxuan Ding¹*, Xiao Shan¹
¹Xidian University, China; ²Institute of Systems Engineering, China
Abstract
Text-to-image synthesis produces cross-modal data with consistent content by mining the semantic consistency shared between textual and visual information. Because of the gap between the two modalities, the task faces many difficulties and challenges. In this paper, we propose to boost text-to-image synthesis through adaptive learning and generating generative adversarial networks (ALG-GANs). First, we propose an adaptive forgetting mechanism in the generator that reduces error accumulation and learns knowledge flexibly within the cascade structure. In addition, to avoid the mode collapse caused by strongly biased supervision, we propose a multi-task discriminator that uses weak-supervision information to guide the generator more comprehensively and to maintain semantic consistency throughout the cascade generation process. To avoid the refinement difficulty caused by poor initializations, we assess the quality of each initialization before further processing: the generator re-samples the noise and re-initializes bad initializations to obtain good ones. All of these contributions are integrated into a unified framework: an adaptive forgetting, drafting, and comprehensive-guiding text-to-image synthesis method built on hierarchical generative adversarial networks. The model is evaluated on the Caltech-UCSD Birds 200 (CUB) and Oxford 102 Category Flowers (Oxford) datasets with standard metrics. The results on Inception Score (IS) and Fréchet Inception Distance (FID) show that our model outperforms previous methods.
Keywords
Text-to-Image Synthesis, Generative Adversarial Network, Forgetting Mechanism, Semantic Consistency.