MCGM: Mask Conditional Text-to-Image Generative Model

Authors

Rami Skaik, Leonardo Rossi, Tomaso Fontanini and Andrea Prati, University of Parma, Italy

Abstract

Recent advancements in generative models have revolutionized the field of artificial intelligence, enabling the creation of highly realistic and detailed images. In this study, we propose a novel Mask Conditional Text-to-Image Generative Model (MCGM) that leverages the power of conditional diffusion models to generate images with specific poses. Our model builds upon the success of the Break-a-scene [1] model in generating new scenes from a single image with multiple subjects, and incorporates a mask embedding injection that conditions the generation process on a target pose. By introducing this additional level of control, MCGM offers a flexible and intuitive approach for generating specific poses for one or more subjects learned from a single image, empowering users to influence the output based on their requirements. Through extensive experimentation and evaluation, we demonstrate the effectiveness of the proposed model in generating high-quality images that meet predefined mask conditions, improving upon the existing Break-a-scene generative model.
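The core idea of mask embedding injection can be sketched at a high level: a binary mask describing the target pose is encoded into an embedding vector and appended to the text-conditioning tokens that a diffusion model attends to. The snippet below is a minimal illustrative sketch, not the authors' implementation; the patch size, embedding dimension, random projection, and token counts are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_mask(mask: np.ndarray, dim: int) -> np.ndarray:
    """Downsample a binary mask into 8x8 patches and project it to `dim`
    with a random linear map (a stand-in for a learned mask encoder)."""
    h, w = mask.shape
    patches = mask.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))  # (8, 8)
    W = rng.standard_normal((patches.size, dim)) / np.sqrt(patches.size)
    return patches.reshape(-1) @ W  # (dim,)

def inject_condition(text_emb: np.ndarray, mask_emb: np.ndarray) -> np.ndarray:
    """Append the mask embedding as one extra conditioning token, so the
    denoiser's cross-attention sees both text and pose information."""
    return np.vstack([text_emb, mask_emb[None, :]])

# Hypothetical shapes: 77 text tokens of width 64, a 64x64 pose mask.
text_emb = rng.standard_normal((77, 64))
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0  # subject occupies the central region
cond = inject_condition(text_emb, embed_mask(mask, 64))
print(cond.shape)  # (78, 64): original tokens plus one mask token
```

In an actual conditional diffusion model the random projection would be replaced by a trained encoder, and the augmented token sequence would feed the cross-attention layers of the denoising network.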

Keywords

Fine-tuning, Diffusion Models, Generative Models, Mask Condition.

Full Text  Volume 14, Number 18