MuDI: Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models

1KAIST, 2DeepAuto.ai
*Indicates Equal Contribution, †Indicates Equal Advising

MuDI can personalize a text-to-image model to generate images of multiple subjects without identity mixing.

[My dog] and friends together!


Abstract

Text-to-image diffusion models have shown remarkable success in generating personalized subjects from a few reference images. However, current methods struggle to handle multiple subjects simultaneously, often producing mixed identities that combine attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling the identities of multiple subjects. Our main idea is to use subjects segmented by the Segment Anything Model as a form of data augmentation when personalizing text-to-image models. We also leverage the segmented subjects to initialize the generation process. Our experiments demonstrate that MuDI produces high-quality personalized images without identity mixing, even for highly similar subjects. In human evaluation, MuDI achieves twice the success rate of existing baselines for personalizing multiple subjects without identity mixing, and is preferred in over 70% of comparisons against the strongest baseline.
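To make the augmentation idea concrete, the sketch below shows a Seg-Mix-style composition step in Python: subject cutouts segmented by the Segment Anything Model are pasted onto a shared canvas at randomized, non-overlapping positions and scales, yielding training images in which each subject stays visually separate. This is a minimal illustration only; the function names, canvas size, scale range, and non-overlap retry logic are assumptions for the sketch, not MuDI's exact implementation.

```python
# Sketch of composing SAM-segmented subject cutouts into one training image.
# Scale range, canvas size, and retry logic are illustrative assumptions.
import random
from PIL import Image


def compose_segmented_subjects(cutouts, canvas_size=(1024, 1024),
                               scale_range=(0.4, 0.8), max_tries=50):
    """Paste RGBA subject cutouts at random positions/scales without overlap."""
    canvas = Image.new("RGB", canvas_size, (255, 255, 255))
    placed_boxes = []
    for cutout in cutouts:
        for _ in range(max_tries):
            scale = random.uniform(*scale_range)
            w = min(int(cutout.width * scale), canvas_size[0])
            h = min(int(cutout.height * scale), canvas_size[1])
            x = random.randint(0, canvas_size[0] - w)
            y = random.randint(0, canvas_size[1] - h)
            box = (x, y, x + w, y + h)
            if all(not _overlaps(box, other) for other in placed_boxes):
                resized = cutout.resize((w, h))
                # The alpha channel of the cutout is used as the paste mask,
                # so only the segmented subject lands on the canvas.
                canvas.paste(resized, (x, y), mask=resized)
                placed_boxes.append(box)
                break
    return canvas


def _overlaps(a, b):
    """Axis-aligned bounding-box overlap test for two (x0, y0, x1, y1) boxes."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])


# Usage (hypothetical file names): each cutout is an RGBA image of one subject,
# e.g. obtained by running the Segment Anything Model on a reference photo.
# augmented = compose_segmented_subjects(
#     [Image.open("dog_A.png").convert("RGBA"), Image.open("dog_B.png").convert("RGBA")]
# )
# augmented.save("segmix_sample.png")
```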

Other Examples from Sora [link]

Visual Comparison to Existing Methods

Methods


BibTeX

@misc{jang2024identity,
      title={Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models},
      author={Sangwon Jang and Jaehyeong Jo and Kimin Lee and Sung Ju Hwang},
      year={2024},
      eprint={2404.04243},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}