MuDI: Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models

*Indicates Equal Contribution, Indicates Equal Advising

MuDI can personalize a text-to-image model to generate images of multiple subjects without identity mixing.

[My dog] and friends together!



Text-to-image diffusion models have shown remarkable success in generating personalized subjects from a few reference images. However, current methods often fail when generating multiple subjects simultaneously, producing mixed identities that combine attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling the identities of multiple subjects. Our main idea is to use subjects segmented by a foundation model for segmentation (Segment Anything) in both training and inference: as a form of data augmentation during training and as an initialization for the generation process. We also introduce a new metric to better evaluate the performance of our method on multi-subject personalization. Experimental results show that MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects. Specifically, in human evaluation, MuDI obtains twice the success rate of existing baselines for personalizing multiple subjects without identity mixing, and is preferred in over 70% of comparisons against the strongest baseline.
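To make the augmentation idea concrete, the following is a minimal sketch of composing pre-segmented subjects onto a single canvas. It is an illustration under stated assumptions, not the MuDI implementation: the function names `paste_segment` and `compose_segments`, the side-by-side slot layout, and the plain white background are all hypothetical, and the binary masks are assumed to come from a segmentation model such as Segment Anything.

```python
import numpy as np


def paste_segment(canvas, image, mask, top, left):
    """Copy the masked pixels of `image` onto `canvas` at (top, left), in place."""
    h, w = mask.shape
    region = canvas[top:top + h, left:left + w]  # view into the canvas
    region[mask] = image[mask]                   # only subject pixels are written
    return canvas


def compose_segments(subjects, canvas_hw=(64, 128), background=255, rng=None):
    """Place each (image, mask) pair in its own horizontal slot at a random offset.

    Assumption for illustration: one slot per subject, so subjects never overlap.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = canvas_hw
    canvas = np.full((height, width, 3), background, dtype=np.uint8)
    slot_w = width // len(subjects)
    for i, (image, mask) in enumerate(subjects):
        h, w = mask.shape
        if h > height or w > slot_w:
            raise ValueError("subject does not fit in its slot")
        top = int(rng.integers(0, height - h + 1))
        left = i * slot_w + int(rng.integers(0, slot_w - w + 1))
        paste_segment(canvas, image, mask.astype(bool), top, left)
    return canvas


if __name__ == "__main__":
    # Two toy "subjects": a solid red square and a smaller blue square cut-out.
    red = np.zeros((16, 16, 3), dtype=np.uint8)
    red[..., 0] = 200
    blue = np.zeros((16, 16, 3), dtype=np.uint8)
    blue[..., 2] = 200
    full_mask = np.ones((16, 16), dtype=bool)
    inner_mask = np.zeros((16, 16), dtype=bool)
    inner_mask[4:12, 4:12] = True
    out = compose_segments([(red, full_mask), (blue, inner_mask)])
    print(out.shape)
```

Keeping each subject in a separate slot is only one simple way to guarantee that the cut-outs stay spatially distinct; the abstract does not specify how the segmented subjects are arranged.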

Other Examples from Sora [link]

Visual Comparison to Existing Methods




        title={Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models}, 
        author={Sangwon Jang and Jaehyeong Jo and Kimin Lee and Sung Ju Hwang},