E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization

13 February 2025
Trung X. Pham
Zhang Kang
Ji Woo Hong
Xuran Zheng
Chang D. Yoo
Abstract

We propose E-MD3C (Efficient Masked Diffusion Transformer with Disentangled Conditions and Compact Collector), a highly efficient framework for zero-shot object image customization. Unlike prior works reliant on resource-intensive UNet architectures, our approach employs lightweight masked diffusion transformers operating on latent patches, offering significantly improved computational efficiency. The framework integrates three core components: (1) an efficient masked diffusion transformer for processing autoencoder latents, (2) a disentangled condition design that ensures compactness while preserving background alignment and fine details, and (3) a learnable Conditions Collector that consolidates multiple inputs into a compact representation for efficient denoising and learning. E-MD3C outperforms the existing approach on the VITON-HD dataset across metrics such as PSNR, FID, SSIM, and LPIPS, demonstrating clear advantages in parameters, memory efficiency, and inference speed. With only $\frac{1}{4}$ of the parameters, our Transformer-based 468M model delivers $2.5\times$ faster inference and uses $\frac{2}{3}$ of the GPU memory compared to a 1720M UNet-based latent diffusion model.
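The abstract does not detail how the Conditions Collector compresses multiple condition inputs, but a common way to consolidate many tokens into a compact set is cross-attention pooling, where a few learnable query vectors attend over all condition tokens. The sketch below illustrates that general idea only; the function and variable names are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def collect_conditions(cond_tokens, queries):
    """Cross-attention pooling (illustrative, not the paper's code).

    A small set of M learnable query vectors attends over N condition
    tokens and returns a compact summary, with M << N.
    Shapes: cond_tokens (N, d), queries (M, d) -> output (M, d).
    """
    d = queries.shape[-1]
    scores = queries @ cond_tokens.T / np.sqrt(d)  # (M, N) similarity
    attn = softmax(scores, axis=-1)                # rows sum to 1
    return attn @ cond_tokens                      # (M, d) compact summary

# Toy example: 256 condition tokens compressed to 8.
rng = np.random.default_rng(0)
cond = rng.standard_normal((256, 64))
q = rng.standard_normal((8, 64))
compact = collect_conditions(cond, q)
print(compact.shape)  # (8, 64)
```

The denoiser would then cross-attend only to the 8 pooled tokens rather than all 256 condition tokens, which is one way such a design can cut memory and speed up inference.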

@article{pham2025_2502.09164,
  title={E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization},
  author={Trung X. Pham and Zhang Kang and Ji Woo Hong and Xuran Zheng and Chang D. Yoo},
  journal={arXiv preprint arXiv:2502.09164},
  year={2025}
}