arXiv:2312.14988
Emage: Non-Autoregressive Text-to-Image Generation

22 December 2023
Zhangyin Feng
Runyi Hu
Liangxin Liu
Fan Zhang
Duyu Tang
Yong Dai
Xiaocheng Feng
Jiwei Li
Bing Qin
Shuming Shi
Abstract

Autoregressive and diffusion models drive the recent breakthroughs in text-to-image generation. Despite their success in generating highly realistic images, a common shortcoming of these models is high inference latency: autoregressive models run more than a thousand successive steps to produce image tokens, and diffusion models convert Gaussian noise into images over many hundreds of denoising steps. In this work, we explore non-autoregressive text-to-image models that efficiently generate hundreds of image tokens in parallel. We develop many model variants with different learning and inference strategies, initialized text encoders, etc. Compared with autoregressive baselines that need to run one thousand times, our model runs only 16 times to generate images of competitive quality, with an order of magnitude lower inference latency. Our non-autoregressive model with 346M parameters generates a 256×256 image in about one second on one V100 GPU.
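The 16-step parallel generation described in the abstract can be illustrated with a mask-and-predict style decoding loop: all image-token positions start masked, every step the model predicts all positions at once, and only the most confident predictions are kept while the rest are re-masked for the next step. The sketch below is an assumption-laden illustration of that general scheme, not the paper's actual method; `dummy_model`, the cosine keep schedule, and all constants are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_TOKENS = 256   # image tokens decoded in parallel (e.g., a 16x16 grid)
VOCAB_SIZE = 1024  # size of the discrete image-token codebook (assumed)
NUM_STEPS = 16     # fixed number of parallel refinement steps
MASK_ID = -1       # placeholder id for still-masked positions

def dummy_model(tokens):
    """Hypothetical stand-in for a non-autoregressive transformer:
    returns per-position logits over the codebook. A real model would
    also condition on the text prompt and the unmasked tokens."""
    return rng.normal(size=(len(tokens), VOCAB_SIZE))

def parallel_decode(num_tokens=NUM_TOKENS, num_steps=NUM_STEPS):
    # Start with every position masked.
    tokens = np.full(num_tokens, MASK_ID)
    for step in range(num_steps):
        logits = dummy_model(tokens)
        # Softmax to get per-position confidences.
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        preds = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        # Cosine schedule (assumed): keep more tokens as steps progress;
        # at the final step, every position is kept.
        keep_frac = np.cos(np.pi / 2 * (1 - (step + 1) / num_steps))
        n_keep = max(1, int(keep_frac * num_tokens))
        threshold = np.sort(conf)[-n_keep]
        # Keep confident predictions, re-mask the rest for the next step.
        tokens = np.where(conf >= threshold, preds, MASK_ID)
    return tokens
```

The key property is that the model is invoked `NUM_STEPS` times regardless of image size, rather than once per token as in autoregressive decoding, which is where the order-of-magnitude latency reduction comes from.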
