Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

10 March 2025
Lixue Gong
Xiaoxia Hou
Fanshi Li
Liang Li
Xiaochen Lian
Fei Liu
Liyang Liu
Wei Liu
Wei Lu
Yichun Shi
Shiqi Sun
Yu Tian
Zhi Tian
P. Wang
Xun Wang
Y. Wang
Guofeng Wu
Jie Wu
Xin Xia
Xuefeng Xiao
L. Yang
Zhonghua Zhai
X. Zhang
Qi Zhang
Yuwei Zhang
Shijia Zhao
Jianchao Yang
Weilin Huang
    DiffM
    VLM
Abstract

Rapid advancement of diffusion models has catalyzed remarkable progress in the field of image generation. However, prevalent models such as Flux, SD3.5, and Midjourney still grapple with issues like model bias, limited text rendering capabilities, and insufficient understanding of Chinese cultural nuances. To address these limitations, we present Seedream 2.0, a native Chinese-English bilingual image generation foundation model that excels across diverse dimensions. It adeptly handles text prompts in both Chinese and English, supporting bilingual image generation and text rendering. We develop a powerful data system that facilitates knowledge integration, and a caption system that balances accuracy and richness in image descriptions. In particular, Seedream is integrated with a self-developed bilingual large language model as a text encoder, allowing it to learn native knowledge directly from massive data. This enables it to generate high-fidelity images with accurate cultural nuances and aesthetic expressions described in either Chinese or English. In addition, Glyph-Aligned ByT5 is applied for flexible character-level text rendering, while a Scaled RoPE generalizes well to untrained resolutions. Multi-phase post-training optimizations, including SFT and RLHF iterations, further improve the overall capability. Through extensive experimentation, we demonstrate that Seedream 2.0 achieves state-of-the-art performance across multiple aspects, including prompt-following, aesthetics, text rendering, and structural correctness. Furthermore, Seedream 2.0 has been optimized through multiple RLHF iterations to closely align its output with human preferences, as revealed by its outstanding Elo score. It can also be readily adapted to an instruction-based image editing model, such as SeedEdit, with strong editing capability that balances instruction-following and image consistency.
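The abstract cites an Elo score as evidence of alignment with human preferences but does not say how it is computed. As a rough illustration only, the sketch below applies the standard Elo update to a single pairwise comparison between two models. The function names, the K-factor of 32, and the 1500 starting ratings are assumptions made for this example and are not taken from the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's image was preferred, 0.0 if B's was, 0.5 for a tie.
    k is an assumed K-factor, not a value reported in the paper.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical models start at equal ratings; A wins one comparison.
model_a, model_b = update_elo(1500.0, 1500.0, score_a=1.0)
```

With equal starting ratings the expected score is 0.5, so a single win moves each rating by half the K-factor in opposite directions, and the sum of the two ratings is unchanged.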

@article{gong2025_2503.07703,
  title={Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model},
  author={Lixue Gong and Xiaoxia Hou and Fanshi Li and Liang Li and Xiaochen Lian and Fei Liu and Liyang Liu and Wei Liu and Wei Lu and Yichun Shi and Shiqi Sun and Yu Tian and Zhi Tian and Peng Wang and Xun Wang and Ye Wang and Guofeng Wu and Jie Wu and Xin Xia and Xuefeng Xiao and Linjie Yang and Zhonghua Zhai and Xinyu Zhang and Qi Zhang and Yuwei Zhang and Shijia Zhao and Jianchao Yang and Weilin Huang},
  journal={arXiv preprint arXiv:2503.07703},
  year={2025}
}