Seedance 2.0: Advancing Video Generation for World Complexity

Team Seedance
De Chen
Liyang Chen
Xin Chen
Ying Chen
Zhuo Chen
Zhuowei Chen
Feng Cheng
Tianheng Cheng
Yufeng Cheng
Mojie Chi
Xuyan Chi
Jian Cong
Qinpeng Cui
Fei Ding
Qide Dong
Yujiao Du
Haojie Duanmu
Junliang Fan
Jiarui Fang
Jing Fang
Zetao Fang
Chengjian Feng
Yu Gao
Diandian Gu
Dong Guo
Hanzhong Guo
Qiushan Guo
Boyang Hao
Hongxiang Hao
Haoxun He
Jiaao He
Qian He
Tuyen Hoang
Heng Hu
Ruoqing Hu
Yuxiang Hu
Jiancheng Huang
Weilin Huang
Zhaoyang Huang
Zhongyi Huang
Jishuo Jin
Ming Jing
Ashley Kim
Shanshan Lao
Yichong Leng
Bingchuan Li
Gen Li
Haifeng Li
Huixia Li
Jiashi Li
Ming Li
Xiaojie Li
Xingxing Li
Yameng Li
Yiying Li
Yu Li
Yueyan Li
Chao Liang
Han Liang
Jianzhong Liang
Ying Liang
Wang Liao
J. H. Lien
Shanchuan Lin
Xi Lin
Feng Ling
Yue Ling
Fangfang Liu
Jiawei Liu
Jihao Liu
Jingtuo Liu
Shu Liu
Sichao Liu
Wei Liu
Xue Liu
Zuxi Liu
Ruijie Lu
Lecheng Lyu
Jingting Ma
Tianxiang Ma
Xiaonan Nie
Jingzhe Ning
Junjie Pan
Xitong Pan
Ronggui Peng
Xueqiong Qu
Yuxi Ren
Yuchen Shen
Guang Shi
Lei Shi
Yinglong Song
Fan Sun
Li Sun
Renfei Sun
Wenjing Tang
Boyang Tao
Zirui Tao
Dongliang Wang
Feng Wang
Main: 22 pages; Bibliography: 2 pages; Appendix: 2 pages; 4 figures; 31 tables
Abstract

Seedance 2.0 is a native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, large-scale architecture for joint multi-modal audio-video generation. It accepts four input modalities (text, image, audio, and video) and integrates one of the most comprehensive suites of multi-modal content reference and editing capabilities available in the industry to date. It delivers substantial, well-rounded improvements across all key sub-dimensions of video and audio generation, and in both expert evaluations and public user tests it has demonstrated performance on par with the leading level in the field. Seedance 2.0 directly generates audio-video content with durations from 4 to 15 seconds at native output resolutions of 480p and 720p. For multi-modal reference inputs, its current open platform supports up to 3 video clips, 9 images, and 3 audio clips. In addition, we provide Seedance 2.0 Fast, an accelerated variant of Seedance 2.0 designed to boost generation speed for low-latency scenarios. Together, these improvements to foundational generation capabilities and multi-modal generation performance bring an enhanced creative experience to end users.
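
The limits quoted in the abstract (4 to 15 seconds, 480p or 720p, and at most 3 video, 9 image, and 3 audio references) can be read as a simple request check. The sketch below is illustrative only: `GenerationRequest`, `validate`, and every field name are hypothetical and do not correspond to any published Seedance API. Only the numeric limits come from the abstract.

```python
from dataclasses import dataclass

# Limits as stated in the abstract. All names here are hypothetical.
MIN_DURATION_S = 4
MAX_DURATION_S = 15
NATIVE_RESOLUTIONS = ("480p", "720p")
MAX_REFERENCES = {"video": 3, "image": 9, "audio": 3}


@dataclass
class GenerationRequest:
    """Hypothetical request shape, used only to illustrate the limits."""
    prompt: str
    duration_s: float
    resolution: str
    n_video_refs: int = 0
    n_image_refs: int = 0
    n_audio_refs: int = 0


def validate(req: GenerationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if not (MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S):
        problems.append(
            f"duration {req.duration_s}s outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if req.resolution not in NATIVE_RESOLUTIONS:
        problems.append(f"resolution {req.resolution} is not a native output resolution")
    counts = {
        "video": req.n_video_refs,
        "image": req.n_image_refs,
        "audio": req.n_audio_refs,
    }
    for kind, count in counts.items():
        if count > MAX_REFERENCES[kind]:
            problems.append(f"{count} {kind} references exceeds limit of {MAX_REFERENCES[kind]}")
    return problems
```

A request with 10 seconds, 720p, and the maximum reference counts passes this check, while one asking for 20 seconds at 1080p with 4 video references would be reported with three problems.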