Seedance 2.0: Advancing Video Generation for World Complexity

15 April 2026

Team Seedance

De Chen

Liyang Chen

Xin Chen

Ying Chen

Zhuo Chen

Zhuowei Chen

Feng Cheng

Tianheng Cheng

Yufeng Cheng

Mojie Chi

Xuyan Chi

Jian Cong

Qinpeng Cui

Fei Ding

Qide Dong

Yujiao Du

Haojie Duanmu

Junliang Fan

Jiarui Fang

Jing Fang

Zetao Fang

Chengjian Feng

Yu Gao

Diandian Gu

Dong Guo

Hanzhong Guo

Qiushan Guo

Boyang Hao

Hongxiang Hao

Haoxun He

Jiaao He

Qian He

Tuyen Hoang

Heng Hu

Ruoqing Hu

Yuxiang Hu

Jiancheng Huang

Weilin Huang

Zhaoyang Huang

Zhongyi Huang

Jishuo Jin

Ming Jing

Ashley Kim

Shanshan Lao

Yichong Leng

Bingchuan Li

Gen Li

Haifeng Li

Huixia Li

Jiashi Li

Ming Li

Xiaojie Li

Xingxing Li

Yameng Li

Yiying Li

Yu Li

Yueyan Li

Chao Liang

Han Liang

Jianzhong Liang

Ying Liang

Wang Liao

J. H. Lien

Shanchuan Lin

Xi Lin

Feng Ling

Yue Ling

Fangfang Liu

Jiawei Liu

Jihao Liu

Jingtuo Liu

Shu Liu

Sichao Liu

Wei Liu

Xue Liu

Zuxi Liu

Ruijie Lu

Lecheng Lyu

Jingting Ma

Tianxiang Ma

Xiaonan Nie

Jingzhe Ning

Junjie Pan

Xitong Pan

Ronggui Peng

Xueqiong Qu

Yuxi Ren

Yuchen Shen

Guang Shi

Lei Shi

Yinglong Song

Fan Sun

Li Sun

Renfei Sun

Wenjing Tang

Boyang Tao

Zirui Tao

Dongliang Wang

Feng Wang

Main: 22 pages; Bibliography: 2 pages; Appendix: 2 pages; 4 figures; 31 tables

Abstract

Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, large-scale architecture for joint multi-modal audio-video generation. It supports four input modalities (text, image, audio, and video) and integrates one of the most comprehensive suites of multi-modal content reference and editing capabilities available in the industry to date. It delivers substantial, well-rounded improvements across all key sub-dimensions of video and audio generation. In both expert evaluations and public user tests, the model has demonstrated performance on par with the leading level in the field. Seedance 2.0 directly generates audio-video content 4 to 15 seconds long, at native output resolutions of 480p and 720p. For multi-modal reference inputs, its current open platform accepts up to 3 video clips, 9 images, and 3 audio clips. In addition, we provide Seedance 2.0 Fast, an accelerated variant of Seedance 2.0 designed to boost generation speed for low-latency scenarios. Together, these improvements to foundational generation capabilities and multi-modal generation performance bring an enhanced creative experience to end users.
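
The limits quoted in the abstract (duration, native resolution, and reference-input counts) can be written down as a simple pre-flight check. The following is a minimal sketch only: `GenerationRequest`, `validate_request`, and every field name are hypothetical and do not correspond to any published Seedance API; only the numeric limits come from the abstract.

```python
from dataclasses import dataclass

# Limits as stated in the abstract. All names are illustrative assumptions.
MIN_DURATION_S = 4
MAX_DURATION_S = 15
NATIVE_RESOLUTIONS = ("480p", "720p")
MAX_REFERENCE_VIDEOS = 3
MAX_REFERENCE_IMAGES = 9
MAX_REFERENCE_AUDIOS = 3


@dataclass
class GenerationRequest:
    """Hypothetical request shape, used only for illustration."""
    prompt: str
    duration_s: float
    resolution: str
    n_videos: int = 0
    n_images: int = 0
    n_audios: int = 0


def validate_request(req: GenerationRequest) -> list[str]:
    """Return one message per violated limit; empty if the request fits."""
    problems = []
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append("duration must be between 4 and 15 seconds")
    if req.resolution not in NATIVE_RESOLUTIONS:
        problems.append("resolution must be 480p or 720p")
    if req.n_videos > MAX_REFERENCE_VIDEOS:
        problems.append("at most 3 reference video clips")
    if req.n_images > MAX_REFERENCE_IMAGES:
        problems.append("at most 9 reference images")
    if req.n_audios > MAX_REFERENCE_AUDIOS:
        problems.append("at most 3 reference audio clips")
    return problems
```

A request at the upper bounds, such as `GenerationRequest("a city at dusk", 15, "720p", 3, 9, 3)`, yields an empty list, while exceeding any limit adds one message per violation.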
