ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.05737
  4. Cited By
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
v1v2v3 (latest)

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

9 October 2023
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
David C. Minnen
Yong Cheng
Vighnesh Birodkar
Agrim Gupta
Xiuye Gu
Alexander G. Hauptmann
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
ArXiv (abs)PDFHTML

Papers citing "Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation"

50 / 90 papers shown
Title
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffMVGen
389
29
0
01 Jul 2025
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Yu Gao
Haoyuan Guo
Tuyen Hoang
Weilin Huang
Lu Jiang
...
Yang Zhao
Xiaozheng Zheng
Peihao Zhu
Jiaxin Zou
Feilong Zuo
DiffMVGenVLM
48
0
0
01 Jul 2025
SIDE: Semantic ID Embedding for effective learning from sequences
SIDE: Semantic ID Embedding for effective learning from sequences
Dinesh Ramasamy
Shakti Kumar
Chris Cadonic
Jiaxin Yang
Sohini Roychowdhury
Esam Abdel Rhman
Srihari Reddy
VLM
16
0
0
20 Jun 2025
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation
Riccardo Corvi
D. Cozzolino
Ekta Prashnani
Shalini De Mello
Koki Nagano
L. Verdoliva
ViT
15
0
0
20 Jun 2025
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation
Tanner Schmidt
Richard Newcombe
VLM
30
0
0
10 Jun 2025
Highly Compressed Tokenizer Can Generate Without Training
Lukas Lao Beyer
T. Li
X. Chen
S. Karaman
K. He
DiffMVLM
20
0
0
09 Jun 2025
Audio-Sync Video Generation with Multi-Stream Temporal Control
Audio-Sync Video Generation with Multi-Stream Temporal Control
Shuchen Weng
Haojie Zheng
Zheng Chang
Si Li
Boxin Shi
Xinlong Wang
DiffMVGen
30
0
0
09 Jun 2025
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Sangwon Jang
Taekyung Ki
Jaehyeong Jo
Jaehong Yoon
Soo Ye Kim
Zhe Lin
Sung Ju Hwang
DiffMVGen
28
0
0
08 Jun 2025
ContentV: Efficient Training of Video Generation Models with Limited Compute
Wenfeng Lin
Renjie Chen
Boyuan Liu
Shiyue Yan
Ruoyu Feng
...
Chao Feng
Jiao Ran
Qi Wu
Zuotao Liu
Mingyu Guo
VGen
111
0
0
05 Jun 2025
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation
Tianyu Huang
Wangguandong Zheng
Tengfei Wang
Yuhao Liu
Zhenwei Wang
...
Jie Jiang
Hui Li
Rynson W. H. Lau
W. Zuo
Chunchao Guo
VGen
123
0
0
04 Jun 2025
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
Ziyi Wu
Anil Kag
Ivan Skorokhodov
Willi Menapace
Ashkan Mirzaei
Igor Gilitschenski
Sergey Tulyakov
Aliaksandr Siarohin
DiffMVGen
71
0
0
04 Jun 2025
PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms
PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms
Yifei Xia
Shuchen Weng
Siqi Yang
Jingqi Liu
Chengxuan Zhu
Minggui Teng
Zijian Jia
Han Jiang
Boxin Shi
DiffMVGen
102
0
0
28 May 2025
Sci-Fi: Symmetric Constraint for Frame Inbetweening
Sci-Fi: Symmetric Constraint for Frame Inbetweening
Liuhan Chen
Xiaodong Cun
Xiaoyu Li
Xianyi He
Shenghai Yuan
Jie Chen
Ying Shan
Li Yuan
VGen
81
0
0
27 May 2025
ReDDiT: Rehashing Noise for Discrete Visual Generation
ReDDiT: Rehashing Noise for Discrete Visual Generation
Tianren Ma
Xiaosong Zhang
Boyu Yang
Junlan Feng
QiXiang Ye
DiffM
122
0
0
26 May 2025
Universal Semantic Disentangled Privacy-preserving Speech Representation Learning
Universal Semantic Disentangled Privacy-preserving Speech Representation Learning
Biel Tura Vecino
Subhadeep Maji
Aravind Varier
Antonio Bonafonte
Ivan Valles
...
Roberto Barra-Chicote
Ariya Rastrow
C. Papayiannis
Volker Leutnant
Trevor Wood
30
0
0
19 May 2025
Video-GPT via Next Clip Diffusion
Video-GPT via Next Clip Diffusion
Shaobin Zhuang
Zhipeng Huang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Binxin Yang
Chong Sun
Chen Li
Yali Wang
DiffMVGen
243
0
0
18 May 2025
Generative Pre-trained Autoregressive Diffusion Transformer
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang
Jiacheng Jiang
Guoqing Ma
Zhiying Lu
Haoyang Huang
Jianlong Yuan
Nan Duan
VGen
138
2
0
12 May 2025
Phenotype-Guided Generative Model for High-Fidelity Cardiac MRI Synthesis: Advancing Pretraining and Clinical Applications
Phenotype-Guided Generative Model for High-Fidelity Cardiac MRI Synthesis: Advancing Pretraining and Clinical Applications
Zhiyu Li
Yujian Hu
Zhengyao Ding
Yiheng Mao
Haoyang Li
Fan Yi
Hongkun Zhang
Zhengxing Huang
MedIm
86
1
0
06 May 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
183
4
0
24 Apr 2025
Distilling semantically aware orders for autoregressive image generation
Distilling semantically aware orders for autoregressive image generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Christopher Pal
Zhaozheng Yin
M. Pedersoli
80
0
0
23 Apr 2025
Elucidating the Design Space of Multimodal Protein Language Models
Elucidating the Design Space of Multimodal Protein Language Models
Cheng-Yen Hsieh
Xinze Wang
Daiheng Zhang
Dongyu Xue
Fei Ye
Shujian Huang
Zaixiang Zheng
Quanquan Gu
93
1
0
15 Apr 2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seawead
Ceyuan Yang
Zhijie Lin
Yang Zhao
Shanchuan Lin
...
Zuquan Song
Zhenheng Yang
Jiashi Feng
Jianchao Yang
Lu Jiang
DiffM
184
22
0
11 Apr 2025
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
Yuxuan Luo
Zhengkun Rong
Lizhen Wang
Longhao Zhang
Tianshu Hu
Yongming Zhu
VGen
447
8
0
02 Apr 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Wentao Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
463
6
0
27 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
176
11
0
25 Mar 2025
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao
Shunlin Lu
Huaijin Pi
Ke Fan
Liang Pan
Yueer Zhou
Ziyong Feng
Xiaowei Zhou
Sida Peng
Jingbo Wang
DiffMVGen
105
7
0
19 Mar 2025
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song
Yansen Wang
Zijia Song
Yadong Li
Haoze Sun
Xin Wu
Guosheng Dong
Jianhua Xu
Jiaqi Wang
Kaicheng Yu
128
4
0
18 Mar 2025
Versatile Physics-based Character Control with Hybrid Latent Representation
Versatile Physics-based Character Control with Hybrid Latent Representation
Jinseok Bae
Jungdam Won
Donggeun Lim
I. Hwang
Y. Kim
92
0
0
17 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
185
0
0
14 Mar 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent
Kyle Hsu
Justin Johnson
L. Fei-Fei
Jiajun Wu
DiffMMU
149
8
0
14 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
Hong Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
166
2
0
11 Mar 2025
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Runze Zhang
Guoguang Du
Xiaochuan Li
Qi Jia
Liang Jin
...
Zhenhua Guo
Yaqian Zhao
Xiaoli Gong
Rengang Li
Baoyu Fan
VGen
130
2
0
08 Mar 2025
Teaching Metric Distance to Autoregressive Multimodal Foundational Models
Teaching Metric Distance to Autoregressive Multimodal Foundational Models
Jiwan Chung
Saejin Kim
Yongrae Jo
Jinho Park
Dongjun Min
Youngjae Yu
254
0
0
04 Mar 2025
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng
Yongxin Chen
Huayu Chen
Guande He
Xuan Li
Jun Zhu
Qinsheng Zhang
DiffM
155
3
0
03 Mar 2025
Action Tokenizer Matters in In-Context Imitation Learning
An Vuong
M. Vu
Dong An
Ian Reid
119
1
0
03 Mar 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
225
11
0
27 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
181
2
0
24 Feb 2025
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
D. She
Mushui Liu
Jingxuan Pang
Jin Wang
Zhen Yang
...
Yi Wang
Qihan Huang
Haobin Tang
YunLong Yu
Siming Fu
VGen
218
5
0
21 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffMVGen
117
0
0
18 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
283
8
0
17 Feb 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma
Haoyang Huang
K. Yan
L. Chen
Nan Duan
...
Yansen Wang
Yuanwei Lu
Yu-Cheng Chen
Yu-Juan Luo
Yihao Luo
DiffMVGen
377
41
0
14 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
212
18
0
10 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhiyong Yang
Mike Zheng Shou
MoE
200
1
0
10 Feb 2025
Scaling Embedding Layers in Language Models
Scaling Embedding Layers in Language Models
Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang
172
1
0
03 Feb 2025
Visual Generation Without Guidance
Huayu Chen
Kai Jiang
Kaiwen Zheng
Jianfei Chen
Hang Su
Jun Zhu
168
2
0
28 Jan 2025
Taming Teacher Forcing for Masked Autoregressive Video Generation
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou
Quan Sun
Yuang Peng
Kun Yan
Runpei Dong
...
Zheng Ge
Nan Duan
Xiangyu Zhang
L. Ni
H. Shum
VGen
100
9
0
21 Jan 2025
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim
Ju He
Qihang Yu
Chenglin Yang
Xiaohui Shen
Suha Kwak
Liang-Chieh Chen
VLM
139
11
0
13 Jan 2025
Abstracted Shapes as Tokens -- A Generalizable and Interpretable Model for Time-series Classification
Abstracted Shapes as Tokens -- A Generalizable and Interpretable Model for Time-series Classification
Yunshi Wen
Tengfei Ma
Tsui-Wei Weng
Lam M. Nguyen
A. Julius
AI4TS
88
3
0
08 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
174
15
0
06 Jan 2025
VidTwin: Video VAE with Decoupled Structure and Dynamics
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Li Zhao
DRLVGen
163
5
0
23 Dec 2024
12
Next