Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

9 October 2023
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
David C. Minnen
Yong Cheng
Vighnesh Birodkar
Agrim Gupta
Xiuye Gu
Alexander G. Hauptmann
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang

Papers citing "Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation"

50 / 227 papers shown
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang
Long Mai
Aniruddha Mahapatra
David Bourgin
Yicong Hong
Jonah Casebeer
Feng Liu
Y. Fu
DiffM
VGen
56
0
0
11 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
Hongyu Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe-nan Lin
Marios Savvides
62
0
0
11 Mar 2025
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
Mingzhen Sun
Weining Wang
Gen Li
Jiawei Liu
Jiahui Sun
Wanquan Feng
Shanshan Lao
Siyu Zhou
Qian He
Jiaheng Liu
DiffM
VGen
84
3
0
10 Mar 2025
V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation
Guiwei Zhang
Tianyu Zhang
Mohan Zhou
Yalong Bai
Biye Li
66
0
0
10 Mar 2025
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Runze Zhang
Guoguang Du
Xiaochuan Li
Qi Jia
Liang Jin
...
Zhenhua Guo
Yaqian Zhao
Xiaoli Gong
Rengang Li
Baoyu Fan
VGen
75
0
0
08 Mar 2025
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
R. Marxer
77
1
0
08 Mar 2025
Discrete Contrastive Learning for Diffusion Policies in Autonomous Driving
Kalle Kujanpää
Daulet Baimukashev
Farzeen Munir
Shoaib Azam
Tomasz Piotr Kucner
Joni Pajarinen
Ville Kyrki
43
0
0
07 Mar 2025
Frequency Autoregressive Image Generation with Continuous Tokens
Hu Yu
Hao Luo
Hangjie Yuan
Yu Rong
Feng Zhao
VGen
44
3
0
07 Mar 2025
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion
Ziyi Yang
Fanqi Wan
Longguang Zhong
Canbin Huang
Guosheng Liang
Xiaojun Quan
MoMe
95
0
0
06 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao
Weijia Mao
Mike Zheng Shou
66
0
0
05 Mar 2025
ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models
Qinyu Zhao
Stephen Gould
Liang Zheng
DiffM
GAN
VGen
VLM
80
0
0
04 Mar 2025
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng
Yongxin Chen
Huayu Chen
Guande He
Xuan Li
Jun Zhu
Qinsheng Zhang
DiffM
49
0
0
03 Mar 2025
Lossy Neural Compression for Geospatial Analytics: A Review
Carlos Gomes
Isabelle Wittmann
Damien Robert
Johannes Jakubik
Tim Reichelt
...
Romeo Kienzler
Rania Briq
Sabrina Benassou
Michele Lazzarini
C. Albrecht
96
2
0
03 Mar 2025
Action Tokenizer Matters in In-Context Imitation Learning
An Vuong
M. Vu
Dong An
Ian Reid
61
1
0
03 Mar 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
83
6
0
27 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
86
1
0
24 Feb 2025
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
D. She
Mushui Liu
Jingxuan Pang
Jin Wang
Zhen Yang
...
Yi Wang
Qihan Huang
Haobin Tang
YunLong Yu
Siming Fu
VGen
96
4
0
21 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
77
0
0
18 Feb 2025
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
Jian Jia
Jingtong Gao
Ben Xue
Junhao Wang
Qingpeng Cai
Quan Chen
Xiangyu Zhao
Peng Jiang
Kun Gai
OffRL
77
0
0
18 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
80
5
0
17 Feb 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma
Haoyang Huang
K. Yan
L. Chen
Nan Duan
...
Yansen Wang
Yuanwei Lu
Yu-Cheng Chen
Yu-Juan Luo
Y. Luo
DiffM
VGen
175
18
0
14 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
117
7
0
10 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhengyuan Yang
Mike Zheng Shou
MoE
78
0
0
10 Feb 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
185
12
0
03 Feb 2025
Scaling Embedding Layers in Language Models
Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang
82
0
0
03 Feb 2025
Visual Generation Without Guidance
Huayu Chen
Kai Jiang
Kaiwen Zheng
Jianfei Chen
Hang Su
Jun Zhu
57
0
0
28 Jan 2025
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou
Quan Sun
Yuang Peng
Kun Yan
Runpei Dong
...
Zheng Ge
Nan Duan
Xiangyu Zhang
L. Ni
H. Shum
VGen
54
7
0
21 Jan 2025
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim
Ju He
Qihang Yu
Chenglin Yang
Xiaohui Shen
Suha Kwak
Liang-Chieh Chen
VLM
54
6
0
13 Jan 2025
Abstracted Shapes as Tokens -- A Generalizable and Interpretable Model for Time-series Classification
Yunshi Wen
Tengfei Ma
Tsui-Wei Weng
Lam M. Nguyen
A. Julius
AI4TS
45
1
0
08 Jan 2025
CAT: Content-Adaptive Image Tokenization
Junhong Shen
Kushal Tirumala
Michihiro Yasunaga
Ishan Misra
Luke Zettlemoyer
Lili Yu
Chunting Zhou
35
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
88
12
0
06 Jan 2025
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
Weikang Bian
Zhaoyang Huang
Xiaoyu Shi
Yijin Li
Fu-Yun Wang
Hongsheng Li
3DGS
VGen
DiffM
42
4
0
05 Jan 2025
Bridging Interpretability and Robustness Using LIME-Guided Model Refinement
Navid Nayyem
Abdullah Rakin
Longwei Wang
AAML
FAtt
63
1
0
25 Dec 2024
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Zhaoxiang Zhang
173
7
0
24 Dec 2024
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Jiang Bian
DRL
VGen
77
3
0
23 Dec 2024
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan
Kushal Tirumala
Armen Aghajanyan
Luke Zettlemoyer
Ali Farhadi
DiffM
76
2
0
20 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
126
9
0
19 Dec 2024
Parallelized Autoregressive Visual Generation
Yanjie Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
90
12
0
19 Dec 2024
E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling
Zhihang Yuan
Yuzhang Shang
Hao Zhang
Tongcheng Fang
Rui Xie
Bingxin Xu
Yan Yan
Shengen Yan
Guohao Dai
Yu Wang
DiffM
105
1
0
18 Dec 2024
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
87
2
0
16 Dec 2024
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
Muhammet Furkan Ilaslan
Ali Koksal
K. Lin
Burak Satar
Mike Zheng Shou
Qianli Xu
LM&Ro
79
0
0
16 Dec 2024
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Zhao Jin
Dacheng Tao
VGen
105
1
0
16 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hongyu Chen
Zihan Wang
Xianrui Li
Xingchen Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
114
7
0
14 Dec 2024
[MASK] is All You Need
Vincent Tao Hu
Bjorn Ommer
DiffM
137
2
0
09 Dec 2024
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
Yizhuo Li
Yuying Ge
Yixiao Ge
Ping Luo
Ying Shan
DiffM
VGen
98
0
0
05 Dec 2024
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Ziqi Pang
Tianyuan Zhang
Fujun Luan
Yunze Man
Hao Tan
Kai Zhang
William T. Freeman
Yu-Xiong Wang
VGen
81
14
0
02 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Xianrui Li
Kai Qiu
Hongyu Chen
Jason Kuen
Jiuxiang Gu
Jiadong Wang
Zhe-nan Lin
Bhiksha Raj
VLM
125
3
0
02 Dec 2024
CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
Yuelei Wang
Jian Zhang
Pengtao Jiang
Hao Zhang
Jinwei Chen
Bo Li
VGen
DiffM
107
4
0
02 Dec 2024
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
Qixiu Li
Yaobo Liang
Zeyu Wang
Lin Luo
Xi Chen
...
Jianmin Bao
Dong Chen
Yuanchun Shi
Jiaolong Yang
B. Guo
LM&Ro
83
23
0
29 Nov 2024
3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes
Tejaswini Medi
Arianna Rampini
Pradyumna Reddy
P. Jayaraman
M. Keuper
DiffM
84
0
0
28 Nov 2024