Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

9 October 2023
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
David C. Minnen
Yong Cheng
Vighnesh Birodkar
Agrim Gupta
Xiuye Gu
Alexander G. Hauptmann
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang

Papers citing "Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation"

50 / 227 papers shown
Continuous Visual Autoregressive Generation via Score Maximization
Chenze Shao
Fandong Meng
Jie Zhou
DiffM
31
0
0
12 May 2025
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang
Jiacheng Jiang
Guoqing Ma
Zhiying Lu
Haoyang Huang
Jianlong Yuan
Nan Duan
VGen
43
1
0
12 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
70
0
0
08 May 2025
Phenotype-Guided Generative Model for High-Fidelity Cardiac MRI Synthesis: Advancing Pretraining and Clinical Applications
Z. Li
Yujian Hu
Zhengyao Ding
Yiheng Mao
Yiming Li
Fan Yi
Hongkun Zhang
Zhengxing Huang
MedIm
45
1
0
06 May 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
70
1
0
24 Apr 2025
FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model
Kaicheng Pang
Xingxing Zou
W. Wong
29
0
0
24 Apr 2025
Fast Autoregressive Models for Continuous Latent Generation
Tiankai Hang
Jianmin Bao
Fangyun Wei
Dong Chen
DiffM
80
0
0
24 Apr 2025
Distilling semantically aware orders for autoregressive image generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Christopher Pal
Zhaozheng Yin
M. Pedersoli
31
0
0
23 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DiffM
38
0
0
22 Apr 2025
Elucidating the Design Space of Multimodal Protein Language Models
Cheng-Yen Hsieh
Qing Guo
Daiheng Zhang
Dongyu Xue
Fei Ye
Shujian Huang
Zaixiang Zheng
Quanquan Gu
34
1
0
15 Apr 2025
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Xingjian Leng
Jaskirat Singh
Yunzhong Hou
Zhenchang Xing
Saining Xie
Liang Zheng
39
1
0
14 Apr 2025
H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models
Yushu Wu
Yanyu Li
Ivan Skorokhodov
Anil Kag
Willi Menapace
Sharath Girish
Aliaksandr Siarohin
Yanzhi Wang
Sergey Tulyakov
DiffM
VGen
39
0
0
14 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
36
0
0
11 Apr 2025
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Junliang Guo
Yang Ye
Tianyu He
Haoyu Wu
Yushu Jiang
Tim Pearce
Jiang Bian
VGen
SyDa
56
2
0
11 Apr 2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seawead
Ceyuan Yang
Zhijie Lin
Yang Zhao
Shanchuan Lin
...
Zuquan Song
Zhenheng Yang
Jiashi Feng
Jianchao Yang
Lu Jiang
DiffM
93
1
0
11 Apr 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
S. Chen
Jingjing Chen
Lin Ma
Yu Jiang
34
2
0
06 Apr 2025
3D Scene Understanding Through Local Random Access Sequence Modeling
Wanhee Lee
Klemen Kotar
R. Venkatesh
Jared Watrous
Honglin Chen
Khai Loong Aw
Daniel L. K. Yamins
3DV
42
0
0
04 Apr 2025
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung
Sangeek Hyun
Hyunjun Kim
Eunseo Koh
MinKyu Lee
Jae-Pil Heo
33
0
0
03 Apr 2025
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
Yuxuan Luo
Zhengkun Rong
Lizhen Wang
Longhao Zhang
Tianshu Hu
Yongming Zhu
VGen
163
3
0
02 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
48
0
0
01 Apr 2025
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng
Han Li
Wenrui Dai
Ziyang Zheng
Chenglin Li
Junni Zou
Hongkai Xiong
3DH
60
0
0
30 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
153
2
0
27 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Zhengyuan Yang
Lijuan Wang
Min Li
DiffM
73
0
0
26 Mar 2025
Unified Multimodal Discrete Diffusion
Alexander Swerdlow
Mihir Prabhudesai
Siddharth Gandhi
Deepak Pathak
Katerina Fragkiadaki
DiffM
77
0
0
26 Mar 2025
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao
Xingyu Ni
Ziyu Wang
Feng Cheng
Ziyan Yang
Lu Jiang
Bohan Wang
VGen
47
2
0
26 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
84
2
0
25 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
F. Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffM
VGen
78
1
0
24 Mar 2025
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Zeyu Liu
Zanlin Ni
Yeguo Hua
Xin Deng
Xiao Ma
Cheng Zhong
Gao Huang
47
0
0
22 Mar 2025
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
Panpan Wang
Liqiang Niu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
DiffM
50
0
0
21 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
52
1
0
21 Mar 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yalin Wang
Zhijie Lin
Yao Teng
Yuanzhi Zhu
Shuhuai Ren
Jiashi Feng
Xihui Liu
53
0
0
20 Mar 2025
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction
Ziyao Guo
Kaipeng Zhang
Michael Qizhe Shieh
43
0
0
20 Mar 2025
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go
Byeongjun Park
Hyelin Nam
Byung-Hoon Kim
Hyungjin Chung
Changick Kim
3DGS
VGen
99
1
0
20 Mar 2025
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao
Shunlin Lu
Huaijin Pi
Ke Fan
Liang Pan
Yueer Zhou
Ziyong Feng
Xiaowei Zhou
Sida Peng
Jingbo Wang
DiffM
VGen
50
4
0
19 Mar 2025
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song
Yue Wang
Zijia Song
Yadong Li
Haoze Sun
Xin Wu
Guosheng Dong
Jianhua Xu
Jiaqi Wang
Kaicheng Yu
60
2
0
18 Mar 2025
Fast Autoregressive Video Generation with Diagonal Decoding
Yang Ye
Junliang Guo
Haoyu Wu
Tianyu He
Tim Pearce
Tabish Rashid
Katja Hofmann
Jiang Bian
DiffM
VGen
81
1
0
18 Mar 2025
Deeply Supervised Flow-Based Generative Models
Inkyu Shin
Chenglin Yang
Liang-Chieh Chen
63
0
0
18 Mar 2025
Versatile Physics-based Character Control with Hybrid Latent Representation
Jinseok Bae
Jungdam Won
Donggeun Lim
I. Hwang
Y. Kim
44
0
0
17 Mar 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent
Kyle Hsu
Justin Johnson
L. Fei-Fei
Jiajun Wu
DiffM
MU
58
3
0
14 Mar 2025
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
Ziqin Zhou
Yifan Yang
Yi Yang
Tianyu He
Houwen Peng
Kai Qiu
Qi Dai
Lili Qiu
Chong Luo
Lingqiao Liu
DiffM
VGen
60
1
0
14 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
57
0
0
14 Mar 2025
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Yufan Deng
Xun Guo
Yalin Wang
Jacob Zhiyuan Fang
Angtian Wang
Shenghai Yuan
Yiding Yang
Bo Liu
Haibin Huang
Chongyang Ma
DiffM
VGen
72
0
0
13 Mar 2025
Long Context Tuning for Video Generation
Yuwei Guo
Ceyuan Yang
Ziyan Yang
Zhibei Ma
Zhijie Lin
Zhenheng Yang
Dahua Lin
Lu Jiang
DiffM
VGen
76
2
0
13 Mar 2025
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data
Paul Quinlan
Qingguo Li
Xiaodan Zhu
AI4TS
LRM
64
0
0
13 Mar 2025
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Hao He
Ceyuan Yang
Shanchuan Lin
Yinghao Xu
Meng Wei
Liangke Gui
Qi Zhao
Gordon Wetzstein
Lu Jiang
Hongsheng Li
DiffM
VGen
105
5
0
13 Mar 2025
VideoMerge: Towards Training-free Long Video Generation
Siyang Zhang
Harry Yang
Ser-Nam Lim
DiffM
VGen
50
0
0
13 Mar 2025
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He
Yuanyu He
Shaoxuan He
Feng Chen
Hong Zhou
Kaipeng Zhang
Bohan Zhuang
53
1
0
12 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
H. Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe-nan Lin
Marios Savvides
62
0
0
11 Mar 2025
"Principal Components" Enable A New Language of Images
Xin Wen
Bingchen Zhao
Ismail Elezi
Jiankang Deng
Xiaojuan Qi
66
0
0
11 Mar 2025
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang
Long Mai
Aniruddha Mahapatra
David Bourgin
Yicong Hong
Jonah Casebeer
Feng Liu
Y. Fu
DiffM
VGen
56
0
0
11 Mar 2025