ResearchTrend.AI
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation


9 October 2023
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
David C. Minnen
Yong Cheng
Vighnesh Birodkar
Agrim Gupta
Xiuye Gu
Alexander G. Hauptmann
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang

Papers citing "Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation"

27 / 227 papers shown
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Yixin Liu
Kai Zhang
Yuan Li
Zhiling Yan
Chujie Gao
...
Yue Huang
Hanchi Sun
Jianfeng Gao
Lifang He
Lichao Sun
VLM
VGen
EGVM
75
260
0
27 Feb 2024
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
Yasheng Sun
Wenqing Chu
Hang Zhou
Kaisiyuan Wang
Hideki Koike
37
5
0
25 Feb 2024
Rolling Diffusion Models
David Ruhe
Jonathan Heek
Tim Salimans
Emiel Hoogeboom
DiffM
35
32
0
12 Feb 2024
FoldToken: Learning Protein Language via Vector Quantization and Beyond
Zhangyang Gao
Cheng Tan
Jue Wang
Yufei Huang
Lirong Wu
Stan Z. Li
30
9
0
04 Feb 2024
Position: Graph Foundation Models are Already Here
Haitao Mao
Zhikai Chen
Wenzhuo Tang
Jianan Zhao
Yao Ma
Tong Zhao
Neil Shah
Mikhail Galkin
Jiliang Tang
AI4CE
64
27
0
03 Feb 2024
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
Zecheng Tang
Chenfei Wu
Zekai Zhang
Mingheng Ni
Sheng-Siang Yin
...
Zhengyuan Yang
Lijuan Wang
Zicheng Liu
Juntao Li
Nan Duan
25
10
0
30 Jan 2024
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
Pengyuan Zhou
Lin Wang
Zhi Liu
Yanbin Hao
Pan Hui
Sasu Tarkoma
J. Kangasharju
VGen
46
26
0
30 Jan 2024
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Xiaofeng Wang
Zheng Zhu
Guan Huang
Boyuan Wang
Xinze Chen
Jiwen Lu
VGen
40
32
0
18 Jan 2024
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
20
241
0
21 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
59
177
0
11 Dec 2023
Does Vector Quantization Fail in Spatio-Temporal Forecasting? Exploring a Differentiable Sparse Soft-Vector Quantization Approach
Chao Chen
Tian Zhou
Yanjun Zhao
Hui Liu
Liang Sun
Rong Jin
40
0
0
06 Dec 2023
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang
Yinan He
Jiashuo Yu
Fan Zhang
Chenyang Si
...
Xinyuan Chen
Limin Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
77
351
0
29 Nov 2023
Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang
Yue Chen
Zhu Li
ELM
AI4CE
LRM
ALM
LM&Ro
61
15
0
20 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
47
3
0
08 Nov 2023
Neuro-GPT: Towards A Foundation Model for EEG
Wenhui Cui
Woojae Jeong
Philipp Tholke
T. Medani
Karim Jerbi
Anand A. Joshi
Richard M. Leahy
26
18
0
07 Nov 2023
A Pytorch Reproduction of Masked Generative Image Transformer
Victor Besnier
Mickael Chen
ViT
61
12
0
22 Oct 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
22
49
0
30 Jun 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
Shanghua Gao
Pan Zhou
Ming-Ming Cheng
Shuicheng Yan
DiffM
145
155
0
25 Mar 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
521
0
02 Jan 2023
Improved Masked Image Generation with Token-Critic
José Lezama
Huiwen Chang
Lu Jiang
Irfan Essa
DiffM
188
43
0
09 Sep 2022
MaskViT: Masked Visual Pre-Training for Video Prediction
Agrim Gupta
Stephen Tian
Yunzhi Zhang
Jiajun Wu
Roberto Martín-Martín
Li Fei-Fei
112
111
0
23 Jun 2022
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Axel Sauer
Katja Schwarz
Andreas Geiger
182
495
0
01 Feb 2022
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
245
484
0
20 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,858
0
18 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,796
0
24 Feb 2021
Transformation-based Adversarial Video Prediction on Large-Scale Data
Pauline Luc
Aidan Clark
Sander Dieleman
Diego de Las Casas
Yotam Doron
Albin Cassirer
Karen Simonyan
VGen
231
86
0
09 Mar 2020
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
306
10,368
0
12 Dec 2018