ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.07550
  4. Cited By
An Image is Worth 32 Tokens for Reconstruction and Generation

An Image is Worth 32 Tokens for Reconstruction and Generation

11 June 2024
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
    VLM
    ViT
ArXivPDFHTML

Papers citing "An Image is Worth 32 Tokens for Reconstruction and Generation"

27 / 77 papers shown
Title
Towards Unifying Understanding and Generation in the Era of Vision
  Foundation Models: A Survey from the Autoregression Perspective
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
44
3
0
29 Oct 2024
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Hanyu Wang
Saksham Suri
Yixuan Ren
Hao Chen
Abhinav Shrivastava
VGen
31
9
0
28 Oct 2024
Progressive Compositionality in Text-to-Image Generative Models
Progressive Compositionality in Text-to-Image Generative Models
Xu Han
Linghao Jin
Xiaofeng Liu
Paul Pu Liang
CoGe
103
2
0
22 Oct 2024
Elucidating the design space of language models for image generation
Elucidating the design space of language models for image generation
Xuantong Liu
Shaozhe Hao
Xianbiao Qi
Tianyang Hu
Jun Wang
Rong Xiao
Yuan Yao
VLM
32
3
0
21 Oct 2024
Beyond GFVC: A Progressive Face Video Compression Framework with
  Adaptive Visual Tokens
Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens
Bolin Chen
Shanzhi Yin
Zihan Zhang
Jie Chen
Ru-Ling Liao
Lingyu Zhu
Shiqi Wang
Yan Ye
23
3
0
11 Oct 2024
Think While You Generate: Discrete Diffusion with Planned Denoising
Think While You Generate: Discrete Diffusion with Planned Denoising
Sulin Liu
Juno Nam
Andrew Campbell
Hannes Stärk
Yilun Xu
Tommi Jaakkola
Rafael Gómez-Bombarelli
DiffM
48
6
0
08 Oct 2024
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive
  Transformer for Efficient Finegrained Image Generation
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Liang Chen
Sinan Tan
Zefan Cai
Weichu Xie
Haozhe Zhao
Yichi Zhang
Junyang Lin
Jinze Bai
Tianyu Liu
Baobao Chang
ViT
55
3
0
02 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
ImageFolder: Autoregressive Image Generation with Folded Tokens
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe-nan Lin
VLM
39
18
0
02 Oct 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
51
30
0
24 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
50
51
0
06 Sep 2024
Body of Her: A Preliminary Study on End-to-End Humanoid Agent
Body of Her: A Preliminary Study on End-to-End Humanoid Agent
Tenglong Ao
LM&Ro
31
1
0
06 Aug 2024
Quantised Global Autoencoder: A Holistic Approach to Representing Visual
  Data
Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data
Tim Elsner
Paula Usinger
Victor Czech
Gregor Kobsik
Yanjiang He
I. Lim
Leif Kobbelt
39
1
0
16 Jul 2024
Several questions of visual generation in 2024
Several questions of visual generation in 2024
Shuyang Gu
29
1
0
11 Jul 2024
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for
  Interleaved Image-Text Generation
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
Ethan Chern
Jiadi Su
Yan Ma
Pengfei Liu
MLLM
29
28
0
08 Jul 2024
Wavelets Are All You Need for Autoregressive Image Generation
Wavelets Are All You Need for Autoregressive Image Generation
Wael Mattar
Idan Levy
Nir Sharon
S. Dekel
43
3
0
28 Jun 2024
Alleviating Distortion in Image Generation via Multi-Resolution
  Diffusion Models
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Qihao Liu
Zhanpeng Zeng
Ju He
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
53
19
0
13 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
63
32
0
07 Jun 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jienneg Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
36
24
0
02 Apr 2024
A Pytorch Reproduction of Masked Generative Image Transformer
A Pytorch Reproduction of Masked Generative Image Transformer
Victor Besnier
Mickael Chen
ViT
61
12
0
22 Oct 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
Shanghua Gao
Pan Zhou
Mingg-Ming Cheng
Shuicheng Yan
DiffM
145
155
0
25 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
272
4,244
0
30 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
519
0
02 Jan 2023
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng
L. Vuong
Jianfei Cai
Dinh Q. Phung
MQ
71
72
0
19 Sep 2022
Improved Masked Image Generation with Token-Critic
Improved Masked Image Generation with Token-Critic
José Lezama
Huiwen Chang
Lu Jiang
Irfan Essa
DiffM
188
43
0
09 Sep 2022
Autoregressive Image Generation using Residual Quantization
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
172
326
0
03 Mar 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Previous
12