ResearchTrend.AI
MaskGIT: Masked Generative Image Transformer

8 February 2022
Huiwen Chang
Han Zhang
Lu Jiang
Ce Liu
William T. Freeman
    ViT

Papers citing "MaskGIT: Masked Generative Image Transformer"

50 / 482 papers shown
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
Marco Comunità
Zhi-Wei Zhong
Akira Takahashi
Shiqi Yang
Mengjie Zhao
Koichi Saito
Yukara Ikemiya
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
71
2
0
25 Jun 2024
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
62
183
0
17 Jun 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
43
34
0
17 Jun 2024
Generative Visual Instruction Tuning
Jefferson Hernandez
Ruben Villegas
Vicente Ordonez
VLM
38
3
0
17 Jun 2024
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Qihao Liu
Zhanpeng Zeng
Ju He
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
53
20
0
13 Jun 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
41
14
0
13 Jun 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang
Yi-Xin Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu-Gang Jiang
ViT
VGen
80
38
0
13 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
60
85
0
11 Jun 2024
Image and Video Tokenization with Binary Spherical Quantization
Yue Zhao
Yuanjun Xiong
Philipp Krahenbuhl
45
18
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
66
229
0
10 Jun 2024
Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency
Jincheng Dai
Xiaoqi Qin
Sixian Wang
Lexi Xu
Kai Niu
Ping Zhang
42
5
0
10 Jun 2024
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Shiji Song
Yuan Yao
Gao Huang
32
14
0
08 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
47
67
0
08 Jun 2024
Enhancing Indoor Temperature Forecasting through Synthetic Data in Low-Data Environments
Zachari Thiry
Massimiliano Ruocco
Alessandro Nocente
Michail Spitieris
19
0
0
07 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
37
6
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
40
4
0
06 Jun 2024
MaskSR: Masked Language Model for Full-band Speech Restoration
Xu Li
Qirui Wang
Xiaoyu Liu
47
8
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
55
8
0
03 Jun 2024
ContextFlow++: Generalist-Specialist Flow-based Generative Models with Mixed-Variable Context Encoding
Denis A. Gudovskiy
Tomoyuki Okuno
Yohei Nakata
MoE
AI4CE
36
2
0
02 Jun 2024
Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes
Riccardo Benaglia
Angelo Porrello
Pietro Buzzega
Simone Calderara
Rita Cucchiara
20
0
0
31 May 2024
RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection
Zhiyuan He
Pin-Yu Chen
Tsung-Yi Ho
44
12
0
30 May 2024
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Guénolé Fiche
Simon Leglaive
Xavier Alameda-Pineda
Francesc Moreno-Noguer
3DH
60
1
0
29 May 2024
Alignment is Key for Applying Diffusion Models to Retrosynthesis
Najwa Laabid
Severi Rissanen
Markus Heinonen
Arno Solin
Vikas K. Garg
DiffM
31
1
0
27 May 2024
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Shiqi Yang
Zhi-Wei Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
43
3
0
23 May 2024
Text Prompting for Multi-Concept Video Customization by Autoregressive Generation
D. Kothandaraman
Kihyuk Sohn
Ruben Villegas
P. Voigtlaender
Dinesh Manocha
Mohammad Babaeizadeh
VGen
DiffM
35
2
0
22 May 2024
Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data
Tarun Kalluri
Jihyeon Janel Lee
Kihyuk Sohn
Sahil Singla
Manmohan Chandraker
Joseph Z. Xu
Jeremiah Liu
49
1
0
22 May 2024
Diffusion for World Modeling: Visual Details Matter in Atari
Eloi Alonso
Adam Jelley
Vincent Micheli
Anssi Kanervisto
Amos Storkey
Tim Pearce
François Fleuret
56
41
0
20 May 2024
Beyond Traditional Single Object Tracking: A Survey
Omar Abdelaziz
Mohamed Shehata
Mohamed Mohamed
35
0
0
16 May 2024
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
Siyuan Li
Zedong Wang
Zicheng Liu
Di Wu
Cheng Tan
Jiangbin Zheng
Yufei Huang
Stan Z. Li
40
7
0
13 May 2024
DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation
Ziang Cao
Fangzhou Hong
Tong Wu
Liang Pan
Ziwei Liu
29
2
0
13 May 2024
Controllable Image Generation With Composed Parallel Token Prediction
Jamie Stirling
Noura Al-Moubayed
33
0
0
10 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
37
84
0
09 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
87
38
0
06 May 2024
Lazy Diffusion Transformer for Interactive Image Editing
Yotam Nitzan
Zongze Wu
Richard Zhang
Eli Shechtman
Daniel Cohen-Or
Taesung Park
Michael Gharbi
43
9
0
18 Apr 2024
Sketch-guided Image Inpainting with Partial Discrete Diffusion Process
Nakul Sharma
Aditay Tripathi
Anirban Chakraborty
Anand Mishra
DiffM
41
3
0
18 Apr 2024
σ-GPTs: A New Approach to Autoregressive Models
Arnaud Pannatier
Evann Courdier
François Fleuret
AI4TS
28
7
0
15 Apr 2024
Responsible Visual Editing
Minheng Ni
Yeli Shen
Lei Zhang
W. Zuo
DiffM
29
0
0
08 Apr 2024
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
40
24
0
06 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Keyu Tian
Yi-Xin Jiang
Zehuan Yuan
Bingyue Peng
Liwei Wang
VGen
42
250
0
03 Apr 2024
LidarDM: Generative LiDAR Simulation in a Generated World
Vlas Zyrianov
Henry Che
Zhijian Liu
Shenlong Wang
VGen
41
20
0
03 Apr 2024
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei
Zhengzhong Tu
M. Delbracio
Hossein Talebi
Vishal M. Patel
P. Milanfar
DiffM
58
12
0
01 Apr 2024
Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu
Qiong Cao
Yandong Wen
Huaiguang Jiang
Changxing Ding
SLR
68
13
0
30 Mar 2024
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Pu Wang
Minwoo Lee
Srijan Das
Chong Chen
VGen
40
23
0
28 Mar 2024
A Survey on Large Language Models from Concept to Implementation
Chen Wang
Jin Zhao
Jiaqi Gong
LLMAG
LM&MA
42
3
0
27 Mar 2024
Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Haiwei Chen
Yajie Zhao
DiffM
19
2
0
27 Mar 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu
Yingwei Pan
Yehao Li
Ting Yao
Zhenglong Sun
Tao Mei
C. Chen
50
24
0
25 Mar 2024
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
Sihyun Yu
Weili Nie
De-An Huang
Boyi Li
Jinwoo Shin
A. Anandkumar
VGen
DiffM
34
15
0
21 Mar 2024
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang
Huaibin Wang
Chuyao Luo
Xutao Li
Guotao Liang
Yunming Ye
Xiaochen Qi
Yao He
40
11
0
15 Mar 2024
Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
Wonjun Kang
Kevin Galim
Hyung Il Koo
DiffM
34
5
0
14 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM
MLLM
46
7
0
14 Mar 2024