Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2209.09002
Cited By
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
19 September 2022
Chuanxia Zheng
L. Vuong
Jianfei Cai
Dinh Q. Phung
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation"
19 / 19 papers shown
Title
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
49
2
0
20 Apr 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
J. Wang
Tao Dai
Shu-Tao Xia
Luca Benini
70
1
0
30 Mar 2025
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song
Y. Wang
Zijia Song
Yadong Li
Haoze Sun
Weipeng Chen
Zenan Zhou
Jianhua Xu
Jiaqi Wang
Kaicheng Yu
60
2
0
18 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
X. Li
Jason Kuen
H. Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe-nan Lin
Marios Savvides
62
0
0
11 Mar 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
J. Sonke
E. Gavves
37
0
0
24 Feb 2025
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM
VLM
47
4
0
24 Sep 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
51
81
0
11 Jun 2024
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng
Andrea Vedaldi
3DV
39
48
0
07 Dec 2023
Kandinsky 3.0 Technical Report
V.Ya. Arkhipkin
Andrei Filatov
Viacheslav Vasilev
Anastasia Maltseva
Said Azizov
Igor Pavlov
Julia Agafonova
Andrey Kuznetsov
Denis Dimitrov
DiffM
28
11
0
06 Dec 2023
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
V.Ya. Arkhipkin
Zein Shaheen
Viacheslav Vasilev
E. Dakhova
Andrey Kuznetsov
Denis Dimitrov
DiffM
VGen
23
5
0
22 Nov 2023
Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation
Minghui Hu
Jianbin Zheng
Daqing Liu
Chuanxia Zheng
Chaoyue Wang
Dacheng Tao
Tat-Jen Cham
DiffM
25
9
0
01 Jun 2023
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu
Chuanxia Zheng
Heliang Zheng
Tat-Jen Cham
Chaoyue Wang
Zuopeng Yang
Dacheng Tao
Ponnuthurai Nagaratnam Suganthan
DiffM
20
23
0
27 Nov 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
172
325
0
03 Mar 2022
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
245
484
0
20 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,777
0
24 Feb 2021
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
279
10,348
0
12 Dec 2018
A Learned Representation For Artistic Style
Vincent Dumoulin
Jonathon Shlens
M. Kudlur
GAN
214
1,156
0
24 Oct 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,194
0
01 Sep 2014
1