Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
1906.00446
Cited By
Generating Diverse High-Fidelity Images with VQ-VAE-2
2 June 2019
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL
BDL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Generating Diverse High-Fidelity Images with VQ-VAE-2"
50 / 1,128 papers shown
Title
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Siyu Jiao
Gengwei Zhang
Yinlong Qian
Jiancheng Huang
Yao Zhao
Humphrey Shi
Lin Ma
Y. X. Wei
Zequn Jie
VLM
109
6
0
27 Feb 2025
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Zhi Cen
Huaijin Pi
Sida Peng
Qing Shuai
Yujun Shen
Hujun Bao
Xiaowei Zhou
Ruizhen Hu
VGen
OffRL
144
3
0
27 Feb 2025
Diffusion Models for conditional MRI generation
Miguel Herencia García del Castillo
Ricardo Moya Garcia
Manuel Jesús Cerezo Mazón
Ekaitz Arriola Garcia
Pablo Menéndez Fernández-Miranda
MedIm
91
0
0
25 Feb 2025
Predicting the Energy Landscape of Stochastic Dynamical System via Physics-informed Self-supervised Learning
Ruikun Li
Huandong Wang
Qingmin Liao
Yong Li
59
2
0
24 Feb 2025
Machine learning and high dimensional vector search
Matthijs Douze
140
0
0
24 Feb 2025
SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations
Wen Liu
Pei Yang
Wenhui Hong
Xiaoguang Mei
Jiayi Ma
DiffM
108
0
0
24 Feb 2025
Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens
Ziwei Shan
Yaoyu He
Chengfeng Zhao
Jiashen Du
Jingyan Zhang
Qixuan Zhang
Jingyi Yu
Lan Xu
129
1
0
22 Feb 2025
CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection
zhe Huang
Shuo Wang
Yanjie Wang
Lei Wang
DiffM
452
0
0
17 Feb 2025
MoFM: A Large-Scale Human Motion Foundation Model
Mohammadreza Baharani
Ghazal Alinezhad Noghre
Armin Danesh Pazho
Gabriel Maldonado
Hamed Tabkhi
AI4CE
481
1
0
08 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
490
1
0
05 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL
MQ
148
0
0
04 Feb 2025
ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies
C. Ciușdel
Alex Serban
Tiziano Passerini
CoGe
127
1
0
03 Feb 2025
End-to-end Training for Text-to-Image Synthesis using Dual-Text Embeddings
Yeruru Asrar Ahmed
Anurag Mittal
DiffM
143
0
0
03 Feb 2025
Unveiling Discrete Clues: Superior Healthcare Predictions for Rare Diseases
Chuang Zhao
Hui Tang
Jiheng Zhang
Xiaomeng Li
87
0
0
23 Jan 2025
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda
134
3
0
10 Jan 2025
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu
Nuno Vasconcelos
Xinyu Wang
DiffM
97
6
0
08 Jan 2025
Circuit Complexity Bounds for Visual Autoregressive Model
Yekun Ke
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
111
8
0
08 Jan 2025
Human Grasp Generation for Rigid and Deformable Objects with Decomposed VQ-VAE
Mengshi Qi
Zhe Zhao
Huadong Ma
132
1
0
08 Jan 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
103
12
0
08 Jan 2025
Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks
Théophane Vallaeys
Matthew Muckley
Jakob Verbeek
Matthijs Douze
MQ
99
2
0
06 Jan 2025
Neural Network Diffusion
Kaili Wang
Dongwen Tang
Boya Zeng
Yida Yin
Zhaopan Xu
Yukun Zhou
Zelin Zang
Trevor Darrell
Zhuang Liu
Yang You
DiffM
156
26
0
03 Jan 2025
Towards Human-AI Synergy in UI Design: Enhancing Multi-Agent Based UI Generation with Intent Clarification and Alignment
M. Yuan
Jieshan Chen
Yongquan Hu
Sidong Feng
Mulong Xie
Gelareh Mohammadi
Zhenchang Xing
Aaron Quigley
LLMAG
101
1
0
28 Dec 2024
Hierarchical Vector Quantization for Unsupervised Action Segmentation
Federico Spurio
Emad Bahrami
Gianpiero Francesca
Juergen Gall
124
0
0
23 Dec 2024
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan
Kushal Tirumala
Armen Aghajanyan
Luke Zettlemoyer
Ali Farhadi
DiffM
127
3
0
20 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
255
10
0
19 Dec 2024
E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling
Zhihang Yuan
Yuzhang Shang
Hao Zhang
Tongcheng Fang
Rui Xie
Bingxin Xu
Yan Yan
Shengen Yan
Guohao Dai
Yu Wang
DiffM
172
1
0
18 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hong Chen
Zihan Wang
Xianrui Li
Xingwu Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
296
10
0
14 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Ziyi Wang
...
Liang Li
Siwei Liu
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
183
0
0
13 Dec 2024
OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
Yuanzhi Zhu
R. Wang
Shilin Lu
Junnan Li
Hanshu Yan
Peng Sun
SupR
188
5
0
12 Dec 2024
Unsupervised Cross-Domain Regression for Fine-grained 3D Game Character Reconstruction
Qi Wen
Xiang Wen
Hao Jiang
Siqi Yang
Bingfeng Han
Tianlei Hu
Gang Chen
Shuang Li
3DH
121
0
0
11 Dec 2024
CoMA: Compositional Human Motion Generation with Multi-modal Agents
Shanlin Sun
Gabriel De Araujo
Jiaqi Xu
S. Kevin Zhou
Hanwen Zhang
Ziheng Huang
Chenyu You
Xiaohui Xie
171
5
0
10 Dec 2024
Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent
Ziyuan Qin
D. Cheng
Haoyu Wang
Huahui Yi
Yuting Shao
Zhiyuan Fan
Kang Li
Qicheng Lao
EGVM
MLLM
486
0
0
07 Dec 2024
LMDM:Latent Molecular Diffusion Model For 3D Molecule Generation
Xiang Chen
DiffM
133
0
0
05 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Xianrui Li
Kai Qiu
Hong Chen
Jason Kuen
Jiuxiang Gu
Jiadong Wang
Zhe Lin
Bhiksha Raj
VLM
227
9
0
02 Dec 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
Zhijie Deng
DiffM
VGen
VLM
266
12
0
28 Nov 2024
3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes
Tejaswini Medi
Arianna Rampini
Pradyumna Reddy
P. Jayaraman
Margret Keuper
DiffM
173
0
0
28 Nov 2024
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
J. Hyung
Kinam Kim
Susung Hong
M. Kim
Jaegul Choo
VGen
164
4
0
27 Nov 2024
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
Y. Wang
Jiajie Teng
Jiajiong Cao
Yuming Li
Chenguang Ma
Hongteng Xu
Dixin Luo
VGen
DiffM
137
1
0
25 Nov 2024
VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space
Armani Rodriguez
S. Kokalj-Filipovic
120
1
0
22 Nov 2024
CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiñski
Antoni Kowalczuk
Franziska Boenisch
Adam Dziedzic
146
2
0
19 Nov 2024
Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
Tim Elsner
Paula Usinger
Julius Nehring-Wirxel
Gregor Kobsik
Victor Czech
Yanjiang He
I. Lim
Leif Kobbelt
103
1
0
15 Nov 2024
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings
Aditya Sanghi
Aliasghar Khani
Pradyumna Reddy
Arianna Rampini
Derek Cheung
Kamal Rahimi Malekshan
Kanika Madan
Hooman Shayani
125
3
0
12 Nov 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Yizeng Han
Jiayi Guo
Zhiyuan Liu
Yuan Yao
Gao Huang
107
5
0
11 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
203
14
0
08 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
45
4
0
07 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
91
4
0
07 Nov 2024
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
G. Zhou
Hengkai Pan
Yann LeCun
Lerrel Pinto
VGen
LM&Ro
OffRL
112
32
0
07 Nov 2024
Training on test proteins improves fitness, structure, and function prediction
Anton Bushuiev
Roman Bushuiev
Nikola Zadorozhny
Raman Samusevich
Hannes Stärk
Jiri Sedlar
Tomáš Pluskal
Josef Sivic
54
0
0
04 Nov 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
Bing Li
Yifei Xin
Linli Xu
115
13
0
04 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
252
1
0
02 Nov 2024
Previous
1
2
3
4
5
6
...
21
22
23
Next