ResearchTrend.AI
Generating Diverse High-Fidelity Images with VQ-VAE-2

2 June 2019
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL, BDL

Papers citing "Generating Diverse High-Fidelity Images with VQ-VAE-2"

50 / 1,154 papers shown
Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings
Di Wu
Siyuan Li
Chen Feng
Lu Cao
Yize Zhang
Jie Yang
Mohamad Sawan
145
3
0
13 Oct 2024
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Jiahao Cui
Hui Li
Yao Yao
Hao Zhu
Hanlin Shang
Kaihui Cheng
Hang Zhou
Siyu Zhu
Jingdong Wang
DiffM, VGen
149
52
0
10 Oct 2024
ElasticTok: Adaptive Tokenization for Image and Video
Wilson Yan
Matei A. Zaharia
Volodymyr Mnih
Pieter Abbeel
Aleksandra Faust
Hao Liu
VGen
136
13
0
10 Oct 2024
Improved Variational Inference in Discrete VAEs using Error Correcting Codes
María Martínez-García
Grace Villacrés
David Mitchell
Pablo M. Olmos
DRL
156
0
0
10 Oct 2024
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
Rui Zhao
Hangjie Yuan
Yujie Wei
Shiwei Zhang
Yuchao Gu
...
Xiang Wang
Zhangjie Wu
Junhao Zhang
Yingya Zhang
Mike Zheng Shou
DiffM, VLM
161
6
0
09 Oct 2024
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li
Weihao Yuan
Yisheng He
Lingteng Qiu
Shenhao Zhu
Xiaodong Gu
Weichao Shen
Yuan Dong
Zilong Dong
Laurence T. Yang
157
16
0
09 Oct 2024
Geometric Representation Condition Improves Equivariant Molecule Generation
Zian Li
Cai Zhou
Xiyuan Wang
Xingang Peng
Muhan Zhang
223
3
0
04 Oct 2024
SGW-based Multi-Task Learning in Vision Tasks
Ruiyuan Zhang
Yuyao Chen
Yuchi Huo
Jiaxiang Liu
Dianbing Xi
Jie Liu
Chao Wu
102
1
0
03 Oct 2024
CaLMFlow: Volterra Flow Matching using Causal Language Models
Shiyang Zhang
Daniel Levine
Ivan Vrkic
Marco Francesco Bressana
David Zhang
S. Rizvi
Yangtian Zhang
E. Zappala
David van Dijk
69
1
0
03 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
175
6
0
03 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe Lin
VLM
155
44
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
238
3
0
02 Oct 2024
Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces
Lilac Atassi
182
0
0
01 Oct 2024
Diverse Code Query Learning for Speech-Driven Facial Animation
Chunzhi Gu
Shigeru Kuriyama
Katsuya Hotta
DiffM
97
0
0
27 Sep 2024
Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis
Chirag Vashist
Shichong Peng
Ke Li
DiffM
120
2
0
26 Sep 2024
Exploring Semantic Clustering in Deep Reinforcement Learning for Video Games
Liang Zhang
Justin Lieffers
A. Pyarelal
140
0
0
25 Sep 2024
Single Image, Any Face: Generalisable 3D Face Generation
Wenqing Wang
Haosen Yang
Josef Kittler
Xiatian Zhu
3DH
177
1
0
25 Sep 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM, VLM
255
13
0
24 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
223
1
0
23 Sep 2024
LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling
Xin Li
Anand Sarwate
69
0
0
16 Sep 2024
Beta-Sigma VAE: Separating beta and decoder variance in Gaussian variational autoencoder
Seunghwan Kim
Seungkyu Lee
DRL
91
0
0
14 Sep 2024
Detect Fake with Fake: Leveraging Synthetic Data-driven Representation for Synthetic Image Detection
Hina Otake
Yoshihiro Fukuhara
Yoshiki Kubotani
Shigeo Morishima
ViT
122
0
0
13 Sep 2024
GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors
Md Ferdous Alam
Faez Ahmed
DiffM
184
14
0
08 Sep 2024
Blended Latent Diffusion under Attention Control for Real-World Video Editing
Deyin Liu
Lin Yuanbo Wu
Xianghua Xie
DiffM
71
2
0
05 Sep 2024
VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization
Yixuan Zhou
Xing Xu
Zhe Sun
Jingkuan Song
A. Cichocki
Heng Tao Shen
147
2
0
02 Sep 2024
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni
Yulin Wang
Renping Zhou
Rui Lu
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Yuan Yao
Gao Huang
154
10
0
31 Aug 2024
BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding
Jinzhao Zhou
Yiqun Duan
Fred Chang
T. Do
Yu-Kai Wang
Chin-Teng Lin
108
6
0
28 Aug 2024
AEMLO: AutoEncoder-Guided Multi-Label Oversampling
Ao Zhou
Bin Liu
Jin Wang
K. Sun
Kelin Liu
SyDa
77
0
0
23 Aug 2024
A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse
Zhongliang Guo
Lei Fang
Jingyu Lin
Yifei Qian
Shuai Zhao
Zeyu Wang
Zeyu Wang
Cunjian Chen
Ognjen Arandjelović
Chun Pong Lau
DiffM, AAML
190
11
0
20 Aug 2024
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance
Jiasong Feng
Ao Ma
Jing Wang
Bo Cheng
Xiaodan Liang
DiffM, VGen
137
9
0
15 Aug 2024
DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion
Yujia Wu
Yiming Shi
Jiwei Wei
Chengwei Sun
Yuyang Zhou
Yang Yang
Heng Tao Shen
160
6
0
13 Aug 2024
Music2Latent: Consistency Autoencoders for Latent Audio Compression
Marco Pasini
Stefan Lattner
George Fazekas
103
19
0
12 Aug 2024
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
Jisoo Kim
Jungbin Cho
Joonho Park
Soonmin Hwang
Da Eun Kim
Geon Kim
Youngjae Yu
210
1
0
12 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
214
78
0
05 Aug 2024
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
Aoming Liu
Zhong Li
Zhang Chen
Nannan Li
Yinghao Xu
Bryan A. Plummer
101
9
0
04 Aug 2024
LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation
Dwij Mehta
Aditya Mehta
Pratik Narang
DiffM
117
2
0
04 Aug 2024
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling
Qian Zhang
Xiangzi Dai
Ninghua Yang
Xiang An
Ziyong Feng
Xingyu Ren
VLM, CLIP
144
26
0
02 Aug 2024
Informed Correctors for Discrete Diffusion Models
Yixiu Zhao
Jiaxin Shi
F. Chen
Shaul Druckmann
Lester W. Mackey
Scott W. Linderman
225
16
0
30 Jul 2024
Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation
Chak Tou Leong
Hongru Cai
Wenjie Wang
Leigang Qu
Yinwei Wei
Wenjie Li
Liqiang Nie
Tat-Seng Chua
DiffM
81
1
0
24 Jul 2024
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
Zirui Shao
Feiyu Gao
Hangdi Xing
Zepeng Zhu
Zhi Yu
Jiajun Bu
Qi Zheng
Cong Yao
85
4
0
22 Jul 2024
Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
Zhe Zhao
Mengshi Qi
Huadong Ma
DRL
116
4
0
19 Jul 2024
SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
Yuanzhi Zhu
Xingchao Liu
Qiang Liu
115
12
0
17 Jul 2024
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
Han Zhou
Wei Dong
Xiaohong Liu
Shuaicheng Liu
Xiongkuo Min
Guangtao Zhai
Jun Chen
144
27
0
17 Jul 2024
Generating 3D House Wireframes with Semantics
Xueqi Ma
Yilin Liu
Wenjun Zhou
Ruowei Wang
Hui Huang
3DV
101
4
0
17 Jul 2024
Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data
Tim Elsner
Paula Usinger
Victor Czech
Gregor Kobsik
Yanjiang He
I. Lim
Leif Kobbelt
111
2
0
16 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM, VGen
126
20
0
15 Jul 2024
RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation
Tao Jiang
Xinchen Xie
Yining Li
3DH
113
10
0
11 Jul 2024
Several questions of visual generation in 2024
Shuyang Gu
104
2
0
11 Jul 2024
Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
Yiran Yang
Jinchao Zhang
Ying Deng
Jie Zhou
DiffM
82
1
0
09 Jul 2024
Latent Space Imaging
Matheus Souza
Yidan Zheng
Kaizhang Kang
Yogeshwar Nath Mishra
Qiang Fu
Wolfgang Heidrich
208
0
0
09 Jul 2024