1711.00937
Cited By
Neural Discrete Representation Learning
2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
Papers citing "Neural Discrete Representation Learning"
50 / 2,789 papers shown
Title
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
51
14
0
13 Jun 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang
Yi Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu-Gang Jiang
ViT
VGen
80
38
0
13 Jun 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
49
4
0
13 Jun 2024
The Significance of Latent Data Divergence in Predicting System Degradation
Miguel Fernandes
Catarina Silva
Alberto Cardoso
Bernardete Ribeiro
31
0
0
13 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
35
3
0
13 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
49
0
0
13 Jun 2024
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Mingwang Xu
Hui Li
Qingkun Su
Hanlin Shang
Liwei Zhang
Ce Liu
Jingdong Wang
Yao Yao
Siyu Zhu
VGen
45
69
0
13 Jun 2024
Human-level molecular optimization driven by mol-gene evolution
Jiebin Fang
Churu Mao
Yuchen Zhu
Xiaoming Chen
Chang-Yu Hsieh
Zhongjun Ma
BDL
45
0
0
13 Jun 2024
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
Qijun Gan
Song Wang
Shengtao Wu
Jianke Zhu
65
1
0
13 Jun 2024
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
Bing Li
Cheng Zheng
Wenxuan Zhu
Jinjie Mai
Biao Zhang
Peter Wonka
Bernard Ghanem
53
16
0
12 Jun 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens
Yuning Wu
Chunlei Zhang
Jiatong Shi
Yuxun Tang
Shan Yang
Qin Jin
44
6
0
12 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
Kaidi Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
46
3
0
12 Jun 2024
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Yi Lu
Yuankun Xie
Ruibo Fu
Zhengqi Wen
Jianhua Tao
...
Xuefei Liu
Yongwei Li
Yukun Liu
Xiaopeng Wang
Shuchen Shi
56
1
0
12 Jun 2024
Grounding Multimodal Large Language Models in Actions
Andrew Szot
Bogdan Mazoure
Harsh Agrawal
Devon Hjelm
Z. Kira
Alexander Toshev
LM&Ro
50
11
0
12 Jun 2024
To be Continuous, or to be Discrete, Those are Bits of Questions
Yiran Wang
Masao Utiyama
53
2
0
12 Jun 2024
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li
Lue Fan
Jiawei He
Yuqi Wang
Yuntao Chen
Zhaoxiang Zhang
Tieniu Tan
82
8
0
12 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
65
86
0
11 Jun 2024
Image and Video Tokenization with Binary Spherical Quantization
Yue Zhao
Yuanjun Xiong
Philipp Krahenbuhl
45
18
0
11 Jun 2024
Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention
Avinash Kori
Francesco Locatello
Ainkaran Santhirasekaram
Francesca Toni
Ben Glocker
Fabio De Sousa Ribeiro
OCL
55
1
0
11 Jun 2024
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text
Aoxiong Yin
Haoyuan Li
Kai Shen
Siliang Tang
Yueting Zhuang
SLR
63
2
0
11 Jun 2024
CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation
Zhongzhen Huang
Yankai Jiang
Rongzhao Zhang
Shaoting Zhang
Xiaofan Zhang
MedIm
75
4
0
11 Jun 2024
Discrete Dictionary-based Decomposition Layer for Structured Representation Learning
Taewon Park
Hyun-Chul Kim
Minho Lee
49
0
0
11 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
56
41
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
68
235
0
10 Jun 2024
Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer
Sigal Raab
Inbar Gat
Nathan Sala
Guy Tevet
Rotem Shalev-Arkushin
Ohad Fried
Amit H. Bermano
Daniel Cohen-Or
37
11
0
10 Jun 2024
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing
Qi Dai
Zejia Weng
Zuxuan Wu
Yu-Gang Jiang
VGen
54
14
0
10 Jun 2024
Improving Deep Learning-based Automatic Cranial Defect Reconstruction by Heavy Data Augmentation: From Image Registration to Latent Diffusion Models
Marek Wodzinski
K. Kwarciak
Mateusz Daniol
D. Hemmerling
DiffM
MedIm
39
4
0
10 Jun 2024
Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
Victor Boutin
Rishav Mukherji
Aditya Agrawal
Sabine Muzellec
Thomas Fel
Thomas Serre
Rufin VanRullen
DiffM
42
0
0
10 Jun 2024
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
Rui Liu
Zening Ma
SSL
47
1
0
10 Jun 2024
FRAG: Frequency Adapting Group for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Geonwoo Kim
Chang D. Yoo
DiffM
41
5
0
10 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
50
15
0
08 Jun 2024
VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification
Jianmeng Liu
Yichen Liu
Yuyao Zhang
Zeyuan Meng
Yu-Wing Tai
Chi-Keung Tang
51
0
0
08 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
40
2
0
08 Jun 2024
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Shiji Song
Yuan Yao
Gao Huang
41
14
0
08 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
51
7
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
41
3
0
06 Jun 2024
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan
Hayk Manukyan
Zhangyang Wang
Shant Navasardyan
Humphrey Shi
DiffM
63
1
0
06 Jun 2024
Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning
Inwoo Hwang
Yunhyeok Kwak
Suhyung Choi
Byoung-Tak Zhang
Sanghack Lee
50
1
0
05 Jun 2024
VQUNet: Vector Quantization U-Net for Defending Adversarial Atacks by Regularizing Unwanted Noise
Zhixun He
Mukesh Singhal
35
1
0
05 Jun 2024
Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis
Juanhua Zhang
Ruodan Yan
Alessandro Perelli
Xi Chen
Chao Li
MedIm
DiffM
63
5
0
05 Jun 2024
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Haohan Guo
Fenglong Xie
Dongchao Yang
Hui Lu
Xixin Wu
Helen Meng
63
6
0
05 Jun 2024
U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Chenxin Li
Xinyu Liu
Wenbo Li
Cheng Wang
Hengyu Liu
Yifan Liu
Zhen Chen
Yixuan Yuan
MedIm
DiffM
SSeg
67
118
0
05 Jun 2024
What Matters in Hierarchical Search for Combinatorial Reasoning Problems?
Michał Zawalski
Gracjan Góral
Michał Tyrolski
Emilia Wisnios
Franciszek Budrowski
Marek Cygan
Łukasz Kuciński
Piotr Miłoś
52
0
0
05 Jun 2024
CoNav: A Benchmark for Human-Centered Collaborative Navigation
Changhao Li
Xinyu Sun
Peihao Chen
Jugang Fan
Zixu Wang
Yanxia Liu
Jinhui Zhu
Chuang Gan
Mingkui Tan
58
1
0
04 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
67
26
0
04 Jun 2024
AROMA: Preserving Spatial Structure for Latent PDE Modeling with Local Neural Fields
Louis Serrano
Thomas X. Wang
E. L. Naour
Jean-Noel Vittaut
Patrick Gallinari
40
5
0
04 Jun 2024
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
Kengo Uchida
Takashi Shibuya
Yuhta Takida
Naoki Murata
Shusuke Takahashi
Yuki Mitsufuji
VGen
57
5
0
04 Jun 2024
Learning to Play Atari in a World of Tokens
Pranav Agarwal
Sheldon Andrews
Samira Ebrahimi Kahou
OffRL
38
1
0
03 Jun 2024
Learning-based legged locomotion; state of the art and future perspectives
Sehoon Ha
Joonho Lee
M. van de Panne
Zhaoming Xie
Wenhao Yu
Majid Khadiv
47
17
0
03 Jun 2024
LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning
Junjie Xu
Zongyu Wu
Min Lin
Xiang Zhang
Suhang Wang
35
12
0
03 Jun 2024