1711.00937
Cited By
v1
v2 (latest)
Neural Discrete Representation Learning
2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
Papers citing "Neural Discrete Representation Learning"
50 / 3,267 papers shown
Title
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
65
0
0
26 May 2025
WorldEval: World Model as Real-World Robot Policies Evaluator
Yaxuan Li
Yichen Zhu
Junjie Wen
Chaomin Shen
Yi Xu
OffRL
VGen
41
0
0
25 May 2025
Plug-and-Play Context Feature Reuse for Efficient Masked Generation
Xuejie Liu
Anji Liu
Guy Van den Broeck
Yitao Liang
56
0
0
25 May 2025
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
Jiwan Chung
Junhyeok Kim
Siyeol Kim
Jaeyoung Lee
Min Soo Kim
Youngjae Yu
LRM
95
0
0
24 May 2025
Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning
Wenbo He
Zhijian Ou
DRL
BDL
45
0
0
24 May 2025
BiomechGPT: Towards a Biomechanically Fluent Multimodal Foundation Model for Clinically Relevant Motion Tasks
Ruize Yang
Ann Kennedy
R. James Cotton
19
0
0
24 May 2025
High-Fidelity Functional Ultrasound Reconstruction via A Visual Auto-Regressive Framework
Xuhang Chen
Zhuo Li
Yanyan Shen
Mufti Mahmud
Hieu Pham
Chi-Man Pun
Shuqiang Wang
44
0
0
23 May 2025
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information
Rui Wang
Qianguo Sun
Tianrong Chen
Zhiyun Zeng
Jinlin Wu
Jiaxing Zhang
VLM
45
0
0
23 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL
LRM
263
3
0
23 May 2025
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
Shuang Zeng
Xinyuan Chang
Mengwei Xie
Xinran Liu
Yifan Bai
Zheng Pan
Mu Xu
Xing Wei
LRM
149
0
0
23 May 2025
Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning
Nicolas Castanet
Olivier Sigaud
Sylvain Lamprier
OffRL
116
0
0
23 May 2025
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLM
VLM
499
0
0
23 May 2025
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design
Renjie Wei
Songqiang Xu
Qingyu Guo
Meng Li
MQ
91
0
0
22 May 2025
MARché: Fast Masked Autoregressive Image Generation with Cache-Aware Attention
Chaoyi Jiang
Sungwoo Kim
Lei Gao
Hossein Entezari Zarch
Won Woo Ro
Murali Annavaram
34
0
0
22 May 2025
ChemMLLM: Chemical Multimodal Large Language Model
Qian Tan
Dongzhan Zhou
Peng Xia
Wanhao Liu
Wanli Ouyang
Lei Bai
Yuqiang Li
Tianfan Fu
MLLM
49
0
0
22 May 2025
TensorAR: Refinement is All You Need in Autoregressive Image Generation
Cheng Cheng
Lin Song
Yicheng Xiao
Yuxin Chen
Xuchong Zhang
Hongbin Sun
Ying Shan
VGen
78
0
0
22 May 2025
Differentiable K-means for Fully-optimized Discrete Token-based ASR
Kentaro Onda
Yosuke Kashiwagi
E. Tsunoo
Hayato Futami
Shinji Watanabe
73
0
0
22 May 2025
Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression
Linfeng Qi
Zhaoyang Jia
Jiahao Li
Bin Li
Houqiang Li
Yan Lu
86
0
0
22 May 2025
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
Nicholas Sanders
Yuanchao Li
Korin Richmond
Simon King
76
0
0
21 May 2025
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
Jixun Yao
Hexin Liu
Eng Siong Chng
Lei Xie
57
0
0
21 May 2025
Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation
Xiaozhao Liu
Dinggang Shen
Xihui Liu
86
0
0
21 May 2025
Discrete Audio Representations for Automated Audio Captioning
Jingguang Tian
Haoqin Sun
Xinhui Hu
Xinkang Xu
75
0
0
21 May 2025
Intentional Gesture: Deliver Your Intentions with Gestures for Speech
Pinxin Liu
Haiyang Liu
Luchuan Song
Chenliang Xu
SLR
72
1
0
21 May 2025
RLVR-World: Training World Models with Reinforcement Learning
Jialong Wu
Shaofeng Yin
Ningya Feng
Mingsheng Long
OffRL
VGen
87
2
0
20 May 2025
MSDformer: Multi-scale Discrete Transformer For Time Series Generation
Zhicheng Chen
Shibo Feng
Xi Xiao
Zhong Zhang
Qing Li
Xingyu Gao
Peilin Zhao
58
0
0
20 May 2025
MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis
Kaixing Yang
Xulong Tang
Yuxuan Hu
Jiahao Yang
Hongyan Liu
Qinnan Zhang
Jun He
Zhaoxin Fan
102
0
0
20 May 2025
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
LRM
111
0
0
20 May 2025
Byte Pair Encoding for Efficient Time Series Forecasting
Leon Götz
Marcel Kollovieh
Stephan Günnemann
Leo Schwinn
AI4TS
95
1
0
20 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
200
0
0
20 May 2025
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English
Haoyang Zhang
Hexin Liu
Xiangyu Zhang
Qiquan Zhang
Yuchen Hu
Junqi Zhao
Fei Tian
Xuerui Yang
Eng Siong Chng
67
0
0
20 May 2025
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
Zhengrui Ma
Yang Feng
Chenze Shao
Fandong Meng
Jie Zhou
Min Zhang
81
0
0
19 May 2025
VesselGPT: Autoregressive Modeling of Vascular Geometry
Paula Feldman
Martin Sinnona
Viviana Siless
C. Delrieux
Emmanuel Iarussi
AI4CE
85
0
0
19 May 2025
Universal Semantic Disentangled Privacy-preserving Speech Representation Learning
Biel Tura Vecino
Subhadeep Maji
Aravind Varier
Antonio Bonafonte
Ivan Valles
...
Roberto Barra-Chicote
Ariya Rastrow
C. Papayiannis
Volker Leutnant
Trevor Wood
43
0
0
19 May 2025
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
Jiaqi Li
Xiaolong Lin
Zhekai Li
Shixi Huang
Yuancheng Wang
Chaoren Wang
Zhenpeng Zhan
Zhizheng Wu
103
1
0
19 May 2025
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation
Huawei Lin
Tong Geng
Zhaozhuo Xu
Weijie Zhao
VLM
182
1
0
19 May 2025
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
80
0
0
19 May 2025
GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization
Karthik Sivakoti
64
0
0
19 May 2025
Denoising Diffusion Probabilistic Model for Point Cloud Compression at Low Bit-Rates
Gabriele Spadaro
Alberto Presta
Jhony H. Giraldo
Marco Grangetto
Wei Hu
Giuseppe Valenzise
Attilio Fiandrotti
Enzo Tartaglione
DiffM
64
0
0
19 May 2025
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
Jinhua Zhang
Wei Long
Minghao Han
Weiyi You
Shuhang Gu
BDL
85
0
0
19 May 2025
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
Subash Khanal
Srikumar Sastry
Aayush Dhakal
Adeel Ahmad
Nathan Jacobs
81
0
0
19 May 2025
Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction
Yuanbo Wang
Zhaoxuan Zhang
Jiajin Qiu
Dilong Sun
Zhengyu Meng
Xiaopeng Wei
Xin Yang
91
0
0
19 May 2025
ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data
Yifeng Jiao
Yuchen Liu
Yu Zhang
Xin Guo
Yushuai Wu
...
Hongwei Zhang
Limei Han
Xin Gao
Yuan Qi
Yuan Cheng
152
0
0
19 May 2025
Understanding Complexity in VideoQA via Visual Program Generation
Cristobal Eyzaguirre
Igor Vasiljevic
Achal Dave
Jiajun Wu
Rares Andrei Ambrus
Thomas Kollar
Juan Carlos Niebles
P. Tokmakov
80
0
0
19 May 2025
FreqSelect: Frequency-Aware fMRI-to-Image Reconstruction
Junliang Ye
Lei Wang
Md Zakir Hossain
DiffM
67
0
0
18 May 2025
Context-Aware Autoregressive Models for Multi-Conditional Image Generation
Yixiao Chen
Zhiyuan Ma
Guoli Jia
Che Jiang
Jianjun Li
Bowen Zhou
DiffM
74
0
0
18 May 2025
Hyperbolic Residual Quantization: Discrete Representations for Data with Latent Hierarchies
Piotr Piękos
Subhradeep Kayal
Alexandros Karatzoglou
92
0
0
18 May 2025
Training Latent Diffusion Models with Interacting Particle Algorithms
Tim Y. J. Wang
Juan Kuntz
O. Deniz Akyildiz
124
0
0
18 May 2025
Patient-Specific Autoregressive Models for Organ Motion Prediction in Radiotherapy
Yuxiang Lai
Jike Zhong
Vanessa Su
Xiaofeng Yang
102
0
0
17 May 2025
TACO: Rethinking Semantic Communications with Task Adaptation and Context Embedding
Achintha Wijesinghe
Weiwei Wang
Suchinthaka Wanninayaka
Songyang Zhang
Zhi Ding
73
0
0
16 May 2025
EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes
Jianlin Guo
Haihong Xiao
Wenxiong Kang
3DGS
131
1
0
16 May 2025