Neural Discrete Representation Learning

2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL, SSL, OCL

Papers citing "Neural Discrete Representation Learning"

50 / 3,267 papers shown
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
65
0
0
26 May 2025
WorldEval: World Model as Real-World Robot Policies Evaluator
Yaxuan Li
Yichen Zhu
Junjie Wen
Chaomin Shen
Yi Xu
OffRL, VGen
41
0
0
25 May 2025
Plug-and-Play Context Feature Reuse for Efficient Masked Generation
Xuejie Liu
Anji Liu
Guy Van den Broeck
Yitao Liang
56
0
0
25 May 2025
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
Jiwan Chung
Junhyeok Kim
Siyeol Kim
Jaeyoung Lee
Min Soo Kim
Youngjae Yu
LRM
95
0
0
24 May 2025
Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning
Wenbo He
Zhijian Ou
DRL, BDL
45
0
0
24 May 2025
BiomechGPT: Towards a Biomechanically Fluent Multimodal Foundation Model for Clinically Relevant Motion Tasks
Ruize Yang
Ann Kennedy
R. James Cotton
19
0
0
24 May 2025
High-Fidelity Functional Ultrasound Reconstruction via A Visual Auto-Regressive Framework
Xuhang Chen
Zhuo Li
Yanyan Shen
Mufti Mahmud
Hieu Pham
Chi-Man Pun
Shuqiang Wang
44
0
0
23 May 2025
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information
Rui Wang
Qianguo Sun
Tianrong Chen
Zhiyun Zeng
Jinlin Wu
Jiaxing Zhang
VLM
45
0
0
23 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL, LRM
263
3
0
23 May 2025
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
Shuang Zeng
Xinyuan Chang
Mengwei Xie
Xinran Liu
Yifan Bai
Zheng Pan
Mu Xu
Xing Wei
LRM
149
0
0
23 May 2025
Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning
Nicolas Castanet
Olivier Sigaud
Sylvain Lamprier
OffRL
116
0
0
23 May 2025
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLM, VLM
499
0
0
23 May 2025
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design
Renjie Wei
Songqiang Xu
Qingyu Guo
Meng Li
MQ
91
0
0
22 May 2025
MARché: Fast Masked Autoregressive Image Generation with Cache-Aware Attention
Chaoyi Jiang
Sungwoo Kim
Lei Gao
Hossein Entezari Zarch
Won Woo Ro
Murali Annavaram
34
0
0
22 May 2025
ChemMLLM: Chemical Multimodal Large Language Model
Qian Tan
Dongzhan Zhou
Peng Xia
Wanhao Liu
Wanli Ouyang
Lei Bai
Yuqiang Li
Tianfan Fu
MLLM
49
0
0
22 May 2025
TensorAR: Refinement is All You Need in Autoregressive Image Generation
Cheng Cheng
Lin Song
Yicheng Xiao
Yuxin Chen
Xuchong Zhang
Hongbin Sun
Ying Shan
VGen
78
0
0
22 May 2025
Differentiable K-means for Fully-optimized Discrete Token-based ASR
Kentaro Onda
Yosuke Kashiwagi
E. Tsunoo
Hayato Futami
Shinji Watanabe
73
0
0
22 May 2025
Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression
Linfeng Qi
Zhaoyang Jia
Jiahao Li
Bin Li
Houqiang Li
Yan Lu
86
0
0
22 May 2025
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
Nicholas Sanders
Yuanchao Li
Korin Richmond
Simon King
76
0
0
21 May 2025
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
Jixun Yao
Hexin Liu
Eng Siong Chng
Lei Xie
57
0
0
21 May 2025
Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation
Xiaozhao Liu
Dinggang Shen
Xihui Liu
86
0
0
21 May 2025
Discrete Audio Representations for Automated Audio Captioning
Jingguang Tian
Haoqin Sun
Xinhui Hu
Xinkang Xu
75
0
0
21 May 2025
Intentional Gesture: Deliver Your Intentions with Gestures for Speech
Pinxin Liu
Haiyang Liu
Luchuan Song
Chenliang Xu
SLR
72
1
0
21 May 2025
RLVR-World: Training World Models with Reinforcement Learning
Jialong Wu
Shaofeng Yin
Ningya Feng
Mingsheng Long
OffRL, VGen
87
2
0
20 May 2025
MSDformer: Multi-scale Discrete Transformer For Time Series Generation
Zhicheng Chen
Shibo Feng
Xi Xiao
Zhong Zhang
Qing Li
Xingyu Gao
Peilin Zhao
58
0
0
20 May 2025
MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis
Kaixing Yang
Xulong Tang
Yuxuan Hu
Jiahao Yang
Hongyan Liu
Qinnan Zhang
Jun He
Zhaoxin Fan
102
0
0
20 May 2025
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
LRM
111
0
0
20 May 2025
Byte Pair Encoding for Efficient Time Series Forecasting
Leon Götz
Marcel Kollovieh
Stephan Günnemann
Leo Schwinn
AI4TS
95
1
0
20 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
200
0
0
20 May 2025
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English
Haoyang Zhang
Hexin Liu
Xiangyu Zhang
Qiquan Zhang
Yuchen Hu
Junqi Zhao
Fei Tian
Xuerui Yang
Eng Siong Chng
67
0
0
20 May 2025
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
Zhengrui Ma
Yang Feng
Chenze Shao
Fandong Meng
Jie Zhou
Min Zhang
81
0
0
19 May 2025
VesselGPT: Autoregressive Modeling of Vascular Geometry
Paula Feldman
Martin Sinnona
Viviana Siless
C. Delrieux
Emmanuel Iarussi
AI4CE
85
0
0
19 May 2025
Universal Semantic Disentangled Privacy-preserving Speech Representation Learning
Biel Tura Vecino
Subhadeep Maji
Aravind Varier
Antonio Bonafonte
Ivan Valles
...
Roberto Barra-Chicote
Ariya Rastrow
C. Papayiannis
Volker Leutnant
Trevor Wood
43
0
0
19 May 2025
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
Jiaqi Li
Xiaolong Lin
Zhekai Li
Shixi Huang
Yuancheng Wang
Chaoren Wang
Zhenpeng Zhan
Zhizheng Wu
103
1
0
19 May 2025
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation
Huawei Lin
Tong Geng
Zhaozhuo Xu
Weijie Zhao
VLM
182
1
0
19 May 2025
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
80
0
0
19 May 2025
GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization
Karthik Sivakoti
64
0
0
19 May 2025
Denoising Diffusion Probabilistic Model for Point Cloud Compression at Low Bit-Rates
Gabriele Spadaro
Alberto Presta
Jhony H. Giraldo
Marco Grangetto
Wei Hu
Giuseppe Valenzise
Attilio Fiandrotti
Enzo Tartaglione
DiffM
64
0
0
19 May 2025
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
Jinhua Zhang
Wei Long
Minghao Han
Weiyi You
Shuhang Gu
BDL
85
0
0
19 May 2025
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
Subash Khanal
Srikumar Sastry
Aayush Dhakal
Adeel Ahmad
Nathan Jacobs
81
0
0
19 May 2025
Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction
Yuanbo Wang
Zhaoxuan Zhang
Jiajin Qiu
Dilong Sun
Zhengyu Meng
Xiaopeng Wei
Xin Yang
91
0
0
19 May 2025
ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data
Yifeng Jiao
Yuchen Liu
Yu Zhang
Xin Guo
Yushuai Wu
...
Hongwei Zhang
Limei Han
Xin Gao
Yuan Qi
Yuan Cheng
152
0
0
19 May 2025
Understanding Complexity in VideoQA via Visual Program Generation
Cristobal Eyzaguirre
Igor Vasiljevic
Achal Dave
Jiajun Wu
Rares Andrei Ambrus
Thomas Kollar
Juan Carlos Niebles
P. Tokmakov
80
0
0
19 May 2025
FreqSelect: Frequency-Aware fMRI-to-Image Reconstruction
Junliang Ye
Lei Wang
Md Zakir Hossain
DiffM
67
0
0
18 May 2025
Context-Aware Autoregressive Models for Multi-Conditional Image Generation
Yixiao Chen
Zhiyuan Ma
Guoli Jia
Che Jiang
Jianjun Li
Bowen Zhou
DiffM
74
0
0
18 May 2025
Hyperbolic Residual Quantization: Discrete Representations for Data with Latent Hierarchies
Piotr Piękos
Subhradeep Kayal
Alexandros Karatzoglou
92
0
0
18 May 2025
Training Latent Diffusion Models with Interacting Particle Algorithms
Tim Y. J. Wang
Juan Kuntz
O. Deniz Akyildiz
124
0
0
18 May 2025
Patient-Specific Autoregressive Models for Organ Motion Prediction in Radiotherapy
Yuxiang Lai
Jike Zhong
Vanessa Su
Xiaofeng Yang
102
0
0
17 May 2025
TACO: Rethinking Semantic Communications with Task Adaptation and Context Embedding
Achintha Wijesinghe
Weiwei Wang
Suchinthaka Wanninayaka
Songyang Zhang
Zhi Ding
73
0
0
16 May 2025
EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes
Jianlin Guo
Haihong Xiao
Wenxiong Kang
3DGS
131
1
0
16 May 2025