Neural Discrete Representation Learning

2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL SSL OCL

Papers citing "Neural Discrete Representation Learning"

50 / 3,267 papers shown
Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones
A. Zhmoginov
Jihwan Lee
Mark Sandler
44
0
0
06 Jun 2025
Improving AI-generated music with user-guided training
Vishwa Mohan Singh
Sai Anirudh Aryasomayajula
Ahan Chatterjee
Beste Aydemir
Rifat Mehreen Amin
102
0
0
05 Jun 2025
UniMate: A Unified Model for Mechanical Metamaterial Generation, Property Prediction, and Condition Confirmation
Wangzhi Zhan
Jianpeng Chen
Dongqi Fu
Dawei Zhou
AI4CE
26
0
0
05 Jun 2025
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
Hao Li
Qi Lv
Rui Shao
Xiang Deng
Yinchuan Li
Jianye Hao
Liqiang Nie
163
1
0
04 Jun 2025
InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba
Zizhao Wu
Yingying Sun
Yiming Chen
Xiaoling Gu
Ruyu Liu
Jiazhou Chen
Mamba
50
0
0
03 Jun 2025
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
Junliang Ye
Zhengyi Wang
Ruowen Zhao
Shenghao Xie
Jun Zhu
74
0
0
02 Jun 2025
Self-supervised Latent Space Optimization with Nebula Variational Coding
Yida Wang
D. Tan
Nassir Navab
Federico Tombari
DRL SSL
85
1
0
02 Jun 2025
Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models
Yiwen Jiang
Deval Mehta
Wei Feng
Zongyuan Ge
66
0
0
02 Jun 2025
Tomographic Foundation Model -- FORCE: Flow-Oriented Reconstruction Conditioning Engine
Wenjun Xia
Chuang Niu
Ge Wang
MedIm OOD
34
0
0
02 Jun 2025
Generative Next POI Recommendation with Semantic ID
D. Wang
Yuxi Huang
Shen Gao
Yifan Wang
Chengrui Huang
Shuo Shang
66
0
0
02 Jun 2025
Ultra-High-Resolution Image Synthesis: Data, Method and Evaluation
Jinjin Zhang
Qiuyu Huang
Junjie Liu
Xiefan Guo
Di Huang
63
0
0
02 Jun 2025
Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
Zijian Zhao
Dian Jin
Zijing Zhou
Xiaoyu Zhang
43
0
0
02 Jun 2025
Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review
Yuchen Fang
Hao Miao
Yuxuan Liang
Liwei Deng
Yue Cui
...
Yan Zhao
T. Pedersen
Christian S. Jensen
Xiaofang Zhou
Kai Zheng
AI4TS AI4CE
83
0
0
02 Jun 2025
FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens
Yiming Zhong
Yumeng Liu
Chuyang Xiao
Zemin Yang
Youzhuo Wang
Yufei Zhu
Ye-ling Shi
Yujing Sun
X. Zhu
Yuexin Ma
70
0
0
02 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
50
0
0
01 Jun 2025
Humanoid World Models: Open World Foundation Models for Humanoid Robotics
Muhammad Qasim Ali
Aditya Sridhar
Shahbuland Matiana
Alex Wong
Mohammad Al-Sharman
VGen VLM
56
0
0
01 Jun 2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
Youngmin Kim
Jiwan Chung
Jisoo Kim
Sunghyun Lee
Sangkyu Lee
Junhyeok Kim
Cheoljong Yang
Youngjae Yu
VGen
37
0
0
01 Jun 2025
Concept-Centric Token Interpretation for Vector-Quantized Generative Models
Tianze Yang
Yucheng Shi
Mengnan Du
Xuansheng Wu
Qiaoyu Tan
Jin Sun
Ninghao Liu
35
0
0
31 May 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song
Jiawei Chen
Xiaobin Zhuang
Chenpeng Du
Ziyang Ma
...
Dongya Jia
Zhuo Chen
Yuping Wang
Yuxuan Wang
Xie Chen
43
0
0
31 May 2025
Probabilistic Forecasting for Building Energy Systems using Time-Series Foundation Models
Young-Jin Park
François Germain
Jing Liu
Ye Wang
T. Koike-Akino
Gordon Wichern
Navid Azizan
C. Laughman
Ankush Chakrabarty
AI4TS AI4CE
42
0
0
31 May 2025
On Designing Diffusion Autoencoders for Efficient Generation and Representation Learning
Magdalena Proszewska
Nikolay Malkin
N. Siddharth
DiffM
43
0
0
30 May 2025
SignBot: Learning Human-to-Humanoid Sign Language Interaction
Guanren Qiao
Sixu Lin
Ronglai Zuo
Zhizheng Wu
Kui Jia
Guiliang Liu
SLR
64
0
0
30 May 2025
DLM-One: Diffusion Language Models for One-Step Sequence Generation
Tianqi Chen
Shujian Zhang
Mingyuan Zhou
42
0
0
30 May 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
Jiwan Chung
Janghan Yoon
J. S. Park
Sangeyl Lee
Joowon Yang
Sooyeon Park
Youngjae Yu
56
0
0
30 May 2025
Semantics-Aware Human Motion Generation from Audio Instructions
Zi-An Wang
Shihao Zou
Shiyao Yu
Mingyuan Zhang
Chao Dong
VGen
39
0
0
29 May 2025
Normalizing Flows are Capable Models for RL
Raj Ghugare
Benjamin Eysenbach
OffRL AI4CE
95
0
0
29 May 2025
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
Jihai Zhang
Tianle Li
Linjie Li
Zhengyuan Yang
Yu Cheng
82
1
0
29 May 2025
EAD: An EEG Adapter for Automated Classification
Pushapdeep Singh
Jyoti Nigam
Medicherla Vamsi Krishna
Arnav V. Bhavsar
A. Nigam
17
0
0
29 May 2025
Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization
Matteo Gallici
Haitz Sáez de Ocáriz Borde
51
0
0
29 May 2025
MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Yunkee Chae
Kyogu Lee
66
0
0
29 May 2025
MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation
Siyuan Wang
Jiawei Liu
Wei Wang
Yeying Jin
Jinsong Du
Zhi Han
SLR VGen
84
0
0
29 May 2025
Let's Predict Sentence by Sentence
Hyeonbin Hwang
Byeongguk Jeon
Seungone Kim
Jiyeon Kim
Hoyeon Chang
Sohee Yang
Seungpil Won
Dohaeng Lee
Youbin Ahn
Minjoon Seo
96
0
0
28 May 2025
PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models
Fan Fei
Jiajun Tang
Fei-Peng Tian
Boxin Shi
P. Tan
DiffM
52
0
0
28 May 2025
ACE-Step: A Step Towards Music Generation Foundation Model
Junmin Gong
Sean Zhao
Sen Wang
S. Xu
Joe Guo
44
2
0
28 May 2025
Improving Brain-to-Image Reconstruction via Fine-Grained Text Bridging
Runze Xia
Shuo Feng
Renzhi Wang
Congchi Yin
Xuyun Wen
Piji Li
DiffM
42
0
0
28 May 2025
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
Junqi Zhao
Jinzheng Zhao
Haohe Liu
Yun Chen
Lu Han
Xubo Liu
Mark D. Plumbley
Wenwu Wang
DiffM
48
0
0
28 May 2025
MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
Hongjia Liu
Rongzhen Zhao
Haohan Chen
Joni Pajarinen
OCL VLM
129
0
0
27 May 2025
Sci-Fi: Symmetric Constraint for Frame Inbetweening
Liuhan Chen
Xiaodong Cun
Xiaoyu Li
Xianyi He
Shenghai Yuan
Jie Chen
Ying Shan
Lichao Sun
VGen
84
0
0
27 May 2025
What Do Latent Action Models Actually Learn?
Chuheng Zhang
Tim Pearce
Pushi Zhang
Kaixin Wang
Xiaoyu Chen
Wei Shen
Li Zhao
Jiang Bian
19
0
0
27 May 2025
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Yang Zhang
Xinran Li
Jianing Ye
Delin Qu
Shuang Qiu
Chongjie Zhang
Xiu Li
Chenjia Bai
52
0
0
27 May 2025
LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation
Pascal Zwick
Nils Friederich
Maximilian Beichter
Lennart Hilbert
Ralf Mikut
Oliver Bringmann
MedIm
41
0
0
27 May 2025
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
Nam-Gyu Kim
Deok-Hyeon Cho
Seung-Bin Kim
Seong-Whan Lee
74
0
0
27 May 2025
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
Yi Wu
Lingting Zhu
Shengju Qian
Lei Liu
Wandi Qiao
Lequan Yu
Bin Li
77
0
0
26 May 2025
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
Wen Yin
Yong Wang
Guiduo Duan
Dongyang Zhang
Xin Hu
Yuan-Fang Li
Tao He
127
0
0
26 May 2025
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
Kunjun Li
Zigeng Chen
Cheng-Yen Yang
Jenq-Neng Hwang
95
0
0
26 May 2025
Discrete Markov Bridge
Hengli Li
Yuxuan Wang
Song-Chun Zhu
Ying Nian Wu
Zilong Zheng
DiffM
73
0
0
26 May 2025
Advancing Limited-Angle CT Reconstruction Through Diffusion-Based Sinogram Completion
Jiaqi Guo
S. Tapia
Aggelos K. Katsaggelos
DiffM MedIm
34
0
0
26 May 2025
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
65
0
0
26 May 2025
Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables
Futoshi Futami
Masahiro Fujisawa
DRL CML
97
0
0
26 May 2025
AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models
Muyao Niu
Mingdeng Cao
Yifan Zhan
Qingtian Zhu
Mingze Ma
Jiancheng Zhao
Yanhong Zeng
Zhihang Zhong
Xiao Sun
Yinqiang Zheng
DiffM VGen
66
0
0
26 May 2025