Neural Discrete Representation Learning

2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
    BDL
    SSL
    OCL
arXiv: 1711.00937

Papers citing "Neural Discrete Representation Learning"

50 / 2,785 papers shown
Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation
Bohong Chen
Yumeng Li
Yao-Xiang Ding
Tianjia Shao
Kun Zhou
49
7
0
01 Oct 2024
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details
Jian Yang
Xukun Wang
Wentao Wang
Guoming Li
Qihang Fang
Ruihong Yuan
Tianyang Wang
Jason Zhaoxin Fan
Yeying Jin
Zhaoxin Fan
VGen
52
1
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
17
0
01 Oct 2024
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng
Ruixi Qiao
Gang Xiong
Yingwei Ma
Binhua Li
Yongbin Li
Yisheng Lv
OffRL
OnRL
LM&Ro
50
3
0
01 Oct 2024
MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation
Wenchao Chen
Liqiang Niu
Ziyao Lu
Fandong Meng
Jie Zhou
Mamba
40
4
0
30 Sep 2024
Text-driven Human Motion Generation with Motion Masked Diffusion Model
Xingyu Chen
DiffM
VGen
45
2
0
29 Sep 2024
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
Yuhang Ma
Wenting Xu
Chaoyi Zhao
Keqiang Sun
Qinfeng Jin
Zeng Zhao
Changjie Fan
Zhipeng Hu
VGen
DiffM
37
1
0
29 Sep 2024
Multi-sensor Learning Enables Information Transfer across Different Sensory Data and Augments Multi-modality Imaging
Lingting Zhu
Yizheng Chen
Lianli Liu
Lei Xing
Lequan Yu
36
1
0
28 Sep 2024
Conditional Image Synthesis with Diffusion Models: A Survey
Zheyuan Zhan
Defang Chen
Jian-Ping Mei
Zhenghe Zhao
Jiawei Chen
Chun Chen
Siwei Lyu
Can Wang
VLM
53
5
0
28 Sep 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
59
1
0
28 Sep 2024
Diverse Code Query Learning for Speech-Driven Facial Animation
Chunzhi Gu
Shigeru Kuriyama
Katsuya Hotta
DiffM
33
0
0
27 Sep 2024
Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation
Kun Wu
Yichen Zhu
Jinming Li
Junjie Wen
Ning Liu
Zhiyuan Xu
Qinru Qiu
48
4
0
27 Sep 2024
EgoLM: Multi-Modal Language Model of Egocentric Motions
Fangzhou Hong
Vladimir Guzov
Hyo Jin Kim
Yuting Ye
Richard Newcombe
Ziwei Liu
Lingni Ma
45
4
0
26 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
60
11
0
26 Sep 2024
MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling
Weihao Yuan
Weichao Shen
Yisheng He
Yuan Dong
Xiaodong Gu
Zilong Dong
Liefeng Bo
Qixing Huang
MQ
36
2
0
26 Sep 2024
Learning Quantized Adaptive Conditions for Diffusion Models
Yuchen Liang
Yuchuan Tian
Lei Yu
Huao Tang
Jie Hu
Xiangzhong Fang
Hanting Chen
DiffM
39
0
0
26 Sep 2024
FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
N. Pia
Martin Strauss
M. Multrus
B. Edler
44
0
0
26 Sep 2024
Exploring Semantic Clustering in Deep Reinforcement Learning for Video Games
Liang Zhang
Justin Lieffers
A. Pyarelal
34
0
0
25 Sep 2024
ChatCam: Empowering Camera Control through Conversational AI
Xinhang Liu
Yu-Wing Tai
Chi-Keung Tang
VGen
35
2
0
25 Sep 2024
Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models
Deepak Sridhar
Nuno Vasconcelos
DiffM
36
0
0
25 Sep 2024
MonoFormer: One Transformer for Both Diffusion and Autoregression
Chuyang Zhao
Yuxing Song
Wenhao Wang
Haocheng Feng
Errui Ding
Yifan Sun
Xinyan Xiao
Jingdong Wang
DiffM
39
18
0
24 Sep 2024
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber
Lijun Yu
Qihang Yu
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
DiffM
51
31
0
24 Sep 2024
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization
Sotheara Leang
Anderson Augusma
E. Castelli
Frédérique Letué
Sethserey Sam
Dominique Vaufreydaz
34
0
0
24 Sep 2024
ManiNeg: Manifestation-guided Multimodal Pretraining for Mammography Classification
Xujun Li
Xin Wei
Jing Jiang
Danxiang Chen
Wei Zhang
Jinpeng Li
38
0
0
24 Sep 2024
CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation
Naman Kumar
Antareep Singha
Laksh Nanwani
Dhruv Potdar
Tarun R
Fatemeh Rastgar
Simon Idoko
Arun Kumar Singh
K. Madhava Krishna
178
0
0
24 Sep 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM
VLM
57
5
0
24 Sep 2024
Skills Made to Order: Efficient Acquisition of Robot Cooking Skills Guided by Multiple Forms of Internet Data
Mrinal Verghese
C. Atkeson
39
0
0
23 Sep 2024
DepthART: Monocular Depth Estimation as Autoregressive Refinement Task
Bulat Gabdullin
Nina Konovalova
Nikolay Patakin
Dmitry Senushkin
Anton Konushin
MDE
40
0
0
23 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
66
11
0
23 Sep 2024
Disentanglement with Factor Quantized Variational Autoencoders
Gulcin Baykal
M. Kandemir
Gözde B. Ünal
CoGe
DRL
44
0
0
23 Sep 2024
EQ-CBM: A Probabilistic Concept Bottleneck with Energy-based Models and Quantized Vectors
Sangwon Kim
Dasom Ahn
B. Ko
In-su Jang
Kwang-Ju Kim
35
4
0
22 Sep 2024
Low-Light Enhancement Effect on Classification and Detection: An Empirical Study
Xu Wu
Zhihui Lai
Zhou Jie
Can Gao
Xianxu Hou
Ya-Nan Zhang
Linlin Shen
25
0
0
22 Sep 2024
Pomo3D: 3D-Aware Portrait Accessorizing and More
Tzu-Chieh Liu
Chih-Ting Liu
Shao-Yi Chien
38
0
0
22 Sep 2024
GroupDiff: Diffusion-based Group Portrait Editing
Yuming Jiang
Nanxuan Zhao
Qing Liu
Krishna Kumar Singh
Shuai Yang
Chen Change Loy
Ziwei Liu
DiffM
41
1
0
22 Sep 2024
R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models
Viet Dung Nguyen
Zhizhuo Yang
Christopher L. Buckley
Alexander Ororbia
41
2
0
21 Sep 2024
PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture
Zhuojun Li
Chun Yu
Chen Liang
Yuanchun Shi
3DH
36
1
0
21 Sep 2024
Towards the Discovery of Down Syndrome Brain Biomarkers Using Generative Models
Jordi Malé
Juan Fortea
Mateus Rozalem Aranha
Yann Heuzé
Neus Martínez-Abadías
Xavier Sevillano
DiffM
34
1
0
20 Sep 2024
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
Lauri Juvela
Xin Eric Wang
39
3
0
20 Sep 2024
T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data
Mingdian Liu
Y. Liu
Gurunandan Krishnan
Karl S Bayer
Bing Zhou
VGen
49
0
0
20 Sep 2024
Using High-Level Patterns to Estimate How Humans Predict a Robot will Behave
Sagar Parekh
Lauren Bramblett
Nicola Bezzo
Dylan P. Losey
37
0
0
20 Sep 2024
BGDB: Bernoulli-Gaussian Decision Block with Improved Denoising Diffusion Probabilistic Models
Chengkun Sun
Jinqian Pan
Russell Stevens Terry
Jiang Bian
Jie Xu
DiffM
28
0
0
19 Sep 2024
DNI: Dilutional Noise Initialization for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Ji Woo Hong
Chang D. Yoo
DiffM
52
2
0
19 Sep 2024
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization
Zhikang Niu
Sanyuan Chen
Long Zhou
Ziyang Ma
Xie Chen
Shujie Liu
29
2
0
19 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
43
7
0
19 Sep 2024
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
Jiaxin Wen
Jian Guan
Hongning Wang
Wei Wu
Minlie Huang
ReLM
OffRL
LRM
33
7
0
19 Sep 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
44
2
0
18 Sep 2024
NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis
Romeo Lanzino
Federico Fontana
Luigi Cinque
Francesco Scarcello
Atsuto Maki
MedIm
34
3
0
18 Sep 2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
Haohan Guo
Fenglong Xie
Dongchao Yang
Xixin Wu
Helen Meng
43
2
0
18 Sep 2024
3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy
Xuanmeng Sha
Liyun Zhang
Tomohiro Mashita
Yuki Uranishi
VGen
27
0
0
17 Sep 2024