ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Neural Discrete Representation Learning
arXiv: 1711.00937

2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
    BDL
    SSL
    OCL

Papers citing "Neural Discrete Representation Learning"

50 / 2,785 papers shown
BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation
Seyed Rohollah Hosseyni
Ali Ahmad Rahmani
S. J. Seyedmohammadi
Sanaz Seyedin
Arash Mohammadi
DiffM
53
7
0
17 Sep 2024
Learning Source Disentanglement in Neural Audio Codec
Xiaoyu Bie
Xubo Liu
Gaël Richard
34
1
0
17 Sep 2024
RenderWorld: World Model with Self-Supervised 3D Label
Ziyang Yan
Wenzhen Dong
Yihua Shao
Yuhang Lu
Liu Haiyang
...
Haozhe Wang
Zhe Wang
Yan Wang
Fabio Remondino
Yuexin Ma
3DV
VGen
72
13
0
17 Sep 2024
A Missing Data Imputation GAN for Character Sprite Generation
Flávio R. S. Coutinho
Luiz Chaimowicz
GAN
39
0
0
16 Sep 2024
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu
Lilang Lin
Jiahang Zhang
Yi Ma
Jiaying Liu
DiffM
59
0
0
16 Sep 2024
LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling
Xin Li
Anand Sarwate
37
0
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
31
5
0
16 Sep 2024
2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction
Atsuya Nakata
Takao Yamanaka
32
2
0
16 Sep 2024
GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion
Vitor Campagnolo Guizilini
P. Tokmakov
Achal Dave
Rares Andrei Ambrus
DiffM
38
2
0
15 Sep 2024
Prevailing Research Areas for Music AI in the Era of Foundation Models
Megan Wei
M. Modrzejewski
Aswin Sivaraman
Dorien Herremans
MedIm
48
1
0
14 Sep 2024
Visuo-Tactile Zero-Shot Object Recognition with Vision-Language Model
Shiori Ueda
Atsushi Hashimoto
Masashi Hamaya
Kazutoshi Tanaka
Hideo Saito
46
1
0
14 Sep 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wenyuan Xu
35
14
0
14 Sep 2024
Detect Fake with Fake: Leveraging Synthetic Data-driven Representation for Synthetic Image Detection
Hina Otake
Yoshihiro Fukuhara
Yoshiki Kubotani
Shigeo Morishima
ViT
66
0
0
13 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
36
0
0
13 Sep 2024
Anytime Continual Learning for Open Vocabulary Classification
Zhen Zhu
Yiming Gong
Derek Hoiem
VLM
47
1
0
13 Sep 2024
Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation
Samanta Rodriguez
Yiming Dou
Miquel Oller
Andrew Owens
Nima Fazeli
DiffM
52
7
0
12 Sep 2024
MagicStyle: Portrait Stylization Based on Reference Image
Zhaoli Deng
Kaibin Zhou
Fanyi Wang
Zhenpeng Mi
DiffM
54
1
0
12 Sep 2024
Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
Junsung Lee
Minsoo Kang
Bohyung Han
DiffM
VLM
31
3
0
12 Sep 2024
ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE
Sichun Wu
Kazi Injamamul Haque
Zerrin Yumak
VGen
35
2
0
12 Sep 2024
Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records
Daeun Kyung
J. Kim
Tackeun Kim
Edward Choi
MedIm
DiffM
47
1
0
11 Sep 2024
Learning Generative Interactive Environments By Trained Agent Exploration
Naser Kazemi
N. Savov
Danda Paudel
Luc Van Gool
45
2
0
10 Sep 2024
G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer
Jinzhi Zhang
Feng Xiong
Mu Xu
41
6
0
10 Sep 2024
Multi-Source Music Generation with Latent Diffusion
Zhongweiyang Xu
Debottam Dutta
Yu-Lin Wei
Romit Roy Choudhury
DiffM
45
1
0
10 Sep 2024
Latent 3D Brain MRI Counterfactual
Wei Peng
Tian Xia
Fabio De Sousa Ribeiro
Tomas Bosschieter
Ehsan Adeli
Qingyu Zhao
Ben Glocker
K. Pohl
CML
MedIm
60
1
0
09 Sep 2024
On the Convergence Analysis of Over-Parameterized Variational Autoencoders: A Neural Tangent Kernel Perspective
Li Wang
Wei Huang
DRL
31
0
0
09 Sep 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng
Zixu Zhao
Tong He
Tianjun Xiao
Yicong Zhou
Zheng Zhang
DiffM
60
0
0
07 Sep 2024
Synergy and Synchrony in Couple Dances
V. Maluleke
Lea Müller
Jathushan Rajasegaran
Georgios Pavlakos
Shiry Ginosar
Angjoo Kanazawa
Jitendra Malik
42
2
0
06 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
43
3
0
06 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
42
3
0
05 Sep 2024
Organized Grouped Discrete Representation for Object-Centric Learning
Rongzhen Zhao
V. Wang
Arno Solin
Joni Pajarinen
VOS
OCL
54
1
0
05 Sep 2024
OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving
Julong Wei
Shanshuai Yuan
Pengfei Li
Qingda Hu
Zhongxue Gan
Wenchao Ding
VLM
39
17
0
05 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
45
22
0
05 Sep 2024
Dynamic Motion Synthesis: Masked Audio-Text Conditioned Spatio-Temporal Transformers
Sohan Anisetty
James Hays
51
0
0
03 Sep 2024
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Liuhan Chen
Zongjian Li
Bin Lin
Bin Zhu
Qian Wang
Shenghai Yuan
X. Zhou
Xinhua Cheng
Li Yuan
DiffM
96
14
0
02 Sep 2024
Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
Jinglin Liang
Jin Zhong
Hanlin Gu
Zhongqi Lu
Xingxing Tang
Gang Dai
Shuangping Huang
Lixin Fan
Qiang Yang
DiffM
52
7
0
02 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Zhizheng Wu
45
43
0
01 Sep 2024
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni
Yulin Wang
Renping Zhou
Rui Lu
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Yuan Yao
Gao Huang
50
7
0
31 Aug 2024
Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis
Chiu-Chou Lin
Yu-Wei Shih
Kuei-Ting Kuo
Yu-Cheng Chen
Chien-Hua Chen
Wei-Chen Chiu
I-Chen Wu
32
0
0
30 Aug 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye
Peiwen Sun
Jiahe Lei
Hongzhan Lin
Xu Tan
...
Jianyi Chen
Jiahao Pan
Qifeng Liu
Yike Guo
Wei Xue
AuLLM
34
14
0
30 Aug 2024
One-Shot Learning Meets Depth Diffusion in Multi-Object Videos
Anisha Jain
VGen
DiffM
MDE
29
1
0
29 Aug 2024
Blending Low and High-Level Semantics of Time Series for Better Masked Time Series Generation
Johan Vik Mathisen
Erlend Lokna
Daesoo Lee
Erlend Aune
BDL
AI4TS
29
0
0
29 Aug 2024
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
Phillip Si
Peng Chen
35
1
0
29 Aug 2024
A Simple and Generalist Approach for Panoptic Segmentation
Nedyalko Prisadnikov
Wouter Van Gansbeke
Danda Pani Paudel
Luc Van Gool
VLM
53
0
0
29 Aug 2024
BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding
Jinzhao Zhou
Yiqun Duan
Fred Chang
T. Do
Yu-Kai Wang
Chin-Teng Lin
30
2
0
28 Aug 2024
Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
Fabio Quattrini
Vittorio Pippi
Silvia Cascianelli
Rita Cucchiara
48
3
0
28 Aug 2024
DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays
Yiran Sun
Hana Baroudi
Tucker Netherton
Laurence Court
Osama Mawlawi
Ashok Veeraraghavan
Guha Balakrishnan
DiffM
MedIm
44
3
0
27 Aug 2024
Alfie: Democratising RGBA Image Generation With No $$$
Fabio Quattrini
Vittorio Pippi
Silvia Cascianelli
Rita Cucchiara
DiffM
51
5
0
27 Aug 2024
NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals
Wei-Bang Jiang
Yansen Wang
Bao-Liang Lu
Dongsheng Li
50
11
0
27 Aug 2024
TVG: A Training-free Transition Video Generation Method with Diffusion Models
Rui Zhang
Yaosen Chen
Yuegen Liu
Wei Wang
Xuming Wen
Hongxia Wang
DiffM
47
2
0
24 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
35
1
0
23 Aug 2024