Neural Discrete Representation Learning

2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
    BDL
    SSL
    OCL

Papers citing "Neural Discrete Representation Learning"

50 / 2,773 papers shown
Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models
Xiao Liu
Lijun Zhang
Deepak Ganesan
Hui Guan
VLM
35
0
0
08 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Mengdi Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
53
9
0
08 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
28
2
0
07 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
47
4
0
07 Nov 2024
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
Jeongsoo Park
Andrew Owens
47
3
0
06 Nov 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data
Seunggeun Chi
Pin-Hao Huang
Enna Sachdeva
Hengbo Ma
Karthik Ramani
Kwonjoon Lee
DiffM
50
2
0
05 Nov 2024
VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding
Chenyu Yang
Davide Liconti
Robert K. Katzschmann
47
1
0
05 Nov 2024
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Hao Luo
Bohan Zhou
Zongqing Lu
35
1
0
05 Nov 2024
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
Ao Fu
Yi Zhou
Tao Zhou
Yuqing Yang
Bojun Gao
Qun Li
Guobin Wu
Ling Shao
VGen
59
2
0
05 Nov 2024
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
Zhanwei Zhang
Shizhao Sun
Wenxiao Wang
D. Cai
Jiang Bian
AI4CE
41
1
0
05 Nov 2024
Grouped Discrete Representation for Object-Centric Learning
Rongzhen Zhao
V. Wang
Arno Solin
Joni Pajarinen
BDL
OCL
34
1
0
04 Nov 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
B. Li
Yifei Xin
Linli Xu
46
10
0
04 Nov 2024
Understanding Variational Autoencoders with Intrinsic Dimension and Information Imbalance
Charles Camboulin
Diego Doimo
Aldo Glielmo
DRL
72
0
0
04 Nov 2024
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Cheng Tan
Zhenxiao Cao
Zhangyang Gao
Lirong Wu
Siyuan Li
Yufei Huang
Jun Xia
Bozhen Hu
Stan Z. Li
53
0
0
04 Nov 2024
IRS-Enhanced Secure Semantic Communication Networks: Cross-Layer and Context-Awared Resource Allocation
Lingyi Wang
Wei Wu
Fuhui Zhou
Zhijin Qin
Qihui Wu
37
2
0
04 Nov 2024
Bootstrapping Top-down Information for Self-modulating Slot Attention
Dongwon Kim
Seoyeon Kim
Suha Kwak
OCL
ObjD
42
0
0
04 Nov 2024
Transferable Sequential Recommendation via Vector Quantized Meta Learning
Zhenrui Yue
Huimin Zeng
Yang Zhang
Julian McAuley
Dong Wang
DRL
MQ
21
0
0
04 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
46
4
0
04 Nov 2024
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization
Yiwei Zhang
Jin Gao
Fudong Ge
Guan Luo
Bing Li
Z. Zhang
Haibin Ling
Weiming Hu
57
0
0
03 Nov 2024
HC$^3$L-Diff: Hybrid conditional latent diffusion with high frequency enhancement for CBCT-to-CT synthesis
Shi Yin
Hongqi Tan
Li Ming Chong
Haofeng Liu
Hui Liu
Kang Hao Lee
Jeffrey Kit Loong Tuan
Dean Ho
Yueming Jin
DiffM
MedIm
32
0
0
03 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Yalin Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
43
11
0
02 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
56
0
0
02 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen
DiffM
59
31
1
01 Nov 2024
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Nikolaos Flemotomos
Roger Hsiao
P. Swietojanski
Takaaki Hori
Dogan Can
Xiaodan Zhuang
51
0
0
01 Nov 2024
Constant Acceleration Flow
Dogyun Park
Sojin Lee
S. Kim
Taehoon Lee
Youngjoon Hong
Hyunwoo J. Kim
65
2
0
01 Nov 2024
LLM-Ref: Enhancing Reference Handling in Technical Writing with Large Language Models
Kazi Ahmed Asif Fuad
Lizhong Chen
26
0
0
01 Nov 2024
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
Penghui Ruan
Pichao Wang
Divya Saxena
Jiannong Cao
Yuhui Shi
DiffM
VGen
41
65
0
31 Oct 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
52
12
0
31 Oct 2024
Identifying Spatio-Temporal Drivers of Extreme Events
Mohamad Hakam Shams Eddin
Juergen Gall
AI4TS
55
0
0
31 Oct 2024
Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model
Wenjia Xie
Hao Wang
Lefei Zhang
Rui Zhou
Defu Lian
Enhong Chen
DiffM
49
3
0
31 Oct 2024
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts
Xiang Deng
Youxin Pang
Xiaochen Zhao
Chao Xu
Lizhen Wang
Hongjiang Xiao
Shi Yan
Hongwen Zhang
Yebin Liu
DiffM
VGen
48
1
0
31 Oct 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
74
9
0
31 Oct 2024
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
Sunjae Yoon
Gwanhyeong Koo
Younghwan Lee
Chang D. Yoo
VGen
80
3
0
31 Oct 2024
Emotion-Guided Image to Music Generation
Souraja Kundu
Saket Singh
Yuji Iwahori
28
3
0
29 Oct 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
51
3
0
29 Oct 2024
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding
Yuan Wang
Di Huang
Yaqi Zhang
Wanli Ouyang
J. Jiao
Xuetao Feng
Yan Zhou
Pengfei Wan
Shixiang Tang
Dan Xu
VGen
36
13
0
29 Oct 2024
Diffusion-nested Auto-Regressive Synthesis of Heterogeneous Tabular Data
Hengrui Zhang
Liancheng Fang
Qitian Wu
Philip S. Yu
DiffM
LMTD
41
1
0
28 Oct 2024
Constrained Transformer-Based Porous Media Generation to Spatial Distribution of Rock Properties
Zihan Ren
Sanjay Srinivasan
Dustin Crandall
24
0
0
28 Oct 2024
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Hanyu Wang
Saksham Suri
Yixuan Ren
Hao Chen
Abhinav Shrivastava
VGen
31
10
0
28 Oct 2024
CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
Jiewen Yang
Yiqun Lin
Bin Pu
Jiarong Guo
Xiaowei Xu
Xuelong Li
35
3
0
28 Oct 2024
MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language
Yoel Shoshan
Moshiko Raboh
Michal Ozery-Flato
Vadim Ratner
Alex Golts
...
Sharon Kurant
Joseph A. Morrone
Parthasarathy Suryanarayanan
Michal Rosen-Zvi
Efrat Hexter
39
1
0
28 Oct 2024
Vector Quantization Prompting for Continual Learning
L. Jiao
Qiuxia Lai
Yu LI
Qiang Xu
VLM
CLL
41
3
0
27 Oct 2024
Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns
Ronghui Li
Hongwen Zhang
Yachao Zhang
Yuxiang Zhang
Youliang Zhang
Jie Guo
Yan Zhang
Xiu Li
Yebin Liu
39
7
0
27 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
48
0
0
26 Oct 2024
Your Image is Secretly the Last Frame of a Pseudo Video
Wenlong Chen
Wenlin Chen
Lapo Rastrelli
Yingzhen Li
DiffM
VGen
39
0
0
26 Oct 2024
Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
Opeyemi Osakuade
Simon King
34
0
0
25 Oct 2024
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
Z. Gong
Guangyin Bao
Qi Zhang
Zhongwei Wan
Duoqian Miao
...
Changwei Wang
Rongtao Xu
Liang Hu
Ke Liu
Yu Zhang
DiffM
VGen
53
9
0
25 Oct 2024
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
Junyi Chen
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
LRM
41
2
0
24 Oct 2024
Learning Global Object-Centric Representations via Disentangled Slot Attention
Tonglin Chen
Yinxuan Huang
Zhimeng Shen
Jinghao Huang
Bin Li
Xiangyang Xue
OCL
41
1
0
24 Oct 2024
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
Jinghao Hu
Yuhe Zhang
Guohua Geng
Liuyuxin Yang
JiaRui Yan
Jingtao Cheng
YaDong Zhang
Kang Li
DiffM
43
0
0
24 Oct 2024