1711.00937
Cited By
Neural Discrete Representation Learning
2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
Papers citing
"Neural Discrete Representation Learning"
50 / 2,773 papers shown
Title
Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models
Xiao Liu
Lijun Zhang
Deepak Ganesan
Hui Guan
VLM
35
0
0
08 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Mengdi Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
53
9
0
08 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
28
2
0
07 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
47
4
0
07 Nov 2024
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
Jeongsoo Park
Andrew Owens
47
3
0
06 Nov 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data
Seunggeun Chi
Pin-Hao Huang
Enna Sachdeva
Hengbo Ma
Karthik Ramani
Kwonjoon Lee
DiffM
50
2
0
05 Nov 2024
VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding
Chenyu Yang
Davide Liconti
Robert K. Katzschmann
47
1
0
05 Nov 2024
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Hao Luo
Bohan Zhou
Zongqing Lu
35
1
0
05 Nov 2024
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
Ao Fu
Yi Zhou
Tao Zhou
Yuqing Yang
Bojun Gao
Qun Li
Guobin Wu
Ling Shao
VGen
59
2
0
05 Nov 2024
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
Zhanwei Zhang
Shizhao Sun
Wenxiao Wang
D. Cai
Jiang Bian
AI4CE
41
1
0
05 Nov 2024
Grouped Discrete Representation for Object-Centric Learning
Rongzhen Zhao
V. Wang
Arno Solin
Joni Pajarinen
BDL
OCL
34
1
0
04 Nov 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
B. Li
Yifei Xin
Linli Xu
46
10
0
04 Nov 2024
Understanding Variational Autoencoders with Intrinsic Dimension and Information Imbalance
Charles Camboulin
Diego Doimo
Aldo Glielmo
DRL
72
0
0
04 Nov 2024
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Cheng Tan
Zhenxiao Cao
Zhangyang Gao
Lirong Wu
Siyuan Li
Yufei Huang
Jun Xia
Bozhen Hu
Stan Z. Li
53
0
0
04 Nov 2024
IRS-Enhanced Secure Semantic Communication Networks: Cross-Layer and Context-Awared Resource Allocation
Lingyi Wang
Wei Wu
Fuhui Zhou
Zhijin Qin
Qihui Wu
37
2
0
04 Nov 2024
Bootstrapping Top-down Information for Self-modulating Slot Attention
Dongwon Kim
Seoyeon Kim
Suha Kwak
OCL
ObjD
42
0
0
04 Nov 2024
Transferable Sequential Recommendation via Vector Quantized Meta Learning
Zhenrui Yue
Huimin Zeng
Yang Zhang
Julian McAuley
Dong Wang
DRL
MQ
21
0
0
04 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
46
4
0
04 Nov 2024
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization
Yiwei Zhang
Jin Gao
Fudong Ge
Guan Luo
Bing Li
Z. Zhang
Haibin Ling
Weiming Hu
57
0
0
03 Nov 2024
HC³L-Diff: Hybrid conditional latent diffusion with high frequency enhancement for CBCT-to-CT synthesis
Shi Yin
Hongqi Tan
Li Ming Chong
Haofeng Liu
Hui Liu
Kang Hao Lee
Jeffrey Kit Loong Tuan
Dean Ho
Yueming Jin
DiffM
MedIm
32
0
0
03 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Yalin Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
43
11
0
02 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
56
0
0
02 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen
DiffM
59
31
1
01 Nov 2024
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Nikolaos Flemotomos
Roger Hsiao
P. Swietojanski
Takaaki Hori
Dogan Can
Xiaodan Zhuang
51
0
0
01 Nov 2024
Constant Acceleration Flow
Dogyun Park
Sojin Lee
S. Kim
Taehoon Lee
Youngjoon Hong
Hyunwoo J. Kim
65
2
0
01 Nov 2024
LLM-Ref: Enhancing Reference Handling in Technical Writing with Large Language Models
Kazi Ahmed Asif Fuad
Lizhong Chen
26
0
0
01 Nov 2024
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
Penghui Ruan
Pichao Wang
Divya Saxena
Jiannong Cao
Yuhui Shi
DiffM
VGen
41
65
0
31 Oct 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
52
12
0
31 Oct 2024
Identifying Spatio-Temporal Drivers of Extreme Events
Mohamad Hakam Shams Eddin
Juergen Gall
AI4TS
55
0
0
31 Oct 2024
Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model
Wenjia Xie
Hao Wang
Lefei Zhang
Rui Zhou
Defu Lian
Enhong Chen
DiffM
49
3
0
31 Oct 2024
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts
Xiang Deng
Youxin Pang
Xiaochen Zhao
Chao Xu
Lizhen Wang
Hongjiang Xiao
Shi Yan
Hongwen Zhang
Yebin Liu
DiffM
VGen
48
1
0
31 Oct 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
74
9
0
31 Oct 2024
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
Sunjae Yoon
Gwanhyeong Koo
Younghwan Lee
Chang D. Yoo
VGen
80
3
0
31 Oct 2024
Emotion-Guided Image to Music Generation
Souraja Kundu
Saket Singh
Yuji Iwahori
28
3
0
29 Oct 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
51
3
0
29 Oct 2024
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding
Yuan Wang
Di Huang
Yaqi Zhang
Wanli Ouyang
J. Jiao
Xuetao Feng
Yan Zhou
Pengfei Wan
Shixiang Tang
Dan Xu
VGen
36
13
0
29 Oct 2024
Diffusion-nested Auto-Regressive Synthesis of Heterogeneous Tabular Data
Hengrui Zhang
Liancheng Fang
Qitian Wu
Philip S. Yu
DiffM
LMTD
41
1
0
28 Oct 2024
Constrained Transformer-Based Porous Media Generation to Spatial Distribution of Rock Properties
Zihan Ren
Sanjay Srinivasan
Dustin Crandall
24
0
0
28 Oct 2024
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Hanyu Wang
Saksham Suri
Yixuan Ren
Hao Chen
Abhinav Shrivastava
VGen
31
10
0
28 Oct 2024
CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
Jiewen Yang
Yiqun Lin
Bin Pu
Jiarong Guo
Xiaowei Xu
Xuelong Li
35
3
0
28 Oct 2024
MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language
Yoel Shoshan
Moshiko Raboh
Michal Ozery-Flato
Vadim Ratner
Alex Golts
...
Sharon Kurant
Joseph A. Morrone
Parthasarathy Suryanarayanan
Michal Rosen-Zvi
Efrat Hexter
39
1
0
28 Oct 2024
Vector Quantization Prompting for Continual Learning
L. Jiao
Qiuxia Lai
Yu LI
Qiang Xu
VLM
CLL
41
3
0
27 Oct 2024
Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns
Ronghui Li
Hongwen Zhang
Yachao Zhang
Yuxiang Zhang
Youliang Zhang
Jie Guo
Yan Zhang
Xiu Li
Yebin Liu
39
7
0
27 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
48
0
0
26 Oct 2024
Your Image is Secretly the Last Frame of a Pseudo Video
Wenlong Chen
Wenlin Chen
Lapo Rastrelli
Yingzhen Li
DiffM
VGen
39
0
0
26 Oct 2024
Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
Opeyemi Osakuade
Simon King
34
0
0
25 Oct 2024
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
Z. Gong
Guangyin Bao
Qi Zhang
Zhongwei Wan
Duoqian Miao
...
Changwei Wang
Rongtao Xu
Liang Hu
Ke Liu
Yu Zhang
DiffM
VGen
53
9
0
25 Oct 2024
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
Junyi Chen
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
LRM
41
2
0
24 Oct 2024
Learning Global Object-Centric Representations via Disentangled Slot Attention
Tonglin Chen
Yinxuan Huang
Zhimeng Shen
Jinghao Huang
Bin Li
Xiangyang Xue
OCL
41
1
0
24 Oct 2024
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
Jinghao Hu
Yuhe Zhang
Guohua Geng
Liuyuxin Yang
JiaRui Yan
Jingtao Cheng
YaDong Zhang
Kang Li
DiffM
43
0
0
24 Oct 2024