1711.00937
Cited By
v1
v2 (latest)
Neural Discrete Representation Learning
2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
Papers citing
"Neural Discrete Representation Learning"
50 / 3,267 papers shown
Title
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov
Di Chang
Minh Tran
Hongkun Gong
Ashutosh Chaubey
Mohammad Soleymani
DiffM
VGen
120
0
0
05 Apr 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Zhiyuan Yan
Junyan Ye
Weijia Li
Zilong Huang
Shenghai Yuan
Xiangyang He
Kaiqing Lin
Jun-Jian He
Conghui He
Lichao Sun
MLLM
EGVM
193
24
0
03 Apr 2025
Moment Quantization for Video Temporal Grounding
Xiaolong Sun
Le Wang
Sanping Zhou
Liushuai Shi
Kun Xia
Mengnan Liu
Yabing Wang
Gang Hua
MQ
75
0
0
03 Apr 2025
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung
Sangeek Hyun
Hyunjun Kim
Eunseo Koh
MinKyu Lee
Jae-Pil Heo
79
0
0
03 Apr 2025
Towards Assessing Deep Learning Test Input Generators
Seif Mzoughi
Ahmed Hajyahmed
Mohamed Elshafei
Foutse Khomh
Diego Elias Costa
AAML
94
0
0
03 Apr 2025
Explainable and Interpretable Forecasts on Non-Smooth Multivariate Time Series for Responsible Gameplay
Hussain Jagirdar
Rukma Talwadker
Aditya Pareek
Pulkit Agrawal
Tridib Mukherjee
AI4TS
200
2
0
03 Apr 2025
Instruction-Guided Autoregressive Neural Network Parameter Generation
Soro Bedionita
Bruno Andreis
Song Chong
Sung Ju Hwang
DiffM
100
0
0
02 Apr 2025
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz
Hendra Setiawan
Stephan Peitz
Yova Kementchedjhieva
103
1
0
02 Apr 2025
Foreground Focus: Enhancing Coherence and Fidelity in Camouflaged Image Generation
Pei-Chi Chen
Yi Yao
Chan-Feng Hsu
Hongxia Xie
Hung-Jen Chen
Hong-Han Shuai
Wen-Huang Cheng
DiffM
77
0
0
02 Apr 2025
Continual Cross-Modal Generalization
Yan Xia
Hai Huang
Minghui Fang
Zhou Zhao
CLL
110
0
0
01 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen
VGen
300
1
0
01 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
Lefei Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
118
2
0
01 Apr 2025
Personalized Federated Training of Diffusion Models with Privacy Guarantees
Kumar Kshitij Patel
Weitong Zhang
Lingxiao Wang
MedIm
105
0
0
01 Apr 2025
A Theory of Machine Understanding via the Minimum Description Length Principle
Canlin Zhang
Xiuwen Liu
137
0
0
01 Apr 2025
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan
Hong-Xing Yu
Sirui Chen
L. Fei-Fei
Jiajun Wu
VGen
150
8
0
01 Apr 2025
Learned Image Compression with Dictionary-based Entropy Model
Jingbo Lu
Leheng Zhang
Xingyu Zhou
Mu Li
Wen Li
Shuhang Gu
118
1
0
01 Apr 2025
Style Quantization for Data-Efficient GAN Training
Jian Wang
Xin Lan
Jizhe Zhou
Yuxin Tian
Jiancheng Lv
99
0
0
31 Mar 2025
Biologically Inspired Spiking Diffusion Model with Adaptive Lateral Selection Mechanism
Linghao Feng
Dongcheng Zhao
Sicheng Shen
Yi Zeng
129
0
0
31 Mar 2025
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Yufei Wang
Lanqing Guo
Zhihao Li
Jiaxing Huang
Pichao Wang
Bihan Wen
Jingchao Wang
DiffM
123
1
0
31 Mar 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
Jiadong Wang
Tao Dai
Shu-Tao Xia
Luca Benini
172
5
0
30 Mar 2025
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng
Han Li
Wenrui Dai
Ziyang Zheng
Chenglin Li
Junni Zou
Hongkai Xiong
3DH
94
1
0
30 Mar 2025
MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
Xianglong He
Junyi Chen
Di Huang
Zexiang Liu
Xiaoshui Huang
Wanli Ouyang
C. Yuan
Yangguang Li
DiffM
88
3
0
29 Mar 2025
Tokenization of Gaze Data
Tim Rolff
Jurik Karimian
Niklas Hypki
S. Schmidt
Markus Lappe
Frank Steinicke
126
0
0
28 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
76
0
0
28 Mar 2025
ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation
Giulio Federico
Giuseppe Amato
F. Carrara
Claudio Gennaro
Marco Di Benedetto
61
0
0
28 Mar 2025
Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance
Christian Steinhauser
Philipp Reis
Hubert Padusinski
Jacob Langner
Eric Sax
64
0
0
28 Mar 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt
Harleen Hanspal
Guoxuan Xia
Petru-Daniel Tudosiu
Alexander Black
Yongxin Yang
Jingyu Sun
Sarah Parisot
MoE
105
0
0
28 Mar 2025
SocialGen: Modeling Multi-Human Social Interaction with Language Models
Heng Yu
Juze Zhang
Changan Chen
Tiange Xiang
Yusu Fang
Juan Carlos Niebles
Ehsan Adeli
VGen
93
1
0
28 Mar 2025
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
Hongkai Lin
Dingkang Liang
Zhenghao Qi
X. Bai
DiffM
84
0
0
27 Mar 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Jike Zhong
Qilong Wu
Xinyue Li
Bo Zhang
Ming Li
...
Haoyang Li
Yu Qiao
Peng Gao
Bin Fu
Zhen Li
EGVM
89
1
0
27 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Yize Zhang
Jingwen He
Wei-Shi Zheng
Yu Qiao
Ziwei Liu
EGVM
VGen
132
14
0
27 Mar 2025
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He
Zi-Xin Zou
Chia-Hao Chen
Yu Guo
Ding Liang
Chun Yuan
Wanli Ouyang
Yan-Pei Cao
Yangguang Li
130
5
0
27 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
141
1
0
27 Mar 2025
Controlling Large Language Model with Latent Actions
Chengxing Jia
Ziniu Li
Pengyuan Wang
Yi-Chen Li
Zhenyu Hou
Yuxiao Dong
Y. Yu
124
1
0
27 Mar 2025
VADMamba: Exploring State Space Models for Fast Video Anomaly Detection
Jiahao Lyu
Minghua Zhao
Jing Hu
Xuewen Huang
Yifei Chen
Shuangli Du
Mamba
122
0
0
27 Mar 2025
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving
Lucas Nunes
Rodrigo Marcuzzi
Jens Behley
C. Stachniss
3DPC
160
1
0
27 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Wentao Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
468
6
0
27 Mar 2025
ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer
Yong Xie
Yunlian Sun
Hongwen Zhang
Yebin Liu
Jinhui Tang
VGen
154
0
0
27 Mar 2025
UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning
Hongxuan Tang
Hao Liu
Xinyan Xiao
84
2
0
27 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Zhiyong Yang
Lijuan Wang
Min Li
DiffM
105
0
0
26 Mar 2025
Benchmarking Machine Learning Methods for Distributed Acoustic Sensing
Shuaikai Shi
Qijun Zong
101
0
0
26 Mar 2025
Offline Reinforcement Learning with Discrete Diffusion Skills
Ruixi Qiao
Jie Cheng
Xingyuan Dai
Yonglin Tian
Yisheng Lv
OffRL
106
0
0
26 Mar 2025
FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling
Qiusheng Huang
Xiaohui Zhong
Xu Fan
Lei Chen
Hao Li
AI4TS
AI4CE
89
0
0
25 Mar 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan
Kaisiyuan Wang
Zhiliang Xu
Quanwei Yang
Yasheng Sun
...
Errui Ding
Jiadong Wang
Youjian Zhao
Hang Zhou
Ziwei Liu
VGen
89
0
0
25 Mar 2025
Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion
Haim Sawdayee
Chuan Guo
Guy Tevet
Bing Zhou
Jian Wang
Amit H. Bermano
DiffM
VGen
99
1
0
25 Mar 2025
Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation
Niccolo Avogaro
Thomas Frick
Mattia Rigotti
Andrea Bartezzaghi
Filip M. Janicki
Cristiano Malossi
Konrad Schindler
Roy Assaf
MLLM
VLM
106
1
0
25 Mar 2025
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
103
4
0
25 Mar 2025
HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
Mingzhen Huang
Fu-Jen Chu
Bugra Tekin
Kevin J. Liang
Haoyu Ma
...
Hongfei Xue
Siwei Lyu
Kris Kitani
Matt Feiszli
Hao Tang
VLM
126
4
0
24 Mar 2025
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang
Yuxuan Yuan
Chen Chen
Zeyu Zhang
Yue Huang
Kun Zhang
98
1
0
24 Mar 2025
Learning Beamforming Codebooks for Active Sensing with Reconfigurable Intelligent Surface
Zhongze Zhang
Wei Yu
63
0
0
24 Mar 2025