Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1711.00937
Cited By
Neural Discrete Representation Learning
2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Neural Discrete Representation Learning"
50 / 2,747 papers shown
Title
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
Tian Jin
Jingjing Chen
Lin Ma
Yu Jiang
34
2
0
06 Apr 2025
Pre-training Generative Recommender with Multi-Identifier Item Tokenization
Bowen Zheng
Enze Liu
Z. Chen
Zhongrui Ma
Yue Wang
Wayne Xin Zhao
Zhicheng Dou
38
0
0
06 Apr 2025
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov
Di Chang
Minh Tran
Hongkun Gong
Ashutosh Chaubey
Mohammad Soleymani
DiffM
VGen
23
0
0
05 Apr 2025
Moment Quantization for Video Temporal Grounding
Xiaolong Sun
Le Wang
Sanping Zhou
Liushuai Shi
Kun Xia
Mengnan Liu
Yabing Wang
Gang Hua
MQ
31
0
0
03 Apr 2025
Explainable and Interpretable Forecasts on Non-Smooth Multivariate Time Series for Responsible Gameplay
Hussain Jagirdar
Rukma Talwadker
Aditya Pareek
Pulkit Agrawal
Tridib Mukherjee
AI4TS
43
1
0
03 Apr 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Zhiyuan Yan
Junyan Ye
Weijia Li
Zilong Huang
Shenghai Yuan
Xiangyang He
Kaiqing Lin
Jun-Jian He
Conghui He
Li Yuan
MLLM
EGVM
93
10
0
03 Apr 2025
Towards Assessing Deep Learning Test Input Generators
Seif Mzoughi
Ahmed Hajyahmed
Mohamed Elshafei
Foutse Khomh anb Diego Elias Costa
D. Costa
AAML
40
0
0
03 Apr 2025
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung
Sangeek Hyun
Hyunjun Kim
Eunseo Koh
MinKyu Lee
Jae-Pil Heo
33
0
0
03 Apr 2025
Instruction-Guided Autoregressive Neural Network Parameter Generation
Soro Bedionita
Bruno Andreis
Song Chong
Sung Ju Hwang
DiffM
45
0
0
02 Apr 2025
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz
Hendra Setiawan
Stephan Peitz
Yova Kementchedjhieva
43
0
0
02 Apr 2025
Foreground Focus: Enhancing Coherence and Fidelity in Camouflaged Image Generation
Pei-Chi Chen
Yi Yao
Chan-Feng Hsu
Hongxia Xie
Hung-Jen Chen
Hong-Han Shuai
Wen-Huang Cheng
DiffM
46
0
0
02 Apr 2025
Personalized Federated Training of Diffusion Models with Privacy Guarantees
Kumar Kshitij Patel
Weitong Zhang
Lingxiao Wang
MedIm
50
0
0
01 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
Lefei Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
48
0
0
01 Apr 2025
Continual Cross-Modal Generalization
Yan Xia
Hai Huang
Minghui Fang
Zhou Zhao
CLL
54
0
0
01 Apr 2025
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan
Hong-Xing Yu
Sirui Chen
L. Fei-Fei
Jiajun Wu
VGen
65
3
0
01 Apr 2025
Learned Image Compression with Dictionary-based Entropy Model
Jingbo Lu
Leheng Zhang
Xingyu Zhou
Mu Li
Wen Li
Shuhang Gu
51
0
0
01 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kaipeng Zhang
MGen
VGen
70
1
0
01 Apr 2025
Style Quantization for Data-Efficient GAN Training
Jian Wang
Xin Lan
Jizhe Zhou
Yuxin Tian
Jiancheng Lv
39
0
0
31 Mar 2025
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Yufei Wang
Lanqing Guo
Z. Li
Jiaxing Huang
Pichao Wang
Bihan Wen
Jingchao Wang
DiffM
68
1
0
31 Mar 2025
Biologically Inspired Spiking Diffusion Model with Adaptive Lateral Selection Mechanism
Linghao Feng
Dongcheng Zhao
Sicheng Shen
Yi Zeng
67
0
0
31 Mar 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
Jiadong Wang
Tao Dai
Shu-Tao Xia
Luca Benini
72
2
0
30 Mar 2025
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng
Han Li
Wenrui Dai
Ziyang Zheng
Chenglin Li
Junni Zou
Hongkai Xiong
3DH
60
0
0
30 Mar 2025
MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
Xianglong He
Junyi Chen
Di Huang
Zexiang Liu
Xiaoshui Huang
Wanli Ouyang
C. Yuan
Yangguang Li
DiffM
57
0
0
29 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
31
0
0
28 Mar 2025
SocialGen: Modeling Multi-Human Social Interaction with Language Models
Heng Yu
Juze Zhang
Changan Chen
Tiange Xiang
Yusu Fang
Juan Carlos Niebles
Ehsan Adeli
VGen
49
0
0
28 Mar 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt
Harleen Hanspal
Guoxuan Xia
Petru-Daniel Tudosiu
Alexander Black
Yongxin Yang
Jingyu Sun
Sarah Parisot
MoE
43
0
0
28 Mar 2025
ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation
Giulio Federico
Giuseppe Amato
F. Carrara
Claudio Gennaro
Marco Di Benedetto
31
0
0
28 Mar 2025
Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance
Christian Steinhauser
Philipp Reis
Hubert Padusinski
Jacob Langner
Eric Sax
31
0
0
28 Mar 2025
Tokenization of Gaze Data
Tim Rolff
Jurik Karimian
Niklas Hypki
S. Schmidt
Markus Lappe
Frank Steinicke
41
0
0
28 Mar 2025
VADMamba: Exploring State Space Models for Fast Video Anomaly Detection
Jiahao Lyu
Minghua Zhao
Jing Hu
Xuewen Huang
Yifei Chen
Shuangli Du
Mamba
71
0
0
27 Mar 2025
UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning
Hongxuan Tang
Hao Liu
Xinyan Xiao
45
1
0
27 Mar 2025
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
Hongkai Lin
Dingkang Liang
Zhenghao Qi
X. Bai
DiffM
46
0
0
27 Mar 2025
ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer
Yong Xie
Yunlian Sun
Hongwen Zhang
Y. Liu
Jinhui Tang
VGen
100
0
0
27 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Yuyao Zhang
Jingwen He
Wei-Shi Zheng
Yu Qiao
Ziwei Liu
EGVM
VGen
56
6
0
27 Mar 2025
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving
Lucas Nunes
Rodrigo Marcuzzi
Jens Behley
C. Stachniss
3DPC
85
0
0
27 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He
Zi-Xin Zou
Chia-Hao Chen
Y. Guo
Ding Liang
Chun Yuan
Wanli Ouyang
Yan-Pei Cao
Yangguang Li
58
0
0
27 Mar 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Jike Zhong
Qilong Wu
Xinyue Li
Bo Zhang
Ming Li
...
Yiming Li
Yu Qiao
Peng Gao
Bin Fu
Zhen Li
EGVM
45
0
0
27 Mar 2025
Controlling Large Language Model with Latent Actions
Chengxing Jia
Ziniu Li
Pengyuan Wang
Yi-Chen Li
Zhenyu Hou
Yuxiao Dong
Y. Yu
56
0
0
27 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Feiyu Xiong
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
177
2
0
27 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Zhengyuan Yang
Lijuan Wang
Min Li
DiffM
73
0
0
26 Mar 2025
Benchmarking Machine Learning Methods for Distributed Acoustic Sensing
Shuaikai Shi
Qijun Zong
84
0
0
26 Mar 2025
Offline Reinforcement Learning with Discrete Diffusion Skills
Ruixi Qiao
Jie Cheng
Xingyuan Dai
Yonglin Tian
Yisheng Lv
OffRL
84
0
0
26 Mar 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan
Kaisiyuan Wang
Zhiliang Xu
Quanwei Yang
Yasheng Sun
...
Errui Ding
Jiadong Wang
Youjian Zhao
Hang Zhou
Ziwei Liu
VGen
44
0
0
25 Mar 2025
Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion
Haim Sawdayee
Chuan Guo
Guy Tevet
Bing Zhou
Jian Wang
Amit H. Bermano
DiffM
VGen
53
0
0
25 Mar 2025
FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling
Qiusheng Huang
Xiaohui Zhong
Xu Fan
Lei Chen
Hao Li
AI4TS
AI4CE
49
0
0
25 Mar 2025
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
64
2
0
25 Mar 2025
Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation
Niccolo Avogaro
Thomas Frick
Mattia Rigotti
A. Bartezzaghi
Filip Janicki
C. Malossi
Konrad Schindler
Roy Assaf
MLLM
VLM
63
0
0
25 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
52
0
0
24 Mar 2025
Human Motion Unlearning
Edoardo De Matteis
Matteo Migliarini
Alessio Sampieri
Indro Spinelli
Fabio Galasso
MU
60
0
0
24 Mar 2025
Previous
1
2
3
4
5
6
...
53
54
55
Next