arXiv:1711.00937
Neural Discrete Representation Learning
2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
Papers citing "Neural Discrete Representation Learning" (showing 50 of 2,765)
Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data
Ivan Deandres-Tame
Ruben Tolosana
Pietro Melzi
R. Vera-Rodríguez
Minchul Kim
...
Bernardo Biesseck
Pedro Vidal
Luiz Coelho
Roger Granada
David Menotti
82
2
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
114
2
0
02 Dec 2024
A Wave is Worth 100 Words: Investigating Cross-Domain Transferability in Time Series
Xiangkai Ma
Xiaobin Hong
Wenzhong Li
Sanglu Lu
AI4TS
64
0
0
01 Dec 2024
Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)
Kazi Nazmul Haque
R. Rana
Tasnim Jarin
Bjorn W. Schuller Jr
67
0
0
30 Nov 2024
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
Qixiu Li
Yaobo Liang
Zeyu Wang
Lin Luo
Xi Chen
...
Jianmin Bao
Dong Chen
Yuanchun Shi
Jiaolong Yang
B. Guo
LM&Ro
83
23
0
29 Nov 2024
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
Fahad Shahbaz Khan
Mubarak Shah
89
3
0
29 Nov 2024
Fleximo: Towards Flexible Text-to-Human Motion Video Generation
Yuhang Zhang
Yuan Zhou
Zeyu Liu
Yuxuan Cai
Qiuyue Wang
Aidong Men
Huan Yang
VGen
DiffM
84
0
0
29 Nov 2024
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho
Junwan Kim
Jisoo Kim
Minseo Kim
Mingu Kang
S. Hong
Tae-Hyun Oh
Youngjae Yu
VGen
94
1
0
29 Nov 2024
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue
Jinouwen Zhang
Yazhe Niu
Dazhong Shen
Bingqi Ma
Yu Liu
Jing Yang
87
0
0
29 Nov 2024
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers
Chancharik Mitra
Brandon Huang
Tianning Chai
Zhiqiu Lin
Assaf Arbelle
Rogerio Feris
Leonid Karlinsky
Trevor Darrell
Deva Ramanan
Roei Herzig
VLM
134
4
0
28 Nov 2024
BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis
Seong-Eun Hong
Soobin Lim
Juyeong Hwang
Minwook Chang
Hyeongyeop Kang
98
1
0
28 Nov 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
DiffM
VGen
VLM
137
6
0
28 Nov 2024
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
Marco Pasini
J. Nistal
Stefan Lattner
George Fazekas
69
3
0
27 Nov 2024
InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
Wenjie Zhuo
Fan Ma
Hehe Fan
69
0
0
27 Nov 2024
RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices
Andreas Karatzas
Dimitrios Stamoulis
Iraklis Anagnostopoulos
61
1
0
26 Nov 2024
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen
Xinyin Ma
Gongfan Fang
Xinchao Wang
VLM
89
5
0
26 Nov 2024
MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension
Zeyu Ling
Bo Han
Shiyang Li
H. Shen
Jikang Cheng
Changqing Zou
87
1
0
26 Nov 2024
Efficient Multi-modal Large Language Models via Visual Token Grouping
Minbin Huang
Runhui Huang
Han Shi
Yimeng Chen
Chuanyang Zheng
Xiangguo Sun
Xin Jiang
Zhiyu Li
Hong Cheng
VLM
90
3
0
26 Nov 2024
Rethinking Diffusion for Text-Driven Human Motion Generation
Zichong Meng
Yiming Xie
Xiaogang Peng
Zeyu Han
Huaizu Jiang
VGen
80
3
0
25 Nov 2024
Representation Collapsing Problems in Vector Quantization
Wenhao Zhao
Qiran Zou
Rushi Shah
Dianbo Liu
74
1
0
25 Nov 2024
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
Y. Wang
Jiajie Teng
Jiajiong Cao
Yuming Li
Chenguang Ma
Hongteng Xu
Dixin Luo
VGen
DiffM
79
0
0
25 Nov 2024
Comparison of Generative Learning Methods for Turbulence Modeling
Claudia Drygala
Edmund Ross
F. Mare
Hanno Gottschalk
AI4CE
69
0
0
25 Nov 2024
SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations
Youngjun Sim
Jinsung Yoon
Young-Joo Suh
84
0
0
25 Nov 2024
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
M. Valiuddin
R. V. Sloun
C.G.A. Viviers
Peter H. N. de With
Fons van der Sommen
UQCV
91
1
0
25 Nov 2024
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen
Yushi Lan
Shangchen Zhou
Tengfei Wang
Xingang Pan
102
5
0
25 Nov 2024
M3-CVC: Controllable Video Compression with Multimodal Generative Models
Rui Wan
Qi Zheng
Yibo Fan
VGen
DiffM
71
0
0
24 Nov 2024
Comparative Analysis of Diffusion Generative Models in Computational Pathology
Denisha Thakkar
Vincent Quoc-Huy Trinh
Sonal Varma
Samira Ebrahimi Kahou
Hassan Rivaz
Mahdi S. Hosseini
MedIm
77
1
0
24 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
75
0
0
24 Nov 2024
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou
Xiaoyu Zhang
Yongchuan Tang
MLLM
DiffM
95
0
0
24 Nov 2024
Efficient Online Inference of Vision Transformers by Training-Free Tokenization
Leonidas Gee
Wing Yan Li
V. Sharmanska
Novi Quadrianto
ViT
93
0
0
23 Nov 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
93
3
0
23 Nov 2024
VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space
Armani Rodriguez
S. Kokalj-Filipovic
75
0
0
22 Nov 2024
S²ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning
Mingze Yin
Hanjing Zhou
Jialu Wu
Yiheng Zhu
Yuxuan Zhan
...
Hongxia Xu
Chang-Yu Hsieh
Jintai Chen
Tingjun Hou
Junfei Wu
77
0
0
20 Nov 2024
SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks
Yongyan Wen
Siyuan Li
Rongchang Zuo
Lei Yuan
Hangyu Mao
P. Liu
69
0
0
19 Nov 2024
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
Tianyi Yan
Dongming Wu
Wencheng Han
Junpeng Jiang
Xia Zhou
Kun Zhan
Cheng-Zhong Xu
Jianbing Shen
30
3
0
18 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
97
6
0
18 Nov 2024
Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
Shitong Shao
Zikai Zhou
Tian Ye
Lichen Bai
Zhiqiang Xu
Zeke Xie
DiffM
51
0
0
16 Nov 2024
Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
Tim Elsner
Paula Usinger
Julius Nehring-Wirxel
Gregor Kobsik
Victor Czech
Yanjiang He
I. Lim
Leif Kobbelt
39
1
0
15 Nov 2024
Zero-shot Voice Conversion with Diffusion Transformers
Songting Liu
45
2
0
15 Nov 2024
ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening
Hojun Jang
Y. Kim
3DH
38
0
0
13 Nov 2024
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings
Aditya Sanghi
Aliasghar Khani
Pradyumna Reddy
Arianna Rampini
Derek Cheung
Kamal Rahimi Malekshan
Kanika Madan
Hooman Shayani
48
3
0
12 Nov 2024
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
63
1
0
12 Nov 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Yizeng Han
Jiayi Guo
Zhiyuan Liu
Yuan Yao
Gao Huang
63
4
0
11 Nov 2024
KMM: Key Frame Mask Mamba for Extended Motion Generation
Zeyu Zhang
Hang Gao
Akide Liu
Qi Chen
Feng Chen
...
Hao Tang
Zhenming Li
Zhongwen Zhou
Hao Tang
Bohan Zhuang
Mamba
VGen
61
3
0
10 Nov 2024
GFT: Graph Foundation Model with Transferable Tree Vocabulary
Zehong Wang
Zheyuan Zhang
Nitesh V. Chawla
Chuxu Zhang
Yanfang Ye
52
10
0
09 Nov 2024
GaussianSpa: An "Optimizing-Sparsifying" Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
Yangming Zhang
Wenqi Jia
Wei Niu
Miao Yin
3DGS
83
3
0
09 Nov 2024
Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models
Xiao Liu
Lijun Zhang
Deepak Ganesan
Hui Guan
VLM
33
0
0
08 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
48
9
0
08 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
26
2
0
07 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
47
4
0
07 Nov 2024