2202.03555
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 557 papers shown
On the Stepwise Nature of Self-Supervised Learning
James B. Simon
Maksis Knutins
Liu Ziyin
Daniel Geisz
Abraham J. Fetterman
Joshua Albrecht
SSL
96
35
0
27 Mar 2023
Decoupled Multimodal Distilling for Emotion Recognition
Yong Li
Yuan-Zheng Wang
Zhen Cui
94
83
0
24 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
169
48
0
21 Mar 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
Jihao Liu
Tai Wang
Boxiao Liu
Qihang Zhang
Yu Liu
Hongsheng Li
69
16
0
20 Mar 2023
Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Maryam Fazel-Zarandi
Wei-Ning Hsu
SSL
59
9
0
20 Mar 2023
Right the docs: Characterising voice dataset documentation practices used in machine learning
Kathy Reid
Elizabeth T. Williams
66
2
0
19 Mar 2023
OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav
Karmesh Yadav
Arjun Majumdar
Ram Ramrakhya
Naoki Yokoyama
Alexei Baevski
Z. Kira
Oleksandr Maksymets
Dhruv Batra
ViT
99
49
0
14 Mar 2023
AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+
Tianlin Li
Ying Wang
Ziwei Xuan
Guo-Jun Qi
ViT
75
3
0
14 Mar 2023
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Wei Chen
Qibo Qiu
Long Chen
Boxi Wu
Binbin Lin
Xiaofei He
Wei Liu
98
49
0
13 Mar 2023
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongyang Li
Hongsheng Li
Qiao Yu
65
19
0
09 Mar 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
69
5
0
09 Mar 2023
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhiwei Deng
Kai Han
90
52
0
09 Mar 2023
Centroid-centered Modeling for Efficient Vision Transformer Pre-training
Xin Yan
Zuchao Li
Lefei Zhang
Bo Du
Dacheng Tao
VLM
73
0
0
08 Mar 2023
Self-supervised speech representation learning for keyword-spotting with light-weight transformers
Chenyang Gao
Yue Gu
Francesco Calivá
Yuzong Liu
OffRL
81
4
0
07 Mar 2023
Applying Plain Transformers to Real-World Point Clouds
Lanxiao Li
M. Heizmann
3DPC
ViT
82
3
0
28 Feb 2023
Generic-to-Specific Distillation of Masked Autoencoders
Wei Huang
Zhiliang Peng
Li Dong
Furu Wei
Jianbin Jiao
QiXiang Ye
90
23
0
28 Feb 2023
Efficient Masked Autoencoders with Self-Consistency
Zhaowen Li
Yousong Zhu
Zhiyang Chen
Wei Li
Chaoyang Zhao
Rui Zhao
Ming Tang
Jinqiao Wang
136
2
0
28 Feb 2023
Phone and speaker spatial organization in self-supervised speech representations
Pablo Riera
M. Cerdeiro
L. Pepino
Luciana Ferrer
SSL
86
1
0
24 Feb 2023
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
Xie Chen
Ziyang Ma
Changli Tang
Yujin Wang
Zhi-shen Zheng
57
4
0
18 Feb 2023
Gaussian-smoothed Imbalance Data Improves Speech Emotion Recognition
Xuefeng Liang
Hexin Jiang
Wenxin Xu
Ying Zhou
65
3
0
17 Feb 2023
A Comprehensive Review and a Taxonomy of Edge Machine Learning: Requirements, Paradigms, and Techniques
Wenbin Li
Hakim Hacid
Ebtesam Almazrouei
Merouane Debbah
93
13
0
16 Feb 2023
Speech Enhancement with Multi-granularity Vector Quantization
Xiaokang Zhao
Qiu-shi Zhu
Jie Zhang
67
0
0
16 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
144
47
0
14 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
150
34
0
10 Feb 2023
Representation Deficiency in Masked Language Modeling
Yu Meng
Jitin Krishnan
Sinong Wang
Qifan Wang
Yuning Mao
Han Fang
Marjan Ghazvininejad
Jiawei Han
Luke Zettlemoyer
149
7
0
04 Feb 2023
ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics
Hamed Rahimi
Hubert Naacke
Camélia Constantin
B. Amann
BDL
AI4TS
127
6
0
03 Feb 2023
SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
Jiaxiang Dong
Haixu Wu
Haoran Zhang
Li Zhang
Jianmin Wang
Mingsheng Long
AI4TS
142
94
0
02 Feb 2023
Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms
S. Ma
Jidong J. Yang
SSL
33
5
0
01 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
104
32
0
01 Feb 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
238
344
0
30 Jan 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
141
9
0
28 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
115
2
0
26 Jan 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
SSL
AI4TS
MDE
147
364
0
19 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
86
5
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
95
11
0
17 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
232
161
0
13 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
Jia Ning
Chen Li
Zheng Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
130
47
0
05 Jan 2023
Trace Encoding in Process Mining: a survey and benchmarking
Sylvio Barbon Junior
Paolo Ceravolo
R. Oyamada
G. Tavares
AI4TS
80
21
0
05 Jan 2023
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
Sucheng Ren
Fangyun Wei
Zheng Zhang
Han Hu
146
43
0
03 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma
Chang-Shu Liu
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
140
10
0
31 Dec 2022
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Suwon Shon
Siddhant Arora
Chyi-Jiunn Lin
Ankita Pasad
Felix Wu
Roshan S. Sharma
Wei Wu
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
80
33
0
20 Dec 2022
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models
Changli Tang
Yujin Wang
Xie Chen
Weiqiang Zhang
61
2
0
20 Dec 2022
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
86
6
0
19 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
124
299
0
18 Dec 2022
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
81
54
0
15 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
129
97
0
14 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
85
10
0
14 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Renrui Zhang
Liuhui Wang
Yu Qiao
Peng Gao
Hongsheng Li
3DPC
92
137
0
13 Dec 2022
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
80
10
0
13 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
92
49
0
12 Dec 2022