2202.03555
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 557 papers shown
Title
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
74
37
0
12 Dec 2022
TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR
Lixin Cao
Jun Wang
Ben Yang
Jane Polak Scowcroft
Dong Yu
72
4
0
12 Dec 2022
Deep Architectures for Content Moderation and Movie Content Rating
Fatih Çagatay Akyön
A. Temizel
80
5
0
08 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
90
1
0
08 Dec 2022
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Pengcheng Li
Genshun Wan
Fenglin Ding
Hang Chen
Jianqing Gao
Jia Pan
Cong Liu
SSL
53
1
0
07 Dec 2022
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Fenglin Ding
Genshun Wan
Pengcheng Li
Jia Pan
Cong Liu
SSL
83
1
0
07 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
81
13
0
06 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
70
14
0
05 Dec 2022
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning
Yizhi Li
Ruibin Yuan
Ge Zhang
Yi Ma
Chenghua Lin
...
Haoyu He
Emmanouil Benetos
Norbert Gyenge
Ruibo Liu
Jie Fu
SSL
87
21
0
05 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
57
13
0
03 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
93
12
0
29 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
112
23
0
25 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
90
5
0
24 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
98
7
0
23 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
113
38
0
21 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
133
97
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
97
42
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM
CLIP
109
24
0
17 Nov 2022
Assessing Neural Network Robustness via Adversarial Pivotal Tuning
Peter Ebert Christensen
Vésteinn Snaebjarnarson
Andrea Dittadi
Serge Belongie
Sagie Benaim
AAML
93
1
0
17 Nov 2022
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Marc Fischer
Alexander Bartler
Bin Yang
SSeg
61
21
0
16 Nov 2022
Stare at What You See: Masked Image Modeling without Reconstruction
Hongwei Xue
Peng Gao
Hongyang Li
Yu Qiao
Hao Sun
Houqiang Li
Jiebo Luo
68
32
0
16 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
40
1
0
16 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
251
730
0
14 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
124
20
0
14 Nov 2022
SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation
Yi Wang
Nassim Ait Ali Braham
Zhitong Xiong
Chenying Liu
C. Albrecht
Xiao Xiang Zhu
103
73
0
13 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT
CVBM
114
62
0
12 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match
Myles Bartlett
Sara Romiti
V. Sharmanska
Novi Quadrianto
83
3
0
07 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
168
9
0
02 Nov 2022
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
72
5
0
02 Nov 2022
Deep Multimodal Fusion for Generalizable Person Re-identification
Suncheng Xiang
Hao Chen
Jing Gao
Jiawang Mou
Ting Liu
Xiaobo Li
Yuzhuo Fu
93
5
0
02 Nov 2022
More Speaking or More Speakers?
Dan Berrebbi
R. Collobert
Navdeep Jaitly
Tatiana Likhomanenko
49
6
0
02 Nov 2022
Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound
P. Wilson
Mahdi Gilany
A. Jamzad
Fahimeh Fooladgar
Minh-Son To
Brian Wodlinger
Purang Abolmaesumi
P. Mousavi
71
12
0
01 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
93
7
0
01 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
53
2
0
01 Nov 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
60
8
0
30 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
128
1
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
103
37
0
27 Oct 2022
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
105
33
0
26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
Masato Hagiwara
CLIP
VLM
AI4TS
53
24
0
26 Oct 2022
Learning Explicit Object-Centric Representations with Vision Transformers
Oscar Vikström
Alexander Ilin
OCL
ViT
79
4
0
25 Oct 2022
Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future
Guo-Jun Qi
M. Shah
SSL
78
8
0
23 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
Cheol Jun Cho
Peter Wu
Abdel-rahman Mohamed
Gopala K. Anumanchipalli
89
34
0
21 Oct 2022
Towards Sustainable Self-supervised Learning
Shanghua Gao
Pan Zhou
Ming-Ming Cheng
Shuicheng Yan
CLL
127
7
0
20 Oct 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
138
73
0
19 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
Qixiang Ye
Furu Wei
VLM
133
38
0
19 Oct 2022
Continuous Pseudo-Labeling from the Start
Dan Berrebbi
R. Collobert
Samy Bengio
Navdeep Jaitly
Tatiana Likhomanenko
65
16
0
17 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
102
35
0
16 Oct 2022
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter-Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
80
14
0
14 Oct 2022
Multi-Modal Recommendation System with Auxiliary Information
Mufhumudzi Muthivhi
Terence L van Zyl
Hairong Wang
32
2
0
13 Oct 2022
The Hidden Uniform Cluster Prior in Self-Supervised Learning
Mahmoud Assran
Randall Balestriero
Quentin Duval
Florian Bordes
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Nicolas Ballas
SSL
96
50
0
13 Oct 2022