ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.03555
  4. Cited By
data2vec: A General Framework for Self-supervised Learning in Speech,
  Vision and Language
v1v2v3 (latest)

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
    SSLVLMViT
ArXiv (abs)PDFHTML

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
Title
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1
  Accuracy with ViT-B and ViT-L on ImageNet
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
74
37
0
12 Dec 2022
TriNet: stabilizing self-supervised learning from complete or slow
  collapse on ASR
TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR
Lixin Cao
Jun Wang
Ben Yang
Jane Polak Scowcroft
Dong Yu
72
4
0
12 Dec 2022
Deep Architectures for Content Moderation and Movie Content Rating
Deep Architectures for Content Moderation and Movie Content Rating
Fatih Çagatay Akyön
A. Temi̇zel
80
5
0
08 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
90
1
0
08 Dec 2022
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Pengcheng Li
Genshun Wan
Fenglin Ding
Hang Chen
Jianqing Gao
Jia Pan
Cong Liu
SSL
53
1
0
07 Dec 2022
Improved Self-Supervised Multilingual Speech Representation Learning
  Combined with Auxiliary Language Information
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Fenglin Ding
Genshun Wan
Pengcheng Li
Jia Pan
Cong Liu
SSL
83
1
0
07 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By
  Multimodal Self-Distillation
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
81
13
0
06 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
70
14
0
05 Dec 2022
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music
  Audio Representation Learning
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning
Yizhi Li
Ruibin Yuan
Ge Zhang
Yi Ma
Chenghua Lin
...
Haoyu He
Emmanouil Benetos
Norbert Gyenge
Ruibo Liu
Jie Fu
SSL
87
21
0
05 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual
  Representation
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
57
13
0
03 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech
  Recognition
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
93
12
0
29 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video
  Representation Learning
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
112
23
0
25 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
90
5
0
24 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token
  Migration
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
98
7
0
23 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for
  Speech Representation Learning
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
113
38
0
21 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo
  Matching and Optical Flow
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
133
97
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual
  Information
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
97
42
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLMCLIP
109
24
0
17 Nov 2022
Assessing Neural Network Robustness via Adversarial Pivotal Tuning
Assessing Neural Network Robustness via Adversarial Pivotal Tuning
Peter Ebert Christensen
Vésteinn Snaebjarnarson
Andrea Dittadi
Serge Belongie
Sagie Benaim
AAML
93
1
0
17 Nov 2022
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Marc Fischer
Alexander Bartler
Bin Yang
SSeg
61
21
0
16 Nov 2022
Stare at What You See: Masked Image Modeling without Reconstruction
Stare at What You See: Masked Image Modeling without Reconstruction
Hongwei Xue
Peng Gao
Hongyang Li
Yu Qiao
Hao Sun
Houqiang Li
Jiebo Luo
68
32
0
16 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style
  Transfer
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
40
1
0
16 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at
  Scale
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLMCLIP
251
730
0
14 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by
  Integrating Multiple Targets
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
124
20
0
14 Nov 2022
SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for
  Self-Supervised Learning in Earth Observation
SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation
Yi Wang
Nassim Ait Ali Braham
Zhitong Xiong
Chenying Liu
C. Albrecht
Xiao Xiang Zhu
103
73
0
13 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViTCVBM
114
62
0
12 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match
Okapi: Generalising Better by Making Statistical Matches Match
Myles Bartlett
Sara Romiti
V. Sharmanska
Novi Quadrianto
83
3
0
07 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
168
9
0
02 Nov 2022
data2vec-aqc: Search for the right Teaching Assistant in the
  Teacher-Student training setup
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
72
5
0
02 Nov 2022
Deep Multimodal Fusion for Generalizable Person Re-identification
Deep Multimodal Fusion for Generalizable Person Re-identification
Suncheng Xiang
Hao Chen
Jing Gao
Jiawang Mou
Ting Liu
Xiaobo Li
Yuzhuo Fu
93
5
0
02 Nov 2022
More Speaking or More Speakers?
More Speaking or More Speakers?
Dan Berrebbi
R. Collobert
Navdeep Jaitly
Tatiana Likhomanenko
49
6
0
02 Nov 2022
Self-Supervised Learning with Limited Labeled Data for Prostate Cancer
  Detection in High Frequency Ultrasound
Self-Supervised Learning with Limited Labeled Data for Prostate Cancer Detection in High Frequency Ultrasound
P. Wilson
Mahdi Gilany
A. Jamzad
Fahimeh Fooladgar
Minh-Son To
Brian Wodlinger
Purang Abolmaesumi
P. Mousavi
71
12
0
01 Nov 2022
Speech-text based multi-modal training with bidirectional attention for
  improved speech recognition
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
93
7
0
01 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
53
2
0
01 Nov 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired
  Speech and Text
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
60
8
0
30 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for
  Automatic Speech Recognition
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
128
1
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by
  Combining Regression and Improved Contrastive Learning
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLMSSL
103
37
0
27 Oct 2022
Masked Modeling Duo: Learning Representations by Encouraging Both
  Networks to Model the Input
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
105
33
0
26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
AVES: Animal Vocalization Encoder based on Self-Supervision
Masato Hagiwara
CLIPVLMAI4TS
53
24
0
26 Oct 2022
Learning Explicit Object-Centric Representations with Vision
  Transformers
Learning Explicit Object-Centric Representations with Vision Transformers
Oscar Vikström
Alexander Ilin
OCLViT
79
4
0
25 Oct 2022
Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present
  and Future
Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future
Guo-Jun Qi
M. Shah
SSL
78
8
0
23 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of
  Speech
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
Cheol Jun Cho
Peter Wu
Abdel-rahman Mohamed
Gopala K. Anumanchipalli
89
34
0
21 Oct 2022
Towards Sustainable Self-supervised Learning
Towards Sustainable Self-supervised Learning
Shanghua Gao
Pan Zhou
Mingg-Ming Cheng
Shuicheng Yan
CLL
127
7
0
20 Oct 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View
  Completion
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
138
73
0
19 Oct 2022
A Unified View of Masked Image Modeling
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
133
38
0
19 Oct 2022
Continuous Pseudo-Labeling from the Start
Continuous Pseudo-Labeling from the Start
Dan Berrebbi
R. Collobert
Samy Bengio
Navdeep Jaitly
Tatiana Likhomanenko
65
16
0
17 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of
  Self-Supervised Speech Representation Learning
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELMSSL
102
35
0
16 Oct 2022
Improving generalizability of distilled self-supervised speech
  processing models under distorted settings
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter-Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
80
14
0
14 Oct 2022
Multi-Modal Recommendation System with Auxiliary Information
Multi-Modal Recommendation System with Auxiliary Information
Mufhumudzi Muthivhi
Terence L van Zyl
Hairong Wang
32
2
0
13 Oct 2022
The Hidden Uniform Cluster Prior in Self-Supervised Learning
The Hidden Uniform Cluster Prior in Self-Supervised Learning
Mahmoud Assran
Randall Balestriero
Quentin Duval
Florian Bordes
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Nicolas Ballas
SSL
96
50
0
13 Oct 2022
Previous
123...10111289
Next