data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL VLM ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
Title
On Compressing Sequences for Self-Supervised Speech Models
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
56
15
0
13 Oct 2022
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
Haoyu Wang
Weiqiang Zhang
Hongbin Suo
Yulong Wan
53
0
0
13 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
67
11
0
11 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
105
9
0
09 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
111
6
0
08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
131
58
0
07 Oct 2022
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining
H. S. Bovbjerg
Zheng-Hua Tan
VLM
79
3
0
04 Oct 2022
That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation
Abitha Thankaraj
Lerrel Pinto
68
17
0
03 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM CLIP
137
32
0
03 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula
Yazhe Li
Evan Shelhamer
Andrew Jaegle
Nikhil Parthasarathy
Relja Arandjelović
João Carreira
Olivier J. Hénaff
86
9
0
30 Sep 2022
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Pedro Porto Buarque de Gusmão
Nicholas D. Lane
87
9
0
30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
79
57
0
30 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
137
31
0
28 Sep 2022
An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis
Tobias Hallmen
Silvan Mertes
Dominik Schiller
Elisabeth André
52
5
0
28 Sep 2022
Implementing and Experimenting with Diffusion Models for Text-to-Image Generation
Robin Zbinden
42
3
0
22 Sep 2022
Deep Lake: a Lakehouse for Deep Learning
S. Hambardzumyan
Abhina Tuli
Levon Ghukasyan
Fariz Rahman
Hrant Topchyan
...
Mark McQuade
M. Harutyunyan
Tatevik Hakobyan
I. Stranic
Davit Buniatyan
90
21
0
22 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
75
1
0
17 Sep 2022
Exploring Target Representations for Masked Autoencoders
Xingbin Liu
Jinghao Zhou
Tao Kong
Xianming Lin
Rongrong Ji
197
52
0
08 Sep 2022
Generalization in Neural Networks: A Broad Survey
Chris Rohlfs
OOD AI4CE
67
7
0
04 Sep 2022
BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec
Joon Sern Lee
Kai Keng Tay
Zong Fu Chua
15
2
0
02 Sep 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP VLM
115
167
0
25 Aug 2022
AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research Trends
Muhammad Zawish
Fayaz Ali Dharejo
Sunder Ali Khowaja
Saleem Raza
Steven Davy
Kapal Dev
P. Bellavista
82
68
0
23 Aug 2022
Estimating a potential without the agony of the partition function
E. Haber
Moshe Eliasof
L. Tenorio
61
2
0
19 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
75
323
0
12 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
123
70
0
11 Aug 2022
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
Xiangwen Kong
Xiangyu Zhang
SSL
78
55
0
08 Aug 2022
SdAE: Self-distillated Masked Autoencoder
Yabo Chen
Yuchen Liu
Dongsheng Jiang
Xiaopeng Zhang
Wenrui Dai
H. Xiong
Qi Tian
ViT
99
74
0
31 Jul 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
96
78
0
30 Jul 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
75
23
0
29 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
95
8
0
19 Jul 2022
Bootstrapped Masked Autoencoders for Vision BERT Pretraining
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
89
79
0
14 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL VLM
112
43
0
14 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
83
31
0
14 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
145
290
0
13 Jul 2022
Big Learning
Yulai Cong
Miaoyun Zhao
AI4CE
94
0
0
08 Jul 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
62
9
0
03 Jul 2022
FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy
Nikil Ravi
Pranshu Chaturvedi
Eliu A. Huerta
Zhengchun Liu
Ryan Chard
Aristana Scourtas
K. J. Schmidt
Kyle Chard
Ben Blaiszik
Ian Foster
119
29
0
01 Jul 2022
Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition
Einari Vaaras
Manu Airaksinen
Okko Räsänen
48
6
0
21 Jun 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Chengyi Wang
Yiming Wang
Yu Wu
Sanyuan Chen
Jinyu Li
Shujie Liu
Furu Wei
SSL
95
20
0
21 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
121
35
0
19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
117
99
0
16 Jun 2022
Masked Frequency Modeling for Self-Supervised Visual Pre-Training
Jiahao Xie
Wei Li
Xiaohang Zhan
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
115
76
0
15 Jun 2022
Masked Siamese ConvNets
L. Jing
Jiachen Zhu
Yann LeCun
SSL
118
35
0
15 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
78
102
0
13 Jun 2022
Extreme Masking for Learning Instance and Distributed Visual Representations
Zhirong Wu
Zihang Lai
Xiao Sun
Stephen Lin
106
22
0
09 Jun 2022
Words are all you need? Language as an approximation for human similarity judgments
Raja Marjieh
Pol van Rijn
Ilia Sucholutsky
T. Sumers
Harin Lee
Thomas Griffiths
Nori Jacoby
93
19
0
08 Jun 2022
Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks
Jia Pan
Pan Zhou
Shuicheng Yan
SSL
89
18
0
08 Jun 2022
Masked Unsupervised Self-training for Label-free Image Classification
Junnan Li
Silvio Savarese
Steven C. H. Hoi
VLM SSL
45
13
0
07 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
75
37
0
06 Jun 2022
Siamese Image Modeling for Self-Supervised Vision Representation Learning
Chenxin Tao
Xizhou Zhu
Weijie Su
Gao Huang
Bin Li
Jie Zhou
Yu Qiao
Xiaogang Wang
Jifeng Dai
SSL
111
97
0
02 Jun 2022