ResearchTrend.AI
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
Title
On the Stepwise Nature of Self-Supervised Learning
James B. Simon
Maksis Knutins
Liu Ziyin
Daniel Geisz
Abraham J. Fetterman
Joshua Albrecht
SSL
96
35
0
27 Mar 2023
Decoupled Multimodal Distilling for Emotion Recognition
Yong Li
Yuan-Zheng Wang
Zhen Cui
94
83
0
24 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
169
48
0
21 Mar 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
Jihao Liu
Tai Wang
Boxiao Liu
Qihang Zhang
Yu Liu
Hongsheng Li
69
16
0
20 Mar 2023
Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Maryam Fazel-Zarandi
Wei-Ning Hsu
SSL
59
9
0
20 Mar 2023
Right the docs: Characterising voice dataset documentation practices used in machine learning
Kathy Reid
Elizabeth T. Williams
66
2
0
19 Mar 2023
OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav
Karmesh Yadav
Arjun Majumdar
Ram Ramrakhya
Naoki Yokoyama
Alexei Baevski
Z. Kira
Oleksandr Maksymets
Dhruv Batra
ViT
99
49
0
14 Mar 2023
AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+
Tianlin Li
Ying Wang
Ziwei Xuan
Guo-Jun Qi
ViT
75
3
0
14 Mar 2023
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Wei Chen
Qibo Qiu
Long Chen
Boxi Wu
Binbin Lin
Xiaofei He
Wei Liu
98
49
0
13 Mar 2023
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongyang Li
Hongsheng Li
Qiao Yu
65
19
0
09 Mar 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
69
5
0
09 Mar 2023
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhiwei Deng
Kai Han
90
52
0
09 Mar 2023
Centroid-centered Modeling for Efficient Vision Transformer Pre-training
Xin Yan
Zuchao Li
Lefei Zhang
Bo Du
Dacheng Tao
VLM
73
0
0
08 Mar 2023
Self-supervised speech representation learning for keyword-spotting with light-weight transformers
Chenyang Gao
Yue Gu
Francesco Calivá
Yuzong Liu
OffRL
81
4
0
07 Mar 2023
Applying Plain Transformers to Real-World Point Clouds
Lanxiao Li
M. Heizmann
3DPC, ViT
82
3
0
28 Feb 2023
Generic-to-Specific Distillation of Masked Autoencoders
Wei Huang
Zhiliang Peng
Li Dong
Furu Wei
Jianbin Jiao
QiXiang Ye
90
23
0
28 Feb 2023
Efficient Masked Autoencoders with Self-Consistency
Zhaowen Li
Yousong Zhu
Zhiyang Chen
Wei Li
Chaoyang Zhao
Rui Zhao
Ming Tang
Jinqiao Wang
136
2
0
28 Feb 2023
Phone and speaker spatial organization in self-supervised speech representations
Pablo Riera
M. Cerdeiro
L. Pepino
Luciana Ferrer
SSL
86
1
0
24 Feb 2023
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
Xie Chen
Ziyang Ma
Changli Tang
Yujin Wang
Zhi-shen Zheng
57
4
0
18 Feb 2023
Gaussian-smoothed Imbalance Data Improves Speech Emotion Recognition
Xuefeng Liang
Hexin Jiang
Wenxin Xu
Ying Zhou
65
3
0
17 Feb 2023
A Comprehensive Review and a Taxonomy of Edge Machine Learning: Requirements, Paradigms, and Techniques
Wenbin Li
Hakim Hacid
Ebtesam Almazrouei
Merouane Debbah
93
13
0
16 Feb 2023
Speech Enhancement with Multi-granularity Vector Quantization
Xiaokang Zhao
Qiu-shi Zhu
Jie Zhang
67
0
0
16 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
144
47
0
14 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
150
34
0
10 Feb 2023
Representation Deficiency in Masked Language Modeling
Yu Meng
Jitin Krishnan
Sinong Wang
Qifan Wang
Yuning Mao
Han Fang
Marjan Ghazvininejad
Jiawei Han
Luke Zettlemoyer
149
7
0
04 Feb 2023
ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics
Hamed Rahimi
Hubert Naacke
Camélia Constantin
B. Amann
BDL, AI4TS
127
6
0
03 Feb 2023
SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
Jiaxiang Dong
Haixu Wu
Haoran Zhang
Li Zhang
Jianmin Wang
Mingsheng Long
AI4TS
142
94
0
02 Feb 2023
Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms
S. Ma
Jidong J. Yang
SSL
33
5
0
01 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
104
32
0
01 Feb 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
238
344
0
30 Jan 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
141
9
0
28 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
115
2
0
26 Jan 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
SSL, AI4TS, MDE
147
364
0
19 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
86
5
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
95
11
0
17 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
232
161
0
13 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
Jia Ning
Chen Li
Zheng Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
130
47
0
05 Jan 2023
Trace Encoding in Process Mining: a survey and benchmarking
Sylvio Barbon Junior
Paolo Ceravolo
R. Oyamada
G. Tavares
AI4TS
80
21
0
05 Jan 2023
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
Sucheng Ren
Fangyun Wei
Zheng Zhang
Han Hu
146
43
0
03 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma
Chang-Shu Liu
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
140
10
0
31 Dec 2022
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Suwon Shon
Siddhant Arora
Chyi-Jiunn Lin
Ankita Pasad
Felix Wu
Roshan S. Sharma
Wei Wu
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
80
33
0
20 Dec 2022
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models
Changli Tang
Yujin Wang
Xie Chen
Weiqiang Zhang
61
2
0
20 Dec 2022
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
86
6
0
19 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
124
299
0
18 Dec 2022
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
81
54
0
15 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM, SSL
129
97
0
14 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
85
10
0
14 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Renrui Zhang
Liuhui Wang
Yu Qiao
Peng Gao
Hongsheng Li
3DPC
92
137
0
13 Dec 2022
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
80
10
0
13 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
92
49
0
12 Dec 2022