arXiv:2202.03555

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
    SSL VLM ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
153
20
0
27 Nov 2023
SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation
Jia Li
Yanyan Shen
Lei Chen
Charles Wang Wai Ng
60
3
0
27 Nov 2023
Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling
Daesoo Lee
Sara Malacarne
Erlend Aune
AI4TS
115
13
0
21 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
104
4
0
21 Nov 2023
Self-Distilled Representation Learning for Time Series
Felix Pieper
Konstantin Ditschuneit
Martin Genzel
Alexandra Lindt
Johannes Otterbach
AI4TS
64
1
0
19 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang
James R. Glass
72
3
0
15 Nov 2023
SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification
Junyan Lin
Feng Gao
Xiaochen Shi
Junyu Dong
Q. Du
94
52
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
95
67
0
07 Nov 2023
FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data
Lisa Weijler
Florian Kowarsch
Michael Reiter
Pedro Hermosilla
Margarita Maurer-Granofszky
Michael N. Dworzak
MedIm
44
3
0
06 Nov 2023
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
R. N. Nandi
Mehadi Hasan Menon
Tareq Al Muntasir
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Tariqul Islam
Shammur A. Chowdhury
Firoj Alam
91
3
0
06 Nov 2023
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
Changdae Oh
Hyesu Lim
Mijoo Kim
Dongyoon Han
Junhyeok Park
Euiseog Jeong
Alexander G. Hauptmann
Zhi-Qi Cheng
Kyungwoo Song
VLM
120
18
0
03 Nov 2023
Investigating Relative Performance of Transfer and Meta Learning
Benji Alwis
31
0
0
31 Oct 2023
Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings
David Samuel
54
4
0
30 Oct 2023
Pre-training with Random Orthogonal Projection Image Modeling
Maryam Haghighat
Peyman Moghadam
Shaheer Mohamed
Piotr Koniusz
VLM
85
9
0
28 Oct 2023
Large-scale Foundation Models and Generative AI for BigData Neuroscience
Ran Wang
Zhe Sage Chen
MedIm AI4CE LRM
40
10
0
27 Oct 2023
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
76
3
0
25 Oct 2023
Fine tuning Pre trained Models for Robustness Under Noisy Labels
Sumyeong Ahn
Sihyeon Kim
Jongwoo Ko
SeYoung Yun
AAML NoLa
121
8
0
24 Oct 2023
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
Kun Wei
Bei Li
Hang Lv
Quan Lu
Ning Jiang
Lei Xie
92
4
0
22 Oct 2023
Learning with Unmasked Tokens Drives Stronger Vision Learners
Taekyung Kim
Sanghyuk Chun
Byeongho Heo
Dongyoon Han
SSL
100
2
0
20 Oct 2023
A Car Model Identification System for Streamlining the Automobile Sales Process
Said Togru
Marco Moldovan
79
0
0
19 Oct 2023
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model
H. Soltau
Izhak Shafran
Alex Ottenwess
Joseph R. Duffy
Rene L. Utianski
L. Barnard
John L. Stricker
D. Wiepert
David T. Jones
Hugo Botha
83
3
0
16 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
147
3
0
12 Oct 2023
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning
Jiaqi Li
Guilin Qi
Chuanyi Zhang
Yongrui Chen
Yiming Tan
Chenlong Xia
Ye Tian
81
3
0
12 Oct 2023
Enhancing Representations through Heterogeneous Self-Supervised Learning
Zhongyu Li
Bo-Wen Yin
Yongxiang Liu
Li Liu
Ming-Ming Cheng
SSL
62
2
0
08 Oct 2023
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Songtao Luo
Shuang Yang
Shiguang Shan
Xilin Chen
89
2
0
08 Oct 2023
OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks
Ofir Bar Tal
Adi Haviv
Amit H. Bermano
AAML
79
0
0
05 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
115
27
0
04 Oct 2023
Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods
E. Zappala
Daniel Levine
Shiyang Zhang
S. Rizvi
Sacha Lévy
David van Dijk
67
1
0
02 Oct 2023
Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition
Dongyuan Li
Yusong Wang
Kotaro Funakoshi
Manabu Okumura
103
4
0
30 Sep 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
88
3
0
29 Sep 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Jiliang Tang
Marco Cristani
AI4TS GNN
142
5
0
27 Sep 2023
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
83
10
0
26 Sep 2023
M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
76
1
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
59
3
0
26 Sep 2023
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Guan-lin Yang
Ziyang Ma
Zhisheng Zheng
Ya-Zhen Song
Zhikang Niu
Xie Chen
75
8
0
25 Sep 2023
M³CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders
Qibo Qiu
Honghui Yang
Wenxiao Wang
Shun Zhang
Haiming Gao
Haochao Ying
Wei Hua
Xiaofei He
3DPC
83
0
0
23 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
86
17
0
19 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Sizhou Chen
Songyang Gao
Sen Fang
26
0
0
14 Sep 2023
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
Heng-Jui Chang
Ning Dong
Ruslan Mavlyutov
Sravya Popuri
Yu-An Chung
87
7
0
14 Sep 2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
Anna Deichler
Shivam Mehta
Simon Alexanderson
Jonas Beskow
DiffM
80
24
0
11 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
79
5
0
10 Sep 2023
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tong Wang
Zhaoxiang Zhang
80
21
0
07 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
81
5
0
05 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Zhichao Huang
Chutong Meng
Tom Ko
92
28
0
31 Aug 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
62
3
0
28 Aug 2023
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning
Amirhossein Vahidi
Lisa Wimmer
H. Gündüz
Bernd Bischl
Eyke Hüllermeier
Mina Rezaei
OOD UQ CV
91
4
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
81
3
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
50
10
0
28 Aug 2023
Unleash Model Potential: Bootstrapped Meta Self-supervised Learning
Wenwen Qiang
Changwen Zheng
Jingyao Wang
Changwen Zheng
SSL
61
1
0
28 Aug 2023
Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning
Miguel Sarabia
Elena Menyaylenko
Alessandro Toso
Skyler Seto
Zakaria Aldeneh
Shadi Pirhosseinloo
Luca Zappella
B. Theobald
N. Apostoloff
Jonathan Sheaffer
73
7
0
18 Aug 2023