2202.03555
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing
"data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 557 papers shown
Title
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
153
20
0
27 Nov 2023
SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation
Jia Li
Yanyan Shen
Lei Chen
Charles Wang Wai Ng
60
3
0
27 Nov 2023
Explainable Time Series Anomaly Detection using Masked Latent Generative Modeling
Daesoo Lee
Sara Malacarne
Erlend Aune
AI4TS
115
13
0
21 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
104
4
0
21 Nov 2023
Self-Distilled Representation Learning for Time Series
Felix Pieper
Konstantin Ditschuneit
Martin Genzel
Alexandra Lindt
Johannes Otterbach
AI4TS
64
1
0
19 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang
James R. Glass
72
3
0
15 Nov 2023
SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification
Junyan Lin
Feng Gao
Xiaochen Shi
Junyu Dong
Q. Du
94
52
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
95
67
0
07 Nov 2023
FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data
Lisa Weijler
Florian Kowarsch
Michael Reiter
Pedro Hermosilla
Margarita Maurer-Granofszky
Michael N. Dworzak
MedIm
44
3
0
06 Nov 2023
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
R. N. Nandi
Mehadi Hasan Menon
Tareq Al Muntasir
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Tariqul Islam
Shammur A. Chowdhury
Firoj Alam
91
3
0
06 Nov 2023
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
Changdae Oh
Hyesu Lim
Mijoo Kim
Dongyoon Han
Junhyeok Park
Euiseog Jeong
Alexander G. Hauptmann
Zhi-Qi Cheng
Kyungwoo Song
VLM
120
18
0
03 Nov 2023
Investigating Relative Performance of Transfer and Meta Learning
Benji Alwis
31
0
0
31 Oct 2023
Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings
David Samuel
54
4
0
30 Oct 2023
Pre-training with Random Orthogonal Projection Image Modeling
Maryam Haghighat
Peyman Moghadam
Shaheer Mohamed
Piotr Koniusz
VLM
85
9
0
28 Oct 2023
Large-scale Foundation Models and Generative AI for BigData Neuroscience
Ran Wang
Zhe Sage Chen
MedIm
AI4CE
LRM
40
10
0
27 Oct 2023
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
76
3
0
25 Oct 2023
Fine tuning Pre trained Models for Robustness Under Noisy Labels
Sumyeong Ahn
Sihyeon Kim
Jongwoo Ko
SeYoung Yun
AAML
NoLa
121
8
0
24 Oct 2023
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
Kun Wei
Bei Li
Hang Lv
Quan Lu
Ning Jiang
Lei Xie
92
4
0
22 Oct 2023
Learning with Unmasked Tokens Drives Stronger Vision Learners
Taekyung Kim
Sanghyuk Chun
Byeongho Heo
Dongyoon Han
SSL
100
2
0
20 Oct 2023
A Car Model Identification System for Streamlining the Automobile Sales Process
Said Togru
Marco Moldovan
79
0
0
19 Oct 2023
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model
H. Soltau
Izhak Shafran
Alex Ottenwess
Joseph R. Duffy
Rene L. Utianski
L. Barnard
John L. Stricker
D. Wiepert
David T. Jones
Hugo Botha
83
3
0
16 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
147
3
0
12 Oct 2023
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning
Jiaqi Li
Guilin Qi
Chuanyi Zhang
Yongrui Chen
Yiming Tan
Chenlong Xia
Ye Tian
81
3
0
12 Oct 2023
Enhancing Representations through Heterogeneous Self-Supervised Learning
Zhongyu Li
Bo-Wen Yin
Yongxiang Liu
Li Liu
Ming-Ming Cheng
SSL
62
2
0
08 Oct 2023
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Songtao Luo
Shuang Yang
Shiguang Shan
Xilin Chen
89
2
0
08 Oct 2023
OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks
Ofir Bar Tal
Adi Haviv
Amit H. Bermano
AAML
79
0
0
05 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
115
27
0
04 Oct 2023
Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods
E. Zappala
Daniel Levine
Shiyang Zhang
S. Rizvi
Sacha Lévy
David van Dijk
67
1
0
02 Oct 2023
Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition
Dongyuan Li
Yusong Wang
Kotaro Funakoshi
Manabu Okumura
103
4
0
30 Sep 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
88
3
0
29 Sep 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Jiliang Tang
Marco Cristani
AI4TS
GNN
142
5
0
27 Sep 2023
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
83
10
0
26 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
76
1
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
59
3
0
26 Sep 2023
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Guan-lin Yang
Ziyang Ma
Zhisheng Zheng
Ya-Zhen Song
Zhikang Niu
Xie Chen
75
8
0
25 Sep 2023
M$^{3}$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders
Qibo Qiu
Honghui Yang
Wenxiao Wang
Shun Zhang
Haiming Gao
Haochao Ying
Wei Hua
Xiaofei He
3DPC
83
0
0
23 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
86
17
0
19 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Sizhou Chen
Songyang Gao
Sen Fang
26
0
0
14 Sep 2023
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
Heng-Jui Chang
Ning Dong
Ruslan Mavlyutov
Sravya Popuri
Yu-An Chung
87
7
0
14 Sep 2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
Anna Deichler
Shivam Mehta
Simon Alexanderson
Jonas Beskow
DiffM
80
24
0
11 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
79
5
0
10 Sep 2023
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tong Wang
Zhaoxiang Zhang
80
21
0
07 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
81
5
0
05 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Zhichao Huang
Chutong Meng
Tom Ko
92
28
0
31 Aug 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
62
3
0
28 Aug 2023
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning
Amirhossein Vahidi
Lisa Wimmer
H. Gündüz
Bernd Bischl
Eyke Hüllermeier
Mina Rezaei
OOD
UQCV
91
4
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
81
3
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
50
10
0
28 Aug 2023
Unleash Model Potential: Bootstrapped Meta Self-supervised Learning
Wenwen Qiang
Changwen Zheng
Jingyao Wang
Changwen Zheng
SSL
61
1
0
28 Aug 2023
Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning
Miguel Sarabia
Elena Menyaylenko
Alessandro Toso
Skyler Seto
Zakaria Aldeneh
Shadi Pirhosseinloo
Luca Zappella
B. Theobald
N. Apostoloff
Jonathan Sheaffer
73
7
0
18 Aug 2023