data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
Title
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
Lingwei Meng
Jiawen Kang
Mingyu Cui
Haibin Wu
Xixin Wu
Helen M. Meng
72
10
0
25 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
109
79
0
25 May 2023
Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
Petar Ivanov
Ivan Koychev
Momchil Hardalov
Preslav Nakov
52
4
0
24 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
58
2
0
23 May 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
112
4
0
23 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Eklavya Sarkar
Mathew Magimai.-Doss
49
12
0
23 May 2023
Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training
Utku Ozbulak
Hyun Jung Lee
Beril Boga
Esla Timothy Anzaku
Ho-min Park
Arnout Van Messem
W. D. Neve
J. Vankerschaver
DiffM
107
38
0
23 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
100
5
0
19 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
81
3
0
19 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
70
5
0
19 May 2023
Unsupervised ASR via Cross-Lingual Pseudo-Labeling
Tatiana Likhomanenko
Loren Lugosch
R. Collobert
39
0
0
19 May 2023
Unsupervised Domain-agnostic Fake News Detection using Multi-modal Weak Signals
Amila Silva
Ling Luo
S. Karunasekera
C. Leckie
102
5
0
18 May 2023
MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval
Bhanu Prakash Voutharoja
Peng Wang
Lei Wang
Vivienne Guan
69
6
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM, MLLM, ObjD
151
122
0
18 May 2023
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
Heng-Jui Chang
Alexander H. Liu
James R. Glass
SSL
84
21
0
18 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Alexander H. Liu
Heng-Jui Chang
Michael Auli
Wei-Ning Hsu
James R. Glass
90
26
0
17 May 2023
Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors
Einari Vaaras
Manu Airaksinen
S. Vanhatalo
Okko Räsänen
102
4
0
16 May 2023
GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
Xiaoyu Tian
Haoxi Ran
Yue Wang
Hang Zhao
3DPC, ViT
62
42
0
15 May 2023
Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn
Ondrej Bohdal
Yinbing Tian
Yongshuo Zong
Ruchika Chavhan
Da Li
Henry Gouk
Li Guo
Timothy M. Hospedales
103
5
0
12 May 2023
Traffic Forecasting on New Roads Using Spatial Contrastive Pre-Training (SCPT)
Arian Prabowo
Hao Xue
Wei Shao
Piotr Koniusz
Flora D. Salim
AI4TS
94
14
0
09 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
88
3
0
09 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
179
6
0
05 May 2023
ZipIt! Merging Models from Different Tasks without Training
George Stoica
Daniel Bolya
J. Bjorner
Pratik Ramesh
Taylor N. Hearn
Judy Hoffman
VLM, MoMe
141
125
0
04 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
240
132
0
04 May 2023
SLTUNET: A Simple Unified Model for Sign Language Translation
Biao Zhang
Mathias Müller
Rico Sennrich
SLR
94
34
0
02 May 2023
Multimodal Neural Databases
Giovanni Trappolini
Andrea Santilli
Emanuele Rodolà
A. Halevy
Fabrizio Silvestri
101
10
0
02 May 2023
Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders
Heng Pan
Chenyang Liu
Wenxiao Wang
Liejie Yuan
Hongfa Wang
Zhifeng Li
Wen Liu
VLM
64
3
0
25 Apr 2023
Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models
Xiangming Gu
Weizhen Zeng
Jianan Zhang
Longshen Ou
Ye Wang
100
6
0
24 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
60
7
0
22 Apr 2023
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner
Benedikt Alkin
Andreas Fürst
Elisabeth Rumetshofer
Lukas Miklautz
Sepp Hochreiter
111
18
0
20 Apr 2023
Complex Mixer for MedMNIST Classification Decathlon
Zhuoran Zheng
Xiuyi Jia
78
7
0
20 Apr 2023
SelfAct: Personalized Activity Recognition based on Self-Supervised and Active Learning
Luca Arrotta
Gabriele Civitarese
Samuele Valente
Claudio Bettini
98
1
0
19 Apr 2023
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
Yaohua Zha
Jinpeng Wang
Tao Dai
Bin Chen
Zhi Wang
Shutao Xia
VLM
115
48
0
14 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM, CLIP, SSL
538
3,535
0
14 Apr 2023
Hard Patches Mining for Masked Image Modeling
Haochen Wang
Kaiyou Song
Junsong Fan
Yuxi Wang
Jin Xie
Zhaoxiang Zhang
72
64
0
12 Apr 2023
MoMo: A shared encoder Model for text, image and multi-Modal representations
Rakesh Chada
Zhao-Heng Zheng
P. Natarajan
ViT
64
4
0
11 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
Eng Siong Chng
124
16
0
11 Apr 2023
Mask-Based Modeling for Neural Radiance Fields
Ganlin Yang
Guoqiang Wei
Zhizheng Zhang
Yan Lu
Dong Liu
AI4CE
45
1
0
11 Apr 2023
Diffusion Models as Masked Autoencoders
Chen Wei
K. Mangalam
Po-Yao (Bernie) Huang
Yanghao Li
Haoqi Fan
Hu Xu
Huiyu Wang
Cihang Xie
Alan Yuille
Christoph Feichtenhofer
DiffM, SyDa
100
53
0
06 Apr 2023
Self-Supervised Siamese Autoencoders
Friederike Baier
Sebastian Mair
Samuel G. Fadel
SSL
92
4
0
05 Apr 2023
RARE: Robust Masked Graph Autoencoder
Wenxuan Tu
Qing Liao
Sihang Zhou
Xin Peng
Chuan Ma
Yanfeng Guo
Xinwang Liu
Zhiping Cai
117
16
0
04 Apr 2023
Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department
Sabri Boughorbel
Fethi Jarray
Abdulaziz Yousuf Al-Homaid
Rashid Niaz
Khalid Alyafei
113
0
0
03 Apr 2023
Mask Hierarchical Features For Self-Supervised Learning
Fenggang Liu
Yangguang Li
Feng Liang
Jilan Xu
Bin Huang
Jing Shao
30
0
0
01 Apr 2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Arjun Majumdar
Karmesh Yadav
Sergio Arnaud
Yecheng Jason Ma
Claire Chen
...
Dhruv Batra
Yixin Lin
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
LM&Ro
79
185
0
31 Mar 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
109
22
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
125
50
0
31 Mar 2023
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers
Zijun Long
Zaiqiao Meng
Gerardo Aragon Camarasa
R. McCreadie
VLM
79
5
0
31 Mar 2023
Mixed Autoencoder for Self-supervised Visual Representation Learning
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
SSL
123
45
0
30 Mar 2023
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Karim Abou Zeid
Jonas Schult
Alexander Hermans
Bastian Leibe
3DPC
73
30
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
136
169
0
28 Mar 2023