ResearchTrend.AI

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
CORN: Contact-based Object Representation for Nonprehensile Manipulation of General Unseen Objects
Yoonyoung Cho
Junhyek Han
Yoontae Cho
Beomjoon Kim
115
8
0
16 Mar 2024
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Jiayu Du
Jinpeng Li
Guoguo Chen
Wei-Qiang Zhang
ELM
78
3
0
13 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
108
12
0
08 Mar 2024
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Tahir Javed
J. Nawale
E. George
Sakshi Joshi
Kaushal Bhogale
...
M. ManickamK
C. V. Vaijayanthi
Krishnan Srinivasa Raghavan Karunganni
Pratyush Kumar
Mitesh M Khapra
94
22
0
04 Mar 2024
BootTOD: Bootstrap Task-oriented Dialogue Representations by Aligning Diverse Responses
Weihao Zeng
Keqing He
Yejie Wang
Dayuan Fu
Weiran Xu
72
0
0
02 Mar 2024
Learning and Leveraging World Models in Visual Representation Learning
Q. Garrido
Mahmoud Assran
Nicolas Ballas
Adrien Bardes
Laurent Najman
Yann LeCun
SSL
105
30
0
01 Mar 2024
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder
Jiaqi Wang
Zhenxi Song
Zhengyu Ma
Xipeng Qiu
Min Zhang
Zhiguo Zhang
156
8
0
27 Feb 2024
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie
Yoonho Lee
Annie S. Chen
Chelsea Finn
79
3
0
22 Feb 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
96
3
0
21 Feb 2024
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Haibin Wu
Huang-Cheng Chou
Kai-Wei Chang
Lucas Goncalves
Jiawei Du
Jyh-Shing Roger Jang
Chi-Chun Lee
Hung-Yi Lee
91
15
0
20 Feb 2024
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation
Wen Wu
Yue Liu
Chuxu Zhang
Chung-Cheng Chiu
Qiujia Li
Junwen Bai
Tara N. Sainath
P. Woodland
70
3
0
20 Feb 2024
A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective
Jeng-Lin Li
Chih-Fan Hsu
Ming-Ching Chang
Wei-Chao Chen
OOD
110
2
0
20 Feb 2024
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Takanori Ashihara
Shoko Araki
J. Černocký
105
4
0
17 Feb 2024
EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs
Navid Mohammadi Foumani
G. Mackellar
Soheila Ghane
Saad Irtza
Nam Nguyen
Mahsa Salehi
99
17
0
17 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE, VLM
157
87
0
15 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
73
1
0
14 Feb 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Hang Zhao
Yifei Xin
Zhesong Yu
Bilei Zhu
Lu Lu
Zejun Ma
AuLLM
93
4
0
12 Feb 2024
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Hsuan-Fu Wang
Yi-Jen Shih
Heng-Jui Chang
Layne Berry
Puyuan Peng
Hung-yi Lee
Hsin-Min Wang
David Harwath
VLM
78
2
0
10 Feb 2024
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Liang-Hsuan Tseng
En-Pei Hu
Cheng-Han Chiang
Yuan Tseng
Hung-yi Lee
Lin-shan Lee
Shao-Hua Sun
107
1
0
06 Feb 2024
The last Dance : Robust backdoor attack via diffusion models and bayesian approach
Orson Mengara
DiffM
97
4
0
05 Feb 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Haoyi Zhu
Yating Wang
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
SSL, 3DPC
153
25
0
04 Feb 2024
TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation
Taeyang Yun
Hyunkuk Lim
Jeong-Hoon Lee
Min Song
75
12
0
16 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
149
2
0
16 Jan 2024
MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition
Fan Zhang
Xiaobao Guo
Xiaojiang Peng
Alex C. Kot
60
1
0
14 Jan 2024
An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant
Mohit Tomar
Abhisek Tiwari
Tulika Saha
Prince Jha
Sriparna Saha
35
1
0
10 Jan 2024
HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling for Long-Term Forecasting
Shubao Zhao
Ming Jin
Zhaoxiang Hou
Che-Sheng Yang
Zengxiang Li
Qingsong Wen
Yi Wang
81
2
0
10 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
107
22
0
07 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Guoying Zhao
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
124
13
0
07 Jan 2024
CrisisViT: A Robust Vision Transformer for Crisis Image Classification
Zijun Long
R. McCreadie
Muhammad Imran
151
10
0
05 Jan 2024
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
83
9
0
05 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Xiaocong Zhou
Delong Chen
VLM, OffRL
82
30
0
03 Jan 2024
Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence
Ruizhuo Xu
Linzhi Huang
Mei Wang
Jiani Hu
Weihong Deng
ViT, MedIm
95
3
0
01 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
112
15
0
31 Dec 2023
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim
Byeongho Heo
Dongyoon Han
192
3
0
30 Dec 2023
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian
Lijie Fan
Kaifeng Chen
Dina Katabi
Dilip Krishnan
Phillip Isola
108
51
0
28 Dec 2023
Learning to Embed Time Series Patches Independently
Seunghan Lee
Taeyoung Park
Kibok Lee
SSL, AI4TS
95
31
0
27 Dec 2023
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE, SLR, SSL
87
114
0
23 Dec 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tiancai Wang
Xiangyu Zhang
Zhaoxiang Zhang
81
5
0
21 Dec 2023
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu
Ran Xu
Senqiao Yang
Renrui Zhang
Qizhe Zhang
Zehui Chen
Yandong Guo
Shanghang Zhang
TTA
77
12
0
19 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
62
1
0
18 Dec 2023
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha
Huizhen Ji
Jinmin Li
Rongsheng Li
Tao Dai
Bin Chen
Zhi Wang
Shu-Tao Xia
3DPC
109
32
0
17 Dec 2023
Audio-visual fine-tuning of audio-only ASR models
Avner May
Dmitriy Serdyuk
Ankit Parag Shah
Otavio Braga
Olivier Siohan
72
3
0
14 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
99
74
0
11 Dec 2023
Large-scale Training of Foundation Models for Wearable Biosignals
Salar Abbaspourazad
Oussama Elachqar
Andrew C. Miller
S. Emrani
Udhyakumar Nallasamy
Ian Shapiro
81
37
0
08 Dec 2023
Emergence and Function of Abstract Representations in Self-Supervised Transformers
Quentin RV. Ferry
Joshua Ching
Takashi Kawai
78
3
0
08 Dec 2023
LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures
Vimal Thilak
Chen Huang
Omid Saremi
Laurent Dinh
Hanlin Goh
Preetum Nakkiran
Josh Susskind
Etai Littwin
109
10
0
07 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
132
4
0
05 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Alan Yuille
Cihang Xie
VLM
116
8
0
04 Dec 2023
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
71
1
0
03 Dec 2023
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Franciskus Xaverius Erick
Mina Rezaei
Johanna P. Müller
Bernhard Kainz
48
0
0
30 Nov 2023