ResearchTrend.AI

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
arXiv:2202.03555 (v3, latest)

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
    SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
Chenxin Xu
R. Tan
Yuhong Tan
Siheng Chen
Xinchao Wang
Yanfeng Wang
3DH
121
22
0
17 Aug 2023
SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations
Zhiming Wang
Lin Gu
Feng Lu
96
0
0
17 Aug 2023
Computer vision-enriched discrete choice models, with an application to residential location choice
Sander van Cranenburgh
Francisco Garrido-Valenzuela
57
2
0
16 Aug 2023
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks
David Junhao Zhang
Mutian Xu
Chuhui Xue
Wenqing Zhang
Xiaoguang Han
Song Bai
Mike Zheng Shou
DiffM
131
6
0
13 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yi Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
133
246
0
10 Aug 2023
Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation
Zirui Ge
Xinzhou Xu
Haiyan Guo
Tingting Wang
Zhen Yang
SSL
74
2
0
09 Aug 2023
Elucidate Gender Fairness in Singing Voice Transcription
Xiangming Gu
Weizhen Zeng
Ye Wang
80
3
0
05 Aug 2023
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis
Ramanan Sivaguru
Vasista Sai Lodagala
S. Umesh
56
2
0
02 Aug 2023
Multimodal Multi-loss Fusion Network for Sentiment Analysis
Zehui Wu
Ziwei Gong
Jaywon Koo
Julia Hirschberg
128
27
0
01 Aug 2023
How to Scale Your EMA
Dan Busbridge
Jason Ramapuram
Pierre Ablin
Tatiana Likhomanenko
Eeshan Gunesh Dhekane
Xavier Suau
Russ Webb
82
19
0
25 Jul 2023
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Spyros Gidaris
Andrei Bursuc
Oriane Siméoni
Antonín Vobecký
N. Komodakis
Matthieu Cord
Patrick Pérez
SSL, ViT
63
3
0
18 Jul 2023
Learn from Incomplete Tactile Data: Tactile Representation Learning with Masked Autoencoders
G. Cao
Jiaqi Jiang
Danushka Bollegala
Shan Luo
86
14
0
14 Jul 2023
DSV: An Alignment Validation Loss for Self-supervised Outlier Model Selection
Jaemin Yoo
Yue Zhao
Lingxiao Zhao
Leman Akoglu
44
5
0
13 Jul 2023
Self-supervised adversarial masking for 3D point cloud representation learning
Michal Szachniewicz
Wojciech Kozlowski
Michal Stypulkowski
Maciej Zięba
3DPC
51
2
0
11 Jul 2023
On the Effectiveness of Speech Self-supervised Learning for Music
Yi Ma
Ruibin Yuan
Yizhi Li
Ge Zhang
Xingran Chen
...
Ruibo Liu
Gus Xia
Roger Dannenberg
Yi-Ting Guo
Jie Fu
65
10
0
11 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
73
6
0
11 Jul 2023
Multimodal Temporal Fusion Transformers Are Good Product Demand Forecasters
M. Sukel
Stevan Rudinac
Marcel Worring
AI4TS
50
1
0
05 Jul 2023
What Do Self-Supervised Speech Models Know About Words?
Ankita Pasad
C. Chien
Shane Settle
Karen Livescu
SSL
158
36
0
30 Jun 2023
Addressing Cold Start Problem for End-to-end Automatic Speech Scoring
Jungbae Park
Seungtaek Choi
54
5
0
25 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
88
6
0
25 Jun 2023
Task-Robust Pre-Training for Worst-Case Downstream Adaptation
Jianghui Wang
Cheng Yang
Xingyu Xie
Cong Fang
Zhouchen Lin
OOD
68
0
0
21 Jun 2023
Federated Self-Learning with Weak Supervision for Speech Recognition
Milind Rao
Gopinath Chennupati
Gautam Tiwari
Anit Kumar Sahu
A. Raju
Ariya Rastrow
J. Droppo
85
3
0
21 Jun 2023
Recent Advances in Direct Speech-to-text Translation
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
116
23
0
20 Jun 2023
Decentralized Quantum Federated Learning for Metaverse: Analysis, Design and Implementation
Devya Gurung
Shiva Raj Pokhrel
Gang Li
148
6
0
20 Jun 2023
FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue
Weihao Zeng
Keqing He
Yejie Wang
Chen Zeng
Jingang Wang
Yunsen Xian
Weiran Xu
55
1
0
17 Jun 2023
Evaluation of Speech Representations for MOS prediction
F. S. Oliveira
Edresson Casanova
Arnaldo Cândido Júnior
L. Gris
A. S. Soares
A. R. G. Filho
56
4
0
16 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLM, AuLLM
86
173
0
15 Jun 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Ziyang Ma
Zhisheng Zheng
Guanrou Yang
Yu Wang
Chuxu Zhang
Xie Chen
SSL
72
9
0
15 Jun 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
J. Yoon
Sunghwan Ahn
Hyeon Seung Lee
Minchan Kim
Seokhwan Kim
N. Kim
VLM
90
2
0
14 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL, ELM
79
11
0
14 Jun 2023
Efficient Adapters for Giant Speech Models
Nanxin Chen
Izhak Shafran
Yu Zhang
Chung-Cheng Chiu
H. Soltau
James Qin
Yonghui Wu
91
10
0
13 Jun 2023
GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition
Yu Pan
Yanni Hu
Yuguang Yang
Wen Fei
Jixun Yao
Heng Lu
Lei Ma
Jianjun Zhao
VLM
124
12
0
13 Jun 2023
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
Haiyang Sun
Fulin Zhang
Yingying Gao
Zheng Lian
Shilei Zhang
Junlan Feng
47
4
0
12 Jun 2023
Understanding Masked Autoencoders via Hierarchical Latent Variable Models
Lingjing Kong
Martin Q. Ma
Guan-Hong Chen
Eric Xing
Yuejie Chi
Louis-Philippe Morency
Kun Zhang
87
32
0
08 Jun 2023
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models
G. Stein
Jesse C. Cresswell
Rasa Hosseinzadeh
Yi Sui
Brendan Leigh Ross
Valentin Villecroze
Zhaoyan Liu
Anthony L. Caterini
J. E. T. Taylor
Gabriel Loaiza-Ganem
EGVM
155
108
0
07 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT, CLIP
103
28
0
07 Jun 2023
Quantifying the Variability Collapse of Neural Networks
Jing-Xue Xu
Haoxiong Liu
94
6
0
06 Jun 2023
rPPG-MAE: Self-supervised Pre-training with Masked Autoencoders for Remote Physiological Measurement
Xin Liu
Yuting Zhang
Zitong Yu
Hao Lu
Huanjing Yue
Jingyu Yang
82
31
0
04 Jun 2023
Task-Agnostic Structured Pruning of Speech Representation Models
Haoyu Wang
Siyuan Wang
Weiqiang Zhang
Hongbin Suo
Yulong Wan
VLM
69
19
0
02 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
128
188
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
92
9
0
01 Jun 2023
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
99
27
0
01 Jun 2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Yizhi Li
Ruibin Yuan
Ge Zhang
Yi Ma
Xingran Chen
...
Yemin Shi
Wen-Fen Huang
Zili Wang
Yi-Ting Guo
Jie Fu
119
130
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
88
27
0
30 May 2023
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Xuankai Chang
Brian Yan
Yuya Fujita
Takashi Maekaku
Shinji Watanabe
79
40
0
29 May 2023
Semantic Role Labeling Guided Out-of-distribution Detection
Jinan Zou
Maihao Guo
Yu Tian
Yuhao Lin
Hai Cao
Lingqiao Liu
Ehsan Abbasnejad
Javen Qinfeng Shi
OODD
132
1
0
29 May 2023
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
73
43
0
28 May 2023
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training
Linhao Dong
Zhecheng An
Peihao Wu
Jun Zhang
Lu Lu
Zejun Ma
49
6
0
27 May 2023
On convex decision regions in deep network representations
Lenka Tětková
Thea Brusch
Teresa Scheidt
Fabian Martin Mager
R. Aagaard
Jonathan Foldager
T. S. Alstrøm
Lars Kai Hansen
93
2
0
26 May 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Wangyou Zhang
Y. Qian
89
11
0
25 May 2023