ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Zihan Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
174
19
0
27 Nov 2024
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski
Weitong Zhang
Sarah Cechnicka
Hadrien Reynaud
Bernhard Kainz
132
1
0
25 Nov 2024
Everything is a Video: Unifying Modalities through Next-Frame Prediction
G. Hudson
Dean L. Slack
T. Winterbottom
Jamie Sterling
Chenghao Xiao
Junjie Shentu
Noura Al Moubayed
77
2
0
15 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Haoyang Li
78
4
0
05 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
237
1
0
02 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
120
0
0
31 Oct 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
85
23
0
31 Oct 2024
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
Ella Zeldes
Or Tal
Yossi Adi
59
1
0
28 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Shentong Mo
Shengbang Tong
98
1
0
25 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
83
0
0
18 Oct 2024
Self-supervised contrastive learning performs non-linear system identification
Rodrigo González Laiz
Tobias Schmidt
Steffen Schneider
SSL
85
1
0
18 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Ashish Seth
Ramaneswaran Selvakumar
S. Sakshi
Sonal Kumar
Sreyan Ghosh
Dinesh Manocha
85
0
0
17 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
66
1
0
15 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations
Hemant Yadav
R. Shah
Sunayana Sitaram
90
0
0
14 Oct 2024
Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation
Youwei Yu
Junhong Xu
Lantao Liu
63
5
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
267
7
0
14 Oct 2024
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
Sehun Kim
66
2
0
11 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge
Yi Zhu
C. Goel
Surya Koppisetti
Trang Tran
Ankur Kumar
Gaurav Bharaj
AAML
55
0
0
09 Oct 2024
Forte : Finding Outliers with Representation Typicality Estimation
Debargha Ganguly
Warren Morningstar
A. Yu
Vipin Chaudhary
OODD
93
2
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
172
3
0
02 Oct 2024
You Only Speak Once to See
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
63
2
0
27 Sep 2024
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization
Rafael Mendoza
Isabella Cruz
Richard Liu
Aarav Deshmukh
David Williams
Jesscia Peng
Rohan Iyer
85
1
0
25 Sep 2024
PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings
Sutharsan Mahendren
Saimunur Rahman
Piotr Koniusz
Tharindu Fernando
Sridha Sridharan
Clinton Fookes
Peyman Moghadam
3DPC
88
0
0
24 Sep 2024
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
Junyi Peng
Ladislav Mošner
Lin Zhang
Oldrich Plchot
Themos Stafylakis
Lukáš Burget
Jan Černocký
55
0
0
23 Sep 2024
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
Nikola Ljubesic
Peter Rupnik
Danijel Koržinek
64
1
0
23 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
95
8
0
19 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
Huang-Cheng Chou
Haibin Wu
Chi-Chun Lee
93
2
0
16 Sep 2024
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
Yi-Jen Shih
Zoi Gkalitsiou
A. Dimakis
David Harwath
111
3
0
16 Sep 2024
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
Minglun Han
Ye Bai
Chen Shen
Youjia Huang
Mingkun Huang
Zehua Lin
Linhao Dong
Lu Lu
Yuxuan Wang
76
1
0
13 Sep 2024
Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks
Teresa Dorszewski
Lenka Tětková
Lorenz Linhardt
Lars Kai Hansen
HAI
77
0
0
10 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
Mustansar Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
280
5
0
30 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
113
4
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
149
45
0
29 Aug 2024
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis
Yijie Jin
69
0
0
27 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL, AI4TS
84
1
0
23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
79
3
0
20 Aug 2024
mRNA2vec: mRNA Embedding with Language Model in the 5'UTR-CDS for mRNA Design
Honggen Zhang
Xiangrui Gao
Igor Molybog
Lipeng Lai
50
1
0
16 Aug 2024
SpectralEarth: Training Hyperspectral Foundation Models at Scale
Nassim Ait Ali Braham
C. Albrecht
Julien Mairal
J. Chanussot
Yi Wang
X. Zhu
82
15
0
15 Aug 2024
Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation
Alain Riou
Stefan Lattner
Gaëtan Hadjeres
Michael Anslow
Geoffroy Peeters
71
2
0
05 Aug 2024
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Shanbo Cheng
Zhichao Huang
Tom Ko
Hang Li
Ningxin Peng
Lu Xu
Qini Zhang
90
6
0
31 Jul 2024
Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances
Mieko Ochi
Ziwei Gong
D. Komura
Pengyuan Shi
Kaan Donbekci
Julia Hirschberg
110
16
0
31 Jul 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
Yi Zhu
Surya Koppisetti
Trang Tran
Gaurav Bharaj
118
10
0
26 Jul 2024
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei
Abhinav Gupta
Pedro Morgado
SSL
77
8
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
117
6
0
21 Jul 2024
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
122
1
0
18 Jul 2024
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
Carlos Hinojosa
Shuming Liu
Guohao Li
72
2
0
17 Jul 2024
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
Markus Marks
Manuel Knott
Neehar Kondapaneni
Elijah Cole
T. Defraeye
Fernando Pérez-Cruz
Pietro Perona
SSL
132
5
0
16 Jul 2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Ioannis Maniadis Metaxas
Georgios Tzimiropoulos
Ioannis Patras
SSL
109
0
0
15 Jul 2024
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Yuheng Li
Tianyu Luan
Yizhou Wu
Shaoyan Pan
Yenho Chen
Xiaofeng Yang
83
6
0
09 Jul 2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Salima Mdhaffar
Haroun Elleuch
Fethi Bougares
Yannick Esteve
124
1
0
05 Jul 2024