2202.03555
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
7 February 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"
50 / 557 papers shown
Title
Discrete JEPA: Learning Discrete Token Representations without Reconstruction
Junyeob Baek
Hosung Lee
Christopher Hoang
Mengye Ren
Sungjin Ahn
22
0
0
17 Jun 2025
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex
S. Ahmed
A. Mustafa
Muhammad Awais
Philip J. B. Jackson
26
1
0
13 Jun 2025
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Yanlong Chen
Mattia Orlandi
Pierangelo Maria Rapa
Simone Benatti
Luca Benini
Yawei Li
102
0
0
12 Jun 2025
Vision Generalist Model: A Survey
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
70
0
0
11 Jun 2025
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation
Yihe Tang
Wenlong Huang
Yingke Wang
Chengshu Li
Roy Yuan
Ruohan Zhang
Jiajun Wu
Li Fei-Fei
50
0
0
10 Jun 2025
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
Jingyu Li
Lingchao Mao
Hairong Wang
Zhendong Wang
Xi Mao
Xuelei Sherry Ni
25
0
0
09 Jun 2025
GigaAM: Efficient Self-Supervised Learner for Speech Recognition
Aleksandr Kutsakov
Alexandr Maximenko
Georgii Gospodinov
Pavel Bogomolov
Fyodor Minkin
36
0
0
01 Jun 2025
AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
Sarthak Kumar Maharana
Saksham Singh Kushwaha
Baoming Zhang
Adrian Rodriguez
Songtao Wei
Yapeng Tian
Yunhui Guo
TTA
VLM
38
0
0
31 May 2025
Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models
Zhaoqing Li
Haoning Xu
Xurong Xie
Zengrui Jin
Tianzi Wang
Xunying Liu
39
0
0
27 May 2025
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
Tianyi Xu
Hongjie Chen
Wang Qing
Lv Hang
Jian Kang
Li Jie
Zhennan Lin
Yongxiang Li
Xie Lei
19
0
0
27 May 2025
Automated data curation for self-supervised learning in underwater acoustic analysis
Hilde I. Hummel
Sandjai Bhulai
Burooj Ghani
R. V. D. Mei
21
0
0
26 May 2025
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
Junbo Zhang
Heinrich Dinkel
Yadong Niu
Chenyu Liu
Si Cheng
Anbei Zhao
Jian Luan
173
0
0
22 May 2025
SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit
Wen-Chin Huang
Erica Cooper
Tomoki Toda
108
1
0
21 May 2025
Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition
Shuo Zhang
Jinsong Zhang
Zhejun Zhang
Lei Li
MoE
55
0
0
20 May 2025
Self-supervised perception for tactile skin covered dexterous hands
Akash Sharma
Carolina Higuera
Chaithanya Krishna Bodduluri
Ziqiang Liu
Taosha Fan
...
Byron Boots
Michael Kaess
Tingfan Wu
Francois Robert Hogan
Mustafa Mukadam
SSL
86
2
0
16 May 2025
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
Junyi Peng
Takanori Ashihara
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Shoko Araki
J. Černocký
ELM
118
0
0
10 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi
Eilif Muller
Shahab Bakhtiari
166
0
0
06 May 2025
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
447
0
0
02 May 2025
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures
Max Hartman
Lav Varshney
114
0
0
22 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue
Yulin Wang
Chenxin Tao
Pan Liu
Shiji Song
Gao Huang
MedIm
77
0
0
18 Apr 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue
Yulin Wang
Haojun Jiang
Pan Liu
S. Song
Gao Huang
VGen
114
0
0
17 Apr 2025
Balancing long- and short-term dynamics for the modeling of saliency in videos
Theodor Wulff
Fares Abawi
Philipp Allgeuer
Stefan Wermter
66
0
0
08 Apr 2025
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
Shabnam Choudhury
Yash Salunkhe
Sarthak Mehrotra
Biplab Banerjee
84
0
0
04 Apr 2025
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Haizhou Li
AI4TS
79
0
0
03 Apr 2025
Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation
Mingrui Ye
Lianping Yang
Hegui Zhu
Zenghao Zheng
Xin Wang
Yantao Lo
ViT
95
0
0
02 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
297
1
0
01 Apr 2025
Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting
Alimjan Mattursun
Liejun Wang
Yinfeng Yu
Chunyang Ma
114
0
0
27 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
161
0
0
26 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
108
0
0
20 Mar 2025
Heterogeneous bimodal attention fusion for speech emotion recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua Reiss
133
0
0
09 Mar 2025
The order in speech disorder: a scoping review of state of the art machine learning methods for clinical speech classification
Birger Moëll
Fredrik Sand Aronsson
Per Östberg
Jonas Beskow
65
1
0
03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
99
1
0
02 Mar 2025
Twofold Debiasing Enhances Fine-Grained Learning with Coarse Labels
Xin-yang Zhao
Jian Jin
Yang-yang Li
Yazhou Yao
81
0
0
27 Feb 2025
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García
Miguel Cazorla
Jorge Pomares
85
0
0
25 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
258
3
0
24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
261
8
0
24 Feb 2025
voc2vec: A Foundation Model for Non-Verbal Vocalization
Alkis Koudounas
Moreno La Quatra
Marco Sabato Siniscalchi
Elena Baralis
81
2
0
22 Feb 2025
On the Robust Approximation of ASR Metrics
Abdul Waheed
Hanin Atwany
Rita Singh
Bhiksha Raj
47
0
0
18 Feb 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
Aurian Quélennec
Pierre Chouteau
Geoffroy Peeters
S. Essid
SSL
153
0
0
17 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
197
2
0
10 Feb 2025
ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies
C. Ciușdel
Alex Serban
Tiziano Passerini
CoGe
120
1
0
03 Feb 2025
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation
Reza Akbarian Bafghi
Carden Bagwell
Avinash Ravichandran
Ashish Shrivastava
M. Raissi
82
2
0
28 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
169
1
0
23 Jan 2025
On Creating A Brain-To-Text Decoder
Zenon Lamprou
Yashar Moshfeghi
82
0
0
10 Jan 2025
PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling
Junmyeong Lee
Eui Jun Hwang
Sukmin Cho
Jong C. Park
89
0
0
06 Jan 2025
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
Subba Reddy Oota
Zijiao Chen
Manish Gupta
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV
AI4CE
145
15
0
31 Dec 2024
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
Shentong Mo
100
0
0
23 Dec 2024
A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng
Ao Tang
162
0
0
18 Dec 2024
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
Salar Abbaspourazad
Anshuman Mishra
Joseph D. Futoma
Andrew C. Miller
Ian Shapiro
183
0
0
15 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
113
1
0
07 Dec 2024