data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown

Discrete JEPA: Learning Discrete Token Representations without Reconstruction Junyeob Baek Hosung Lee Christopher Hoang Mengye Ren Sungjin Ahn 22 0 0 17 Jun 2025
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes Tony Alex S. Ahmed A. Mustafa Muhammad Awais Philip J. B. Jackson 26 1 0 13 Jun 2025
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation Yanlong Chen Mattia Orlandi Pierangelo Maria Rapa Simone Benatti Luca Benini Yawei Li 102 0 0 12 Jun 2025
Vision Generalist Model: A Survey Ziyi Wang Yongming Rao Shuofeng Sun Xinrun Liu Yi Wei ... Zuyan Liu Yanbo Wang Hongmin Liu Jie Zhou Jiwen Lu 70 0 0 11 Jun 2025
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation Yihe Tang Wenlong Huang Yingke Wang Chengshu Li Roy Yuan Ruohan Zhang Jiajun Wu Li Fei-Fei 50 0 0 10 Jun 2025
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech Jingyu Li Lingchao Mao Hairong Wang Zhendong Wang Xi Mao Xuelei Sherry Ni 25 0 0 09 Jun 2025
GigaAM: Efficient Self-Supervised Learner for Speech Recognition Aleksandr Kutsakov Alexandr Maximenko Georgii Gospodinov Pavel Bogomolov Fyodor Minkin 36 0 0 01 Jun 2025
AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time Sarthak Kumar Maharana Saksham Singh Kushwaha Baoming Zhang Adrian Rodriguez Songtao Wei Yapeng Tian Yunhui Guo TTA VLM 38 0 0 31 May 2025
Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models Zhaoqing Li Haoning Xu Xurong Xie Zengrui Jin Tianzi Wang Xunying Liu 39 0 0 27 May 2025
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis Tianyi Xu Hongjie Chen Wang Qing Lv Hang Jian Kang Li Jie Zhennan Lin Yongxiang Li Xie Lei 19 0 0 27 May 2025
Automated data curation for self-supervised learning in underwater acoustic analysis Hilde I. Hummel Sandjai Bhulai Burooj Ghani R. V. D. Mei 21 0 0 26 May 2025
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance Junbo Zhang Heinrich Dinkel Yadong Niu Chenyu Liu Si Cheng Anbei Zhao Jian Luan 173 0 0 22 May 2025
SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit Wen-Chin Huang Erica Cooper Tomoki Toda 108 1 0 21 May 2025
Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition Shuo Zhang Jinsong Zhang Zhejun Zhang Lei Li MoE 55 0 0 20 May 2025
Self-supervised perception for tactile skin covered dexterous hands Akash Sharma Carolina Higuera Chaithanya Krishna Bodduluri Ziqiang Liu Taosha Fan ... Byron Boots Michael Kaess Tingfan Wu Francois Robert Hogan Mustafa Mukadam SSL 86 2 0 16 May 2025
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models Junyi Peng Takanori Ashihara Marc Delcroix Tsubasa Ochiai Oldrich Plchot Shoko Araki J. Černocký ELM 118 0 0 10 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models Hafez Ghaemi Eilif Muller Shahab Bakhtiari 166 0 0 06 May 2025
Contextures: Representations from Contexts Runtian Zhai Kai Yang Che-Ping Tsai Burak Varici Zico Kolter Pradeep Ravikumar 447 0 0 02 May 2025
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures Max Hartman Lav Varshney 114 0 0 22 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Yang Yue Yulin Wang Chenxin Tao Pan Liu Shiji Song Gao Huang MedIm 77 0 0 18 Apr 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance Yang Yue Yulin Wang Haojun Jiang Pan Liu S. Song Gao Huang VGen 114 0 0 17 Apr 2025
Balancing long- and short-term dynamics for the modeling of saliency in videos Theodor Wulff Fares Abawi Philipp Allgeuer Stefan Wermter 66 0 0 08 Apr 2025
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval Shabnam Choudhury Yash Salunkhe Sarthak Mehrotra Biplab Banerjee 84 0 0 04 Apr 2025
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation Wupeng Wang Zexu Pan Xianrui Li Shuai Wang Haizhou Li AI4TS 79 0 0 03 Apr 2025
Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation Mingrui Ye Lianping Yang Hegui Zhu Zenghao Zheng Xin Wang Yantao Lo ViT 95 0 0 02 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives Shuyu Li Shulei Ji Zihao Wang Songruoyao Wu Jiaxing Yu Kai Zhang MGen VGen 297 1 0 01 Apr 2025
Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting Alimjan Mattursun Liejun Wang Yinfeng Yu Chunyang Ma 114 0 0 27 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages Yangyang Meng Jinpeng Li Guodong Lin Yu Pu G. Wang Hu Du Zhiming Shao Yukai Huang Ke Li Wei-Qiang Zhang ObjD 161 0 0 26 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond Aritra Bhowmik Fida Mohammad Thoker Carlos Hinojosa Bernard Ghanem Cees G. M. Snoek VGen 108 0 0 20 Mar 2025
Heterogeneous bimodal attention fusion for speech emotion recognition Jiachen Luo Huy Phan Lin Wang Joshua Reiss 133 0 0 09 Mar 2025
The order in speech disorder: a scoping review of state of the art machine learning methods for clinical speech classification Birger Moëll Fredrik Sand Aronsson Per Östberg Jonas Beskow 65 1 0 03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation Alexander H. Liu Sang-gil Lee Chao-Han Huck Yang Yuan Gong Yu-Chun Wang James Glass Rafael Valle Bryan Catanzaro SSL 99 1 0 02 Mar 2025
Twofold Debiasing Enhances Fine-Grained Learning with Coarse Labels Xin-yang Zhao Jian Jin Yang-yang Li Yazhou Yao 81 0 0 27 Feb 2025
Escaping The Big Data Paradigm in Self-Supervised Representation Learning Carlos Vélez García Miguel Cazorla Jorge Pomares 85 0 0 25 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data Seyun Bae Hoyoon Byun Changdae Oh Yoon-Sik Cho Kyungwoo Song GNN 258 3 0 24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations Benedikt Alkin Lukas Miklautz Sepp Hochreiter Johannes Brandstetter VLM 261 8 0 24 Feb 2025
voc2vec: A Foundation Model for Non-Verbal Vocalization Alkis Koudounas Moreno La Quatra Marco Sabato Siniscalchi Elena Baralis 81 2 0 22 Feb 2025
On the Robust Approximation of ASR Metrics Abdul Waheed Hanin Atwany Rita Singh Bhiksha Raj 47 0 0 18 Feb 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning Aurian Quélennec Pierre Chouteau Geoffroy Peeters S. Essid SSL 153 0 0 17 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning Alice Bizeul Thomas M. Sutter Alain Ryser Bernhard Schölkopf Julius von Kügelgen Julia E. Vogt 197 2 0 10 Feb 2025
ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies C. Ciușdel Alex Serban Tiziano Passerini CoGe 120 1 0 03 Feb 2025
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation Reza Akbarian Bafghi Carden Bagwell Avinash Ravichandran Ashish Shrivastava M. Raissi 82 2 0 28 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation Sungnyun Kim Sungwoo Cho Sangmin Bae Kangwook Jang Se-Young Yun SSL 169 1 0 23 Jan 2025
On Creating A Brain-To-Text Decoder Zenon Lamprou Yashar Moshfeghi 82 0 0 10 Jan 2025
PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling Junmyeong Lee Eui Jun Hwang Sukmin Cho Jong C. Park 89 0 0 06 Jan 2025
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey) Subba Reddy Oota Zijiao Chen Manish Gupta R. Bapi G. Jobard F. Alexandre X. Hinaut 3DV AI4CE 145 15 0 31 Dec 2024
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning Shentong Mo 100 0 0 23 Dec 2024
A Concept-Centric Approach to Multi-Modality Learning Yuchong Geng Ao Tang 162 0 0 18 Dec 2024
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation Salar Abbaspourazad Anshuman Mishra Joseph D. Futoma Andrew C. Miller Ian Shapiro 183 0 0 15 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR Pengcheng Guo Xuankai Chang Hang Lv Shinji Watanabe Lei Xie 113 1 0 07 Dec 2024