
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

7 February 2022

Papers citing "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language"

50 / 557 papers shown

Title
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation Yueru Jia Jiaming Liu Sixiang Chen Chenyang Gu Zihan Wang ... Lily Lee Pengwei Wang Zhongyuan Wang Renrui Zhang Shanghang Zhang 174 19 0 27 Nov 2024
Image Generation Diversity Issues and How to Tame Them Mischa Dombrowski Weitong Zhang Sarah Cechnicka Hadrien Reynaud Bernhard Kainz 132 1 0 25 Nov 2024
Everything is a Video: Unifying Modalities through Next-Frame Prediction G. Hudson Dean L. Slack T. Winterbottom Jamie Sterling Chenghao Xiao Junjie Shentu Noura Al Moubayed 77 2 0 15 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch Wupeng Wang Zexu Pan Xianrui Li Shuai Wang Haoyang Li 78 4 0 05 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks Weihsiang Liao Yuhta Takida Yukara Ikemiya Zhi-Wei Zhong Chieh-Hsin Lai ... Stefan Uhlich Taketo Akama Woosung Choi Yuichiro Koyama Yuki Mitsufuji 237 1 0 02 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models Heng-Jui Chang Hongyu Gong Changhan Wang James R. Glass Yu-An Chung 120 0 0 31 Oct 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing Carolina Higuera Akash Sharma Chaithanya Krishna Bodduluri Taosha Fan Patrick E. Lancaster ... Michael Kaess Byron Boots Mike Lambeta Tingfan Wu Mustafa Mukadam 85 23 0 31 Oct 2024
Enhancing TTS Stability in Hebrew using Discrete Semantic Units Ella Zeldes Or Tal Yossi Adi 59 1 0 28 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning Shentong Mo Shengbang Tong 98 1 0 25 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup Carlos Carvalho A. Abad 83 0 0 18 Oct 2024
Self-supervised contrastive learning performs non-linear system identification Rodrigo González Laiz Tobias Schmidt Steffen Schneider SSL 85 1 0 18 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning Ashish Seth Ramaneswaran Selvakumar S. Sakshi Sonal Kumar Sreyan Ghosh Dinesh Manocha 85 0 0 17 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing Takanori Ashihara Takafumi Moriya Shota Horiguchi Junyi Peng Tsubasa Ochiai Marc Delcroix Kohei Matsuura Hiroshi Sato 66 1 0 15 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations Hemant Yadav R. Shah Sunayana Sitaram 90 0 0 14 Oct 2024
Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation Youwei Yu Junhong Xu Lantao Liu 63 5 0 14 Oct 2024
Locality Alignment Improves Vision-Language Models Ian Covert Tony Sun James Zou Tatsunori Hashimoto VLM 267 7 0 14 Oct 2024
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture Sehun Kim 66 2 0 11 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge Yi Zhu C. Goel Surya Koppisetti Trang Tran Ankur Kumar Gaurav Bharaj AAML 55 0 0 09 Oct 2024
Forte : Finding Outliers with Representation Typicality Estimation Debargha Ganguly Warren Morningstar A. Yu Vipin Chaudhary OODD 93 2 0 02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture Dengsheng Chen Jie Hu Xiaoming Wei Enhua Wu DiffM 172 3 0 02 Oct 2024
You Only Speak Once to See Wenhao Yang Jianguo Wei Wenhuan Lu Lei Li VOS 63 2 0 27 Sep 2024
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization Rafael Mendoza Isabella Cruz Richard Liu Aarav Deshmukh David Williams Jesscia Peng Rohan Iyer 85 1 0 25 Sep 2024
PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings Sutharsan Mahendren Saimunur Rahman Piotr Koniusz Tharindu Fernando Sridha Sridharan Clinton Fookes Peyman Moghadam 3DPC 88 0 0 24 Sep 2024
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification Junyi Peng Ladislav Mošner Lin Zhang Oldrich Plchot Themos Stafylakis Lukáš Burget Jan Černocký 55 0 0 23 Sep 2024
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings Nikola Ljubesic Peter Rupnik Danijel Koržinek 64 1 0 23 Sep 2024
Is Tokenization Needed for Masked Particle Modelling? Matthew Leigh Samuel Klein François Charton Tobias Golling Lukas Heinrich Michael Kagan Ines Ochoa Margarita Osadchy 95 8 0 19 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance Huang-Cheng Chou Haibin Wu Chi-Chun Lee 93 2 0 16 Sep 2024
Self-supervised Speech Models for Word-Level Stuttered Speech Detection Yi-Jen Shih Zoi Gkalitsiou A. Dimakis David Harwath 111 3 0 16 Sep 2024
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training Minglun Han Ye Bai Chen Shen Youjia Huang Mingkun Huang Zehua Lin Linhao Dong Lu Lu Yuxuan Wang 76 1 0 13 Sep 2024
Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks Teresa Dorszewski Lenka Tětková Lorenz Linhardt Lars Kai Hansen HAI 77 0 0 10 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers Asifullah Khan A. Sohail Mustansar Fiaz Mehdi Hassan Tariq Habib Afridi ... Muhammad Zaigham Zaheer Kamran Ali Tangina Sultana Ziaurrehman Tanoli Naeem Akhter 280 5 0 30 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling Jiachen Lian Xuanru Zhou Z. Ezzes Jet M J Vonk Brittany Morin D. Baquirin Zachary Mille M. G. Tempini Gopala Anumanchipalli AuLLM 113 4 0 29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Shengpeng Ji Ziyue Jiang Xize Cheng Yifu Chen Minghui Fang ... Rongjie Huang Yidi Jiang Qian Chen Zhou Zhao Zhou Zhao VLM 149 45 0 29 Aug 2024
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis Yijie Jin 69 0 0 27 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks He Huang Taejin Park Kunal Dhawan Ivan Medennikov Krishna Puvvada Nithin Rao Koluguri Weiqing Wang Jagadeesh Balam Boris Ginsburg SSL AI4TS 84 1 0 23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge Johan Rohdin Lin Zhang Oldřich Plchot Vojtěch Staněk David Mihola ... Themos Stafylakis Dmitriy Beveraki Anna Silnova Jan Brukner Lukáš Burget 79 3 0 20 Aug 2024
mRNA2vec: mRNA Embedding with Language Model in the 5'UTR-CDS for mRNA Design Honggen Zhang Xiangrui Gao Igor Molybog Lipeng Lai 50 1 0 16 Aug 2024
SpectralEarth: Training Hyperspectral Foundation Models at Scale Nassim Ait Ali Braham C. Albrecht Julien Mairal J. Chanussot Yi Wang X. Zhu 82 15 0 15 Aug 2024
Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation Alain Riou Stefan Lattner Gaëtan Hadjeres Michael Anslow Geoffroy Peeters 71 2 0 05 Aug 2024
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent Shanbo Cheng Zhichao Huang Tom Ko Hang Li Ningxin Peng Lu Xu Qini Zhang 90 6 0 31 Jul 2024
Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances Mieko Ochi Ziwei Gong D. Komura Pengyuan Shi Kaan Donbekci Julia Hirschberg 110 16 0 31 Jul 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection Yi Zhu Surya Koppisetti Trang Tran Gaurav Bharaj 118 10 0 26 Jul 2024
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning Yibing Wei Abhinav Gupta Pedro Morgado SSL 77 8 0 22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning Shuai Wang Zheng-Shou Chen Kong Aik Lee Yan-min Qian Haizhou Li 117 6 0 21 Jul 2024
Linear-Complexity Self-Supervised Learning for Speech Processing Shucong Zhang Titouan Parcollet Rogier van Dalen Sourav Bhattacharya 122 1 0 18 Jul 2024
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders Carlos Hinojosa Shuming Liu Guohao Li 72 2 0 17 Jul 2024
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification Markus Marks Manuel Knott Neehar Kondapaneni Elijah Cole T. Defraeye Fernando Pérez-Cruz Pietro Perona SSL 132 5 0 16 Jul 2024
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing Ioannis Maniadis Metaxas Georgios Tzimiropoulos Ioannis Patras SSL 109 0 0 15 Jul 2024
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking Yuheng Li Tianyu Luan Yizhou Wu Shaoyan Pan Yenho Chen Xiaofeng Yang 83 6 0 09 Jul 2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect Salima Mdhaffar Haroun Elleuch Fethi Bougares Yannick Esteve 124 1 0 05 Jul 2024