Comparative layer-wise analysis of self-supervised speech models

8 November 2022

Papers citing "Comparative layer-wise analysis of self-supervised speech models"

50 / 87 papers shown

Title
On The Landscape of Spoken Language Models: A Comprehensive Survey Siddhant Arora Kai-Wei Chang Chung-Ming Chien Yifan Peng Haibin Wu Yossi Adi Emmanuel Dupoux Hung-yi Lee Karen Livescu Shinji Watanabe 52 2 0 11 Apr 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech Ji-Hoon Kim Jeongsoo Choi Jaehun Kim Chaeyoung Jung Joon Son Chung CVBM 50 1 0 21 Mar 2025
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation Wupeng Wang Zexu Pan Jingru Lin Shuai Wang Haizhou Li 53 0 0 16 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM Kshitij Ambilduke Ben Peters Sonal Sannigrahi Anil Keshwani Tsz Kin Lam Bruno Martins Marcely Zanon Boito André F. T. Martins 52 0 0 13 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens Xinbing Wang Mingqi Jiang Z. Ma Ziyu Zhang S. Liu ... Zhifei Li Xie Chen Lei Xie Y. Guo Wei Xue 81 12 0 03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation Alexander H. Liu Sang-gil Lee Chao-Han Huck Yang Yuan Gong Yu-Chun Wang James Glass Rafael Valle Bryan Catanzaro SSL 52 0 0 02 Mar 2025
Why disentanglement-based speaker anonymization systems fail at preserving emotions? Ünal Ege Gaznepoglu Nils Peters 83 0 0 22 Jan 2025
How Redundant Is the Transformer Stack in Speech Representation Models? Teresa Dorszewski Albert Kjøller Jacobsen Lenka Tětková Lars Kai Hansen 107 0 0 20 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis Tomohiko Nakamura Kwanghee Choi Keigo Hojo Yoshiaki Bando Satoru Fukayama Shinji Watanabe 43 0 0 11 Jan 2025
Towards Unsupervised Speech Recognition Without Pronunciation Models Junrui Ni Liming Wang Yang Zhang Kaizhi Qian Heting Gao Mark Hasegawa-Johnson Chang D. Yoo SSL OffRL 88 0 0 10 Jan 2025
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions Theo Clark Benedetta Cevoli Eloy de Jong Timofey Abramski Jamie Dougherty SSL 38 0 0 31 Oct 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation K R Prajwal Bowen Shi Matthew Lee Apoorv Vyas Andros Tjandra ... Baishan Guo Huiyu Wang Triantafyllos Afouras David Kant Wei-Ning Hsu 43 5 0 27 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations Hemant Yadav R. Shah Sunayana Sitaram 28 0 0 14 Oct 2024
Music Genre Classification using Large Language Models Mohamed El Amine Meguenani Alceu de Souza Britto Jr. A. L. Koerich 31 0 0 10 Oct 2024
Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis Tuan Nguyen C. Fredouille A. Ghio M. Balaguer Virginie Woisard 16 0 0 10 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge Yi Zhu C. Goel Surya Koppisetti Trang Tran Ankur Kumar Gaurav Bharaj AAML 28 0 0 09 Oct 2024
Mitigation of gender bias in automatic facial non-verbal behaviors generation Alice Delbosc M. Ochs Nicolas Sabouret Brian Ravenet Stéphane Ayache 29 0 0 09 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models Alan Baade Puyuan Peng David Harwath 50 3 0 05 Oct 2024
Adaptive Large Language Models By Layerwise Attention Shortcuts Prateek Verma Mert Pilanci KELM OffRL 52 0 0 17 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models Li-Wei Chen Takuya Higuchi He Bai Ahmed Hussen Abdelaziz Alexander Rudnicky Shinji Watanabe Tatiana Likhomanenko B. Theobald Zakaria Aldeneh 49 0 0 16 Sep 2024
Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks Teresa Dorszewski Lenka Tětková Lorenz Linhardt Lars Kai Hansen HAI 36 0 0 10 Sep 2024
Property Neurons in Self-Supervised Speech Transformers T. Lin Guan-Ting Lin Hung-yi Lee Hao Tang MILM 27 0 0 07 Sep 2024
Probing self-attention in self-supervised speech models for cross-linguistic differences Sai Gopinath Joselyn Rodriguez MILM 56 0 0 04 Sep 2024
Convexity-based Pruning of Speech Representation Models Teresa Dorszewski Lenka Tětková Lars Kai Hansen 25 2 0 16 Aug 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection Yi Zhu Surya Koppisetti Trang Tran Gaurav Bharaj 52 9 0 26 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning Shuai Wang Zheng-Shou Chen Kong Aik Lee Yan-min Qian Haizhou Li 39 4 0 21 Jul 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation J. Duret Yannick Esteve Titouan Parcollet 41 0 0 08 Jul 2024
Improving Self-supervised Pre-training using Accent-Specific Codebooks Darshan Prabhu Abhishek Gupta Omkar Nitsure P. Jyothi Sriram Ganapathy SSL 44 0 0 04 Jul 2024
Cross-Lingual Transfer Learning for Speech Translation Rao Ma Yassir Fathullah Mengjie Qian Siyuan Tang Mark J. F. Gales Kate Knill 20 1 0 01 Jul 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study Peikun Chen Sining Sun Changhao Shan Qing Yang Lei Xie 42 2 0 27 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model Yi Zhu Tiago H. Falk MedIm 41 0 0 26 Jun 2024
AND: Audio Network Dissection for Interpreting Deep Acoustic Models Tung-Yu Wu Yu-Xiang Lin Tsui-Wei Weng 52 1 0 24 Jun 2024
Speech Representation Analysis based on Inter- and Intra-Model Similarities Yassine El Kheir Ahmed M. Ali Shammur A. Chowdhury SSL 43 2 0 23 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics Cheol Jun Cho Peter Wu Tejas S. Prabhune Dhruv Agarwal Gopala K. Anumanchipalli 36 1 0 18 Jun 2024
Interface Design for Self-Supervised Speech Models Yi-Jen Shih David Harwath 54 1 0 18 Jun 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations Mukhtar Mohamed Oli Danyi Liu Hao Tang Sharon Goldwater SSL 44 2 0 13 Jun 2024
Self-Supervised Speech Representations are More Phonetic than Semantic Kwanghee Choi Ankita Pasad Tomohiko Nakamura Satoru Fukayama Karen Livescu Shinji Watanabe 31 14 0 12 Jun 2024
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection Yue Li Xinsheng Wang Li Zhang Lei Xie 42 1 0 12 Jun 2024
Sustainable self-supervised learning for speech representations Luis Lugo Valentin Vielzeuf 31 2 0 11 Jun 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations Hemant Yadav Sunayana Sitaram R. Shah SSL 49 1 0 09 Jun 2024
Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models Malo Maisonneuve C. Fredouille M. Lalain A. Ghio Virginie Woisard 48 0 0 07 Jun 2024
Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting Ihab Asaad Maxime Jacquelin Olivier Perrotin Laurent Girin Thomas Hueber 33 0 0 30 May 2024
Crossmodal ASR Error Correction with Discrete Speech Units Yuanchao Li Pinzhen Chen Peter Bell Catherine Lai 36 6 0 26 May 2024
Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining Valentin Vielzeuf SSL 44 0 0 14 May 2024
A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech Oli Danyi Liu Hao Tang Naomi H Feldman Sharon Goldwater 24 1 0 13 May 2024
A Large-Scale Evaluation of Speech Foundation Models Shu-Wen Yang Heng-Jui Chang Zili Huang Andy T. Liu Cheng-I Jeff Lai ... Kushal Lakhotia Shang-Wen Li Abdelrahman Mohamed Shinji Watanabe Hung-yi Lee 38 19 0 15 Apr 2024
Compact Speech Translation Models via Discrete Speech Units Pretraining Tsz Kin Lam Alexandra Birch Barry Haddow 53 2 0 29 Feb 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing Jeong Hun Yeo Seunghee Han Minsu Kim Y. Ro 53 22 0 23 Feb 2024
Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models Maxime Fily Guillaume Wisniewski Severine Guillaume Gilles Adda Alexis Michaud 22 1 0 08 Feb 2024
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition Alexandra Saliba Yuanchao Li Ramon Sanabria Catherine Lai 38 8 0 04 Feb 2024