What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure

2 January 2021

Papers citing "What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure"

50 / 54 papers shown

Title | Authors | Tags | Counts | Date
Towards Smarter Hiring: Are Zero-Shot and Few-Shot Pre-trained LLMs Ready for HR Spoken Interview Transcript Analysis? | Subhankar Maity Aniket Deroy Sudeshna Sarkar | - | 23 0 0 | 08 Apr 2025
Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching | Haiyue Zu Jun Ge Heting Xiao Jile Xie Zhangzhe Zhou ... Jiayi Ni Junjie Niu Linlin Zhang Li Ni Huilin Yang | MedIm VLM | 54 0 0 | 05 Mar 2025
Enhancing and Exploring Mild Cognitive Impairment Detection with W2V-BERT-2.0 | Yueguan Wang Tatsunari Matsushima Soichiro Matsushima Toshimitsu Sakai | - | 40 0 0 | 28 Jan 2025
Investigating large language models for their competence in extracting grammatically sound sentences from transcribed noisy utterances | Alina Wróblewska | - | 33 0 0 | 07 Oct 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks | Nakamasa Inoue Shinta Otake Takumi Hirose Masanari Ohi Rei Kawakami | - | 45 1 0 | 28 Jul 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection | Yi Zhu Surya Koppisetti Trang Tran Gaurav Bharaj | - | 54 9 0 | 26 Jul 2024
Speech Representation Analysis based on Inter- and Intra-Model Similarities | Yassine El Kheir Ahmed M. Ali Shammur A. Chowdhury | SSL | 43 2 0 | 23 Jun 2024
Impact of Speech Mode in Automatic Pathological Speech Detection | S. A. Sheikh Ina Kodrasi | - | 34 3 0 | 14 Jun 2024
Deep Learning for Assessment of Oral Reading Fluency | Mithilesh Vaidya Binaya Kumar Sahoo Preeti Rao | - | 26 0 0 | 29 May 2024
Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications | Subhankar Maity Aniket Deroy Sudeshna Sarkar | AI4Ed ELM | 34 11 0 | 19 May 2024
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion | A. R. Bargum Stefania Serafin Cumhur Erkut | - | 26 3 0 | 14 Nov 2023
Self-Supervised Models of Speech Infer Universal Articulatory Kinematics | Cheol Jun Cho Abdelrahman Mohamed Alan W. Black Gopala K. Anumanchipalli | SSL | 24 10 0 | 16 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers | Hosein Mohebbi Grzegorz Chrupała Willem H. Zuidema A. Alishahi | - | 36 12 0 | 15 Oct 2023
Do self-supervised speech and language models extract similar representations as human brain? | Peili Chen Linyang He Li Fu Lu Fan Edward F. Chang Yuanning Li | SSL | 24 2 0 | 07 Oct 2023
Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition | Anant Singh Akshat Gupta | - | 31 4 0 | 17 Aug 2023
Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance | Sourya Dipta Das Yash Vadi Abhishek Unnam Kuldeep Yadav | - | 28 1 0 | 09 Aug 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition | Weidong Chen Xiaofen Xing Peihao Chen Xiangmin Xu | VLM | 38 35 0 | 20 Jul 2023
What Do Self-Supervised Speech Models Know About Words? | Ankita Pasad C. Chien Shane Settle Karen Livescu | SSL | 51 26 0 | 30 Jun 2023
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | Yahuan Cong Haoyu Zhang Hao-Ping Lin Shichao Liu Chunfeng Wang Yi Ren Xiang Yin Zejun Ma | - | 33 1 0 | 27 Jun 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation | Rongjie Huang Chunlei Zhang Yongqiang Wang Dongchao Yang Lu Liu Zhenhui Ye Ziyue Jiang Chao Weng Zhou Zhao Dong Yu | DiffM | 39 26 0 | 30 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition | Haomiao Yang Jinming Zhao Gholamreza Haffari Ehsan Shareghi | - | 32 6 0 | 28 May 2023
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing | Weidong Chen Xiaofen Xing Xiangmin Xu Jianxin Pang Lan Du | - | 30 38 0 | 27 Feb 2023
Phone and speaker spatial organization in self-supervised speech representations | Pablo Riera M. Cerdeiro L. Pepino Luciana Ferrer | SSL | 26 1 0 | 24 Feb 2023
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One | Lingwei Meng Jiawen Kang Mingyu Cui Yuejiao Wang Xixin Wu Helen M. Meng | - | 25 17 0 | 20 Feb 2023
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation | Tomer Wullach Shlomo E. Chazan | - | 30 1 0 | 27 Dec 2022
Parameter Efficient Transfer Learning for Various Speech Processing Tasks | Shinta Otake Rei Kawakami Nakamasa Inoue | - | 24 16 0 | 06 Dec 2022
L2 proficiency assessment using self-supervised speech representations | Stefano Bannò Kate Knill M. Matassoni Vyas Raina Mark Gales | SSL | 34 7 0 | 16 Nov 2022
Comparative layer-wise analysis of self-supervised speech models | Ankita Pasad Bowen Shi Karen Livescu | SSL | 37 109 0 | 08 Nov 2022
Probing Statistical Representations For End-To-End ASR | A. Ollerenshaw Md. Asif Jalal Thomas Hain | - | 35 2 0 | 03 Nov 2022
Proficiency assessment of L2 spoken English using wav2vec 2.0 | Stefano Bannò M. Matassoni | - | 20 22 0 | 24 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech | Cheol Jun Cho Peter Wu Abdel-rahman Mohamed Gopala K. Anumanchipalli | - | 29 29 0 | 21 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks | Guan-Ting Lin Chiyu Feng Wei-Ping Huang Yuan Tseng Tzu-Han Lin Chen-An Li Hung-yi Lee Nigel G. Ward | - | 23 48 0 | 13 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition | Kyuhong Shim Wonyong Sung | - | 27 2 0 | 01 Oct 2022
End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge | S. A. Sheikh Md. Sahidullah F. Hirsch Slim Ouni | SSL | 24 9 0 | 20 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models | Takanori Ashihara Takafumi Moriya Kohei Matsuura Tomohiro Tanaka | - | 30 28 0 | 14 Jul 2022
COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection | Andreas Triantafyllopoulos A. Semertzidou Meishu Song Florian B. Pokorny Björn W. Schuller | - | 52 2 0 | 20 Jun 2022
Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning | Eesung Kim J. Jeon Hyeji Seo Ho-Young Kim | SSL | 23 37 0 | 08 Apr 2022
Probing Speech Emotion Recognition Transformers for Linguistic Knowledge | Andreas Triantafyllopoulos Johannes Wagner H. Wierstorf Maximilian Schmitt U. Reichel F. Eyben Felix Burkhardt Björn W. Schuller | - | 29 25 0 | 01 Apr 2022
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition | Ashish Seth L. D. Prasad Sreyan Ghosh S. Umesh | - | 33 3 0 | 31 Mar 2022
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances | Sreyan Ghosh Sonal Kumar Yaman Kumar Singla R. Shah S. Umesh | - | 33 6 0 | 30 Mar 2022
The MSXF TTS System for ICASSP 2022 ADD Challenge | Chunyong Yang Pengfei Liu Yanli Chen Hongbin Wang Min Liu | - | 13 0 0 | 27 Jan 2022
Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency | P. Bamdev Manraj Singh Grover Yaman Kumar Singla Payman Vafaee Mika Hama R. Shah | - | 31 12 0 | 30 Nov 2021
Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees | Yaman Kumar Singla Sriram Krishna R. Shah Changyou Chen | - | 18 6 0 | 17 Nov 2021
Investigating self-supervised front ends for speech spoofing countermeasures | Xin Wang Junichi Yamagishi | AAML | 24 123 0 | 15 Nov 2021
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations | Hyeong-Seok Choi Juheon Lee W. Kim Jie Hwan Lee Hoon Heo Kyogu Lee | - | 42 151 0 | 27 Oct 2021
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances | Sreyan Ghosh Samden Lepcha S. Sakshi R. Shah S. Umesh | - | 31 14 0 | 14 Oct 2021
AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses | Yaman Kumar Singla Swapnil Parekh Somesh Singh Junjie Li R. Shah Changyou Chen | AAML | 41 14 0 | 24 Sep 2021
Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring | Yaman Kumar Singla Avykat Gupta Shaurya Bagga Changyou Chen Balaji Krishnamurthy R. Shah | - | 32 12 0 | 30 Aug 2021
Layer-wise Analysis of a Self-supervised Speech Representation Model | Ankita Pasad Ju-Chieh Chou Karen Livescu | SSL | 26 291 0 | 10 Jul 2021
What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis | Shammur A. Chowdhury Nadir Durrani Ahmed M. Ali | - | 49 12 0 | 01 Jul 2021