Title
What is different between these datasets? Varun Babbar Zhicheng Guo Cynthia Rudin 59 1 0 08 Mar 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations Guan-Ting Lin Cheng-Han Chiang Hung-yi Lee 34 22 0 20 Feb 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities Jorge Sánchez Rodrigo Laguna VLM 44 0 0 29 Jan 2024
Speech foundation models on intelligibility prediction for hearing-impaired listeners Santiago Cuervo R. Marxer 30 6 0 24 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition Zheng Lian Guoying Zhao Yong Ren Hao Gu Haiyang Sun Lan Chen Bin Liu Jianhua Tao 21 12 0 07 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation Ziyang Ma Zhisheng Zheng Jiaxin Ye Jinchao Li Zhifu Gao Shiliang Zhang Xie Chen MDE SLR SSL 25 86 0 23 Dec 2023
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings Aurian Quélennec Michel Olvera Geoffroy Peeters S. Essid 27 2 0 21 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning Luis Lugo Valentin Vielzeuf SSL 26 1 0 18 Dec 2023
Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech Geoffrey T. Frost Emily Morris Joshua Jansen van Vüren T. Niesler 26 2 0 15 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen Zhengcong Fei Mingyuan Fan Junshi Huang 25 17 0 27 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces Heng-Jui Chang James R. Glass 33 3 0 15 Nov 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT Cheol Jun Cho Abdelrahman Mohamed Shang-Wen Li Alan W. Black Gopala K. Anumanchipalli 33 8 0 16 Oct 2023
Toward Joint Language Modeling for Speech Units and Text Ju-Chieh Chou Chung-Ming Chien Wei-Ning Hsu Karen Livescu Arun Babu Alexis Conneau Alexei Baevski Michael Auli VLM 26 20 0 12 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text Chanho Park Chengsong Lu Mingjie Chen Thomas Hain 28 3 0 12 Oct 2023
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding Pavel Denisov Ngoc Thang Vu 29 1 0 09 Oct 2023
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond Jiatong Shi William Chen Dan Berrebbi Hsiu-Hsuan Wang Wei-Ping Huang ... Yuxun Tang Shang-Wen Li Abdelrahman Mohamed Hung-yi Lee Shinji Watanabe LRM ELM 36 15 0 09 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words Robin Algayres Pablo Diego-Simon Benoît Sagot Emmanuel Dupoux 36 1 0 08 Oct 2023
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting Hemant Yadav Erica Cooper Junichi Yamagishi Sunayana Sitaram R. Shah 11 0 0 08 Oct 2023
LanSER: Language-Model Supported Speech Emotion Recognition Taesik Gong Joshua Belanich Krishna Somandepalli Arsha Nagrani B. Eoff Brendan Jou 33 10 0 07 Sep 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition Zhisheng Zheng Ziyang Ma Yu Wang Xie Chen 26 2 0 28 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification Harunori Kawano Sota Shimizu 30 1 0 22 Aug 2023
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations Wen Wu C. Zhang P. Woodland 31 3 0 14 Aug 2023
Improving Joint Speech-Text Representations Without Alignment Cal Peyser Zhong Meng Ke Hu Rohit Prabhavalkar Andrew Rosenberg Tara N. Sainath M. Picheny Kyunghyun Cho VLM 31 4 0 11 Aug 2023
Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains Martin Lebourdais Théo Mariotte Marie Tahon Anthony Larcher Antoine Laurent Silvio Montrésor S. Meignier Jean-Hugh Thomas VLM 25 5 0 24 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition Weidong Chen Xiaofen Xing Peihao Chen Xiangmin Xu VLM 28 35 0 20 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis Siyang Wang G. Henter Joakim Gustafson Éva Székely 44 5 0 11 Jul 2023
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation Gene-Ping Yang Yue Gu Qingming Tang Dongsu Du Yuzong Liu 22 5 0 06 Jul 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers Yineng Chen Z. Li Lefei Zhang Bo Du Hai Zhao 33 4 0 02 Jul 2023
When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants Anuj Diwan Eunsol Choi David Harwath 41 0 0 14 Jun 2023
Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression Wen Wu C. Zhang P. Woodland UQCV UD EDL 19 11 0 11 Jun 2023
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System Khazar Khorrami María Andrea Cruz Blandón Tuomas Virtanen Okko Rasanen SSL 27 1 0 05 Jun 2023
On the Robustness of Arabic Speech Dialect Identification Peter Sullivan AbdelRahim Elmadany Muhammad Abdul-Mageed 23 8 0 01 Jun 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models Yu-Hsiang Wang Huan Chen Kai-Wei Chang Winston H. Hsu Hung-yi Lee 24 6 0 30 May 2023
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target Guanyong Wu Guan-Ting Lin Shang-Wen Li Hung-yi Lee 26 5 0 29 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition Haomiao Yang Jinming Zhao Gholamreza Haffari Ehsan Shareghi 19 6 0 28 May 2023
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification Ju-Sung Heo Chan-yeong Lim Ju-ho Kim Hyun-Seo Shin Ha-Jin Yu 26 2 0 27 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? Eklavya Sarkar Mathew Magimai.-Doss 21 11 0 23 May 2023
Self-supervised representations in speech-based depression detection Wen Wu C. Zhang P. Woodland 14 23 0 20 May 2023
Scaling laws for language encoding models in fMRI Richard Antonello Aditya R. Vaidya Alexander G. Huth MedIm 30 55 0 19 May 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning Jiyang Tang William Chen Xuankai Chang Shinji Watanabe B. MacWhinney 24 10 0 19 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation Kangwook Jang Sungnyun Kim Se-Young Yun Hoi-Rim Kim 29 5 0 19 May 2023
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model Puyuan Peng Shang-Wen Li Okko Rasanen Abdel-rahman Mohamed David Harwath SSL VLM 26 7 0 19 May 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark Jiatong Shi Dan Berrebbi William Chen Ho-Lam Chung En-Pei Hu ... Xuankai Chang Shang-Wen Li Abdel-rahman Mohamed Hung-yi Lee Shinji Watanabe ELM 55 58 0 18 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning Alexander H. Liu Heng-Jui Chang Michael Auli Wei-Ning Hsu James R. Glass 24 25 0 17 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models Takanori Ashihara Takafumi Moriya Kohei Matsuura Tomohiro Tanaka 25 3 0 09 May 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition Orchid Chetia Phukan Arun Balaji Buduru Rajesh Sharma 28 6 0 22 Apr 2023
Computational modeling of semantic change Nina Tahmasebi Haim Dubossarsky 34 6 0 13 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning Nikhil Singh Chih-Wei Wu Iroro Orife Mahdi M. Kalayeh 23 2 0 12 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit Brian Yan Jiatong Shi Yun Tang H. Inaguma Yifan Peng ... Zhaoheng Ni Moto Hira Soumi Maiti J. Pino Shinji Watanabe 19 20 0 10 Apr 2023
Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP Nikolaos Antoniou Athanasios Katsamanis Theodoros Giannakopoulos Shrikanth Narayanan 23 17 0 03 Apr 2023