Title | Authors | Date
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge | Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux | 27 Oct 2022
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models | Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, A. Black, Shinji Watanabe | 27 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition | Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Weiqiang Zhang | 27 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech | Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran | 27 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion | Jingyi Li, Weiping Tu, Li Xiao | 27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning | Qiu-shi Zhu, Long Zhou, Jie Zhang, Shujie Liu, Yu-Chen Hu, Lirong Dai | 27 Oct 2022
UFO2: A unified pre-training framework for online and offline speech recognition | Li Fu, Siqi Li, Qingtao Li, L. Deng, Fangzhu Li, Lu Fan, Meng Chen, Xiaodong He | 26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision | Masato Hagiwara | 26 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment | Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu-Huan Wu, Zhuo Chen, J. Gupchup, Ross Cutler | 24 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings | Jian Zhu, Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo | 23 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech | Cheol Jun Cho, Peter Wu, Abdel-rahman Mohamed, Gopala K. Anumanchipalli | 21 Oct 2022
Large-scale learning of generalised representations for speaker recognition | Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe | 20 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation | Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono | 19 Oct 2022
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning | Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao | 18 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning | Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, ..., Xuankai Chang, Shinji Watanabe, Abdel-rahman Mohamed, Shang-Wen Li, Hung-yi Lee | 16 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives | Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li | 16 Oct 2022
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations | Themos Stafylakis, Ladislav Mošner, Sofoklis Kakouros, Oldrich Plchot, L. Burget, J. Černocký | 15 Oct 2022
Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition | Jakob Poncelet, Hugo Van hamme | 14 Oct 2022
TransFusion: Transcribing Speech with Multinomial Diffusion | Matthew Baas, Kevin Eloff, Herman Kamper | 14 Oct 2022
Training speech emotion classifier without categorical annotations | Meysam Shamsi, Marie Tahon | 14 Oct 2022
Experiments on Turkish ASR with Self-Supervised Speech Representation Learning | Ali Safaya, E. Erzin | 13 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models | Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang | 13 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks | Guan-Ting Lin, Chiyu Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen An Li, Hung-yi Lee, Nigel G. Ward | 13 Oct 2022
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding | G. Laperriere, Valentin Pelloin, Mickael Rouvier, Themos Stafylakis, Yannick Esteve | 11 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning | Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li | 08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training | Zi-Hua Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei | 07 Oct 2022
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations | Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh | 05 Oct 2022
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining | H. S. Bovbjerg, Zheng-Hua Tan | 04 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model | Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath | 03 Oct 2022
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio | Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro Porto Buarque de Gusmão, Nicholas D. Lane | 30 Sep 2022
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling | Itai Gat, Felix Kreuk, Tu Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi | 30 Sep 2022
AudioGen: Textually Guided Audio Generation | Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi | 30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data | Zi-Hua Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, ..., Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei | 30 Sep 2022
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization | Xiaokang Zhao, Qiu-shi Zhu, Jie Zhang | 28 Sep 2022
MeWEHV: Mel and Wave Embeddings for Human Voice Tasks | Andrés Vasco-Carofilis, Laura Fernández-Robles, Enrique Alegre, Eduardo Fidalgo | 28 Sep 2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022 | Qutang Cai, Guoqiang Hong, Zhijian Ye, Ximin Li, Haizhou Li | 23 Sep 2022
The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022 | Gang Liu, Tianyan Zhou, Yong Zhao, Yu Wu, Zhuo Chen, Yao Qian, Jian Wu | 22 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models | R. Olivier, H. Abdullah, Bhiksha Raj | 17 Sep 2022
Overlapped speech and gender detection with WavLM pre-trained features | Martin Lebourdais, Marie Tahon, Antoine Laurent, S. Meignier | 09 Sep 2022
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization | Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu | 27 Aug 2022
The ReprGesture entry to the GENEA Challenge 2022 | Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao | 25 Aug 2022
3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment | Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen | 19 Aug 2022
C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification | Chunlei Zhang, Dong Yu | 15 Aug 2022
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT | K. Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker, Reinhold Haeb-Umbach | 28 Jul 2022
Dive into Big Model Training | Qinghua Liu, Yuxiang Jiang | 25 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale | Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, A. Raju, ..., Andrew Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure | 19 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models | Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka | 14 Jul 2022
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription | Xianrui Zheng, C. Zhang, P. Woodland | 08 Jul 2022
Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track | Tilak Purohit, Imen Ben Mahmoud, Bogdan Vlasenko, Mathew Magimai.-Doss | 23 Jun 2022
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement | Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi | 22 Jun 2022