Robust Speech Recognition via Large-Scale Weak Supervision

6 December 2022

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

Title
Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses Suhita Ghosh Yamini Sinha Ingo Siegert Sebastian Stober 11 1 0 15 Sep 2023
Can Whisper perform speech-based in-context learning? Siyin Wang Chao-Han Huck Yang Ji Wu Chao Zhang 29 24 0 13 Sep 2023
LanSER: Language-Model Supported Speech Emotion Recognition Taesik Gong Joshua Belanich Krishna Somandepalli Arsha Nagrani B. Eoff Brendan Jou 33 10 0 07 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching Shivam Mehta Ruibo Tu Jonas Beskow Éva Székely G. Henter 24 69 0 06 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic Pretraining Chunyu Qiang Hao Li Yixin Tian Ruibo Fu Tao Wang Longbiao Wang J. Dang 29 5 0 01 Sep 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition Samuel Horváth Stefanos Laskaridis Shashank Rajput Hongyi Wang BDL 32 4 0 28 Aug 2023
Mobile Foundation Model as Firmware Jinliang Yuan Chenchen Yang Dongqi Cai Shihe Wang Xin Yuan ... Di Zhang Hanzi Mei Xianqing Jia Shangguang Wang Mengwei Xu 40 19 0 28 Aug 2023
Spoken Language Intelligence of Large Language Models for Language Learning Linkai Peng Baorian Nuchged Yingming Gao ELM 62 4 0 28 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification Harunori Kawano Sota Shimizu 30 1 0 22 Aug 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer Sang-Hoon Lee Haram Choi H. Oh Seong-Whan Lee BDL 28 9 0 30 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding Chunyu Qiang Hao Li Hao Ni He Qu Ruibo Fu Tao Wang Longbiao Wang J. Dang DiffM 30 8 0 28 Jul 2023
Cascaded Cross-Modal Transformer for Request and Complaint Detection Nicolae-Cătălin Ristea Radu Tudor Ionescu 36 3 0 27 Jul 2023
Turning Whisper into Real-Time Transcription System Dominik Macháček Raj Dabre Ondrej Bojar 19 22 0 27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures Kun Yuan V. Srivastav Tong Yu Joël L. Lavanchy Pietro Mascagni Nicolas Padoy 35 20 0 27 Jul 2023
Adaptation of Whisper models to child speech recognition Rishabh Jain Andrei Barcovschi Mariam Yiwere Peter Corcoran H. Cucu 16 30 0 24 Jul 2023
MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems Thilo von Neumann Christoph Boeddeker Marc Delcroix Reinhold Haeb-Umbach 29 16 0 21 Jul 2023
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition Theresa Pekarek-Rosin S. Wermter VLM CLL 29 2 0 14 Jul 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding Titouan Parcollet Rogier van Dalen Shucong Zhang S. Bhattacharya 26 6 0 12 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis Siyang Wang G. Henter Joakim Gustafson Éva Székely 47 5 0 11 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey S. Mohamadi G. Mujtaba Ngan Le Gianfranco Doretto Don Adjeroh LM&MA AI4MH 26 21 0 09 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text Kate Sanders David Etter Reno Kriz Benjamin Van Durme VGen 42 7 0 06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework Eliya Segev Maya Alroy Ronen Katsir Noam Wies Ayana Shenhav ... D. Zar Oren Tadmor Jacob Bitterman Amnon Shashua Tal Rosenwein 32 2 0 04 Jul 2023
Boosting Norwegian Automatic Speech Recognition Javier de la Rosa Rolv-Arild Braaten P. Kummervold Freddy Wetjen Svein Arne Brygfjeld 38 7 0 04 Jul 2023
Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos Ashwin Rao 16 6 0 04 Jul 2023
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning K. Yuksel Thiago Castro Ferreira Golara Javadi Mohamed El-Badrashiny Ahmet Gunduz 26 4 0 21 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology Wisdom O. Ikezogwo M. S. Seyfioglu Fatemeh Ghezloo Dylan Stefan Chan Geva Fatwir Sheikh Mohammed Pavan Kumar Anand Ranjay Krishna Linda G. Shapiro CLIP VLM 139 114 0 20 Jun 2023
Towards End-to-end Speech-to-text Summarization Raul Monteiro Diogo Pernes 9 1 0 06 Jun 2023
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses Lucía Gómez Zaragozá Simone Wills Cristian Tejedor-García Javier Marín-Morales Mariano Alcañiz H. Strik 24 8 0 06 Jun 2023
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models Liam Dugan Anshul Wadhawan Kyle Spence Chris Callison-Burch Morgan McGuire Victor Zordan OffRL 30 2 0 01 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset Lucas Maison Yannick Esteve 26 3 0 01 Jun 2023
Voice Conversion With Just Nearest Neighbors Matthew Baas Benjamin van Niekerk Herman Kamper SSL 32 48 0 30 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition Haomiao Yang Jinming Zhao Gholamreza Haffari Ehsan Shareghi 21 6 0 28 May 2023
External Language Model Integration for Factorized Neural Transducers Michael Levit S. Parthasarathy Cem Aksoylar Mohammad Sadegh Rasooli Shuangyu Chang 29 2 0 26 May 2023
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution Matías P. Pizarro D. Kolossa Asja Fischer AAML 35 1 0 26 May 2023
NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery Farhad Moghimifar Shilin Qu Tongtong Wu Yuan-Fang Li Gholamreza Haffari 34 4 0 26 May 2023
Context-aware attention layers coupled with optimal transport domain adaptation and multimodal fusion methods for recognizing dementia from spontaneous speech Loukas Ilias D. Askounis 31 9 0 25 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation Chenyang Le Yao Qian Long Zhou Shujie Liu Yanmin Qian Michael Zeng Xuedong Huang 24 13 0 24 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person L. Gris R. Marcacini Arnaldo Cândido Júnior Edresson Casanova A. S. Soares S. Aluísio 21 7 0 23 May 2023
Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test Eungbeom Kim Yunkee Chae Jaeheon Sim Kyogu Lee 17 1 0 22 May 2023
Textually Pretrained Speech Language Models Michael Hassid Tal Remez Tu Nguyen Itai Gat Alexis Conneau ... Alexandre Défossez Gabriel Synnaeve Emmanuel Dupoux Roy Schwartz Yossi Adi VLM SyDa 31 53 0 22 May 2023
CopyNE: Better Contextual ASR by Copying Named Entities Shilin Zhou Zhenghua Li Yu Hong Mengdi Zhang Zhefeng Wang Baoxing Huai 15 5 0 22 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages Andrew Rouditchenko Sameer Khurana Samuel Thomas Rogerio Feris Leonid Karlinsky Hilde Kuehne David Harwath Brian Kingsbury James R. Glass VLM 37 22 0 21 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data Ziyi Yang Mahmoud Khademi Yichong Xu Reid Pryzant Yuwei Fang ... Yu Shi Lu Yuan Takuya Yoshioka Michael Zeng Xuedong Huang 17 2 0 21 May 2023
Scaling laws for language encoding models in fMRI Richard Antonello Aditya R. Vaidya Alexander G. Huth MedIm 30 55 0 19 May 2023
Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach Masahiro Kaneko Graham Neubig Naoaki Okazaki 36 6 0 19 May 2023
MD3: The Multi-Dialect Dataset of Dialogues Jacob Eisenstein Vinodkumar Prabhakaran Clara E. Rivera Dorottya Demszky D. Sharma 35 7 0 19 May 2023
Data Redaction from Conditional Generative Models Zhifeng Kong Kamalika Chaudhuri KELM 16 7 0 18 May 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model Siyuan Huang Zhengkai Jiang Hao Dong Yu Qiao Peng Gao Hongsheng Li LM&Ro 27 93 0 18 May 2023
An Android Robot Head as Embodied Conversational Agent Marcel Heisler C. Becker-Asano LM&Ro LLMAG 29 0 0 18 May 2023
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation Mutian He Philip N. Garner 44 4 0 16 May 2023