ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Robust Speech Recognition via Large-Scale Weak Supervision

6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL
arXiv: 2212.04356

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 503 papers shown
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
Joseph Liu
Mahesh Kumar Nandwana
Janne Pylkkönen
Hannes Heikinheimo
Morgan McGuire
37
1
0
14 Jun 2024
Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation
Dena F. Mujtaba
Nihar R. Mahapatra
Megan Arney
J Scott Yaruss
Caryn Herring
Jia Bin
35
1
0
14 Jun 2024
Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content
Rémi Uro
Marie Tahon
D. Doukhan
Antoine Laurent
Albert Rilliard
38
0
0
14 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James V. Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
92
9
0
14 Jun 2024
The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments
Shareef Babu Kalluri
Prachi Singh
Pratik Roy Chowdhuri
Apoorva Kulkarni
Shikha Baghel
...
Swapnil Sontakke
D. K T
S. R. M. Prasanna
Deepu Vijayasenan
Sriram Ganapathy
37
3
0
13 Jun 2024
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Jiatong Shi
Shih-Heng Wang
William Chen
Martijn Bartelds
Vanya Bannihatti Kumar
...
Xuankai Chang
Dan Jurafsky
Karen Livescu
Hung-yi Lee
Shinji Watanabe
AuLLM
77
5
0
12 Jun 2024
DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
Samuele Cornell
Janek Ebbers
Constance Douwes
Irene Martín-Morató
Manu Harju
A. Mesaros
Romain Serizel
37
13
0
12 Jun 2024
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
Yerbolat Khassanov
Zhipeng Chen
Tianfeng Chen
Tze Yuang Chong
Wei Li
Jun Zhang
Lu Lu
Yuxuan Wang
AI4CE
21
0
0
12 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
34
0
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
47
15
0
11 Jun 2024
Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis
David Ortiz-Perez
José García Rodríguez
David Tomás
30
1
0
11 Jun 2024
Multimodal Belief Prediction
John Murzaku
Adil Soubki
Owen Rambow
18
0
0
11 Jun 2024
An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu
Wenyi Yu
Chao Zhang
Philip Woodland
29
3
0
10 Jun 2024
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
Andreas Triantafyllopoulos
A. Batliner
Simon Rampp
M. Milling
Björn Schuller
VLM
28
0
0
10 Jun 2024
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
Martin Courtois
Malte Ostendorff
Leonhard Hennig
Georg Rehm
39
2
0
10 Jun 2024
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Chung-Ming Chien
Andros Tjandra
Apoorv Vyas
Matt Le
Bowen Shi
Wei-Ning Hsu
32
0
0
10 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
41
2
0
09 Jun 2024
Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment
Huma Ameer
Seemab Latif
Iram Tariq Bhatti
40
1
0
09 Jun 2024
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
Shivam Mehta
Harm Lameris
Rajiv Punmiya
Jonas Beskow
Éva Székely
G. Henter
33
1
0
08 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
36
9
0
07 Jun 2024
LLM-based speaker diarization correction: A generalizable approach
Georgios Efstathiadis
Vijay Yadav
Anzar Abbas
45
3
0
07 Jun 2024
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation
D. Doukhan
Christine Maertens
William Le Personnic
Ludovic Speroni
Reda Dehak
38
2
0
06 Jun 2024
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
Wangyou Zhang
Kohei Saijo
Jee-weon Jung
Chenda Li
Shinji Watanabe
Yanmin Qian
32
4
0
06 Jun 2024
Hypernetworks for Personalizing ASR to Atypical Speech
Max Müller-Eberstein
Dianna Yee
Karren D. Yang
G. Mantena
Colin S. Lea
33
0
0
06 Jun 2024
An interpretable speech foundation model for depression detection by revealing prediction-relevant acoustic features from long speech
Qingkun Deng
Saturnino Luz
Sofia de la Fuente Garcia
28
0
0
05 Jun 2024
BIPED: Pedagogically Informed Tutoring System for ESL Education
Soonwoo Kwon
Sojung Kim
Minju Park
Seunghyun Lee
Kyuseok Kim
29
3
0
05 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
32
1
0
04 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
52
2
0
04 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
61
0
0
03 Jun 2024
Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction
Changyu Du
Stavros Nousias
André Borrmann
LLMAG
26
2
0
02 Jun 2024
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
Mingjie Chen
Hezhao Zhang
Yuanchao Li
Jiachen Luo
Wen Wu
...
Lin Wang
P. Woodland
Xie Chen
Huy P Phan
Thomas Hain
25
0
0
30 May 2024
Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data
Eloi Moliner
Sebastian Braun
H. Gamper
OT
50
2
0
29 May 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi
Hengyuan Zhang
Yatian Wang
J. Pan
Chen Liu
...
Qixun Zhang
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Qi-fei Liu
DiffM
SLR
110
5
0
27 May 2024
Crossmodal ASR Error Correction with Discrete Speech Units
Yuanchao Li
Pinzhen Chen
Peter Bell
Catherine Lai
36
6
0
26 May 2024
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Zijin Gu
Tatiana Likhomanenko
Richard He Bai
Erik McDermott
R. Collobert
Navdeep Jaitly
AuLLM
51
2
0
24 May 2024
A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation
Iveta Becková
Stefan Pócos
Giulia Belgiovine
Marco Matarese
A. Sciutti
Carlo Mazzola
42
0
0
20 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
46
28
0
18 May 2024
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
D. Bohus
Sean Andrist
Nick Saw
Ann Paradiso
Ishani Chakraborty
Mahdi Rad
38
9
0
16 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
E. Chng
Ruizhe Li
AuLLM
KELM
49
5
0
16 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
43
39
0
14 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
41
37
0
14 May 2024
G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios
Zeyu Wang
Yuanchun Shi
Yuntao Wang
Yuchen Yao
Kun Yan
Yuhan Wang
Lei Ji
Xuhai Xu
Chun Yu
40
7
0
13 May 2024
AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly
Roshan Nayak
Rakshith Rao
Ujan Deb
AP Prathosh
32
1
0
11 May 2024
An Investigation of Incorporating Mamba for Speech Enhancement
Rong-Yu Chao
Wen-Huang Cheng
Moreno La Quatra
Sabato Marco Siniscalchi
Chao-Han Huck Yang
Szu-Wei Fu
Yu Tsao
Mamba
53
25
0
10 May 2024
LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing
Ana Ezquerro
David Vilares
38
1
0
10 May 2024
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
Dena F. Mujtaba
Nihar R. Mahapatra
Megan Arney
J Scott Yaruss
Hope Gerlach-Houck
Caryn Herring
Jia Bin
40
0
0
10 May 2024
Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models
Vyas Raina
Rao Ma
Charles G McGhee
Kate Knill
Mark J. F. Gales
AAML
33
4
0
09 May 2024
Mixat: A Data Set of Bilingual Emirati-English Speech
Maryam Al Ali
Hanan Aldarmaki
39
0
0
04 May 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
46
4
0
30 Apr 2024