ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.04356
  4. Cited By
Robust Speech Recognition via Large-Scale Weak Supervision

Robust Speech Recognition via Large-Scale Weak Supervision

6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL
ArXivPDFHTML

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 514 papers shown
Title
Mixat: A Data Set of Bilingual Emirati-English Speech
Mixat: A Data Set of Bilingual Emirati-English Speech
Maryam Al Ali
Hanan Aldarmaki
41
0
0
04 May 2024
Fake it to make it: Using synthetic data to remedy the data shortage in
  joint multimodal speech-and-gesture synthesis
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
46
4
0
30 Apr 2024
Automatic Speech Recognition System-Independent Word Error Rate
  Estimation
Automatic Speech Recognition System-Independent Word Error Rate Estimation
Chanho Park
Mingjie Chen
Thomas Hain
26
0
0
25 Apr 2024
Rethinking Processing Distortions: Disentangling the Impact of Speech
  Enhancement Errors on Speech Recognition Performance
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Tsubasa Ochiai
Kazuma Iwamoto
Marc Delcroix
Rintaro Ikeshita
Hiroshi Sato
Shoko Araki
Shigeru Katagiri
29
2
0
23 Apr 2024
Crossing the principle-practice gap in AI ethics with ethical
  problem-solving
Crossing the principle-practice gap in AI ethics with ethical problem-solving
N. Corrêa
James William Santos
Camila Galvão
Marcelo Pasetti
Dieine Schiavon
Faizah Naqvi
Robayet Hossain
N. D. Oliveira
40
4
0
16 Apr 2024
Anatomy of Industrial Scale Multilingual ASR
Anatomy of Industrial Scale Multilingual ASR
Francis McCann Ramirez
Luka Chkhetiani
Andrew Ehrenberg
R. McHardy
Rami Botros
...
Ahmed Efty
Daniel McCrystal
Sam Flamini
Domenic Donato
Takuya Yoshioka
42
7
0
15 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive
  Review and Analysis of Paradigms and Fine-Tuning Strategies
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
46
8
0
13 Apr 2024
Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task
Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task
Hassan Ali
Philipp Allgeuer
Stefan Wermter
54
1
0
12 Apr 2024
Behavior Trees Enable Structured Programming of Language Model Agents
Behavior Trees Enable Structured Programming of Language Model Agents
Richard Kelley
AI4CE
LM&Ro
LLMAG
40
0
0
11 Apr 2024
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
Tien-Hong Lo
Fu-An Chao
Tzu-I Wu
Yao-Ting Sung
Berlin Chen
23
3
0
11 Apr 2024
Linguistic Changes in Spontaneous Speech for Detecting Parkinsons
  Disease Using Large Language Models
Linguistic Changes in Spontaneous Speech for Detecting Parkinsons Disease Using Large Language Models
Jonathan Crawford
41
0
0
08 Apr 2024
Exploration is Harder than Prediction: Cryptographically Separating
  Reinforcement Learning from Supervised Learning
Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning
Noah Golowich
Ankur Moitra
Dhruv Rohatgi
OffRL
35
4
0
04 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
41
22
0
03 Apr 2024
Voice EHR: Introducing Multimodal Audio Data for Health
Voice EHR: Introducing Multimodal Audio Data for Health
James Anibal
Hannah Huth
Ming Li
Lindsey A Hazen
Y. Lam
...
Emily Ricotta
David A. Clifton
Louise Thwaites
Yael Bensoussan
Bradford J. Wood
37
1
0
02 Apr 2024
Chat Modeling: Natural Language-based Procedural Modeling of Biological
  Structures without Training
Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training
Donggang Jia
Yunhai Wang
Ivan Viola
40
1
0
01 Apr 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
21
45
0
31 Mar 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
Towards Multimodal Video Paragraph Captioning Models Robust to Missing
  Modality
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality
Sishuo Chen
Lei Li
Shuhuai Ren
Rundong Gao
Yuanxin Liu
Xiaohan Bi
Xu Sun
Lu Hou
45
3
0
28 Mar 2024
PhoWhisper: Automatic Speech Recognition for Vietnamese
PhoWhisper: Automatic Speech Recognition for Vietnamese
Thanh-Thien Le
L. T. Nguyen
Dat Quoc Nguyen
29
3
0
27 Mar 2024
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention
Ethan N. Evans
Matthew G. Cook
Zachary P. Bradshaw
Margarite L. LaBorde
48
5
0
21 Mar 2024
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
Zunnan Xu
Yukang Lin
Haonan Han
Sicheng Yang
Ronghui Li
Yachao Zhang
Xiu Li
Mamba
46
25
0
14 Mar 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal
  Learning with Missing Modalities and Data Scarcity
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
36
2
0
14 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast
  Conformer
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
44
8
0
14 Mar 2024
A New Benchmark for Evaluating Automatic Speech Recognition in the
  Arabic Call Domain
A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain
Qusai Abo Obaidah
Muhy Eddin Za'ter
Adnan Jaljuli
Ali Mahboub
Asma Hakouz
Bashar Alfrou
Yazan Estaitia
21
1
0
07 Mar 2024
SaulLM-7B: A pioneering Large Language Model for Law
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo
T. Pires
Malik Boudiaf
Dominic Culver
Rui Melo
...
Andre F. T. Martins
Fabrizio Esposito
Vera Lúcia Raposo
Sofia Morgado
Michael Desa
ELM
AILaw
52
66
0
06 Mar 2024
Neural Additive Image Model: Interpretation through Interpolation
Neural Additive Image Model: Interpretation through Interpolation
Arik Reuter
Anton Thielmann
Benjamin Saefken
DiffM
37
1
0
06 Mar 2024
Adversarial Infrared Geometry: Using Geometry to Perform Adversarial
  Attack against Infrared Pedestrian Detectors
Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors
Kalibinuer Tiliwalidi
AAML
51
0
0
06 Mar 2024
RADIA -- Radio Advertisement Detection with Intelligent Analytics
RADIA -- Radio Advertisement Detection with Intelligent Analytics
Jorge Álvarez
J. C. Armenteros
Camilo Torrón
Miguel Ortega-Martín
Alfonso Ardoiz
...
Íñigo Galdeano
Ignacio Garrido
Adrián Alonso
Fernando Bayón
Oleg Vorontsov
26
0
0
06 Mar 2024
Non-verbal information in spontaneous speech -- towards a new framework
  of analysis
Non-verbal information in spontaneous speech -- towards a new framework of analysis
Tirza Biron
Moshe Barboy
Eran Ben-Artzy
Alona Golubchik
Yanir Marmor
Smadar Szekely
Yaron Winter
David Harel
37
0
0
06 Mar 2024
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Yue Li
Koen V. Hindriks
Florian A. Kunneman
35
2
0
05 Mar 2024
NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
Janaína Mendes-Laureano
Jorge A. Gómez-García
Alejandro Guerrero-López
Elisa Luque-Buzo
Julián D. Arias-Londoño
Francisco J. Grandas-Pérez
Juan Ignacio Godino-Llorente
13
4
0
04 Mar 2024
CustomListener: Text-guided Responsive Interaction for User-friendly
  Listening Head Generation
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Xi Liu
Ying Guo
Cheng Zhen
Tong Li
Yingying Ao
Pengfei Yan
DiffM
34
3
0
01 Mar 2024
Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale
  Speech Recognition
Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
Jeehyun Lee
Yerin Choi
Tae-Jin Song
M. Koo
16
4
0
29 Feb 2024
High-Fidelity Neural Phonetic Posteriorgrams
High-Fidelity Neural Phonetic Posteriorgrams
Cameron Churchwell
Max Morrison
Bryan Pardo
40
5
0
27 Feb 2024
Direct Punjabi to English speech translation using discrete units
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
31
0
0
25 Feb 2024
Advancing Large Language Models to Capture Varied Speaking Styles and
  Respond Properly in Spoken Conversations
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
34
24
0
20 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech
  Recognition, Translation, and Language Identification
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
37
17
0
20 Feb 2024
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up
  Speech Diffusion Model
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model
Xiangyu Zhang
Daijiao Liu
Hexin Liu
Qiquan Zhang
Hanyu Meng
Leibny Paola García
Chng Eng Siong
Lina Yao
DiffM
25
3
0
16 Feb 2024
Large Language Models "Ad Referendum": How Good Are They at Machine
  Translation in the Legal Domain?
Large Language Models "Ad Referendum": How Good Are They at Machine Translation in the Legal Domain?
Vicent Briva-Iglesias
Joao Lucas Cavalheiro Camargo
Gokhan Dogru
AILaw
ELM
38
7
0
12 Feb 2024
GET-Tok: A GenAI-Enriched Multimodal TikTok Dataset Documenting the 2022
  Attempted Coup in Peru
GET-Tok: A GenAI-Enriched Multimodal TikTok Dataset Documenting the 2022 Attempted Coup in Peru
Gabriela Pinto
Keith Burghardt
Kristina Lerman
Emilio Ferrara
11
4
0
08 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
35
14
0
02 Feb 2024
Institutional Platform for Secure Self-Service Large Language Model Exploration
Institutional Platform for Secure Self-Service Large Language Model Exploration
V. Bumgardner
Mitchell A. Klusty
W. V. Logan
Samuel E. Armstrong
Caylin D. Hickey
Jeff Talbert
Caylin Hickey
Jeff Talbert
58
1
0
01 Feb 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on
  E-Branchformer
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLM
OSLM
34
40
0
30 Jan 2024
Comuniqa : Exploring Large Language Models for improving speaking skills
Comuniqa : Exploring Large Language Models for improving speaking skills
Manas Mhasakar
Shikhar Sharma
Apurv Mehra
Utkarsh Venaik
Ujjwal Singhal
Dhruv Kumar
Kashish Mittal
22
4
0
28 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
35
25
0
25 Jan 2024
Speech foundation models on intelligibility prediction for
  hearing-impaired listeners
Speech foundation models on intelligibility prediction for hearing-impaired listeners
Santiago Cuervo
R. Marxer
38
6
0
24 Jan 2024
Large Language Models are Efficient Learners of Noise-Robust Speech
  Recognition
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Ruizhe Li
Chao Zhang
Pin-Yu Chen
Ensiong Chng
27
20
0
19 Jan 2024
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
Yichao Du
Zhirui Zhang
Linan Yue
Xu Huang
Yuqing Zhang
Tong Xu
Linli Xu
Enhong Chen
FedML
67
5
0
18 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
69
35
0
16 Jan 2024
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Nicolae-Cătălin Ristea
Andrei Anghel
Radu Tudor Ionescu
36
2
0
15 Jan 2024
Previous
123...10116789
Next