Robust Speech Recognition via Large-Scale Weak Supervision
arXiv: 2212.04356

6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 500 papers shown
Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
Suhita Ghosh
Yamini Sinha
Ingo Siegert
Sebastian Stober
11
1
0
15 Sep 2023
Can Whisper perform speech-based in-context learning?
Siyin Wang
Chao-Han Huck Yang
Ji Wu
Chao Zhang
29
24
0
13 Sep 2023
LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong
Joshua Belanich
Krishna Somandepalli
Arsha Nagrani
B. Eoff
Brendan Jou
33
10
0
07 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
24
69
0
06 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic Pretraining
Chunyu Qiang
Hao Li
Yixin Tian
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
29
5
0
01 Sep 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
Samuel Horváth
Stefanos Laskaridis
Shashank Rajput
Hongyi Wang
BDL
32
4
0
28 Aug 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
40
19
0
28 Aug 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
62
4
0
28 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
28
9
0
30 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
Cascaded Cross-Modal Transformer for Request and Complaint Detection
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
36
3
0
27 Jul 2023
Turning Whisper into Real-Time Transcription System
Dominik Macháček
Raj Dabre
Ondrej Bojar
19
22
0
27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
N. Padoy
Nicolas Padoy
35
20
0
27 Jul 2023
Adaptation of Whisper models to child speech recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Peter Corcoran
H. Cucu
16
30
0
24 Jul 2023
MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems
Thilo von Neumann
Christoph Boeddeker
Marc Delcroix
Reinhold Haeb-Umbach
29
16
0
21 Jul 2023
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
Theresa Pekarek-Rosin
S. Wermter
VLM
CLL
29
2
0
14 Jul 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
26
6
0
12 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
47
5
0
11 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
S. Mohamadi
G. Mujtaba
Ngan Le
Gianfranco Doretto
Don Adjeroh
LM&MA
AI4MH
26
21
0
09 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
42
7
0
06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
32
2
0
04 Jul 2023
Boosting Norwegian Automatic Speech Recognition
Javier de la Rosa
Rolv-Arild Braaten
P. Kummervold
Freddy Wetjen
Svein Arne Brygfjeld
38
7
0
04 Jul 2023
Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos
Ashwin Rao
16
6
0
04 Jul 2023
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning
K. Yuksel
Thiago Castro Ferreira
Golara Javadi
Mohamed El-Badrashiny
Ahmet Gunduz
26
4
0
21 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
139
114
0
20 Jun 2023
Towards End-to-end Speech-to-text Summarization
Raul Monteiro
Diogo Pernes
9
1
0
06 Jun 2023
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
Lucía Gómez Zaragozá
Simone Wills
Cristian Tejedor-García
Javier Marín-Morales
Mariano Alcañiz
H. Strik
24
8
0
06 Jun 2023
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models
Liam Dugan
Anshul Wadhawan
Kyle Spence
Chris Callison-Burch
Morgan McGuire
Victor Zordan
OffRL
30
2
0
01 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Esteve
26
3
0
01 Jun 2023
Voice Conversion With Just Nearest Neighbors
Matthew Baas
Benjamin van Niekerk
Herman Kamper
SSL
32
48
0
30 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Haomiao Yang
Jinming Zhao
Gholamreza Haffari
Ehsan Shareghi
21
6
0
28 May 2023
External Language Model Integration for Factorized Neural Transducers
Michael Levit
S. Parthasarathy
Cem Aksoylar
Mohammad Sadegh Rasooli
Shuangyu Chang
29
2
0
26 May 2023
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Matías P. Pizarro
D. Kolossa
Asja Fischer
AAML
35
1
0
26 May 2023
NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery
Farhad Moghimifar
Shilin Qu
Tongtong Wu
Yuan-Fang Li
Gholamreza Haffari
34
4
0
26 May 2023
Context-aware attention layers coupled with optimal transport domain adaptation and multimodal fusion methods for recognizing dementia from spontaneous speech
Loukas Ilias
D. Askounis
31
9
0
25 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Chenyang Le
Yao Qian
Long Zhou
Shujie Liu
Yanmin Qian
Michael Zeng
Xuedong Huang
24
13
0
24 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
L. Gris
R. Marcacini
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
S. Aluísio
21
7
0
23 May 2023
Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test
Eungbeom Kim
Yunkee Chae
Jaeheon Sim
Kyogu Lee
17
1
0
22 May 2023
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
31
53
0
22 May 2023
CopyNE: Better Contextual ASR by Copying Named Entities
Shilin Zhou
Zhenghua Li
Yu Hong
Mengdi Zhang
Zhefeng Wang
Baoxing Huai
15
5
0
22 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Andrew Rouditchenko
Sameer Khurana
Samuel Thomas
Rogerio Feris
Leonid Karlinsky
Hilde Kuehne
David Harwath
Brian Kingsbury
James R. Glass
VLM
37
22
0
21 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
17
2
0
21 May 2023
Scaling laws for language encoding models in fMRI
Richard Antonello
Aditya R. Vaidya
Alexander G. Huth
MedIm
30
55
0
19 May 2023
Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach
Masahiro Kaneko
Graham Neubig
Naoaki Okazaki
36
6
0
19 May 2023
MD3: The Multi-Dialect Dataset of Dialogues
Jacob Eisenstein
Vinodkumar Prabhakaran
Clara E. Rivera
Dorottya Demszky
D. Sharma
35
7
0
19 May 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
16
7
0
18 May 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Siyuan Huang
Zhengkai Jiang
Hao Dong
Yu Qiao
Peng Gao
Hongsheng Li
LM&Ro
27
93
0
18 May 2023
An Android Robot Head as Embodied Conversational Agent
Marcel Heisler
C. Becker-Asano
LM&Ro
LLMAG
29
0
0
18 May 2023
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
Mutian He
Philip N. Garner
44
4
0
16 May 2023