Robust Speech Recognition via Large-Scale Weak Supervision

6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 510 papers shown
Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection
Jiachen Lian
Carly Feng
Naasir Farooqi
Steve Li
Anshul Kashyap
Cheol Jun Cho
Peter Wu
Robin Netzorg
Tingle Li
Gopala Krishna Anumanchipalli
40
13
0
20 Dec 2023
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
29
1
0
18 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
28
0
15 Dec 2023
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
33
2
0
15 Dec 2023
Attention-Guided Adaptation for Code-Switching Speech Recognition
Bobbi Aditya
Mahdin Rohmatillah
Liang-Hsuan Tai
Jen-Tzung Chien
31
8
0
14 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
31
9
0
13 Dec 2023
Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency
Pavlos Constas
Vikram Rawal
Matthew Honorio Oliveira
Andreas Constas
Aditya Khan
...
Heraa Murqi
Asad Khan
Nimit Amikumar Bhanshali
Youssef Rachad
Michael Guerzhoy
OffRL
15
0
0
12 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
59
177
0
11 Dec 2023
TabMT: Generating tabular data with masked transformers
Manbir Gulati
Paul F. Roysdon
LMTD
50
33
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
27
38
0
11 Dec 2023
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
26
3
0
06 Dec 2023
Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation
Linzi Xing
Quan Tran
Fabian Caba
Franck Dernoncourt
Seunghyun Yoon
Zhaowen Wang
Trung Bui
Giuseppe Carenini
46
1
0
30 Nov 2023
Decentralized Deepfake Detection Blockchain Network using Dynamic Algorithm management
Dipankar Sarkar
23
1
0
30 Nov 2023
FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition
Dongning Yang
Wei Wang
Yanmin Qian
13
3
0
29 Nov 2023
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi
Jiahao Pan
Peng Li
Ruibin Yuan
Xiaowei Chi
...
Wenhan Luo
Wei Xue
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
SLR
34
11
0
29 Nov 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
23
28
0
29 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
44
1
0
29 Nov 2023
The Claire French Dialogue Dataset
Julie Hunter
Jérôme Louradour
Virgile Rennard
Ismail Harrando
Guokan Shang
Jean-Pierre Lorré
29
1
0
28 Nov 2023
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
35
4
0
23 Nov 2023
A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs
J. Zhong
Ming Li
Yinliang Chen
Zihang Wei
Fan Yang
Haoran Shen
32
14
0
21 Nov 2023
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
Vikentii Pankov
Valeria Pronina
Alexander Kuzmin
Maksim Borisov
Nikita Usoltsev
Xingshan Zeng
Alexander Golubkov
Nikolai Ermolenko
Aleksandra Shirshova
Yulia Matveeva
29
2
0
16 Nov 2023
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability
Wei-Rui Chen
Ife Adebara
Khai Duy Doan
Qisheng Liao
Muhammad Abdul-Mageed
19
5
0
16 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang
James R. Glass
33
3
0
15 Nov 2023
Fast Certification of Vision-Language Models Using Incremental Randomized Smoothing
Ashutosh Nirala
Ameya Joshi
Chinmay Hegde
S Sarkar
VLM
36
0
0
15 Nov 2023
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
Jian Zhu
Changbing Yang
Farhan Samir
Jahurul Islam
32
4
0
14 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
42
274
0
14 Nov 2023
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition
Xiaohan Shi
Jiajun He
Xingfeng Li
T. Toda
34
3
0
13 Nov 2023
Towards End-to-End Spoken Grammatical Error Correction
Stefano Bannò
Rao Ma
Mengjie Qian
Kate Knill
Mark J. F. Gales
24
2
0
09 Nov 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
21
24
0
08 Nov 2023
SAGE: Smart home Agent with Grounded Execution
D. Rivkin
F. Hogan
Amal Feriani
Abhisek Konar
Adam Sigal
Steve Liu
Gregory Dudek
LM&Ro
LLMAG
ELM
LRM
34
3
0
01 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
35
2
0
01 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
28
63
0
30 Oct 2023
torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP
Yoshitomo Matsubara
VLM
34
1
0
26 Oct 2023
Can Language Models Laugh at YouTube Short-form Videos?
Dayoon Ko
Sangho Lee
Gunhee Kim
36
7
0
22 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
39
206
0
20 Oct 2023
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
K. A. Noriy
Xiaosong Yang
Marcin Budka
Jian Jun Zhang
VLM
26
3
0
18 Oct 2023
ViPE: Visualise Pretty-much Everything
Hassan Shahmohammadi
Adhiraj Ghosh
Hendrik P. A. Lensch
DiffM
28
1
0
16 Oct 2023
Optimized Tokenization for Transcribed Error Correction
Tomer Wullach
Shlomo E. Chazan
32
0
0
16 Oct 2023
Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference
Dejan Porjazovski
Yaroslav Getman
Tamás Grósz
M. Kurimo
30
3
0
16 Oct 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
23
3
0
15 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers
Hosein Mohebbi
Grzegorz Chrupała
Willem H. Zuidema
A. Alishahi
36
12
0
15 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
28
3
0
12 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
32
36
0
10 Oct 2023
S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models
Tiezhi Wang
Nils Strodthoff
47
5
0
10 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
33
12
0
09 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
Yuchen Yang
VGen
19
7
0
09 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
32
10
0
05 Oct 2023
uTalk: Bridging the Gap Between Humans and AI
Hussam Azzuni
Sharim Jamal
Abdulmotaleb Elsaddik
19
6
0
04 Oct 2023
GeRA: Label-Efficient Geometrically Regularized Alignment
Dustin Klebe
Tal Shnitzer
Mikhail Yurochkin
Leonid Karlinsky
Justin Solomon
13
2
0
01 Oct 2023
Data Filtering Networks
Alex Fang
Albin Madappally Jose
Amit Jain
Ludwig Schmidt
Alexander Toshev
Vaishaal Shankar
CLIP
46
125
0
29 Sep 2023