Robust Speech Recognition via Large-Scale Weak Supervision


6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 496 papers shown
Title
BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks
Zhuang Li
50
1
0
21 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
41
0
0
20 Jan 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Soumya Dutta
Sriram Ganapathy
39
2
0
20 Jan 2025
Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR
Karl El Hajal
Enno Hermann
Ajinkya Kulkarni
Mathew Magimai.-Doss
36
0
0
20 Jan 2025
Human-like Nonverbal Behavior with MetaHumans in Real-World Interaction Studies: An Architecture Using Generative Methods and Motion Capture
Oliver Chojnowski
Alexander Eberhard
Michael Schiffmann
Ana Müller
Anja Richert
AI4CE
34
0
0
18 Jan 2025
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
41
5
0
17 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jingyang Zhang
Lu Lu
Yixuan Wang
Haizhou Li
Zhikai Wu
AuLLM
90
17
0
17 Jan 2025
Target Speaker ASR with Whisper
Alexander Polok
Dominik Klement
Matthew Wiesner
Sanjeev Khudanpur
J. Černocký
L. Burget
107
1
0
17 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
63
0
0
15 Jan 2025
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Rajath Rao
Adithya V Ganesan
O. Kjell
Jonah Luby
Akshay Raghavan
...
B. Luft
Camilo Ruggero
Neville Ryant
R. Kotov
H. A. Schwartz
37
0
0
15 Jan 2025
Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
C. Jacobs
Annelien Smith
Daleen Klop
Ondřej Klejch
Febe de Wet
Herman Kamper
49
0
0
11 Jan 2025
A Survey on Spoken Italian Datasets and Corpora
Marco Giordano
Claudia Rinaldi
41
0
0
11 Jan 2025
Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI
Yuya Asano
Sabit Hassan
P. Sharma
Anthony Sicilia
Katherine Atwell
Diane Litman
Malihe Alikhani
39
0
0
10 Jan 2025
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
77
1
0
10 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
36
0
0
04 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
59
3
0
03 Jan 2025
Fotheidil: an Automatic Transcription System for the Irish Language
Liam Lonergan
Ibon Saratxaga
John Sloan
Oscar Maharog
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
29
0
0
03 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
87
3
0
03 Jan 2025
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
Ashishkumar Gudmalwar
Ishan D. Biyani
Nirmesh J. Shah
Pankaj Wasnik
R. Shah
DiffM
26
0
0
31 Dec 2024
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
54
0
0
31 Dec 2024
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
Ruohong Yang
Peng Hu
Xi Peng
Xiting Liu
Yunfan Li
39
0
0
25 Dec 2024
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
85
0
0
16 Dec 2024
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
92
6
0
12 Dec 2024
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
Thai-Binh Nguyen
Alexander Waibel
79
1
0
27 Nov 2024
AMPS: ASR with Multimodal Paraphrase Supervision
Amruta Parulekar
Abhishek Gupta
Sameep Chattopadhyay
P. Jyothi
75
0
0
27 Nov 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
122
1
0
22 Nov 2024
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
Seokil Ham
H. Kim
Sangmin Woo
Changick Kim
Mamba
186
0
0
21 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
151
2
0
11 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
54
1
0
03 Nov 2024
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang
Yinheng Li
Charles Ding
Justin Lin
Paul Pu Liang
Dan Zhao
Rogerio Bonatti
K. Koishida
46
5
0
24 Oct 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa
BDL
AuLLM
VLM
56
11
0
23 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
58
2
0
23 Oct 2024
Continuous Speech Tokenizer in Text To Speech
Yixing Li
Ruobing Xie
Xingchen Sun
Yu Cheng
Zhanhui Kang
AuLLM
CLL
63
2
0
22 Oct 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM
VLM
73
3
0
20 Oct 2024
BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation
Juntao Li
Zhenxi Song
Jiaqi Wang
Min Zhang
Honghai Liu
Min Zhang
Zhiguo Zhang
31
1
0
19 Oct 2024
A Framework for Adapting Human-Robot Interaction to Diverse User Groups
Theresa Pekarek-Rosin
Vanessa Hassouna
Xiaowen Sun
Luca Krohm
Henri-Leon Kordt
Michael Beetz
Stefan Wermter
28
0
0
15 Oct 2024
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz
Kate Sanders
David Etter
Kenton W. Murray
Cameron Carpenter
...
Alexander Martin
Ronald Colaianni
Nolan King
Eugene Yang
Benjamin Van Durme
VGen
43
2
0
15 Oct 2024
Characterizing the MrDeepFakes Sexual Deepfake Marketplace
Catherine Han
Anne Li
Deepak Kumar
Zakir Durumeric
29
1
0
14 Oct 2024
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Omer Moussa
Dietrich Klakow
Mariya Toneva
52
3
0
11 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
128
2
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
34
0
0
09 Oct 2024
Organizing Unstructured Image Collections using Natural Language
Mingxuan Liu
Zhun Zhong
Jun Li
Gianni Franchi
Subhankar Roy
Elisa Ricci
VLM
39
3
0
07 Oct 2024
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
Farhan Samir
Emily P. Ahn
Shreya Prakash
Márton Soskuthy
Vered Shwartz
Jian Zhu
26
0
0
05 Oct 2024
Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models
Pavel Stepachev
Pinzhen Chen
Barry Haddow
33
0
0
04 Oct 2024
Reverb: Open-Source ASR and Diarization from Rev
Nishchal Bhandari
Danny Chen
Miguel Ángel del Río Fernández
Natalie Delworth
Jennifer Drexler Fox
...
Ondrej Novotný
Jan Profant
Nan Qin
Martin Ratajczak
Jean-Philippe Robichaud
VLM
33
1
0
04 Oct 2024
Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models
Hongbin Liu
Lun Wang
Om Thakkar
Abhradeep Thakurta
Arun Narayanan
31
0
0
02 Oct 2024
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li
Wendi Sang
Chang Zeng
Runxuan Yang
Guo Chen
Xiaolin Hu
31
2
0
02 Oct 2024
The Conformer Encoder May Reverse the Time Dimension
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schlüter
Hermann Ney
36
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
14
0
01 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024