2212.04356
Robust Speech Recognition via Large-Scale Weak Supervision
6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
Papers citing
"Robust Speech Recognition via Large-Scale Weak Supervision"
50 / 506 papers shown
Title
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
46
7
0
26 Aug 2024
Sample-Independent Federated Learning Backdoor Attack in Speaker Recognition
Weida Xu
Yang Xu
Sicong Zhang
FedML
AAML
41
0
0
25 Aug 2024
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu
Minquan Wang
Ye Ma
Bo Wang
Aozhu Chen
Quan Chen
Peng Jiang
Xirong Li
48
1
0
23 Aug 2024
MuRAR: A Simple and Effective Multimodal Retrieval and Answer Refinement Framework for Multimodal Question Answering
Zhengyuan Zhu
Daniel Lee
Hong Zhang
Sai Sree Harsha
Loic Feujio
Akash Maharaj
Yunyao Li
24
2
0
16 Aug 2024
Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa
Takashi Masuko
Toru Taniguchi
43
1
0
15 Aug 2024
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin
Ying Zeng
Sijie Mai
Haifeng Hu
VGen
45
0
0
14 Aug 2024
An Investigation Into Explainable Audio Hate Speech Detection
Jinmyeong An
Wonjun Lee
Yejin Jeon
Jungseul Ok
Yunsu Kim
Gary Geunbae Lee
30
2
0
12 Aug 2024
Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate
Yiqun Zhang
Xiaocui Yang
Shi Feng
Daling Wang
Yifei Zhang
Kaisong Song
LLMAG
40
4
0
08 Aug 2024
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
39
3
0
07 Aug 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Yansen Wang
Xie Chen
AuLLM
37
23
0
05 Aug 2024
TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation
Xingpeng Sun
Yiran Zhang
Xindi Tang
Amrit Singh Bedi
Aniket Bera
50
4
0
03 Aug 2024
Neural Network Emulator for Atmospheric Chemical ODE
Zhi-Song Liu
Petri S. Clusius
Michael Boy
42
3
0
03 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
46
5
0
31 Jul 2024
Accelerating Large Language Model Inference with Self-Supervised Early Exits
Florian Valade
LRM
44
1
0
30 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
39
9
0
23 Jul 2024
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
Wataru Nakata
Kentaro Seki
Hitomi Yanaka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
AuLLM
43
0
0
22 Jul 2024
Empirical Capacity Model for Self-Attention Neural Networks
Aki Härmä
M. Pietrasik
Anna Wilbik
42
1
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
Audio-visual training for improved grounding in video-text LLMs
Shivprasad Sagare
Hemachandran S
Kinshuk Sarabhai
Prashant Ullegaddi
SA Rajeshkumar
30
0
0
21 Jul 2024
Morphosyntactic Analysis for CHILDES
Houjun Liu
Brian MacWhinney
24
1
0
17 Jul 2024
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
J. Hauret
Malo Olivier
Thomas Joubaud
C. Langrenne
Sarah Poirée
V. Zimpfer
Éric Bavu
80
1
0
16 Jul 2024
Walk along: An Experiment on Controlling the Mobile Robot 'Spot' with Voice and Gestures
Renchi Zhang
Jesse van der Linden
Dimitra Dodou
H. Seyffert
Y. B. Eisma
J. D. Winter
48
0
0
15 Jul 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
96
2
0
09 Jul 2024
Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities
Avinash Anand
Chayan Tank
Sarthak Pol
Vinayak Katoch
Shaina Mehta
R. Shah
40
4
0
08 Jul 2024
MINDECHO: Role-Playing Language Agents for Key Opinion Leaders
Rui Xu
Dakuan Lu
Xiaoyu Tan
Xintao Wang
Siyu Yuan
Jiangjie Chen
Wei Chu
Xu Yinghui
LLMAG
34
3
0
07 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
49
19
0
05 Jul 2024
Prosody-Driven Privacy-Preserving Dementia Detection
Dominika Woszczyk
Ranya Aloufi
Soteris Demetriou
34
2
0
03 Jul 2024
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
Jianzhu Guo
Dingyun Zhang
Xiaoqiang Liu
Zhizhou Zhong
Yuan Zhang
Pengfei Wan
Di Zhang
VGen
63
54
0
03 Jul 2024
Multi-View Black-Box Physical Attacks on Infrared Pedestrian Detectors Using Adversarial Infrared Grid
Kalibinuer Tiliwalidi
Chengyin Hu
Weiwen Shi
AAML
28
1
0
01 Jul 2024
Cross-Lingual Transfer Learning for Speech Translation
Rao Ma
Yassir Fathullah
Mengjie Qian
Siyuan Tang
Mark J. F. Gales
Kate Knill
28
1
0
01 Jul 2024
Clustering in pure-attention hardmax transformers and its role in sentiment analysis
Albert Alcalde
Giovanni Fantuzzi
Enrique Zuazua
35
3
0
26 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
29
2
0
26 Jun 2024
FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data
Dancheng Liu
Jinjun Xiong
33
0
0
25 Jun 2024
Generative AI Systems: A Systems-based Perspective on Generative AI
Jakub M. Tomczak
50
1
0
25 Jun 2024
AG-LSEC: Audio Grounded Lexical Speaker Error Correction
Rohit Paturi
Xiang Li
S. Srinivasan
36
1
0
25 Jun 2024
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
Yingting Li
Ambuj Mehrish
Bryan Chew
Bo Cheng
Soujanya Poria
40
0
0
25 Jun 2024
Sound Tagging in Infant-centric Home Soundscapes
Mohammad Nur Hossain Khan
Jialu Li
Nancy L. McElwain
M. Hasegawa-Johnson
Bashima Islam
18
0
0
25 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
40
0
0
24 Jun 2024
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
Lu Zhang
Tiancheng Zhao
Heting Ying
Yibo Ma
Kyusong Lee
LLMAG
38
9
0
24 Jun 2024
Perception of Phonological Assimilation by Neural Speech Recognition Models
Charlotte Pouw
Marianne de Heer Kloots
A. Alishahi
Willem H. Zuidema
49
2
0
21 Jun 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
43
3
0
21 Jun 2024
On Newton's Method to Unlearn Neural Networks
Nhung Bui
Xinyang Lu
Rachael Hwee Ling Sim
See-Kiong Ng
Bryan Kian Hsiang Low
MU
41
2
0
20 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
56
25
0
17 Jun 2024
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang
Zhichao Wang
Xiaoying Tang
45
1
0
17 Jun 2024
Large Language Models for Dysfluency Detection in Stuttered Speech
Dominik Wagner
Sebastian P. Bayerl
Ilja Baumann
Korbinian Riedhammer
Elmar Nöth
Tobias Bocklet
45
3
0
16 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
MQ
32
1
0
16 Jun 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
P. Jyothi
Pushpak Bhattacharyya
48
1
0
16 Jun 2024
Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies
Chung-Wen Wu
Berlin Chen
48
0
0
16 Jun 2024
Large Language Models for Automatic Milestone Detection in Group Discussions
Zhuoxu Duan
Zhengye Yang
Samuel Westby
Christoph Riedl
B. F. Welles
Richard J. Radke
30
0
0
16 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
42
9
0
15 Jun 2024