1912.05533
SpecAugment on Large Scale Datasets
11 December 2019
Daniel S. Park
Yu Zhang
Chung-Cheng Chiu
Youzheng Chen
Yue Liu
William Chan
Quoc V. Le
Yonghui Wu
Papers citing
"SpecAugment on Large Scale Datasets"
50 / 89 papers shown
Title
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
Asahi Sakuma
Hiroaki Sato
Ryuga Sugano
Tadashi Kumano
Yoshihiko Kawai
Tetsuji Ogawa
30
0
0
09 Jun 2025
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
Alexandre Ducorroy
Rachid Riad
MoMe
33
0
0
26 May 2025
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
AuLLM
87
1
0
08 Jan 2025
Reassessing Noise Augmentation Methods in the Context of Adversarial Speech
Karla Pizzi
Matías Pizarro
Asja Fischer
60
0
0
03 Sep 2024
DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
Xinyu Wang
Qian Wang
Haolin Huang
Yu Fang
Mengjie Xu
Qian Wang
93
0
0
31 Aug 2024
Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses
C. Li
Ngoc Thang Vu
74
4
0
26 Jul 2024
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
Tina Raissi
Christoph Luscher
Simon Berger
Ralf Schluter
Hermann Ney
67
2
0
16 Jul 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
180
2
0
09 Jul 2024
Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
Ilyass Moummad
Nicolas Farrugia
Romain Serizel
Jérémy S. P. Froidevaux
Vincent Lostanlen
84
1
0
14 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
83
8
0
14 Mar 2024
HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention
Shuang Chen
Amir Atapour-Abarghouei
Hubert P. H. Shum
ViT
71
20
0
22 Feb 2024
Consistency Based Unsupervised Self-training For ASR Personalisation
Jisi Zhang
Vandana Rajan
Haaris Mehmood
David Tuckey
Pablo Peso Parada
Md. Asif Jalal
Karthikeyan P. Saravanan
Gil Ho Lee
Jungin Lee
Seokyeong Jung
48
0
0
22 Jan 2024
Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad
Romain Serizel
Nicolas Farrugia
SSL
110
10
0
25 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
64
1
0
18 Dec 2023
Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Hiroaki Ogawa
Siddhant Arora
Shinji Watanabe
69
7
0
15 Dec 2023
Towards Automatic Data Augmentation for Disordered Speech Recognition
Zengrui Jin
Xurong Xie
Tianzi Wang
Mengzhe Geng
Jiajun Deng
Guinan Li
Shujie Hu
Xunying Liu
60
2
0
14 Dec 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
63
1
0
09 Oct 2023
One model to rule them all? Towards End-to-End Joint Speaker Diarization and Speech Recognition
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
121
19
0
02 Oct 2023
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
Antoine Nzeyimana
SSL
136
0
0
23 Aug 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
65
1
0
12 Jun 2023
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Haoyu Tang
Zhaoyi Liu
Chang Zeng
Xinfeng Li
58
1
0
23 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
87
172
0
03 Mar 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi
Radu Timofte
VLM
86
35
0
04 Jan 2023
Visual Transformers for Primates Classification and Covid Detection
Steffen Illium
Robert Muller
Andreas Sedlmeier
Claudia Linnhoff-Popien
76
11
0
20 Dec 2022
Contextual-Utterance Training for Automatic Speech Recognition
Alejandro Gomez-Alanis
Lukas Drude
A. Schwarz
Rupak Vignesh Swaminathan
Simon Wiesler
64
1
0
27 Oct 2022
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
Gary Wang
Ekin D. Cubuk
Andrew Rosenberg
Shuyang Cheng
Ron J. Weiss
Bhuvana Ramabhadran
Pedro J. Moreno
Quoc V. Le
Daniel S. Park
117
2
0
19 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
71
2
0
16 Oct 2022
Foundation Transformers
Hongyu Wang
Shuming Ma
Shaohan Huang
Li Dong
Wenhui Wang
...
Barun Patra
Zhun Liu
Vishrav Chaudhary
Xia Song
Furu Wei
AI4CE
91
27
0
12 Oct 2022
Simple Pooling Front-ends For Efficient Audio Classification
Xubo Liu
Haohe Liu
Qiuqiang Kong
Xinhao Mei
Mark D. Plumbley
Wenwu Wang
107
17
0
03 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim
Wonyong Sung
71
2
0
01 Oct 2022
A Language Agnostic Multilingual Streaming On-Device ASR System
Yue Liu
Tara N. Sainath
Ruoming Pang
Shuo-yiin Chang
Qiumin Xu
...
Qiao Liang
Heguang Liu
Yanzhang He
Parisa Haghani
Sameer Bidichandani
AuLLM
64
11
0
29 Aug 2022
Improving Mandarin Speech Recognition with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
108
10
0
24 Jul 2022
pMCT: Patched Multi-Condition Training for Robust Speech Recognition
Pablo Peso Parada
A. Dobrowolska
Karthikeyan P. Saravanan
Mete Ozay
97
6
0
11 Jul 2022
Speech Augmentation Based Unsupervised Learning for Keyword Spotting
Jian Luo
Jianzong Wang
Ning Cheng
Haobin Tang
Jing Xiao
SSL
72
2
0
28 May 2022
Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
Paul Wimmer
Jens Mehnert
Alexandru Paul Condurache
DD
98
21
0
17 May 2022
Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech
Ilya Sklyar
A. Piunova
Christian Osendorfer
66
6
0
10 May 2022
Efficient Training of Neural Transducer for Speech Recognition
Wei Zhou
Wilfried Michel
Ralf Schluter
Hermann Ney
AI4TS
99
24
0
22 Apr 2022
Extracting Targeted Training Data from ASR Models, and How to Mitigate It
Ehsan Amid
Om Thakkar
A. Narayanan
Rajiv Mathews
Françoise Beaufays
46
9
0
18 Apr 2022
Improving Rare Word Recognition with LM-aware MWER Training
Weiran Wang
Tongzhou Chen
Tara N. Sainath
Ehsan Variani
Rohit Prabhavalkar
...
S. Mavandadi
Cal Peyser
Trevor Strohman
Yanzhang He
David Rybach
KELM
85
13
0
15 Apr 2022
Data Augmentation for Electrocardiograms
Aniruddh Raghu
Divya Shanmugam
E. Pomerantsev
John Guttag
Collin M. Stultz
63
20
0
09 Apr 2022
Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs
Paul Wimmer
Jens Mehnert
Alexandru Paul Condurache
CVBM
64
20
0
15 Mar 2022
Korean Tokenization for Beam Search Rescoring in Speech Recognition
Kyuhong Shim
Hyewon Bae
Wonyong Sung
47
0
0
22 Feb 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
142
60
0
02 Feb 2022
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Minglun Han
Linhao Dong
Zhenlin Liang
Meng Cai
Shiyu Zhou
Zejun Ma
Bo Xu
80
46
0
30 Jan 2022
Improving the fusion of acoustic and text representations in RNN-T
Chao Zhang
Yue Liu
Zhiyun Lu
Tara N. Sainath
Shuo-yiin Chang
AI4CE
100
12
0
25 Jan 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
111
23
0
25 Jan 2022
Multi-turn RNN-T for streaming recognition of multi-party speech
Ilya Sklyar
A. Piunova
Xianrui Zheng
Yulan Liu
114
24
0
19 Dec 2021
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
92
4
0
13 Dec 2021
Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding
Weiran Wang
Ke Hu
Tara N. Sainath
63
21
0
01 Dec 2021
A comparison of streaming models and data augmentation methods for robust speech recognition
Jiyeon Kim
Mehul Kumar
Dhananjaya N. Gowda
Abhinav Garg
Chanwoo Kim
86
6
0
19 Nov 2021