SpecAugment on Large Scale Datasets

11 December 2019
Daniel S. Park
Yu Zhang
Chung-Cheng Chiu
Youzheng Chen
Yue Liu
William Chan
Quoc V. Le
Yonghui Wu

Papers citing "SpecAugment on Large Scale Datasets"

50 / 89 papers shown
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
Asahi Sakuma
Hiroaki Sato
Ryuga Sugano
Tadashi Kumano
Yoshihiko Kawai
Tetsuji Ogawa
30
0
0
09 Jun 2025
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
Alexandre Ducorroy
Rachid Riad
MoMe
33
0
0
26 May 2025
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
AuLLM
87
1
0
08 Jan 2025
Reassessing Noise Augmentation Methods in the Context of Adversarial Speech
Karla Pizzi
Matías Pizarro
Asja Fischer
60
0
0
03 Sep 2024
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
Xinyu Wang
Qian Wang
Haolin Huang
Yu Fang
Mengjie Xu
Qian Wang
93
0
0
31 Aug 2024
Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses
C. Li
Ngoc Thang Vu
74
4
0
26 Jul 2024
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
Tina Raissi
Christoph Luscher
Simon Berger
Ralf Schluter
Hermann Ney
67
2
0
16 Jul 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
180
2
0
09 Jul 2024
Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
Ilyass Moummad
Nicolas Farrugia
Romain Serizel
Jérémy S. P. Froidevaux
Vincent Lostanlen
84
1
0
14 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
83
8
0
14 Mar 2024
HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention
Shuang Chen
Amir Atapour-Abarghouei
Hubert P. H. Shum
ViT
71
20
0
22 Feb 2024
Consistency Based Unsupervised Self-training For ASR Personalisation
Jisi Zhang
Vandana Rajan
Haaris Mehmood
David Tuckey
Pablo Peso Parada
Md. Asif Jalal
Karthikeyan P. Saravanan
Gil Ho Lee
Jungin Lee
Seokyeong Jung
48
0
0
22 Jan 2024
Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad
Romain Serizel
Nicolas Farrugia
SSL
110
10
0
25 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
64
1
0
18 Dec 2023
Phoneme-aware Encoding for Prefix-tree-based Contextual ASR
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Hiroaki Ogawa
Siddhant Arora
Shinji Watanabe
69
7
0
15 Dec 2023
Towards Automatic Data Augmentation for Disordered Speech Recognition
Zengrui Jin
Xurong Xie
Tianzi Wang
Mengzhe Geng
Jiajun Deng
Guinan Li
Shujie Hu
Xunying Liu
60
2
0
14 Dec 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
63
1
0
09 Oct 2023
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
121
19
0
02 Oct 2023
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
Antoine Nzeyimana
SSL
136
0
0
23 Aug 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
65
1
0
12 Jun 2023
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Haoyu Tang
Zhaoyi Liu
Chang Zeng
Xinfeng Li
58
1
0
23 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
87
172
0
03 Mar 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi
Radu Timofte
VLM
86
35
0
04 Jan 2023
Visual Transformers for Primates Classification and Covid Detection
Steffen Illium
Robert Muller
Andreas Sedlmeier
Claudia Linnhoff-Popien
76
11
0
20 Dec 2022
Contextual-Utterance Training for Automatic Speech Recognition
Alejandro Gomez-Alanis
Lukas Drude
A. Schwarz
Rupak Vignesh Swaminathan
Simon Wiesler
64
1
0
27 Oct 2022
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
Gary Wang
Ekin D. Cubuk
Andrew Rosenberg
Shuyang Cheng
Ron J. Weiss
Bhuvana Ramabhadran
Pedro J. Moreno
Quoc V. Le
Daniel S. Park
117
2
0
19 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
71
2
0
16 Oct 2022
Foundation Transformers
Hongyu Wang
Shuming Ma
Shaohan Huang
Li Dong
Wenhui Wang
...
Barun Patra
Zhun Liu
Vishrav Chaudhary
Xia Song
Furu Wei
AI4CE
91
27
0
12 Oct 2022
Simple Pooling Front-ends For Efficient Audio Classification
Xubo Liu
Haohe Liu
Qiuqiang Kong
Xinhao Mei
Mark D. Plumbley
Wenwu Wang
107
17
0
03 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim
Wonyong Sung
71
2
0
01 Oct 2022
A Language Agnostic Multilingual Streaming On-Device ASR System
Yue Liu
Tara N. Sainath
Ruoming Pang
Shuo-yiin Chang
Qiumin Xu
...
Qiao Liang
Heguang Liu
Yanzhang He
Parisa Haghani
Sameer Bidichandani
AuLLM
64
11
0
29 Aug 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
108
10
0
24 Jul 2022
pMCT: Patched Multi-Condition Training for Robust Speech Recognition
Pablo Peso Parada
A. Dobrowolska
Karthikeyan P. Saravanan
Mete Ozay
97
6
0
11 Jul 2022
Speech Augmentation Based Unsupervised Learning for Keyword Spotting
Jian Luo
Jianzong Wang
Ning Cheng
Haobin Tang
Jing Xiao
SSL
72
2
0
28 May 2022
Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
Paul Wimmer
Jens Mehnert
Alexandru Paul Condurache
DD
98
21
0
17 May 2022
Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech
Ilya Sklyar
A. Piunova
Christian Osendorfer
66
6
0
10 May 2022
Efficient Training of Neural Transducer for Speech Recognition
Wei Zhou
Wilfried Michel
Ralf Schluter
Hermann Ney
AI4TS
99
24
0
22 Apr 2022
Extracting Targeted Training Data from ASR Models, and How to Mitigate It
Ehsan Amid
Om Thakkar
A. Narayanan
Rajiv Mathews
Françoise Beaufays
46
9
0
18 Apr 2022
Improving Rare Word Recognition with LM-aware MWER Training
Weiran Wang
Tongzhou Chen
Tara N. Sainath
Ehsan Variani
Rohit Prabhavalkar
...
S. Mavandadi
Cal Peyser
Trevor Strohman
Yanzhang He
David Rybach
KELM
85
13
0
15 Apr 2022
Data Augmentation for Electrocardiograms
Aniruddh Raghu
Divya Shanmugam
E. Pomerantsev
John Guttag
Collin M. Stultz
63
20
0
09 Apr 2022
Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs
Paul Wimmer
Jens Mehnert
Alexandru Paul Condurache
CVBM
64
20
0
15 Mar 2022
Korean Tokenization for Beam Search Rescoring in Speech Recognition
Kyuhong Shim
Hyewon Bae
Wonyong Sung
47
0
0
22 Feb 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
142
60
0
02 Feb 2022
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Minglun Han
Linhao Dong
Zhenlin Liang
Meng Cai
Shiyu Zhou
Zejun Ma
Bo Xu
80
46
0
30 Jan 2022
Improving the fusion of acoustic and text representations in RNN-T
Chao Zhang
Yue Liu
Zhiyun Lu
Tara N. Sainath
Shuo-yiin Chang
AI4CE
100
12
0
25 Jan 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
111
23
0
25 Jan 2022
Multi-turn RNN-T for streaming recognition of multi-party speech
Ilya Sklyar
A. Piunova
Xianrui Zheng
Yulan Liu
114
24
0
19 Dec 2021
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
92
4
0
13 Dec 2021
Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding
Weiran Wang
Ke Hu
Tara N. Sainath
63
21
0
01 Dec 2021
A comparison of streaming models and data augmentation methods for robust speech recognition
Jiyeon Kim
Mehul Kumar
Dhananjaya N. Gowda
Abhinav Garg
Chanwoo Kim
86
6
0
19 Nov 2021