SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
74
8
0
26 Sep 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Yifan Peng
Jinchuan Tian
Brian Yan
Dan Berrebbi
Xuankai Chang
...
Yui Sudo
Muhammad Shakeel
Jee-weon Jung
Soumi Maiti
Shinji Watanabe
VLM
141
41
0
25 Sep 2023
Asca: less audio data is more insightful
Xiang Li
Jing Chen
Chao Li
Hongwu Lv
50
0
0
23 Sep 2023
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
Jiamin Xie
Ke Li
Jinxi Guo
Andros Tjandra
Shangguan Yuan
Leda Sari
Chunyang Wu
Junteng Jia
Jay Mahadeokar
Ozlem Kalinli
125
2
0
22 Sep 2023
End-to-End Speech Recognition Contextualization with Large Language Models
Egor Lakomkin
Chunyang Wu
Yassir Fathullah
Ozlem Kalinli
M. Seltzer
Christian Fuegen
126
22
0
19 Sep 2023
Instruction-Following Speech Recognition
Cheng-I Jeff Lai
Zhiyun Lu
Liangliang Cao
Ruoming Pang
AuLLM
77
6
0
18 Sep 2023
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Hyun-Seo Shin
Ju-Sung Heo
Ju-ho Kim
Chanmann Lim
Wonbin Kim
Ha-Jin Yu
59
7
0
15 Sep 2023
Fine-tune the pretrained ATST model for sound event detection
Nian Shao
Xian Li
Xiaofei Li
72
27
0
15 Sep 2023
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
Guanlong Zhao
Yongqiang Wang
Jason W. Pelecanos
Yu Zhang
Hank Liao
Yiling Huang
Han Lu
Quan Wang
85
4
0
14 Sep 2023
RoDia: A New Dataset for Romanian Dialect Identification from Speech
Codrut Rotaru
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
62
4
0
06 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
81
5
0
05 Sep 2023
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia
Shuo-yiin Chang
Weiran Wang
Zhong Meng
Hao Zhang
Tara N. Sainath
48
2
0
14 Aug 2023
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
Hanjing Zhu
Dongji Gao
Gaofeng Cheng
Daniel Povey
Pengyuan Zhang
Yonghong Yan
NoLa
76
4
0
12 Aug 2023
A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis
Li Xiao
Xiuping Yang
Xinhong Li
Weiping Tu
Xiong Chen
Weiyan Yi
Jie Lin
Yuhong Yang
Yanzhen Ren
61
2
0
25 Jul 2023
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
Suyoun Kim
Akshat Shrivastava
Duc Le
Ju Lin
Ozlem Kalinli
M. Seltzer
AuLLM
91
2
0
22 Jul 2023
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding
Siddhant Arora
Hayato Futami
Yosuke Kashiwagi
E. Tsunoo
Brian Yan
Shinji Watanabe
64
4
0
20 Jul 2023
Globally Normalising the Transducer for Streaming Speech Recognition
Rogier van Dalen
68
0
0
20 Jul 2023
PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification
Wonbin Kim
Hyun-Seo Shin
Ju-ho Kim
Ju-Sung Heo
Chanmann Lim
Ha-Jin Yu
100
0
0
20 Jul 2023
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring
Juan Sebastián Canas
Maria Paula Toro-Gómez
L. S. M. Sugai
H. Benítez-Restrepo
J. Rudas
...
José Luiz Massao Moreira Sugai
Carolina Emília dos Santos
R. Bastos
Diego Llusia
J. Ulloa
73
21
0
11 Jul 2023
Can Generative Large Language Models Perform ASR Error Correction?
Rao Ma
Mengjie Qian
Potsawee Manakul
Mark Gales
Kate Knill
AuLLM
KELM
82
60
0
09 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
93
2
0
04 Jul 2023
Leveraging Cross-Utterance Context For ASR Decoding
Robert Flynn
Anton Ragni
71
1
0
29 Jun 2023
Probabilistic Linguistic Knowledge and Token-level Text Augmentation
Zhengxiang Wang
67
0
0
29 Jun 2023
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
129
47
0
28 Jun 2023
Multi-perspective Information Fusion Res2Net with RandomSpecmix for Fake Speech Detection
Shunbo Dong
Jun Xue
Cunhang Fan
Kang Zhu
Yujie Chen
Zhao Lv
64
4
0
27 Jun 2023
Recent Advances in Direct Speech-to-text Translation
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
116
23
0
20 Jun 2023
Frequency & Channel Attention for Computationally Efficient Sound Event Detection
Hyeonuk Nam
Seong-Hu Kim
D. Min
Yong-Hwa Park
67
9
0
20 Jun 2023
Correlation Clustering of Bird Sounds
David Stein
Bjoern Andres
58
1
0
16 Jun 2023
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
Muhammad Umar Farooq
Thomas Hain
42
2
0
14 Jun 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
Xianzhao Chen
Yist Y. Lin
Kang Wang
Yi He
Zejun Ma
62
2
0
09 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
Takatomo Kano
A. Ogawa
Marc Delcroix
61
8
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
92
3
0
07 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
116
15
0
05 Jun 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Tianyi Xu
Zhanheng Yang
Kaixun Huang
Pengcheng Guo
Aoting Zhang
Biao Li
Changru Chen
Chong Li
Linfu Xie
83
12
0
01 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Esteve
102
3
0
01 Jun 2023
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
Chuxu Zhang
P. Woodland
47
6
0
30 May 2023
E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks
Arshdeep Singh
Haohe Liu
Mark D. Plumbley
VLM
63
5
0
30 May 2023
Building Accurate Low Latency ASR for Streaming Voice Search
Abhinav Goyal
Nikesh Garera
27
1
0
29 May 2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang
Yinghao Aaron Li
N. Mesgarani
CLL
50
1
0
29 May 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
87
4
0
29 May 2023
Bridging the Granularity Gap for Acoustic Modeling
Chen Xu
Yuhao Zhang
Chengbo Jiao
Xiaoqian Liu
Chi Hu
Xin Zeng
Tong Xiao
Anxiang Ma
Huizhen Wang
JingBo Zhu
65
6
0
27 May 2023
Transfer Learning for Personality Perception via Speech Emotion Recognition
Yuanchao Li
P. Bell
Catherine Lai
CVBM
66
4
0
25 May 2023
Mixture-of-Expert Conformer for Streaming Multilingual ASR
Ke Hu
Yue Liu
Tara N. Sainath
Yu Zhang
F. Beaufays
MoE
124
14
0
25 May 2023
Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio
Jialu Li
M. Hasegawa-Johnson
Nancy L. McElwain
94
11
0
21 May 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Jiyang Tang
William Chen
Xuankai Chang
Shinji Watanabe
B. MacWhinney
59
12
0
19 May 2023
Unsupervised ASR via Cross-Lingual Pseudo-Labeling
Tatiana Likhomanenko
Loren Lugosch
R. Collobert
48
0
0
19 May 2023
Low-complexity deep learning frameworks for acoustic scene classification using teacher-student scheme and multiple spectrograms
L. D. Pham
Dat Ngo
Cam Le
Anahid N. Jalali
Alexander Schindler
56
1
0
16 May 2023
Semi-Supervised Federated Learning for Keyword Spotting
Enmao Diao
Eric W. Tramel
Jie Ding
Tao Zhang
42
3
0
09 May 2023
Employing Hybrid Deep Neural Networks on Dari Speech
J. Baktash
Mursal Dawodi
43
0
0
04 May 2023
Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks
Yun Tang
Anna Y. Sun
Hirofumi Inaguma
Xinyue Chen
Ning Dong
Xutai Ma
Paden Tomasello
J. Pino
108
22
0
04 May 2023