SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 734 papers shown
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
34
13
0
19 Sep 2023
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion
Yujin Jeong
Won-Wha Ryoo
Seunghyun Lee
Dabin Seo
Wonmin Byeon
Sangpil Kim
Jinkyu Kim
DiffM
32
29
0
08 Sep 2023
BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer
Kunkun Pang
Dafei Qin
Yingruo Fan
Julian Habekost
Takaaki Shiratori
Junichi Yamagishi
Taku Komura
SLR
ViT
26
19
0
07 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
29
4
0
05 Sep 2023
ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers
Yi Liu
Yuekang Li
Gelei Deng
Felix Juefei Xu
Yao Du
Cen Zhang
Chengwei Liu
Yeting Li
Lei Ma
Yang Liu
24
3
0
30 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
Fangyun Wei
Yutong Chen
SLR
30
28
0
21 Aug 2023
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Jinchuan Tian
Jianwei Yu
Hangting Chen
Brian Yan
Chao Weng
Dong Yu
Shinji Watanabe
37
1
0
19 Aug 2023
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
Hanjing Zhu
Dongji Gao
Gaofeng Cheng
Daniel Povey
Pengyuan Zhang
Yonghong Yan
NoLa
38
4
0
12 Aug 2023
Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio
Yang Zhang
Krishna C. Puvvada
Vitaly Lavrukhin
Boris Ginsburg
38
14
0
09 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
21
0
0
05 Aug 2023
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Tian-Hao Zhang
Dinghao Zhou
Guiping Zhong
Jiaming Zhou
Baoxiang Li
20
3
0
26 Jul 2023
A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis
Li Xiao
Xiuping Yang
Xinhong Li
Weiping Tu
Xiong Chen
Weiyan Yi
Jie Lin
Yuhong Yang
Yanzhen Ren
29
2
0
25 Jul 2023
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
Suyoun Kim
Akshat Shrivastava
Duc Le
Ju Lin
Ozlem Kalinli
M. Seltzer
AuLLM
33
2
0
22 Jul 2023
PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification
Wonbin Kim
Hyun-Seo Shin
Ju-ho Kim
Ju-Sung Heo
Chanmann Lim
Ha-Jin Yu
26
0
0
20 Jul 2023
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
30
23
0
14 Jul 2023
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring
Juan Sebastián Canas
Maria Paula Toro-Gómez
L. S. M. Sugai
H. Benítez-Restrepo
J. Rudas
...
José Luiz Massao Moreira Sugai
Carolina Emília dos Santos
R. Bastos
Diego Llusia
J. Ulloa
41
18
0
11 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
34
2
0
04 Jul 2023
Dataset balancing can hurt model performance
R. C. Moore
D. Ellis
Eduardo Fonseca
Shawn Hershey
A. Jansen
Manoj Plakal
35
9
0
30 Jun 2023
Leveraging Cross-Utterance Context For ASR Decoding
Robert Flynn
Anton Ragni
33
1
0
29 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
35
8
0
23 Jun 2023
Mixture Encoder for Joint Speech Separation and Recognition
Simon Berger
Peter Vieting
Christoph Boeddeker
Ralf Schluter
Reinhold Häb-Umbach
26
6
0
21 Jun 2023
Recent Advances in Direct Speech-to-text Translation
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
27
18
0
20 Jun 2023
Frequency & Channel Attention for Computationally Efficient Sound Event Detection
Hyeonuk Nam
Seong-Hu Kim
D. Min
Yong-Hwa Park
19
9
0
20 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
34
9
0
18 Jun 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
K. Lakshminarayana
C. Dittmar
N. Pia
Emanuel Habets
34
0
0
16 Jun 2023
Correlation Clustering of Bird Sounds
David Stein
Bjoern Andres
34
1
0
16 Jun 2023
Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech
Vishwanath Pratap Singh
Md. Sahidullah
Tomi Kinnunen
23
3
0
13 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
Takatomo Kano
A. Ogawa
Marc Delcroix
29
9
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
36
2
0
07 Jun 2023
Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition
Ashutosh Chaubey
Sparsh Sinha
Susmita Ghose
19
0
0
01 Jun 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Tianyi Xu
Zhanheng Yang
Kaixun Huang
Pengcheng Guo
Aoting Zhang
Biao Li
Changru Chen
Chong Li
Linfu Xie
22
10
0
01 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Esteve
28
3
0
01 Jun 2023
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Jianyuan Sun
Xubo Liu
Xinhao Mei
V. Kılıç
Mark D. Plumbley
Wenwu Wang
33
3
0
30 May 2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang
Yinghao Aaron Li
N. Mesgarani
CLL
24
1
0
29 May 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
37
4
0
29 May 2023
Context-aware attention layers coupled with optimal transport domain adaptation and multimodal fusion methods for recognizing dementia from spontaneous speech
Loukas Ilias
D. Askounis
34
9
0
25 May 2023
Transfer Learning for Personality Perception via Speech Emotion Recognition
Yuanchao Li
P. Bell
Catherine Lai
CVBM
37
4
0
25 May 2023
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
36
34
0
24 May 2023
Improving speech translation by fusing speech and text
Wenbiao Yin
Zhicheng Liu
Chengqi Zhao
Tao Wang
Jian-Fei Tong
Rong Ye
15
4
0
23 May 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Kaixun Huang
Aoting Zhang
Zhanheng Yang
Pengcheng Guo
Bingshen Mu
Tianyi Xu
Linfu Xie
35
16
0
21 May 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Jiyang Tang
William Chen
Xuankai Chang
Shinji Watanabe
B. MacWhinney
24
10
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
31
17
0
18 May 2023
Low-complexity deep learning frameworks for acoustic scene classification using teacher-student scheme and multiple spectrograms
L. D. Pham
Dat Ngo
Cam Le
Anahid N. Jalali
Alexander Schindler
19
1
0
16 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
33
3
0
09 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
48
5
0
02 May 2023
Contrastive Speech Mixup for Low-resource Keyword Spotting
Dianwen Ng
Ruixi Zhang
J. Yip
Chong Zhang
Yukun Ma
Trung Hieu Nguyen
Chongjia Ni
Eng Siong Chng
B. Ma
38
10
0
02 May 2023
When Deep Learning Meets Polyhedral Theory: A Survey
Joey Huchette
Gonzalo Muñoz
Thiago Serra
Calvin Tsay
AI4CE
96
33
0
29 Apr 2023
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
38
4
0
28 Apr 2023
DropDim: A Regularization Method for Transformer Networks
Hao Zhang
Dan Qu
Kejia Shao
Xu Yang
28
12
0
20 Apr 2023
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xukui Yang
Dan Qu
Weiqiang Zhang
37
9
0
20 Apr 2023