Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1904.08779
Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 734 papers shown
Title
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Zhengrui Ma
Qingkai Fang
Shaolei Zhang
Shoutao Guo
Yang Feng
Min Zhang
53
10
0
11 Jun 2024
Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines
Philipp Wagner
Andreas Triantafyllopoulos
Alexander Gebhard
Björn Schuller
45
0
0
10 Jun 2024
Contrastive Learning from Synthetic Audio Doppelgängers
Manuel Cherep
Nikhil Singh
45
1
0
09 Jun 2024
Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection
Hyeonuk Nam
Seong-Hu Kim
D. Min
Junhyeok Lee
Yong-Hwa Park
34
3
0
08 Jun 2024
Multi-Microphone Speech Emotion Recognition using the Hierarchical Token-semantic Audio Transformer Architecture
Ohad Cohen
G. Hazan
Sharon Gannot
42
1
0
05 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
35
2
0
04 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
52
2
0
04 Jun 2024
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Zijin Gu
Tatiana Likhomanenko
Richard He Bai
Erik McDermott
R. Collobert
Navdeep Jaitly
AuLLM
60
2
0
24 May 2024
A Comprehensive Survey on Data Augmentation
Zaitian Wang
Pengfei Wang
Kunpeng Liu
Pengyang Wang
Yanjie Fu
Chang-Tien Lu
Charu Aggarwal
Jian Pei
Yuanchun Zhou
ViT
109
23
0
15 May 2024
Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants
Chloe Sekkat
Fanny Leroy
Salima Mdhaffar
Blake Perry Smith
Yannick Esteve
Joseph Dureau
A. Coucke
32
1
0
14 May 2024
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
June-Woo Kim
Miika Toikkanen
Sangmin Bae
Minseok Kim
Ho-Young Jung
40
5
0
05 May 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
38
0
0
30 Apr 2024
A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
Sunil Kumar Kopparapu
Ashish Panda
31
0
0
29 Apr 2024
Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition
H. Ghaffari
Paul Devos
50
0
0
26 Apr 2024
Audio Anti-Spoofing Detection: A Survey
Menglu Li
Yasaman Ahmadiadli
Xiao-Ping Zhang
50
19
0
22 Apr 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Kin Wai Lau
Yasar Abbas Ur Rehman
L. Po
46
1
0
21 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
40
20
0
15 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
36
3
0
12 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain
Khai Le-Duc
LM&MA
44
9
0
08 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
Exploration of Adapter for Noise Robust Automatic Speech Recognition
Hao Shi
Tatsuya Kawahara
47
5
0
28 Feb 2024
Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects
Han Wang
Sijia Yu
Chunyang Chen
Burak Turhan
Xiaodong Zhu
ELM
MLAU
28
2
0
26 Feb 2024
Understanding Test-Time Augmentation
Masanari Kimura
ViT
24
29
0
10 Feb 2024
Masked Audio Modeling with CLAP and Multi-Objective Learning
Yifei Xin
Xiulian Peng
Yan Lu
54
8
0
29 Jan 2024
Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?
Stefano Damiano
Luca Bondi
Shabnam Ghaffarzadegan
Andre Guntoro
Toon van Waterschoot
18
5
0
17 Jan 2024
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
Junwen Bai
Bo-wen Li
Qiujia Li
Tara N. Sainath
Trevor Strohman
38
3
0
17 Jan 2024
Promptformer: Prompted Conformer Transducer for ASR
Sergio Duarte Torres
Arunasish Sen
Aman Rana
Lukas Drude
Alejandro Gomez-Alanis
Andreas Schwarz
Leif Rädel
Volker Leutnant
40
3
0
14 Jan 2024
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
38
1
0
18 Dec 2023
NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
Hyunjun Heo
U.H Shin
Ran Lee
YoungJu Cheon
Hyung-Min Park
31
9
0
14 Dec 2023
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi
Hyowon Wi
Jayoung Kim
Yehjin Shin
Kookjin Lee
Nathaniel Trask
Noseong Park
40
4
0
07 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
30
17
0
27 Nov 2023
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
Jian Zhu
Changbing Yang
Farhan Samir
Jahurul Islam
32
4
0
14 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
42
280
0
14 Nov 2023
Soundbay: Deep Learning Framework for Marine Mammals and Bioacoustic Research
Noam Bressler
Michael Faran
Amit Galor
Michael Moshe Michelashvili
Tomer Nachshon
Noa Weiss
48
0
0
07 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
35
2
0
01 Nov 2023
Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features
Daniel Neururer
Volker Dellwo
Thilo Stadelmann
44
2
0
01 Nov 2023
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System
T. Park
He Huang
Ante Jukić
Kunal Dhawan
Krishna C. Puvvada
Nithin Rao Koluguri
Nikolay Karpov
A. Laptev
Jagadeesh Balam
Boris Ginsburg
35
6
0
18 Oct 2023
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation
T. Park
He Huang
Coleman Hooper
Nithin Rao Koluguri
Kunal Dhawan
Ante Jukić
Jagadeesh Balam
Boris Ginsburg
21
7
0
18 Oct 2023
Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-task Speech Recognition
Hillary Ngai
Rohan Agrawal
Neeraj Gaur
Ronny Huang
Parisa Haghani
P. M. Mengibar
MoMe
46
0
0
17 Oct 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
39
8
0
16 Oct 2023
Efficient Supervised Training of Audio Transformers for Music Representation Learning
Pablo Alonso-Jiménez
Xavier Serra
Dmitry Bogdanov
ViT
35
3
0
28 Sep 2023
LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Guodong Ma
Wenxuan Wang
Yuke Li
Yuting Yang
Binbin Du
Haoran Fu
31
5
0
28 Sep 2023
Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
SSL
35
1
0
27 Sep 2023
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
A. Hussein
Brian Yan
Antonios Anastasopoulos
Shinji Watanabe
Sanjeev Khudanpur
37
3
0
27 Sep 2023
Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023
Sara Papi
Marco Gaido
Matteo Negri
45
7
0
27 Sep 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
38
7
0
26 Sep 2023
Human Transcription Quality Improvement
Jian Gao
Hanbo Sun
Cheng Cao
Zheng Du
43
2
0
24 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
32
1
0
22 Sep 2023
Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems
Luis Carvalho
Tobias Washüttl
Gerhard Widmer
29
4
0
21 Sep 2023
Previous
1
2
3
4
5
...
13
14
15
Next