Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,749 papers shown
Title
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
Hirofumi Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
65
11
0
27 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
33
1
0
23 Sep 2021
Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu
Sai Mitheran
Juhi Kamdar
Meet Gandhi
40
8
0
21 Sep 2021
Audio-Visual Speech Recognition is Worth 32
×
\times
×
32
×
\times
×
8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
29
7
0
20 Sep 2021
Influence of ASR and Language Model on Alzheimer's Disease Detection
Joan Codina-Filbà
Guillermo Cámbara
Jordi Luque
Mireia Farrús
21
2
0
20 Sep 2021
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition
Guolin Zheng
Yubei Xiao
Ke Gong
Pan Zhou
Xiaodan Liang
Liang Lin
32
26
0
19 Sep 2021
Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition
F. Weninger
M. Gaudesi
Ralf Leibold
R. Gemello
P. Zhan
35
4
0
17 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mañke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
91
152
0
17 Sep 2021
PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription
Chen Zhang
Jiaxing Yu
Luchin Chang
Xu Tan
Jiawei Chen
Tao Qin
Kecheng Zhang
22
15
0
16 Sep 2021
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
38
55
0
15 Sep 2021
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Felix Wu
Kwangyoun Kim
Jing Pan
Kyu Jeong Han
Kilian Q. Weinberger
Yoav Artzi
27
71
0
14 Sep 2021
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring
Hirofumi Inaguma
Yosuke Higuchi
Kevin Duh
Tatsuya Kawahara
Shinji Watanabe
63
11
0
09 Sep 2021
Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021
Li Zhang
Huan Zhao
Qinling Meng
Yanli Chen
Min Liu
Lei Xie
32
10
0
08 Sep 2021
A Survey of Sound Source Localization with Deep Learning Methods
Pierre-Amaury Grumiaux
Srdjan Kitić
Laurent Girin
Alexandre Guérin
33
246
0
08 Sep 2021
Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition
Maxime Burchi
Valentin Vielzeuf
37
84
0
31 Aug 2021
Multi-Channel Transformer Transducer for Speech Recognition
Feng-Ju Chang
Martin H. Radfar
Athanasios Mouchtaris
M. Omologo
26
19
0
30 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Gary Wang
Pedro J. Moreno
SSL
25
36
0
27 Aug 2021
Self-Attention for Audio Super-Resolution
Nathanaël Carraz Rakotonirina
SupR
38
23
0
26 Aug 2021
Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer
Krishna D N Freshworks
29
7
0
22 Aug 2021
A Dual-Decoder Conformer for Multilingual Speech Recognition
Krishna D N Freshworks
9
1
0
22 Aug 2021
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers
Juntae Kim
Jee-Hye Lee
24
6
0
22 Aug 2021
Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification
Shyam A. Tailor
R. D. Jong
Tiago Azevedo
Matthew Mattina
Partha P. Maji
3DPC
GNN
25
12
0
13 Aug 2021
Masked Acoustic Unit for Mispronunciation Detection and Correction
Zhan Zhang
Yuehai Wang
Jianyi Yang
25
3
0
12 Aug 2021
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSL
VLM
12
412
0
07 Aug 2021
Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification
Sangeeta Ghangam
Daniel Whitenack
Joshua Nemecek
16
4
0
04 Aug 2021
Decoupling recognition and transcription in Mandarin ASR
Jiahong Yuan
Xingyu Cai
Dongji Gao
Renjie Zheng
Liang Huang
Kenneth Ward Church
36
9
0
02 Aug 2021
USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments
M. Musaev
Saida Mussakhojayeva
Ilyos Khujayorov
Yerbolat Khassanov
M. Ochilov
H. A. Varol
16
19
0
30 Jul 2021
Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers
Piper Wolters
Logan Sizemore
Chris Daw
Brian Hutchinson
Lauren A. Phillips
37
11
0
28 Jul 2021
CarneliNet: Neural Mixture Model for Automatic Speech Recognition
A. Kalinov
Somshubra Majumdar
Jagadeesh Balam
Boris Ginsburg
MoE
24
3
0
22 Jul 2021
Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech
Duo Ma
Nana Hou
Van Tung Pham
Haihua Xu
Chng Eng Siong
33
22
0
22 Jul 2021
Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
Tianzi Wang
Yuya Fujita
Xuankai Chang
Shinji Watanabe
13
15
0
20 Jul 2021
Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection
Parthasaarathy Sudarsanam
A. Politis
K. Drossos
16
13
0
20 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
26
67
0
19 Jul 2021
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording
Hirofumi Inaguma
Tatsuya Kawahara
19
2
0
15 Jul 2021
Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
Shengqiang Li
Menglong Xu
Xiao-Lei Zhang
18
9
0
13 Jul 2021
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021
Takashi Maekaku
Xuankai Chang
Yuya Fujita
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
115
13
0
13 Jul 2021
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
Jian Luo
Jianzong Wang
Ning Cheng
Jing Xiao
SSL
27
6
0
09 Jul 2021
On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Xiaohui Zhang
Vimal Manohar
David C. Zhang
Frank Zhang
Yangyang Shi
Nayan Singhal
Julian Chan
Fuchun Peng
Yatharth Saraf
M. Seltzer
20
14
0
09 Jul 2021
Improved Language Identification Through Cross-Lingual Self-Supervised Learning
Andros Tjandra
Diptanu Gon Choudhury
Frank Zhang
Kritika Singh
Alexis Conneau
Alexei Baevski
Assaf Sela
Yatharth Saraf
Michael Auli
VLM
SSL
24
35
0
08 Jul 2021
Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers
Huahuan Zheng
Wenjie Peng
Zhijian Ou
Jinsong Zhang
28
5
0
07 Jul 2021
GLiT: Neural Architecture Search for Global and Local Image Transformer
Boyu Chen
Peixia Li
Chuming Li
Baopu Li
Lei Bai
Chen Lin
Ming Sun
Junjie Yan
Wanli Ouyang
ViT
35
85
0
07 Jul 2021
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio
Naoyuki Kanda
Xiong Xiao
Jian Wu
Tianyan Zhou
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
19
14
0
06 Jul 2021
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu
Xiaoqian Liu
Xiaowen Liu
Laohu Wang
Canan Huang
Tong Xiao
Jingbo Zhu
34
5
0
06 Jul 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR
Jian Wu
Zhuo Chen
Sanyuan Chen
Yu-Huan Wu
Takuya Yoshioka
Naoyuki Kanda
Shujie Liu
Jinyu Li
27
17
0
05 Jul 2021
Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition
Timo Lohrenz
P. Schwarz
Zhengyang Li
Tim Fingscheidt
21
11
0
02 Jul 2021
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition
Niko Moritz
Takaaki Hori
Jonathan Le Roux
6
20
0
02 Jul 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
37
2
0
01 Jul 2021
StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR
Hirofumi Inaguma
Tatsuya Kawahara
25
4
0
01 Jul 2021
IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov
Manuel Mager
Ngoc Thang Vu
37
6
0
30 Jun 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Yuma Koizumi
Shigeki Karita
Scott Wisdom
Hakan Erdogan
J. Hershey
Llion Jones
M. Bacchiani
19
41
0
30 Jun 2021
Previous
1
2
3
...
31
32
33
34
35
Next