2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,749 papers shown
Title
N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement
Gyeong-Hoon Lee
Tae-Woo Kim
Hanbin Bae
Min-Ji Lee
Young-Ik Kim
Hoon-Young Cho
VLM
14
19
0
29 Jun 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo
Shanda Li
Tianle Cai
Di He
Dinglan Peng
Shuxin Zheng
Guolin Ke
Liwei Wang
Tie-Yan Liu
29
50
0
23 Jun 2021
An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
Ruchao Fan
Wei Chu
Peng Chang
Jing Xiao
Abeer Alwan
29
15
0
18 Jun 2021
Multi-mode Transformer Transducer with Stochastic Future Context
Kwangyoun Kim
Felix Wu
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
30
9
0
17 Jun 2021
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition
Xiong Wang
Sining Sun
Lei Xie
Long Ma
24
18
0
17 Jun 2021
Layer Pruning on Demand with Intermediate CTC
Jaesong Lee
Jingu Kang
Shinji Watanabe
19
16
0
17 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
24
53
0
16 Jun 2021
Collaborative Training of Acoustic Encoders for Speech Recognition
Varun K. Nagaraja
Yangyang Shi
Ganesh Venkatesh
Ozlem Kalinli
M. Seltzer
Vikas Chandra
43
11
0
16 Jun 2021
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain
Pengcheng Guo
Xuankai Chang
Shinji Watanabe
Lei Xie
19
18
0
16 Jun 2021
RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis
Rohola Zandie
Mohammad H. Mahoor
Julia Madsen
Eshrat S. Emamian
32
25
0
15 Jun 2021
Learning Audio-Visual Dereverberation
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
31
32
0
14 Jun 2021
Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
O. Kimball
17
4
0
14 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
55
2,770
0
14 Jun 2021
End-to-end Neural Diarization: From Transformer to Conformer
Yi Y. Liu
Eunjung Han
Chul Lee
A. Stolcke
22
40
0
14 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
60
351
0
13 Jun 2021
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Cheng-I Jeff Lai
Yang Zhang
Alexander H. Liu
Shiyu Chang
Yi-Lun Liao
Yung-Sung Chuang
Kaizhi Qian
Sameer Khurana
David D. Cox
James R. Glass
VLM
66
70
0
10 Jun 2021
Balanced End-to-End Monolingual pre-training for Low-Resourced Indic Languages Code-Switching Speech Recognition
A. Hussein
Shammur A. Chowdhury
Najim Dehak
Ahmed M. Ali
16
2
0
10 Jun 2021
GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures
Ivan Chelombiev
Daniel Justus
Douglas Orr
A. Dietrich
Frithjof Gressmann
A. Koliousis
Carlo Luschi
24
5
0
10 Jun 2021
U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition
Di Wu
Binbin Zhang
Chao Yang
Zhendong Peng
Wenjing Xia
Xiaoyu Chen
X. Lei
21
47
0
10 Jun 2021
Audiovisual transfer learning for audio tagging and sound event detection
Wim Boes
Hugo Van hamme
CLIP
VLM
13
11
0
09 Jun 2021
Do Transformers Really Perform Bad for Graph Representation?
Chengxuan Ying
Tianle Cai
Shengjie Luo
Shuxin Zheng
Guolin Ke
Di He
Yanming Shen
Tie-Yan Liu
GNN
30
433
0
09 Jun 2021
A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition
Shigeki Karita
Yotaro Kubo
M. Bacchiani
Llion Jones
19
13
0
09 Jun 2021
Unsupervised Automatic Speech Recognition: A Review
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM
SSL
39
56
0
09 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
53
1,088
0
08 Jun 2021
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition
Max W. Y. Lam
Jun Wang
Chao Weng
Dan Su
Dong Yu
29
6
0
08 Jun 2021
LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation
Ruizhi Shao
Gaochang Wu
Yuemei Zhou
Ying Fu
Yebin Liu
ViT
21
42
0
08 Jun 2021
Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios
E. Tsunoo
Kentarou Shibata
Chaitanya Narisetty
Yosuke Kashiwagi
Shinji Watanabe
19
12
0
07 Jun 2021
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Tatiana Likhomanenko
Qiantong Xu
Gabriel Synnaeve
R. Collobert
A. Rogozhnikov
OOD
ViT
33
54
0
06 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
21
45
0
03 Jun 2021
Dual Script E2E framework for Multilingual and Code-Switching ASR
Mari Ganesh Kumar
Jom Kuriakose
Anand Thyagachandran
A. Arunkumar
Ashish Seth
L. D. Prasad
Saish Jaiswal
Anusha Prakash
H. Murthy
32
10
0
02 Jun 2021
Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR
Shammur A. Chowdhury
A. Hussein
Ahmed Abdelali
Ahmed M. Ali
14
33
0
31 May 2021
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
Sangwook Park
D. Han
Mounya Elhilali
30
12
0
27 May 2021
Unsupervised Speech Recognition
Alexei Baevski
Wei-Ning Hsu
Alexis Conneau
Michael Auli
SSL
26
270
0
24 May 2021
Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition
Bhargav Pulugundla
Yang Gao
Brian King
Gokce Keskin
Sri Harish Reddy Mallidi
Minhua Wu
J. Droppo
Roland Maas
19
2
0
12 May 2021
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
Chen Xu
Bojie Hu
Yanyang Li
Yuhao Zhang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
22
75
0
12 May 2021
GSPMD: General and Scalable Parallelization for ML Computation Graphs
Yuanzhong Xu
HyoukJoong Lee
Dehao Chen
Blake A. Hechtman
Yanping Huang
...
Noam M. Shazeer
Shibo Wang
Tao Wang
Yonghui Wu
Zhifeng Chen
MoE
28
128
0
10 May 2021
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Yichong Leng
Xu Tan
Linchen Zhu
Jin Xu
Renqian Luo
Linquan Liu
Tao Qin
Xiang-Yang Li
Ed Lin
Tie-Yan Liu
KELM
24
63
0
09 May 2021
SpeechNet: A Universal Modularized Model for Speech Processing Tasks
Yi-Chen Chen
Po-Han Chi
Shu-Wen Yang
Kai-Wei Chang
Jheng-hao Lin
Sung-Feng Huang
Da-Rong Liu
Chi-Liang Liu
Cheng-Kuang Lee
Hung-yi Lee
MoE
21
17
0
07 May 2021
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts
Zhao You
Shulin Feng
Dan Su
Dong Yu
MoE
24
51
0
07 May 2021
Efficient Weight factorization for Multilingual Speech Recognition
Ngoc-Quan Pham
Tuan-Nam Nguyen
S. Stueker
A. Waibel
43
19
0
07 May 2021
On the limit of English conversational speech recognition
Zoltán Tüske
G. Saon
Brian Kingsbury
22
50
0
03 May 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo-wen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
34
76
0
30 Apr 2021
Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models
Thibault Doutre
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Olivier Siohan
Liangliang Cao
35
5
0
25 Apr 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
27
34
0
19 Apr 2021
Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition
Wei Zhou
Mohammad Zeineldeen
Zuoyun Zheng
Ralf Schluter
Hermann Ney
25
14
0
19 Apr 2021
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
Yifei Ding
M. Jia
Qiuhua Miao
Yudong Cao
16
268
0
19 Apr 2021
Efficient conformer-based speech recognition with linear attention
Shengqiang Li
Menglong Xu
Xiao-Lei Zhang
24
20
0
14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion
Tomoki Hayashi
Wen-Chin Huang
Kazuhiro Kobayashi
T. Toda
6
23
0
14 Apr 2021
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
Hirofumi Inaguma
Tatsuya Kawahara
Shinji Watanabe
29
42
0
13 Apr 2021
Lessons on Parameter Sharing across Layers in Transformers
Sho Takase
Shun Kiyono
25
84
0
13 Apr 2021