2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,750 papers shown
Title
Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting
Saket Dingliwal
Monica Sunkara
S. Bodapati
S. Ronanki
Jeffrey J. Farris
Katrin Kirchhoff
33
0
0
18 Oct 2022
Sub-8-bit quantization for on-device speech recognition: a regularization-free approach
Kai Zhen
Martin H. Radfar
Hieu Duy Nguyen
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
MQ
28
8
0
17 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
29
2
0
16 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia
Mihee Hong
Jingyu Hou
Kailong Ren
Sifan Ma
Jin Wang
Fangzhen Peng
Yinglin Ji
Lin Yang
Junjie Wang
25
1
0
14 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
53
35
0
13 Oct 2022
Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy
Sarina Meyer
Pascal Tilli
Pavel Denisov
Florian Lux
Julia Koch
Ngoc Thang Vu
28
31
0
13 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
37
14
0
12 Oct 2022
Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge
Shuhao Deng
Chengfei Li
Jinfeng Bai
Qingqing Zhang
Weiqiang Zhang
Runyan Yang
Gaofeng Cheng
Pengyuan Zhang
Yonghong Yan
20
1
0
12 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
24
10
0
11 Oct 2022
Scaling Up Deliberation for Multilingual ASR
Ke Hu
Bo-wen Li
Tara N. Sainath
LRM
30
9
0
11 Oct 2022
MFCCA:Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yuhao Liang
Zhihao Du
Yuxiao Lin
Linfu Xie
33
11
0
11 Oct 2022
CTC Alignments Improve Autoregressive Translation
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
46
33
0
11 Oct 2022
LW-ISP: A Lightweight Model with ISP and Deep Learning
Hongyang Chen
Kaisheng Ma
VLM
24
1
0
08 Oct 2022
Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture
Lei Wang
Benedict Yeoh
Jun Wah Ng
40
7
0
07 Oct 2022
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition
Somshubra Majumdar
Shantanu Acharya
Vitaly Lavrukhin
Boris Ginsburg
32
3
0
06 Oct 2022
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Mayumi Ohta
Julia Kreutzer
Stefan Riezler
19
0
0
05 Oct 2022
That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation
Abitha Thankaraj
Lerrel Pinto
35
14
0
03 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim
Wonyong Sung
27
2
0
01 Oct 2022
Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition
Jash Rathod
Nauman Dawalatabad
Shatrughan Singh
Dhananjaya N. Gowda
25
9
0
01 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
Chendong Zhao
Jianzong Wang
Wentao Wei
Xiaoyang Qu
Haoqian Wang
Jing Xiao
41
2
0
30 Sep 2022
Dilated Neighborhood Attention Transformer
Ali Hassani
Humphrey Shi
ViT
MedIm
33
68
0
29 Sep 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin H. Radfar
Rohit Barnwal
Rupak Vignesh Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
36
13
0
29 Sep 2022
A Survey on Physical Adversarial Attack in Computer Vision
Donghua Wang
Wen Yao
Tingsong Jiang
Guijian Tang
Xiaoqian Chen
AAML
71
38
0
28 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
54
28
0
28 Sep 2022
Direct Speech Translation for Automatic Subtitling
Sara Papi
Marco Gaido
Alina Karakanta
Mauro Cettolo
Matteo Negri
Marco Turchi
59
11
0
27 Sep 2022
Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Wim Boes
Hugo Van hamme
19
1
0
26 Sep 2022
End-to-End Lyrics Recognition with Self-supervised Learning
Xiangyu Zhang
Shuyue Stella Li
Zhanhong He
R. Togneri
Leibny Paola García
30
0
0
26 Sep 2022
Unsupervised domain adaptation for speech recognition with unsupervised error correction
Long Mai
Julie Carson-Berndsen
43
8
0
24 Sep 2022
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting
Jie Wang
Yuji Liu
Binling Wang
Yiming Zhi
Song Li
Shipeng Xia
Jiayang Zhang
Feng Tong
Lin Li
Q. Hong
31
6
0
24 Sep 2022
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Sherif Abdulatif
Ru Cao
Bin Yang
29
62
0
22 Sep 2022
Graph Reasoning Transformer for Image Parsing
Dong Zhang
Jinhui Tang
Kwang-Ting Cheng
ViT
24
16
0
20 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
31
1
0
17 Sep 2022
What Do Children and Parents Want and Perceive in Conversational Agents? Towards Transparent, Trustworthy, Democratized Agents
Jessica Van Brummelen
M. Kelleher
Mi Tian
Nghi Hoang Nguyen
24
10
0
16 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
50
4
0
15 Sep 2022
Non-Parallel Voice Conversion for ASR Augmentation
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
Fadi Biadsy
Yinghui Huang
Jesse Emond
P. M. Mengibar
26
2
0
15 Sep 2022
A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
Tom O'Malley
A. Narayanan
Quan Wang
27
5
0
14 Sep 2022
Federated Pruning: Improving Neural Network Efficiency with Federated Learning
Rongmei Lin
Yonghui Xiao
Tien-Ju Yang
Ding Zhao
Li Xiong
Giovanni Motta
Françoise Beaufays
FedML
39
12
0
14 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi
Yinghui Huang
Bhuvana Ramabhadran
Pedro J. Moreno
27
3
0
13 Sep 2022
Learning ASR pathways: A sparse multilingual ASR model
Mu Yang
Andros Tjandra
Chunxi Liu
David C. Zhang
Duc Le
Ozlem Kalinli
48
13
0
13 Sep 2022
Streaming Target-Speaker ASR with Neural Transducer
Takafumi Moriya
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
T. Shinozaki
34
21
0
09 Sep 2022
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Hayato Futami
Hirofumi Inaguma
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
KELM
60
12
0
08 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
73
573
0
07 Sep 2022
ASR2K: Speech Recognition for Around 2000 Languages without Audio
Xinjian Li
Florian Metze
David R. Mortensen
A. Black
Shinji Watanabe
28
27
0
06 Sep 2022
Distilling the Knowledge of BERT for CTC-based ASR
Hayato Futami
Hirofumi Inaguma
Masato Mimura
S. Sakai
Tatsuya Kawahara
29
9
0
05 Sep 2022
Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains
Jinbo Hu
Yin Cao
Ming Wu
Qiuqiang Kong
Feiran Yang
Mark D. Plumbley
J. Yang
21
9
0
05 Sep 2022
Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception
Jiadong Wang
Xinyuan Qian
Haizhou Li
41
14
0
05 Sep 2022
A Review of Sparse Expert Models in Deep Learning
W. Fedus
J. Dean
Barret Zoph
MoE
25
145
0
04 Sep 2022
Attention Enhanced Citrinet for Speech Recognition
Xianchao Wu
20
1
0
01 Sep 2022
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
30
2
0
01 Sep 2022