2005.08100
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,750 papers shown
Title
The THUEE System Description for the IARPA OpenASR21 Challenge
Jing Zhao
Haoyu Wang
Jinpeng Li
Shuzhou Chai
Guan-Bo Wang
Guoguo Chen
Weiqiang Zhang
VLM
27
1
0
29 Jun 2022
On the Prediction Network Architecture in RNN-T for ASR
Dario Albesano
Jesús Andrés-Ferrer
Nicola Ferri
Puming Zhan
AI4TS
26
0
0
29 Jun 2022
Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
Daniel Bermuth
Alexander Poeppel
W. Reif
31
7
0
29 Jun 2022
Deformable Graph Transformer
Jinyoung Park
Seongjun Yun
Hyeon-ju Park
Jaewoo Kang
Jisu Jeong
KyungHyun Kim
Jung-Woo Ha
Hyunwoo J. Kim
93
7
0
29 Jun 2022
A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion
Xu Li
Shansong Liu
Ying Shan
35
13
0
28 Jun 2022
Exploring linguistic feature and model combination for speech recognition based automatic AD detection
Yi Wang
Tianzi Wang
Zi Ye
Lingwei Meng
Shoukang Hu
Xixin Wu
Xunying Liu
Helen M. Meng
47
17
0
28 Jun 2022
Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation
Jian Luo
Jianzong Wang
Ning Cheng
Edward Xiao
Xulong Zhang
Jing Xiao
ViT
32
12
0
28 Jun 2022
Wav2Vec-Aug: Improved self-supervised training with limited data
Anuroop Sriram
Michael Auli
Alexei Baevski
SSL
VLM
22
15
0
27 Jun 2022
TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline
Chengfei Li
Shuhao Deng
Yaoping Wang
Guangjing Wang
Y. Gong
Changbin Chen
Jinfeng Bai
33
16
0
27 Jun 2022
Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Zhiyun Fan
Linhao Dong
Meng Cai
Zejun Ma
Bo Xu
36
4
0
27 Jun 2022
Improving the Training Recipe for a Robust Conformer-based Hybrid Model
Mohammad Zeineldeen
Jingjing Xu
Christoph Luscher
Ralf Schluter
Hermann Ney
36
18
0
26 Jun 2022
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Raviraj Joshi
Subodh Kumar
36
2
0
26 Jun 2022
Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers
Joshua Belanich
Krishna Somandepalli
B. Eoff
Brendan Jou
30
2
0
24 Jun 2022
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech
Florian Lux
Julia Koch
Ngoc Thang Vu
40
19
0
24 Jun 2022
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Mengzhe Geng
Guinan Li
Xunying Liu
Helen M. Meng
19
13
0
24 Jun 2022
NTIRE 2022 Challenge on Perceptual Image Quality Assessment
Jinjin Gu
Haoming Cai
Chao Dong
Jimmy S. J. Ren
Radu Timofte
SupR
60
67
0
23 Jun 2022
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection
Tianzi Wang
Jiajun Deng
Mengzhe Geng
Zi Ye
Shoukang Hu
Yi Wang
Mingyu Cui
Zengrui Jin
Xunying Liu
Helen M. Meng
33
20
0
23 Jun 2022
Pruned RNN-T for fast, memory-efficient ASR training
Fangjun Kuang
Liyong Guo
Wei Kang
Long Lin
Mingshuang Luo
Zengwei Yao
Daniel Povey
27
64
0
23 Jun 2022
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus
Junhao Xu
Shoukang Hu
Xunying Liu
Helen M. Meng
MQ
19
5
0
23 Jun 2022
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems
Mingyu Cui
Jiajun Deng
Shoukang Hu
Xurong Xie
Tianzi Wang
Shujie Hu
Mengzhe Geng
Boyang Xue
Xunying Liu
Helen M. Meng
35
9
0
23 Jun 2022
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
Tae-Woo Kim
Minguk Kang
Gyeong-Hoon Lee
AAML
34
6
0
23 Jun 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Yuanhang Zhang
Susan Liang
Shuang Yang
Shiguang Shan
10
4
0
22 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
137
1,076
0
22 Jun 2022
Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping
Jenthe Thienpondt
Kris Demuynck
25
11
0
19 Jun 2022
Learning Multiscale Transformer Models for Sequence Generation
Bei Li
Tong Zheng
Yi Jing
Chengbo Jiao
Tong Xiao
Jingbo Zhu
32
9
0
19 Jun 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ian Mcloughlin
Zhijie Yan
20
92
0
16 Jun 2022
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Ziqian Dai
Jianwei Yu
Yan Wang
Nuo Chen
Yanyao Bian
Guangzhi Li
Deng Cai
Dong Yu
217
7
0
16 Jun 2022
Residual Language Model for End-to-end Speech Recognition
E. Tsunoo
Yosuke Kashiwagi
Chaitanya Narisetty
Shinji Watanabe
33
11
0
15 Jun 2022
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition
Shujie Hu
Xurong Xie
Mengzhe Geng
Mingyu Cui
Jiajun Deng
Guinan Li
Tianzi Wang
Xunying Liu
Helen Meng
30
7
0
15 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
33
230
0
09 Jun 2022
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Alexander Waibel
M. Behr
Fevziye Irem Eyiokur
Dogucan Yaman
Tuan-Nam Nguyen
Carlos Mullov
Mehmet Arif Demirtas
Alperen Kantarci
Stefan Constantin
H. K. Ekenel
CVBM
15
14
0
09 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
29
14
0
07 Jun 2022
FedNST: Federated Noisy Student Training for Automatic Speech Recognition
Haaris Mehmood
A. Dobrowolska
Karthikeyan P. Saravanan
Mete Ozay
24
7
0
06 Jun 2022
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian
Jianwei Yu
Chunlei Zhang
Chao Weng
Yuexian Zou
Dong Yu
AuLLM
22
25
0
05 Jun 2022
Learning Speaker-specific Lip-to-Speech Generation
Munender Varshney
Ravindra Yadav
Vinay P. Namboodiri
R. Hegde
29
7
0
04 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
32
99
0
02 Jun 2022
Chefs' Random Tables: Non-Trigonometric Random Features
Valerii Likhosherstov
K. Choromanski
Kumar Avinava Dubey
Frederick Liu
Tamás Sarlós
Adrian Weller
38
17
0
30 May 2022
Contrastive Siamese Network for Semi-supervised Speech Recognition
S. Khorram
Jaeyoung Kim
Anshuman Tripathi
Han Lu
Qian Zhang
Hasim Sak
SSL
31
11
0
27 May 2022
Global Normalization for Streaming Speech Recognition in a Modular Framework
Ehsan Variani
Ke Wu
Michael Riley
David Rybach
Matt Shannon
Cyril Allauzen
20
9
0
26 May 2022
Contextual Adapters for Personalized Speech Recognition in Neural Transducers
Kanthashree Mysore Sathyendra
Thejaswi Muniyappa
Feng-Ju Chang
Jing Liu
Jinru Su
Grant P. Strimel
Athanasios Mouchtaris
Siegfried Kunzmann
19
75
0
26 May 2022
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
Kaitao Song
Yichong Leng
Xu Tan
Yicheng Zou
Tao Qin
Dongsheng Li
16
11
0
25 May 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang
Yuke Li
Binbin Du
34
11
0
25 May 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
91
287
0
25 May 2022
Adaptive multilingual speech recognition with pretrained models
Ngoc-Quan Pham
A. Waibel
Jan Niehues
VLM
17
23
0
24 May 2022
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
Yuting Yang
Binbin Du
Yuke Li
26
1
0
24 May 2022
Content-Context Factorized Representations for Automated Speech Recognition
David M. Chan
Shalini Ghosh
36
11
0
19 May 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
37
8
0
19 May 2022
The AI Mechanic: Acoustic Vehicle Characterization Neural Networks
Adam M. Terwilliger
J. Siegel
22
2
0
19 May 2022
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
C. Zhang
P. Woodland
34
14
0
18 May 2022
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Qianqian Dong
Fengpeng Yue
Tom Ko
Mingxuan Wang
Qibing Bai
Yu Zhang
49
16
0
18 May 2022