2005.08100
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,749 papers shown
Enhance Language Identification using Dual-mode Model with Knowledge Distillation
Hexin Liu
Leibny Paola García Perera
Andy W. H. Khong
Justin Dauwels
S. Styles
Sanjeev Khudanpur
VLM
30
5
0
07 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
25
29
0
07 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
30
60
0
04 Mar 2022
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement
Jun Xiong
Yu Zhou
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
33
20
0
04 Mar 2022
MANNER: Multi-view Attention Network for Noise Erasure
Hyun Joon Park
Byung Ha Kang
Wooseok Shin
Jin Sob Kim
S. W. Han
30
48
0
04 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
40
106
0
02 Mar 2022
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
Yufeng Yang
Peidong Wang
DeLiang Wang
20
12
0
01 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
19
11
0
01 Mar 2022
TRILLsson: Distilled Universal Paralinguistic Speech Representations
Joel Shor
Subhashini Venugopalan
25
37
0
01 Mar 2022
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR
Xuankai Chang
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
24
6
0
01 Mar 2022
PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Yunseong Kim
Yujeong Choi
Minsoo Rhu
23
15
0
27 Feb 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Jinglin Liu
Chengxi Li
Yi Ren
Zhiying Zhu
Zhou Zhao
DiffM
35
14
0
27 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
130
145
0
26 Feb 2022
A Survey of Multilingual Models for Automatic Speech Recognition
Hemant Yadav
Sunayana Sitaram
24
35
0
25 Feb 2022
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech
Quan Wang
Yang Yu
Jason W. Pelecanos
Yiling Huang
Ignacio López Moreno
21
14
0
24 Feb 2022
A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting
Benhan Li
Shengdong Du
Tianrui Li
AI4TS
28
2
0
23 Feb 2022
Korean Tokenization for Beam Search Rescoring in Speech Recognition
Kyuhong Shim
Hyewon Bae
Wonyong Sung
24
0
0
22 Feb 2022
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition
Mengzhe Geng
Xurong Xie
Zi Ye
Tianzi Wang
Guinan Li
Shujie Hu
Xunying Liu
Helen Meng
22
28
0
21 Feb 2022
Adaptive Discounting of Implicit Language Models in RNN-Transducers
Vinit Unni
Shreya Khare
Ashish R. Mittal
P. Jyothi
Sunita Sarawagi
Samarth Bharadwaj
27
3
0
21 Feb 2022
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
36
8
0
18 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
Disong Wang
Shan Yang
Dan Su
Xunying Liu
Dong Yu
Helen Meng
15
11
0
18 Feb 2022
AISHELL-NER: Named Entity Recognition from Chinese Speech
Boli Chen
Guangwei Xu
Xiaobin Wang
Pengjun Xie
Meishan Zhang
Fei Huang
16
30
0
17 Feb 2022
Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition
Chao-Han Huck Yang
Zeeshan Ahmed
Yile Gu
Joseph Szurley
Roger Ren
Linda Liu
A. Stolcke
I. Bulyko
AAML
24
3
0
17 Feb 2022
Non-Autoregressive ASR with Self-Conditioned Folded Encoders
Tatsuya Komatsu
28
7
0
17 Feb 2022
MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition
Jin Sakuma
Tatsuya Komatsu
Robin Scheibler
21
6
0
17 Feb 2022
Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model
Hao Zhang
You-Chi Cheng
Shankar Kumar
Yifan Jiang
Mingqing Chen
Rajiv Mathews
18
7
0
16 Feb 2022
Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers
Yotaro Kubo
Shigeki Karita
M. Bacchiani
8
26
0
16 Feb 2022
Conversational Speech Recognition By Learning Conversation-level Characteristics
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
43
7
0
16 Feb 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
21
56
0
14 Feb 2022
The xmuspeech system for multi-channel multi-party meeting transcription challenge
Jie Wang
Yuji Liu
Binling Wang
Yiming Zhi
Song Li
Shipeng Xia
Jiayang Zhang
Lin Li
Q. Hong
Feng Tong
16
0
0
11 Feb 2022
Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding
Peter Sullivan
Toshiko Shibano
Muhammad Abdul-Mageed
44
11
0
10 Feb 2022
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge
Chen Shen
Yi Y. Liu
Wenzhi Fan
Bin Wang
Shi-Xue Wen
Yao Tian
Jun Zhang
Jingsheng Yang
Zejun Ma
12
4
0
09 Feb 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yihui Fu
Zhihao Du
...
Kong Aik Lee
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
18
28
0
08 Feb 2022
Exploring Self-Attention Mechanisms for Speech Separation
Cem Subakan
Mirco Ravanelli
Samuele Cornell
François Grondin
Mirko Bronzi
40
23
0
06 Feb 2022
Self-supervised Learning with Random-projection Quantizer for Speech Recognition
Chung-Cheng Chiu
James Qin
Yu Zhang
Jiahui Yu
Yonghui Wu
SSL
30
163
0
03 Feb 2022
The RoyalFlush System of Speech Recognition for M2MeT Challenge
Shuaishuai Ye
Peiyao Wang
Shunfei Chen
Xinhui Hu
Xinkang Xu
24
5
0
03 Feb 2022
mSLAM: Massively multilingual joint pre-training for speech and text
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
VLM
30
111
0
03 Feb 2022
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Minglun Han
Linhao Dong
Zhenlin Liang
Meng Cai
Shiyu Zhou
Zejun Ma
Bo Xu
26
45
0
30 Jan 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge
Ziyi Chen
Hua Hua
Yuxiang Zhang
Ming Li
Pengyuan Zhang
27
0
0
29 Jan 2022
Star Temporal Classification: Sequence Classification with Partially Labeled Data
Vineel Pratap
Awni Y. Hannun
Gabriel Synnaeve
R. Collobert
23
8
0
28 Jan 2022
Reducing language context confusion for end-to-end code-switching automatic speech recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
J. Tao
Y. Yeung
Liqun Deng
27
11
0
28 Jan 2022
Improving End-to-End Models for Set Prediction in Spoken Language Understanding
H. Kuo
Zoltán Tüske
Samuel Thomas
Brian Kingsbury
G. Saon
21
0
0
28 Jan 2022
On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR
Zhao Yang
Dianwen Ng
Xiao Fu
Liping Han
Wei Xi
Ruimeng Wang
Rui Jiang
Jizhong Zhao
40
2
0
26 Jan 2022
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR
Yufei Liu
Rao Ma
Haihua Xu
Yi He
Zejun Ma
Weibin Zhang
28
12
0
26 Jan 2022
Noise-robust voice conversion with domain adversarial training
Hongqiang Du
Lei Xie
Haizhou Li
19
11
0
26 Jan 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
94
40
0
25 Jan 2022
Improving the fusion of acoustic and text representations in RNN-T
Chao Zhang
Bo-wen Li
Zhiyun Lu
Tara N. Sainath
Shuo-yiin Chang
AI4CE
43
12
0
25 Jan 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
38
23
0
25 Jan 2022
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer
Yusuke Shinohara
Takaaki Ishii
Hirofumi Inaguma
Shinji Watanabe
35
34
0
14 Jan 2022
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
Wenliang Dai
Samuel Cahyawijaya
Tiezheng Yu
Elham J. Barezi
Peng Xu
...
Genta Indra Winata
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
41
11
0
11 Jan 2022