Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,749 papers shown
Title
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
Yeting Jia
Michelle Tadmor Ramanovich
Quan Wang
Heiga Zen
SLR
36
66
0
11 Jan 2022
Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks
Shou-Yong Hu
Xurong Xie
Mingyu Cui
Jiajun Deng
Shansong Liu
Jianwei Yu
Mengzhe Geng
Xunying Liu
Helen Meng
44
26
0
08 Jan 2022
Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
31
10
0
06 Jan 2022
Voice Quality and Pitch Features in Transformer-Based Speech Recognition
Guillermo Cámbara
Jordi Luque
Mireia Farrús
24
0
0
21 Dec 2021
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification
Shinnosuke Takamichi
Ludwig Kurzinger
Takaaki Saeki
Sayaka Shiota
Shinji Watanabe
16
22
0
17 Dec 2021
Self-Supervised Learning for speech recognition with Intermediate layer supervision
Chengyi Wang
Yu-Huan Wu
Sanyuan Chen
Shujie Liu
Jinyu Li
Yao Qian
Zhenglu Yang
SSL
26
28
0
16 Dec 2021
Progressive Graph Convolution Network for EEG Emotion Recognition
Yijing Zhou
Fu Li
Yang Li
Youshuo Ji
Guangming Shi
Wenming Zheng
Lijian Zhang
Yuanfang Chen
Rui Cheng
25
37
0
14 Dec 2021
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
27
4
0
13 Dec 2021
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Holy Lovenia
Samuel Cahyawijaya
Genta Indra Winata
Peng Xu
Xu Yan
...
Elham J. Barezi
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
36
32
0
12 Dec 2021
Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR
Peter William VanHarn Plantinga
Deblin Bagchi
Eric Fosler-Lussier
46
10
0
11 Dec 2021
Are E2E ASR models ready for an industrial usage?
Valentin Vielzeuf
G. Antipov
26
8
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
26
37
0
08 Dec 2021
A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules
Xinfeng Xie
Prakash Prabhu
Ulysse Beaugnon
P. Phothilimthana
Sudip Roy
Azalia Mirhoseini
E. Brevdo
James Laudon
Yanqi Zhou
30
5
0
07 Dec 2021
BBS-KWS:The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge
Yuting Yang
Binbin Du
Yingxin Zhang
Wenxuan Wang
Yuke Li
21
0
0
03 Dec 2021
Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding
Weiran Wang
Ke Hu
Tara N. Sainath
35
21
0
01 Dec 2021
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization
Brian Yan
Chunlei Zhang
Meng Yu
Shi-Xiong Zhang
Siddharth Dalmia
Dan Berrebbi
Chao Weng
Shinji Watanabe
Dong Yu
19
22
0
29 Nov 2021
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
33
74
0
29 Nov 2021
Global Interaction Modelling in Vision Transformer via Super Tokens
Ammarah Farooq
Muhammad Awais
S. Ahmed
J. Kittler
ViT
36
6
0
25 Nov 2021
SimpleTRON: Simple Transformer with O(N) Complexity
Uladzislau Yorsh
Alexander Kovalenko
Vojtvech Vanvcura
Daniel Vavsata
Pavel Kordík
Tomávs Mikolov
33
1
0
23 Nov 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance
Heeseung Kim
Sungwon Kim
Sungroh Yoon
DiffM
BDL
19
107
0
23 Nov 2021
Semi-Supervised Vision Transformers
Zejia Weng
Xitong Yang
Ang Li
Zuxuan Wu
Yu-Gang Jiang
ViT
17
40
0
22 Nov 2021
Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature
Yiwen Shao
Shi-Xiong Zhang
Dong Yu
18
15
0
22 Nov 2021
Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions
Chunxi Liu
M. Picheny
Leda Sari
Pooja Chitkara
Alex Xiao
Xiaohui Zhang
Mark Chou
Andres Alvarado
C. Hazirbas
Yatharth Saraf
30
41
0
18 Nov 2021
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation
Tom O'Malley
A. Narayanan
Quan Wang
Alex Park
James Walker
N. Howard
33
27
0
18 Nov 2021
Joint Unsupervised and Supervised Training for Multilingual ASR
Junwen Bai
Bo-wen Li
Yu Zhang
Ankur Bapna
Nikhil Siddhartha
K. Sim
Tara N. Sainath
32
58
0
15 Nov 2021
Attention based end to end Speech Recognition for Voice Search in Hindi and English
Raviraj Joshi
Venkateshan Kannan
20
6
0
15 Nov 2021
Soft-Sensing ConFormer: A Curriculum Learning-based Convolutional Transformer
Jaswanth K. Yella
Chao Zhang
Sergei Petrov
Yu Huang
Xiaoye Qian
A. Minai
Sthitie Bom
33
7
0
12 Nov 2021
Transformer-based Image Compression
Ming Lu
Peiyao Guo
Huiqing Shi
Chuntong Cao
Zhan Ma
ViT
64
103
0
12 Nov 2021
Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation
Yihui Fu
Yun Liu
Jingdong Li
Dawei Luo
Shubo Lv
Yukai Jv
Lei Xie
27
49
0
11 Nov 2021
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models
J. Yoon
H. Kim
Hyeon Seung Lee
Sunghwan Ahn
N. Kim
36
1
0
05 Nov 2021
Conformer-based Hybrid ASR System for Switchboard Dataset
Mohammad Zeineldeen
Jingjing Xu
Christoph Luscher
Wilfried Michel
Alexander Gerstenberger
Ralf Schluter
Hermann Ney
22
24
0
05 Nov 2021
MT3: Multi-Task Multitrack Music Transcription
Josh Gardner
Ian Simon
Ethan Manilow
Curtis Hawthorne
Jesse Engel
37
95
0
04 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schonherr
DiffM
129
123
0
04 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
35
363
0
02 Nov 2021
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
Peter Wu
Jiatong Shi
Yifan Zhong
Shinji Watanabe
A. Black
27
8
0
02 Nov 2021
Sequence Transduction with Graph-based Supervision
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
24
6
0
01 Nov 2021
Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis
Hsin-Wei Wang
Bi-Cheng Yan
Hsuan-Sheng Chiu
Yung-Chang Hsu
Berlin Chen
21
7
0
01 Nov 2021
SNRi Target Training for Joint Speech Enhancement and Recognition
Yuma Koizumi
Shigeki Karita
A. Narayanan
S. Panchapagesan
M. Bacchiani
30
14
0
01 Nov 2021
Cross-attention conformer for context modeling in speech enhancement for ASR
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
24
14
0
30 Oct 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
11
13
0
29 Oct 2021
Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
SSL
21
2
0
29 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
124
1,715
0
26 Oct 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu
Rui Shao
G. Wang
Kuan Chen
Bohan Li
Pong C. Yuen
Jinzhu Li
Lei He
Sheng Zhao
34
55
0
25 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition
Ting-Yao Hu
Mohammadreza Armandpour
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
SyDa
55
42
0
21 Oct 2021
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Ankur Bapna
Yu-An Chung
Na Wu
Anmol Gulati
Ye Jia
J. Clark
Melvin Johnson
Jason Riesa
Alexis Conneau
Yu Zhang
VLM
61
94
0
20 Oct 2021
Personalized Speech Enhancement: New Models and Comprehensive Evaluation
Sefik Emre Eskimez
Takuya Yoshioka
Huaming Wang
Xiaofei Wang
Zhuo Chen
Xuedong Huang
32
62
0
18 Oct 2021
VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge
Yougen Yuan
Zhiqiang Lv
Shen Huang
Pengfei Hu
14
0
0
18 Oct 2021
A Unified Speaker Adaptation Approach for ASR
Yingzhu Zhao
Chongjia Ni
C. Leung
Chenyu You
Chng Eng Siong
B. Ma
CLL
92
9
0
16 Oct 2021
StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
Victor Pellegrain
Myriam Tami
M. Batteux
C´eline Hudelot
AI4TS
28
2
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
50
60
0
15 Oct 2021
Previous
1
2
3
...
29
30
31
...
33
34
35
Next