Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,750 papers shown
Title
HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism
Yuguang Yang
Yu Pan
Jingjing Yin
Jiangyu Han
Lei Ma
Heng Lu
31
8
0
15 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
29
0
0
15 Mar 2023
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Steven M. Hernandez
Ding Zhao
Shaojin Ding
A. Bruguier
Rohit Prabhavalkar
Tara N. Sainath
Yanzhang He
Ian McGraw
31
7
0
15 Mar 2023
Learning Cross-lingual Visual Speech Representations
Andreas Zinonos
A. Haliassos
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
22
8
0
14 Mar 2023
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
Xulong Zhang
Haobin Tang
Jianzong Wang
Ning Cheng
Jian Luo
Jing Xiao
30
2
0
14 Mar 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng
Jaesong Lee
Shinji Watanabe
32
19
0
14 Mar 2023
Speech Intelligibility Classifiers from 550k Disordered Speech Samples
Subhashini Venugopalan
Jimmy Tobin
Samuel J. Yang
Katie Seaver
Richard Cave
P. Jiang
Neil Zeghidour
Rus Heywood
Jordan R. Green
Michael P. Brenner
49
9
0
13 Mar 2023
Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model
Shuangping Huang
Y. Luo
Zhenzhou Zhuang
Jin-Gang Yu
Mengchao He
Yongpan Wang
40
8
0
13 Mar 2023
The System Description of dun_oscar team for The ICPR MSR Challenge
Binbin Du
Rui Deng
Yingxin Zhang
23
0
0
13 Mar 2023
Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss
Mohammad Zeineldeen
Kartik Audhkhasi
M. Baskar
Bhuvana Ramabhadran
26
2
0
10 Mar 2023
Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning
Zhaoxi Mu
Xinyu Yang
Wenjing Zhu
36
5
0
07 Mar 2023
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis
Haoxu Wang
Ming Cheng
Qiang Fu
Ming Li
46
8
0
04 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
39
22
0
03 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
31
153
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Franccoise Beaufays
Yonghui Wu
VLM
85
255
0
02 Mar 2023
Leveraging Large Text Corpora for End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
A. Ogawa
Marc Delcroix
Ryo Masumura
27
14
0
02 Mar 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis
Zhe Niu
Brian Mak
30
3
0
01 Mar 2023
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Chao Xue
Wen Liu
Shunxing Xie
Zhenfang Wang
Jiaxing Li
...
Shi-Yong Chen
Yibing Zhan
Jing Zhang
Chaoyue Wang
Dacheng Tao
52
2
0
01 Mar 2023
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma
Mark Gales
Kate Knill
Mengjie Qian
16
32
0
01 Mar 2023
Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition
Shujie Hu
Xurong Xie
Zengrui Jin
Mengzhe Geng
Yi Wang
Mingyu Cui
Jiajun Deng
Xunying Liu
Helen M. Meng
24
30
0
28 Feb 2023
Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Xiaoming Ren
Chao Li
Shenjian Wang
Biao Li
38
0
0
28 Feb 2023
Diagonal State Space Augmented Transformers for Speech Recognition
G. Saon
Ankit Gupta
Xiaodong Cui
AI4TS
40
26
0
27 Feb 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
38
14
0
27 Feb 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
41
102
0
27 Feb 2023
MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition
Yoohwan Kwon
Soo-Whan Chung
MoE
24
16
0
27 Feb 2023
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
35
38
0
27 Feb 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
25
9
0
27 Feb 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
38
3
0
27 Feb 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
39
38
0
24 Feb 2023
Factual Consistency Oriented Speech Recognition
Naoyuki Kanda
Takuya Yoshioka
Yang Liu
45
0
0
24 Feb 2023
D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement
Shengkui Zhao
Bin Ma
40
16
0
23 Feb 2023
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions
Shengkui Zhao
Bin Ma
41
53
0
23 Feb 2023
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition
Yuchen Hu
Chen Chen
Ruizhe Li
Qiu-shi Zhu
Eng Siong Chng
42
15
0
22 Feb 2023
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation
Xiaoqiang Wang
Yanqing Liu
Jinyu Li
Sheng Zhao
34
7
0
22 Feb 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
19
3
0
21 Feb 2023
DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation
Shuo Wang
Xiangyu Kong
Xiulian Peng
H. Movassagh
Vinod Prakash
Yan Lu
26
11
0
21 Feb 2023
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
Lingwei Meng
Jiawen Kang
Mingyu Cui
Yuejiao Wang
Xixin Wu
Helen M. Meng
25
17
0
20 Feb 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition
Leyuan Qu
C. Weber
S. Wermter
19
5
0
20 Feb 2023
Why Is Public Pretraining Necessary for Private Model Training?
Arun Ganesh
Mahdi Haghifam
Milad Nasr
Sewoong Oh
Thomas Steinke
Om Thakkar
Abhradeep Thakurta
Lun Wang
31
36
0
19 Feb 2023
Massively Multilingual Shallow Fusion with Large Language Models
Ke Hu
Tara N. Sainath
Yue Liu
Nan Du
Yanping Huang
Andrew M. Dai
Yu Zhang
Rodrigo Cabrera
Zhehuai Chen
Trevor Strohman
40
13
0
17 Feb 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
Minsu Kim
Joanna Hong
Y. Ro
27
21
0
17 Feb 2023
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
57
14
0
17 Feb 2023
Improving Transformer-based Networks With Locality For Automatic Speaker Verification
Mufan Sang
Yong Zhao
Gang Liu
John H. L. Hansen
Jian Wu
ViT
33
14
0
17 Feb 2023
E2E Spoken Entity Extraction for Virtual Agents
Karan Singla
Yeon-Jun Kim
S. Bangalore
34
1
0
16 Feb 2023
Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax
Keqi Deng
P. Woodland
AuLLM
KELM
37
11
0
16 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
13
21
0
16 Feb 2023
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
Minsu Kim
Hyungil Kim
Y. Ro
VLM
18
18
0
16 Feb 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Guinan Li
Shujie Hu
Xunying Liu
31
5
0
15 Feb 2023
PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction
Zi Xuan Zhang
Zhehui Wang
R. Kamma
S. Eswaran
Narayanan Sadagopan
KELM
36
4
0
10 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
47
34
0
10 Feb 2023
Previous
1
2
3
...
19
20
21
...
33
34
35
Next