Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,750 papers shown
Title
A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem
Andong Li
Guochen Yu
C. Zheng
Wenzhe Liu
Xiaodong Li
48
10
0
30 Nov 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
40
12
0
29 Nov 2022
Comparison Study Between Token Classification and Sequence Classification In Text Classification
Amir Jafari
27
5
0
25 Nov 2022
Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition
Injy Hamed
A. Hussein
Oumnia Chellah
Shammur A. Chowdhury
Hamdy Mubarak
Sunayana Sitaram
Nizar Habash
Ahmed M. Ali
36
6
0
22 Nov 2022
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
Raphael Tang
K. Kumar
Gefei Yang
Akshat Pandey
Yajie Mao
Vladislav Belyaev
Madhuri Emmadi
Craig Murray
Ferhan Ture
Jimmy J. Lin
27
4
0
21 Nov 2022
SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
Fangyuan Wang
Bo Xu
Bo Xu
45
0
0
21 Nov 2022
Self-Remixing: Unsupervised Speech Separation via Separation and Remixing
Kohei Saijo
Tetsuji Ogawa
SSL
22
11
0
18 Nov 2022
VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz
James Harrison
C. Freeman
Amil Merchant
Lucas Beyer
...
Naman Agrawal
Ben Poole
Igor Mordatch
Adam Roberts
Jascha Narain Sohl-Dickstein
40
60
0
17 Nov 2022
Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models
Simon Alexanderson
Rajmund Nagy
Jonas Beskow
G. Henter
DiffM
VGen
29
166
0
17 Nov 2022
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong
Yu-Huan Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Y. Qian
RALM
32
10
0
17 Nov 2022
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire
Zhiyun Fan
Zhenlin Liang
Linhao Dong
Yi Liu
Shiyu Zhou
Meng Cai
Jun Zhang
Zejun Ma
Bo Xu
37
2
0
17 Nov 2022
Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition
Xurong Xie
Xunying Liu
Hui Chen
Hongan Wang
29
1
0
17 Nov 2022
Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments
Dominik Wagner
Ilja Baumann
Sebastian P. Bayerl
Korbinian Riedhammer
Tobias Bocklet
47
2
0
16 Nov 2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao
Tao Jiang
Feng Wang
Lin Li
Q. Hong
30
10
0
14 Nov 2022
Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting
Beltrán Labrador
Guanlong Zhao
Ignacio López Moreno
Angelo Scorza Scarpati
Liam H. Fowl
Quan Wang
19
2
0
11 Nov 2022
Speech-to-Speech Translation For A Real-world Unwritten Language
Peng-Jen Chen
Ke M. Tran
Yilin Yang
Jingfei Du
Justine T. Kao
...
Sravya Popuri
Changhan Wang
J. Pino
Wei-Ning Hsu
Ann Lee
39
26
0
11 Nov 2022
Enhancing and Adversarial: Improve ASR with Speaker Labels
Wei Zhou
Haotian Wu
Jingjing Xu
Mohammad Zeineldeen
Christoph Luscher
Ralf Schluter
Hermann Ney
32
8
0
11 Nov 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
Ya-Jie Zhang
Wei Song
Ya Yue
Zhengchen Zhang
Youzheng Wu
Xiaodong He
44
7
0
11 Nov 2022
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation
Motoi Omachi
Brian Yan
Siddharth Dalmia
Yuya Fujita
Shinji Watanabe
LRM
32
3
0
11 Nov 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng
Siddhant Arora
Yosuke Higuchi
Yushi Ueda
Sujay S. Kumar
Karthik Ganesan
Siddharth Dalmia
Xuankai Chang
Shinji Watanabe
32
20
0
10 Nov 2022
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
Travis M. Bartley
Fei Jia
Krishna C. Puvvada
Samuel Kriman
Boris Ginsburg
SSL
31
6
0
09 Nov 2022
Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
Yu Chen
Wen Ding
Junjie Lai
37
8
0
09 Nov 2022
Linear Self-Attention Approximation via Trainable Feedforward Kernel
Uladzislau Yorsh
Alexander Kovalenko
35
0
0
08 Nov 2022
High-resolution embedding extractor for speaker diarisation
Hee-Soo Heo
Youngki Kwon
Bong-Jin Lee
You Jin Kim
Jee-weon Jung
32
5
0
08 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua Wu
40
0
0
07 Nov 2022
A Context-Aware Computational Approach for Measuring Vocal Entrainment in Dyadic Conversations
Rimita Lahiri
Md. Nasir
C. Lord
So Hyun Kim
Shrikanth Narayanan
18
4
0
07 Nov 2022
RUBICON: A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
Gagandeep Singh
M. Alser
K. Denolf
Can Firtina
Alireza Khodamoradi
Meryem Banu Cavlak
Henk Corporaal
O. Mutlu
46
12
0
06 Nov 2022
Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion
Zhouyuan Huo
K. Sim
Bo Li
DongSeon Hwang
Tara N. Sainath
Trevor Strohman
30
5
0
04 Nov 2022
Multi-blank Transducers for Speech Recognition
Hainan Xu
Fei Jia
Somshubra Majumdar
Shinji Watanabe
Boris Ginsburg
38
11
0
04 Nov 2022
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition
Yusuke Shinohara
Shinji Watanabe
AI4TS
28
9
0
04 Nov 2022
Real-Time Target Sound Extraction
Bandhav Veluri
Justin Chan
Malek Itani
Tuochao Chen
Takuya Yoshioka
Shyamnath Gollakota
44
30
0
04 Nov 2022
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results
Ao Zhang
F. Yu
Kaixun Huang
Linfu Xie
Longbiao Wang
Eng Siong Chng
Hui Bu
Binbin Zhang
Wei Chen
Xin Xu
32
4
0
03 Nov 2022
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Li Li
Dongxing Xu
Haoran Wei
Yanhua Long
26
2
0
03 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
35
8
0
02 Nov 2022
Towards Zero-Shot Code-Switched Speech Recognition
Brian Yan
Sanjeev Khudanpur
Ondˇrej Klejch
Preethi Jyothi
Shinji Watanabe
26
19
0
02 Nov 2022
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
P. Swietojanski
Stefan Braun
Dogan Can
Thiago Fraga da Silva
Arnab Ghoshal
...
Henry Mason
Erik McDermott
Honza Silovsky
R. Travadi
Xiaodan Zhuang
42
13
0
02 Nov 2022
Inference and Denoise: Causal Inference-based Neural Speech Enhancement
Tsun-An Hsieh
Chao-Han Huck Yang
Pin-Yu Chen
Sabato Marco Siniscalchi
Yu Tsao
CML
63
2
0
02 Nov 2022
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
Lester Phillip Violeta
D. Ma
Wen-Chin Huang
Tomoki Toda
39
7
0
02 Nov 2022
Monolingual Recognizers Fusion for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Haoyu Lu
Longbiao Wang
Hao Shi
Yuqin Lin
Yanbing Yang
J. Dang
27
4
0
02 Nov 2022
Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation
Rao Ma
Xiaobo Wu
Jin Qiu
Yanan Qin
Haihua Xu
Peihao Wu
Zejun Ma
32
2
0
02 Nov 2022
Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
Che-Yuan Liang
Xiao-Lei Zhang
BinBin Zhang
Di Wu
Shengqiang Li
Xingcheng Song
Zhendong Peng
Fuping Pan
18
8
0
02 Nov 2022
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le
Frank Seide
Yuhao Wang
Heng Chang
Kjell Schubert
Ozlem Kalinli
M. Seltzer
19
6
0
02 Nov 2022
Conversation-oriented ASR with multi-look-ahead CBS architecture
Huaibo Zhao
S. Fujie
Tetsuji Ogawa
Jin Sakuma
Yusuke Kida
Tetsunori Kobayashi
39
3
0
02 Nov 2022
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
32
0
0
02 Nov 2022
BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
66
13
0
02 Nov 2022
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems
Shaan Bijwadia
Shuo-yiin Chang
Bo Li
Tara N. Sainath
Chaoyang Zhang
Yanzhang He
47
7
0
01 Nov 2022
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
Xingcheng Song
Di Wu
Zhiyong Wu
Binbin Zhang
Yuekai Zhang
Zhendong Peng
Wenpeng Li
Fuping Pan
Changbao Zhu
39
8
0
01 Nov 2022
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings
Mohan Shi
Jie Zhang
Zhihao Du
Fan Yu
Qian Chen
Shiliang Zhang
Lirong Dai
51
4
0
01 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
47
7
0
01 Nov 2022
Controllable Factuality in Document-Grounded Dialog Systems Using a Noisy Channel Model
Nico Daheim
David Thulke
Christian Dugast
Hermann Ney
HILM
19
4
0
31 Oct 2022
Previous
1
2
3
...
21
22
23
...
33
34
35
Next