2005.08100
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,758 papers shown
Title
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
48
270
0
23 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
43
8
0
23 Jun 2023
Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation
Zhonghua Liu
Shijun Wang
Ning Chen
DRL
37
2
0
21 Jun 2023
Recent Advances in Direct Speech-to-text Translation
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
32
19
0
20 Jun 2023
Timestamped Embedding-Matching Acoustic-to-Word CTC ASR
Woojay Jeon
32
0
0
20 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
40
4
0
20 Jun 2023
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Cihan Xiao
Lin Zhang
Jinyi Yang
Dongji Gao
Sanjeev Khudanpur
Kevin Duh
37
1
0
20 Jun 2023
Rehearsal-Free Online Continual Learning for Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL
45
3
0
19 Jun 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces
Ziqiao Peng
Yihao Luo
Yue Shi
Hao-Xuan Xu
Xiangyu Zhu
Jun He
Hongyan Liu
Zhaoxin Fan
58
41
0
19 Jun 2023
NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
Yun Yi
Haokui Zhang
Rong Xiao
Nan Wang
Xiaoyu Wang
GNN
43
2
0
19 Jun 2023
Multitrack Music Transcription with a Time-Frequency Perceiver
Weiyi Lu
Ju-Chiang Wang
Yun-Ning Hung
ViT
AI4TS
34
24
0
19 Jun 2023
MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition
Yuchen Hu
Chen Chen
Ruizhe Li
Heqing Zou
Chng Eng Siong
GAN
52
9
0
18 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Chen Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
44
5
0
18 Jun 2023
Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think
Tina Raissi
Christoph Luscher
Moritz Gunz
Ralf Schluter
Hermann Ney
BDL
20
3
0
15 Jun 2023
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
Shivam Mehta
Siyang Wang
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
DiffM
26
14
0
15 Jun 2023
Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction
Rohit Paturi
S. Srinivasan
Xiang Li
31
13
0
15 Jun 2023
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones
Zitha Sasindran
Harsha Yelchuri
Pooja S B. Rao
Prabhakar Venkata Tamma
31
1
0
15 Jun 2023
CoverHunter: Cover Song Identification with Refined Attention and Alignments
Feng Liu
Deyi Tuo
Yinan Xu
Xintong Han
19
4
0
15 Jun 2023
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Kunal Dhawan
Dima Rekesh
Boris Ginsburg
22
10
0
14 Jun 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
J. Yoon
Sunghwan Ahn
Hyeon Seung Lee
Minchan Kim
Seokhwan Kim
N. Kim
VLM
43
2
0
14 Jun 2023
Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
Weidong Ji
Shijie Zan
Guohui Zhou
Xu Wang
SyDa
27
1
0
14 Jun 2023
DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR
Goeric Huybrechts
S. Ronanki
Xilai Li
H. Nosrati
S. Bodapati
Katrin Kirchhoff
26
1
0
13 Jun 2023
Large-scale Language Model Rescoring on Long-form Data
Tongzhou Chen
Cyril Allauzen
Yinghui Huang
Daniel S. Park
David Rybach
...
Rodrigo Cabrera
Kartik Audhkhasi
Bhuvana Ramabhadran
Pedro J. Moreno
Michael Riley
43
14
0
13 Jun 2023
Efficient Adapters for Giant Speech Models
Nanxin Chen
Izhak Shafran
Yu Zhang
Chung-Cheng Chiu
H. Soltau
James Qin
Yonghui Wu
30
10
0
13 Jun 2023
Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages
Simon Durand
Daniel Stoller
Sebastian Ewert
34
12
0
13 Jun 2023
Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation
Yucheng Han
Chen Xu
Tong Xiao
Jingbo Zhu
35
3
0
13 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
29
43
0
13 Jun 2023
Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation
Jinzi Qi
Hugo Van hamme
48
3
0
12 Jun 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
38
1
0
12 Jun 2023
Multimodal Audio-textual Architecture for Robust Spoken Language Understanding
Anderson R. Avila
Mehdi Rezagholizadeh
Chao Xing
23
1
0
12 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
29
175
0
11 Jun 2023
Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement
Junyu Wang
34
4
0
09 Jun 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
Xianzhao Chen
Yist Y. Lin
Kang Wang
Yi He
Zejun Ma
29
2
0
09 Jun 2023
Trajectory Prediction with Observations of Variable-Length for Motion Planning in Highway Merging scenarios
Sajjad Mozaffari
Mreza Alipour Sormoli
K. Koufos
Graham Lee
M. Dianati
52
8
0
08 Jun 2023
Latent Phrase Matching for Dysarthric Speech
Colin S. Lea
Dianna Yee
Jaya Narain
Zifang Huang
Lauren Tooley
Jeffrey P. Bigham
Leah Findlater
38
4
0
08 Jun 2023
Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
Zhiyun Fan
Linhao Dong
Chen Shen
Zhenlin Liang
Jun Zhang
Lu Lu
Zejun Ma
32
4
0
08 Jun 2023
Matching Latent Encoding for Audio-Text based Keyword Spotting
K. Nishu
Minsik Cho
Devang Naik
25
15
0
08 Jun 2023
Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency
Shigeki Karita
R. Sproat
Haruko Ishikawa
35
4
0
07 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
Takatomo Kano
A. Ogawa
Marc Delcroix
37
9
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
44
2
0
07 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
33
11
0
05 Jun 2023
Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition
Jisung Wang
Haram Lee
Myungwoo Oh
34
1
0
05 Jun 2023
Streaming Speech-to-Confusion Network Speech Recognition
Denis Filimonov
Prabhat Pandey
Ariya Rastrow
Ankur Gandhe
A. Stolcke
HAI
37
0
0
02 Jun 2023
ALO-VC: Any-to-any Low-latency One-shot Voice Conversion
Bo Wang
Damien Ronssin
Milos Cernak
BDL
38
3
0
01 Jun 2023
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts
Dongji Gao
Sanjeev Khudanpur
Hainan Xu
Leibny Paola García
Daniel Povey
29
8
0
01 Jun 2023
Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning
Yuting Yang
Yuke Li
Binbin Du
AI4TS
33
0
0
01 Jun 2023
Encoder-decoder multimodal speaker change detection
Jee-weon Jung
Soonshin Seo
Hee-Soo Heo
Geon-min Kim
You Jin Kim
Youngki Kwon
Min-Ji Lee
Bong-Jin Lee
45
2
0
01 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Esteve
30
3
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
30
7
0
01 Jun 2023
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
19
23
0
01 Jun 2023