ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown
Title
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
Hirofumi Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
65
11
0
27 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
33
1
0
23 Sep 2021
Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu
Sai Mitheran
Juhi Kamdar
Meet Gandhi
40
8
0
21 Sep 2021
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
29
7
0
20 Sep 2021
Influence of ASR and Language Model on Alzheimer's Disease Detection
Joan Codina-Filbà
Guillermo Cámbara
Jordi Luque
Mireia Farrús
21
2
0
20 Sep 2021
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition
Guolin Zheng
Yubei Xiao
Ke Gong
Pan Zhou
Xiaodan Liang
Liang Lin
32
26
0
19 Sep 2021
Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition
F. Weninger
M. Gaudesi
Ralf Leibold
R. Gemello
P. Zhan
35
4
0
17 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mañke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
91
152
0
17 Sep 2021
PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription
Chen Zhang
Jiaxing Yu
Luchin Chang
Xu Tan
Jiawei Chen
Tao Qin
Kecheng Zhang
22
15
0
16 Sep 2021
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
38
55
0
15 Sep 2021
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Felix Wu
Kwangyoun Kim
Jing Pan
Kyu Jeong Han
Kilian Q. Weinberger
Yoav Artzi
27
71
0
14 Sep 2021
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring
Hirofumi Inaguma
Yosuke Higuchi
Kevin Duh
Tatsuya Kawahara
Shinji Watanabe
63
11
0
09 Sep 2021
Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021
Li Zhang
Huan Zhao
Qinling Meng
Yanli Chen
Min Liu
Lei Xie
32
10
0
08 Sep 2021
A Survey of Sound Source Localization with Deep Learning Methods
Pierre-Amaury Grumiaux
Srdjan Kitić
Laurent Girin
Alexandre Guérin
33
246
0
08 Sep 2021
Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition
Maxime Burchi
Valentin Vielzeuf
37
84
0
31 Aug 2021
Multi-Channel Transformer Transducer for Speech Recognition
Feng-Ju Chang
Martin H. Radfar
Athanasios Mouchtaris
M. Omologo
26
19
0
30 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Gary Wang
Pedro J. Moreno
SSL
25
36
0
27 Aug 2021
Self-Attention for Audio Super-Resolution
Nathanaël Carraz Rakotonirina
SupR
38
23
0
26 Aug 2021
Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer
Krishna D N Freshworks
29
7
0
22 Aug 2021
A Dual-Decoder Conformer for Multilingual Speech Recognition
Krishna D N Freshworks
9
1
0
22 Aug 2021
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers
Juntae Kim
Jee-Hye Lee
24
6
0
22 Aug 2021
Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification
Shyam A. Tailor
R. D. Jong
Tiago Azevedo
Matthew Mattina
Partha P. Maji
3DPC
GNN
25
12
0
13 Aug 2021
Masked Acoustic Unit for Mispronunciation Detection and Correction
Zhan Zhang
Yuehai Wang
Jianyi Yang
25
3
0
12 Aug 2021
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSL
VLM
12
412
0
07 Aug 2021
Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification
Sangeeta Ghangam
Daniel Whitenack
Joshua Nemecek
16
4
0
04 Aug 2021
Decoupling recognition and transcription in Mandarin ASR
Jiahong Yuan
Xingyu Cai
Dongji Gao
Renjie Zheng
Liang Huang
Kenneth Ward Church
36
9
0
02 Aug 2021
USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments
M. Musaev
Saida Mussakhojayeva
Ilyos Khujayorov
Yerbolat Khassanov
M. Ochilov
H. A. Varol
16
19
0
30 Jul 2021
Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers
Piper Wolters
Logan Sizemore
Chris Daw
Brian Hutchinson
Lauren A. Phillips
37
11
0
28 Jul 2021
CarneliNet: Neural Mixture Model for Automatic Speech Recognition
A. Kalinov
Somshubra Majumdar
Jagadeesh Balam
Boris Ginsburg
MoE
24
3
0
22 Jul 2021
Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech
Duo Ma
Nana Hou
Van Tung Pham
Haihua Xu
Chng Eng Siong
33
22
0
22 Jul 2021
Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
Tianzi Wang
Yuya Fujita
Xuankai Chang
Shinji Watanabe
13
15
0
20 Jul 2021
Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection
Parthasaarathy Sudarsanam
A. Politis
K. Drossos
16
13
0
20 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
26
67
0
19 Jul 2021
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording
Hirofumi Inaguma
Tatsuya Kawahara
19
2
0
15 Jul 2021
Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
Shengqiang Li
Menglong Xu
Xiao-Lei Zhang
18
9
0
13 Jul 2021
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021
Takashi Maekaku
Xuankai Chang
Yuya Fujita
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
115
13
0
13 Jul 2021
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
Jian Luo
Jianzong Wang
Ning Cheng
Jing Xiao
SSL
27
6
0
09 Jul 2021
On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Xiaohui Zhang
Vimal Manohar
David C. Zhang
Frank Zhang
Yangyang Shi
Nayan Singhal
Julian Chan
Fuchun Peng
Yatharth Saraf
M. Seltzer
20
14
0
09 Jul 2021
Improved Language Identification Through Cross-Lingual Self-Supervised Learning
Andros Tjandra
Diptanu Gon Choudhury
Frank Zhang
Kritika Singh
Alexis Conneau
Alexei Baevski
Assaf Sela
Yatharth Saraf
Michael Auli
VLM
SSL
24
35
0
08 Jul 2021
Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers
Huahuan Zheng
Wenjie Peng
Zhijian Ou
Jinsong Zhang
28
5
0
07 Jul 2021
GLiT: Neural Architecture Search for Global and Local Image Transformer
Boyu Chen
Peixia Li
Chuming Li
Baopu Li
Lei Bai
Chen Lin
Ming Sun
Junjie Yan
Wanli Ouyang
ViT
35
85
0
07 Jul 2021
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio
Naoyuki Kanda
Xiong Xiao
Jian Wu
Tianyan Zhou
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
19
14
0
06 Jul 2021
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu
Xiaoqian Liu
Xiaowen Liu
Laohu Wang
Canan Huang
Tong Xiao
Jingbo Zhu
34
5
0
06 Jul 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR
Jian Wu
Zhuo Chen
Sanyuan Chen
Yu-Huan Wu
Takuya Yoshioka
Naoyuki Kanda
Shujie Liu
Jinyu Li
27
17
0
05 Jul 2021
Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition
Timo Lohrenz
P. Schwarz
Zhengyang Li
Tim Fingscheidt
21
11
0
02 Jul 2021
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition
Niko Moritz
Takaaki Hori
Jonathan Le Roux
6
20
0
02 Jul 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
37
2
0
01 Jul 2021
StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR
Hirofumi Inaguma
Tatsuya Kawahara
25
4
0
01 Jul 2021
IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov
Manuel Mager
Ngoc Thang Vu
37
6
0
30 Jun 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Yuma Koizumi
Shigeki Karita
Scott Wisdom
Hakan Erdogan
J. Hershey
Llion Jones
M. Bacchiani
19
41
0
30 Jun 2021