ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.08100
  4. Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition

Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
ArXivPDFHTML

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown
Title
Enhance Language Identification using Dual-mode Model with Knowledge
  Distillation
Enhance Language Identification using Dual-mode Model with Knowledge Distillation
Hexin Liu
Leibny Paola García Perera
Andy W. H. Khong
Justin Dauwels
S. Styles
Sanjeev Khudanpur
VLM
30
5
0
07 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with
  Articulatory Features
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
25
29
0
07 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating
  Inverse Short-Time Fourier Transform
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
30
60
0
04 Mar 2022
Look\&Listen: Multi-Modal Correlation Learning for Active Speaker
  Detection and Speech Enhancement
Look\&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement
Jun Xiong
Yu Zhou
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
33
20
0
04 Mar 2022
MANNER: Multi-view Attention Network for Noise Erasure
MANNER: Multi-view Attention Network for Noise Erasure
Hyun Joon Park
Byung Ha Kang
Wooseok Shin
Jin Sob Kim
S. W. Han
30
48
0
04 Mar 2022
Audio Self-supervised Learning: A Survey
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
40
106
0
02 Mar 2022
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
Yufeng Yang
Peidong Wang
DeLiang Wang
20
12
0
01 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
19
11
0
01 Mar 2022
TRILLsson: Distilled Universal Paralinguistic Speech Representations
TRILLsson: Distilled Universal Paralinguistic Speech Representations
Joel Shor
Subhashini Venugopalan
25
37
0
01 Mar 2022
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR
Xuankai Chang
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
24
6
0
01 Mar 2022
PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable
  Multi-GPU Inference Servers
PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Yunseong Kim
Yujeong Choi
Minsoo Rhu
23
15
0
27 Feb 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Jinglin Liu
Chengxi Li
Yi Ren
Zhiying Zhu
Zhou Zhao
DiffM
35
14
0
27 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
130
145
0
26 Feb 2022
A Survey of Multilingual Models for Automatic Speech Recognition
A Survey of Multilingual Models for Automatic Speech Recognition
Hemant Yadav
Sunayana Sitaram
24
35
0
25 Feb 2022
Attentive Temporal Pooling for Conformer-based Streaming Language
  Identification in Long-form Speech
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech
Quan Wang
Yang Yu
Jason W. Pelecanos
Yiling Huang
Ignacio López Moreno
21
14
0
24 Feb 2022
A Differential Attention Fusion Model Based on Transformer for Time
  Series Forecasting
A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting
Benhan Li
Shengdong Du
Tianrui Li
AI4TS
28
2
0
23 Feb 2022
Korean Tokenization for Beam Search Rescoring in Speech Recognition
Korean Tokenization for Beam Search Rescoring in Speech Recognition
Kyuhong Shim
Hyewon Bae
Wonyong Sung
24
0
0
22 Feb 2022
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric
  and Elderly Speech Recognition
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition
Mengzhe Geng
Xurong Xie
Zi Ye
Tianzi Wang
Guinan Li
Shujie Hu
Xunying Liu
Helen Meng
22
28
0
21 Feb 2022
Adaptive Discounting of Implicit Language Models in RNN-Transducers
Adaptive Discounting of Implicit Language Models in RNN-Transducers
Vinit Unni
Shreya Khare
Ashish R. Mittal
P. Jyothi
Sunita Sarawagi
Samarth Bharadwaj
27
3
0
21 Feb 2022
Domain Adaptation of low-resource Target-Domain models using
  well-trained ASR Conformer Models
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
36
8
0
18 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge
  transfer from voice conversion
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
Disong Wang
Shan Yang
Dan Su
Xunying Liu
Dong Yu
Helen Meng
15
11
0
18 Feb 2022
AISHELL-NER: Named Entity Recognition from Chinese Speech
AISHELL-NER: Named Entity Recognition from Chinese Speech
Boli Chen
Guangwei Xu
Xiaobin Wang
Pengjun Xie
Meishan Zhang
Fei Huang
16
30
0
17 Feb 2022
Mitigating Closed-model Adversarial Examples with Bayesian Neural
  Modeling for Enhanced End-to-End Speech Recognition
Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition
Chao-Han Huck Yang
Zeeshan Ahmed
Yile Gu
Joseph Szurley
Roger Ren
Linda Liu
A. Stolcke
I. Bulyko
AAML
24
3
0
17 Feb 2022
Non-Autoregressive ASR with Self-Conditioned Folded Encoders
Non-Autoregressive ASR with Self-Conditioned Folded Encoders
Tatsuya Komatsu
28
7
0
17 Feb 2022
MLP-ASR: Sequence-length agnostic all-MLP architectures for speech
  recognition
MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition
Jin Sakuma
Tatsuya Komatsu
Robin Scheibler
21
6
0
17 Feb 2022
Capitalization Normalization for Language Modeling with an Accurate and
  Efficient Hierarchical RNN Model
Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model
Hao Zhang
You-Chi Cheng
Shankar Kumar
Yifan Jiang
Mingqing Chen
Rajiv Mathews
18
7
0
16 Feb 2022
Knowledge Transfer from Large-scale Pretrained Language Models to
  End-to-end Speech Recognizers
Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers
Yotaro Kubo
Shigeki Karita
M. Bacchiani
8
26
0
16 Feb 2022
Conversational Speech Recognition By Learning Conversation-level
  Characteristics
Conversational Speech Recognition By Learning Conversation-level Characteristics
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
43
7
0
16 Feb 2022
Visual Acoustic Matching
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
21
56
0
14 Feb 2022
The xmuspeech system for multi-channel multi-party meeting transcription
  challenge
The xmuspeech system for multi-channel multi-party meeting transcription challenge
Jie Wang
Yuji Liu
Binling Wang
Yiming Zhi
Song Li
Shipeng Xia
Jiayang Zhang
Lin Li
Q. Hong
Feng Tong
16
0
0
11 Feb 2022
Improving Automatic Speech Recognition for Non-Native English with
  Transfer Learning and Language Model Decoding
Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding
Peter Sullivan
Toshiko Shibano
Muhammad Abdul-Mageed
44
11
0
10 Feb 2022
The Volcspeech system for the ICASSP 2022 multi-channel multi-party
  meeting transcription challenge
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge
Chen Shen
Yi Y. Liu
Wenzhi Fan
Bin Wang
Shi-Xue Wen
Yao Tian
Jun Zhang
Jingsheng Yang
Zejun Ma
12
4
0
09 Feb 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting
  Transcription Grand Challenge
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yihui Fu
Zhihao Du
...
Kong Aik Lee
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
18
28
0
08 Feb 2022
Exploring Self-Attention Mechanisms for Speech Separation
Exploring Self-Attention Mechanisms for Speech Separation
Cem Subakan
Mirco Ravanelli
Samuele Cornell
François Grondin
Mirko Bronzi
40
23
0
06 Feb 2022
Self-supervised Learning with Random-projection Quantizer for Speech
  Recognition
Self-supervised Learning with Random-projection Quantizer for Speech Recognition
Chung-Cheng Chiu
James Qin
Yu Zhang
Jiahui Yu
Yonghui Wu
SSL
30
163
0
03 Feb 2022
The RoyalFlush System of Speech Recognition for M2MeT Challenge
The RoyalFlush System of Speech Recognition for M2MeT Challenge
Shuaishuai Ye
Peiyao Wang
Shunfei Chen
Xinhui Hu
Xinkang Xu
24
5
0
03 Feb 2022
mSLAM: Massively multilingual joint pre-training for speech and text
mSLAM: Massively multilingual joint pre-training for speech and text
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
VLM
30
111
0
03 Feb 2022
Improving End-to-End Contextual Speech Recognition with Fine-Grained
  Contextual Knowledge Selection
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Minglun Han
Linhao Dong
Zhenlin Liang
Meng Cai
Shiyu Zhou
Zejun Ma
Bo Xu
26
45
0
30 Jan 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP
  ADD Challenge
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge
Ziyi Chen
Hua Hua
Yuxiang Zhang
Ming Li
Pengyuan Zhang
27
0
0
29 Jan 2022
Star Temporal Classification: Sequence Classification with Partially
  Labeled Data
Star Temporal Classification: Sequence Classification with Partially Labeled Data
Vineel Pratap
Awni Y. Hannun
Gabriel Synnaeve
R. Collobert
23
8
0
28 Jan 2022
Reducing language context confusion for end-to-end code-switching
  automatic speech recognition
Reducing language context confusion for end-to-end code-switching automatic speech recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
J. Tao
Y. Yeung
Liqun Deng
27
11
0
28 Jan 2022
Improving End-to-End Models for Set Prediction in Spoken Language
  Understanding
Improving End-to-End Models for Set Prediction in Spoken Language Understanding
H. Kuo
Zoltán Tüske
Samuel Thomas
Brian Kingsbury
G. Saon
21
0
0
28 Jan 2022
On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End
  Mandarin Chinese ASR
On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR
Zhao Yang
Dianwen Ng
Xiao Fu
Liping Han
Wei Xi
Ruimeng Wang
Rui Jiang
Jizhong Zhao
40
2
0
26 Jan 2022
Internal Language Model Estimation Through Explicit Context Vector
  Learning for Attention-based Encoder-decoder ASR
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR
Yufei Liu
Rao Ma
Haihua Xu
Yi He
Zejun Ma
Weibin Zhang
28
12
0
26 Jan 2022
Noise-robust voice conversion with domain adversarial training
Noise-robust voice conversion with domain adversarial training
Hongqiang Du
Lei Xie
Haizhou Li
19
11
0
26 Jan 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
  for Single and Multi-Person Video
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
94
40
0
25 Jan 2022
Improving the fusion of acoustic and text representations in RNN-T
Improving the fusion of acoustic and text representations in RNN-T
Chao Zhang
Bo-wen Li
Zhiyun Lu
Tara N. Sainath
Shuo-yiin Chang
AI4CE
43
12
0
25 Jan 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning
  for Speech Pre-Training
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
38
23
0
25 Jan 2022
A Study of Transducer based End-to-End ASR with ESPnet: Architecture,
  Auxiliary Loss and Decoding Strategies
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer
Yusuke Shinohara
Takaaki Ishii
Hirofumi Inaguma
Shinji Watanabe
35
34
0
14 Jan 2022
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command
  Recognition
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
Wenliang Dai
Samuel Cahyawijaya
Tiezheng Yu
Elham J. Barezi
Peng Xu
...
Genta Indra Winata
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
41
11
0
11 Jan 2022
Previous
123...282930...333435
Next