Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,758 papers shown
Title
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
48
270
0
23 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
43
8
0
23 Jun 2023
Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation
Zhonghua Liu
Shijun Wang
Ning Chen
DRL
37
2
0
21 Jun 2023
Recent Advances in Direct Speech-to-text Translation
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
32
19
0
20 Jun 2023
Timestamped Embedding-Matching Acoustic-to-Word CTC ASR
Woojay Jeon
32
0
0
20 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
40
4
0
20 Jun 2023
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Cihan Xiao
Lin Zhang
Jinyi Yang
Dongji Gao
Kevin Duh
Sanjeev Khudanpur
37
1
0
20 Jun 2023
Rehearsal-Free Online Continual Learning for Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL
45
3
0
19 Jun 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces
Ziqiao Peng
Yihao Luo
Yue Shi
Hao-Xuan Xu
Xiangyu Zhu
Jun He
Hongyan Liu
Zhaoxin Fan
58
41
0
19 Jun 2023
NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
Yun Yi
Haokui Zhang
Rong Xiao
Nan Wang
Xiaoyu Wang
GNN
43
2
0
19 Jun 2023
Multitrack Music Transcription with a Time-Frequency Perceiver
Weiyi Lu
Ju-Chiang Wang
Yun-Ning Hung
ViT
AI4TS
34
24
0
19 Jun 2023
MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition
Yuchen Hu
Chen Chen
Ruizhe Li
Heqing Zou
Eng Siong Chng
GAN
52
9
0
18 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Chen Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
44
5
0
18 Jun 2023
Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think
Tina Raissi
Christoph Lüscher
Moritz Gunz
Ralf Schlüter
Hermann Ney
BDL
20
3
0
15 Jun 2023
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
Shivam Mehta
Siyang Wang
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
DiffM
26
14
0
15 Jun 2023
Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction
Rohit Paturi
S. Srinivasan
Xiang Li
31
13
0
15 Jun 2023
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones
Zitha Sasindran
Harsha Yelchuri
Pooja S B. Rao
Prabhakar Venkata Tamma
31
1
0
15 Jun 2023
CoverHunter: Cover Song Identification with Refined Attention and Alignments
Feng Liu
Deyi Tuo
Yinan Xu
Xintong Han
19
4
0
15 Jun 2023
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Kunal Dhawan
Dima Rekesh
Boris Ginsburg
22
10
0
14 Jun 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
J. Yoon
Sunghwan Ahn
Hyeon Seung Lee
Minchan Kim
Seokhwan Kim
N. Kim
VLM
43
2
0
14 Jun 2023
Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
Weidong Ji
Shijie Zan
Guohui Zhou
Xu Wang
SyDa
27
1
0
14 Jun 2023
DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR
Goeric Huybrechts
S. Ronanki
Xilai Li
H. Nosrati
S. Bodapati
Katrin Kirchhoff
26
1
0
13 Jun 2023
Large-scale Language Model Rescoring on Long-form Data
Tongzhou Chen
Cyril Allauzen
Yinghui Huang
Daniel S. Park
David Rybach
...
Rodrigo Cabrera
Kartik Audhkhasi
Bhuvana Ramabhadran
Pedro J. Moreno
Michael Riley
43
14
0
13 Jun 2023
Efficient Adapters for Giant Speech Models
Nanxin Chen
Izhak Shafran
Yu Zhang
Chung-Cheng Chiu
H. Soltau
James Qin
Yonghui Wu
30
10
0
13 Jun 2023
Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages
Simon Durand
Daniel Stoller
Sebastian Ewert
34
12
0
13 Jun 2023
Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation
Yucheng Han
Chen Xu
Tong Xiao
Jingbo Zhu
35
3
0
13 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
29
43
0
13 Jun 2023
Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation
Jinzi Qi
Hugo Van hamme
48
3
0
12 Jun 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
38
1
0
12 Jun 2023
Multimodal Audio-textual Architecture for Robust Spoken Language Understanding
Anderson R. Avila
Mehdi Rezagholizadeh
Chao Xing
23
1
0
12 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
29
175
0
11 Jun 2023
Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement
Junyu Wang
34
4
0
09 Jun 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
Xianzhao Chen
Yist Y. Lin
Kang Wang
Yi He
Zejun Ma
29
2
0
09 Jun 2023
Trajectory Prediction with Observations of Variable-Length for Motion Planning in Highway Merging scenarios
Sajjad Mozaffari
Mreza Alipour Sormoli
K. Koufos
Graham Lee
M. Dianati
52
8
0
08 Jun 2023
Latent Phrase Matching for Dysarthric Speech
Colin S. Lea
Dianna Yee
Jaya Narain
Zifang Huang
Lauren Tooley
Jeffrey P. Bigham
Leah Findlater
38
4
0
08 Jun 2023
Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
Zhiyun Fan
Linhao Dong
Chen Shen
Zhenlin Liang
Jun Zhang
Lu Lu
Zejun Ma
32
4
0
08 Jun 2023
Matching Latent Encoding for Audio-Text based Keyword Spotting
K. Nishu
Minsik Cho
Devang Naik
25
15
0
08 Jun 2023
Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency
Shigeki Karita
R. Sproat
Haruko Ishikawa
35
4
0
07 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
Takatomo Kano
A. Ogawa
Marc Delcroix
37
9
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
44
2
0
07 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
33
11
0
05 Jun 2023
Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition
Jisung Wang
Haram Lee
Myungwoo Oh
34
1
0
05 Jun 2023
Streaming Speech-to-Confusion Network Speech Recognition
Denis Filimonov
Prabhat Pandey
Ariya Rastrow
Ankur Gandhe
A. Stolcke
HAI
37
0
0
02 Jun 2023
ALO-VC: Any-to-any Low-latency One-shot Voice Conversion
Bo Wang
Damien Ronssin
Milos Cernak
BDL
38
3
0
01 Jun 2023
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts
Dongji Gao
Hainan Xu
Leibny Paola García
Daniel Povey
Sanjeev Khudanpur
29
8
0
01 Jun 2023
Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning
Yuting Yang
Yuke Li
Binbin Du
AI4TS
33
0
0
01 Jun 2023
Encoder-decoder multimodal speaker change detection
Jee-weon Jung
Soonshin Seo
Hee-Soo Heo
Geon-min Kim
You Jin Kim
Youngki Kwon
Min-Ji Lee
Bong-Jin Lee
45
2
0
01 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Estève
30
3
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
30
7
0
01 Jun 2023
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
19
23
0
01 Jun 2023