ResearchTrend.AI
Conformer: Convolution-augmented Transformer for Speech Recognition


16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown
Automatic Speech Recognition System-Independent Word Error Rate Estimation
Chanho Park
Mingjie Chen
Thomas Hain
26
0
0
25 Apr 2024
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
Xingchen Song
Di Wu
Binbin Zhang
Dinghao Zhou
Zhendong Peng
Bo Dang
Fuping Pan
Chao Yang
MoE
47
5
0
25 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim
Bokyeung Lee
Donghyeon Kim
Hanseok Ko
OffRL
28
0
0
24 Apr 2024
Less Peaky and More Accurate CTC Forced Alignment by Label Priors
Ruizhe Huang
Xiaohui Zhang
Zhaoheng Ni
Li Sun
Moto Hira
...
Vineel Pratap
Matthew Wiesner
Shinji Watanabe
Daniel Povey
Sanjeev Khudanpur
29
4
0
22 Apr 2024
Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks
Alexandre Bittar
Philip N. Garner
37
0
0
22 Apr 2024
Audio Anti-Spoofing Detection: A Survey
Menglu Li
Yasaman Ahmadiadli
Xiao-Ping Zhang
48
19
0
22 Apr 2024
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Wei Niu
Md. Musfiqur Rahman Sanim
Zhihao Shu
Jiexiong Guan
Xipeng Shen
Miao Yin
Gagan Agrawal
Bin Ren
35
6
0
21 Apr 2024
Learn2Talk: 3D Talking Face Learns from 2D Talking Face
Yixiang Zhuang
Baoping Cheng
Yao Cheng
Yuntao Jin
Renshuai Liu
Chengyang Li
Xuan Cheng
Jing Liao
Juncong Lin
CVBM
3DH
37
6
0
19 Apr 2024
Efficient infusion of self-supervised representations in Automatic Speech Recognition
Darshan Prabhu
Sai Ganesh Mirishkar
Pankaj Wasnik
14
0
0
19 Apr 2024
Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation
Ye Bai
Chenxing Li
Hao Li
Yuanyuan Zhao
Xiaorui Wang
24
0
0
17 Apr 2024
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR
Zelin Wu
Gan Song
Christopher Li
Pat Rondon
Zhong Meng
...
D. Caseiro
Golan Pundak
Tsendsuren Munkhdalai
Angad Chandorkar
Rohit Prabhavalkar
18
3
0
15 Apr 2024
Anatomy of Industrial Scale Multilingual ASR
Francis McCann Ramirez
Luka Chkhetiani
Andrew Ehrenberg
R. McHardy
Rami Botros
...
Ahmed Efty
Daniel McCrystal
Sam Flamini
Domenic Donato
Takuya Yoshioka
42
7
0
15 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
40
12
0
14 Apr 2024
NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT
Xinzhe Zheng
Sijie Ji
Yipeng Pan
Kaiwen Zhang
Chenshu Wu
35
1
0
13 Apr 2024
Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
Kevin Zhang
Luka Chkhetiani
Francis McCann Ramirez
Yash Khare
Andrea Vanzo
...
Ruben Bousbib
Taufiquzzaman Peyash
Michael Nguyen
Dillon Pulliam
Domenic Donato
40
2
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder
He Wang
Pengcheng Guo
Xucheng Wan
Huan Zhou
Lei Xie
43
2
0
08 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
37
23
0
04 Apr 2024
Noise Masking Attacks and Defenses for Pretrained Speech Models
Matthew Jagielski
Om Thakkar
Lun Wang
AAML
37
5
0
02 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition
Yi-Cheng Wang
Hsin-Wei Wang
Bi-Cheng Yan
Chi-Han Lin
Berlin Chen
37
1
0
26 Mar 2024
Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models
Tsendsuren Munkhdalai
Youzheng Chen
K. Sim
Fadi Biadsy
Tara N. Sainath
P. M. Mengibar
35
1
0
25 Mar 2024
Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Linzhi Wu
Xingyu Zhang
Yakun Zhang
Changyan Zheng
Tiejun Liu
Liang Xie
Ye Yan
Erwei Yin
35
1
0
24 Mar 2024
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
49
6
0
21 Mar 2024
M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Zhe Chen
Heyang Liu
Wenyi Yu
Guangzhi Sun
Hongcheng Liu
Ji Wu
Chao Zhang
Yu Wang
Yanfeng Wang
VGen
57
1
0
21 Mar 2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
37
6
0
20 Mar 2024
TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi
Hao Li
Baochen Yang
Haoyu Li
Hai-kun Xu
Kai Yu
35
1
0
20 Mar 2024
Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
Fan Zhang
Zhaohan Wang
Xin Lyu
Siyuan Zhao
Mengjian Li
...
Naye Ji
Hui Du
Fuxing Gao
Hao Wu
Shunman Li
VGen
48
3
0
16 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
46
8
0
14 Mar 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
34
1
0
13 Mar 2024
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Wenjing Zhu
Sining Sun
Changhao Shan
Peng Fan
Qing Yang
37
1
0
13 Mar 2024
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Jiayu Du
Jinpeng Li
Guoguo Chen
Wei-Qiang Zhang
ELM
37
3
0
13 Mar 2024
Real-Time Multimodal Cognitive Assistant for Emergency Medical Services
Keshara Weerasinghe
Saahith Janapati
Xueren Ge
Sion Kim
S. Iyer
John A. Stankovic
H. Alemzadeh
46
2
0
11 Mar 2024
FFSTC: Fongbe to French Speech Translation Corpus
D. F. Kponou
F. Laleye
E. C. Ezin
29
0
0
08 Mar 2024
Efficient High-Resolution Time Series Classification via Attention Kronecker Decomposition
Aosong Feng
Jialin Chen
Juan Garza
Brooklyn Berry
Francisco Salazar
Yifeng Gao
Rex Ying
Leandros Tassiulas
46
1
0
07 Mar 2024
CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
Vahid Ahmadi Kalkhorani
DeLiang Wang
46
3
0
06 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
49
147
0
05 Mar 2024
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Zhiyun Fan
Linhao Dong
Jun Zhang
Lu Lu
Zejun Ma
43
5
0
04 Mar 2024
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Tahir Javed
J. Nawale
E. George
Sakshi Joshi
Kaushal Bhogale
...
M. ManickamK
C. V. Vaijayanthi
Krishnan Srinivasa Raghavan Karunganni
Pratyush Kumar
Mitesh M Khapra
41
16
0
04 Mar 2024
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun
Hong Yang
Bo Qin
VLM
35
1
0
04 Mar 2024
Partial Federated Learning
Tiantian Feng
Anil Ramakrishna
Jimit Majmudar
Charith Peris
Jixuan Wang
Clement Chung
Richard Zemel
Morteza Ziyadi
Rahul Gupta
44
1
0
03 Mar 2024
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Heyang Liu
Yu Wang
Yanfeng Wang
46
0
0
01 Mar 2024
SoD$^2$: Statically Optimizing Dynamic Deep Neural Network
Wei Niu
Gagan Agrawal
Bin Ren
33
4
0
29 Feb 2024
Compact Speech Translation Models via Discrete Speech Units Pretraining
Tsz Kin Lam
Alexandra Birch
Barry Haddow
61
2
0
29 Feb 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
43
13
0
29 Feb 2024
Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
Jeehyun Lee
Yerin Choi
Tae-Jin Song
M. Koo
16
4
0
29 Feb 2024
Exploration of Adapter for Noise Robust Automatic Speech Recognition
Hao Shi
Tatsuya Kawahara
45
5
0
28 Feb 2024
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
Rohit Prabhavalkar
Zhong Meng
Weiran Wang
Adam Stooke
Xingyu Cai
Yanzhang He
Arun Narayanan
Dongseong Hwang
Tara N. Sainath
Pedro J. Moreno
30
8
0
27 Feb 2024
PIDformer: Transformer Meets Control Theory
Tam Nguyen
César A. Uribe
Tan-Minh Nguyen
Richard G. Baraniuk
56
7
0
25 Feb 2024