ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
33
3
0
11 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
44
1
0
09 May 2023
Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition
Xuandi Fu
Kanthashree Mysore Sathyendra
Ankur Gandhe
Jing Liu
Grant P. Strimel
Ross McGowan
Athanasios Mouchtaris
37
14
0
09 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
35
3
0
09 May 2023
Zero-shot personalized lip-to-speech synthesis with face image based voice control
Zheng-Yan Sheng
Yang Ai
Zhenhua Ling
CVBM
27
5
0
09 May 2023
Who Needs Decoders? Efficient Estimation of Sequence-level Attributes
Yassir Fathullah
Puria Radmard
Adian Liusie
Mark Gales
OODD
32
1
0
09 May 2023
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Dima Rekesh
Nithin Rao Koluguri
Samuel Kriman
Somshubra Majumdar
Vahid Noroozi
...
Oleksii Hrinchuk
Krishna Puvvada
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
61
84
0
08 May 2023
Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal
Oleksii Hrinchuk
Oleksii Kuchaiev
35
2
0
07 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
46
116
0
07 May 2023
Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation
Nilaksh Das
Monica Sunkara
S. Bodapati
Jason (Jinglun) Cai
Devang Kulshreshtha
Jeffrey J. Farris
Katrin Kirchhoff
28
2
0
05 May 2023
Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks
Yun Tang
Anna Y. Sun
Hirofumi Inaguma
Xinyue Chen
Ning Dong
Xutai Ma
Paden Tomasello
J. Pino
48
19
0
04 May 2023
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History
Siddhant Arora
Hayato Futami
E. Tsunoo
Brian Yan
Shinji Watanabe
43
4
0
01 May 2023
Enhancing multilingual speech recognition in air traffic control by sentence-level language identification
Peng Fan
Dongyue Guo
Jianwei Zhang
Bo Yang
Yi Lin
19
6
0
29 Apr 2023
Vision Conformer: Incorporating Convolutions into Vision Transformer Layers
Brian Kenji Iwana
Akihiro Kusuda
ViT
47
2
0
27 Apr 2023
UniNeXt: Exploring A Unified Architecture for Vision Recognition
Fangjian Lin
Jianlong Yuan
Sitong Wu
Fan Wang
Zhibin Wang
ViT
32
14
0
26 Apr 2023
Depth-Relative Self Attention for Monocular Depth Estimation
Kyuhong Shim
Jiyoung Kim
Gusang Lee
B. Shim
MDE
28
7
0
25 Apr 2023
DiffVoice: Text-to-Speech with Latent Diffusion
Zhijun Liu
Yiwei Guo
K. Yu
DiffM
33
22
0
23 Apr 2023
Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding
Mohan Li
R. Doddipatla
36
6
0
21 Apr 2023
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale
Cal Peyser
M. Picheny
Kyunghyun Cho
Rohit Prabhavalkar
Ronny Huang
Tara N. Sainath
AI4TS
42
1
0
19 Apr 2023
CB-Conformer: Contextual biasing Conformer for biased word recognition
Yaoxun Xu
Baiji Liu
Qiaochu Huang
Xingcheng Song
Zhiyong Wu
Shiyin Kang
Helen Meng
76
14
0
19 Apr 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
43
6
0
18 Apr 2023
Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition
Maurits J. R. Bleeker
P. Swietojanski
Stefan Braun
Xiaodan Zhuang
55
8
0
18 Apr 2023
AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning
Marina Neseem
Ahmed A. Agiza
Sherief Reda
42
6
0
17 Apr 2023
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
P. Motlícek
Matthias Kleinert
32
22
0
16 Apr 2023
A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition
Ruchao Fan
Wei Chu
Peng Chang
Abeer Alwan
18
10
0
15 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
32
19
0
13 Apr 2023
The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
Micah Goldblum
Marc Finzi
K. Rowan
A. Wilson
UQCV
FedML
37
38
0
11 Apr 2023
Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
Guangyong Wei
Zhikui Duan
Shiren Li
Guangguang Yang
Xinmei Yu
Junhua Li
30
4
0
11 Apr 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Yue Liu
Yu Zhang
Ming-Wei Chang
BDL
AI4CE
32
55
0
11 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
Hirofumi Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
24
20
0
10 Apr 2023
On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection
Yi Zhu
Mohamed Imoussaïne-Aïkous
Carolyn Côté-Lussier
Tiago H. Falk
20
4
0
05 Apr 2023
Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP
Nikolaos Antoniou
Athanasios Katsamanis
Theodoros Giannakopoulos
Shrikanth Narayanan
39
17
0
03 Apr 2023
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Saumya Yashmohini Sahai
Jing Liu
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Anastasios Alexandridis
...
Ross McGowan
Ariya Rastrow
Feng-Ju Chang
Athanasios Mouchtaris
Siegfried Kunzmann
44
5
0
03 Apr 2023
Multilingual Word Error Rate Estimation: e-WER3
Shammur A. Chowdhury
Ahmed M. Ali
29
7
0
02 Apr 2023
Lego-Features: Exporting modular encoder features for streaming and deliberation ASR
Rami Botros
Rohit Prabhavalkar
J. Schalkwyk
Ciprian Chelba
Tara N. Sainath
Françoise Beaufays
AuLLM
26
3
0
31 Mar 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
28
3
0
31 Mar 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu
Egor Lakomkin
Konstantinos Vougioukas
Pingchuan Ma
Honglie Chen
...
Niko Moritz
J. Kolár
Stavros Petridis
Maja Pantic
Christian Fuegen
56
19
0
30 Mar 2023
AraSpot: Arabic Spoken Command Spotting
Mahmoud Salhab
H. Harmanani
28
0
0
29 Mar 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
Sara Papi
Marco Gaido
Andrea Pilzer
Matteo Negri
64
10
0
28 Mar 2023
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Karren D. Yang
Ting-Yao Hu
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
48
12
0
27 Mar 2023
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
Maja Pantic
27
107
0
25 Mar 2023
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
18
0
0
23 Mar 2023
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Haoyu Tang
Zhaoyi Liu
Chang Zeng
Xinfeng Li
34
1
0
23 Mar 2023
Enhancing Unsupervised Speech Recognition with Diffusion GANs
Xianchao Wu
DiffM
16
2
0
23 Mar 2023
LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
Teerapat Jenrungrot
Michael Chinen
W. Kleijn
Jan Skoglund
Zalan Borsos
Neil Zeghidour
Marco Tagliasacchi
62
19
0
23 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
48
47
0
21 Mar 2023
Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition
Xiaoyu Yang
Qiujia Li
Chuxu Zhang
P. Woodland
44
6
0
20 Mar 2023
Powerful and Extensible WFST Framework for RNN-Transducer Losses
A. Laptev
Vladimir Bataev
Igor Gitman
Boris Ginsburg
26
3
0
18 Mar 2023
Effectively Modeling Time Series with Simple Discrete State Spaces
Michael Zhang
Khaled Kamal Saab
Michael Poli
Tri Dao
Karan Goel
Christopher Ré
AI4TS
30
45
0
16 Mar 2023