Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
PIDformer: Transformer Meets Control Theory
Tam Nguyen
César A. Uribe
Tan-Minh Nguyen
Richard G. Baraniuk
56
7
0
25 Feb 2024
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
34
0
0
25 Feb 2024
Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR
Jintao Jiang
Yingbo Gao
Mohammad Zeineldeen
Zoltán Tüske
34
0
0
23 Feb 2024
HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention
Shuang Chen
Amir Atapour-Abarghouei
Hubert P. H. Shum
ViT
45
12
0
22 Feb 2024
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
46
1
0
20 Feb 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
32
1
0
20 Feb 2024
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation
Wen Wu
Bo-wen Li
C. Zhang
Chung-Cheng Chiu
Qiujia Li
Junwen Bai
Tara N. Sainath
P. Woodland
38
2
0
20 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
49
13
0
19 Feb 2024
When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection
Xiangyu Zhang
Hexin Liu
Kaishuai Xu
Qiquan Zhang
Daijiao Liu
Beena Ahmed
Julien Epps
31
8
0
17 Feb 2024
Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos
Riku Arakawa
Kiyosu Maeda
Hiromu Yakura
37
3
0
17 Feb 2024
Bidirectional Generative Pre-training for Improving Time Series Representation Learning
Ziyang Song
Qincheng Lu
He Zhu
Yue Li
AI4TS
31
3
0
14 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
40
10
0
14 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
55
41
0
13 Feb 2024
Self-consistent context aware conformer transducer for speech recognition
Konstantin Kolokolov
Pavel Pekichev
Karthik Raghunathan
22
0
0
09 Feb 2024
Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration
Xiliang Zhu
Chia-Tien Chang
Shayna Gardiner
David Rossouw
Jonas Robertson
35
1
0
05 Feb 2024
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Quang Pham
Giang Do
Huy Nguyen
TrungTin Nguyen
Chenghao Liu
...
Binh T. Nguyen
Savitha Ramasamy
Xiaoli Li
Steven C. H. Hoi
Nhat Ho
30
18
0
04 Feb 2024
Retrieval Augmented End-to-End Spoken Dialog Models
Mingqiu Wang
Izhak Shafran
H. Soltau
Wei Han
Yuan Cao
Dian Yu
Laurent El Shafey
RALM
AuLLM
30
11
0
02 Feb 2024
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
Panos Kakoulidis
Nikolaos Ellinas
G. Vamvoukakis
Myrsini Christidou
Alexandra Vioni
...
Junkwang Oh
Gunu Jho
Inchul Hwang
Pirros Tsiakoulis
Aimilios Chalamandaris
28
1
0
02 Feb 2024
Streaming Sequence Transduction through Dynamic Compression
Weiting Tan
Yunmo Chen
Tongfei Chen
Guanghui Qin
Haoran Xu
Heidi C. Zhang
Benjamin Van Durme
Philipp Koehn
24
2
0
02 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
35
14
0
02 Feb 2024
Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
Dong Yang
Tomoki Koriyama
Yuki Saito
32
1
0
01 Feb 2024
Exploring the limits of decoder-only models trained on public speech recognition corpora
Ankit Gupta
G. Saon
Brian Kingsbury
OffRL
25
5
0
31 Jan 2024
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
Jiatong Shi
Yueqian Lin
Xinyi Bai
Keyi Zhang
Yuning Wu
Yuxun Tang
Yifeng Yu
Qin Jin
Shinji Watanabe
33
6
0
31 Jan 2024
Local and Global Contexts for Conversation
Zuoquan Lin
Xinyi Shen
29
1
0
31 Jan 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
81
21
0
30 Jan 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLM
OSLM
34
40
0
30 Jan 2024
Diffusion-based Graph Generative Methods
Hongyang Chen
Can Xu
Lingyu Zheng
Qiang Zhang
Xuemin Lin
DiffM
MedIm
37
0
0
28 Jan 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
38
0
0
25 Jan 2024
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Huy Nguyen
Pedram Akbarian
Nhat Ho
MoE
30
10
0
25 Jan 2024
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
Dong Zhang
Xin Zhang
Jun Zhan
Shimin Li
Yaqian Zhou
Xipeng Qiu
AuLLM
BDL
42
16
0
24 Jan 2024
Locality enhanced dynamic biasing and sampling strategies for contextual ASR
Md. Asif Jalal
Pablo Peso Parada
George Pavlidis
Vasileios Moschopoulos
Karthikeyan P. Saravanan
...
Jisi Zhang
Anastasios Drosou
Gil Ho Lee
Jungin Lee
Seokyeong Jung
28
2
0
23 Jan 2024
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Yifan Jiang
Cyril Allauzen
Tongzhou Chen
Kilol Gupta
Ke Hu
James Qin
Yu Zhang
Yongqiang Wang
Shuo-yiin Chang
Tara N. Sainath
MoMe
40
10
0
23 Jan 2024
EEND-M2F: Masked-attention mask transformers for speaker diarization
Marc Härkönen
Samuel J. Broughton
Lahiru Samarakoon
44
7
0
23 Jan 2024
Consistency Based Unsupervised Self-training For ASR Personalisation
Jisi Zhang
Vandana Rajan
Haaris Mehmood
David Tuckey
Pablo Peso Parada
Md. Asif Jalal
Karthikeyan P. Saravanan
Gil Ho Lee
Jungin Lee
Seokyeong Jung
26
0
0
22 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min Lin
MLLM
35
14
0
22 Jan 2024
Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers
Michael Hentschel
Yuta Nishikawa
Tatsuya Komatsu
Yusuke Fujita
27
4
0
22 Jan 2024
Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric
Golara Javadi
K. Yuksel
Yunsu Kim
Thiago Castro Ferreira
Mohamed Al-Badrashiny
32
2
0
20 Jan 2024
LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory Units
Zeyu Liu
Gourav Datta
Anni Li
P. Beerel
35
9
0
20 Jan 2024
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
Yui Sudo
Muhammad Shakeel
Yosuke Fukumoto
Yifan Peng
Shinji Watanabe
34
5
0
19 Jan 2024
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
Yichao Du
Zhirui Zhang
Linan Yue
Xu Huang
Yuqing Zhang
Tong Xu
Linli Xu
Enhong Chen
FedML
75
5
0
18 Jan 2024
Towards Hierarchical Spoken Language Dysfluency Modeling
Jiachen Lian
Gopala Anumanchipalli
32
9
0
18 Jan 2024
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
Junwen Bai
Bo-wen Li
Qiujia Li
Tara N. Sainath
Trevor Strohman
38
3
0
17 Jan 2024
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Alon Vinnikov
Amir Ivry
Aviv Hurvitz
Igor Abramovski
S. Koubi
...
S. Sivasankaran
Yifan Gong
Min Tang
Huaming Wang
Eyal Krupka
41
20
0
16 Jan 2024
Improving ASR Contextual Biasing with Guided Attention
Jiyang Tang
Kwangyoun Kim
Suwon Shon
Felix Wu
Prashant Sridhar
Shinji Watanabe
31
8
0
16 Jan 2024
Promptformer: Prompted Conformer Transducer for ASR
Sergio Duarte Torres
Arunasish Sen
Aman Rana
Lukas Drude
Alejandro Gomez-Alanis
Andreas Schwarz
Leif Rädel
Volker Leutnant
40
3
0
14 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
27
37
0
14 Jan 2024
Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization
A. F. M. Saif
Xiaodong Cui
Han Shen
Songtao Lu
Brian Kingsbury
Tianyi Chen
37
3
0
13 Jan 2024
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
Fan Yu
Haoxu Wang
Xian Shi
Shiliang Zhang
27
3
0
12 Jan 2024
R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation
Jiaxin Guo
Zhanglin Wu
Zongyao Li
Hengchao Shang
Daimeng Wei
Xiaoyu Chen
Zhiqiang Rao
Shaojun Li
Hao Yang
35
1
0
11 Jan 2024
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
Jiaxin Guo
Minghan Wang
Xiaosong Qiao
Daimeng Wei
Hengchao Shang
...
Yinglu Li
Chang Su
Min Zhang
Shimin Tao
Hao Yang
31
6
0
11 Jan 2024