Conformer: Convolution-augmented Transformer for Speech Recognition


16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown
Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
Muhammad Salman Khan
Moreno La Quatra
Kuo-Hsuan Hung
Szu-Wei Fu
Sabato Marco Siniscalchi
Yu Tsao
31
2
0
08 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
35
0
0
08 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
43
2
0
08 Aug 2024
HydraFormer: One Encoder For All Subsampling Rates
Yaoxun Xu
Xingchen Song
Zhiyong Wu
Di Wu
Zhendong Peng
Binbin Zhang
25
0
0
08 Aug 2024
Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings
Jinzhao Zhou
Yiqun Duan
Ziyi Zhao
Yu-Cheng Chang
Yu-Kai Wang
T. Do
Chin-Teng Lin
47
1
0
08 Aug 2024
Speaker Adaptation for Quantised End-to-End ASR Models
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
46
1
0
07 Aug 2024
HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection
Juho Jung
Chaewon Kang
Jeewoo Yoon
Seungbae Kim
Jinyoung Han
33
5
0
07 Aug 2024
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
Kohei Saijo
G. Wichern
François G. Germain
Zexu Pan
Jonathan Le Roux
46
7
0
06 Aug 2024
DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework
Fan Zhang
Naye Ji
Fuxing Gao
Bozuo Zhao
Jingmei Wu
...
Zhenqing Ye
Jiayang Zhu
WeiFan Zhong
Leyao Yan
Xiaomeng Ma
32
0
0
01 Aug 2024
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Shanbo Cheng
Zhichao Huang
Tom Ko
Hang Li
Ningxin Peng
Lu Xu
Qini Zhang
48
3
0
31 Jul 2024
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach
Ralf Schluter
S. Sakti
44
2
0
31 Jul 2024
Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing
Hukai Huang
Shenghui Lu
Yahui Shan
He Qu
Wenhao Guan
Q. Hong
Lin Li
MoE
38
0
0
26 Jul 2024
Coupling Speech Encoders with Downstream Text Models
Ciprian Chelba
J. Schalkwyk
AuLLM
45
0
0
24 Jul 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
32
0
0
24 Jul 2024
Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments
Pai Zhu
Dhruuv Agarwal
Jacob Bartel
Kurt Partridge
H. Park
Quan Wang
46
1
0
23 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
39
9
0
23 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
43
4
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
PolyFormer: Scalable Node-wise Filters via Polynomial Graph Transformer
Jiahong Ma
Mingguo He
Zhewei Wei
52
2
0
19 Jul 2024
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
46
1
0
18 Jul 2024
Robust ASR Error Correction with Conservative Data Filtering
Takuma Udagawa
Masayuki Suzuki
Masayasu Muraoka
Gakuto Kurata
59
0
0
18 Jul 2024
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
Tina Raissi
Christoph Luscher
Simon Berger
Ralf Schluter
Hermann Ney
40
2
0
16 Jul 2024
RIMformer: An End-to-End Transformer for FMCW Radar Interference Mitigation
Ziang Zhang
Guangzhi Chen
Youlong Weng
Shunchuan Yang
Zhiyu Jia
Jingxuan Chen
29
1
0
16 Jul 2024
Genomic Language Models: Opportunities and Challenges
Gonzalo Benegas
Chengzhong Ye
C. Albors
Jianan Canal Li
Yun S. Song
AI4CE
LM&MA
ELM
50
18
0
16 Jul 2024
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
Ruizhe Huang
M. Yarmohammadi
Sanjeev Khudanpur
Dan Povey
43
2
0
14 Jul 2024
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
Wenbo Zhao
Ziwei Li
Chuan Yu
Zhijian Ou
AI4TS
28
0
0
14 Jul 2024
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
Xilin Jiang
Yinghao Aaron Li
Adrian Nicolas Florea
Cong Han
N. Mesgarani
Mamba
46
9
0
13 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
74
114
0
11 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
54
31
0
11 Jul 2024
Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data
Motoshige Sato
Kenichi Tomeoka
Ilya Horiguchi
Kai Arulkumaran
Ryota Kanai
Shuntaro Sasai
40
3
0
10 Jul 2024
Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition
Jingjing Xu
Wei Zhou
Zijian Yang
Eugen Beck
Ralf Schlueter
38
1
0
10 Jul 2024
Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer
Jizhen Li
Xinmeng Xu
Weiping Tu
Yuhong Yang
Rong Zhu
32
1
0
09 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
52
19
0
05 Jul 2024
Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models
Bolaji Yusuf
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
45
1
0
05 Jul 2024
Serialized Output Training by Learned Dominance
Ying Shi
Lantian Li
Shi Yin
D. Wang
Jiqing Han
23
4
0
04 Jul 2024
Improving Self-supervised Pre-training using Accent-Specific Codebooks
Darshan Prabhu
Abhishek Gupta
Omkar Nitsure
P. Jyothi
Sriram Ganapathy
SSL
47
0
0
04 Jul 2024
Multi-Convformer: Extending Conformer with Multiple Convolution Kernels
Darshan Prabhu
Yifan Peng
P. Jyothi
Shinji Watanabe
39
0
0
04 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
49
1
0
03 Jul 2024
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
Jinming Chen
Jingyi Fang
Yuanzhong Zheng
Yaoxuan Wang
Haojun Fei
26
1
0
03 Jul 2024
Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
Shujie Hu
Xurong Xie
Mengzhe Geng
Zengrui Jin
Jiajun Deng
...
Yi Wang
Mingyu Cui
Tianzi Wang
Helen Meng
Xunying Liu
51
6
0
03 Jul 2024
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features
Tomoki Koriyama
41
0
0
03 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
42
6
0
30 Jun 2024
An Attribute Interpolation Method in Speech Synthesis by Model Merging
Masato Murata
Koichi Miyazaki
Tomoki Koriyama
MoMe
45
4
0
30 Jun 2024
Open-Source Conversational AI with SpeechBrain 1.0
Mirco Ravanelli
Titouan Parcollet
Adel Moumen
Sylvain de Langen
Cem Subakan
...
Salima Mdhaffar
G. Laperriere
Mickael Rouvier
Renato De Mori
Yannick Esteve
VLM
47
10
0
29 Jun 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna C. Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
40
10
0
28 Jun 2024
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoE
31
0
0
28 Jun 2024
Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems
Zheng Fang
Tao Wang
Lingchen Zhao
Shenyi Zhang
Bowen Li
Yunjie Ge
Q. Li
Chao Shen
Qian Wang
16
4
0
27 Jun 2024
Token-Weighted RNN-T for Learning from Flawed Data
Gil Keren
Wei Zhou
Ozlem Kalinli
43
0
0
26 Jun 2024
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR
Shuaishuai Ye
Shunfei Chen
Xinhui Hu
Xinkang Xu
MoE
43
3
0
26 Jun 2024
Sequential Editing for Lifelong Training of Speech Recognition Models
Devang Kulshreshtha
Saket Dingliwal
Brady C. Houston
Nikolaos Pappas
S. Ronanki
KELM
CLL
34
1
0
25 Jun 2024