Conformer: Convolution-augmented Transformer for Speech Recognition


16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown
Title
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
40
4
0
06 Jun 2024
Enhancing CTC-based speech recognition with diverse modeling units
Shiyi Han
Zhihong Lei
Mingbin Xu
Xingyu Na
Zhen Huang
41
0
0
05 Jun 2024
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Shaolei Zhang
Qingkai Fang
Shoutao Guo
Zhengrui Ma
Min Zhang
Yang Feng
31
5
0
05 Jun 2024
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
Yui Sudo
Muhammad Shakeel
Yosuke Fukumoto
Brian Yan
Jiatong Shi
Yifan Peng
Shinji Watanabe
27
0
0
05 Jun 2024
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
Hsuan Su
Hua Farn
Fan-Yun Sun
Shang-Tse Chen
Hung-yi Lee
MoMe
37
3
0
05 Jun 2024
Text Injection for Neural Contextual Biasing
Zhong Meng
Zelin Wu
Rohit Prabhavalkar
Cal Peyser
Weiran Wang
Nanxin Chen
Tara N. Sainath
Bhuvana Ramabhadran
46
3
0
05 Jun 2024
USM RNN-T model weights binarization
Oleg Rybakov
Dmitriy Serdyuk
Chengjian Zheng
MQ
34
0
0
05 Jun 2024
ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization
Bi-Cheng Yan
Wei-Cheng Chao
Jiun-Ting Li
Yi-Cheng Wang
Hsin-Wei Wang
Meng-Shin Lin
Berlin Chen
23
0
0
05 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
52
2
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
55
8
0
03 Jun 2024
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
Keqi Deng
Guangzhi Sun
Phil Woodland
VLM
41
4
0
01 Jun 2024
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Davide Berghi
Philip J. B. Jackson
47
0
0
01 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
61
0
0
31 May 2024
Dual sparse training framework: inducing activation map sparsity via Transformed $\ell1$ regularization
Xiaolong Yu
Cong Tian
52
0
0
30 May 2024
A Full-duplex Speech Dialogue Scheme Based On Large Language Models
Peng Wang
Songshuo Lu
Yaohua Tang
Sijie Yan
Yuanjun Xiong
Wei Xia
AuLLM
36
10
0
29 May 2024
4-bit Shampoo for Memory-Efficient Network Training
Sike Wang
Jia Li
Pan Zhou
Hua Huang
MQ
44
6
0
28 May 2024
Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition
Tong Shi
Xuri Ge
Joemon M. Jose
Nicolas Pugeault
Paul Henderson
36
0
0
26 May 2024
Crossmodal ASR Error Correction with Discrete Speech Units
Yuanchao Li
Pinzhen Chen
Peter Bell
Catherine Lai
36
6
0
26 May 2024
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Yuchi Wang
Junliang Guo
Jianhong Bai
Runyi Yu
Tianyu He
Xu Tan
Xu Sun
Jiang Bian
DiffM
50
9
0
24 May 2024
The Road Less Scheduled
Aaron Defazio
Xingyu Yang
Harsh Mehta
Konstantin Mishchenko
Ahmed Khaled
Ashok Cutkosky
33
46
0
24 May 2024
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Zijin Gu
Tatiana Likhomanenko
Richard He Bai
Erik McDermott
R. Collobert
Navdeep Jaitly
AuLLM
58
2
0
24 May 2024
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Huy Le Nguyen
Pedram Akbarian
Trang Pham
Trang Nguyen
Shujian Zhang
Nhat Ho
MoE
51
2
0
23 May 2024
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
Huy Nguyen
Nhat Ho
Alessandro Rinaldo
55
3
0
22 May 2024
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
Muhammad Shakeel
Yui Sudo
Yifan Peng
Shinji Watanabe
43
0
0
22 May 2024
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
Yui Sudo
Yosuke Fukumoto
Muhammad Shakeel
Yifan Peng
Shinji Watanabe
34
0
0
22 May 2024
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
Weiting Tan
Jingyu Zhang
Lingfeng Shen
Daniel Khashabi
Philipp Koehn
32
0
0
22 May 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
42
1
0
21 May 2024
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
Dongseong Hwang
ODL
37
4
0
21 May 2024
Mamba in Speech: Towards an Alternative to Self-Attention
Xiangyu Zhang
Qiquan Zhang
Hexin Liu
Tianyi Xiao
Xinyuan Qian
Beena Ahmed
E. Ambikairajah
Haizhou Li
Julien Epps
Mamba
54
37
0
21 May 2024
Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification
Nian Li
Jianguo Wei
ViT
32
0
0
20 May 2024
Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining
Neena Aloysius
M. Geetha
Prema Nedungadi
SLR
27
2
0
20 May 2024
Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals
Hui Zheng
Haiteng Wang
Wei-Bang Jiang
Zhongtao Chen
Li He
Pei-Yang Lin
Peng-Hu Wei
Guo-Guang Zhao
Yun-Zhe Liu
52
1
0
19 May 2024
SBAAM! Eliminating Transcript Dependency in Automatic Subtitling
Marco Gaido
Sara Papi
Matteo Negri
Mauro Cettolo
L. Bentivogli
43
1
0
17 May 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
CVBM
36
9
0
16 May 2024
Robust Singing Voice Transcription Serves Synthesis
Ruiqi Li
Yu Zhang
Yongqi Wang
Zhiqing Hong
Rongjie Huang
Zhou Zhao
40
7
0
16 May 2024
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Raghuveer Peri
Sai Muralidhar Jayanthi
S. Ronanki
Anshu Bhatia
Karel Mundnich
...
Srikanth Vishnubhotla
Daniel Garcia-Romero
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
AAML
34
3
0
14 May 2024
Semantic MIMO Systems for Speech-to-Text Transmission
Zhenzi Weng
Zhijin Qin
Huiqiang Xie
Xiaoming Tao
Khaled B. Letaief
36
3
0
13 May 2024
Improving Multimodal Learning with Multi-Loss Gradient Modulation
Konstantinos Kontras
Christos Chatzichristos
Matthew Blaschko
M. D. Vos
32
3
0
13 May 2024
Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases
Pengfei Zhang
Zhihang Zheng
Shichen Zhang
Minghao Yang
Shaojun Tang
19
1
0
13 May 2024
AraSpell: A Deep Learning Approach for Arabic Spelling Correction
Mahmoud Salhab
Faisal Abu-Khzam
35
6
0
11 May 2024
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
Dena F. Mujtaba
Nihar R. Mahapatra
Megan Arney
J Scott Yaruss
Hope Gerlach-Houck
Caryn Herring
Jia Bin
40
1
0
10 May 2024
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory
Tianji Cai
G. W. Merz
François Charton
Niklas Nolte
Matthias Wilhelm
K. Cranmer
Lance J. Dixon
39
15
0
09 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
47
15
0
08 May 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
45
36
0
06 May 2024
LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model
Haowen Sun
Ruikun Zheng
Haibin Huang
Chongyang Ma
Hui Huang
Ruizhen Hu
DiffM
47
7
0
06 May 2024
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Bingshen Mu
Yangze Li
Qijie Shao
Kun Wei
Xucheng Wan
Naijun Zheng
Huan Zhou
Lei Xie
48
6
0
06 May 2024
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
Tao Liu
Feilong Chen
Shuai Fan
Chenpeng Du
Qi Chen
Xie Chen
Kai Yu
DiffM
PINN
36
25
0
06 May 2024
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
Liam Lonergan
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
43
2
0
02 May 2024
Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features
Francisco Teixeira
Karla Pizzi
R. Olivier
A. Abad
Bhiksha Raj
Isabel Trancoso
AAML
45
2
0
02 May 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
38
0
0
30 Apr 2024