Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism
Yuguang Yang
Yu Pan
Jingjing Yin
Jiangyu Han
Lei Ma
Heng Lu
31
8
0
15 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
29
0
0
15 Mar 2023
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Steven M. Hernandez
Ding Zhao
Shaojin Ding
A. Bruguier
Rohit Prabhavalkar
Tara N. Sainath
Yanzhang He
Ian McGraw
31
7
0
15 Mar 2023
Learning Cross-lingual Visual Speech Representations
Andreas Zinonos
A. Haliassos
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
22
8
0
14 Mar 2023
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
Xulong Zhang
Haobin Tang
Jianzong Wang
Ning Cheng
Jian Luo
Jing Xiao
30
2
0
14 Mar 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng
Jaesong Lee
Shinji Watanabe
32
19
0
14 Mar 2023
Speech Intelligibility Classifiers from 550k Disordered Speech Samples
Subhashini Venugopalan
Jimmy Tobin
Samuel J. Yang
Katie Seaver
Richard Cave
P. Jiang
Neil Zeghidour
Rus Heywood
Jordan R. Green
Michael P. Brenner
49
9
0
13 Mar 2023
Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model
Shuangping Huang
Y. Luo
Zhenzhou Zhuang
Jin-Gang Yu
Mengchao He
Yongpan Wang
40
8
0
13 Mar 2023
The System Description of dun_oscar team for The ICPR MSR Challenge
Binbin Du
Rui Deng
Yingxin Zhang
23
0
0
13 Mar 2023
Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss
Mohammad Zeineldeen
Kartik Audhkhasi
M. Baskar
Bhuvana Ramabhadran
26
2
0
10 Mar 2023
Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning
Zhaoxi Mu
Xinyu Yang
Wenjing Zhu
36
5
0
07 Mar 2023
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis
Haoxu Wang
Ming Cheng
Qiang Fu
Ming Li
46
8
0
04 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
39
22
0
03 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
31
153
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
85
255
0
02 Mar 2023
Leveraging Large Text Corpora for End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
A. Ogawa
Marc Delcroix
Ryo Masumura
27
14
0
02 Mar 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis
Zhe Niu
Brian Mak
30
3
0
01 Mar 2023
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Chao Xue
Wen Liu
Shunxing Xie
Zhenfang Wang
Jiaxing Li
...
Shi-Yong Chen
Yibing Zhan
Jing Zhang
Chaoyue Wang
Dacheng Tao
52
2
0
01 Mar 2023
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma
Mark Gales
Kate Knill
Mengjie Qian
16
32
0
01 Mar 2023
Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition
Shujie Hu
Xurong Xie
Zengrui Jin
Mengzhe Geng
Yi Wang
Mingyu Cui
Jiajun Deng
Xunying Liu
Helen M. Meng
24
30
0
28 Feb 2023
Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Xiaoming Ren
Chao Li
Shenjian Wang
Biao Li
38
0
0
28 Feb 2023
Diagonal State Space Augmented Transformers for Speech Recognition
G. Saon
Ankit Gupta
Xiaodong Cui
AI4TS
40
26
0
27 Feb 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
38
14
0
27 Feb 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
41
102
0
27 Feb 2023
MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition
Yoohwan Kwon
Soo-Whan Chung
MoE
24
16
0
27 Feb 2023
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
35
38
0
27 Feb 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
25
9
0
27 Feb 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
38
3
0
27 Feb 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
39
38
0
24 Feb 2023
Factual Consistency Oriented Speech Recognition
Naoyuki Kanda
Takuya Yoshioka
Yang Liu
45
0
0
24 Feb 2023
D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement
Shengkui Zhao
Bin Ma
40
16
0
23 Feb 2023
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions
Shengkui Zhao
Bin Ma
41
53
0
23 Feb 2023
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition
Yuchen Hu
Chen Chen
Ruizhe Li
Qiu-shi Zhu
Eng Siong Chng
42
15
0
22 Feb 2023
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation
Xiaoqiang Wang
Yanqing Liu
Jinyu Li
Sheng Zhao
34
7
0
22 Feb 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
19
3
0
21 Feb 2023
DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation
Shuo Wang
Xiangyu Kong
Xiulian Peng
H. Movassagh
Vinod Prakash
Yan Lu
26
11
0
21 Feb 2023
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
Lingwei Meng
Jiawen Kang
Mingyu Cui
Yuejiao Wang
Xixin Wu
Helen M. Meng
25
17
0
20 Feb 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition
Leyuan Qu
C. Weber
S. Wermter
19
5
0
20 Feb 2023
Why Is Public Pretraining Necessary for Private Model Training?
Arun Ganesh
Mahdi Haghifam
Milad Nasr
Sewoong Oh
Thomas Steinke
Om Thakkar
Abhradeep Thakurta
Lun Wang
31
36
0
19 Feb 2023
Massively Multilingual Shallow Fusion with Large Language Models
Ke Hu
Tara N. Sainath
Yue Liu
Nan Du
Yanping Huang
Andrew M. Dai
Yu Zhang
Rodrigo Cabrera
Zhehuai Chen
Trevor Strohman
40
13
0
17 Feb 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
Minsu Kim
Joanna Hong
Y. Ro
27
21
0
17 Feb 2023
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
57
14
0
17 Feb 2023
Improving Transformer-based Networks With Locality For Automatic Speaker Verification
Mufan Sang
Yong Zhao
Gang Liu
John H. L. Hansen
Jian Wu
ViT
33
14
0
17 Feb 2023
E2E Spoken Entity Extraction for Virtual Agents
Karan Singla
Yeon-Jun Kim
S. Bangalore
34
1
0
16 Feb 2023
Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax
Keqi Deng
P. Woodland
AuLLM
KELM
37
11
0
16 Feb 2023
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Shehzeen Samarah Hussain
Paarth Neekhara
Jocelyn Huang
Jason Chun Lok Li
Boris Ginsburg
13
21
0
16 Feb 2023
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
Minsu Kim
Hyungil Kim
Y. Ro
VLM
18
18
0
16 Feb 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Guinan Li
Shujie Hu
Xunying Liu
31
5
0
15 Feb 2023
PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction
Zi Xuan Zhang
Zhehui Wang
R. Kamma
S. Eswaran
Narayanan Sadagopan
KELM
36
4
0
10 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
47
34
0
10 Feb 2023