ResearchTrend.AI
Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation
Guowei Wu
Shipei Liu
Xiaoya Fan
22
12
0
17 May 2022
MulT: An End-to-End Multitask Learning Transformer
Deblina Bhattacharjee
Tong Zhang
Sabine Süsstrunk
Mathieu Salzmann
ViT
47
63
0
17 May 2022
Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data
Alena Aksenova
Zhehuai Chen
Chung-Cheng Chiu
D. Esch
Pavel Golik
...
Levi King
Bhuvana Ramabhadran
Andrew Rosenberg
Suzan Schwartz
Gary Wang
62
22
0
16 May 2022
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
Zengrui Jin
Mengzhe Geng
Jiajun Deng
Tianzi Wang
Shujie Hu
Guinan Li
Xunying Liu
30
20
0
13 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
29
38
0
12 May 2022
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis
Zhenzi Weng
Zhijin Qin
Xiaoming Tao
Chengkang Pan
Guangyi Liu
Geoffrey Ye Li
44
132
0
09 May 2022
Online Model Compression for Federated Learning with Large Models
Tien-Ju Yang
Yonghui Xiao
Giovanni Motta
F. Beaufays
Rajiv Mathews
Mingqing Chen
FedML
MQ
49
8
0
06 May 2022
A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy
S. Panchapagesan
A. Narayanan
T. Shabestary
Shuai Shao
N. Howard
Alex Park
James Walker
A. Gruenstein
29
3
0
06 May 2022
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
Marco Gaido
Sara Papi
Dennis Fucci
G. Fiameni
Matteo Negri
Marco Turchi
33
19
0
05 May 2022
SVTS: Scalable Video-to-Speech Synthesis
Rodrigo Mira
A. Haliassos
Stavros Petridis
Björn W. Schuller
Maja Pantic
22
32
0
04 May 2022
ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks
Marcely Zanon Boito
John E. Ortega
Hugo Riguidel
Antoine Laurent
Loïc Barrault
...
Firas Chaabani
H. Nguyen
Florentin Barbier
Souhir Gahbiche
Yannick Esteve
27
16
0
04 May 2022
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training
Jisi Zhang
Catalin Zorila
R. Doddipatla
Jon Barker
30
4
0
03 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
48
38
0
02 May 2022
Taylor, Can You Hear Me Now? A Taylor-Unfolding Framework for Monaural Speech Enhancement
Andong Li
Shan You
Guochen Yu
C. Zheng
Xiaodong Li
38
26
0
30 Apr 2022
Conformer and Blind Noisy Students for Improved Image Quality Assessment
Marcos V. Conde
Maxime Burchi
Radu Timofte
DiffM
46
14
0
27 Apr 2022
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Dading Chong
Helin Wang
Peilin Zhou
Qingcheng Zeng
39
65
0
27 Apr 2022
Mask scalar prediction for improving robust automatic speech recognition
A. Narayanan
James Walker
S. Panchapagesan
N. Howard
Yuma Koizumi
19
4
0
26 Apr 2022
Cleanformer: A multichannel array configuration-invariant neural enhancement frontend for ASR in smart speakers
Joseph Peter Caroselli
A. Narayanan
N. Howard
Tom O'Malley
28
4
0
25 Apr 2022
High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation
Ming Lu
Fangdong Chen
Shiliang Pu
Zhan Ma
42
44
0
25 Apr 2022
Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Natsuo Yamashita
Shota Horiguchi
Takeshi Homma
26
16
0
24 Apr 2022
Efficient Training of Neural Transducer for Speech Recognition
Wei Zhou
Wilfried Michel
Ralf Schlüter
Hermann Ney
AI4TS
24
22
0
22 Apr 2022
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation
Ryo Terashima
Ryuichi Yamamoto
Eunwoo Song
Yuma Shirahata
Hyun-Wook Yoon
Jae-Min Kim
Kentaro Tachibana
13
15
0
21 Apr 2022
Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
Xun Gong
Y. Qian
Houjun Huang
Yanmin Qian
34
44
0
21 Apr 2022
Detecting Unintended Memorization in Language-Model-Fused ASR
Yifan Jiang
Steve Chien
Om Thakkar
Rajiv Mathews
41
11
0
20 Apr 2022
On the Locality of Attention in Direct Speech Translation
Belen Alastruey
Javier Ferrando
Gerard I. Gállego
Marta R. Costa-jussá
16
7
0
19 Apr 2022
An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Niko Moritz
Frank Seide
Duc Le
Jay Mahadeokar
Christian Fuegen
23
8
0
19 Apr 2022
Extracting Targeted Training Data from ASR Models, and How to Mitigate It
Ehsan Amid
Om Thakkar
A. Narayanan
Rajiv Mathews
Françoise Beaufays
20
9
0
18 Apr 2022
Self-critical Sequence Training for Automatic Speech Recognition
Chen Chen
Yuchen Hu
Nana Hou
Xiaofeng Qi
Heqing Zou
Chng Eng Siong
27
15
0
13 Apr 2022
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
Shaojin Ding
Weiran Wang
Ding Zhao
Tara N. Sainath
Yanzhang He
...
Qiao Liang
Dongseong Hwang
Ian McGraw
Rohit Prabhavalkar
Trevor Strohman
30
17
0
13 Apr 2022
ASR in German: A Detailed Error Analysis
John M. Wirth
René Peinl
26
5
0
12 Apr 2022
Multichannel Speech Separation with Narrow-band Conformer
Changsheng Quan
Xiaofei Li
31
12
0
09 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jaesung Bae
Jinhyeok Yang
Taejun Bak
Young-Sun Joo
DiffM
21
6
0
08 Apr 2022
Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition
Axel Berg
Magnus Oskarsson
Mark O'Connor
3DPC
ViT
29
26
0
08 Apr 2022
Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition
N. J. Wang
Zongfeng Quan
Shaojun Wang
Jing Xiao
23
1
0
08 Apr 2022
Transducer-based language embedding for spoken language identification
Peng Shen
Xugang Lu
Hisashi Kawai
56
6
0
08 Apr 2022
A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding
N. J. Wang
Shaojun Wang
Jing Xiao
16
0
0
08 Apr 2022
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
Qianying Liu
Zhuo Gong
Zhengdong Yang
Yuhang Yang
Sheng Li
...
N. Minematsu
Hao-Ming Huang
Fei Cheng
Chenhui Chu
Sadao Kurohashi
29
5
0
08 Apr 2022
Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser
Sonal Joshi
Saurabh Kataria
Yiwen Shao
Piotr Żelasko
Jesus Villalba
Sanjeev Khudanpur
Najim Dehak
AAML
33
4
0
08 Apr 2022
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Shaojin Ding
R. Rikhye
Qiao Liang
Yanzhang He
Quan Wang
A. Narayanan
Tom O'Malley
Ian McGraw
29
27
0
08 Apr 2022
Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
43
26
0
08 Apr 2022
MAESTRO: Matched Speech Text Representations through Modality Matching
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
25
106
0
07 Apr 2022
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition
Qijie Shao
Jinghao Yan
Jian Kang
Pengcheng Guo
Xian Shi
Pengfei Hu
Lei Xie
28
6
0
07 Apr 2022
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
Zhao You
Shulin Feng
Dan Su
Dong Yu
22
9
0
07 Apr 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri
Peng-Jen Chen
Changhan Wang
J. Pino
Yossi Adi
Jiatao Gu
Wei-Ning Hsu
Ann Lee
28
56
0
06 Apr 2022
A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
28
31
0
06 Apr 2022
Towards End-to-end Unsupervised Speech Recognition
Alexander H. Liu
Wei-Ning Hsu
Michael Auli
Alexei Baevski
SSL
31
74
0
05 Apr 2022
Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning
Nilaksh Das
Duen Horng Chau
AAML
37
0
0
05 Apr 2022
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Abner Hernandez
Paula Andrea Pérez-Toro
Elmar Nöth
J. Orozco-Arroyave
Andreas Maier
S. Yang
28
39
0
04 Apr 2022
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma
Pengfei Hu
Jian Kang
Shen Huang
Hao-Ming Huang
29
9
0
02 Apr 2022
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition
Jiachen Lian
A. Black
Louis Goldstein
Gopala Krishna Anumanchipalli
28
16
0
01 Apr 2022