Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
arXiv: 2005.08100

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown
Text-To-Speech Data Augmentation for Low Resource Speech Recognition
Rodolfo Zevallos
24
4
0
01 Apr 2022
COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks
Fangyi Zhu
See-Kiong Ng
S. Bressan
LRM
22
1
0
01 Apr 2022
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
Takuma Udagawa
Masayuki Suzuki
Gakuto Kurata
N. Itoh
G. Saon
42
23
0
01 Apr 2022
Better Intermediates Improve CTC Inference
Tatsuya Komatsu
Yusuke Fujita
Jaesong Lee
Lukas Lee
Shinji Watanabe
Yusuke Kida
19
5
0
01 Apr 2022
Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR
Yusuke Fujita
Tatsuya Komatsu
Yusuke Kida
35
3
0
01 Apr 2022
InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR
Yumi Nakagome
Tatsuya Komatsu
Yusuke Fujita
Shuta Ichimura
Yusuke Kida
19
4
0
01 Apr 2022
Scaling Language Model Size in Cross-Device Federated Learning
Jae Hun Ro
Theresa Breiner
Lara McConnaughey
Mingqing Chen
A. Suresh
Shankar Kumar
Rajiv Mathews
FedML
29
24
0
31 Mar 2022
CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition
Chengxin Chen
Pengyuan Zhang
AI4TS
21
10
0
31 Mar 2022
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition
Ashish Seth
L. D. Prasad
Sreyan Ghosh
S. Umesh
30
3
0
31 Mar 2022
HiFi-VC: High Quality ASR-Based Voice Conversion
A. Kashkin
I. Karpukhin
S. Shishkin
29
5
0
31 Mar 2022
Memory-Efficient Training of RNN-Transducer with Sampled Softmax
Jaesong Lee
Lukas Lee
Shinji Watanabe
33
8
0
31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
19
51
0
31 Mar 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset
Zehui Yang
Yifan Chen
Lei Luo
Runyan Yang
Lingxuan Ye
...
Yaohui Jin
Qingqing Zhang
Pengyuan Zhang
Lei Xie
Yonghong Yan
20
47
0
31 Mar 2022
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism
Jingbei Li
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
17
21
0
31 Mar 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
Fan Yu
Zhihao Du
Shiliang Zhang
Yuxiao Lin
Linfu Xie
22
13
0
31 Mar 2022
An Empirical Study of Language Model Integration for Transducer based Speech Recognition
Huahuan Zheng
Keyu An
Zhijian Ou
Chen Huang
Ke Ding
Guanglu Wan
33
5
0
31 Mar 2022
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR
Keyu An
Huahuan Zheng
Zhijian Ou
Hongyu Xiang
Ke Ding
Guanglu Wan
AI4TS
28
17
0
31 Mar 2022
Combination of Time-domain, Frequency-domain, and Cepstral-domain Acoustic Features for Speech Commands Classification
Yikang Wang
Hiromitsu Nishizaki
34
1
0
30 Mar 2022
4-bit Conformer with Native Quantization Aware Training for Speech Recognition
Shaojin Ding
Phoenix Meadowlark
Yanzhang He
Lukasz Lew
Shivani Agrawal
Oleg Rybakov
MQ
31
32
0
29 Mar 2022
Integrating Lattice-Free MMI into End-to-End Speech Recognition
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
35
8
0
29 Mar 2022
Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition
J. Sun
Guiping Zhong
Dinghao Zhou
Baoxiang Li
Yiran Zhong
33
7
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
29
94
0
29 Mar 2022
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
22
31
0
29 Mar 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification
Yang Zhang
Zhiqiang Lv
Haibin Wu
Shanshan Zhang
Pengfei Hu
Zhiyong Wu
Hung-yi Lee
Helen Meng
ViT
27
130
0
29 Mar 2022
Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR
Fangyuan Wang
Bo Xu
29
4
0
29 Mar 2022
CMGAN: Conformer-based Metric GAN for Speech Enhancement
Ru Cao
Sherif Abdulatif
Bin Yang
21
92
0
28 Mar 2022
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
Yuchen Hu
Nana Hou
Chen Chen
Chng Eng Siong
27
14
0
28 Mar 2022
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition
Mengzhe Geng
Xurong Xie
Rongfeng Su
Jianwei Yu
Zengrui Jin
Tianzi Wang
Shujie Hu
Zi Ye
Helen M. Meng
Xunying Liu
40
6
0
28 Mar 2022
Chain-based Discriminative Autoencoders for Speech Recognition
Hung-Shin Lee
Pin-Tuan Huang
Yao-Fei Cheng
Hsin-Min Wang
11
1
0
25 Mar 2022
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
Ye Jia
Yifan Ding
Ankur Bapna
Colin Cherry
Yu Zhang
Alexis Conneau
Nobuyuki Morioka
47
20
0
24 Mar 2022
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement
Pavel Andreev
Aibek Alanov
Oleg Ivanov
Dmitry Vetrov
38
38
0
24 Mar 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Xintao Zhao
Feng Liu
Changhe Song
Zhiyong Wu
Shiyin Kang
Deyi Tuo
Helen Meng
26
21
0
24 Mar 2022
A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization
Fadi Biadsy
Youzheng Chen
Xia Zhang
Oleg Rybakov
Andrew Rosenberg
Pedro J. Moreno
48
13
0
23 Mar 2022
Pseudo Label Is Better Than Human Label
DongSeon Hwang
K. Sim
Zhouyuan Huo
Trevor Strohman
24
32
0
22 Mar 2022
Enhancing Speech Recognition Decoding via Layer Aggregation
Tomer Wullach
Shlomo E. Chazan
32
1
0
21 Mar 2022
WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses
Zewang Zhang
Yibin Zheng
Xinhui Li
Li Lu
26
16
0
21 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
28
5
0
20 Mar 2022
Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition
Shujie Hu
Shansong Liu
Xurong Xie
Mengzhe Geng
Tianzi Wang
Shoukang Hu
Mingyu Cui
Xunying Liu
Helen Meng
25
14
0
19 Mar 2022
Similarity and Content-based Phonetic Self Attention for Speech Recognition
Kyuhong Shim
Wonyong Sung
20
7
0
19 Mar 2022
A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Qiuqiang Kong
Feiran Yang
Mark D. Plumbley
J. Yang
24
23
0
19 Mar 2022
A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Richard He Bai
Renjie Zheng
Junkun Chen
Xintong Li
Mingbo Ma
Liang Huang
29
49
0
18 Mar 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Fahad Shahbaz Khan
ViT
23
30
0
17 Mar 2022
BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks
Hejie Cui
Wei Dai
Yanqiao Zhu
Xuan Kan
Antonio Aodong Chen Gu
Joshua Lukemire
Liang Zhan
Lifang He
Ying Guo
Carl Yang
19
113
0
17 Mar 2022
Learning Audio Representations with MLPs
Mashrur M. Morshed
Ahmad Omar Ahsan
H. Mahmud
Md. Kamrul Hasan
27
4
0
16 Mar 2022
A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition
Jisheng Bai
Jianfeng Chen
Mou Wang
Muhammad Saad Ayub
19
9
0
16 Mar 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
25
29
0
13 Mar 2022
DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction
Buyun Zhang
Liangchen Luo
Xi Liu
Jay Li
Zeliang Chen
...
Yasmine Badr
Jongsoo Park
Jiyan Yang
Dheevatsa Mudigere
Ellie Wen
3DV
22
11
0
11 Mar 2022
Parameter-Free Attentive Scoring for Speaker Verification
Jason W. Pelecanos
Quan Wang
Yiling Huang
Ignacio López Moreno
17
5
0
10 Mar 2022
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition
Yifan Jiang
Cal Peyser
Tara N. Sainath
Ruoming Pang
Trevor Strohman
Shankar Kumar
26
16
0
09 Mar 2022
Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
Chuhui Xue
Wenqing Zhang
Yu Hao
Shijian Lu
Philip Torr
Song Bai
VLM
40
32
0
08 Mar 2022