Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
QNet: A Quantum-native Sequence Encoder Architecture
Wei-Yen Day
Hao-Sheng Chen
Min Sun
26
0
0
31 Oct 2022
Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data
Lei Zhang
Zhenghua Li
Shilin Zhou
Chen Gong
Zhefeng Wang
Baoxing Huai
Min Zhang
36
0
0
31 Oct 2022
Fast and parallel decoding for transducer
Wei Kang
Liyong Guo
Fangjun Kuang
Long Lin
Mingshuang Luo
Zengwei Yao
Xiaoyu Yang
Piotr Żelasko
Daniel Povey
AI4TS
29
15
0
31 Oct 2022
Delay-penalized transducer for low-latency streaming ASR
Wei Kang
Zengwei Yao
Fangjun Kuang
Liyong Guo
Xiaoyu Yang
Long Lin
Piotr Żelasko
Daniel Povey
30
6
0
31 Oct 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
43
7
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
39
12
0
31 Oct 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
31
2
0
31 Oct 2022
Modular Hybrid Autoregressive Transducer
Zhong Meng
Tongzhou Chen
Rohit Prabhavalkar
Yu Zhang
Gary Wang
...
Bhuvana Ramabhadran
Wenjie Huang
Ehsan Variani
Yinghui Huang
Pedro J. Moreno
34
20
0
31 Oct 2022
Blank Collapse: Compressing CTC emission for the faster decoding
Minkyu Jung
Ohhyeok Kwon
S. Seo
Soonshin Seo
41
3
0
31 Oct 2022
Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training
Ashish R. Mittal
D. Sivasubramanian
Rishabh K. Iyer
Preethi Jyothi
Ganesh Ramakrishnan
26
3
0
30 Oct 2022
Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings
Hao Yen
Woojay Jeon
34
2
0
30 Oct 2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
Yosuke Higuchi
Brian Yan
Siddhant Arora
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
56
25
0
29 Oct 2022
End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator
Guangzhi Sun
Chuxu Zhang
P. Woodland
35
8
0
29 Oct 2022
Accelerating RNN-T Training and Inference Using CTC guidance
Yongqiang Wang
Zhehuai Chen
Cheng-yong Zheng
Yu Zhang
Wei Han
Parisa Haghani
40
23
0
29 Oct 2022
Efficient Speech Translation with Dynamic Latent Perceivers
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
33
2
0
28 Oct 2022
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction
Ming Cheng
Weiqing Wang
Yucong Zhang
Xiaoyi Qin
Ming Li
VLM
56
33
0
28 Oct 2022
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin
Tao Han
Haihua Xu
Van Tung Pham
Yerbolat Khassanov
Tze Yuang Chong
Yi He
Lu Lu
Zejun Ma
18
2
0
28 Oct 2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation
Nobuyuki Morioka
Heiga Zen
Nanxin Chen
Yu Zhang
Yifan Ding
42
16
0
28 Oct 2022
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
Fei Jia
Nithin Rao Koluguri
Jagadeesh Balam
Boris Ginsburg
33
3
0
27 Oct 2022
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
Siddhant Arora
Siddharth Dalmia
Brian Yan
Florian Metze
A. Black
Shinji Watanabe
23
12
0
27 Oct 2022
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
VLM
46
8
0
27 Oct 2022
SAN: a robust end-to-end ASR model architecture
Zeping Min
Qian Ge
Guanhua Huang
24
2
0
27 Oct 2022
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL
MoMe
67
14
0
27 Oct 2022
Contextual-Utterance Training for Automatic Speech Recognition
Alejandro Gomez-Alanis
Lukas Drude
A. Schwarz
Rupak Vignesh Swaminathan
Simon Wiesler
26
1
0
27 Oct 2022
Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance
Yuan-Jui Chen
Ming Tu
Tang-Chun Li
Xin Li
Qiuqiang Kong
Jiaxin Li
Zhichao Wang
Qiao Tian
Yuping Wang
Yuxuan Wang
42
11
0
27 Oct 2022
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
19
0
0
27 Oct 2022
ViT-CAT: Parallel Vision Transformers with Cross Attention Fusion for Popularity Prediction in MEC Networks
Zohreh Hajiakhondi-Meybodi
Arash Mohammadi
Ming Hou
J. Abouei
Konstantinos N. Plataniotis
6
8
0
27 Oct 2022
In search of strong embedding extractors for speaker diarisation
Jee-weon Jung
Hee-Soo Heo
Bong-Jin Lee
Jaesung Huh
A. Brown
Youngki Kwon
Shinji Watanabe
Joon Son Chung
44
16
0
26 Oct 2022
Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization
Hexin Liu
Haihua Xu
Leibny Paola García
Andy W. H. Khong
Yi He
Sanjeev Khudanpur
27
24
0
26 Oct 2022
UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu
Siqi Li
Qingtao Li
L. Deng
Fangzhu Li
Lu Fan
Meng Chen
Xiaodong He
OffRL
34
8
0
26 Oct 2022
Improving Speech-to-Speech Translation Through Unlabeled Text
Xuan-Phi Nguyen
Sravya Popuri
Changhan Wang
Yun Tang
Ilia Kulikov
Hongyu Gong
19
9
0
26 Oct 2022
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks
Vasily Zadorozhnyy
Qian Ye
K. Koishida
21
10
0
26 Oct 2022
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition
Xulong Zhang
Jianzong Wang
Ning Cheng
Mengyuan Zhao
Zhiyong Zhang
Jing Xiao
14
0
0
25 Oct 2022
Streaming Parrotron for on-device speech-to-speech conversion
Oleg Rybakov
Fadi Biadsy
Xia Zhang
Liyang Jiang
Phoenix Meadowlark
Shivani Agrawal
37
3
0
25 Oct 2022
Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
Quan Wang
Yiling Huang
Han Lu
Guanlong Zhao
Ignacio López Moreno
34
11
0
25 Oct 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
42
160
0
24 Oct 2022
Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech
Christoph Luscher
Mohammad Zeineldeen
Zijian Yang
Tina Raissi
Peter Vieting
Khai Le-Duc
Weiyue Wang
Ralf Schluter
Hermann Ney
16
5
0
24 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
32
24
0
24 Oct 2022
Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training
Jinzi Qi
Hugo Van hamme
AAML
21
1
0
24 Oct 2022
10 hours data is all you need
Zeping Min
Qian Ge
Zhong Li
26
2
0
24 Oct 2022
Tighter Abstract Queries in Neural Network Verification
Elazar Cohen
Y. Elboher
Clark W. Barrett
Guy Katz
35
5
0
23 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
40
22
0
21 Oct 2022
Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation
Thien Nguyen
Nathalie Tran
Liuhui Deng
Thiago Fraga da Silva
Matthew Radzihovsky
...
Honza Silovsky
Arnab Ghoshal
M. Martel
Bharat Ram Ambati
Mohamed Ali
46
5
0
21 Oct 2022
Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
Pranay Dighe
Prateeth Nayak
Oggi Rudovic
Erik Marchi
Xiaochuan Niu
Ahmed H. Tewfik
51
4
0
21 Oct 2022
Joint Speech Translation and Named Entity Recognition
Marco Gaido
Sara Papi
Matteo Negri
Marco Turchi
35
3
0
21 Oct 2022
Play It Back: Iterative Attention for Audio Recognition
Alexandros Stergiou
Dima Damen
39
4
0
20 Oct 2022
Large-scale learning of generalised representations for speaker recognition
Jee-weon Jung
Hee-Soo Heo
Bong-Jin Lee
Jaesong Lee
Hye-jin Shim
Youngki Kwon
Joon Son Chung
Shinji Watanabe
CVBM
36
6
0
20 Oct 2022
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
Gary Wang
Ekin D. Cubuk
Andrew Rosenberg
Shuyang Cheng
Ron J. Weiss
Bhuvana Ramabhadran
Pedro J. Moreno
Quoc V. Le
Daniel S. Park
35
1
0
19 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
24
19
0
19 Oct 2022
Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion
D. Ma
Lester Phillip Violeta
Kazuhiro Kobayashi
Tomoki Toda
29
6
0
19 Oct 2022