© 2025 ResearchTrend.AI, All rights reserved.

Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,758 papers shown
Title
AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion
Wen-Chin Huang
Kazuhiro Kobayashi
Tomoki Toda
24
2
0
14 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqiang Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
24
3
0
14 Sep 2023
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning
Yucong Zhang
Hongbin Suo
Yulong Wan
Ming Li
32
4
0
14 Sep 2023
CPPF: A contextual and post-processing-free model for automatic speech recognition
Lei Zhang
Zhengkun Tian
Xiang Chen
Jiaming Sun
Hongyu Xiang
Ke Ding
Guanglu Wan
39
0
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
43
26
0
14 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
33
18
0
13 Sep 2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
Anna Deichler
Shivam Mehta
Simon Alexanderson
Jonas Beskow
DiffM
25
24
0
11 Sep 2023
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
Jinzuomu Zhong
Yang Li
Hui Huang
Korin Richmond
Jie Liu
Zhiba Su
Jing Guo
Benlai Tang
Fengjie Zhu
23
1
0
11 Sep 2023
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus
Haoxu Wang
Fan Yu
Xian Shi
Yuezhang Wang
Shiliang Zhang
Ming Li
37
11
0
11 Sep 2023
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach
T. Park
Kunal Dhawan
Nithin Rao Koluguri
Jagadeesh Balam
44
15
0
11 Sep 2023
Leveraging Large Language Models for Exploiting ASR Uncertainty
Pranay Dighe
Yi Su
Shangshang Zheng
Yunshu Liu
Vineet Garg
Xiaochuan Niu
Ahmed H. Tewfik
13
12
0
09 Sep 2023
Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
Huaibo Zhao
Yosuke Higuchi
Yusuke Kida
Tetsuji Ogawa
Tetsunori Kobayashi
28
1
0
09 Sep 2023
End-to-End Speech Recognition and Disfluency Removal with Acoustic Language Model Pretraining
Saksham Bassi
Giulio Duregon
Siddhartha Jalagam
David Roth
41
2
0
08 Sep 2023
Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems
Takuma Udagawa
Masayuki Suzuki
Gakuto Kurata
Masayasu Muraoka
G. Saon
46
2
0
07 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
23
2
0
06 Sep 2023
Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition
Patrick Eickhoff
M. Möller
Theresa Pekarek-Rosin
Johannes Twiefel
Stefan Wermter
28
2
0
05 Sep 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Jiaxu Zhu
Weinan Tong
Yaoxun Xu
Chang Song
Zhiyong Wu
Zhao You
Dan Su
Dong Yu
Helen M. Meng
32
0
0
04 Sep 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Jiaxu Zhu
Chang Song
Zhiyong Wu
Helen Meng
VLM
34
0
0
04 Sep 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang
Xinsheng Wang
Qicong Xie
Tao Li
Linfu Xie
Qiao Tian
Yuping Wang
34
4
0
03 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
57
8
0
02 Sep 2023
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Etienne Labbé
Thomas Pellegrini
J. Pinquier
30
12
0
01 Sep 2023
Remixing-based Unsupervised Source Separation from Scratch
Kohei Saijo
Tetsuji Ogawa
18
3
0
01 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Zhichao Huang
Chutong Meng
Tom Ko
22
25
0
31 Aug 2023
Improving vision-inspired keyword spotting using dynamic module skipping in streaming conformer encoder
Alexandre Bittar
Paul Dixon
Mohammad Samragh
K. Nishu
Devang Naik
33
3
0
31 Aug 2023
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer
Kyuhong Shim
Jinkyu Lee
Simyoung Chang
Kyuwoong Hwang
47
2
0
31 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
42
5
0
29 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
29
11
0
28 Aug 2023
Decoupled Structure for Improved Adaptability of End-to-End Models
Keqi Deng
P. Woodland
AuLLM
32
2
0
25 Aug 2023
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling
Shimin Zhang
Qu Yang
Chenxiang Ma
Jibin Wu
Haizhou Li
Kay Chen Tan
35
16
0
25 Aug 2023
Exploiting Time-Frequency Conformers for Music Audio Enhancement
Yunkee Chae
Junghyun Koo
Sungho Lee
Kyogu Lee
40
3
0
24 Aug 2023
AdVerb: Visually Guided Audio Dereverberation
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
34
11
0
23 Aug 2023
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
Antoine Nzeyimana
SSL
30
0
0
23 Aug 2023
Convoifilter: A case study of doing cocktail party speech recognition
Thai-Binh Nguyen
A. Waibel
25
2
0
22 Aug 2023
How Much Temporal Long-Term Context is Needed for Action Segmentation?
Emad Bahrami Rad
Gianpiero Francesca
Juergen Gall
ViT
32
27
0
22 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Jinchuan Tian
Jianwei Yu
Hangting Chen
Brian Yan
Chao Weng
Dong Yu
Shinji Watanabe
42
1
0
19 Aug 2023
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
Ye-Xin Lu
Yang Ai
Zhenhua Ling
30
9
0
17 Aug 2023
Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
Running Zhao
Jiang-Tao Luca Yu
Haiying Zhao
Edith C.H. Ngai
37
4
0
16 Aug 2023
Domain-Aware Fine-Tuning: Enhancing Neural Network Adaptability
Seokhyeon Ha
S. Jung
Jungwook Lee
27
3
0
15 Aug 2023
Improving CTC-AED model with integrated-CTC and auxiliary loss regularization
Daobin Zhu
Xiangdong Su
Hongbin Zhang
21
1
0
15 Aug 2023
O-1: Self-training with Oracle and 1-best Hypothesis
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
Kartik Audhkhasi
VLM
27
0
0
14 Aug 2023
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia
Shuo-yiin Chang
Weiran Wang
Zhong Meng
Hao Zhang
Tara N. Sainath
24
1
0
14 Aug 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
28
80
0
14 Aug 2023
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
Hanjing Zhu
Dongji Gao
Gaofeng Cheng
Daniel Povey
Pengyuan Zhang
Yonghong Yan
NoLa
40
4
0
12 Aug 2023
Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding
K. Nishu
Minsik Cho
Paul Dixon
Devang Naik
37
13
0
12 Aug 2023
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser
Zhong Meng
Ke Hu
Rohit Prabhavalkar
Andrew Rosenberg
Tara N. Sainath
M. Picheny
Kyunghyun Cho
VLM
33
4
0
11 Aug 2023
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Y. A. D. Djilali
Sanath Narayan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
42
10
0
11 Aug 2023
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
Fan Zhang
Naye Ji
Fuxing Gao
Siyuan Zhao
Zhaohan Wang
Shunman Li
32
0
0
11 Aug 2023
Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio
Yang Zhang
Krishna C. Puvvada
Vitaly Lavrukhin
Boris Ginsburg
40
14
0
09 Aug 2023
Cross-view Semantic Alignment for Livestreaming Product Recognition
Wenjie Yang
Yiyi Chen
Yan Li
Yanhua Cheng
Xudong Liu
Quanming Chen
Han Li
34
2
0
09 Aug 2023