Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,758 papers shown
MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition
Yu Pan
Yuguang Yang
Yuheng Huang
Jixun Yao
Jingjing Yin
Yanni Hu
Heng Lu
Lei Ma
Jianjun Zhao
42
6
0
08 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
26
0
0
05 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
43
1
0
31 Jul 2023
SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation
Changsheng Quan
Xiaofei Li
18
36
0
31 Jul 2023
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus
Tolulope Ogunremi
Kólá Túbosún
Aremu Anuoluwapo
Iroro Orife
David Ifeoluwa Adelani
47
6
0
29 Jul 2023
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer
Md. Asif Jalal
Pablo Peso Parada
Jisi Zhang
Karthikeyan P. Saravanan
Mete Ozay
Myoungji Han
Jung In Lee
Seokyeong Jung
28
1
0
25 Jul 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
35
4
0
24 Jul 2023
Adaptation of Whisper models to child speech recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Peter Corcoran
H. Cucu
21
30
0
24 Jul 2023
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
Suyoun Kim
Akshat Shrivastava
Duc Le
Ju Lin
Ozlem Kalinli
M. Seltzer
AuLLM
38
2
0
22 Jul 2023
PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks
Leo Zhao
Xueying Ding
B. Prakash
PINN
AI4CE
28
28
0
21 Jul 2023
Prompting Large Language Models with Speech Recognition Abilities
Yassir Fathullah
Chunyang Wu
Egor Lakomkin
Junteng Jia
Yuan Shangguan
...
Wenhan Xiong
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
M. Seltzer
AuLLM
35
133
0
21 Jul 2023
Globally Normalising the Transducer for Streaming Speech Recognition
Rogier van Dalen
40
0
0
20 Jul 2023
Leveraging Visemes for Better Visual Speech Representation and Lip Reading
J. Peymanfard
Vahid Saeedi
Mohammad Reza Mohammadi
Hossein Zeinali
N. Mozayani
51
2
0
19 Jul 2023
Exploring Transformer Extrapolation
Zhen Qin
Yiran Zhong
Huiyuan Deng
31
9
0
19 Jul 2023
Linearized Relative Positional Encoding
Zhen Qin
Weixuan Sun
Kaiyue Lu
Huizhong Deng
Dong Li
Xiaodong Han
Yuchao Dai
Lingpeng Kong
Yiran Zhong
20
13
0
18 Jul 2023
OxfordVGG Submission to the EGO4D AV Transcription Challenge
Jaesung Huh
Max Bain
Andrew Zisserman
47
0
0
18 Jul 2023
BASS: Block-wise Adaptation for Speech Summarization
Roshan S. Sharma
Kenneth Zheng
Siddhant Arora
Shinji Watanabe
Rita Singh
Bhiksha Raj
37
7
0
17 Jul 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
37
2
0
14 Jul 2023
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
Theresa Pekarek-Rosin
S. Wermter
VLM
CLL
37
2
0
14 Jul 2023
Improving BERT with Hybrid Pooling Network and Drop Mask
Qian Chen
Wen Wang
Qinglin Zhang
Chong Deng
Ma Yukun
Siqi Zheng
21
1
0
14 Jul 2023
Long Short-term Memory with Two-Compartment Spiking Neuron
Shimin Zhang
Qu Yang
Chenxiang Ma
Jibin Wu
Haizhou Li
Kay Chen Tan
38
7
0
14 Jul 2023
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling
Hengguan Huang
Jagadeesh Balam
Boris Ginsburg
28
4
0
13 Jul 2023
Adapting an ASR Foundation Model for Spoken Language Assessment
Rao Ma
Mengjie Qian
Mark Gales
Kate Knill
41
12
0
13 Jul 2023
Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study
Zeping Min
Jinbo Wang
AuLLM
40
13
0
13 Jul 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
31
6
0
12 Jul 2023
Improving RNN-Transducers with Acoustic LookAhead
Vinit Unni
Ashish R. Mittal
Preethi Jyothi
Sunita Sarawagi
37
2
0
11 Jul 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task
Kun Song
Yinjiao Lei
Pei-Ning Chen
Yiqing Cao
Kun Wei
Yongmao Zhang
Linfu Xie
Ning Jiang
Guoqing Zhao
29
1
0
10 Jul 2023
Can Generative Large Language Models Perform ASR Error Correction?
Rao Ma
Mengjie Qian
Potsawee Manakul
Mark Gales
Kate Knill
AuLLM
KELM
29
50
0
09 Jul 2023
inTformer: A Time-Embedded Attention-Based Transformer for Crash Likelihood Prediction at Intersections Using Connected Vehicle Data
B. M. Tazbiul
Zubayer Islam
Mohamed Abdel-Aty
38
6
0
07 Jul 2023
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Sara Papi
Peidong Wang
Junkun Chen
Jian Xue
Jinyu Li
Yashesh Gaur
33
8
0
07 Jul 2023
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition
Guinan Li
Jiajun Deng
Mengzhe Geng
Zengrui Jin
Tianzi Wang
Shujie Hu
Mingyu Cui
Helen M. Meng
Xunying Liu
47
10
0
06 Jul 2023
Using Data Augmentations and VTLN to Reduce Bias in Dutch End-to-End Speech Recognition Systems
T. Patel
O. Scharenborg
42
3
0
05 Jul 2023
Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data
Guangzhi Sun
Chuxu Zhang
Ivan Vulić
Paweł Budzianowski
P. Woodland
41
6
0
04 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
39
2
0
04 Jul 2023
Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure
Yikang Wang
Hiromitsu Nishizaki
Ming Li
45
0
0
04 Jul 2023
Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages
Devang Kulshreshtha
Saket Dingliwal
Brady C. Houston
S. Bodapati
24
2
0
03 Jul 2023
Towards Real Smart Apps: Investigating Human-AI Interactions in Smartphone On-Device AI Apps
Jason Ching Yuen Siu
Jieshan Chen
Yujin Huang
Zhenchang Xing
Chunyang Chen
21
0
0
03 Jul 2023
Conformer LLMs -- Convolution Augmented Large Language Models
Prateek Verma
30
1
0
02 Jul 2023
What Do Self-Supervised Speech Models Know About Words?
Ankita Pasad
C. Chien
Shane Settle
Karen Livescu
SSL
56
26
0
30 Jun 2023
Leveraging Cross-Utterance Context For ASR Decoding
Robert Flynn
Anton Ragni
35
1
0
29 Jun 2023
Cascaded encoders for fine-tuning ASR models on overlapped speech
R. Rose
Oscar Chang
Olivier Siohan
34
1
0
28 Jun 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
37
21
0
28 Jun 2023
Confidence-based Ensembles of End-to-End Speech Recognition Models
Igor Gitman
Vitaly Lavrukhin
A. Laptev
Boris Ginsburg
UQCV
38
8
0
27 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
40
3
0
27 Jun 2023
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition
Tianzi Wang
Shoukang Hu
Jiajun Deng
Zengrui Jin
Mengzhe Geng
Yi Wang
Helen M. Meng
Xunying Liu
30
5
0
27 Jun 2023
Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation
Haitao Tang
Yu Fu
Lei Sun
Jiabin Xue
Dan Liu
...
Zhiqiang Ma
Minghui Wu
Jia Pan
Genshun Wan
Ming’En Zhao
34
2
0
27 Jun 2023
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems
Jiajun Deng
Guinan Li
Xurong Xie
Zengrui Jin
Mingyu Cui
Tianzi Wang
Shujie Hu
Mengzhe Geng
Xunying Liu
BDL
30
1
0
26 Jun 2023
Intensity-free Convolutional Temporal Point Process: Incorporating Local and Global Event Contexts
Wangtao Zhou
Zhao Kang
Ling Tian
Yimu Su
46
11
0
24 Jun 2023
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing
Lester Phillip Violeta
Tomoki Toda
31
2
0
24 Jun 2023
Improving End-to-End Neural Diarization Using Conversational Summary Representations
Samuel J. Broughton
Lahiru Samarakoon
23
7
0
24 Jun 2023