Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown
Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator
Woo-Jin Chung
Hong-Goo Kang
31
1
0
25 Jun 2024
AG-LSEC: Audio Grounded Lexical Speaker Error Correction
Rohit Paturi
Xiang Li
S. Srinivasan
38
1
0
25 Jun 2024
Exploring the Capability of Mamba in Speech Applications
Koichi Miyazaki
Yoshiki Masuyama
Masato Murata
Mamba
40
12
0
24 Jun 2024
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
Muhammad Shakeel
Yui Sudo
Yifan Peng
Shinji Watanabe
AI4CE
31
2
0
23 Jun 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
46
3
0
22 Jun 2024
PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics
Amir Nassereldine
Dancheng Liu
Chenhui Xu
Jinjun Xiong
44
0
0
21 Jun 2024
Investigating the impact of 2D gesture representation on co-speech gesture generation
Teo Guichoux
Laure Soulier
Nicolas Obin
Catherine Pelachaud
SLR
19
0
0
21 Jun 2024
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions
Yu Nakagome
Michael Hentschel
47
0
0
21 Jun 2024
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
35
12
0
20 Jun 2024
SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation
Sara Papi
Marco Gaido
Matteo Negri
L. Bentivogli
67
2
0
20 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
29
13
0
19 Jun 2024
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Jizhong Liu
Gang Li
Junbo Zhang
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Yujun Wang
Bin Wang
AuLLM
57
2
0
19 Jun 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
34
7
0
18 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei-Qiang Zhang
Guoguo Chen
Xie Chen
35
8
0
17 Jun 2024
Self-Train Before You Transcribe
Robert Flynn
Anton Ragni
38
0
0
17 Jun 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
30
6
0
16 Jun 2024
Lightweight Audio Segmentation for Long-form Speech Translation
Jaesong Lee
Soyoon Kim
Hanbyul Kim
Joon Son Chung
38
0
0
15 Jun 2024
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
Zhaoqing Li
Haoning Xu
Tianzi Wang
Shoukang Hu
Zengrui Jin
Shujie Hu
Jiajun Deng
Mingyu Cui
Mengzhe Geng
Xunying Liu
MQ
37
1
0
14 Jun 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
Guinan Li
Jiajun Deng
Youjun Chen
Mengzhe Geng
Shujie Hu
...
Zengrui Jin
Tianzi Wang
Xurong Xie
Helen Meng
Xunying Liu
VLM
34
0
0
14 Jun 2024
An efficient text augmentation approach for contextualized Mandarin speech recognition
Naijun Zheng
Xucheng Wan
Kai Liu
Ziqing Du
Zhou Huan
40
1
0
14 Jun 2024
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
Taewoo Kim
Choongsang Cho
Young Han Lee
AI4TS
41
0
0
14 Jun 2024
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
Yicong Jiang
Tianzi Wang
Xurong Xie
Juan Liu
Wei Sun
Nan Yan
Hui Chen
Lan Wang
Xunying Liu
Feng Tian
26
2
0
14 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
Linhan Ma
Xinfa Zhu
Yuanjun Lv
Zhichao Wang
Ziqian Wang
Wendi He
Hongbin Zhou
Lei Xie
42
2
0
14 Jun 2024
Optimizing Byte-level Representation for End-to-end ASR
Roger Hsiao
Liuhui Deng
Erik McDermott
R. Travadi
Xiaodan Zhuang
26
0
0
14 Jun 2024
Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
Frank Seide
Morrie Doulaty
Yangyang Shi
Yashesh Gaur
Junteng Jia
Chunyang Wu
AuLLM
KELM
32
8
0
13 Jun 2024
MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection
Da Mu
Zhicheng Zhang
Haobo Yue
29
2
0
13 Jun 2024
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Jiatong Shi
Shih-Heng Wang
William Chen
Martijn Bartelds
Vanya Bannihatti Kumar
...
Xuankai Chang
Dan Jurafsky
Karen Livescu
Hung-yi Lee
Shinji Watanabe
AuLLM
77
5
0
12 Jun 2024
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
Yue Li
Xinsheng Wang
Li Zhang
Lei Xie
45
1
0
12 Jun 2024
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu
Siyuan Li
Li Wang
Zedong Wang
Yunfan Liu
Stan Z. Li
35
7
0
12 Jun 2024
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement
Runyi Yu
Tianyu He
Ailing Zhang
Yuchi Wang
Junliang Guo
Xu Tan
Chang Liu
Jie Chen
Jiang Bian
VGen
34
4
0
12 Jun 2024
Improving child speech recognition with augmented child-like speech
Yuanyuan Zhang
Zhengjun Yue
T. Patel
O. Scharenborg
32
5
0
12 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
49
15
0
12 Jun 2024
Zero-Shot Fake Video Detection by Audio-Visual Consistency
Xiaolou Li
Zehua Liu
Chen Chen
Lantian Li
Li Guo
D. Wang
63
4
0
12 Jun 2024
Target Speaker Extraction with Curriculum Learning
Yun Liu
Xuechen Liu
Xiaoxiao Miao
Junichi Yamagishi
23
3
0
12 Jun 2024
PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding
Trang Le
Daniel Lazar
Suyoun Kim
Shan Jiang
Duc Le
Adithya Sagar
Aleksandr Livshits
Ahmed Aly
Akshat Shrivastava
43
0
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
47
15
0
11 Jun 2024
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang
Zhengrui Ma
Yan Zhou
Min Zhang
Yang Feng
52
0
0
11 Jun 2024
MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting
Zhiqi Ai
Zhiyong Chen
Shugong Xu
40
2
0
11 Jun 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
43
1
0
11 Jun 2024
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
Rong Gong
Hongfei Xue
Lezhi Wang
Xin Xu
Qisheng Li
...
Yong Qin
Binbin Zhang
Jun Du
Jia Bin
Ming Li
25
6
0
11 Jun 2024
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Zhengrui Ma
Qingkai Fang
Shaolei Zhang
Shoutao Guo
Yang Feng
Min Zhang
53
9
0
11 Jun 2024
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott
Florian Lux
Ngoc Thang Vu
38
6
0
10 Jun 2024
ASTRA: Aligning Speech and Text Representations for Asr without Sampling
Neeraj Gaur
Rohan Agrawal
Gary Wang
Parisa Haghani
Andrew Rosenberg
Bhuvana Ramabhadran
42
0
0
10 Jun 2024
A Parameter-efficient Language Extension Framework for Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
CLL
80
2
0
10 Jun 2024
Label-Looping: Highly Efficient Decoding for Transducers
Vladimir Bataev
Hainan Xu
Daniel Galvez
Vitaly Lavrukhin
Boris Ginsburg
40
5
0
10 Jun 2024
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
Sara Papi
Marco Gaido
Matteo Negri
L. Bentivogli
79
4
0
10 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
47
67
0
08 Jun 2024
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
Keqi Deng
Philip C. Woodland
43
4
0
06 Jun 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Purva Chiniya
Utkarsh Tyagi
R. Duraiswami
Dinesh Manocha
49
0
0
06 Jun 2024
Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations
Jan Hagnberger
Marimuthu Kalimuthu
Daniel Musekamp
Mathias Niepert
AI4TS
AI4CE
47
5
0
06 Jun 2024