Conformer: Convolution-augmented Transformer for Speech Recognition


16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
GAIA: Zero-shot Talking Avatar Generation
Tianyu He
Junliang Guo
Runyi Yu
Yuchi Wang
Jialiang Zhu
...
Chunyu Wang
Han Hu
HsiangTao Wu
Sheng Zhao
Jiang Bian
38
25
0
26 Nov 2023
SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling
Habib Hajimolahoseini
Omar Mohamed Awad
Walid Ahmed
Austin Wen
Saina Asani
...
Farnoosh Javadi
Mehdi Ahmadi
Foozhan Ataiefard
Kangling Liu
Yang Liu
31
2
0
25 Nov 2023
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR
Jintao Jiang
Yingbo Gao
Zoltán Tüske
41
1
0
24 Nov 2023
Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach
Xinwei Zhang
Zhiqi Bu
Zhiwei Steven Wu
Mingyi Hong
22
7
0
24 Nov 2023
Efficient Deep Speech Understanding at the Edge
Rongxiang Wang
Felix Lin
21
2
0
22 Nov 2023
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
David Gimeno-Gómez
Carlos David Martínez Hinarejos
36
0
0
21 Nov 2023
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer
Jin Qiu
Lu Huang
Boyu Li
Jun Zhang
Lu Lu
Zejun Ma
33
3
0
15 Nov 2023
Accelerating Toeplitz Neural Network with Constant-time Inference Complexity
Zhen Qin
Yiran Zhong
26
6
0
15 Nov 2023
Retrieve and Copy: Scaling ASR Personalization to Large Catalogs
Sai Muralidhar Jayanthi
Devang Kulshreshtha
Saket Dingliwal
S. Ronanki
S. Bodapati
46
7
0
14 Nov 2023
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion
A. R. Bargum
Stefania Serafin
Cumhur Erkut
28
3
0
14 Nov 2023
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition
Xiaohan Shi
Jiajun He
Xingfeng Li
Tomoki Toda
36
4
0
13 Nov 2023
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition
Qijie Shao
Pengcheng Guo
Jinghao Yan
Pengfei Hu
Lei Xie
32
8
0
13 Nov 2023
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
Yassir Fathullah
Chunyang Wu
Egor Lakomkin
Ke Li
Junteng Jia
Shangguan Yuan
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
Michael Seltzer
LM&MA
MLLM
AuLLM
29
35
0
12 Nov 2023
Sparse Attention-Based Neural Networks for Code Classification
Ziyang Xiang
Zaixin Zhang
Qi Liu
20
0
0
11 Nov 2023
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition
Andrei Barcovschi
Rishabh Jain
Peter Corcoran
21
3
0
07 Nov 2023
p-Laplacian Transformer
Tuan Nguyen
Tam Nguyen
Vinh-Tiep Nguyen
Tan-Minh Nguyen
84
0
0
06 Nov 2023
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
R. N. Nandi
Mehadi Hasan Menon
Tareq Al Muntasir
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Tariqul Islam
Shammur A. Chowdhury
Firoj Alam
37
3
0
06 Nov 2023
Personalizing Keyword Spotting with Speaker Information
Beltrán Labrador
Pai Zhu
Guanlong Zhao
Angelo Scorza Scarpati
Quan Wang
Alicia Lozano-Diez
Alex Park
Ignacio López Moreno
26
1
0
06 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
40
10
0
06 Nov 2023
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Sungho Jeon
Ching-Feng Yeh
Hakan Inan
Wei-Ning Hsu
Rashi Rungta
Yashar Mehdad
Daniel M. Bikel
36
0
0
05 Nov 2023
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
Tao Liu
Chenpeng Du
Shuai Fan
Feilong Chen
Kai Yu
DiffM
VGen
41
6
0
03 Nov 2023
Are cascade dialogue state tracking models speaking out of turn in spoken dialogues?
Lucas Druart
Léo Jacqmin
Benoit Favre
L. Rojas-Barahona
Valentin Vielzeuf
40
0
0
03 Nov 2023
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Jianlei Yang
Jiacheng Liao
Fanding Lei
Meichen Liu
Junyi Chen
Lingkun Long
Han Wan
Bei Yu
Weisheng Zhao
MoE
40
2
0
03 Nov 2023
RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios
Yiwen Shao
Shi-Xiong Zhang
Dong Yu
36
0
0
31 Oct 2023
DiffSpectralNet : Unveiling the Potential of Diffusion Models for Hyperspectral Image Classification
Neetu Sigger
Tuan T. Nguyen
Gianluca Tozzi
Quoc-Tuan Vien
Sinh Van Nguyen
DiffM
MedIm
24
3
0
29 Oct 2023
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
...
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
48
22
0
27 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
32
6
0
26 Oct 2023
CL-MASR: A Continual Learning Benchmark for Multilingual ASR
Luca Della Libera
Pooneh Mousavi
Salah Zaiem
Cem Subakan
Mirco Ravanelli
AuLLM
CLL
53
13
0
25 Oct 2023
Accented Speech Recognition With Accent-specific Codebooks
Darshan Prabhu
Preethi Jyothi
Sriram Ganapathy
Vinit Unni
40
7
0
24 Oct 2023
How Much Context Does My Attention-Based ASR System Need?
Robert Flynn
Anton Ragni
37
1
0
24 Oct 2023
CDSD: Chinese Dysarthria Speech Database
Mengyi Sun
Ming Gao
Xinchen Kang
Shiru Wang
Jun Du
Dengfeng Yao
Su-Jing Wang
40
3
0
24 Oct 2023
How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation
Marco Gaido
Dennis Fucci
Matteo Negri
L. Bentivogli
46
2
0
23 Oct 2023
Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition
Peng Fan
Changhao Shan
Sining Sun
Qing Yang
Jianwei Zhang
30
3
0
23 Oct 2023
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng
Bei Li
Huiwen Bao
Jiale Wang
Weiqiao Shan
Tong Xiao
Jingbo Zhu
MoE
AI4CE
16
0
0
23 Oct 2023
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
Kun Wei
Bei Li
Hang Lv
Quan Lu
Ning Jiang
Lei Xie
49
3
0
22 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen
Pedram Akbarian
TrungTin Nguyen
Nhat Ho
37
11
0
22 Oct 2023
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System
T. Park
He Huang
Ante Jukić
Kunal Dhawan
Krishna C. Puvvada
Nithin Rao Koluguri
Nikolay Karpov
A. Laptev
Jagadeesh Balam
Boris Ginsburg
40
6
0
18 Oct 2023
Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
Yanming Kang
Giang Tran
H. Sterck
23
3
0
18 Oct 2023
BUT CHiME-7 system description
M. Karafiát
Karel Veselý
Igor Szöke
Ladislav Mošner
Karel Beneš
Marcin Witkowski
Germán Barchi
L. Pepino
37
1
0
18 Oct 2023
Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-task Speech Recognition
Hillary Ngai
Rohan Agrawal
Neeraj Gaur
Ronny Huang
Parisa Haghani
P. M. Mengibar
MoMe
48
0
0
17 Oct 2023
Robust Wake-Up Word Detection by Two-stage Multi-resolution Ensembles
F. López
Jordi Luque
Carlos Segura
Pablo Gómez
33
0
0
17 Oct 2023
Generative error correction for code-switching speech recognition using large language models
Chen Chen
Yuchen Hu
Chao-Han Huck Yang
Hexin Liu
Sabato Marco Siniscalchi
Chng Eng Siong
37
8
0
17 Oct 2023
Zipformer: A faster and better encoder for automatic speech recognition
Zengwei Yao
Liyong Guo
Xiaoyu Yang
Wei Kang
Fangjun Kuang
Yifan Yang
Zengrui Jin
Long Lin
Daniel Povey
VLM
38
65
0
17 Oct 2023
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion
Xueyao Zhang
Yicheng Gu
Haopeng Chen
Zihao Fang
Lexiao Zou
Junan Zhang
Liumeng Xue
Jinchao Zhang
Jie Zhou
Zhizheng Wu
DiffM
43
1
0
17 Oct 2023
Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition
A. Ogawa
Takafumi Moriya
Naoyuki Kamo
Naohiro Tawara
Marc Delcroix
23
1
0
17 Oct 2023
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model
H. Soltau
Izhak Shafran
Alex Ottenwess
Joseph R. Duffy
Rene L. Utianski
L. Barnard
John L. Stricker
D. Wiepert
David T. Jones
Hugo Botha
62
3
0
16 Oct 2023
Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization
Zhihong Lei
Ernest Pusateri
Shiyi Han
Leo Liu
Mingbin Xu
...
R. Travadi
Youyuan Zhang
Mirko Hannemann
Man-Hung Siu
Zhen Huang
23
9
0
16 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers
Hosein Mohebbi
Grzegorz Chrupała
Willem H. Zuidema
Afra Alishahi
36
12
0
15 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
28
3
0
14 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
34
51
0
13 Oct 2023