ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.08100
  4. Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition

Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
ArXivPDFHTML

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
Title
Asca: less audio data is more insightful
Asca: less audio data is more insightful
Xiang Li
Jing Chen
Chao Li
Hongwu Lv
20
0
0
23 Sep 2023
Two vs. Four-Channel Sound Event Localization and Detection
Two vs. Four-Channel Sound Event Localization and Detection
J. Wilkins
Magdalena Fuentes
Luca Bondi
Shabnam Ghaffarzadegan
A. Abavisani
J. P. Bello
16
1
0
23 Sep 2023
ClusterFormer: Clustering As A Universal Visual Learner
ClusterFormer: Clustering As A Universal Visual Learner
James Liang
Yiming Cui
Qifan Wang
Tong Geng
Wenguan Wang
Dongfang Liu
VLM
44
9
0
22 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
37
1
0
22 Sep 2023
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient
  Pruning of A Multilingual ASR Model
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
Jiamin Xie
Ke Li
Jinxi Guo
Andros Tjandra
Shangguan Yuan
Leda Sari
Chunyang Wu
Junteng Jia
Jay Mahadeokar
Ozlem Kalinli
38
2
0
22 Sep 2023
Importance of Smoothness Induced by Optimizers in FL4ASR: Towards
  Understanding Federated Learning for End-to-End ASR
Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR
Sheikh Shams Azam
Tatiana Likhomanenko
Martin Pelikan
Jan Honza Silovsky
40
6
0
22 Sep 2023
Massive End-to-end Models for Short Search Queries
Massive End-to-end Models for Short Search Queries
Weiran Wang
Rohit Prabhavalkar
Dongseong Hwang
Qiujia Li
K. Sim
...
Zhong Meng
CJ Zheng
Yanzhang He
Tara N. Sainath
P. M. Mengibar
34
2
0
22 Sep 2023
Big model only for hard audios: Sample dependent Whisper model selection
  for efficient inferences
Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
Hugo Malard
Salah Zaiem
Robin Algayres
50
2
0
22 Sep 2023
Vision Transformers for Computer Go
Vision Transformers for Computer Go
Amani Sagri
Tristan Cazenave
Jérôme Arjonilla
Abdallah Saffidine
ViT
13
2
0
22 Sep 2023
Bridging the Gaps of Both Modality and Language: Synchronous Bilingual
  CTC for Speech Translation and Speech Recognition
Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition
Chen Xu
Xiaoqian Liu
Erfeng He
Yuhao Zhang
Qianqian Dong
Tong Xiao
Jingbo Zhu
Dapeng Man
Wu Yang
40
0
0
21 Sep 2023
BELT:Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision
Jinzhao Zhou
Yiqun Duan
Yu-Cheng Chang
Yu-Kai Wang
Chin-Teng Lin
44
6
0
21 Sep 2023
CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
W. Liu
Zhiyuan Peng
Tan Lee
19
1
0
21 Sep 2023
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
Shafique Ahmed
Chia-Wei Chen
Wenze Ren
Chin-Jou Li
Ernie Chu
Jun-Cheng Chen
Amir Hussain
H. Wang
Yu Tsao
Jen-Cheng Hou
30
1
0
20 Sep 2023
Test-Time Training for Speech
Test-Time Training for Speech
Sri Harsha Dumpala
Chandramouli Shama Sastry
Sageev Oore
46
1
0
19 Sep 2023
Semi-Autoregressive Streaming ASR With Label Context
Semi-Autoregressive Streaming ASR With Label Context
Siddhant Arora
G. Saon
Shinji Watanabe
Brian Kingsbury
AI4TS
28
5
0
19 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for
  Speaker and Speech Recognition
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
41
13
0
19 Sep 2023
End-to-End Speech Recognition Contextualization with Large Language
  Models
End-to-End Speech Recognition Contextualization with Large Language Models
Egor Lakomkin
Chunyang Wu
Yassir Fathullah
Ozlem Kalinli
M. Seltzer
Christian Fuegen
57
17
0
19 Sep 2023
Multimodal Modeling For Spoken Language Identification
Multimodal Modeling For Spoken Language Identification
Shikhar Bharadwaj
Min Ma
Shikhar Vashishth
Ankur Bapna
Sriram Ganapathy
...
Yu Zhang
D. Esch
Sandy Ritchie
Partha P. Talukdar
Jason Riesa
44
0
0
19 Sep 2023
HTEC: Human Transcription Error Correction
HTEC: Human Transcription Error Correction
Hanbo Sun
Jian Gao
Xiaomin Wu
Anjie Fang
Cheng Cao
Zheng Du
26
1
0
18 Sep 2023
Investigating End-to-End ASR Architectures for Long Form Audio
  Transcription
Investigating End-to-End ASR Architectures for Long Form Audio Transcription
Nithin Rao Koluguri
Samuel Kriman
Georgy Zelenfroind
Somshubra Majumdar
Dima Rekesh
Vahid Noroozi
Jagadeesh Balam
Boris Ginsburg
AuLLM
39
9
0
18 Sep 2023
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large
  Language Models
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models
Hsuan Su
Ting-Yao Hu
H. Koppula
Raviteja Vemulapalli
Jen-Hao Rick Chang
Karren D. Yang
G. Mantena
Oncel Tuzel
SyDa
44
7
0
18 Sep 2023
Electrolaryngeal Speech Intelligibility Enhancement Through Robust
  Linguistic Encoders
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Lester Phillip Violeta
Wen-Chin Huang
D. Ma
Ryuichi Yamamoto
Kazuhiro Kobayashi
Tomoki Toda
27
3
0
18 Sep 2023
Improved Factorized Neural Transducer Model For text-only Domain
  Adaptation
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Jing Liu
Jianwei Yu
Xie Chen
48
1
0
18 Sep 2023
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding
  with Sequence-to-Sequence Architecture
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture
Gaobin Yang
Maokui He
Shutong Niu
Ruoyu Wang
Yanyan Yue
Shuangqing Qian
Shilong Wu
Jun Du
Chin-Hui Lee
34
11
0
17 Sep 2023
Enhancing Quantised End-to-End ASR Models via Personalisation
Enhancing Quantised End-to-End ASR Models via Personalisation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MQ
41
2
0
17 Sep 2023
Improving Speech Recognition for African American English With Audio
  Classification
Improving Speech Recognition for African American English With Audio Classification
Shefali Garg
Zhouyuan Huo
K. Sim
Suzan Schwartz
Mason Chua
...
Zion Mengesha
Dongseong Hwang
Tara N. Sainath
Francoise Beaufays
P. M. Mengibar
42
4
0
16 Sep 2023
Boosting End-to-End Multilingual Phoneme Recognition through Exploiting
  Universal Speech Attributes Constraints
Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints
Hao Yen
Sabato Marco Siniscalchi
Chin-Hui Lee
42
1
0
16 Sep 2023
Augmenting conformers with structured state-space sequence models for
  online speech recognition
Augmenting conformers with structured state-space sequence models for online speech recognition
Haozhe Shan
Albert Gu
Zhong Meng
Weiran Wang
Krzysztof Choromanski
Tara N. Sainath
RALM
32
4
0
15 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using
  Automatic Labels from Whisper
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
34
12
0
15 Sep 2023
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary
  Network
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network
Yiling Huang
Weiran Wang
Guanlong Zhao
Hank Liao
Wei Xia
Quan Wang
29
4
0
15 Sep 2023
Chunked Attention-based Encoder-Decoder Model for Streaming Speech
  Recognition
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
Mohammad Zeineldeen
Albert Zeyer
Ralf Schluter
Hermann Ney
AuLLM
31
4
0
15 Sep 2023
HM-Conformer: A Conformer-based audio deepfake detection system with
  hierarchical pooling and multi-level classification token aggregation methods
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
Hyun-Seo Shin
Ju-Sung Heo
Ju-ho Kim
Chanmann Lim
Wonbin Kim
Ha-Jin Yu
35
5
0
15 Sep 2023
Unimodal Aggregation for CTC-based Speech Recognition
Unimodal Aggregation for CTC-based Speech Recognition
Ying Fang
Xiaofei Li
36
1
0
15 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech
  Using Natural Language Descriptions
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
29
20
0
15 Sep 2023
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation
  Capability
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability
Jian Wu
Naoyuki Kanda
Takuya Yoshioka
Rui Zhao
Zhuo Chen
Jinyu Li
21
5
0
15 Sep 2023
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and
  context
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Wei Kang
Xiaoyu Yang
Zengwei Yao
Fangjun Kuang
Yifan Yang
Liyong Guo
Long Lin
Daniel Povey
32
44
0
15 Sep 2023
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained
  Foundation Models
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
Guanlong Zhao
Yongqiang Wang
Jason W. Pelecanos
Yu Zhang
Hank Liao
Yiling Huang
Han Lu
Quan Wang
27
4
0
14 Sep 2023
DiariST: Streaming Speech Translation with Speaker Diarization
DiariST: Streaming Speech Translation with Speaker Diarization
Muqiao Yang
Naoyuki Kanda
Xiaofei Wang
Junkun Chen
Peidong Wang
Jian Xue
Jinyu Li
Takuya Yoshioka
32
6
0
14 Sep 2023
Folding Attention: Memory and Power Optimization for On-Device
  Transformer-based Streaming Speech Recognition
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Yang Li
Liangzhen Lai
Shangguan Yuan
Forrest N. Iandola
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
34
2
0
14 Sep 2023
CoLLD: Contrastive Layer-to-layer Distillation for Compressing
  Multilingual Pre-trained Speech Encoders
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
Heng-Jui Chang
Ning Dong
Ruslan Mavlyutov
Sravya Popuri
Yu-An Chung
47
6
0
14 Sep 2023
Aligning Speakers: Evaluating and Visualizing Text-based Diarization
  Using Efficient Multiple Sequence Alignment (Extended Version)
Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version)
Chen Gong
Peilin Wu
Jinho Choi
28
1
0
14 Sep 2023
Incorporating Class-based Language Model for Named Entity Recognition in
  Factorized Neural Transducer
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
Peng Wang
Yifan Yang
Zheng Liang
Tian Tan
Shiliang Zhang
Xie Chen
25
0
0
14 Sep 2023
AAS-VC: On the Generalization Ability of Automatic Alignment Search
  based Non-autoregressive Sequence-to-sequence Voice Conversion
AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion
Wen-Chin Huang
Kazuhiro Kobayashi
Tomoki Toda
21
2
0
14 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqiang Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
24
3
0
14 Sep 2023
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous
  Sound Detection via Multitask Learning
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning
Yucong Zhang
Hongbin Suo
Yulong Wan
Ming Li
32
4
0
14 Sep 2023
CPPF: A contextual and post-processing-free model for automatic speech
  recognition
CPPF: A contextual and post-processing-free model for automatic speech recognition
Lei Zhang
Zhengkun Tian
Xiang Chen
Jiaming Sun
Hongyu Xiang
Ke Ding
Guanglu Wan
39
0
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
43
26
0
14 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with
  Embedding Enhancer
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
33
18
0
13 Sep 2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio
  Representation
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
Anna Deichler
Shivam Mehta
Simon Alexanderson
Jonas Beskow
DiffM
25
24
0
11 Sep 2023
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of
  SSWP
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
Jinzuomu Zhong
Yang Li
Hui Huang
Korin Richmond
Jie Liu
Zhiba Su
Jing Guo
Benlai Tang
Fengjie Zhu
23
1
0
11 Sep 2023
Previous
123...131415...333435
Next