Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,750 papers shown
Title
AugSumm: towards generalizable speech summarization using synthetic labels from large language model
Jee-weon Jung
Roshan S. Sharma
William Chen
Bhiksha Raj
Shinji Watanabe
53
4
0
10 Jan 2024
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
Jiawen Kang
Lingwei Meng
Mingyu Cui
Haohan Guo
Xixin Wu
Xunying Liu
Helen M. Meng
59
6
0
08 Jan 2024
The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023
He Wang
Pengcheng Guo
Wei Chen
Pan Zhou
Lei Xie
34
2
0
07 Jan 2024
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Qiu-shi Zhu
Jie Zhang
Yu Gu
Yuli Hu
Lirong Dai
SSL
46
11
0
07 Jan 2024
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
He Wang
Pengcheng Guo
Pan Zhou
Lei Xie
39
12
0
07 Jan 2024
conv_einsum: A Framework for Representation and Fast Evaluation of Multilinear Operations in Convolutional Tensorial Neural Networks
Tahseen Rabbani
Jiahao Su
Xiaoyu Liu
David Chan
Geoffrey Sangston
Furong Huang
36
1
0
07 Jan 2024
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
Nagarathna Ravi
Thishyan Raj T
Vipul Arora
29
3
0
06 Jan 2024
CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models
Yaojia Lv
Haojie Pan
Ruiji Fu
Ming Liu
Zhongyuan Wang
Bing Qin
35
5
0
06 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
34
7
0
05 Jan 2024
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition
David M. Chan
Shalini Ghosh
Hitesh Tulsiani
Ariya Rastrow
Björn Hoffmeister
30
1
0
04 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
30
4
0
03 Jan 2024
Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong
Minchan Kim
Joun Yeop Lee
Nam Soo Kim
30
5
0
02 Jan 2024
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
43
10
0
27 Dec 2023
Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation
J. Hu
Roberto Cavicchioli
Giulia Berardinelli
Alessandro Capotondi
44
2
0
26 Dec 2023
Deformable Audio Transformer for Audio Event Detection
Wentao Zhu
28
0
0
24 Dec 2023
Self-Supervised Adaptive AV Fusion Module for Pre-Trained ASR Models
Christopher Simic
Tobias Bocklet
34
5
0
21 Dec 2023
Style Modeling for Multi-Speaker Articulation-to-Speech
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
31
8
0
21 Dec 2023
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Jiaming Zhou
Shiwan Zhao
Yaqi Liu
Wenjia Zeng
Yong Chen
Yong Qin
39
9
0
21 Dec 2023
Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection
Jiachen Lian
Carly Feng
Naasir Farooqi
Steve Li
Anshul Kashyap
Cheol Jun Cho
Peter Wu
Robin Netzorg
Tingle Li
Gopala Krishna Anumanchipalli
49
13
0
20 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
38
1
0
18 Dec 2023
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Guru Prakash Arumugam
Shuo-yiin Chang
Tara N. Sainath
Rohit Prabhavalkar
Quan Wang
Shaan Bijwadia
34
3
0
18 Dec 2023
Generative linguistic representation for spoken language identification
Peng Shen
Xuguang Lu
Hisashi Kawai
22
0
0
18 Dec 2023
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Mingbin Xu
Alex Jin
Sicheng Wang
Mu Su
Tim Ng
...
Shiyi Han
Zhihong Lei
Yaqiao Deng
Zhen Huang
Mahesh Krishnamoorthy
24
4
0
16 Dec 2023
On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition
Nagaraj Adiga
Jinhwan Park
Chintigari Shiva Kumar
Shatrughan Singh
Kyungmin Lee
Chanwoo Kim
Dhananjaya N. Gowda
26
1
0
15 Dec 2023
U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias
Aoting Zhang
Pan Zhou
Kaixun Huang
Yong Zou
Ming Liu
Lei Xie
29
3
0
15 Dec 2023
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux
Emil Mededovic
Ahmed Hallawa
Lukas Martin
A. Peine
Anke Schmeink
VLM
26
4
0
15 Dec 2023
Audio-visual fine-tuning of audio-only ASR models
Avner May
Dmitriy Serdyuk
Ankit Parag Shah
Otavio Braga
Olivier Siohan
31
3
0
14 Dec 2023
Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection
Davide Berghi
Peipei Wu
Jinzheng Zhao
Wenwu Wang
Philip J. B. Jackson
36
10
0
14 Dec 2023
Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement
George Close
William Ravenscroft
Thomas Hain
Stefan Goetze
37
2
0
14 Dec 2023
RdimKD: Generic Distillation Paradigm by Dimensionality Reduction
Yi Guo
Yiqian He
Xiaoyang Li
Haotong Qin
Van Tung Pham
Yang Zhang
Shouda Liu
53
1
0
14 Dec 2023
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li
Yiwei Guo
Xie Chen
Kai Yu
50
13
0
14 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
36
9
0
13 Dec 2023
On Robustness to Missing Video for Audiovisual Speech Recognition
Oscar Chang
Otavio Braga
H. Liao
Dmitriy Serdyuk
Olivier Siohan
45
11
0
13 Dec 2023
Revisiting the Entropy Semiring for Neural Speech Recognition
Oscar Chang
DongSeon Hwang
Olivier Siohan
37
2
0
13 Dec 2023
Polynomial-based Self-Attention for Table Representation learning
Jayoung Kim
Yehjin Shin
Jeongwhan Choi
Hyowon Wi
Noseong Park
LMTD
45
2
0
12 Dec 2023
BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics
Jenny Hamer
Eleni Triantafillou
B. V. Merrienboer
Stefan Kahl
Holger Klinck
Tom Denton
Vincent Dumoulin
44
14
0
12 Dec 2023
The GUA-Speech System Description for CNVSRC Challenge 2023
Shengqiang Li
Chao Lei
Baozhong Ma
Binbin Zhang
Fuping Pan
29
0
0
12 Dec 2023
Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning
Ivan Fung
Lahiru Samarakoon
Samuel J. Broughton
OOD
37
2
0
12 Dec 2023
Survey on Foundation Models for Prognostics and Health Management in Industrial Cyber-Physical Systems
Ruonan Liu
Quanhu Zhang
Te Han
AI4CE
49
2
0
11 Dec 2023
Transformer Attractors for Robust and Efficient End-to-End Neural Diarization
Lahiru Samarakoon
Samuel J. Broughton
Marc Härkönen
Ivan Fung
37
6
0
11 Dec 2023
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion
Anke Tang
Li Shen
Yong Luo
Liang Ding
Han Hu
Bo Du
Dacheng Tao
MoMe
46
21
0
11 Dec 2023
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Federico Landini
Mireia Díez
Themos Stafylakis
Lukáš Burget
41
11
0
07 Dec 2023
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi
Hyowon Wi
Jayoung Kim
Yehjin Shin
Kookjin Lee
Nathaniel Trask
Noseong Park
48
4
0
07 Dec 2023
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
32
1
0
03 Dec 2023
Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Tam Nguyen
Tan-Minh Nguyen
Richard G. Baraniuk
29
8
0
01 Dec 2023
Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer
Bing Yang
Xiaofei Li
SSL
44
3
0
01 Dec 2023
Speech Understanding on Tiny Devices with A Learning Cache
A. Benazir
Zhiming Xu
Felix Xiaozhu Lin
37
1
0
30 Nov 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
49
44
0
28 Nov 2023
D4AM: A General Denoising Framework for Downstream Acoustic Models
H. Wang
Yu Tsao
Hsin-Min Wang
Chu-Song Chen
21
4
0
28 Nov 2023
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
37
4
0
27 Nov 2023