2005.08100
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,749 papers shown
Title
Complexity boosted adaptive training for better low resource ASR performance
Hongxuan Lu
Shenjian Wang
Biao Li
78
0
0
01 Dec 2024
From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
Yupei Li
M. Milling
Lucia Specia
Björn Schuller
89
6
0
30 Nov 2024
PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers
Gwangoo Yeo
Jiin Kim
Yujeong Choi
Minsoo Rhu
81
0
0
28 Nov 2024
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory
Geoffrey Tyndall
Kurniawati Azizah
Dipta Tanaya
Ayu Purwarianti
Dessi Lestari
S. Sakti
CLL
65
0
0
27 Nov 2024
Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge
Ruiyang Qin
Dancheng Liu
Gelei Xu
Zheyu Yan
Chenhui Xu
Yuting Hu
Xiaolin Hu
Jinjun Xiong
Yiyu Shi
AuLLM
115
1
0
21 Nov 2024
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
Jiawei Yu
Yong Li
Xiaosong Qiao
Huan Zhao
Xiaofeng Zhao
Wei Tang
Hao Fei
Hao Yang
Jinsong Su
80
0
0
20 Nov 2024
Signformer is all you need: Towards Edge AI for Sign Language
Eta Yang
SLR
82
0
0
19 Nov 2024
SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features
Yu-Fei Shi
Yang Ai
Ye-Xin Lu
Hui-Peng Du
Zhen-Hua Ling
36
0
0
18 Nov 2024
Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion
Yu-Fei Shi
Yang Ai
Ye-Xin Lu
Hui-Peng Du
Zhen-Hua Ling
33
0
0
17 Nov 2024
XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection
Yang Xiao
Rohan Kumar Das
Mamba
36
1
0
15 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
29
0
0
14 Nov 2024
Multimodal Fusion Balancing Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
36
0
0
11 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
43
0
0
11 Nov 2024
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
T. Toda
27
1
0
11 Nov 2024
Gen-AI for User Safety: A Survey
Akshar Prabhu Desai
Tejasvi Ravi
Mohammad Luqman
Mohit Sharma
Nithya Kota
Pranjul Yadav
38
1
0
10 Nov 2024
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
33
0
0
10 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
Marcelo Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
55
1
0
06 Nov 2024
LASER: Attention with Exponential Transformation
Sai Surya Duvvuri
Inderjit Dhillon
43
1
0
05 Nov 2024
DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
Qidong Zhao
Hao Wu
Yuming Hao
Zilingfeng Ye
Jiajia Li
Xu Liu
Keren Zhou
31
0
0
05 Nov 2024
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang
Mengyu Bu
Yang Feng
33
0
0
03 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
57
1
0
03 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
28
0
0
31 Oct 2024
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
Ioannis Tsiamas
Matthias Sperber
Andrew Finch
Sarthak Garg
36
0
0
31 Oct 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk
Jakub Kubiak
Mateusz Czyżnikiewicz
39
0
0
30 Oct 2024
Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation
Davide Berghi
Philip J. B. Jackson
29
1
0
29 Oct 2024
Representational learning for an anomalous sound detection system with source separation model
S. Shin
Seokjin Lee
27
0
0
29 Oct 2024
Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models
Ognjen Rudovic
Pranay Dighe
Yi Su
Vineet Garg
Sameer Dharur
Xiaochuan Niu
Ahmed H. Abdelaziz
Saurabh N. Adya
Ahmed H. Tewfik
31
0
0
28 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
61
2
0
23 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang
Fan Yu
Z. Ma
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
32
1
0
22 Oct 2024
GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting
Pai Zhu
Jacob Bartel
Dhruuv Agarwal
Kurt Partridge
Hyun-jin Park
Quan Wang
26
0
0
22 Oct 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
Ziqiang Liu
Xiaolou Li
Chen Chen
Li Guo
Lantian Li
D. Wang
35
0
0
21 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
36
1
0
21 Oct 2024
Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
Peiji Yang
Fengping Wang
Yicheng Zhong
Huawei Wei
Zhisheng Wang
23
0
0
21 Oct 2024
Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo
Heeyoul Choi
56
0
0
21 Oct 2024
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
Suhita Ghosh
Melanie Jouaiti
Arnab Das
Yamini Sinha
Tim Polzehl
Ingo Siegert
Sebastian Stober
23
2
0
20 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
21
0
0
18 Oct 2024
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization
Bin Lin
Yanzhen Yu
Jianhao Ye
Ruitao Lv
Yuqing Yang
Ruoye Xie
Pan Yu
Hongbin Zhou
VGen
35
1
0
18 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
47
2
0
16 Oct 2024
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
Mao-Kui He
Jun Du
Shu-Tong Niu
Qing-Feng Liu
Chin-Hui Lee
24
0
0
15 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
31
1
0
15 Oct 2024
Character-aware audio-visual subtitling in context
Jaesung Huh
Andrew Zisserman
41
0
0
14 Oct 2024
In-Materia Speech Recognition
Mohamadreza Zolfagharinejad
Julian Büchel
Lorenzo Cassola
Sachin Kinge
Ghazi Sarwat Syed
Abu Sebastian
Wilfred G. van der Wiel
26
0
0
14 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
Adriana Fernandez-Lopez
Shiwei Liu
L. Yin
Stavros Petridis
Maja Pantic
29
0
0
10 Oct 2024
Transducer Consistency Regularization for Speech to Text Applications
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
40
0
0
09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
Di Liang
Xiaofei Li
24
0
0
09 Oct 2024
Enforcing Interpretability in Time Series Transformers: A Concept Bottleneck Framework
Angela van Sprang
Erman Acar
Willem Zuidema
AI4TS
51
1
0
08 Oct 2024
FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection
Han Jiang
Wenyu Wang
Yiquan Zhou
Hongwu Ding
Jiacheng Xu
Jihua Zhu
25
0
0
08 Oct 2024
Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges
Dancheng Liu
Jason Yang
Ishan Albrecht-Buehler
Helen Qin
Sophie Li
Yuting Hu
Amir Nassereldine
Jinjun Xiong
24
1
0
07 Oct 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Rui Zhao
Jinyu Li
Ruchao Fan
Matt Post
38
1
0
07 Oct 2024
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit
Massa Baali
Rita Singh
Bhiksha Raj
29
0
0
07 Oct 2024