Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,749 papers shown

Complexity boosted adaptive training for better low resource ASR performance Hongxuan Lu Shenjian Wang Biao Li 78 0 0 01 Dec 2024
From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview Yupei Li M. Milling Lucia Specia Björn Schuller 89 6 0 30 Nov 2024
PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers Gwangoo Yeo Jiin Kim Yujeong Choi Minsoo Rhu 81 0 0 28 Nov 2024
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory Geoffrey Tyndall Kurniawati Azizah Dipta Tanaya Ayu Purwarianti Dessi Lestari S. Sakti CLL 65 0 0 27 Nov 2024
Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge Ruiyang Qin Dancheng Liu Gelei Xu Zheyu Yan Chenhui Xu Yuting Hu Xiaolin Hu Jinjun Xiong Yiyu Shi AuLLM 115 1 0 21 Nov 2024
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM Jiawei Yu Yong Li Xiaosong Qiao Huan Zhao Xiaofeng Zhao Wei Tang Hao Fei Hao Yang Jinsong Su 80 0 0 20 Nov 2024
Signformer is all you need: Towards Edge AI for Sign Language Eta Yang SLR 82 0 0 19 Nov 2024
SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features Yu-Fei Shi Yang Ai Ye-Xin Lu Hui-Peng Du Zhen-Hua Ling 36 0 0 18 Nov 2024
Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion Yu-Fei Shi Yang Ai Ye-Xin Lu Hui-Peng Du Zhen-Hua Ling 33 0 0 17 Nov 2024
XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection Yang Xiao Rohan Kumar Das Mamba 36 1 0 15 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation Kuiyuan Zhang Zhongyun Hua Yushu Zhang Yifang Guo Tao Xiang 29 0 0 14 Nov 2024
Multimodal Fusion Balancing Through Game-Theoretic Regularization Konstantinos Kontras Thomas Strypsteen Christos Chatzichristos Paul P. Liang Matthew Blaschko M. D. Vos 36 0 0 11 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition Yoshiki Masuyama Koichi Miyazaki Masato Murata Mamba 43 0 0 11 Nov 2024
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation Reo Yoneyama Atsushi Miyashita Ryuichi Yamamoto T. Toda 27 1 0 11 Nov 2024
Gen-AI for User Safety: A Survey Akshar Prabhu Desai Tejasvi Ravi Mohammad Luqman Mohit Sharma Nithya Kota Pranjul Yadav 38 1 0 10 Nov 2024
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection Jinbo Hu Yin Cao Ming Wu Fang Kang Feiran Yang Wenwu Wang Mark D. Plumbley J. Yang 33 0 0 10 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward Shashi Kumar Iuliia Thorbecke Sergio Burdisso Esaú Villatoro-Tello Marcelo Errecalde Kadri Hacioğlu Pradeep Rangappa P. Motlícek A. Ganapathiraju Andreas Stolcke 55 1 0 06 Nov 2024
LASER: Attention with Exponential Transformation Sai Surya Duvvuri Inderjit Dhillon 43 1 0 05 Nov 2024
DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads Qidong Zhao Hao Wu Yuming Hao Zilingfeng Ye Jiajia Li Xu Liu Keren Zhou 31 0 0 05 Nov 2024
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation Langlin Huang Mengyu Bu Yang Feng 33 0 0 03 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation Dennis Fucci Marco Gaido Beatrice Savoldi Matteo Negri Mauro Cettolo L. Bentivogli 57 1 0 03 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models Heng-Jui Chang Hongyu Gong Changhan Wang James R. Glass Yu-An Chung 28 0 0 31 Oct 2024
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? Ioannis Tsiamas Matthias Sperber Andrew Finch Sarthak Garg 36 0 0 31 Oct 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data Łukasz Bondaruk Jakub Kubiak Mateusz Czyżnikiewicz 39 0 0 30 Oct 2024
Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation Davide Berghi Philip J. B. Jackson 29 1 0 29 Oct 2024
Representational learning for an anomalous sound detection system with source separation model S. Shin Seokjin Lee 27 0 0 29 Oct 2024
Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models Ognjen Rudovic Pranay Dighe Yi Su Vineet Garg Sameer Dharur Xiaochuan Niu Ahmed H. Abdelaziz Saurabh N. Adya Ahmed H. Tewfik 31 0 0 28 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning Yifan Peng Krishna C. Puvvada Zhehuai Chen Piotr Żelasko He Huang Kunal Dhawan Ke Hu Shinji Watanabe Jagadeesh Balam Boris Ginsburg 61 2 0 23 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap Guanrou Yang Fan Yu Z. Ma Zhihao Du Zhifu Gao Shiliang Zhang Xie Chen 32 1 0 22 Oct 2024
GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting Pai Zhu Jacob Bartel Dhruuv Agarwal Kurt Partridge Hyun-jin Park Quan Wang 26 0 0 22 Oct 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition Ziqiang Liu Xiaolou Li Chen Chen Li Guo Lantian Li D. Wang 35 0 0 21 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec Yiwei Guo Zhihan Li Chenpeng Du Hankun Wang Xie Chen Kai Yu 36 1 0 21 Oct 2024
Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding Peiji Yang Fengping Wang Yicheng Zhong Huawei Wei Zhisheng Wang 23 0 0 21 Oct 2024
Generalized Probabilistic Attention Mechanism in Transformers DongNyeong Heo Heeyoul Choi 56 0 0 21 Oct 2024
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example Suhita Ghosh Melanie Jouaiti Arnab Das Yamini Sinha Tim Polzehl Ingo Siegert Sebastian Stober 23 2 0 20 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup Carlos Carvalho A. Abad 21 0 0 18 Oct 2024
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization Bin Lin Yanzhen Yu Jianhao Ye Ruitao Lv Yuqing Yang Ruoye Xie Pan Yu Hongbin Zhou VGen 35 1 0 18 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone Xuyuan Li Zengqiang Shang Hua Hua Peiyang Shi Chen Yang Li Wang Pengyuan Zhang 47 2 0 16 Oct 2024
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization Mao-Kui He Jun Du Shu-Tong Niu Qing-Feng Liu Chin-Hui Lee 24 0 0 15 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing Takanori Ashihara Takafumi Moriya Shota Horiguchi Junyi Peng Tsubasa Ochiai Marc Delcroix Kohei Matsuura Hiroshi Sato 31 1 0 15 Oct 2024
Character-aware audio-visual subtitling in context Jaesung Huh Andrew Zisserman 41 0 0 14 Oct 2024
In-Materia Speech Recognition Mohamadreza Zolfagharinejad Julian Büchel Lorenzo Cassola Sachin Kinge Ghazi Sarwat Syed Abu Sebastian Wilfred G. van der Wiel 26 0 0 14 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models Adriana Fernandez-Lopez Shiwei Liu L. Yin Stavros Petridis Maja Pantic 29 0 0 10 Oct 2024
Transducer Consistency Regularization for Speech to Text Applications Cindy Tseng Yun Tang Vijendra Raj Apsingekar 40 0 0 09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction Di Liang Xiaofei Li 24 0 0 09 Oct 2024
Enforcing Interpretability in Time Series Transformers: A Concept Bottleneck Framework Angela van Sprang Erman Acar Willem Zuidema AI4TS 51 1 0 08 Oct 2024
FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection Han Jiang Wenyu Wang Yiquan Zhou Hongwu Ding Jiacheng Xu Jihua Zhu 25 0 0 08 Oct 2024
Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges Dancheng Liu Jason Yang Ishan Albrecht-Buehler Helen Qin Sophie Li Yuting Hu Amir Nassereldine Jinjun Xiong 24 1 0 07 Oct 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation Rui Zhao Jinyu Li Ruchao Fan Matt Post 38 1 0 07 Oct 2024
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features Satvik Dixit Massa Baali Rita Singh Bhiksha Raj 29 0 0 07 Oct 2024