Non-Stationary Time Series Forecasting Based on Fourier Analysis and Cross Attention Mechanism Yuqi Xiong Yang Wen AI4TS 31 0 0 11 May 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning Lucas Block Medin Thomas Pellegrini Lucile Gelin SSL 69 1 0 06 Mar 2025
Reservoir Network with Structural Plasticity for Human Activity Recognition Abdullah M. Zyarah Alaa M. Abdul-Hadi Dhireesha Kudithipudi 31 3 0 01 Mar 2025
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition Muhammad Waseem Akram Stefano Dettori V. Colla Giorgio Buttazzo 57 0 0 17 Feb 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers Adam Stooke Rohit Prabhavalkar K. Sim P. M. Mengibar 39 0 0 06 Feb 2025
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets Jiatong Shi Shih-Heng Wang William Chen Martijn Bartelds Vanya Bannihatti Kumar ... Xuankai Chang Dan Jurafsky Karen Livescu Hung-yi Lee Shinji Watanabe AuLLM 77 5 0 12 Jun 2024
Augmenting emotion features in irony detection with Large language modeling Yucheng Lin Yuhan Xia Yunfei Long 38 3 0 18 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis Masahiro Yasuda Noboru Harada Yasunori Ohishi Shoichiro Saito Akira Nakayama Nobutaka Ono 36 3 0 12 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition Yash Jain David M. Chan Pranav Dheram Aparna Khare Olabanji Shonibare Venkatesh Ravichandran Shalini Ghosh 40 2 0 28 Mar 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition A. Ogawa Naohiro Tawara Takatomo Kano Marc Delcroix 46 4 0 22 Dec 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR Keyu An Shiliang Zhang 31 4 0 26 Sep 2023
Speech enhancement with frequency domain auto-regressive modeling Anurenjan Purushothaman Debottam Dutta Rohit Kumar Sriram Ganapathy 22 2 0 24 Sep 2023
Transformers versus LSTMs for electronic trading Paul Bilokon Yitao Qiu AI4TS AIFin 18 13 0 20 Sep 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding Titouan Parcollet Rogier van Dalen Shucong Zhang S. Bhattacharya 26 6 0 12 Jul 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems Mingyu Cui Jiawen Kang Jiajun Deng Xiaoyue Yin Yutao Xie Xie Chen Xunying Liu 35 8 0 23 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization Kohei Matsuura Takanori Ashihara Takafumi Moriya Tomohiro Tanaka Takatomo Kano A. Ogawa Marc Delcroix 29 9 0 07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer Lu Huang Yangqiu Song Jun Zhang Lu Lu Zejun Ma 29 2 0 07 Jun 2023
Language-universal phonetic encoder for low-resource speech recognition Siyuan Feng Ming Tu Rui Xia Chuanzeng Huang Yuxuan Wang 36 2 0 19 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition Siyuan Feng Ming Tu Rui Xia Chuanzeng Huang Yuxuan Wang 35 5 0 19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks Yifan Peng Kwangyoun Kim Felix Wu Brian Yan Siddhant Arora William Chen Jiyang Tang Suwon Shon Prashant Sridhar Shinji Watanabe 29 17 0 18 May 2023
Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition Mohan Li R. Doddipatla Catalin Zorila 30 0 0 24 Apr 2023
Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams M. Tavakoli Rohitash Chandra Fengrui Tian Cristián Bravo 29 8 0 21 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit Brian Yan Jiatong Shi Yun Tang Hirofumi Inaguma Yifan Peng ... Zhaoheng Ni Moto Hira Soumi Maiti J. Pino Shinji Watanabe 19 20 0 10 Apr 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Junaid Qadir 42 47 0 21 Mar 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition Yifan Peng Jaesong Lee Shinji Watanabe 27 19 0 14 Mar 2023
Stabilising and accelerating light gated recurrent units for automatic speech recognition Adel Moumen Titouan Parcollet 26 3 0 16 Feb 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems Jiajun Deng Xurong Xie Tianzi Wang Mingyu Cui Boyang Xue Zengrui Jin Guinan Li Shujie Hu Xunying Liu 26 5 0 15 Feb 2023
A Text-guided Protein Design Framework Shengchao Liu Yanjing Li Zhuoxinran Li A. Gitter Yutao Zhu ... Arvind Ramanathan Chaowei Xiao Jian Tang Hongyu Guo Anima Anandkumar 70 61 0 09 Feb 2023
AI2: The next leap toward native language based and explainable machine learning framework J. Dessureault Daniel Massicotte 14 1 0 09 Jan 2023
Images Speak in Images: A Generalist Painter for In-Context Visual Learning Xinlong Wang Wen Wang Yue Cao Chunhua Shen Tiejun Huang VLM MLLM 66 244 0 05 Dec 2022
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective Spandan Dey Md. Sahidullah G. Saha 33 20 0 30 Nov 2022
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation Motoi Omachi Brian Yan Siddharth Dalmia Yuya Fujita Shinji Watanabe LRM 25 3 0 11 Nov 2022
Structured State Space Decoder for Speech Recognition and Synthesis Koichi Miyazaki Masato Murata Tomoki Koriyama 34 12 0 31 Oct 2022
Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization Jiachen Lian A. Black Yijingxiu Lu L. Goldstein Shinji Watanabe Gopala K. Anumanchipalli 46 14 0 29 Oct 2022
Are Deep Sequence Classifiers Good at Non-Trivial Generalization? Francesco Cazzaro A. Quattoni X. Carreras MQ 26 0 0 24 Oct 2022
Revisiting Checkpoint Averaging for Neural Machine Translation Yingbo Gao Christian Herold Zijian Yang Hermann Ney MoMe 27 11 0 21 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation Yoshiki Masuyama Xuankai Chang Samuele Cornell Shinji Watanabe Nobutaka Ono 17 19 0 19 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge Yan Jia Mihee Hong Jingyu Hou Kailong Ren Sifan Ma Jin Wang Fangzhen Peng Yinglin Ji Lin Yang Junjie Wang 25 1 0 14 Oct 2022
SQuAT: Sharpness- and Quantization-Aware Training for BERT Zheng Wang Juncheng Billy Li Shuhui Qu Florian Metze Emma Strubell MQ 24 7 0 13 Oct 2022
Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture Lei Wang Benedict Yeoh Jun Wah Ng 40 7 0 07 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition Kyuhong Shim Wonyong Sung 25 2 0 01 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition Kwangyoun Kim Felix Wu Yifan Peng Jing Pan Prashant Sridhar Kyu Jeong Han Shinji Watanabe 61 105 0 30 Sep 2022
Two-Pass Low Latency End-to-End Spoken Language Understanding Siddhant Arora Siddharth Dalmia Xuankai Chang Brian Yan A. Black Shinji Watanabe VLM 30 19 0 14 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding Yifan Peng Siddharth Dalmia Ian Lane Shinji Watanabe 30 143 0 06 Jul 2022
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism Kun Wei Pengcheng Guo Ning Jiang 48 11 0 02 Jul 2022
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition Jiajun Deng Xurong Xie Tianzi Wang Mingyu Cui Boyang Xue Zengrui Jin Mengzhe Geng Guinan Li Xunying Liu Helen M. Meng 17 13 0 24 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models Siddharth Dalmia Dmytro Okhonko M. Lewis Sergey Edunov Shinji Watanabe Florian Metze Luke Zettlemoyer Abdel-rahman Mohamed AuLLM MoE 29 14 0 07 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition Sehoon Kim A. Gholami Albert Eaton Shaw Nicholas Lee K. Mangalam Jitendra Malik Michael W. Mahoney Kurt Keutzer 32 99 0 02 Jun 2022
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training Dading Chong Helin Wang Peilin Zhou Qingcheng Zeng 39 65 0 27 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation Dan Berrebbi Jiatong Shi Brian Yan Osbel López-Francisco Jonathan D. Amith Shinji Watanabe 10 26 0 05 Apr 2022