A Comparative Study on Transformer vs RNN in Speech Applications


13 September 2019
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
Ziyan Jiang
Masao Someki
Nelson Yalta
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang

Papers citing "A Comparative Study on Transformer vs RNN in Speech Applications"

50 / 128 papers shown
Non-Stationary Time Series Forecasting Based on Fourier Analysis and Cross Attention Mechanism
Yuqi Xiong
Yang Wen
AI4TS
31
0
0
11 May 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
69
1
0
06 Mar 2025
Reservoir Network with Structural Plasticity for Human Activity Recognition
Abdullah M. Zyarah
Alaa M. Abdul-Hadi
Dhireesha Kudithipudi
31
3
0
01 Mar 2025
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
Muhammad Waseem Akram
Stefano Dettori
V. Colla
Giorgio Buttazzo
57
0
0
17 Feb 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
39
0
0
06 Feb 2025
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Jiatong Shi
Shih-Heng Wang
William Chen
Martijn Bartelds
Vanya Bannihatti Kumar
...
Xuankai Chang
Dan Jurafsky
Karen Livescu
Hung-yi Lee
Shinji Watanabe
AuLLM
77
5
0
12 Jun 2024
Augmenting emotion features in irony detection with Large language modeling
Yucheng Lin
Yuhan Xia
Yunfei Long
38
3
0
18 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
36
3
0
12 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
46
4
0
22 Dec 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Keyu An
Shiliang Zhang
31
4
0
26 Sep 2023
Speech enhancement with frequency domain auto-regressive modeling
Anurenjan Purushothaman
Debottam Dutta
Rohit Kumar
Sriram Ganapathy
22
2
0
24 Sep 2023
Transformers versus LSTMs for electronic trading
Paul Bilokon
Yitao Qiu
AI4TS
AIFin
18
13
0
20 Sep 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
26
6
0
12 Jul 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
35
8
0
23 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
Takatomo Kano
A. Ogawa
Marc Delcroix
29
9
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
29
2
0
07 Jun 2023
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
36
2
0
19 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
35
5
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
29
17
0
18 May 2023
Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
Mohan Li
R. Doddipatla
Catalin Zorila
30
0
0
24 Apr 2023
Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams
M. Tavakoli
Rohitash Chandra
Fengrui Tian
Cristián Bravo
29
8
0
21 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
Hirofumi Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
19
20
0
10 Apr 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng
Jaesong Lee
Shinji Watanabe
27
19
0
14 Mar 2023
Stabilising and accelerating light gated recurrent units for automatic speech recognition
Adel Moumen
Titouan Parcollet
26
3
0
16 Feb 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Guinan Li
Shujie Hu
Xunying Liu
26
5
0
15 Feb 2023
A Text-guided Protein Design Framework
Shengchao Liu
Yanjing Li
Zhuoxinran Li
A. Gitter
Yutao Zhu
...
Arvind Ramanathan
Chaowei Xiao
Jian Tang
Hongyu Guo
Anima Anandkumar
70
61
0
09 Feb 2023
AI2: The next leap toward native language based and explainable machine learning framework
J. Dessureault
Daniel Massicotte
14
1
0
09 Jan 2023
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM
MLLM
66
244
0
05 Dec 2022
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
Spandan Dey
Md. Sahidullah
G. Saha
33
20
0
30 Nov 2022
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation
Motoi Omachi
Brian Yan
Siddharth Dalmia
Yuya Fujita
Shinji Watanabe
LRM
25
3
0
11 Nov 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
34
12
0
31 Oct 2022
Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization
Jiachen Lian
A. Black
Yijingxiu Lu
L. Goldstein
Shinji Watanabe
Gopala K. Anumanchipalli
46
14
0
29 Oct 2022
Are Deep Sequence Classifiers Good at Non-Trivial Generalization?
Francesco Cazzaro
A. Quattoni
X. Carreras
MQ
26
0
0
24 Oct 2022
Revisiting Checkpoint Averaging for Neural Machine Translation
Yingbo Gao
Christian Herold
Zijian Yang
Hermann Ney
MoMe
27
11
0
21 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
17
19
0
19 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia
Mihee Hong
Jingyu Hou
Kailong Ren
Sifan Ma
Jin Wang
Fangzhen Peng
Yinglin Ji
Lin Yang
Junjie Wang
25
1
0
14 Oct 2022
SQuAT: Sharpness- and Quantization-Aware Training for BERT
Zheng Wang
Juncheng Billy Li
Shuhui Qu
Florian Metze
Emma Strubell
MQ
24
7
0
13 Oct 2022
Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture
Lei Wang
Benedict Yeoh
Jun Wah Ng
40
7
0
07 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim
Wonyong Sung
25
2
0
01 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora
Siddharth Dalmia
Xuankai Chang
Brian Yan
A. Black
Shinji Watanabe
VLM
30
19
0
14 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
30
143
0
06 Jul 2022
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei
Pengcheng Guo
Ning Jiang
48
11
0
02 Jul 2022
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Mengzhe Geng
Guinan Li
Xunying Liu
Helen M. Meng
17
13
0
24 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
29
14
0
07 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
32
99
0
02 Jun 2022
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Dading Chong
Helin Wang
Peilin Zhou
Qingcheng Zeng
39
65
0
27 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
10
26
0
05 Apr 2022