A Comparative Study on Transformer vs RNN in Speech Applications

13 September 2019
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
Ziyan Jiang
Masao Someki
Nelson Yalta
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang

Papers citing "A Comparative Study on Transformer vs RNN in Speech Applications"

50 / 131 papers shown
End-to-End Multi-speaker ASR with Independent Vector Analysis
Robin Scheibler
Wangyou Zhang
Xuankai Chang
Shinji Watanabe
Y. Qian
24
2
0
01 Apr 2022
Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data
Chen Chen
Nana Hou
Yuchen Hu
Shashank Shirol
Chng Eng Siong
NoLa
14
43
0
29 Mar 2022
Transformer-based Streaming ASR with Cumulative Attention
Mohan Li
Shucong Zhang
Catalin Zorila
R. Doddipatla
27
9
0
11 Mar 2022
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
Yufeng Yang
Peidong Wang
DeLiang Wang
20
12
0
01 Mar 2022
A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting
Benhan Li
Shengdong Du
Tianrui Li
AI4TS
20
2
0
23 Feb 2022
Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion
Masahiro Yasuda
Yasunori Ohishi
Shoichiro Saito
N. Harada
38
13
0
18 Feb 2022
Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition
Piotr Żelasko
Siyuan Feng
Laureano Moro Velázquez
A. Abavisani
Saurabhchand Bhati
O. Scharenborg
M. Hasegawa-Johnson
Najim Dehak
33
15
0
26 Jan 2022
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer
Yusuke Shinohara
Takaaki Ishii
Hirofumi Inaguma
Shinji Watanabe
32
34
0
14 Jan 2022
An exploratory experiment on Hindi, Bengali hate-speech detection and transfer learning using neural networks
Tung Minh Phung
Janis Cloos
26
2
0
06 Jan 2022
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
33
74
0
29 Nov 2021
Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition
Junhao Xu
Jianwei Yu
Shoukang Hu
Xunying Liu
Helen Meng
MQ
27
13
0
29 Nov 2021
Attention-based Multi-hypothesis Fusion for Speech Summarization
Takatomo Kano
A. Ogawa
Marc Delcroix
Shinji Watanabe
22
13
0
16 Nov 2021
Cross-attention conformer for context modeling in speech enhancement for ASR
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
24
14
0
30 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
30
16
0
12 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
24
81
0
09 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi
Keita Karube
Tetsuji Ogawa
Tetsunori Kobayashi
18
22
0
08 Oct 2021
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi
Chunyang Wu
Dilin Wang
Alex Xiao
Jay Mahadeokar
...
Ke Li
Yuan Shangguan
Varun K. Nagaraja
Ozlem Kalinli
M. Seltzer
36
15
0
07 Oct 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
Hirofumi Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
65
11
0
27 Sep 2021
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
D. Curto
Albert Clapés
Javier Selva
Sorina Smeureanu
Julio C. S. Jacques Junior
...
G. Guilera
D. Leiva
T. Moeslund
Sergio Escalera
Cristina Palmero
46
29
0
20 Sep 2021
Beyond Isolated Utterances: Conversational Emotion Recognition
R. Pappagari
Piotr Żelasko
Jesús Villalba
Laureano Moro Velázquez
Najim Dehak
27
4
0
13 Sep 2021
Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer
Krishna D N Freshworks
29
7
0
22 Aug 2021
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers
Chiori Hori
Takaaki Hori
Jonathan Le Roux
25
4
0
04 Aug 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
37
2
0
01 Jul 2021
OadTR: Online Action Detection with Transformers
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Yuanjie Shao
Zhe Zuo
Changxin Gao
Nong Sang
OffRL
ViT
34
109
0
21 Jun 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
E. Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
30
23
0
21 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
60
351
0
13 Jun 2021
Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers
Yusuke Kida
Tatsuya Komatsu
M. Togami
21
1
0
21 Apr 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
27
34
0
19 Apr 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures
Nick Rossenbach
Mohammad Zeineldeen
Benedikt Hilmes
Ralf Schlüter
Hermann Ney
28
12
0
12 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
27
986
0
31 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
27
7
0
17 Mar 2021
End-to-end acoustic modelling for phone recognition of young readers
Lucile Gelin
Morgane Daniel
J. Pinquier
Thomas Pellegrini
18
13
0
04 Mar 2021
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
Wangyou Zhang
Christoph Boeddeker
Shinji Watanabe
Tomohiro Nakatani
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Naoyuki Kamo
Reinhold Haeb-Umbach
Y. Qian
20
32
0
23 Feb 2021
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
Aswin Shanmugam Subramanian
Chao Weng
Shinji Watanabe
Meng Yu
Dong Yu
34
78
0
16 Feb 2021
Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
Shucong Zhang
Cong-Thanh Do
R. Doddipatla
Erfan Loweimi
P. Bell
Steve Renals
24
2
0
09 Feb 2021
Advances in Electron Microscopy with Deep Learning
Jeffrey M. Ede
35
2
0
04 Jan 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
47
73
0
01 Jan 2021
NeurST: Neural Speech Translation Toolkit
Chengqi Zhao
Mingxuan Wang
Qianqian Dong
Rong Ye
Lei Li
30
32
0
18 Dec 2020
Group Communication with Context Codec for Lightweight Source Separation
Yi Luo
Cong Han
N. Mesgarani
26
20
0
14 Dec 2020
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training
Sameer Khurana
Niko Moritz
Takaaki Hori
Jonathan Le Roux
24
54
0
26 Nov 2020
The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines
Fan Yu
Zhuoyuan Yao
Xiong Wang
Keyu An
Lei Xie
Zhijian Ou
Bo Liu
Xiulin Li
Guanqiong Miao
28
20
0
13 Nov 2020
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Jeff Lai
Jin Cao
S. Bodapati
Shang-Wen Li
SSL
22
7
0
11 Nov 2020
On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers
Shucong Zhang
Erfan Loweimi
P. Bell
Steve Renals
30
36
0
08 Nov 2020
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
Zhong Meng
S. Parthasarathy
Eric Sun
Yashesh Gaur
Naoyuki Kanda
Liang Lu
Xie Chen
Rui Zhao
Jinyu Li
Jiawei Liu
AuLLM
19
107
0
03 Nov 2020
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Dongwei Jiang
Wubo Li
Miao Cao
Wei Zou
Xiangang Li
SSL
21
65
0
27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
45
262
0
26 Oct 2020
Attention is All You Need in Speech Separation
Cem Subakan
Mirco Ravanelli
Samuele Cornell
Mirko Bronzi
Jianyuan Zhong
45
537
0
25 Oct 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
17
51
0
24 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
27
31
0
23 Oct 2020
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen
Yu-Huan Wu
Zhenghao Wang
Shujie Liu
Jinyu Li
22
169
0
22 Oct 2020