Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.02562
Cited By
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
7 February 2020
Qian Zhang
Han Lu
Hasim Sak
Anshuman Tripathi
Erik McDermott
Stephen Koo
Shankar Kumar
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss"
50 / 108 papers shown
Title
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies
Zehan Li
Haoran Miao
Keqi Deng
Gaofeng Cheng
Sanli Tian
Ta Li
Yonghong Yan
KELM
27
4
0
06 Jul 2022
Deformable Graph Transformer
Jinyoung Park
Seongjun Yun
Hyeon-ju Park
Jaewoo Kang
Jisu Jeong
KyungHyun Kim
Jung-Woo Ha
Hyunwoo J. Kim
90
7
0
29 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
125
1,066
0
22 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
32
99
0
02 Jun 2022
Contrastive Siamese Network for Semi-supervised Speech Recognition
S. Khorram
Jaeyoung Kim
Anshuman Tripathi
Han Lu
Qian Zhang
Hasim Sak
SSL
29
11
0
27 May 2022
Global Normalization for Streaming Speech Recognition in a Modular Framework
Ehsan Variani
Ke Wu
Michael Riley
David Rybach
Matt Shannon
Cyril Allauzen
20
9
0
26 May 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang
Yuke Li
Binbin Du
34
11
0
25 May 2022
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis
Zhenzi Weng
Zhijin Qin
Xiaoming Tao
Chengkang Pan
Guangyi Liu
Geoffrey Ye Li
41
132
0
09 May 2022
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers
Jian Xue
Peidong Wang
Jinyu Li
Matt Post
Yashesh Gaur
AI4TS
32
26
0
11 Apr 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module
R. Rose
Olivier Siohan
13
3
0
01 Apr 2022
CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition
Chengxin Chen
Pengyuan Zhang
AI4TS
16
10
0
31 Mar 2022
Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer
J. Sun
Guiping Zhong
Dinghao Zhou
Baoxiang Li
21
0
0
29 Mar 2022
Similarity and Content-based Phonetic Self Attention for Speech Recognition
Kyuhong Shim
Wonyong Sung
20
7
0
19 Mar 2022
Transformer-based Streaming ASR with Cumulative Attention
Mohan Li
Shucong Zhang
Catalin Zorila
R. Doddipatla
27
9
0
11 Mar 2022
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
Yufeng Yang
Peidong Wang
DeLiang Wang
20
12
0
01 Mar 2022
Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition
Piotr Żelasko
Siyuan Feng
Laureano Moro Velázquez
A. Abavisani
Saurabhchand Bhati
O. Scharenborg
M. Hasegawa-Johnson
Najim Dehak
33
15
0
26 Jan 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
91
40
0
25 Jan 2022
Improving the fusion of acoustic and text representations in RNN-T
Chao Zhang
Bo-wen Li
Zhiyun Lu
Tara N. Sainath
Shuo-yiin Chang
AI4CE
43
12
0
25 Jan 2022
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer
Yusuke Shinohara
Takaaki Ishii
Hirofumi Inaguma
Shinji Watanabe
35
34
0
14 Jan 2022
Self-Supervised Learning for speech recognition with Intermediate layer supervision
Chengyi Wang
Yu-Huan Wu
Sanyuan Chen
Shujie Liu
Jinyu Li
Yao Qian
Zhenglu Yang
SSL
26
28
0
16 Dec 2021
Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition
Junhao Xu
Jianwei Yu
Shoukang Hu
Xunying Liu
Helen Meng
MQ
30
13
0
29 Nov 2021
Context-Aware Transformer Transducer for Speech Recognition
Feng-Ju Chang
Jing Liu
Martin H. Radfar
Athanasios Mouchtaris
M. Omologo
Ariya Rastrow
Siegfried Kunzmann
21
79
0
05 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
124
1,715
0
26 Oct 2021
Multilingual Speech Recognition using Knowledge Transfer across Learning Processes
Rimita Lahiri
K. Kumatani
Eric Sun
Yao Qian
55
6
0
15 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
30
16
0
12 Oct 2021
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi
Chunyang Wu
Dilin Wang
Alex Xiao
Jay Mahadeokar
...
Ke Li
Yuan Shangguan
Varun K. Nagaraja
Ozlem Kalinli
M. Seltzer
36
15
0
07 Oct 2021
Factorized Neural Transducer for Efficient Language Model Adaptation
Xie Chen
Zhong Meng
S. Parthasarathy
Jinyu Li
21
39
0
27 Sep 2021
Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection
Wei Xia
Han Lu
Quan Wang
Anshuman Tripathi
Yiling Huang
Ignacio López Moreno
Hasim Sak
41
51
0
23 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
33
1
0
23 Sep 2021
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
38
55
0
15 Sep 2021
Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech
Katrin Tomanek
Vicky Zayats
Dirk Padfield
K. Vaillancourt
Fadi Biadsy
59
57
0
14 Sep 2021
Learning a Neural Diff for Speech Models
J. Macoskey
Grant P. Strimel
Ariya Rastrow
18
2
0
03 Aug 2021
GLiT: Neural Architecture Search for Global and Local Image Transformer
Boyu Chen
Peixia Li
Chuming Li
Baopu Li
Lei Bai
Chen Lin
Ming Sun
Junjie Yan
Wanli Ouyang
ViT
35
85
0
07 Jul 2021
Multi-mode Transformer Transducer with Stochastic Future Context
Kwangyoun Kim
Felix Wu
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
30
9
0
17 Jun 2021
Collaborative Training of Acoustic Encoders for Speech Recognition
Varun K. Nagaraja
Yangyang Shi
Ganesh Venkatesh
Ozlem Kalinli
M. Seltzer
Vikas Chandra
43
11
0
16 Jun 2021
End-to-end Neural Diarization: From Transformer to Conformer
Yi Y. Liu
Eunjung Han
Chul Lee
A. Stolcke
22
40
0
14 Jun 2021
On the limit of English conversational speech recognition
Zoltán Tüske
G. Saon
Brian Kingsbury
22
50
0
03 May 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo-wen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
34
76
0
30 Apr 2021
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era
Shahin Amiriparian
Artem Sokolov
Ilhan Aslan
Lukas Christ
Maurice Gerczuk
...
M. Milling
Sandra Ottl
Ilya Poduremennykh
E. Shuranov
Björn W. Schuller
33
17
0
20 Apr 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
30
34
0
19 Apr 2021
HMM-Free Encoder Pre-Training for Streaming RNN Transducer
Lu Huang
J. Sun
Yu Tang
Junfeng Hou
Jinkun Chen
Jun Zhang
Zejun Ma
25
3
0
02 Apr 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
27
7
0
17 Mar 2021
Learning Word-Level Confidence For Subword End-to-End ASR
David Qiu
Qiujia Li
Yanzhang He
Yu Zhang
Bo-wen Li
...
Deepti Bhatia
Wei Li
Ke Hu
Tara N. Sainath
Ian McGraw
32
32
0
11 Mar 2021
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar
Yanzhang He
David Rybach
S. Campbell
A. Narayanan
Trevor Strohman
Tara N. Sainath
49
35
0
12 Dec 2020
Streaming end-to-end multi-talker speech recognition
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
13
41
0
26 Nov 2020
Improving RNN-T ASR Accuracy Using Context Audio
A. Schwarz
Ilya Sklyar
Simon Wiesler
21
9
0
20 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
26
49
0
05 Nov 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
8
85
0
27 Oct 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim
Shangguan Yuan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
15
28
0
26 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
27
31
0
23 Oct 2020
Previous
1
2
3
Next