ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2002.02562
  4. Cited By
Transformer Transducer: A Streamable Speech Recognition Model with
  Transformer Encoders and RNN-T Loss

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

7 February 2020
Qian Zhang
Han Lu
Hasim Sak
Anshuman Tripathi
Erik McDermott
Stephen Koo
Shankar Kumar
ArXivPDFHTML

Papers citing "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss"

50 / 108 papers shown
Title
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
Muhammad Waseem Akram
Stefano Dettori
V. Colla
Giorgio Buttazzo
57
0
0
17 Feb 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
39
0
0
06 Feb 2025
Training Large ASR Encoders with Differential Privacy
Training Large ASR Encoders with Differential Privacy
Geeticka Chauhan
Steve Chien
Om Thakkar
Abhradeep Thakurta
Arun Narayanan
33
1
0
21 Sep 2024
A Non-autoregressive Generation Framework for End-to-End Simultaneous
  Speech-to-Any Translation
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
Zhengrui Ma
Qingkai Fang
Shaolei Zhang
Shoutao Guo
Yang Feng
Min Zhang
53
9
0
11 Jun 2024
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
O. Kundacina
V. Vincan
D. Mišković
BDL
104
0
0
03 May 2024
EfficientASR: Speech Recognition Network Compression via Attention
  Redundancy and Chunk-Level FFN Optimization
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
38
0
0
30 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
Stateful Conformer with Cache-based Inference for Streaming Automatic
  Speech Recognition
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
33
10
0
27 Dec 2023
Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
U. Sarawgi
John Berkowitz
Vineet Garg
Arnav Kundu
Minsik Cho
Sai Srujana Buddi
Saurabh N. Adya
Ahmed H. Tewfik
31
1
0
09 Oct 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Keyu An
Shiliang Zhang
31
4
0
26 Sep 2023
Human Transcription Quality Improvement
Human Transcription Quality Improvement
Jian Gao
Hanbo Sun
Cheng Cao
Zheng Du
43
2
0
24 Sep 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming
  Encoder-decoder Speech Recognition
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
30
4
0
24 Jul 2023
TST: Time-Sparse Transducer for Automatic Speech Recognition
TST: Time-Sparse Transducer for Automatic Speech Recognition
Xiaohui Zhang
Mangui Liang
Zhengkun Tian
Jiangyan Yi
J. Tao
14
0
0
17 Jul 2023
A Dual-Stream Recurrence-Attention Network With Global-Local Awareness
  for Emotion Recognition in Textual Dialog
A Dual-Stream Recurrence-Attention Network With Global-Local Awareness for Emotion Recognition in Textual Dialog
Jiang Li
Xiaoping Wang
Zhigang Zeng
24
4
0
02 Jul 2023
Reducing the gap between streaming and non-streaming Transducer-based
  ASR by adaptive two-stage knowledge distillation
Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation
Haitao Tang
Yu Fu
Lei Sun
Jiabin Xue
Dan Liu
...
Zhiqiang Ma
Minghui Wu
Jia Pan
Genshun Wan
Ming’En Zhao
26
2
0
27 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer
  Transducer Speech Recognition Systems
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
35
8
0
23 Jun 2023
Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion
  and Householder Transformation
Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation
Jinzi Qi
Hugo Van hamme
43
3
0
12 Jun 2023
Streaming Speech-to-Confusion Network Speech Recognition
Streaming Speech-to-Confusion Network Speech Recognition
Denis Filimonov
Prabhat Pandey
Ariya Rastrow
Ankur Gandhe
A. Stolcke
HAI
29
0
0
02 Jun 2023
A Comparative Study on E-Branchformer vs Conformer in Speech
  Recognition, Translation, and Understanding Tasks
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
29
17
0
18 May 2023
DropDim: A Regularization Method for Transformer Networks
DropDim: A Regularization Method for Transformer Networks
Hao Zhang
Dan Qu
Kejia Shao
Xu Yang
28
12
0
20 Apr 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming
  Conformer ASR
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
38
6
0
18 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and
  Durations
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
27
17
0
13 Apr 2023
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in
  Speech Recognition
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Saumya Yashmohini Sahai
Jing Liu
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Anastasios Alexandridis
...
Ross McGowan
Ariya Rastrow
Feng-Ju Chang
Athanasios Mouchtaris
Siegfried Kunzmann
39
5
0
03 Apr 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for
  on-Device and cloud ASR
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
26
3
0
31 Mar 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Building High-accuracy Multilingual ASR with Gated Language Experts and
  Curriculum Training
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
Eric Sun
Jinyu Li
Yuxuan Hu
Yilun Zhu
Long Zhou
...
Peidong Wang
Linquan Liu
Shujie Liu
Ed Lin
Yifan Gong
34
6
0
01 Mar 2023
Neural Transducer Training: Reduced Memory Consumption with Sample-wise
  Computation
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun
Erik McDermott
Roger Hsiao
40
1
0
29 Nov 2022
Augmenting Transformer-Transducer Based Speaker Change Detection With
  Token-Level Training Loss
Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss
Guanlong Zhao
Quan Wang
Han Lu
Yiling Huang
Ignacio López Moreno
19
14
0
11 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture,
  and Generalization Capabilities
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
37
12
0
10 Nov 2022
Self-supervised learning with bi-label masked speech prediction for
  streaming multi-talker speech recognition
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
Zili Huang
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yiming Wang
Jinyu Li
Takuya Yoshioka
Xiaofei Wang
Peidong Wang
25
3
0
10 Nov 2022
Streaming, fast and accurate on-device Inverse Text Normalization for
  Automatic Speech Recognition
Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Yashesh Gaur
Nick Kibre
Jian Xue
Kangyuan Shu
Yuhui Wang
Issac Alphonso
Jinyu Li
Jiawei Liu
16
6
0
07 Nov 2022
Multi-blank Transducers for Speech Recognition
Multi-blank Transducers for Speech Recognition
Hainan Xu
Fei Jia
Somshubra Majumdar
Shinji Watanabe
Boris Ginsburg
28
11
0
04 Nov 2022
Minimum Latency Training of Sequence Transducers for Streaming
  End-to-End Speech Recognition
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition
Yusuke Shinohara
Shinji Watanabe
AI4TS
23
9
0
04 Nov 2022
Variable Attention Masking for Configurable Transformer Transducer
  Speech Recognition
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
P. Swietojanski
Stefan Braun
Dogan Can
Thiago Fraga da Silva
Arnab Ghoshal
...
Henry Mason
Erik McDermott
Honza Silovsky
R. Travadi
Xiaodan Zhuang
40
13
0
02 Nov 2022
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural
  Transducers
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le
Frank Seide
Yuhao Wang
Heng Chang
Kjell Schubert
Ozlem Kalinli
M. Seltzer
19
6
0
02 Nov 2022
Unified End-to-End Speech Recognition and Endpointing for Fast and
  Efficient Speech Systems
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems
Shaan Bijwadia
Shuo-yiin Chang
Bo-wen Li
Tara N. Sainath
Chaoyang Zhang
Yanzhang He
39
7
0
01 Nov 2022
Delay-penalized transducer for low-latency streaming ASR
Delay-penalized transducer for low-latency streaming ASR
Wei Kang
Zengwei Yao
Fangjun Kuang
Liyong Guo
Xiaoyu Yang
Long lin
Piotr Żelasko
Daniel Povey
30
6
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
34
12
0
31 Oct 2022
Highly Efficient Real-Time Streaming and Fully On-Device Speaker
  Diarization with Multi-Stage Clustering
Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
Quan Wang
Yiling Huang
Han Lu
Guanlong Zhao
Ignacio López Moreno
29
11
0
25 Oct 2022
Synthetic Voice Detection and Audio Splicing Detection using
  SE-Res2Net-Conformer Architecture
Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture
Lei Wang
Benedict Yeoh
Jun Wah Ng
40
7
0
07 Oct 2022
Damage Control During Domain Adaptation for Transducer Based Automatic
  Speech Recognition
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition
Somshubra Majumdar
Shantanu Acharya
Vitaly Lavrukhin
Boris Ginsburg
27
3
0
06 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural
  Networks on Phoneme Recognition
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim
Wonyong Sung
25
2
0
01 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech
  recognition
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers
  for Streaming Speech Recognition
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin H. Radfar
Rohit Barnwal
R. Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
34
13
0
29 Sep 2022
FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video
  Emotion Recognition Inference
FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video Emotion Recognition Inference
Qinglan Wei
Xu-Juan Huang
Yuan Zhang
21
14
0
21 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic
  Speech Recognition
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi
Yinghui Huang
Bhuvana Ramabhadran
Pedro J. Moreno
24
3
0
13 Sep 2022
VarArray Meets t-SOT: Advancing the State of the Art of Streaming
  Distant Conversational Speech Recognition
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition
Naoyuki Kanda
Jian Wu
Xiaofei Wang
Zhuo Chen
Jinyu Li
Takuya Yoshioka
29
16
0
12 Sep 2022
Learning a Dual-Mode Speech Recognition Model via Self-Pruning
Learning a Dual-Mode Speech Recognition Model via Self-Pruning
Chunxi Liu
Yuan Shangguan
Haichuan Yang
Yangyang Shi
Raghuraman Krishnamoorthi
Ozlem Kalinli
SSL
29
7
0
25 Jul 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
38
9
0
24 Jul 2022
GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation
  Approach for Multimodal Conversational Emotion Recognition
GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition
Jiang Li
Xiaoping Wang
Guoqing Lv
Zhigang Zeng
39
37
0
06 Jul 2022
123
Next