ResearchTrend.AI
Transformer-Transducer: End-to-End Speech Recognition with Self-Attention

28 October 2019
Ching-Feng Yeh
Jay Mahadeokar
Kaustubh Kalgaonkar
Yongqiang Wang
Duc Le
Mahaveer Jain
Kjell Schubert
Christian Fuegen
M. Seltzer

Papers citing "Transformer-Transducer: End-to-End Speech Recognition with Self-Attention"

50 / 102 papers shown
Title
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
37
0
0
06 Feb 2025
Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation
Shoutao Guo
Shaolei Zhang
Zhengrui Ma
Yang Feng
31
0
0
03 Jan 2025
Transducer Consistency Regularization for Speech to Text Applications
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
33
0
0
09 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
141
0
0
03 Oct 2024
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
Iuliia Thorbecke
Juan Zuluaga-Gomez
Esaú Villatoro-Tello
Shashi Kumar
Pradeep Rangappa
Sergio Burdisso
P. Motlícek
Karthik Pandia
A. Ganapathiraju
31
0
0
20 Sep 2024
Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
Jaeyoung Kim
Han Lu
S. Khorram
Anshuman Tripathi
Qian Zhang
Hasim Sak
21
0
0
05 Aug 2024
TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi
Hao Li
Baochen Yang
Haoyu Li
Hai-kun Xu
Kai Yu
35
1
0
20 Mar 2024
Partial Rewriting for Multi-Stage ASR
A. Bruguier
David Qiu
Yanzhang He
24
0
0
08 Dec 2023
Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
Shaolei Zhang
Yang Feng
23
7
0
27 Oct 2023
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Sara Papi
Peidong Wang
Junkun Chen
Jian Xue
Naoyuki Kanda
Jinyu Li
Yashesh Gaur
13
3
0
23 Oct 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
19
1
0
09 Oct 2023
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach
Junkun Chen
Jian Xue
Peidong Wang
Jing Pan
Jinyu Li
21
2
0
06 Oct 2023
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Yang Li
Liangzhen Lai
Shangguan Yuan
Forrest N. Iandola
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
31
2
0
14 Sep 2023
Cross-view Semantic Alignment for Livestreaming Product Recognition
Wenjie Yang
Yiyi Chen
Yan Li
Yanhua Cheng
Xudong Liu
Quanming Chen
Han Li
34
2
0
09 Aug 2023
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Tian-Hao Zhang
Dinghao Zhou
Guiping Zhong
Jiaming Zhou
Baoxiang Li
18
3
0
26 Jul 2023
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Sara Papi
Peidong Wang
Junkun Chen
Jian Xue
Jinyu Li
Yashesh Gaur
26
8
0
07 Jul 2023
Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation
Haitao Tang
Yu Fu
Lei Sun
Jiabin Xue
Dan Liu
...
Zhiqiang Ma
Minghui Wu
Jia Pan
Genshun Wan
Ming’En Zhao
21
2
0
27 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
27
8
0
23 Jun 2023
When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants
Anuj Diwan
Eunsol Choi
David F. Harwath
41
0
0
14 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
21
171
0
11 Jun 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
26
70
0
25 May 2023
InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition
Zhibing Lai
Tianren Zhang
Qi Liu
Xinyuan Qian
Li-Fang Wei
Songlu Chen
Feng Chen
Xu-Cheng Yin
35
2
0
24 May 2023
Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers
J. Silovský
Liuhui Deng
Arturo Argueta
Tresi Arvizo
Roger Hsiao
Sasha Kuznietsov
Yiu-Chang Lin
Xiaoqiang Xiao
Yuanyuan Zhang
30
2
0
23 May 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
27
17
0
13 Apr 2023
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Saumya Yashmohini Sahai
Jing Liu
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Anastasios Alexandridis
...
Ross McGowan
Ariya Rastrow
Feng-Ju Chang
Athanasios Mouchtaris
Siegfried Kunzmann
33
5
0
03 Apr 2023
PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
R. Pandey
Roger Ren
Qi Luo
Jing Liu
Ariya Rastrow
Ankur Gandhe
Denis Filimonov
Grant P. Strimel
A. Stolcke
I. Bulyko
25
13
0
30 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
26
148
0
03 Mar 2023
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
Eric Sun
Jinyu Li
Yuxuan Hu
Yilun Zhu
Long Zhou
...
Peidong Wang
Linquan Liu
Shujie Liu
Ed Lin
Yifan Gong
29
6
0
01 Mar 2023
Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models
Rui Zhao
Jian Xue
P. Parthasarathy
Veljko Miljanic
Jinyu Li
19
13
0
05 Dec 2022
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong
Yu-Huan Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Y. Qian
RALM
24
10
0
17 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
32
12
0
10 Nov 2022
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition
Yusuke Shinohara
Shinji Watanabe
AI4TS
21
9
0
04 Nov 2022
Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
Che-Yuan Liang
Xiao-Lei Zhang
BinBin Zhang
Di Wu
Shengqiang Li
Xingcheng Song
Zhendong Peng
Fuping Pan
16
8
0
02 Nov 2022
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le
Frank Seide
Yuhao Wang
Y. Li
Kjell Schubert
Ozlem Kalinli
M. Seltzer
19
6
0
02 Nov 2022
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
11
4
0
16 Oct 2022
WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger
Zixing Zhang
Thorin Farnsworth
Senling Lin
S. Karout
28
2
0
06 Oct 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin H. Radfar
Rohit Barnwal
R. Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
28
13
0
29 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi
Yinghui Huang
Bhuvana Ramabhadran
Pedro J. Moreno
19
3
0
13 Sep 2022
Adversarial Attacks on ASR Systems: An Overview
Xiao Zhang
Hao Tan
Xuan Huang
Denghui Zhang
Keke Tang
Zhaoquan Gu
AAML
14
3
0
03 Aug 2022
Compute Cost Amortized Transformer for Streaming ASR
Yifan Xie
J. Macoskey
Martin H. Radfar
Feng-Ju Chang
Brian King
Ariya Rastrow
Athanasios Mouchtaris
Grant P. Strimel
22
7
0
05 Jul 2022
Improving Deliberation by Text-Only and Semi-Supervised Training
Ke Hu
Tara N. Sainath
Yanzhang He
Rohit Prabhavalkar
Trevor Strohman
S. Mavandadi
Weiran Wang
26
12
0
29 Jun 2022
On the Prediction Network Architecture in RNN-T for ASR
Dario Albesano
Jesús Andrés-Ferrer
Nicola Ferri
Puming Zhan
AI4TS
24
0
0
29 Jun 2022
Answer Fast: Accelerating BERT on the Tensor Streaming Processor
I. Ahmed
Sahil Parmar
Matthew Boyd
Michael Beidler
Kris Kang
Bill Liu
Kyle Roach
John Kim
D. Abts
LLMAG
12
6
0
22 Jun 2022
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers
Jian Xue
Peidong Wang
Jinyu Li
Matt Post
Yashesh Gaur
AI4TS
24
26
0
11 Apr 2022
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR
Keyu An
Huahuan Zheng
Zhijian Ou
Hongyu Xiang
Ke Ding
Guanglu Wan
AI4TS
20
17
0
31 Mar 2022
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems
Xiaoqiang Wang
Yanqing Liu
Jinyu Li
Veljko Miljanic
Sheng Zhao
H. Khalil
KELM
11
18
0
02 Mar 2022
Tricks and Plugins to GBM on Images and Sequences
Biyi Fang
J. Utke
Diego Klabjan
25
0
0
01 Mar 2022
Endpoint Detection for Streaming End-to-End Multi-talker ASR
Liang Lu
Jinyu Li
Yifan Gong
17
17
0
24 Jan 2022
Scaling ASR Improves Zero and Few Shot Learning
Alex Xiao
Weiyi Zheng
Gil Keren
Duc Le
Frank Zhang
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
Abdel-rahman Mohamed
11
21
0
10 Nov 2021