Streaming End-to-end Speech Recognition For Mobile Devices

15 November 2018
Yanzhang He
Tara N. Sainath
Rohit Prabhavalkar
Ian McGraw
R. Álvarez
Ding Zhao
David Rybach
Anjuli Kannan
Yonghui Wu
Ruoming Pang
Qiao Liang
Deepti Bhatia
Yuan Shangguan
Bo-wen Li
Golan Pundak
K. Sim
Tom Bagby
Shuo-yiin Chang
Kanishka Rao
A. Gruenstein

Papers citing "Streaming End-to-end Speech Recognition For Mobile Devices"

50 / 154 papers shown
SpinML: Customized Synthetic Data Generation for Private Training of Specialized ML Models
Jiang Zhang
Rohan Sequeira
Konstantinos Psounis
SyDa
78
0
0
05 Mar 2025
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
199
0
0
03 Oct 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
52
19
0
05 Jul 2024
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
Tianzi Wang
Xurong Xie
Zhaoqing Li
Shoukang Hu
Zengrui Jin
...
Shujie Hu
Mengzhe Geng
Guinan Li
Helen Meng
Xunying Liu
34
0
0
14 Jun 2024
ecVoice: Audio Text Extraction and Optimization of Video Based on Idioms Similarity Replacement
Jinwei Lin
47
0
0
20 May 2024
Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants
Chloe Sekkat
Fanny Leroy
Salima Mdhaffar
Blake Perry Smith
Yannick Esteve
Joseph Dureau
A. Coucke
32
1
0
14 May 2024
Automatic Speech Recognition System-Independent Word Error Rate Estimation
Chanho Park
Mingjie Chen
Thomas Hain
26
0
0
25 Apr 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
35
25
0
25 Jan 2024
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
33
10
0
27 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
34
9
0
13 Dec 2023
Massive End-to-end Models for Short Search Queries
Weiran Wang
Rohit Prabhavalkar
Dongseong Hwang
Qiujia Li
K. Sim
...
Zhong Meng
CJ Zheng
Yanzhang He
Tara N. Sainath
P. M. Mengibar
32
2
0
22 Sep 2023
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
Suyoun Kim
Akshat Shrivastava
Duc Le
Ju Lin
Ozlem Kalinli
M. Seltzer
AuLLM
33
2
0
22 Jul 2023
TST: Time-Sparse Transducer for Automatic Speech Recognition
Xiaohui Zhang
Mangui Liang
Zhengkun Tian
Jiangyan Yi
J. Tao
14
0
0
17 Jul 2023
Long Short-term Memory with Two-Compartment Spiking Neuron
Shimin Zhang
Qu Yang
Chenxiang Ma
Jibin Wu
Haizhou Li
Kay Chen Tan
33
7
0
14 Jul 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
34
9
0
18 Jun 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Tianyi Xu
Zhanheng Yang
Kaixun Huang
Pengcheng Guo
Aoting Zhang
Biao Li
Changru Chen
Chong Li
Linfu Xie
22
10
0
01 Jun 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Kaixun Huang
Aoting Zhang
Zhanheng Yang
Pengcheng Guo
Bingshen Mu
Tianyi Xu
Linfu Xie
35
16
0
21 May 2023
Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition
Xuandi Fu
Kanthashree Mysore Sathyendra
Ankur Gandhe
Jing Liu
Grant P. Strimel
Ross McGowan
Athanasios Mouchtaris
30
14
0
09 May 2023
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Saumya Yashmohini Sahai
Jing Liu
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Anastasios Alexandridis
...
Ross McGowan
Ariya Rastrow
Feng-Ju Chang
Athanasios Mouchtaris
Siegfried Kunzmann
39
5
0
03 Apr 2023
A Deliberation-based Joint Acoustic and Text Decoder
S. Mavandadi
Tara N. Sainath
Ke Hu
Zelin Wu
21
7
0
23 Mar 2023
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
18
0
0
23 Mar 2023
Speech Intelligibility Classifiers from 550k Disordered Speech Samples
Subhashini Venugopalan
Jimmy Tobin
Samuel J. Yang
Katie Seaver
Richard Cave
P. Jiang
Neil Zeghidour
Rus Heywood
Jordan R. Green
Michael P. Brenner
44
9
0
13 Mar 2023
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
Eric Sun
Jinyu Li
Yuxuan Hu
Yilun Zhu
Long Zhou
...
Peidong Wang
Linquan Liu
Shujie Liu
Ed Lin
Yifan Gong
34
6
0
01 Mar 2023
A Token-Wise Beam Search Algorithm for RNN-T
Gil Keren
31
1
0
28 Feb 2023
Massively Multilingual Shallow Fusion with Large Language Models
Ke Hu
Tara N. Sainath
Bo-wen Li
Nan Du
Yanping Huang
Andrew M. Dai
Yu Zhang
Rodrigo Cabrera
Z. Chen
Trevor Strohman
35
13
0
17 Feb 2023
Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer
Zhanheng Yang
Sining Sun
Xiong Wang
Yike Zhang
Long Ma
Linfu Xie
26
9
0
17 Jan 2023
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems A case study for Modern Greek
Georgios Paraskevopoulos
Theodoros Kouzelis
Georgios Rouvalis
Athanasios Katsamanis
Vassilis Katsouros
Alexandros Potamianos
VLM
28
7
0
31 Dec 2022
Training Integer-Only Deep Recurrent Neural Networks
V. Nia
Eyyub Sari
Vanessa Courville
M. Asgharian
MQ
53
2
0
22 Dec 2022
Continual Learning for On-Device Speech Recognition using Disentangled Conformers
Anuj Diwan
Ching-Feng Yeh
Wei-Ning Hsu
Paden Tomasello
Eunsol Choi
David Harwath
Abdel-rahman Mohamed
CLL
BDL
30
7
0
02 Dec 2022
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun
Erik McDermott
Roger Hsiao
40
1
0
29 Nov 2022
Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala
Gabriel Dulac-Arnold
Julien Mairal
Jean Ponce
Cordelia Schmid
OffRL
39
27
0
16 Nov 2022
Streaming Joint Speech Recognition and Disfluency Detection
Hayato Futami
E. Tsunoo
Kentarou Shibata
Yosuke Kashiwagi
Takao Okuda
Siddhant Arora
Shinji Watanabe
42
6
0
16 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
37
12
0
10 Nov 2022
Deliberation Networks and How to Train Them
Qingyun Dou
Mark Gales
24
0
0
06 Nov 2022
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability
Jian Xue
Peidong Wang
Jinyu Li
Eric Sun
32
10
0
04 Nov 2022
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le
Frank Seide
Yuhao Wang
Heng Chang
Kjell Schubert
Ozlem Kalinli
M. Seltzer
19
6
0
02 Nov 2022
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems
Shaan Bijwadia
Shuo-yiin Chang
Bo-wen Li
Tara N. Sainath
Chaoyang Zhang
Yanzhang He
39
7
0
01 Nov 2022
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
Suyoun Kim
Ke Li
Lucas Kabela
Rongqing Huang
Jiedan Zhu
Ozlem Kalinli
Duc Le
27
8
0
31 Oct 2022
Fast and parallel decoding for transducer
Wei Kang
Liyong Guo
Fangjun Kuang
Long Lin
Mingshuang Luo
Zengwei Yao
Xiaoyu Yang
Piotr Żelasko
Daniel Povey
AI4TS
19
15
0
31 Oct 2022
Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training
Ashish R. Mittal
D. Sivasubramanian
Rishabh K. Iyer
P. Jyothi
Ganesh Ramakrishnan
19
3
0
30 Oct 2022
Accelerating RNN-T Training and Inference Using CTC guidance
Yongqiang Wang
Zhehuai Chen
Cheng-yong Zheng
Yu Zhang
Wei Han
Parisa Haghani
40
23
0
29 Oct 2022
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Pradip Pramanick
Chayan Sarkar
24
7
0
21 Oct 2022
Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting
Saket Dingliwal
Monica Sunkara
S. Bodapati
S. Ronanki
Jeffrey J. Farris
Katrin Kirchhoff
33
0
0
18 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
53
35
0
13 Oct 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin H. Radfar
Rohit Barnwal
R. Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
34
13
0
29 Sep 2022
Turn-Taking Prediction for Natural Conversational Speech
Shuo-yiin Chang
Bo-wen Li
Tara N. Sainath
Chaoyang Zhang
Trevor Strohman
Qiao Liang
Yanzhang He
43
18
0
29 Aug 2022
UserLibri: A Dataset for ASR Personalization Using Only Text
Theresa Breiner
Swaroop Indra Ramaswamy
Ehsan Variani
Shefali Garg
Rajiv Mathews
K. Sim
Kilol Gupta
Mingqing Chen
Lara McConnaughey
30
16
0
02 Jul 2022
Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Zhiyun Fan
Linhao Dong
Meng Cai
Zejun Ma
Bo Xu
31
4
0
27 Jun 2022
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Raviraj Joshi
Subodh Kumar
36
2
0
26 Jun 2022
Pruned RNN-T for fast, memory-efficient ASR training
Fangjun Kuang
Liyong Guo
Wei Kang
Long Lin
Mingshuang Luo
Zengwei Yao
Daniel Povey
27
64
0
23 Jun 2022