State-of-the-art Speech Recognition With Sequence-to-Sequence Models

5 December 2017
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
Zhiwen Chen
Anjuli Kannan
Ron J. Weiss
Kanishka Rao
Katya Gonina
Navdeep Jaitly
Yue Liu
J. Chorowski
M. Bacchiani
    AI4TS

Papers citing "State-of-the-art Speech Recognition With Sequence-to-Sequence Models"

50 / 501 papers shown
Unifying Streaming and Non-streaming Zipformer-based ASR
Bidisha Sharma
Karthik Pandia Durai
Shankar Venkatesan
Jeena Prakash
Shashi Kumar
Malolan Chetlur
Andreas Stolcke
25
0
0
17 Jun 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
241
0
0
03 May 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
Hong Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
162
2
0
11 Mar 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
187
0
0
06 Feb 2025
Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models
Linus Nwankwo
Elmar Rueckert
142
2
0
31 Dec 2024
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
Maja Pantic
SSL
86
7
0
04 Nov 2024
All models are wrong, some are useful: Model Selection with Limited Labels
Patrik Okanovic
Andreas Kirsch
Jannes Kasper
Torsten Hoefler
Andreas Krause
Nezihe Merve Gürel
VLM
47
1
0
17 Oct 2024
A two-stage transliteration approach to improve performance of a multilingual ASR
Rohit Kumar
54
0
0
09 Oct 2024
Speechworthy Instruction-tuned Language Models
Hyundong Justin Cho
Nicolaas Jedema
Leonardo F. R. Ribeiro
Karishma Sharma
Pedro Szekely
Alessandro Moschitti
Ruben Janssen
Jonathan May
ALM
85
1
0
23 Sep 2024
What does it take to get state of the art in simultaneous speech-to-speech translation?
Vincent Wilmet
Johnson Du
38
0
0
02 Sep 2024
Measuring the Accuracy of Automatic Speech Recognition Solutions
Korbinian Kuhn
Verena Kersken
Benedikt Reuter
Niklas Egger
Gottfried Zimmermann
66
22
0
29 Aug 2024
Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples
Zhenyu Wang
John H. L. Hansen
AAML
89
1
0
23 Aug 2024
BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks
Amro Eldebiky
Grace Li Zhang
Xunzhao Yin
Cheng Zhuo
Ing-Chao Lin
Ulf Schlichtmann
Bing Li
50
0
0
04 Jul 2024
Token-Weighted RNN-T for Learning from Flawed Data
Gil Keren
Wei Zhou
Ozlem Kalinli
89
0
0
26 Jun 2024
Text Injection for Neural Contextual Biasing
Zhong Meng
Zelin Wu
Rohit Prabhavalkar
Cal Peyser
Weiran Wang
Nanxin Chen
Tara N. Sainath
Bhuvana Ramabhadran
114
3
0
05 Jun 2024
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Chengwei Qin
Pin-Yu Chen
Chng Eng Siong
Chao Zhang
VLM
76
4
0
23 May 2024
You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
Ronald Cumbal
Birger Moell
José Lopes
Olov Engwall
29
20
0
22 May 2024
AIris: An AI-powered Wearable Assistive Device for the Visually Impaired
Dionysia Danai Brilli
Evangelos Georgaras
Stefania Tsilivaki
Nikos Melanitis
Konstantina S. Nikita
29
1
0
13 May 2024
Efficient Sample-Specific Encoder Perturbations
Yassir Fathullah
Mark Gales
49
0
0
01 May 2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
109
9
0
20 Mar 2024
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
Taekyung Ahn
Yeonjung Hong
Younggon Im
Do Hyung Kim
Dayoung Kang
...
Jae Won Kim
Min Jung Kim
Ah-ra Cho
Dae-Hyun Jang
Hosung Nam
56
1
0
13 Mar 2024
Typist Experiment: an Investigation of Human-to-Human Dictation via Role-play to Inform Voice-based Text Authoring
Can Liu
Si-Yuan Hu
Li Feng
Mingming Fan
66
3
0
09 Mar 2024
Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
Hamza Kheddar
Mustapha Hemis
Yassine Himeur
OffRL
90
71
0
02 Mar 2024
Representing Online Handwriting for Recognition in Large Vision-Language Models
Anastasiia Fadeeva
Philippe Schlattner
Andrii Maksai
Mark Collier
Efi Kokiopoulou
Jesse Berent
C. Musat
170
6
0
23 Feb 2024
Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
Thierry Paquet
104
11
0
12 Feb 2024
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
Gloria Araiza-Illan
Luke Meyer
K. Truong
D. Başkent
31
6
0
19 Dec 2023
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
Chengxi Lei
Satwinder Singh
Feng Hou
Xiaoyun Jia
Ruili Wang
60
1
0
13 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
105
12
0
13 Dec 2023
D4AM: A General Denoising Framework for Downstream Acoustic Models
H. Wang
Yu Tsao
Hsin-Min Wang
Chu-Song Chen
70
4
0
28 Nov 2023
Neural Network Methods for Radiation Detectors and Imaging
S. Lin
S. Ning
H. Zhu
T. Zhou
C. L. Morris
S. Clayton
M. Cherukara
R. T. Chen
Z. Wang
AI4CE
66
5
0
09 Nov 2023
TACNET: Temporal Audio Source Counting Network
Amirreza Ahmadnejad
Ahmad Mahmmodian Darviishani
Mohmmad Mehrdad Asadi
Sajjad Saffariyeh
Pedram Yousef
Emad Fatemizadeh
62
2
0
04 Nov 2023
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors
Han Liu
Xingshuo Huang
Xiaotong Zhang
Qimai Li
Fenglong Ma
Wen Wang
Hongyang Chen
Hong Yu
Xianchao Zhang
AAML
72
2
0
29 Oct 2023
Quantifying the Dialect Gap and its Correlates Across Languages
Anjali Kantharuban
Ivan Vulić
Anna Korhonen
87
23
0
23 Oct 2023
Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices
Xiaolong Tu
Anik Mallik
Dawei Chen
Kyungtae Han
Onur Altintas
Haoxin Wang
Jiang Xie
69
13
0
19 Oct 2023
Insightful analysis of historical sources at scales beyond human capabilities using unsupervised Machine Learning and XAI
Oliver Eberle
Jochen Büttner
Hassan el-Hajj
G. Montavon
Klaus-Robert Muller
Matteo Valleriani
65
2
0
13 Oct 2023
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
Chao-Han Huck Yang
Yile Gu
Yi-Chieh Liu
Shalini Ghosh
I. Bulyko
A. Stolcke
KELMLRM
120
52
0
27 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
59
1
0
22 Sep 2023
Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation
Shaoshi Ling
Guoli Ye
Rui Zhao
Yifan Gong
VLM
63
1
0
14 Sep 2023
Typing on Any Surface: A Deep Learning-based Method for Real-Time Keystroke Detection in Augmented Reality
Xingyu Fu
Mingze Xi
25
0
0
31 Aug 2023
Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss
M. Soleymanpour
Mahmoud Al Ismail
F. Bahmaninezhad
Kshitiz Kumar
Jian Wu
42
0
0
11 Aug 2023
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer
Md. Asif Jalal
Pablo Peso Parada
Jisi Zhang
Karthikeyan P. Saravanan
Mete Ozay
Myoungji Han
Jung In Lee
Seokyeong Jung
58
1
0
25 Jul 2023
Analyzing sports commentary in order to automatically recognize events and extract insights
Yanis Miraoui
25
0
0
18 Jul 2023
Toward Interactive Dictation
Belinda Z. Li
J. Eisner
Adam Pauls
Sam Thomson
KELM
50
2
0
08 Jul 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
62
4
0
20 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
96
13
0
18 Jun 2023
Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
Weidong Ji
Shijie Zan
Guohui Zhou
Xu Wang
SyDa
56
1
0
14 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
87
3
0
07 Jun 2023
Edit Distance based RL for RNNT decoding
DongSeon Hwang
Changwan Ryu
K. Sim
47
0
0
31 May 2023
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
Chuxu Zhang
P. Woodland
45
6
0
30 May 2023
Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning
Patrik Okanovic
R. Waleffe
Vasilis Mageirakos
Konstantinos E. Nikolakakis
Amin Karbasi
Dionysis Kalogerias
Nezihe Merve Gürel
Theodoros Rekatsinas
DD
104
14
0
28 May 2023