Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning

21 September 2016
Suyoun Kim
Takaaki Hori
Shinji Watanabe

Papers citing "Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning"

50 / 144 papers shown
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu
Jie Zhang
Zitian Zhang
Lirong Dai
43
15
0
26 May 2022
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
C. Zhang
P. Woodland
32
14
0
18 May 2022
Self-critical Sequence Training for Automatic Speech Recognition
Chen Chen
Yuchen Hu
Nana Hou
Xiaofeng Qi
Heqing Zou
Chng Eng Siong
24
15
0
13 Apr 2022
Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Zehai Tu
Jack Deadman
Ning Ma
Jon Barker
27
4
0
08 Apr 2022
Federated Self-supervised Speech Representations: Are We There Yet?
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Abhinav Mehrotra
Nicholas D. Lane
35
13
0
06 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
10
26
0
05 Apr 2022
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma
Pengfei Hu
Jian Kang
Shen Huang
Hao-Ming Huang
18
9
0
02 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang
Takashi Maekaku
Yuya Fujita
Shinji Watanabe
VLM
51
45
0
01 Apr 2022
End-to-End Multi-speaker ASR with Independent Vector Analysis
Robin Scheibler
Wangyou Zhang
Xuankai Chang
Shinji Watanabe
Y. Qian
21
2
0
01 Apr 2022
4-bit Conformer with Native Quantization Aware Training for Speech Recognition
Shaojin Ding
Phoenix Meadowlark
Yanzhang He
Lukasz Lew
Shivani Agrawal
Oleg Rybakov
MQ
31
32
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
26
93
0
29 Mar 2022
A General Survey on Attention Mechanisms in Deep Learning
Gianni Brauwers
Flavius Frasincar
31
296
0
27 Mar 2022
Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition
Piotr Żelasko
Siyuan Feng
Laureano Moro Velázquez
A. Abavisani
Saurabhchand Bhati
O. Scharenborg
M. Hasegawa-Johnson
Najim Dehak
33
15
0
26 Jan 2022
Improving the fusion of acoustic and text representations in RNN-T
Chao Zhang
Bo-wen Li
Zhiyun Lu
Tara N. Sainath
Shuo-yiin Chang
AI4CE
43
12
0
25 Jan 2022
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer
Yusuke Shinohara
Takaaki Ishii
H. Inaguma
Shinji Watanabe
29
34
0
14 Jan 2022
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
27
4
0
13 Dec 2021
Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR
Peter William VanHarn Plantinga
Deblin Bagchi
Eric Fosler-Lussier
46
10
0
11 Dec 2021
BBS-KWS:The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge
Yuting Yang
Binbin Du
Yingxin Zhang
Wenxuan Wang
Yuke Li
16
0
0
03 Dec 2021
HASA-net: A non-intrusive hearing-aid speech assessment network
Hsin-Tien Chiang
Yi-Chiao Wu
Cheng Yu
T. Toda
Hsin-Min Wang
Yih-Chun Hu
Yu Tsao
20
12
0
10 Nov 2021
Speech recognition for air traffic control via feature learning and end-to-end training
Peng Fan
Dongyue Guo
Yi Lin
Bo Yang
Jianwei Zhang
12
7
0
04 Nov 2021
Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis
Hsin-Wei Wang
Bi-Cheng Yan
Hsuan-Sheng Chiu
Yung-Chang Hsu
Berlin Chen
21
7
0
01 Nov 2021
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions
Wangyou Zhang
Jing Shi
Chenda Li
Shinji Watanabe
Y. Qian
19
22
0
27 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
232
1,024
0
13 Oct 2021
Speech Summarization using Restricted Self-Attention
Roshan S. Sharma
Shruti Palaskar
A. Black
Florian Metze
30
33
0
12 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
18
81
0
09 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi
Keita Karube
Tetsuji Ogawa
Tetsunori Kobayashi
16
22
0
08 Oct 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
26
3
0
22 Sep 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
26
12
0
24 Aug 2021
Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer
Krishna D N Freshworks
24
7
0
22 Aug 2021
A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English
Saida Mussakhojayeva
Yerbolat Khassanov
H. A. Varol
14
17
0
03 Aug 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
19
98
0
21 Jul 2021
Leveraging Pre-trained Language Model for Speech Sentiment Analysis
Suwon Shon
Pablo Brusco
Jing Pan
Kyu Jeong Han
Shinji Watanabe
9
16
0
11 Jun 2021
Streaming end-to-end speech recognition with jointly trained neural feature enhancement
Chanwoo Kim
Abhinav Garg
Dhananjaya N. Gowda
Seongkyu Mun
C. Han
AuLLM
20
6
0
04 May 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
37
30
0
02 May 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo-wen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
28
76
0
30 Apr 2021
End-to-End Speech Recognition from Federated Acoustic Models
Yan Gao
Titouan Parcollet
Salah Zaiem
Javier Fernandez-Marques
Pedro Porto Buarque de Gusmão
Daniel J. Beutel
Nicholas D. Lane
28
43
0
29 Apr 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
22
34
0
19 Apr 2021
Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers
Loren Lugosch
Piyush Papreja
Mirco Ravanelli
A. Heba
Titouan Parcollet
24
12
0
04 Apr 2021
A study of latent monotonic attention variants
Albert Zeyer
Ralf Schluter
Hermann Ney
24
5
0
30 Mar 2021
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
Wangyou Zhang
Christoph Boeddeker
Shinji Watanabe
Tomohiro Nakatani
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Naoyuki Kamo
Reinhold Haeb-Umbach
Y. Qian
14
32
0
23 Feb 2021
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
Aswin Shanmugam Subramanian
Chao Weng
Shinji Watanabe
Meng Yu
Dong Yu
25
78
0
16 Feb 2021
Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
Shucong Zhang
Cong-Thanh Do
R. Doddipatla
Erfan Loweimi
P. Bell
Steve Renals
16
2
0
09 Feb 2021
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition
Cheng Yi
Shiyu Zhou
Bo Xu
51
40
0
17 Jan 2021
A review of on-device fully neural end-to-end automatic speech recognition algorithms
Chanwoo Kim
Dhananjaya N. Gowda
Dongsoo Lee
Jiyeon Kim
Ankur Kumar
Sungsoo Kim
Abhinav Garg
C. Han
19
27
0
14 Dec 2020
A Better and Faster End-to-End Model for Streaming ASR
Bo-wen Li
Anmol Gulati
Jiahui Yu
Tara N. Sainath
Chung-Cheng Chiu
...
Wei Han
Qiao Liang
Yu Zhang
Trevor Strohman
Yonghui Wu
AuLLM
17
123
0
21 Nov 2020
On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers
Shucong Zhang
Erfan Loweimi
P. Bell
Steve Renals
19
36
0
08 Nov 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
6
85
0
27 Oct 2020
Multitask Training with Text Data for End-to-End Speech Recognition
Peidong Wang
Tara N. Sainath
Ron J. Weiss
14
27
0
27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
36
262
0
26 Oct 2020
HarperValleyBank: A Domain-Specific Spoken Dialog Corpus
Mike Wu
J. Nafziger
A. Scodary
Andrew L. Maas
31
17
0
26 Oct 2020