Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning

21 September 2016
Suyoun Kim
Takaaki Hori
Shinji Watanabe

Papers citing "Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning"

50 / 135 papers shown
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
Tianzi Wang
Xurong Xie
Zhaoqing Li
Shoukang Hu
Zengrui Jin
...
Shujie Hu
Mengzhe Geng
Guinan Li
Helen Meng
Xunying Liu
34
0
0
14 Jun 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon
Kwangyoun Kim
Yi-Te Hsu
Prashant Sridhar
Shinji Watanabe
Karen Livescu
AuLLM
46
2
0
13 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
36
9
0
07 Jun 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
37
17
0
20 Feb 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
46
4
0
22 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
26
9
0
13 Dec 2023
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
15
3
0
06 Dec 2023
Retrieve and Copy: Scaling ASR Personalization to Large Catalogs
Sai Muralidhar Jayanthi
Devang Kulshreshtha
Saket Dingliwal
S. Ronanki
S. Bodapati
32
7
0
14 Nov 2023
MUST: A Multilingual Student-Teacher Learning approach for low-resource speech recognition
Muhammad Umar Farooq
Rehan Ahmad
Thomas Hain
23
0
0
29 Oct 2023
Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-task Speech Recognition
Hillary Ngai
Rohan Agrawal
Neeraj Gaur
Ronny Huang
Parisa Haghani
P. M. Mengibar
MoMe
36
0
0
17 Oct 2023
LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Guodong Ma
Wenxuan Wang
Yuke Li
Yuting Yang
Binbin Du
Haoran Fu
23
5
0
28 Sep 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
28
7
0
26 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
30
1
0
22 Sep 2023
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
Zheng Nan
T. Dang
V. Sethu
Beena Ahmed
BDL
19
2
0
21 Sep 2023
Semi-Autoregressive Streaming ASR With Label Context
Siddhant Arora
G. Saon
Shinji Watanabe
Brian Kingsbury
AI4TS
23
5
0
19 Sep 2023
TST: Time-Sparse Transducer for Automatic Speech Recognition
Xiaohui Zhang
Mangui Liang
Zhengkun Tian
Jiangyan Yi
J. Tao
9
0
0
17 Jul 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
26
6
0
12 Jul 2023
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Cihan Xiao
Henry Li Xinyuan
Jinyi Yang
Dongji Gao
Matthew Wiesner
Kevin Duh
Sanjeev Khudanpur
31
1
0
20 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
26
9
0
18 Jun 2023
CopyNE: Better Contextual ASR by Copying Named Entities
Shilin Zhou
Zhenghua Li
Yu Hong
M. Zhang
Zhefeng Wang
Baoxing Huai
15
5
0
22 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
35
5
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
21
17
0
18 May 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
33
6
0
18 Apr 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
17
3
0
31 Mar 2023
A Deliberation-based Joint Acoustic and Text Decoder
S. Mavandadi
Tara N. Sainath
Ke Hu
Zelin Wu
21
7
0
23 Mar 2023
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
13
0
0
23 Mar 2023
Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network
Zeyu Ren
Nurmemet Yolwas
Huiru Wang
Wushour Slamu
21
0
0
22 Mar 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
17
3
0
21 Feb 2023
Massively Multilingual Shallow Fusion with Large Language Models
Ke Hu
Tara N. Sainath
Bo-wen Li
Nan Du
Yanping Huang
Andrew M. Dai
Yu Zhang
Rodrigo Cabrera
Z. Chen
Trevor Strohman
35
13
0
17 Feb 2023
Cut your Losses with Squentropy
Like Hui
M. Belkin
S. Wright
UQCV
18
8
0
08 Feb 2023
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
32
12
0
10 Nov 2022
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
Suyoun Kim
Ke Li
Lucas Kabela
Rongqing Huang
Jiedan Zhu
Ozlem Kalinli
Duc Le
25
8
0
31 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
60
37
0
27 Oct 2022
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
14
0
0
27 Oct 2022
Monotonic segmental attention for automatic speech recognition
Albert Zeyer
Robin Schmitt
Wei Zhou
Ralf Schluter
Hermann Ney
16
8
0
26 Oct 2022
Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses
C. Li
Ngoc Thang Vu
14
2
0
20 Oct 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
22
8
0
20 Oct 2022
Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting
Saket Dingliwal
Monica Sunkara
S. Bodapati
S. Ronanki
Jeffrey J. Farris
Katrin Kirchhoff
25
0
0
18 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
21
2
0
16 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
48
35
0
13 Oct 2022
An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition
Chao-Han Huck Yang
I-Fan Chen
A. Stolcke
Sabato Marco Siniscalchi
Chin-Hui Lee
27
2
0
11 Oct 2022
CTC Alignments Improve Autoregressive Translation
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
44
33
0
11 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
55
105
0
30 Sep 2022
Direct Speech Translation for Automatic Subtitling
Sara Papi
Marco Gaido
Alina Karakanta
Mauro Cettolo
Matteo Negri
Marco Turchi
54
11
0
27 Sep 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
33
9
0
24 Jul 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
26
20
0
13 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
21
143
0
06 Jul 2022
Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Arunkumar A
S. Umesh
SSL
34
8
0
09 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
26
14
0
07 Jun 2022