ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.07205
  4. Cited By
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language
  Processing

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

14 October 2021
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
Yu Wu
Shujie Liu
Tom Ko
Qing Li
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
ArXivPDFHTML

Papers citing "SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing"

43 / 43 papers shown
Title
Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration
Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration
Honglong Yang
Shanshan Song
Yi Qin
Lehan Wang
Haonan Wang
Xinpeng Ding
Qixiang Zhang
Bodong Du
X. Li
LM&MA
34
0
0
11 May 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Tianyi Zhou
MoE
53
0
0
27 Feb 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
72
0
0
23 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman Hagai Aronowitz
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
61
0
0
15 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
31
0
0
04 Jan 2025
Prosody-Driven Privacy-Preserving Dementia Detection
Prosody-Driven Privacy-Preserving Dementia Detection
Dominika Woszczyk
Ranya Aloufi
Soteris Demetriou
34
2
0
03 Jul 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic
  Alignment
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
50
11
0
25 Jun 2024
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual
  Text-to-Speech Adaptation
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
Yingting Li
Ambuj Mehrish
Bryan Chew
Bo Cheng
Soujanya Poria
38
0
0
25 Jun 2024
Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised
  Learning with Targeted Fine-Tuning and Data Augmentation
Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation
Dena F. Mujtaba
N. Mahapatra
Megan Arney
J Scott Yaruss
Caryn Herring
Jia Bin
29
1
0
14 Jun 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
Katrin Kirchhoff
41
37
0
14 May 2024
Lost in Transcription: Identifying and Quantifying the Accuracy Biases
  of Automatic Speech Recognition Systems Against Disfluent Speech
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
Dena F. Mujtaba
N. Mahapatra
Megan Arney
J Scott Yaruss
Hope Gerlach-Houck
Caryn Herring
Jia Bin
32
0
0
10 May 2024
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
59
3
0
21 Mar 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified
  Large-Scale Audio-Language Models
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
28
267
0
14 Nov 2023
Toward Joint Language Modeling for Speech Units and Text
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
26
19
0
12 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study
  in Multimodal Data Integration
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
Piyush Singh Pasi
Karthikeya Battepati
P. Jyothi
Ganesh Ramakrishnan
T. Mahapatra
Manoj Singh
51
0
0
10 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
26
3
0
09 Oct 2023
Exploring Multimodal Approaches for Alzheimer's Disease Detection Using
  Patient Speech Transcript and Audio Data
Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data
Hongmin Cai
Xiaoke Huang
Zheng Liu
Wenxiong Liao
Haixing Dai
...
Dajiang Zhu
Hui Ren
Quanzheng Li
Tianming Liu
Xiang Li
33
17
0
05 Jul 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in
  Transducer
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
B. Li
Jun Zhang
Lu Lu
Zejun Ma
21
2
0
07 Jun 2023
CopyNE: Better Contextual ASR by Copying Named Entities
CopyNE: Better Contextual ASR by Copying Named Entities
Shilin Zhou
Zhenghua Li
Yu Hong
M. Zhang
Zhefeng Wang
Baoxing Huai
15
5
0
22 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
DUB: Discrete Unit Back-translation for Speech Translation
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
13
23
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
31
114
0
18 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang
Yang Feng
30
13
0
15 May 2023
Seeing What You Said: Talking Face Generation Guided by a Lip Reading
  Expert
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Jiadong Wang
Xinyuan Qian
Malu Zhang
R. Tan
Haizhou Li
EGVM
22
93
0
29 Mar 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech
  Translation
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
15
3
0
21 Feb 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
43
639
0
05 Jan 2023
WACO: Word-Aligned Contrastive Learning for Speech Translation
WACO: Word-Aligned Contrastive Learning for Speech Translation
Siqi Ouyang
Rong Ye
Lei Li
24
25
0
19 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist
  Models
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
37
15
0
08 Dec 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
22
5
0
24 Nov 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by
  Combining Regression and Improved Contrastive Learning
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie M. Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
54
37
0
27 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
38
35
0
13 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code
  Representation Learning
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
39
6
0
08 Oct 2022
AudioGen: Textually Guided Audio Generation
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
17
289
0
30 Sep 2022
UAVM: Towards Unifying Audio and Visual Models
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
27
20
0
29 Jul 2022
Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Arunkumar A
S. Umesh
SSL
34
8
0
09 Jun 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
  Languages
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
40
37
0
02 May 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised
  Pre-training and Data Augmentation
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri
Peng-Jen Chen
Changhan Wang
J. Pino
Yossi Adi
Jiatao Gu
Wei-Ning Hsu
Ann Lee
20
56
0
06 Apr 2022
A Complementary Joint Training Approach Using Unpaired Speech and Text
  for Low-Resource Automatic Speech Recognition
A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition
Ye Du
Jie M. Zhang
Qiu-shi Zhu
Lirong Dai
Ming Wu
Xin Fang
Zhouwang Yang
24
2
0
05 Apr 2022
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired
  Speech Data
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
Junyi Ao
Zi-Hua Zhang
Long Zhou
Shujie Liu
Haizhou Li
Tom Ko
Lirong Dai
Jinyu Li
Yao Qian
Furu Wei
SSL
17
19
0
31 Mar 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning
  with Once-for-All Hidden-Unit BERT
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
Rui Wang
Qibing Bai
Junyi Ao
Long Zhou
Zhixiang Xiong
Zhihua Wei
Yu Zhang
Tom Ko
Haizhou Li
28
61
0
29 Mar 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
77
1,698
0
26 Oct 2021
Improving Speech Translation by Understanding and Learning from the
  Auxiliary Text Translation Task
Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task
Yun Tang
J. Pino
Xian Li
Changhan Wang
Dmitriy Genzel
106
81
0
12 Jul 2021
Generative Spoken Language Modeling from Raw Audio
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
182
336
0
01 Feb 2021
1