arXiv:2004.10234
ESPnet-ST: All-in-One Speech Translation Toolkit
21 April 2020
Hirofumi Inaguma
Shun Kiyono
Kevin Duh
Shigeki Karita
Nelson Yalta
Tomoki Hayashi
Shinji Watanabe
Papers citing
"ESPnet-ST: All-in-One Speech Translation Toolkit"
50 / 52 papers shown
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
Wuwei Huang
Dexin Wang
Deyi Xiong
72
4
0
18 Mar 2025
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
Preethi Jyothi
Pushpak Bhattacharyya
50
1
0
16 Jun 2024
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
Keqi Deng
Philip C. Woodland
43
4
0
06 Jun 2024
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
36
3
0
09 Oct 2023
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Peter Polák
Brian Yan
Shinji Watanabe
A. Waibel
Ondrej Bojar
28
9
0
20 Sep 2023
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
Santosh Kesiraju
Marek Sarvaš
T. Pavlíček
Cécile Macaire
Alejandro Ciuba
17
4
0
31 May 2023
CTC-based Non-autoregressive Speech Translation
Chen Xu
Xiaoqian Liu
Xiaowen Liu
Qingxuan Sun
Yuhao Zhang
...
Tom Ko
Mingxuan Wang
Tong Xiao
Anxiang Ma
Jingbo Zhu
25
11
0
27 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
31
23
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
31
17
0
18 May 2023
DropDim: A Regularization Method for Transformer Networks
Hao Zhang
Dan Qu
Kejia Shao
Xu Yang
28
12
0
20 Apr 2023
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xukui Yang
Dan Qu
Weiqiang Zhang
35
9
0
20 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
Hirofumi Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
21
20
0
10 Apr 2023
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
Sara Papi
Marco Gaido
Andrea Pilzer
Matteo Negri
59
10
0
28 Mar 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
19
3
0
21 Feb 2023
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation
Motoi Omachi
Brian Yan
Siddharth Dalmia
Yuya Fujita
Shinji Watanabe
LRM
32
3
0
11 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Jiatong Shi
Chan-Jan Hsu
Ho-Lam Chung
Dongji Gao
Leibny Paola García-Perera
Shinji Watanabe
Ann Lee
Hung-yi Lee
32
12
0
06 Nov 2022
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Chen Wang
Yuchen Liu
Boxing Chen
Jiajun Zhang
Wei Luo
Zhongqiang Huang
Chengqing Zong
39
10
0
18 Oct 2022
CTC Alignments Improve Autoregressive Translation
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
46
33
0
11 Oct 2022
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Mayumi Ohta
Julia Kreutzer
Stefan Riezler
19
0
0
05 Oct 2022
ESSumm: Extractive Speech Summarization from Untranscribed Meeting
Jun Wang
33
7
0
14 Sep 2022
On the Impact of Noises in Crowd-Sourced Data for Speech Translation
Siqi Ouyang
Rong Ye
Lei Li
32
8
0
28 Jun 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
38
24
0
20 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
48
38
0
02 May 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
Keqi Deng
Shinji Watanabe
Jiatong Shi
Siddhant Arora
33
15
0
19 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
11
23
0
08 Apr 2022
Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
43
26
0
08 Apr 2022
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
Ryo Fukuda
Katsuhito Sudoh
Satoshi Nakamura
24
7
0
29 Mar 2022
STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
Qingkai Fang
Rong Ye
Lei Li
Yang Feng
Mingxuan Wang
39
95
0
20 Mar 2022
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
35
74
0
29 Nov 2021
Attention-based Multi-hypothesis Fusion for Speech Summarization
Takatomo Kano
A. Ogawa
Marc Delcroix
Shinji Watanabe
22
13
0
16 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schonherr
DiffM
132
125
0
04 Nov 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
24
81
0
09 Oct 2021
SpliceOut: A Simple and Efficient Audio Augmentation Method
Arjit Jain
Pranay Reddy Samala
Deepak Mittal
Preethi Jyothi
M. Singh
30
10
0
30 Sep 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
Hirofumi Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
65
11
0
27 Sep 2021
Speechformer: Reducing Information Loss in Direct Speech Translation
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
65
23
0
09 Sep 2021
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring
Hirofumi Inaguma
Yosuke Higuchi
Kevin Duh
Tatsuya Kawahara
Shinji Watanabe
63
11
0
09 Sep 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
40
2
0
01 Jul 2021
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
Chen Xu
Bojie Hu
Yanyang Li
Yuhao Zhang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
25
76
0
12 May 2021
Learning Shared Semantic Space for Speech-to-Text Translation
Chi Han
Mingxuan Wang
Heng Ji
Lei Li
18
76
0
07 May 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
47
30
0
02 May 2021
AlloST: Low-resource Speech Translation without Source Transcription
Yao-Fei Cheng
Hung-Shin Lee
Hsin-Min Wang
27
8
0
01 May 2021
End-to-end Speech Translation via Cross-modal Progressive Training
Rong Ye
Mingxuan Wang
Lei Li
28
71
0
21 Apr 2021
NeurST: Neural Speech Translation Toolkit
Chengqi Zhao
Mingxuan Wang
Qianqian Dong
Rong Ye
Lei Li
30
32
0
18 Dec 2020
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Hang Le
J. Pino
Changhan Wang
Jiatao Gu
D. Schwab
Laurent Besacier
39
82
0
02 Nov 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
45
262
0
26 Oct 2020
A Technical Report: BUT Speech Translation Systems
Hari Krishna Vydana
L. Burget
J. Černocký
24
0
0
22 Oct 2020
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Yun Tang
J. Pino
Changhan Wang
Xutai Ma
Dmitriy Genzel
26
73
0
21 Oct 2020
Transformer based unsupervised pre-training for acoustic representation learning
Ruixiong Zhang
Haiwei Wu
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
SSL
ViT
27
27
0
29 Jul 2020
NeMo: a toolkit for building AI applications using Neural Modules
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
...
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen
211
296
0
14 Sep 2019
Tied Multitask Learning for Neural Speech Translation
Antonios Anastasopoulos
David Chiang
102
172
0
19 Feb 2018