2309.05423
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
11 September 2023
Jinzuomu Zhong
Yang Li
Hui Huang
Korin Richmond
Jie Liu
Zhiba Su
Jing Guo
Benlai Tang
Fengjie Zhu
Papers citing
"Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP"
12 / 12 papers shown
Title
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
201
3,732
0
06 Dec 2022
Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation
Xin Yuan
Robin Feng
Mingming Ye
44
3
0
17 Nov 2022
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
Kai Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
129
537
0
12 Nov 2022
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Ziqian Dai
Jianwei Yu
Yan Wang
Nuo Chen
Yanyao Bian
Guangzhi Li
Deng Cai
Dong Yu
410
8
0
16 Jun 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang
Kaitao Song
Xu Tan
Daxin Tan
Yuzi Yan
...
G. Wang
Wei Zhou
Tao Qin
Tan Lee
Sheng Zhao
SSL
66
21
0
31 Mar 2022
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction
Xueyuan Chen
Chang Song
Yixuan Zhou
Zhiyong Wu
Changbin Chen
Zhongqin Wu
Helen Meng
38
10
0
31 Mar 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech
Yi Ren
Ming Lei
Zhiying Huang
Shi-Rui Zhang
Qian Chen
Zhijie Yan
Zhou Zhao
78
43
0
16 Feb 2022
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
85
84
0
28 Mar 2021
PPG-based singing voice conversion with adversarial representation learning
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Linjia Xu
Chen Shen
Zejun Ma
43
37
0
28 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
179
1,947
0
12 Oct 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
105
1,406
0
08 Jun 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
229
3,155
0
16 May 2020