Articulatory Encodec: Coding Speech through Vocal Tract Kinematics

18 June 2024

Cheol Jun Cho

Peter Wu

Tejas S. Prabhune

Dhruv Agarwal

Gopala K. Anumanchipalli

ArXiv PDF HTML

Papers citing "Articulatory Encodec: Coding Speech through Vocal Tract Kinematics"

21 / 21 papers shown

Title
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis Sang-Hoon Lee Haram Choi Seung-Bin Kim Seong-Whan Lee BDL 96 35 0 21 Nov 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT Cheol Jun Cho Abdelrahman Mohamed Shang-Wen Li Alan W. Black Gopala K. Anumanchipalli 77 9 0 16 Oct 2023
Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables Ahmed Adel Attia Yashish M. Siriwardena Carol Espy-Wilson SSL 58 8 0 17 Sep 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion Houjian Guo Chaoran Liu C. Ishi H. Ishiguro BDL 63 12 0 16 Feb 2023
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Hyeong-Seok Choi Jinhyeok Yang Juheon Lee Hyeongju Kim 57 46 0 17 Nov 2022
The Secret Source : Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion Yashish M. Siriwardena C. Espy-Wilson 60 16 0 29 Oct 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? Sanyuan Chen Yu Wu Chengyi Wang Shujie Liu Zhuo Chen ... Gang Liu Jinyu Li Jian Wu Xiangzhan Yu Furu Wei SSL 78 42 0 27 Apr 2022
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022 Takaaki Saeki Detai Xin Wataru Nakata Tomoki Koriyama Shinnosuke Takamichi Hiroshi Saruwatari 90 207 0 05 Apr 2022
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition Jiachen Lian A. Black Louis Goldstein Gopala Krishna Anumanchipalli 63 18 0 01 Apr 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 221 408 0 04 Dec 2021
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations Hyeong-Seok Choi Juheon Lee W. Kim Jie Hwan Lee Hoon Heo Kyogu Lee 74 158 0 27 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Sanyuan Chen Chengyi Wang Zhengyang Chen Yu-Huan Wu Shujie Liu ... Yao Qian Jian Wu Micheal Zeng Xiangzhan Yu Furu Wei SSL 239 1,857 0 26 Oct 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units Wei-Ning Hsu Benjamin Bolte Yao-Hung Hubert Tsai Kushal Lakhotia Ruslan Salakhutdinov Abdel-rahman Mohamed SSL 166 2,949 0 14 Jun 2021
Exploring wav2vec 2.0 on speaker verification and language identification Zhiyun Fan Meng Li Shiyu Zhou Bo Xu 133 203 0 11 Dec 2020
MLS: A Large-Scale Multilingual Dataset for Speech Research Vineel Pratap Qiantong Xu Anuroop Sriram Gabriel Synnaeve R. Collobert AuLLM 86 503 0 07 Dec 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines Yao Shi Hui Bu Xin Xu Shaojing Zhang Ming Li 70 222 0 22 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 177 1,931 0 12 Oct 2020
JVS corpus: free Japanese multi-speaker voice corpus Shinnosuke Takamichi Kentaro Mitsui Yuki Saito Tomoki Koriyama Naoko Tanji Hiroshi Saruwatari 40 71 0 17 Aug 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech Heiga Zen Viet Dang R. Clark Yu Zhang Ron J. Weiss Ye Jia Zhiwen Chen Yonghui Wu 104 951 0 05 Apr 2019
Neural Discrete Representation Learning Aaron van den Oord Oriol Vinyals Koray Kavukcuoglu BDL SSL OCL 226 5,008 0 02 Nov 2017
VoxCeleb: a large-scale speaker identification dataset Arsha Nagrani Joon Son Chung Andrew Zisserman 122 2,273 0 26 Jun 2017