2306.00814
Cited By
v1
v2
v3 (latest)
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
1 June 2023
Hubert Siuzdak
Github (932★)
Papers citing
"Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis"
23 / 23 papers shown
Title
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization
DaeJin Jo
Jeeyoung Yun
Byungseok Roh
Sungwoong Kim
23
0
0
20 Jun 2025
Single-Microphone-Based Sound Source Localization for Mobile Robots in Reverberant Environments
Jiang Wang
Runwu Shi
Benjamin Yen
He Kong
Kazuhiro Nakadai
15
0
0
19 Jun 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
52
0
0
16 Jun 2025
SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms
Sirui Li
Shuai Wang
Zhijun Liu
Zhongjie Jiang
Yannan Wang
Haizhou Li
22
0
0
16 Jun 2025
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama
Masaya Kawamura
Ryo Terashima
Ryuichi Yamamoto
Tomoki Toda
125
0
0
04 Jun 2025
Learning to Upsample and Upmix Audio in the Latent Domain
Dimitrios Bralios
Paris Smaragdis
Jonah Casebeer
37
0
0
31 May 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song
Jiawei Chen
Xiaobin Zhuang
Chenpeng Du
Ziyang Ma
...
Dongya Jia
Zhuo Chen
Yuping Wang
Yuxuan Wang
Xie Chen
38
0
0
31 May 2025
Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem
Andres Fernandez
Juan Azcarreta
Cagdas Bilen
Jesus Monge Alvarez
39
0
0
30 May 2025
Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
Yujie Yang
Bing Yang
Xiaofei Li
30
0
0
26 May 2025
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling
Qixi Zheng
Yushen Chen
Zhikang Niu
Ziyang Ma
Xiaofei Wang
Kai Yu
Xie Chen
56
0
0
26 May 2025
Eta-WavLM: Efficient Speaker Identity Removal in Self-Supervised Speech Representations Using a Simple Linear Equation
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
111
1
0
25 May 2025
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Hieu-Nghia Huynh-Nguyen
Ngoc Son Nguyen
Huynh Nguyen Dang
Thieu Vo
Truong-Son Hy
Van Nguyen
76
0
0
19 May 2025
Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech
P. O'Reilly
Zeyu Jin
Jiaqi Su
Bryan Pardo
87
0
0
15 Apr 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
Hyeongju Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
149
0
0
29 Mar 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
174
4
0
05 Feb 2025
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
102
3
0
21 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
153
3
0
16 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
145
92
0
09 Oct 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
149
45
0
29 Aug 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen M. Meng
Furu Wei
169
43
0
11 Jul 2024
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Yuanjun Lv
Hai Li
Ying Yan
Junhui Liu
Danming Xie
Lei Xie
104
1
0
12 Jun 2024
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
Hui-Peng Du
Ye-Xin Lu
Yang Ai
Zhen-Hua Ling
72
3
0
04 Jun 2024
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
193
727
0
05 Jan 2023