Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05646
Cited By
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Re-assign community
ArXiv
PDF
HTML
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,102 papers shown
Title
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
43
2
0
30 Aug 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
195
0
0
30 Aug 2024
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
A. R. Bargum
Simon Lajboschitz
Cumhur Erkut
37
1
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
Zhou Zhao
VLM
60
36
0
29 Aug 2024
SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge
You Zhang
Yongyi Zang
Jiatong Shi
Ryuichi Yamamoto
T. Toda
Zhiyao Duan
32
5
0
28 Aug 2024
Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation
K. Chen
Jiaqi Su
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Zeyu Jin
40
1
0
28 Aug 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
50
1
0
28 Aug 2024
Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard
Wonjune Kang
Margaret Hughes
Deb Roy
34
1
0
26 Aug 2024
Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities
Yidi Li
Yihan Li
Yixin Guo
Bin Ren
Zhenhuan Xu
Hao Guo
Hong Liu
N. Sebe
47
0
0
26 Aug 2024
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang
Haibin Wu
Yu-Kai Wang
Yuan-Kuei Wu
Hua Shen
Wei-Cheng Tseng
Iu-thing Kang
Shang-Wen Li
Hung-yi Lee
53
3
0
23 Aug 2024
Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music
N. Shikarpur
Krishna Maneesha Dendukuri
Yusong Wu
Antoine Caillon
Cheng-Zhi Anna Huang
20
1
0
22 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
24
0
0
21 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
45
1
0
20 Aug 2024
DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
Yin-Jyun Luo
K. Cheuk
Woosung Choi
Toshimitsu Uesaka
Keisuke Toyama
...
Chieh-Hsin Lai
Yuhta Takida
Wei-Hsiang Liao
Simon Dixon
Yuki Mitsufuji
CoGe
49
2
0
20 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
29
2
0
19 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
39
0
16 Aug 2024
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
AI4TS
35
1
0
15 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
53
5
0
14 Aug 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
25
0
0
13 Aug 2024
An Investigation Into Explainable Audio Hate Speech Detection
Jinmyeong An
Wonjun Lee
Yejin Jeon
Jungseul Ok
Yunsu Kim
Gary Geunbae Lee
33
2
0
12 Aug 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Xiaoxiao Miao
Yuxiang Zhang
Xin Wang
N. Tomashenko
D. Soh
Ian Mcloughlin
42
2
0
12 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
52
1
0
09 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
43
2
0
08 Aug 2024
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
46
3
0
08 Aug 2024
MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy Risks and Maximizing Utility in Audio-Visual Data Archiving
B. Owoyele
Martin Schilling
Rohan Sawahn
Niklas Kaemer
Pavel Zherebenkov
Bhuvanesh Verma
Wim Pouw
Gerard de Melo
34
0
0
06 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
41
0
0
06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
48
0
0
05 Aug 2024
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
Yansheng Li
Tingzhu Wang
Kang Wu
Linlin Wang
Xin Guo
Wenbin Wang
64
0
0
27 Jul 2024
Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks
Mahmoud Salhab
H. Harmanani
21
0
0
26 Jul 2024
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
N. Shah
Shirish S. Karande
Vineet Gandhi
41
1
0
26 Jul 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
32
0
0
24 Jul 2024
Distortion Recovery: A Two-Stage Method for Guitar Effect Removal
Ying-Shuo Lee
Yueh-Po Peng
Jui-Te Wu
Ming Cheng
Li Su
Yi-Hsuan Yang
33
0
0
23 Jul 2024
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
Wataru Nakata
Kentaro Seki
Hitomi Yanaka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
AuLLM
43
0
0
22 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
43
4
0
22 Jul 2024
Learning Physics for Unveiling Hidden Earthquake Ground Motions via Conditional Generative Modeling
Pu Ren
R. Nakata
Maxime Lacour
Ilan Naiman
Nori Nakata
...
Osman Asif Malik
Dmitriy Morozov
Omri Azencot
N. Benjamin Erichson
Michael W. Mahoney
AI4CE
38
8
0
21 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
36
0
0
19 Jul 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
Srija Anand
Praveena Varadhan
Ashwin Sankar
Giri Raju
Mitesh M. Khapra
45
1
0
18 Jul 2024
SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
Kexin Wang
Jiahong Zhang
Yong Ren
Man Yao
Richard D. Shang
Boxing Xu
Guoqi Li
DiffM
31
2
0
17 Jul 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth
A. Turetzky
Yossi Adi
37
2
0
16 Jul 2024
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
Weizhi Liu
Yue Li
Dongdong Lin
Hui Tian
Haizhou Li
WIGM
43
9
0
15 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
54
33
0
11 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Yu Gu
Dong Yu
Dong Yu
DiffM
VGen
43
12
0
10 Jul 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
J. Duret
Yannick Esteve
Titouan Parcollet
41
0
0
08 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
34
12
0
08 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
49
1
0
08 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing
Max Morrison
Cameron Churchwell
Nathan Pruyne
Bryan Pardo
51
3
0
07 Jul 2024
We Need Variations in Speech Synthesis: Sub-center Modelling for Speaker Embeddings
Ismail Rasim Ulgen
Carlos Busso
John H. L. Hansen
Berrak Sisman
37
1
0
05 Jul 2024
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
Cong-Thanh Do
Shuhei Imai
R. Doddipatla
Thomas Hain
28
2
0
04 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
39
2
0
04 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
49
1
0
03 Jul 2024
Previous
1
2
3
4
5
6
...
21
22
23
Next