Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05646
Cited By
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
12 October 2020
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
Re-assign community
ArXiv
PDF
HTML
Papers citing
"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis"
50 / 1,107 papers shown
Title
PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models
Robin Netzorg
A. Jalal
Luna McNulty
Gopala Krishna Anumanchipalli
DiffM
33
1
0
13 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Georgios Milis
P. Filntisis
A. Roussos
Petros Maragos
CVBM
41
2
0
11 Dec 2023
A Representative Study on Human Detection of Artificially Generated Media Across Countries
Joel Frank
Franziska Herbert
Jonas Ricker
Lea Schonherr
Thorsten Eisenhofer
Asja Fischer
Markus Dürmuth
Thorsten Holz
43
13
0
10 Dec 2023
CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling
Ruihan Yang
H. Gamper
Sebastian Braun
DiffM
36
5
0
08 Dec 2023
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion
Binzhu Sha
Xu Li
Zhiyong Wu
Yin Shan
Helen M. Meng
23
7
0
08 Dec 2023
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Zehua Chen
Guande He
Kaiwen Zheng
Xu Tan
Jun Zhu
DiffM
56
21
0
06 Dec 2023
Detecting Voice Cloning Attacks via Timbre Watermarking
Chang-rui Liu
Jie Zhang
Tianwei Zhang
Xi Yang
Weiming Zhang
Neng H. Yu
33
29
0
06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
39
12
0
05 Dec 2023
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Jihyun Lee
Yejin Jeon
Wonjun Lee
Yunsu Kim
Gary Geunbae Lee
17
1
0
04 Dec 2023
OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin
Wenliang Zhao
Xumin Yu
Xin Sun
VLM
37
20
0
03 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
38
0
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Raviraj Joshi
Nikesh Garera
40
0
0
02 Dec 2023
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah
Sreyan Ghosh
Sonal Kumar
Purva Chiniya
Dinesh Manocha
46
14
0
30 Nov 2023
Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices
Gokul Srinivasagan
Michael Deisher
Munir Georges
VLM
27
0
0
30 Nov 2023
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
Pavel Korshunov
Haolin Chen
Philip N. Garner
S´ebastien Marcel
CVBM
56
4
0
29 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
35
31
0
21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
25
1
0
20 Nov 2023
A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness
Mathias Vogel
DiffM
45
0
0
17 Nov 2023
SponTTS: modeling and transferring spontaneous style for TTS
Hanzhao Li
Xinfa Zhu
Liumeng Xue
Yang Song
Yunlin Chen
Lei Xie
48
7
0
13 Nov 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
34
24
0
08 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
34
0
0
27 Oct 2023
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation
Yongxin Zhu
Zhujin Gao
Xinyuan Zhou
Zhongyi Ye
Linli Xu
34
2
0
26 Oct 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
28
2
0
26 Oct 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
32
6
0
26 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
Marek Kubis
Pawel Skórzewski
Marcin Sowañski
Tomasz Ziętkiewicz
21
6
0
25 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
27
31
0
25 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
59
0
0
23 Oct 2023
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Luca Della Libera
Zhepei Wang
Mirco Ravanelli
Paris Smaragdis
Cem Subakan
DiffM
46
5
0
19 Oct 2023
Black-Box Training Data Identification in GANs via Detector Networks
Lukman Olagoke
Salil P. Vadhan
Seth Neel
31
0
0
18 Oct 2023
A High Fidelity and Low Complexity Neural Audio Coding
Wenzhe Liu
Wei Xiao
Meng Wang
Shan Yang
Yupeng Shi
Yuyong Kang
Dan Su
Shidong Shang
Dong Yu
22
2
0
17 Oct 2023
Generation or Replication: Auscultating Audio Latent Diffusion Models
Dimitrios Bralios
Gordon Wichern
François Germain
Zexu Pan
Sameer Khurana
Chiori Hori
Jonathan Le Roux
DiffM
27
6
0
16 Oct 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
23
3
0
14 Oct 2023
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
Tiberiu Boros
Stefan Daniel Dumitrescu
Ionut Mironica
Radu Chivereanu
GAN
22
1
0
14 Oct 2023
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices
Matthew Baas
Herman Kamper
25
3
0
12 Oct 2023
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Qingkai Fang
Yan Zhou
Yangzhou Feng
45
7
0
11 Oct 2023
Enhancing expressivity transfer in textless speech-to-speech translation
J. Duret
Benjamin O’Brien
Yannick Esteve
Titouan Parcollet
51
2
0
11 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
45
16
0
11 Oct 2023
AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion
Haeyun Choi
Jio Gim
Yuho Lee
Youngin Kim
Young-Joo Suh
BDL
29
1
0
10 Oct 2023
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
Detai Xin
Junfeng Jiang
Shinnosuke Takamichi
Yuki Saito
Akiko Aizawa
Hiroshi Saruwatari
27
11
0
09 Oct 2023
An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification
Jiaqi Li
Li Wang
Liumeng Xue
Lei Wang
Zhizheng Wu
AAML
43
3
0
09 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
46
13
0
08 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
23
7
0
08 Oct 2023
Unified speech and gesture synthesis using flow matching
Shivam Mehta
Ruibo Tu
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
55
3
0
08 Oct 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023
Yi-Hua Zhou
Meng Chen
Yi Lei
Jihua Zhu
Weifeng Zhao
26
5
0
08 Oct 2023
SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation
Yuanjun Lv
Jixun Yao
Peikun Chen
Hongbin Zhou
Heng Lu
Lei Xie
33
5
0
08 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
24
1
0
08 Oct 2023
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model
Yayun He
Zuheng Kang
Jianzong Wang
Junqing Peng
Jing Xiao
DiffM
27
2
0
07 Oct 2023
Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction
Jiawei Li
Chunxu Guo
Li Fu
Lu Fan
Edward F. Chang
Yuanning Li
14
2
0
07 Oct 2023
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction
Jiarui Hai
Helin Wang
Dongchao Yang
Karan Thakkar
Najim Dehak
Mounya Elhilali
DiffM
31
7
0
06 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
37
3
0
06 Oct 2023
Previous
1
2
3
...
9
10
11
...
21
22
23
Next