WaveNet: A Generative Model for Raw Audio

12 September 2016

Papers citing "WaveNet: A Generative Model for Raw Audio"

50 / 3,082 papers shown

Title
HyperSound: Generating Implicit Neural Representations of Audio Signals with Hypernetworks Filip Szatkowski Karol J. Piczak Przemysław Spurek Jacek Tabor Tomasz Trzciński 117 13 0 03 Nov 2022
Human in the loop approaches in multi-modal conversational task guidance system development R. Manuvinakurike Sovan Biswas G. Raffa R. Beckwith A. Rhodes Meng Shi Gesem Gudino Mejia Saurav Sahay L. Nachman 81 2 0 03 Nov 2022
Iterative autoregression: a novel trick to improve your low-latency speech enhancement model Pavel Andreev Nicholas Babaev Azat Saginbaev Ivan Shchekotov Aibek Alanov 88 5 0 03 Nov 2022
Audio Language Modeling using Perceptually-Guided Discrete Representations Felix Kreuk Yaniv Taigman Adam Polyak Jade Copet Gabriel Synnaeve Alexandre Défossez Yossi Adi 85 4 0 02 Nov 2022
Inference and Denoise: Causal Inference-based Neural Speech Enhancement Tsun-An Hsieh Chao-Han Huck Yang Pin-Yu Chen Sabato Marco Siniscalchi Yu Tsao CML 85 2 0 02 Nov 2022
Adversarial Guitar Amplifier Modelling With Unpaired Data Alec Wright Vesa Valimaki Lauri Juvela GAN 58 8 0 02 Nov 2022
SIMD-size aware weight regularization for fast neural vocoding on CPU Hiroki Kanagawa Yusuke Ijima 115 0 0 02 Nov 2022
Neural Fourier Shift for Binaural Speech Rendering Jinkyu Lee Kyogu Lee 80 8 0 02 Nov 2022
Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models Moseli Mots'oehli Anna Sergeevna Bosman J. D. Villiers AAML GAN MGen 70 0 0 01 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio Zexin Cai Weiqing Wang Ming Li 48 28 0 01 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS Kun Song Jian Cong Xinsheng Wang Yongmao Zhang Linfu Xie Ning Jiang Haiying Wu 69 0 0 31 Oct 2022
Audio Time-Scale Modification with Temporal Compressing Networks Ernie Chu Ju-Ting Chen Chia-Ping Chen 27 0 0 31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders Jason Fong Yun Wang Prabhav Agrawal Vimal Manohar Jilong Wu Thilo Kohler Qing He 52 0 0 28 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform Masaya Kawamura Yuma Shirahata Ryuichi Yamamoto Kentaro Tachibana 99 17 0 28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis Yuma Shirahata Ryuichi Yamamoto Eunwoo Song Ryo Terashima Jae-Min Kim Kentaro Tachibana 86 11 0 28 Oct 2022
Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs Reo Yoneyama Ryuichi Yamamoto Kentaro Tachibana 53 5 0 28 Oct 2022
One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In Any Room/ Concert Hall Prateek Verma C. Chafe J. Berger 63 1 0 27 Oct 2022
LyricJam Sonic: A Generative System for Real-Time Composition and Musical Improvisation Olga Vechtomova Gaurav Sahu 36 6 0 27 Oct 2022
Learned Inertial Odometry for Autonomous Drone Racing Giovanni Cioffi L. Bauersfeld Elia Kaufmann Davide Scaramuzza 85 22 0 27 Oct 2022
Cover Reproducible Steganography via Deep Generative Models Kejiang Chen Hang Zhou Yaofei Wang Meng Li Weiming Zhang Neng H. Yu DiffM 66 13 0 26 Oct 2022
WaveBound: Dynamic Error Bounds for Stable Time Series Forecasting Youngin Cho Daejin Kim Dongmin Kim Mohammad Azam Khan Jaegul Choo AI4TS 76 3 0 25 Oct 2022
EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient body-conduction microphones J. Hauret Thomas Joubaud V. Zimpfer Éric Bavu 48 10 0 25 Oct 2022
A Survey on Artificial Intelligence for Music Generation: Agents, Domains and Perspectives Carlos Hernandez-Olivan Javier Hernandez-Olivan J. R. Beltrán MGen 98 7 0 25 Oct 2022
Semi-Supervised Learning Based on Reference Model for Low-resource TTS Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao AI4TS 58 5 0 25 Oct 2022
High Fidelity Neural Audio Compression Alexandre Défossez Jade Copet Gabriel Synnaeve Yossi Adi 133 674 0 24 Oct 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS Ziqi Liang 60 0 0 24 Oct 2022
A Machine Learning Approach to Classifying Construction Cost Documents into the International Construction Measurement Standard J. Ignacio Deza Hisham Ihshaish L. Mahdjoubi 45 0 0 24 Oct 2022
Federated Learning and Meta Learning: Approaches, Applications, and Directions Xiaonan Liu Yansha Deng Arumugam Nallanathan M. Bennis 125 40 0 24 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation Chunhui Wang Chang Zeng Jun Chen Xingji He 96 7 0 23 Oct 2022
Boomerang: Local sampling on image manifolds using diffusion models Lorenzo Luzi P. Mayer Josue Casco-Rodriguez Ali Siahkoohi Richard G. Baraniuk DiffM 108 20 0 21 Oct 2022
Adaptive re-calibration of channel-wise features for Adversarial Audio Classification Vardhan Dongre Abhinav Thimma Reddy Nikhitha Reddeddy AAML 26 0 0 21 Oct 2022
Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation Martin Strauss Matteo Torcoli B. Edler 51 5 0 21 Oct 2022
Robust One-Shot Singing Voice Conversion Naoya Takahashi M. Singh Yuki Mitsufuji DiffM 116 8 0 20 Oct 2022
DOT-VAE: Disentangling One Factor at a Time Vaishnavi Patil Matthew Evanusa Joseph Jaja CoGe DRL CML 63 1 0 19 Oct 2022
Transformers Learn Shortcuts to Automata Bingbin Liu Jordan T. Ash Surbhi Goel A. Krishnamurthy Cyril Zhang OffRL LRM 161 178 0 19 Oct 2022
Autoregressive Generative Modeling with Noise Conditional Maximum Likelihood Estimation Henry Li Y. Kluger 57 2 0 19 Oct 2022
Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models Ricardo Kleinlein Cristina Luna Jiménez Fernando Fernández-Martínez DiffM 52 3 0 19 Oct 2022
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders Xin Wang Junichi Yamagishi 116 43 0 19 Oct 2022
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models Aya Watanabe Shinnosuke Takamichi Yuki Saito Detai Xin Hiroshi Saruwatari 69 3 0 18 Oct 2022
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models Zhiyuan Zhang Lingjuan Lyu Xingjun Ma Chenguang Wang Xu Sun AAML 66 43 0 18 Oct 2022
TorchDIVA: An Extensible Computational Model of Speech Production built on an Open-Source Machine Learning Library Sean M. Kinahan J. Liss Visar Berisha 23 2 0 17 Oct 2022
Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario Emily R. Bartusiak Edward J. Delp 60 14 0 14 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder Naoya Takahashi Mayank Kumar Singh Yuki Mitsufuji DiffM 74 16 0 14 Oct 2022
Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction YuXuan Liu Nikhil Mishra Maximilian Sieb Yide Shentu Pieter Abbeel Xi Chen 3DPC 69 5 0 13 Oct 2022
Learning Multivariate CDFs and Copulas using Tensor Factorization Magda Amiridi N. Sidiropoulos 118 1 0 13 Oct 2022
Retrospectives on the Embodied AI Workshop Matt Deitke Dhruv Batra Yonatan Bisk Tommaso Campari Angel X. Chang ... Jesse Thomason Alexander Toshev Joanne Truong Luca Weihs Jiajun Wu LM&Ro 122 51 0 13 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi Myeonghun Jeong Minchan Kim Sung Hwan Mun N. Kim DiffM 94 6 0 12 Oct 2022
Unsupervised Learning of Equivariant Structure from Sequences Takeru Miyato Masanori Koyama Kenji Fukumizu 86 12 0 12 Oct 2022
Style-Guided Inference of Transformer for High-resolution Image Synthesis Jonghwa Yim Minjae Kim ViT 103 0 0 11 Oct 2022
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models Matthew Baas Herman Kamper DiffM 86 8 0 11 Oct 2022