Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

1 July 2016

Papers citing "Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation"

50 / 157 papers shown

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss Ziqiang Shi Rujie Liu Jiqing Han 16 7 0 06 Aug 2020
Sudo rm -rf: Efficient Networks for Universal Audio Source Separation Efthymios Tzinis Zhepei Wang Paris Smaragdis 36 127 0 14 Jul 2020
Speaker-Conditional Chain Model for Speech Separation and Extraction Jing Shi Jiaming Xu Yusuke Fujita Shinji Watanabe Bo Xu BDL 43 20 0 25 Jun 2020
Unsupervised Sound Separation Using Mixture Invariant Training Scott Wisdom Efthymios Tzinis Hakan Erdogan Ron J. Weiss K. Wilson J. Hershey 16 27 0 23 Jun 2020
Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation Yuichiro Koyama Oluwafemi Azeez Bhiksha Raj 27 4 0 23 May 2020
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors Shota Horiguchi Yusuke Fujita Shinji Watanabe Yawen Xue Kenji Nagamatsu 37 186 0 20 May 2020
Multimodal Target Speech Separation with Voice and Face References Leyuan Qu C. Weber S. Wermter CVBM 19 19 0 17 May 2020
FaceFilter: Audio-visual speech separation using still images Soo-Whan Chung Soyeon Choe Joon Son Chung Hong-Goo Kang CVBM 21 66 0 14 May 2020
Foreground-Background Ambient Sound Scene Separation Michel Olvera Emmanuel Vincent Romain Serizel Gilles Gasso 37 9 0 11 May 2020
SpEx+: A Complete Time Domain Speaker Extraction Network Meng Ge Chenglin Xu Longbiao Wang Chng Eng Siong J. Dang Haizhou Li 27 145 0 10 May 2020
Asteroid: the PyTorch-based audio source separation toolkit for researchers Manuel Pariente Samuele Cornell Joris Cosentino S. Sivasankaran Efthymios Tzinis ... Juan M. Martín-Donas David Ditter Ariel Frank Antoine Deleforge Emmanuel Vincent 27 151 0 08 May 2020
Neural Spatio-Temporal Beamformer for Target Speech Separation Yong-mei Xu Meng Yu Shi-Xiong Zhang Lianwu Chen Chao Weng Jianming Liu Dong Yu 26 41 0 08 May 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition Naoyuki Kanda Yashesh Gaur Xiaofei Wang Zhong Meng Takuya Yoshioka 10 113 0 28 Mar 2020
Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss Yi Luo N. Mesgarani 21 29 0 27 Mar 2020
Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system K. Kinoshita Marc Delcroix S. Araki Tomohiro Nakatani 197 30 0 09 Mar 2020
Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning Rongzhi Gu Shi-Xiong Zhang Lianwu Chen Yong-mei Xu Meng Yu Dan Su Yuexian Zou Dong Yu 8 59 0 09 Mar 2020
Voice Separation with an Unknown Number of Multiple Speakers Eliya Nachmani Yossi Adi Lior Wolf 20 175 0 29 Feb 2020
End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification Yusuke Fujita Shinji Watanabe Shota Horiguchi Yawen Xue Kenji Nagamatsu 14 49 0 24 Feb 2020
Wavesplit: End-to-End Speech Separation by Speaker Clustering Neil Zeghidour David Grangier VLM 27 261 0 20 Feb 2020
End-to-End Multi-speaker Speech Recognition with Transformer Xuankai Chang Wangyou Zhang Y. Qian Jonathan Le Roux Shinji Watanabe ViT 27 103 0 10 Feb 2020
Continuous speech separation: dataset and analysis Zhuo Chen Takuya Yoshioka Liang Lu Tianyan Zhou Zhong Meng Yi Luo Jian Wu Xiong Xiao Jinyu Li 16 211 0 30 Jan 2020
Audio-visual Recognition of Overlapped speech for the LRS2 dataset Jianwei Yu Shi-Xiong Zhang Jian Wu Shahram Ghorbani Bo Wu Shiyin Kang Shansong Liu Xunying Liu Helen Meng Dong Yu 32 72 0 06 Jan 2020
End-to-end training of time domain audio separation and recognition Thilo von Neumann K. Kinoshita Lukas Drude Christoph Boeddeker Marc Delcroix Tomohiro Nakatani Reinhold Haeb-Umbach 25 34 0 18 Dec 2019
Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function M. Togami Yoshiki Masuyama Tatsuya Komatsu Yumi Nakagome 19 25 0 11 Nov 2019
End-to-end Non-Negative Autoencoders for Sound Source Separation Shrikant Venkataramani Efthymios Tzinis Paris Smaragdis 17 5 0 31 Oct 2019
Mixup-breakdown: a consistency training method for improving generalization of speech separation models Max W. Y. Lam Jun Wang Dan Su Dong Yu 33 22 0 28 Oct 2019
A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet David Ditter Timo Gerkmann 17 57 0 25 Oct 2019
Filterbank design for end-to-end speech separation Manuel Pariente Samuele Cornell Antoine Deleforge Emmanuel Vincent 26 69 0 23 Oct 2019
Two-Step Sound Source Separation: Training on Learned Latent Targets Efthymios Tzinis Shrikant Venkataramani Zhepei Wang Y. C. Sübakan Paris Smaragdis 24 64 0 22 Oct 2019
Discriminative Neural Clustering for Speaker Diarisation Qiujia Li Florian Kreyssig Chao Zhang P. Woodland 11 44 0 22 Oct 2019
CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement M. Gogate K. Dashtipour Ahsan Adeel Amir Hussain 23 53 0 23 Sep 2019
End-to-End Neural Speaker Diarization with Self-attention Yusuke Fujita Naoyuki Kanda Shota Horiguchi Yawen Xue Kenji Nagamatsu Shinji Watanabe 190 237 0 13 Sep 2019
My lips are concealed: Audio-visual speech enhancement through obstructions Triantafyllos Afouras Joon Son Chung Andrew Zisserman 16 90 0 11 Jul 2019
Object Discovery with a Copy-Pasting GAN Relja Arandjelović Andrew Zisserman 27 57 0 27 May 2019
Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech T. Menne Ilya Sklyar Ralf Schluter Hermann Ney 27 35 0 09 May 2019
Universal Sound Separation Ilya Kavalerov Scott Wisdom Hakan Erdogan Brian Patton K. Wilson Jonathan Le Roux J. Hershey 11 184 0 08 May 2019
Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering Gene-Ping Yang Chao-I Tuan Hung-yi Lee Lin-Shan Lee 20 25 0 16 Apr 2019
Co-Separating Sounds of Visual Objects Ruohan Gao Kristen Grauman 33 206 0 16 Apr 2019
The Sound of Motions Hang Zhao Chuang Gan Wei-Chiu Ma Antonio Torralba 17 251 0 11 Apr 2019
Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss Chenglin Xu Wei Rao Chng Eng Siong Haizhou Li 45 53 0 24 Mar 2019
Low-Latency Deep Clustering For Speech Separation Shanshan Wang Gaurav Naithani Tuomas Virtanen 24 14 0 19 Feb 2019
FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation Ziqiang Shi Huibin Lin L. Liu Rujie Liu Shoji Hayakawa Shouji Harada Jiqing Han 22 22 0 02 Feb 2019
The Visual Centrifuge: Model-Free Layered Video Representations Jean-Baptiste Alayrac João Carreira Andrew Zisserman 21 48 0 04 Dec 2018
Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective Zhong-Qiu Wang Ke Tan DeLiang Wang 50 95 0 22 Nov 2018
Building Corpora for Single-Channel Speech Separation Across Multiple Domains Aman Rana Gregory Sell Leibny Paola García Perera A. Lowe Pratik Shah 19 10 0 06 Nov 2018
Speaker Selective Beamformer with Keyword Mask Estimation Yusuke Kida Dung T. Tran Motoi Omachi T. Taniguchi Yuya Fujita 17 3 0 25 Oct 2018
Phasebook and Friends: Leveraging Discrete Representations for Source Separation Jonathan Le Roux Gordon Wichern Shinji Watanabe Andy M. Sarroff J. Hershey 19 76 0 02 Oct 2018
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation Yi Luo N. Mesgarani 63 1,750 0 20 Sep 2018
Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures Jun Wang Jie Chen Dan Su Lianwu Chen Meng Yu Y. Qian Dong Yu 46 90 0 24 Jul 2018
Deep Speech Denoising with Vector Space Projections Jeff Hetherly Paul Gamble M. Barrios Cory Stephenson Karl S. Ni 13 0 0 27 Apr 2018