Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

1 July 2016

Papers citing "Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation"

50 / 158 papers shown

Listen only to me! How well can target speech extraction handle false alarms? Marc Delcroix K. Kinoshita Tsubasa Ochiai Kateřina Žmolíková Hiroshi Sato Tomohiro Nakatani 34 15 0 11 Apr 2022
Multichannel Speech Separation with Narrow-band Conformer Changsheng Quan Xiaofei Li 31 12 0 09 Apr 2022
Heterogeneous Target Speech Separation Hyunjae Cho Wonbin Jung Junhyeok Lee Paris Smaragdis Sanghyun Woo 51 26 0 07 Apr 2022
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches Zifeng Zhao Dongchao Yang Rongzhi Gu Haoran Zhang Yuexian Zou 25 16 0 04 Apr 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module R. Rose Olivier Siohan 18 3 0 01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Karren D. Yang Dejan Marković Steven Krenn Vasu Agrawal Alexander Richard VGen 18 32 0 31 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities Hsiang-Sheng Tsai Heng-Jui Chang Wen-Chin Huang Zili Huang Kushal Lakhotia ... Hsuan-Jui Chen Shang-Wen Li Shinji Watanabe Abdel-rahman Mohamed Hung-yi Lee 26 109 0 14 Mar 2022
Audio-visual speech separation based on joint feature representation with cross-modal attention Jun Xiong Peng Zhang Lei Xie Wei Huang Yufei Zha Yanni Zhang 28 3 0 05 Mar 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 42 106 0 02 Mar 2022
L-SpEx: Localized Target Speaker Extraction Meng Ge Chenglin Xu Longbiao Wang Eng Siong Chng J. Dang Haizhou Li 30 21 0 21 Feb 2022
MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training Ertuğ Karamatlı S. Kırbız SSL 36 9 0 08 Feb 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge Fan Yu Shiliang Zhang Pengcheng Guo Yihui Fu Zhihao Du ... Kong Aik Lee Zhijie Yan B. Ma Xin Xu Hui Bu 18 28 0 08 Feb 2022
SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation Chenda Li Lei Yang Weiqin Wang Y. Qian 34 25 0 26 Jan 2022
Multi-turn RNN-T for streaming recognition of multi-party speech Ilya Sklyar A. Piunova Xianrui Zheng Yulan Liu 24 22 0 19 Dec 2021
End-to-end speaker diarization with transformer Yongquan Lai Xin Tang Yuanyuan Fu Rui Fang 31 1 0 14 Dec 2021
A Time-domain Real-valued Generalized Wiener Filter for Multi-channel Neural Separation Systems Yi Luo 29 14 0 07 Dec 2021
Single-channel speech separation using Soft-minimum Permutation Invariant Training Midia Yousefi John H. L. Hansen 21 3 0 16 Nov 2021
Reduction of Subjective Listening Effort for TV Broadcast Signals with Recurrent Neural Networks Nils L. Westhausen R. Huber Hannah Baumgartner Ragini Sinha J. Rennies B. Meyer 33 10 0 02 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition Jinyu Li VLM 37 363 0 02 Nov 2021
Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network Midia Yousefi John H. L. Hansen 28 10 0 30 Oct 2021
Continuous Speech Separation with Recurrent Selective Attention Network Yixuan Zhang Zhuo Chen Jian Wu Takuya Yoshioka Peidong Wang Zhong Meng Jinyu Li BDL 27 7 0 28 Oct 2021
Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method Chenyang Gao Yue Gu I. Marsic 38 0 0 20 Oct 2021
Continual self-training with bootstrapped remixing for speech enhancement Efthymios Tzinis Yossi Adi V. Ithapu Buye Xu Anurag Kumar 26 16 0 19 Oct 2021
All-neural beamformer for continuous speech separation Zhuohuang Zhang Takuya Yoshioka Naoyuki Kanda Zhuo Chen Xiaofei Wang Dongmei Wang Sefik Emre Eskimez 33 15 0 13 Oct 2021
Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training Changsheng Quan Xiaofei Li 34 14 0 12 Oct 2021
Visual Scene Graphs for Audio Source Separation Moitreya Chatterjee Jonathan Le Roux Narendra Ahuja A. Cherian 26 36 0 24 Sep 2021
Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers Thilo von Neumann K. Kinoshita Christoph Boeddeker Marc Delcroix Reinhold Haeb-Umbach 28 23 0 30 Jul 2021
Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint Optimization Haici Yang Shivani Firodiya Nicholas J. Bryan Minje Kim 34 7 0 28 Jul 2021
Deep neural network Based Low-latency Speech Separation with Asymmetric analysis-Synthesis Window Pair Shanshan Wang Gaurav Naithani A. Politis Tuomas Virtanen 40 10 0 22 Jun 2021
Encoder-Decoder Based Attractors for End-to-End Neural Diarization Shota Horiguchi Yusuke Fujita Shinji Watanabe Yawen Xue Leibny Paola García-Perera 37 64 0 20 Jun 2021
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments Yunzhe Hao Jiaming Xu Peng Zhang Bo Xu 17 17 0 13 Jun 2021
PILOT: Introducing Transformers for Probabilistic Sound Event Localization C. Schymura Benedikt T. Bönninghoff Tsubasa Ochiai Marc Delcroix K. Kinoshita Tomohiro Nakatani S. Araki D. Kolossa 27 24 0 07 Jun 2021
Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication Yuanyuan Bao Yanze Xu Na Xu Wenjing Yang Hongfeng Li Shicong Li Y. Jia Fei Xiang Jincheng He Ming Li 30 1 0 05 Jun 2021
Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation Scott Wisdom A. Jansen Ron J. Weiss Hakan Erdogan J. Hershey 38 26 0 01 Jun 2021
Many-Speakers Single Channel Speech Separation with Optimal Permutation Training Shaked Dovrat Eliya Nachmani Lior Wolf VLM 14 21 0 18 Apr 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss Naoki Makishima Mana Ihori Akihiko Takashima Tomohiro Tanaka Shota Orihashi Ryo Masumura 30 8 0 02 Mar 2021
Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation Max W. Y. Lam Jun Wang Dan Su Dong Yu AI4TS 72 49 0 01 Mar 2021
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend Wangyou Zhang Christoph Boeddeker Shinji Watanabe Tomohiro Nakatani Marc Delcroix K. Kinoshita Tsubasa Ochiai Naoyuki Kamo Reinhold Haeb-Umbach Y. Qian 20 32 0 23 Feb 2021
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition Aswin Shanmugam Subramanian Chao Weng Shinji Watanabe Meng Yu Dong Yu 34 78 0 16 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency Ruohan Gao Kristen Grauman CVBM 196 199 0 08 Jan 2021
Group Communication with Context Codec for Lightweight Source Separation Yi Luo Cong Han N. Mesgarani 26 20 0 14 Dec 2020
Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation Ziye Yang Shanzheng Guan Xiao-Lei Zhang 22 14 0 01 Dec 2020
Streaming end-to-end multi-talker speech recognition Liang Lu Naoyuki Kanda Jinyu Li Jiawei Liu 13 41 0 26 Nov 2020
Single channel voice separation for unknown number of speakers under reverberant and noisy settings Shlomo E. Chazan Lior Wolf Eliya Nachmani Yossi Adi 29 29 0 04 Nov 2020
Attention is All You Need in Speech Separation Cem Subakan Mirco Ravanelli Samuele Cornell Mirko Bronzi Jianyuan Zhong 45 539 0 25 Oct 2020
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection Yin Cao Turab Iqbal Qiuqiang Kong Y. Zhong Wenwu Wang Mark D. Plumbley 16 75 0 25 Oct 2020
Speaker Separation Using Speaker Inventories and Estimated Speech Peidong Wang Zhuo Chen DeLiang Wang Jinyu Li Jiawei Liu 38 11 0 20 Oct 2020
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation Zhong-Qiu Wang Peidong Wang DeLiang Wang 33 88 0 04 Oct 2020
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition Quan Wang Ignacio López Moreno Mert Saglam K. Wilson Alan Chiao ... Yanzhang He Wei Li Jason W. Pelecanos M. Nika A. Gruenstein VLM 39 82 0 09 Sep 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention Bin Duan Hao Tang Wei Wang Ziliang Zong Guowei Yang Yan Yan 33 59 0 14 Aug 2020