ResearchTrend.AI

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation (arXiv:1607.00325)

1 July 2016
Dong Yu
Morten Kolbæk
Zheng-Hua Tan
Jesper Jensen

Papers citing "Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation" (50 of 157 papers shown):
 1. Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss
    Ziqiang Shi, Rujie Liu, Jiqing Han (06 Aug 2020)
 2. Sudo rm -rf: Efficient Networks for Universal Audio Source Separation
    Efthymios Tzinis, Zhepei Wang, Paris Smaragdis (14 Jul 2020)
 3. Speaker-Conditional Chain Model for Speech Separation and Extraction
    Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu (25 Jun 2020)
 4. Unsupervised Sound Separation Using Mixture Invariant Training
    Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, K. Wilson, J. Hershey (23 Jun 2020)
 5. Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation
    Yuichiro Koyama, Oluwafemi Azeez, Bhiksha Raj (23 May 2020)
 6. End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
    Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu (20 May 2020)
 7. Multimodal Target Speech Separation with Voice and Face References
    Leyuan Qu, C. Weber, S. Wermter (17 May 2020)
 8. FaceFilter: Audio-visual speech separation using still images
    Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang (14 May 2020)
 9. Foreground-Background Ambient Sound Scene Separation
    Michel Olvera, Emmanuel Vincent, Romain Serizel, Gilles Gasso (11 May 2020)
10. SpEx+: A Complete Time Domain Speaker Extraction Network
    Meng Ge, Chenglin Xu, Longbiao Wang, Chng Eng Siong, J. Dang, Haizhou Li (10 May 2020)
11. Asteroid: the PyTorch-based audio source separation toolkit for researchers
    Manuel Pariente, Samuele Cornell, Joris Cosentino, S. Sivasankaran, Efthymios Tzinis, ..., Juan M. Martín-Donas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent (08 May 2020)
12. Neural Spatio-Temporal Beamformer for Target Speech Separation
    Yong-mei Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu (08 May 2020)
13. Serialized Output Training for End-to-End Overlapped Speech Recognition
    Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka (28 Mar 2020)
14. Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss
    Yi Luo, N. Mesgarani (27 Mar 2020)
15. Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system
    K. Kinoshita, Marc Delcroix, S. Araki, Tomohiro Nakatani (09 Mar 2020)
16. Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning
    Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong-mei Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu (09 Mar 2020)
17. Voice Separation with an Unknown Number of Multiple Speakers
    Eliya Nachmani, Yossi Adi, Lior Wolf (29 Feb 2020)
18. End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification
    Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu (24 Feb 2020)
19. Wavesplit: End-to-End Speech Separation by Speaker Clustering
    Neil Zeghidour, David Grangier (20 Feb 2020)
20. End-to-End Multi-speaker Speech Recognition with Transformer
    Xuankai Chang, Wangyou Zhang, Y. Qian, Jonathan Le Roux, Shinji Watanabe (10 Feb 2020)
21. Continuous speech separation: dataset and analysis
    Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li (30 Jan 2020)
22. Audio-visual Recognition of Overlapped speech for the LRS2 dataset
    Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu (06 Jan 2020)
23. End-to-end training of time domain audio separation and recognition
    Thilo von Neumann, K. Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach (18 Dec 2019)
24. Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function
    M. Togami, Yoshiki Masuyama, Tatsuya Komatsu, Yumi Nakagome (11 Nov 2019)
25. End-to-end Non-Negative Autoencoders for Sound Source Separation
    Shrikant Venkataramani, Efthymios Tzinis, Paris Smaragdis (31 Oct 2019)
26. Mixup-breakdown: a consistency training method for improving generalization of speech separation models
    Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu (28 Oct 2019)
27. A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet
    David Ditter, Timo Gerkmann (25 Oct 2019)
28. Filterbank design for end-to-end speech separation
    Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent (23 Oct 2019)
29. Two-Step Sound Source Separation: Training on Learned Latent Targets
    Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Y. C. Sübakan, Paris Smaragdis (22 Oct 2019)
30. Discriminative Neural Clustering for Speaker Diarisation
    Qiujia Li, Florian Kreyssig, Chao Zhang, P. Woodland (22 Oct 2019)
31. CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement
    M. Gogate, K. Dashtipour, Ahsan Adeel, Amir Hussain (23 Sep 2019)
32. End-to-End Neural Speaker Diarization with Self-attention
    Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe (13 Sep 2019)
33. My lips are concealed: Audio-visual speech enhancement through obstructions
    Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman (11 Jul 2019)
34. Object Discovery with a Copy-Pasting GAN
    Relja Arandjelović, Andrew Zisserman (27 May 2019)
35. Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech
    T. Menne, Ilya Sklyar, Ralf Schluter, Hermann Ney (09 May 2019)
36. Universal Sound Separation
    Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, K. Wilson, Jonathan Le Roux, J. Hershey (08 May 2019)
37. Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering
    Gene-Ping Yang, Chao-I Tuan, Hung-yi Lee, Lin-Shan Lee (16 Apr 2019)
38. Co-Separating Sounds of Visual Objects
    Ruohan Gao, Kristen Grauman (16 Apr 2019)
39. The Sound of Motions
    Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba (11 Apr 2019)
40. Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss
    Chenglin Xu, Wei Rao, Chng Eng Siong, Haizhou Li (24 Mar 2019)
41. Low-Latency Deep Clustering For Speech Separation
    Shanshan Wang, Gaurav Naithani, Tuomas Virtanen (19 Feb 2019)
42. FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation
    Ziqiang Shi, Huibin Lin, L. Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada, Jiqing Han (02 Feb 2019)
43. The Visual Centrifuge: Model-Free Layered Video Representations
    Jean-Baptiste Alayrac, João Carreira, Andrew Zisserman (04 Dec 2018)
44. Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective
    Zhong-Qiu Wang, Ke Tan, DeLiang Wang (22 Nov 2018)
45. Building Corpora for Single-Channel Speech Separation Across Multiple Domains
    Aman Rana, Gregory Sell, Leibny Paola García Perera, A. Lowe, Pratik Shah (06 Nov 2018)
46. Speaker Selective Beamformer with Keyword Mask Estimation
    Yusuke Kida, Dung T. Tran, Motoi Omachi, T. Taniguchi, Yuya Fujita (25 Oct 2018)
47. Phasebook and Friends: Leveraging Discrete Representations for Source Separation
    Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy M. Sarroff, J. Hershey (02 Oct 2018)
48. Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
    Yi Luo, N. Mesgarani (20 Sep 2018)
49. Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures
    Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Y. Qian, Dong Yu (24 Jul 2018)
50. Deep Speech Denoising with Vector Space Projections
    Jeff Hetherly, Paul Gamble, M. Barrios, Cory Stephenson, Karl S. Ni (27 Apr 2018)