ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1607.00325
  4. Cited By
Permutation Invariant Training of Deep Models for Speaker-Independent
  Multi-talker Speech Separation

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

1 July 2016
Dong Yu
Morten Kolbæk
Zheng-Hua Tan
Jesper Jensen
ArXivPDFHTML

Papers citing "Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation"

50 / 158 papers shown
Title
Listen only to me! How well can target speech extraction handle false
  alarms?
Listen only to me! How well can target speech extraction handle false alarms?
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Kateřina Žmolíková
Hiroshi Sato
Tomohiro Nakatani
34
15
0
11 Apr 2022
Multichannel Speech Separation with Narrow-band Conformer
Multichannel Speech Separation with Narrow-band Conformer
Changsheng Quan
Xiaofei Li
31
12
0
09 Apr 2022
Heterogeneous Target Speech Separation
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
51
26
0
07 Apr 2022
Target Confusion in End-to-end Speaker Extraction: Analysis and
  Approaches
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches
Zifeng Zhao
Dongchao Yang
Rongzhi Gu
Haoran Zhang
Yuexian Zou
25
16
0
04 Apr 2022
End-to-end multi-talker audio-visual ASR using an active speaker
  attention module
End-to-end multi-talker audio-visual ASR using an active speaker attention module
R. Rose
Olivier Siohan
18
3
0
01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement
  by Re-Synthesis
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
18
32
0
31 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
  for Semantic and Generative Capabilities
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
26
109
0
14 Mar 2022
Audio-visual speech separation based on joint feature representation
  with cross-modal attention
Audio-visual speech separation based on joint feature representation with cross-modal attention
Jun Xiong
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
Yanni Zhang
28
3
0
05 Mar 2022
Audio Self-supervised Learning: A Survey
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
42
106
0
02 Mar 2022
L-SpEx: Localized Target Speaker Extraction
L-SpEx: Localized Target Speaker Extraction
Meng Ge
Chenglin Xu
Longbiao Wang
Eng Siong Chng
J. Dang
Haizhou Li
30
21
0
21 Feb 2022
MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation
  Invariant Training
MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training
Ertuğ Karamatlı
S. Kırbız
SSL
36
9
0
08 Feb 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting
  Transcription Grand Challenge
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yihui Fu
Zhihao Du
...
Kong Aik Lee
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
18
28
0
08 Feb 2022
SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech
  Separation
SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation
Chenda Li
Lei Yang
Weiqin Wang
Y. Qian
34
25
0
26 Jan 2022
Multi-turn RNN-T for streaming recognition of multi-party speech
Multi-turn RNN-T for streaming recognition of multi-party speech
Ilya Sklyar
A. Piunova
Xianrui Zheng
Yulan Liu
24
22
0
19 Dec 2021
End-to-end speaker diarization with transformer
End-to-end speaker diarization with transformer
Yongquan Lai
Xin Tang
Yuanyuan Fu
Rui Fang
31
1
0
14 Dec 2021
A Time-domain Real-valued Generalized Wiener Filter for Multi-channel
  Neural Separation Systems
A Time-domain Real-valued Generalized Wiener Filter for Multi-channel Neural Separation Systems
Yi Luo
29
14
0
07 Dec 2021
Single-channel speech separation using Soft-minimum Permutation
  Invariant Training
Single-channel speech separation using Soft-minimum Permutation Invariant Training
Midia Yousefi
John H. L. Hansen
21
3
0
16 Nov 2021
Reduction of Subjective Listening Effort for TV Broadcast Signals with
  Recurrent Neural Networks
Reduction of Subjective Listening Effort for TV Broadcast Signals with Recurrent Neural Networks
Nils L. Westhausen
R. Huber
Hannah Baumgartner
Ragini Sinha
J. Rennies
B. Meyer
33
10
0
02 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
37
363
0
02 Nov 2021
Real-time Speaker counting in a cocktail party scenario using
  Attention-guided Convolutional Neural Network
Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network
Midia Yousefi
John H. L. Hansen
28
10
0
30 Oct 2021
Continuous Speech Separation with Recurrent Selective Attention Network
Continuous Speech Separation with Recurrent Selective Attention Network
Yixuan Zhang
Zhuo Chen
Jian Wu
Takuya Yoshioka
Peidong Wang
Zhong Meng
Jinyu Li
BDL
27
7
0
28 Oct 2021
Progressive Learning for Stabilizing Label Selection in Speech
  Separation with Mapping-based Method
Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method
Chenyang Gao
Yue Gu
I. Marsic
38
0
0
20 Oct 2021
Continual self-training with bootstrapped remixing for speech
  enhancement
Continual self-training with bootstrapped remixing for speech enhancement
Efthymios Tzinis
Yossi Adi
V. Ithapu
Buye Xu
Anurag Kumar
26
16
0
19 Oct 2021
All-neural beamformer for continuous speech separation
All-neural beamformer for continuous speech separation
Zhuohuang Zhang
Takuya Yoshioka
Naoyuki Kanda
Zhuo Chen
Xiaofei Wang
Dongmei Wang
Sefik Emre Eskimez
33
15
0
13 Oct 2021
Multi-channel Narrow-band Deep Speech Separation with Full-band
  Permutation Invariant Training
Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training
Changsheng Quan
Xiaofei Li
34
14
0
12 Oct 2021
Visual Scene Graphs for Audio Source Separation
Visual Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Jonathan Le Roux
Narendra Ahuja
A. Cherian
26
36
0
24 Sep 2021
Graph-PIT: Generalized permutation invariant training for continuous
  separation of arbitrary numbers of speakers
Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
Thilo von Neumann
K. Kinoshita
Christoph Boeddeker
Marc Delcroix
Reinhold Haeb-Umbach
28
23
0
30 Jul 2021
Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint
  Optimization
Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint Optimization
Haici Yang
Shivani Firodiya
Nicholas J. Bryan
Minje Kim
34
7
0
28 Jul 2021
Deep neural network Based Low-latency Speech Separation with Asymmetric
  analysis-Synthesis Window Pair
Deep neural network Based Low-latency Speech Separation with Asymmetric analysis-Synthesis Window Pair
Shanshan Wang
Gaurav Naithani
A. Politis
Tuomas Virtanen
40
10
0
22 Jun 2021
Encoder-Decoder Based Attractors for End-to-End Neural Diarization
Encoder-Decoder Based Attractors for End-to-End Neural Diarization
Shota Horiguchi
Yusuke Fujita
Shinji Watanabe
Yawen Xue
Leibny Paola García-Perera
37
64
0
20 Jun 2021
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party
  Environments
WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments
Yunzhe Hao
Jiaming Xu
Peng Zhang
Bo Xu
17
17
0
13 Jun 2021
PILOT: Introducing Transformers for Probabilistic Sound Event
  Localization
PILOT: Introducing Transformers for Probabilistic Sound Event Localization
C. Schymura
Benedikt T. Bönninghoff
Tsubasa Ochiai
Marc Delcroix
K. Kinoshita
Tomohiro Nakatani
S. Araki
D. Kolossa
27
24
0
07 Jun 2021
Lightweight Dual-channel Target Speaker Separation for Mobile Voice
  Communication
Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication
Yuanyuan Bao
Yanze Xu
Na Xu
Wenjing Yang
Hongfeng Li
Shicong Li
Y. Jia
Fei Xiang
Jincheng He
Ming Li
30
1
0
05 Jun 2021
Sparse, Efficient, and Semantic Mixture Invariant Training: Taming
  In-the-Wild Unsupervised Sound Separation
Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation
Scott Wisdom
A. Jansen
Ron J. Weiss
Hakan Erdogan
J. Hershey
38
26
0
01 Jun 2021
Many-Speakers Single Channel Speech Separation with Optimal Permutation
  Training
Many-Speakers Single Channel Speech Separation with Optimal Permutation Training
Shaked Dovrat
Eliya Nachmani
Lior Wolf
VLM
14
21
0
18 Apr 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
30
8
0
02 Mar 2021
Sandglasset: A Light Multi-Granularity Self-attentive Network For
  Time-Domain Speech Separation
Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation
Max W. Y. Lam
Jun Wang
Dan Su
Dong Yu
AI4TS
72
49
0
01 Mar 2021
End-to-End Dereverberation, Beamforming, and Speech Recognition with
  Improved Numerical Stability and Advanced Frontend
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
Wangyou Zhang
Christoph Boeddeker
Shinji Watanabe
Tomohiro Nakatani
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Naoyuki Kamo
Reinhold Haeb-Umbach
Y. Qian
20
32
0
23 Feb 2021
Deep Learning based Multi-Source Localization with Source Splitting and
  its Effectiveness in Multi-Talker Speech Recognition
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
Aswin Shanmugam Subramanian
Chao Weng
Shinji Watanabe
Meng Yu
Dong Yu
34
78
0
16 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
Group Communication with Context Codec for Lightweight Source Separation
Group Communication with Context Codec for Lightweight Source Separation
Yi Luo
Cong Han
N. Mesgarani
26
20
0
14 Dec 2020
Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent
  Speech Separation
Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
Ziye Yang
Shanzheng Guan
Xiao-Lei Zhang
22
14
0
01 Dec 2020
Streaming end-to-end multi-talker speech recognition
Streaming end-to-end multi-talker speech recognition
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
13
41
0
26 Nov 2020
Single channel voice separation for unknown number of speakers under
  reverberant and noisy settings
Single channel voice separation for unknown number of speakers under reverberant and noisy settings
Shlomo E. Chazan
Lior Wolf
Eliya Nachmani
Yossi Adi
29
29
0
04 Nov 2020
Attention is All You Need in Speech Separation
Attention is All You Need in Speech Separation
Cem Subakan
Mirco Ravanelli
Samuele Cornell
Mirko Bronzi
Jianyuan Zhong
45
539
0
25 Oct 2020
An Improved Event-Independent Network for Polyphonic Sound Event
  Localization and Detection
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
Yin Cao
Turab Iqbal
Qiuqiang Kong
Y. Zhong
Wenwu Wang
Mark D. Plumbley
16
75
0
25 Oct 2020
Speaker Separation Using Speaker Inventories and Estimated Speech
Speaker Separation Using Speaker Inventories and Estimated Speech
Peidong Wang
Zhuo Chen
DeLiang Wang
Jinyu Li
Jiawei Liu
38
11
0
20 Oct 2020
Multi-microphone Complex Spectral Mapping for Utterance-wise and
  Continuous Speech Separation
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
Zhong-Qiu Wang
Peidong Wang
DeLiang Wang
33
88
0
04 Oct 2020
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device
  Speech Recognition
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition
Quan Wang
Ignacio López Moreno
Mert Saglam
K. Wilson
Alan Chiao
...
Yanzhang He
Wei Li
Jason W. Pelecanos
M. Nika
A. Gruenstein
VLM
39
82
0
09 Sep 2020
Audio-Visual Event Localization via Recursive Fusion by Joint
  Co-Attention
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
33
59
0
14 Aug 2020
Previous
1234
Next