ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1607.00325
  4. Cited By
Permutation Invariant Training of Deep Models for Speaker-Independent
  Multi-talker Speech Separation

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

1 July 2016
Dong Yu
Morten Kolbæk
Zheng-Hua Tan
Jesper Jensen
ArXivPDFHTML

Papers citing "Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation"

50 / 155 papers shown
Title
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
Zhongweiyang Xu
Xulin Fan
Zhong-Qiu Wang
Xilin Jiang
Romit Roy Choudhury
DiffM
54
0
0
08 May 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
52
1
0
28 Apr 2025
Location-Oriented Sound Event Localization and Detection with Spatial Mapping and Regression Localization
Location-Oriented Sound Event Localization and Detection with Spatial Mapping and Regression Localization
Xueping Zhang
Yaxiong Chen
Ruilin Yao
Yunfei Zi
Shengwu Xiong
38
0
0
11 Apr 2025
USED: Universal Speaker Extraction and Diarization
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
43
6
0
17 Jan 2025
Mask-Weighted Spatial Likelihood Coding for Speaker-Independent Joint Localization and Mask Estimation
Mask-Weighted Spatial Likelihood Coding for Speaker-Independent Joint Localization and Mask Estimation
Jakob Kienegger
Alina Mannanova
Timo Gerkmann
46
0
0
10 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
42
28
0
02 Jan 2025
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
Yuhang He
Sangyun Shin
Anoop Cherian
Niki Trigoni
Andrew Markham
88
0
0
31 Dec 2024
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
Jiawen Kang
Lingwei Meng
Mingyu Cui
Yuejiao Wang
Xixin Wu
Xunying Liu
Helen Meng
41
2
0
19 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
72
2
0
13 Sep 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
Bang Zeng
Ming Li
39
2
0
04 Sep 2024
Serialized Speech Information Guidance with Overlapped Encoding
  Separation for Multi-Speaker Automatic Speech Recognition
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
Hao Shi
Yuan Gao
Zhaoheng Ni
Tatsuya Kawahara
34
2
0
01 Sep 2024
A Large-Scale Evaluation of Speech Foundation Models
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
19
0
15 Apr 2024
Weakly-supervised Audio Separation via Bi-modal Semantic Similarity
Weakly-supervised Audio Separation via Bi-modal Semantic Similarity
Tanvir Mahmud
Saeed Amizadeh
K. Koishida
Diana Marculescu
AI4TS
16
2
0
02 Apr 2024
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by
  Attention Constraints
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
PeiYing Lee
HauYun Guo
Berlin Chen
34
0
0
21 Mar 2024
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Yue Li
Koen V. Hindriks
Florian A. Kunneman
35
2
0
05 Mar 2024
Sound Source Separation Using Latent Variational Block-Wise
  Disentanglement
Sound Source Separation Using Latent Variational Block-Wise Disentanglement
Karim Helwani
M. Togami
Paris Smaragdis
Michael M. Goodwin
BDL
DRL
26
1
0
08 Feb 2024
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down
  Fusion
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
Samuel Pegg
Kai Li
Xiaolin Hu
32
1
0
25 Jan 2024
Self-Supervised Music Source Separation Using Vector-Quantized Source
  Category Estimates
Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates
Marco Pasini
Stefan Lattner
George Fazekas
35
1
0
21 Nov 2023
Frame-wise streaming end-to-end speaker diarization with
  non-autoregressive self-attention-based attractors
Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors
Di Liang
Nian Shao
Xiaofei Li
33
4
0
25 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with
  Embedding Enhancer
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
28
18
0
13 Sep 2023
The Sound Demixing Challenge 2023 $\unicode{x2013}$ Cinematic Demixing
  Track
The Sound Demixing Challenge 2023 \unicodex2013\unicode{x2013}\unicodex2013 Cinematic Demixing Track
Stefan Uhlich
Giorgio Fabbro
M. Hirano
Shusuke Takahashi
Gordon Wichern
...
R. Solovyev
A. Stempkovskiy
T. Habruseva
M. Sukhovei
Yuki Mitsufuji
50
11
0
14 Aug 2023
Complete and separate: Conditional separation with missing target source
  attribute completion
Complete and separate: Conditional separation with missing target source attribute completion
Dimitrios Bralios
Efthymios Tzinis
Paris Smaragdis
35
0
0
27 Jul 2023
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker
  Extraction
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction
Jiuxin Lin
X. Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen M. Meng
22
21
0
25 Jun 2023
Mixture Encoder for Joint Speech Separation and Recognition
Mixture Encoder for Joint Speech Separation and Recognition
Simon Berger
Peter Vieting
Christoph Boeddeker
Ralf Schluter
Reinhold Häb-Umbach
24
6
0
21 Jun 2023
An Efficient Speech Separation Network Based on Recurrent Fusion Dilated
  Convolution and Channel Attention
An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention
Junyu Wang
22
1
0
09 Jun 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and
  Diarization with a Sidecar Separator
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
Lingwei Meng
Jiawen Kang
Mingyu Cui
Haibin Wu
Xixin Wu
Helen M. Meng
39
10
0
25 May 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
Yuhao Liang
Fan Yu
Yangze Li
Pengcheng Guo
Shiliang Zhang
Qian Chen
Linfu Xie
30
8
0
23 May 2023
Unsupervised Multi-channel Separation and Adaptation
Unsupervised Multi-channel Separation and Adaptation
Cong Han
K. Wilson
Scott Wisdom
J. Hershey
26
4
0
18 May 2023
Neural Diarization with Non-autoregressive Intermediate Attractors
Neural Diarization with Non-autoregressive Intermediate Attractors
Yusuke Fujita
Tatsuya Komatsu
Robin Scheibler
Yusuke Kida
Tetsuji Ogawa
40
11
0
13 Mar 2023
Neural Target Speech Extraction: An Overview
Neural Target Speech Extraction: An Overview
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
JanHonza'' vCernocký
Dong Yu
23
86
0
31 Jan 2023
Multi-Scale Feature Fusion Transformer Network for End-to-End Single
  Channel Speech Separation
Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation
Yinhao Xu
Jian Zhou
L. Tao
H. Kwan
30
0
0
14 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled
  Videos
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
31
25
0
14 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
34
27
0
07 Dec 2022
Deep neural network techniques for monaural speech enhancement: state of
  the art analysis
Deep neural network techniques for monaural speech enhancement: state of the art analysis
P. Ochieng
30
21
0
01 Dec 2022
JaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus
JaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus
Tomohiko Nakamura
Shinnosuke Takamichi
Naoko Tanji
Satoru Fukayama
Hiroshi Saruwatari
23
4
0
29 Nov 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
30
51
0
28 Nov 2022
Latent Iterative Refinement for Modular Source Separation
Latent Iterative Refinement for Modular Source Separation
Dimitrios Bralios
Efthymios Tzinis
Gordon Wichern
Paris Smaragdis
Jonathan Le Roux
BDL
33
5
0
22 Nov 2022
Self-Remixing: Unsupervised Speech Separation via Separation and
  Remixing
Self-Remixing: Unsupervised Speech Separation via Separation and Remixing
Kohei Saijo
Tetsuji Ogawa
SSL
22
11
0
18 Nov 2022
Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio
Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio
Kyungsuk Kim
Minju Park
Ha-na Joung
Yunkee Chae
Yeongbeom Hong
Seonghyeon Go
Kyogu Lee
11
6
0
15 Nov 2022
Self-supervised learning with bi-label masked speech prediction for
  streaming multi-talker speech recognition
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
Zili Huang
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yiming Wang
Jinyu Li
Takuya Yoshioka
Xiaofei Wang
Peidong Wang
28
3
0
10 Nov 2022
Speech separation with large-scale self-supervised learning
Speech separation with large-scale self-supervised learning
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yu-Huan Wu
Xiaofei Wang
Takuya Yoshioka
Jinyu Li
S. Sivasankaran
Sefik Emre Eskimez
19
14
0
09 Nov 2022
BER: Balanced Error Rate For Speaker Diarization
BER: Balanced Error Rate For Speaker Diarization
Tao Liu
K. Yu
20
4
0
08 Nov 2022
TSUP Speaker Diarization System for Conversational Short-phrase Speaker
  Diarization Challenge
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
Bowen Pang
Huan Zhao
Gaosheng Zhang
Xiaoyue Yang
Yanguo Sun
Li Zhang
Qing Wang
Linfu Xie
BDL
28
2
0
26 Oct 2022
Position tracking of a varying number of sound sources with sliding
  permutation invariant training
Position tracking of a varying number of sound sources with sliding permutation invariant training
David Diaz-Guerra
A. Politis
Tuomas Virtanen
30
5
0
26 Oct 2022
Adversarial Permutation Invariant Training for Universal Sound
  Separation
Adversarial Permutation Invariant Training for Universal Sound Separation
Emilian Postolache
Jordi Pons
Santiago Pascual
Joan Serrà
VLM
28
6
0
21 Oct 2022
Deep Learning Based Stage-wise Two-dimensional Speaker Localization with
  Large Ad-hoc Microphone Arrays
Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays
Shupei Liu
Linfeng Feng
Yijun Gong
Chengdong Liang
Chen Zhang
Xiao-Lei Zhang
Xuelong Li
18
3
0
19 Oct 2022
Semi-supervised Time Domain Target Speaker Extraction with Attention
Semi-supervised Time Domain Target Speaker Extraction with Attention
Zhepei Wang
Ritwik Giri
Shrikant Venkataramani
Umut Isik
J. Valin
Paris Smaragdis
Mike Goodwin
A. Krishnaswamy
24
7
0
18 Jun 2022
Heterogeneous Separation Consistency Training for Adaptation of
  Unsupervised Speech Separation
Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation
Jiangyu Han
Yanhua Long
28
6
0
23 Apr 2022
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation
  System
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System
M. Z. Ozturk
Chenshu Wu
Beibei Wang
Min Wu
K. Liu
27
20
0
14 Apr 2022
Listen only to me! How well can target speech extraction handle false
  alarms?
Listen only to me! How well can target speech extraction handle false alarms?
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Kateřina Žmolíková
Hiroshi Sato
Tomohiro Nakatani
34
15
0
11 Apr 2022
1234
Next