ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1810.04826
  4. Cited By
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned
  Spectrogram Masking

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

11 October 2018
Quan Wang
Hannah Muckenhirn
K. Wilson
Prashant Sridhar
Zelin Wu
J. Hershey
Rif A. Saurous
Ron J. Weiss
Ye Jia
Ignacio López Moreno
ArXivPDFHTML

Papers citing "VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking"

50 / 82 papers shown
Title
Listen to Extract: Onset-Prompted Target Speaker Extraction
Listen to Extract: Onset-Prompted Target Speaker Extraction
Pengjie Shen
Kangrui Chen
Shulin He
Pengru Chen
Shuqi Yuan
He Kong
Xueliang Zhang
Zehao Wang
53
0
0
08 May 2025
End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario
Mohsen Ghane
Mohammad Sadegh Safari
78
0
0
28 Jan 2025
Beyond Speaker Identity: Text Guided Target Speech Extraction
Beyond Speaker Identity: Text Guided Target Speech Extraction
Mingyue Huo
Abhinav Jain
Cong Phuoc Huynh
Fanjie Kong
Pichao Wang
Zhu Liu
Vimal Bhat
56
0
0
17 Jan 2025
USED: Universal Speaker Extraction and Diarization
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
43
6
0
17 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
45
28
0
02 Jan 2025
Cross-attention Inspired Selective State Space Models for Target Sound
  Extraction
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
Donghang Wu
Yiwen Wang
Xihong Wu
T. Qu
Mamba
37
3
0
07 Sep 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
Bang Zeng
Ming Li
45
3
0
04 Sep 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep
  Speaker Representation Learning
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
44
4
0
21 Jul 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech
  Separation and Recognition
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
Guinan Li
Jiajun Deng
Youjun Chen
Mengzhe Geng
Shujie Hu
...
Zengrui Jin
Tianzi Wang
Xurong Xie
Helen Meng
Xunying Liu
VLM
34
0
0
14 Jun 2024
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Yue Li
Koen V. Hindriks
Florian A. Kunneman
35
2
0
05 Mar 2024
Continuous Target Speech Extraction: Enhancing Personalized Diarization
  and Extraction on Complex Recordings
Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings
He Zhao
Hangting Chen
Jianwei Yu
Yuehai Wang
53
0
0
29 Jan 2024
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down
  Fusion
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
Samuel Pegg
Kai Li
Xiaolin Hu
34
1
0
25 Jan 2024
Audio Prompt Tuning for Universal Sound Separation
Audio Prompt Tuning for Universal Sound Separation
Yuzhuo Liu
Xubo Liu
Yan Zhao
Yuanyuan Wang
Rui Xia
Pingchuan Tain
Yuxuan Wang
VLM
41
5
0
30 Nov 2023
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Jinzheng Zhao
Yong-mei Xu
Xinyuan Qian
Davide Berghi
Peipei Wu
Meng Cui
Jianyuan Sun
Philip J. B. Jackson
Wenwu Wang
BDL
47
7
0
23 Oct 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
26
80
0
14 Aug 2023
Complete and separate: Conditional separation with missing target source
  attribute completion
Complete and separate: Conditional separation with missing target source attribute completion
Dimitrios Bralios
Efthymios Tzinis
Paris Smaragdis
37
0
0
27 Jul 2023
BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with
  Convolutional Cross Attention in Multi-talker Conditions
BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions
Jie Zhang
Qingquan Xu
Qiu-shi Zhu
Zhenhua Ling
27
11
0
17 May 2023
Universal Source Separation with Weakly Labelled Data
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
K. Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
18
17
0
11 May 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
46
47
0
21 Mar 2023
Target Sound Extraction with Variable Cross-modality Clues
Target Sound Extraction with Variable Cross-modality Clues
Chenda Li
Yao Qian
Zhuo Chen
Dongmei Wang
Takuya Yoshioka
Shujie Liu
Y. Qian
Michael Zeng
VLM
29
13
0
15 Mar 2023
Neural Target Speech Extraction: An Overview
Neural Target Speech Extraction: An Overview
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
JanHonza'' vCernocký
Dong Yu
23
86
0
31 Jan 2023
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker
  Embeddings
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings
Kai Liu
Xucheng Wan
Z.C. Du
Huan Zhou
VLM
27
1
0
16 Jan 2023
Deep neural network techniques for monaural speech enhancement: state of
  the art analysis
Deep neural network techniques for monaural speech enhancement: state of the art analysis
P. Ochieng
35
21
0
01 Dec 2022
The Potential of Neural Speech Synthesis-based Data Augmentation for
  Personalized Speech Enhancement
The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement
Anastasia Kuznetsova
Aswin Sivaraman
Minje Kim
32
3
0
14 Nov 2022
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo
  Cancellation
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation
Sefik Emre Eskimez
Takuya Yoshioka
Alex Ju
M. Tang
Tanel Pärnamaa
Huaming Wang
32
7
0
04 Nov 2022
Spatially Selective Deep Non-linear Filters for Speaker Extraction
Spatially Selective Deep Non-linear Filters for Speaker Extraction
Kristina Tesch
Timo Gerkmann
32
17
0
04 Nov 2022
Hierarchical speaker representation for target speaker extraction
Hierarchical speaker representation for target speaker extraction
Shulin He
Huaiwen Zhang
Wei Rao
Kanghao Zhang
Yukai Ju
Yang-Rui Yang
Xueliang Zhang
37
4
0
28 Oct 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated
  Open-Domain On-Screen Sound Separation
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
41
30
0
20 Jul 2022
Multi-channel target speech enhancement based on ERB-scaled spatial
  coherence features
Multi-channel target speech enhancement based on ERB-scaled spatial coherence features
Yicheng Hsu
Yonghan Lee
M. Bai
31
1
0
17 Jul 2022
Speaker Verification in Multi-Speaker Environments Using Temporal
  Feature Fusion
Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion
Ahmad Aloradi
Wolfgang Mack
Mohamed Elminshawi
Emanuel Habets
38
5
0
28 Jun 2022
Semi-supervised Time Domain Target Speaker Extraction with Attention
Semi-supervised Time Domain Target Speaker Extraction with Attention
Zhepei Wang
Ritwik Giri
Shrikant Venkataramani
Umut Isik
J. Valin
Paris Smaragdis
Mike Goodwin
A. Krishnaswamy
24
7
0
18 Jun 2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified
  Acoustic Echo Suppression And Speech Enhancement
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement
Meng Yu
Yong-mei Xu
Chunlei Zhang
Shizhong Zhang
Dong Yu
22
11
0
20 May 2022
Text-Driven Separation of Arbitrary Sounds
Text-Driven Separation of Arbitrary Sounds
Kevin Kilgour
Beat Gfeller
Qingqing Huang
A. Jansen
Scott Wisdom
Marco Tagliasacchi
30
30
0
12 Apr 2022
Listen only to me! How well can target speech extraction handle false
  alarms?
Listen only to me! How well can target speech extraction handle false alarms?
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Kateřina Žmolíková
Hiroshi Sato
Tomohiro Nakatani
34
15
0
11 Apr 2022
SoundBeam: Target sound extraction conditioned on sound-class labels and
  enrollment clues for increased performance and continuous learning
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Marc Delcroix
Jorge Bennasar Vázquez
Tsubasa Ochiai
K. Kinoshita
Yasunori Ohishi
S. Araki
VLM
22
32
0
08 Apr 2022
Heterogeneous Target Speech Separation
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
51
26
0
07 Apr 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound
  Detection
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection
Dongchao Yang
Helin Wang
Zhongjie Ye
Yuexian Zou
Wenwu Wang
28
0
0
05 Apr 2022
Speaker Extraction with Co-Speech Gestures Cue
Speaker Extraction with Co-Speech Gestures Cue
Zexu Pan
Xinyuan Qian
Haizhou Li
SLR
23
27
0
31 Mar 2022
Separate What You Describe: Language-Queried Audio Source Separation
Separate What You Describe: Language-Queried Audio Source Separation
Xubo Liu
Haohe Liu
Qiuqiang Kong
Xinhao Mei
Jinzheng Zhao
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
42
58
0
28 Mar 2022
Single microphone speaker extraction using unified time-frequency
  Siamese-Unet
Single microphone speaker extraction using unified time-frequency Siamese-Unet
Aviad Eisenberg
Sharon Gannot
Shlomo E. Chazan
30
3
0
06 Mar 2022
SpeechPainter: Text-conditioned Speech Inpainting
SpeechPainter: Text-conditioned Speech Inpainting
Zalan Borsos
Matthew Sharifi
Marco Tagliasacchi
16
26
0
15 Feb 2022
SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech
  Separation
SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation
Chenda Li
Lei Yang
Weiqin Wang
Y. Qian
34
25
0
26 Jan 2022
Directed Speech Separation for Automatic Speech Recognition of Long Form
  Conversational Speech
Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech
Rohit Paturi
S. Srinivasan
Katrin Kirchhoff
Daniel Garcia-Romero
19
9
0
10 Dec 2021
Learning-based personal speech enhancement for teleconferencing by
  exploiting spatial-spectral features
Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features
Yicheng Hsu
Yonghan Lee
M. Bai
24
10
0
10 Dec 2021
Target Speech Extraction: Independent Vector Extraction Guided by
  Supervised Speaker Identification
Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification
J. Málek
Jakub Janský
Zbyněk Koldovský
Tomás Kounovský
Jaroslav Cmejla
J. Zdánský
25
10
0
05 Nov 2021
Cross-attention conformer for context modeling in speech enhancement for
  ASR
Cross-attention conformer for context modeling in speech enhancement for ASR
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
24
14
0
30 Oct 2021
One model to enhance them all: array geometry agnostic multi-channel
  personalized speech enhancement
One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement
H. Taherian
Sefik Emre Eskimez
Takuya Yoshioka
Huaming Wang
Zhuo Chen
Xuedong Huang
30
21
0
20 Oct 2021
Personalized Speech Enhancement: New Models and Comprehensive Evaluation
Personalized Speech Enhancement: New Models and Comprehensive Evaluation
Sefik Emre Eskimez
Takuya Yoshioka
Huaming Wang
Xiaofei Wang
Zhuo Chen
Xuedong Huang
32
62
0
18 Oct 2021
Controllable Multichannel Speech Dereverberation based on Deep Neural
  Networks
Controllable Multichannel Speech Dereverberation based on Deep Neural Networks
Ziteng Wang
Yueyue Na
Biao Tian
Q. Fu
21
0
0
16 Oct 2021
USEV: Universal Speaker Extraction with Visual Cue
USEV: Universal Speaker Extraction with Visual Cue
Zexu Pan
Meng Ge
Haizhou Li
34
41
0
30 Sep 2021
12
Next