Single-Channel Target Speech Extraction Utilizing Distance and Room Clues

20 May 2025
Runwu Shi
Zirui Lin
Benjamin Yen
Jiang Wang
Ragib Amin Nihal
Kazuhiro Nakadai
Abstract

This paper aims to achieve single-channel target speech extraction (TSE) in enclosures using distance clues and room information. Recent works have verified the feasibility of distance clues for the TSE task: the distance implies the sound source's direct-to-reverberation ratio (DRR) and can thus be utilized for speech separation and TSE systems. However, such distance clues are significantly influenced by the room's acoustic characteristics, such as dimensions and reverberation time, making it challenging for TSE systems that rely solely on distance clues to generalize across different rooms. To solve this, we suggest providing room environmental information (room dimensions and reverberation time) to distance-based TSE for better generalization. Specifically, we propose a distance- and environment-based TSE model in the time-frequency (TF) domain with learnable distance and room embeddings. Results on both simulated and real collected datasets demonstrate its feasibility. Demonstration materials are available at this https URL.
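
The dependence of the distance clue on the room can be illustrated with a classical diffuse-field approximation from statistical room acoustics. The sketch below is not the paper's model. It assumes an omnidirectional source, Sabine's reverberation formula, and the textbook critical-distance estimate d_c ≈ 0.057·sqrt(V / T60). The function names and the two example rooms are hypothetical and chosen only for illustration.

```python
import math


def critical_distance(volume_m3: float, t60_s: float) -> float:
    """Distance (m) at which direct and reverberant energy are equal.

    Textbook diffuse-field estimate for an omnidirectional source,
    derived from Sabine's formula: d_c ~= 0.057 * sqrt(V / T60).
    """
    return 0.057 * math.sqrt(volume_m3 / t60_s)


def drr_db(distance_m: float, room_dims_m: tuple, t60_s: float) -> float:
    """Approximate direct-to-reverberation ratio in dB.

    Direct energy falls off as 1/d^2 while the diffuse reverberant energy
    is assumed constant across the room, so DRR = 20 * log10(d_c / d).
    """
    length, width, height = room_dims_m
    volume = length * width * height
    return 20.0 * math.log10(critical_distance(volume, t60_s) / distance_m)


# Same source distance (1 m), two hypothetical rooms.
small_reverberant = drr_db(1.0, (4.0, 3.0, 2.5), t60_s=0.8)
large_dry = drr_db(1.0, (10.0, 8.0, 3.0), t60_s=0.3)

print(f"small, reverberant room: {small_reverberant:.1f} dB")
print(f"large, dry room:         {large_dry:.1f} dB")
```

Under these assumptions the same 1 m source is reverberation-dominated in the small room and direct-dominated in the large one, which is why a DRR-like cue cannot be mapped to a distance without knowing the room dimensions and reverberation time.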

@article{shi2025_2505.14433,
  title={Single-Channel Target Speech Extraction Utilizing Distance and Room Clues},
  author={Runwu Shi and Zirui Lin and Benjamin Yen and Jiang Wang and Ragib Amin Nihal and Kazuhiro Nakadai},
  journal={arXiv preprint arXiv:2505.14433},
  year={2025}
}