ResearchTrend.AI

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing

27 May 2025
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
arXiv (abs) · PDF · HTML
Main: 8 pages · 3 figures · 8 tables · Bibliography: 3 pages
Abstract

This paper introduces a cross-lingual dubbing system that translates speech from one language to another while preserving key characteristics such as duration, speaker identity, and speaking speed. Despite the strong translation quality of existing speech translation approaches, they often overlook the transfer of speech patterns, leading to mismatches with source speech and limiting their suitability for dubbing applications. To address this, we propose a discrete diffusion-based speech-to-unit translation model with explicit duration control, enabling time-aligned translation. We then synthesize speech based on the predicted units and source identity with a conditional flow matching model. Additionally, we introduce a unit-based speed adaptation mechanism that guides the translation model to produce speech at a rate consistent with the source, without relying on any text. Extensive experiments demonstrate that our framework generates natural and fluent translations that align with the original speech's duration and speaking pace, while achieving competitive translation performance.
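The abstract describes a textless, unit-based speed adaptation mechanism that steers the translation model toward the source speaking rate. The paper's actual mechanism is not detailed here, but the core bookkeeping it implies can be illustrated: collapse consecutive duplicate discrete units (a common convention in speech-to-unit pipelines), estimate a units-per-second rate from the source, and derive a unit budget for the translated output. The function names and the run-length deduplication step below are illustrative assumptions, not the authors' implementation.

```python
from itertools import groupby

def dedupe_units(units):
    """Collapse consecutive duplicate discrete speech units (run-length dedup)."""
    return [u for u, _ in groupby(units)]

def speaking_rate(units, duration_sec):
    """Units-per-second rate from deduplicated units -- no text or ASR involved."""
    return len(dedupe_units(units)) / duration_sec

def target_unit_budget(rate, target_duration_sec):
    """Unit count the translation should emit to match the source pace."""
    return round(rate * target_duration_sec)
```

For example, the unit sequence `[3, 3, 5, 5, 5, 2]` deduplicates to three units; over 1.5 seconds that is a rate of 2 units/s, so a 3-second target utterance would get a budget of 6 units. In the paper this kind of rate signal conditions the discrete diffusion translation model rather than being applied as a post-hoc constraint.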

@article{choi2025_2505.20899,
  title={Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing},
  author={Jeongsoo Choi and Jaehun Kim and Joon Son Chung},
  journal={arXiv preprint arXiv:2505.20899},
  year={2025}
}