Enabling Real-Time Conversations with Minimal Training Costs

18 September 2024

Wang Xu

Shuo Wang

Weilin Zhao

Xu Han

Yukun Yan

Yudi Zhang

Zhe Tao

Zhiyuan Liu

Wanxiang Che

Abstract

Large language models (LLMs) have demonstrated the ability to improve human efficiency through conversational interactions. Conventional LLM-powered dialogue systems operate on a turn-based paradigm, which precludes real-time interaction while a response is being generated. To address this limitation, researchers have proposed duplex models, which can dynamically adapt to user input and so provide real-time interactive feedback. However, existing duplex methods typically require substantial computational resources to acquire this ability. To reduce this overhead, this paper presents a new duplex decoding approach that equips LLMs with duplex ability while requiring minimal additional training. Specifically, our method decodes the queries and responses of a conversation in parallel, effectively implementing a channel-division-multiplexing decoding strategy. Experimental results indicate that the proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
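
The abstract gives no implementation details, so the following is only a toy illustration of the general idea of handling a query channel and a response channel in the same decoding loop; it is not the paper's method. The names `duplex_decode`, `toy_respond`, `heard`, and `said` are hypothetical, and the "model" is a hand-written rule rather than an LLM. At each step the loop reads at most one incoming user token and emits at most one response token, so the response can change course while the user is still speaking.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Step = Tuple[int, Optional[str], Optional[str]]


def duplex_decode(
    user_stream: Sequence[Optional[str]],
    respond: Callable[[List[str], List[str]], Optional[str]],
) -> List[Step]:
    """Interleave an input channel and an output channel step by step.

    user_stream[t] is the user token arriving at step t (None means silence).
    respond(heard, said) returns the next response token, or None to stay silent.
    Both the function and its signature are assumptions made for illustration.
    """
    heard: List[str] = []  # input channel: user tokens consumed so far
    said: List[str] = []   # output channel: response tokens emitted so far
    trace: List[Step] = []
    for t, incoming in enumerate(user_stream):
        if incoming is not None:
            heard.append(incoming)
        # The response decision sees everything heard up to and including step t.
        out = respond(heard, said)
        if out is not None:
            said.append(out)
        trace.append((t, incoming, out))
    return trace


def toy_respond(heard: List[str], said: List[str]) -> Optional[str]:
    """Hypothetical stand-in for a model: fall silent once the user says 'stop'."""
    if "stop" in heard:
        return None
    return f"tok{len(said)}"


# The user interrupts at step 3, and the response halts at that same step.
trace = duplex_decode(["hi", None, None, "stop", None], toy_respond)
for step in trace:
    print(step)
```

In a turn-based loop the response would run to completion before the interruption was read; here the two channels advance together, which is the behaviour the abstract attributes to duplex models.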
