arXiv: 2505.12654
Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
Versions: v1, v2 (latest)

19 May 2025
Yuxin Lin
Yinglin Zheng
Ming Zeng
Wangzheng Shi

Papers cited by "Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals"

21 papers

Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions
JiWoo Kim
Minsuk Chang
Jinyeong Bak
30 Jan 2025

Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Bandhav Veluri
Benjamin Peloquin
Bokai Yu
Hongyu Gong
Shyamnath Gollakota
23 Sep 2024

Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Hervé Jégou
Edouard Grave
Neil Zeghidour
17 Sep 2024

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion
Jinhan Wang
Long Chen
Aparna Khare
A. Raju
Pranav Dheram
Di He
Minhua Wu
A. Stolcke
Venkatesh Ravichandran
26 Jan 2024

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
01 Mar 2023

Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Hongji Wang
Che-Yuan Liang
Shuai Wang
Zhengyang Chen
Binbin Zhang
Xu Xiang
Yan Deng
Y. Qian
31 Oct 2022

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Jingyi Li
Weiping Tu
Li Xiao
27 Oct 2022

Turn-Taking Prediction for Natural Conversational Speech
Shuo-yiin Chang
Yue Liu
Tara N. Sainath
Chaoyang Zhang
Trevor Strohman
Qiao Liang
Yanzhang He
29 Aug 2022

Voice Activity Projection: Self-supervised Learning of Turn-taking Events
Erik Ekstedt
Gabriel Skantze
19 May 2022

Gated Multimodal Fusion with Contrastive Learning for Turn-taking Prediction in Human-robot Dialogue
Jiudong Yang
Pei-Hsin Wang
Yi Zhu
Mingchao Feng
Meng Chen
Xiaodong He
18 Apr 2022

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
23 Mar 2022

Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
14 Jul 2021

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
14 Jun 2021

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
22 Oct 2020

TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Erik Ekstedt
Gabriel Skantze
21 Oct 2020

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
20 Jun 2020

RetinaFace: Single-stage Dense Face Localisation in the Wild
Jiankang Deng
Jiaxin Guo
Yuxiang Zhou
Jinke Yu
I. Kotsia
Stefanos Zafeiriou
02 May 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
11 Oct 2018

Efficient Low-rank Multimodal Fusion with Modality-Specific Factors
Zhun Liu
Ying Shen
V. Lakshminarasimhan
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
31 May 2018

DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset
Yanran Li
Hui Su
Xiaoyu Shen
Wenjie Li
Ziqiang Cao
Shuzi Niu
11 Oct 2017

Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Min Zhang
Louis-Philippe Morency
23 Jul 2017