3M-TRANSFORMER: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction

23 October 2023

Papers citing "3M-TRANSFORMER: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction"

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
Junhyeok Kim, Min Soo Kim, Jiwan Chung, Jungbin Cho, Jisoo Kim, Sungwoong Kim, Gyeongbo Sim, Youngjae Yu
EgoV | 60 | 0 | 0 | 17 Feb 2025

Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?
Vandana Rajan, Alessio Brutti, Andrea Cavallaro
35 | 33 | 0 | 18 Feb 2022

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ViT | 283 | 1,984 | 0 | 09 Feb 2021