Listen Then See: Video Alignment with Speaker Attention

21 April 2024

Papers citing "Listen Then See: Video Alignment with Speaker Attention"

Title
Long Video Understanding with Learnable Retrieval in Video-Language Models
Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu
107 | 7 | 0 | 24 Feb 2025

Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
Leena Mathur, Marian Qian, Paul Pu Liang, Louis-Philippe Morency
LRM | 154 | 1 | 0 | 21 Feb 2025

Powerset multi-class cross entropy loss for neural speaker diarization
Alexis Plaquet, H. Bredin
106 | 91 | 0 | 19 Oct 2023

Learning Interactions and Relationships between Movie Characters
Anna Kukleva, Makarand Tapaswi, Ivan Laptev
38 | 51 | 0 | 29 Mar 2020

pyannote.audio: neural building blocks for speaker diarization
H. Bredin, Ruiqing Yin, Juan Manuel Coria, G. Gelly, Pavel Korshunov, Marvin Lavechin, D. Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill
191 | 312 | 0 | 04 Nov 2019