VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

29 May 2024

Elias Stengel-Eskin

Gedas Bertasius

Mohit Bansal

Papers citing "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

4 / 54 papers shown

Title | Authors | Tags | Counts (unlabeled in source) | Date
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li-ming Yuan | VLM, MLLM | 194, 591, 0 | 16 Nov 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Joey Tianyi Zhou | (none) | 98, 44, 0 | 18 Sep 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi | VLM, MLLM | 270, 4,244, 0 | 30 Jan 2023
Is Space-Time Attention All You Need for Video Understanding? | Gedas Bertasius, Heng Wang, Lorenzo Torresani | ViT | 280, 1,982, 0 | 09 Feb 2021