AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary
Alignment for Temporal Referential Dialogue

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

24 March 2024

Papers citing "AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue"

9 / 9 papers shown

Title
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs Sanjoy Chowdhury Hanan Gani Nishit Anand Sayan Nag Ruohan Gao Mohamed Elhoseiny Salman Khan Dinesh Manocha LRM 54 0 0 29 Mar 2025
VTimeLLM: Empower LLM to Grasp Video Moments Bin Huang Xin Wang Hong Chen Zihan Song Wenwu Zhu MLLM 89 113 0 30 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Bin Lin Yang Ye Bin Zhu Jiaxi Cui Munan Ning Peng Jin Li-ming Yuan VLM MLLM 194 591 0 16 Nov 2023
LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning Yunlong Tang Jinrui Zhang Xiangchen Wang Teng Wang Feng Zheng VLM 68 9 0 17 Jun 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls Teng Wang Jinrui Zhang Junjie Fei Hao Zheng Yunlong Tang Zhe Li Mingqi Gao Shanshan Zhao MLLM 102 82 0 04 May 2023
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System Junke Wang Dongdong Chen Chong Luo Xiyang Dai Lu Yuan Zuxuan Wu Yu-Gang Jiang 95 54 0 27 Apr 2023
Exploiting Context Information for Generic Event Boundary Captioning Jinrui Zhang Teng Wang Feng Zheng Ran Cheng Ping Luo 98 5 0 03 Jul 2022
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning Jing Bi Jiebo Luo Chenliang Xu 76 48 0 05 Oct 2021
Generic Event Boundary Detection: A Benchmark for Event Segmentation Mike Zheng Shou Stan Weixian Lei Weiyao Wang Deepti Ghadiyaram Matt Feiszli VOS 85 76 0 26 Jan 2021