VideoQA in the Era of LLMs: An Empirical Study (arXiv:2408.04223)
8 August 2024
Junbin Xiao, Nanxin Huang, Hangyu Qin, Dongyang Li, Yicong Li, Fengbin Zhu, Zhulin Tao, Jianxing Yu, Liang Lin, Tat-Seng Chua, Angela Yao

Papers citing "VideoQA in the Era of LLMs: An Empirical Study"

13 / 13 papers shown
1. Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
   Mohammad Saleh, Azadeh Tabatabaei · 14 Apr 2025

2. MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
   Yibo Yan, Shen Wang, Jiahao Huo, Philip S. Yu, Xuming Hu, Qingsong Wen · 23 Mar 2025

3. On the Consistency of Video Large Language Models in Temporal Comprehension
   Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao · 20 Nov 2024

4. Question-Answering Dense Video Events
   Hangyu Qin, Junbin Xiao, Angela Yao · 06 Sep 2024 [VLM]

5. Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
   Benlin Liu, Yuhao Dong, Yiqin Wang, Yongming Rao, Yansong Tang, Wei-Chiu Ma, Ranjay Krishna · 01 Aug 2024 [LRM]

6. Hallucination of Multimodal Large Language Models: A Survey
   Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou · 29 Apr 2024 [VLM, LRM]

7. An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
   Wonkyun Kim, Changin Choi, Wonseok Lee, Wonjong Rhee · 27 Mar 2024 [VLM]

8. VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
   Yue Fan, Xiaojian Ma, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li · 18 Mar 2024 [VLM, LLMAG]

9. VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
   Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou · 29 Nov 2023

10. Perception Test: A Diagnostic Benchmark for Multimodal Video Models
    Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, L. Markeeva, ..., Y. Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira · 23 May 2023 [VLM]

11. Test of Time: Instilling Video-Language Models with a Sense of Time
    Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek · 05 Jan 2023

12. Large Language Models are Zero-Shot Reasoners
    Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa · 24 May 2022 [ReLM, LRM]

13. Ego4D: Around the World in 3,000 Hours of Egocentric Video
    Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik · 13 Oct 2021 [EgoV]