2405.09713
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
15 May 2024
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM
RALM
Papers citing "SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge"
18 / 18 papers shown
Title
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
27
0
0
08 Apr 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
42
5
0
23 Mar 2025
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?
Xuanming Cui
Jaiminkumar Ashokbhai Bhoi
Chionh Wei Peng
Adriel Kuek
Ser-Nam Lim
48
0
0
20 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
59
0
0
19 Mar 2025
Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models
Boyu Jia
Junzhe Zhang
Huixuan Zhang
Xiaojun Wan
LRM
46
1
0
03 Mar 2025
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Mian
Joey Tianyi Zhou
Chen Chen
LRM
68
1
0
15 Nov 2024
Mars: Situated Inductive Reasoning in an Open-World Environment
Xiaojuan Tang
Jiaqi Li
Yitao Liang
Song-chun Zhu
Muhan Zhang
Zilong Zheng
LM&Ro
LRM
LLMAG
34
1
0
10 Oct 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
42
9
0
21 Sep 2024
"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration
D. Bohus
Sean Andrist
Yuwei Bao
Eric Horvitz
Ann Paradiso
35
0
0
30 Aug 2024
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
50
20
0
28 Aug 2024
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
Lu Zhang
Tiancheng Zhao
Heting Ying
Yibo Ma
Kyusong Lee
LLMAG
38
9
0
24 Jun 2024
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
54
57
0
29 May 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
Zhicheng Zheng
Xin Yan
Zhenfang Chen
Jingzhou Wang
Qin Zhi Eddie Lim
Joshua B. Tenenbaum
Chuang Gan
LRM
43
6
0
09 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
79
4
0
08 Feb 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
54
84
0
29 Dec 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
322
3,021
0
22 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
287
4,261
0
30 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022