Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.16620
Cited By
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
24 June 2024
Lu Zhang
Tiancheng Zhao
Heting Ying
Yibo Ma
Kyusong Lee
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer"
11 / 11 papers shown
Title
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo
Ziyang Chen
Shaoguang Wang
Jianxiang He
Yijie Xu
Jinhui Ye
Ying Sun
Hui Xiong
79
4
0
17 Mar 2025
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu
Longxiang Tang
Bohao Peng
Senqiao Yang
Bei Yu
Jiaya Jia
VLM
439
2
0
16 Mar 2025
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
125
67
0
29 May 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
94
64
0
18 Mar 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
117
95
0
29 Dec 2023
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Tu Vu
Mohit Iyyer
Xuezhi Wang
Noah Constant
Jerry W. Wei
...
Chris Tar
Yun-hsuan Sung
Denny Zhou
Quoc Le
Thang Luong
KELM
HILM
LRM
97
216
0
05 Oct 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM
KELM
LRM
99
386
0
20 Mar 2023
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
195
1,954
0
16 Aug 2021
Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog
Jiaping Zhang
Tiancheng Zhao
Zhou Yu
46
40
0
08 May 2018
Reading Wikipedia to Answer Open-Domain Questions
Danqi Chen
Adam Fisch
Jason Weston
Antoine Bordes
RALM
112
2,015
0
31 Mar 2017
DialPort: Connecting the Spoken Dialog Research Community to Real User Data
Tiancheng Zhao
Kyusong Lee
M. Eskénazi
84
22
0
08 Jun 2016
1