arXiv:2204.00486
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
1 April 2022
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
Papers citing
"GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval"
19 / 19 papers shown
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
55
0
0
31 Mar 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
Xiao Wang
Jingyun Hua
Weihong Lin
Yize Zhang
Fuzheng Zhang
Jianlong Wu
Di Zhang
Liqiang Nie
VLM
90
0
0
28 Feb 2025
Leveraging ChatGPT for Sponsored Ad Detection and Keyword Extraction in YouTube Videos
Brice Valentin Kok-Shun
Johnny Chan
DeLMO
66
0
0
24 Feb 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Yongxu Liu
Chen Zhao
Arman Cohan
66
5
0
21 Jan 2025
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
45
0
0
09 Aug 2024
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
54
4
0
13 Jun 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang
Shichao Dong
Yapeng Zhu
Kelu Yao
Weidong Zhao
Chao Li
Ping Luo
CoGe
LRM
55
2
0
27 May 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
68
25
0
18 Apr 2024
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
VGen
47
11
0
24 Mar 2024
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection
Sourabh Vasant Gothe
Vibhav Agarwal
Sourav Ghosh
Jayesh Rajkumar Vachhani
Pranay Kashyap
Barath Raj Kandur
41
2
0
15 Feb 2024
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Jiaming Zhou
Junwei Liang
Kun-Yu Lin
Jinrui Yang
Wei-Shi Zheng
VLM
26
8
0
22 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
78
88
0
29 Dec 2023
Human-centric Behavior Description in Videos: New Benchmark and Model
Lingru Zhou
Yi-Meng Gao
Manqing Zhang
Peng Wu
Peng Wang
Yanning Zhang
40
1
0
04 Oct 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
32
26
0
25 Sep 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
40
72
0
14 Jun 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu
Weixi Feng
Tsu-Jui Fu
Wenhu Chen
Wenjie Wang
VLM
48
10
0
23 May 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
49
45
0
25 Mar 2023
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
324
117
0
24 Feb 2021
Generic Event Boundary Detection: A Benchmark for Event Segmentation
Mike Zheng Shou
Stan Weixian Lei
Weiyao Wang
Deepti Ghadiyaram
Matt Feiszli
VOS
95
76
0
26 Jan 2021