ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.00486
  4. Cited By
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

1 April 2022
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
ArXivPDFHTML

Papers citing "GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval"

19 / 19 papers shown
Title
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
55
0
0
31 Mar 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
Xiao Wang
Jingyun Hua
Weihong Lin
Yize Zhang
Fuzheng Zhang
Jianlong Wu
Di Zhang
Liqiang Nie
VLM
90
0
0
28 Feb 2025
Leveraging ChatGPT for Sponsored Ad Detection and Keyword Extraction in YouTube Videos
Leveraging ChatGPT for Sponsored Ad Detection and Keyword Extraction in YouTube Videos
Brice Valentin Kok-Shun
Johnny Chan
DeLMO
66
0
0
24 Feb 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Yongxu Liu
Chen Zhao
Arman Cohan
66
5
0
21 Jan 2025
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
45
0
0
09 Aug 2024
Comparison Visual Instruction Tuning
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
54
4
0
13 Jun 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a
  Game-Theoretic View
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang
Shichao Dong
Yapeng Zhu
Kelu Yao
Weidong Zhao
Chao Li
Ping Luo
CoGe
LRM
55
2
0
27 May 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt
  Instruction Tuning
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
68
25
0
18 Apr 2024
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary
  Alignment for Temporal Referential Dialogue
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
VGen
47
11
0
24 Mar 2024
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised
  Generic Event Boundary Detection
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection
Sourabh Vasant Gothe
Vibhav Agarwal
Sourav Ghosh
Jayesh Rajkumar Vachhani
Pranay Kashyap
Barath Raj Kandur
41
2
0
15 Feb 2024
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot
  Action Recognition
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Jiaming Zhou
Junwei Liang
Kun-Yu Lin
Jinrui Yang
Wei-Shi Zheng
VLM
26
8
0
22 Jan 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
78
88
0
29 Dec 2023
Human-centric Behavior Description in Videos: New Benchmark and Model
Human-centric Behavior Description in Videos: New Benchmark and Model
Lingru Zhou
Yi-Meng Gao
Manqing Zhang
Peng Wu
Peng Wang
Yanning Zhang
40
1
0
04 Oct 2023
VidChapters-7M: Video Chapters at Scale
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
32
26
0
25 Sep 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute,
  Inspect, and Learn
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
40
72
0
14 Jun 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu
Weixi Feng
Tsu-Jui Fu
Wenhu Chen
Wenjie Wang
VLM
48
10
0
23 May 2023
Equivariant Similarity for Vision-Language Foundation Models
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
49
45
0
25 Mar 2023
A Straightforward Framework For Video Retrieval Using CLIP
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
324
117
0
24 Feb 2021
Generic Event Boundary Detection: A Benchmark for Event Segmentation
Generic Event Boundary Detection: A Benchmark for Event Segmentation
Mike Zheng Shou
Stan Weixian Lei
Weiyao Wang
Deepti Ghadiyaram
Matt Feiszli
VOS
95
76
0
26 Jan 2021
1