2003.12058
Cited By
Grounded Situation Recognition
26 March 2020
Sarah M Pratt
Mark Yatskar
Luca Weihs
Ali Farhadi
Aniruddha Kembhavi
Papers citing
"Grounded Situation Recognition"
22 / 22 papers shown
Title
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Zehan Li
Lefei Zhang
Peijie Wang
56
0
0
17 Feb 2025
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
221
0
0
20 Jan 2025
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
209
1
0
30 Oct 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
32
1
0
30 Jul 2024
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie-jin Yang
Bingliang Li
Ailing Zeng
L. Zhang
Ruimao Zhang
VLM
32
8
0
11 Jun 2024
GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling
Hritik Bansal
Po-Nien Kung
P. Brantingham
Weisheng Wang
Miao Zheng
VLM
34
1
0
07 Apr 2024
In Defense of Structural Symbolic Representation for Video Event-Relation Prediction
Andrew Lu
Xudong Lin
Yulei Niu
Shih-Fu Chang
32
2
0
06 Jan 2023
VASR: Visual Analogies of Situation Recognition
Yonatan Bitton
Ron Yosef
Eli Strugo
Dafna Shahaf
Roy Schwartz
Gabriel Stanovsky
25
21
0
08 Dec 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
56
70
0
21 Nov 2022
Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh
Vincent Chen
Ting-Hao Huang
Lun-Wei Ku
CoGe
18
7
0
14 Nov 2022
Video Event Extraction via Tracking Visual States of Arguments
Guang Yang
Manling Li
Jiajie Zhang
Xudong Lin
Shih-Fu Chang
Heng Ji
32
9
0
03 Nov 2022
Grounded Video Situation Recognition
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
37
13
0
19 Oct 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
16
34
0
18 Aug 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
45
91
0
01 Jul 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
74
393
0
17 Jun 2022
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Hammad A. Ayyubi
Christopher Thomas
Lovish Chum
R. Lokesh
Long Chen
...
Xudong Lin
Xuande Feng
Jaywon Koo
Sounak Ray
Shih-Fu Chang
AI4TS
31
0
0
14 Jun 2022
Detecting the Role of an Entity in Harmful Memes: Techniques and Their Limitations
R. N. Nandi
Firoj Alam
Preslav Nakov
22
6
0
09 May 2022
Collaborative Transformers for Grounded Situation Recognition
Junhyeong Cho
Youngseok Yoon
Suha Kwak
ViT
27
25
0
30 Mar 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
33
154
0
11 Feb 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
27
123
0
13 Jan 2022
Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers
Dinesh Kumar Vishwakarma
Mayank Jindal
Ayush Mittal
Aditya Sharma
6
5
0
14 Sep 2021
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Long Chen
Zhihong Jiang
Jun Xiao
Wei Liu
30
74
0
22 Mar 2021