Grounded Situation Recognition


26 March 2020
Sarah M Pratt
Mark Yatskar
Luca Weihs
Ali Farhadi
Aniruddha Kembhavi
arXiv: 2003.12058

Papers citing "Grounded Situation Recognition"

22 / 22 papers shown
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Zehan Li
Lefei Zhang
Peijie Wang
56
0
0
17 Feb 2025
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
221
0
0
20 Jan 2025
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
209
1
0
30 Oct 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
32
1
0
30 Jul 2024
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie-jin Yang
Bingliang Li
Ailing Zeng
L. Zhang
Ruimao Zhang
VLM
32
8
0
11 Jun 2024
GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling
Hritik Bansal
Po-Nien Kung
P. Brantingham
Weisheng Wang
Miao Zheng
VLM
34
1
0
07 Apr 2024
In Defense of Structural Symbolic Representation for Video Event-Relation Prediction
Andrew Lu
Xudong Lin
Yulei Niu
Shih-Fu Chang
32
2
0
06 Jan 2023
VASR: Visual Analogies of Situation Recognition
Yonatan Bitton
Ron Yosef
Eli Strugo
Dafna Shahaf
Roy Schwartz
Gabriel Stanovsky
25
21
0
08 Dec 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
56
70
0
21 Nov 2022
Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh
Vincent Chen
Ting-Hao Huang
Lun-Wei Ku
CoGe
18
7
0
14 Nov 2022
Video Event Extraction via Tracking Visual States of Arguments
Guang Yang
Manling Li
Jiajie Zhang
Xudong Lin
Shih-Fu Chang
Heng Ji
32
9
0
03 Nov 2022
Grounded Video Situation Recognition
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
37
13
0
19 Oct 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
16
34
0
18 Aug 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
45
91
0
01 Jul 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
74
393
0
17 Jun 2022
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Hammad A. Ayyubi
Christopher Thomas
Lovish Chum
R. Lokesh
Long Chen
...
Xudong Lin
Xuande Feng
Jaywon Koo
Sounak Ray
Shih-Fu Chang
AI4TS
31
0
0
14 Jun 2022
Detecting the Role of an Entity in Harmful Memes: Techniques and Their Limitations
R. N. Nandi
Firoj Alam
Preslav Nakov
22
6
0
09 May 2022
Collaborative Transformers for Grounded Situation Recognition
Junhyeong Cho
Youngseok Yoon
Suha Kwak
ViT
27
25
0
30 Mar 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
33
154
0
11 Feb 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
27
123
0
13 Jan 2022
Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers
Dinesh Kumar Vishwakarma
Mayank Jindal
Ayush Mittal
Aditya Sharma
6
5
0
14 Sep 2021
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Long Chen
Zhihong Jiang
Jun Xiao
Wei Liu
30
74
0
22 Mar 2021