Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1811.10830
Cited By
From Recognition to Cognition: Visual Commonsense Reasoning
27 November 2018
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"From Recognition to Cognition: Visual Commonsense Reasoning"
50 / 587 papers shown
Title
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
32
4
0
21 Nov 2023
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Xiaotian Han
Quanzeng You
Yongfei Liu
Wentao Chen
Huangjie Zheng
...
Yiqi Wang
Bohan Zhai
Jianbo Yuan
Heng Wang
Hongxia Yang
ReLM
LRM
ELM
97
9
0
20 Nov 2023
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task
Yifan Wu
Pengchuan Zhang
Wenhan Xiong
Barlas Oğuz
James C. Gee
Yixin Nie
LRM
29
16
0
15 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
24
9
0
15 Nov 2023
Using Natural Language Explanations to Improve Robustness of In-context Learning
Xuanli He
Yuxiang Wu
Oana-Maria Camburu
Pasquale Minervini
Pontus Stenetorp
AAML
31
1
0
13 Nov 2023
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering
Yunxin Li
Longyue Wang
Baotian Hu
Xinyu Chen
Wanqi Zhong
Chenyang Lyu
Wei Wang
Min Zhang
ELM
32
21
0
13 Nov 2023
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
.Ilker Kesen
Andrea Pedrotti
Mustafa Dogan
Michele Cafagna
Emre Can Acikgoz
...
Iacer Calixto
Anette Frank
Albert Gatt
Aykut Erdem
Erkut Erdem
41
15
0
13 Nov 2023
Visual Commonsense based Heterogeneous Graph Contrastive Learning
Zongzhao Li
Xiangyu Zhu
Xi Zhang
Zhaoxiang Zhang
Zhen Lei
24
1
0
11 Nov 2023
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
Calvin Luo
Boqing Gong
Ting Chen
Chen Sun
OCL
ObjD
24
1
0
10 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
23
5
0
09 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
44
732
0
09 Nov 2023
Agent Lumos: Unified and Modular Training for Open-Source Language Agents
Da Yin
Faeze Brahman
Abhilasha Ravichander
Khyathi Raghavi Chandu
Kai-Wei Chang
Yejin Choi
Bill Yuchen Lin
LLMAG
43
36
0
09 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
48
52
0
08 Nov 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
Te-Lin Wu
Zi-Yi Dou
Qingyuan Hu
Yu Hou
Nischal Reddy Chandra
Marjorie Freedman
R. Weischedel
Nanyun Peng
39
5
0
02 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
46
36
0
01 Nov 2023
ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense
Kankan Zhou
Eason Lai
Wei Bin Au Yeong
K. Mouratidis
Jing Jiang
ReLM
LRM
VLM
25
19
0
30 Oct 2023
Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
Hejie Cui
Xinyu Fang
Zihan Zhang
Ran Xu
Xuan Kan
Xin Liu
Yue Yu
Manling Li
Yangqiu Song
Carl Yang
VLM
28
4
0
28 Oct 2023
Impressions: Understanding Visual Semiotics and Aesthetic Impact
Julia Kruk
Caleb Ziems
Diyi Yang
32
2
0
27 Oct 2023
V
D
\mathbb{VD}
VD
-
G
R
\mathbb{GR}
GR
: Boosting
V
\mathbb{V}
V
isual
D
\mathbb{D}
D
ialog with Cascaded Spatial-Temporal Multi-Modal
G
R
\mathbb{GR}
GR
aphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
32
3
0
25 Oct 2023
GD-COMET: A Geo-Diverse Commonsense Inference Model
Mehar Bhatia
Vered Shwartz
24
6
0
23 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
41
50
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noal Codella
Kai-Wei Chang
Shih-Fu Chang
30
3
0
23 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
18
0
0
22 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
30
5
0
17 Oct 2023
Automated Natural Language Explanation of Deep Visual Neurons with Large Models
Chenxu Zhao
Wei Qian
Yucheng Shi
Mengdi Huai
Ninghao Liu
32
2
0
16 Oct 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Seungju Han
Junhyeok Kim
Jack Hessel
Liwei Jiang
Jiwan Chung
Yejin Son
Yejin Choi
Youngjae Yu
21
2
0
16 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Keli Zhang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
118
0
16 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM
VLM
38
34
0
13 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
Yue Jiang
Yingya Zhang
Jing Dong
31
2
0
12 Oct 2023
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Junyu Lu
Di Zhang
Xiaojun Wu
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
Yan Song
Pingjian Zhang
VLM
MLLM
17
7
0
12 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD
MLLM
VLM
42
301
0
11 Oct 2023
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
KAI-QING Zhou
Kwonjoon Lee
Teruhisa Misu
Xin Eric Wang
LRM
34
3
0
09 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
30
13
0
04 Oct 2023
On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study
Liben Chen
Long Chen
Tian Ellison-Chen
Zhuoyuan Xu
LRM
22
0
0
04 Oct 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
43
509
0
03 Oct 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yu Liu
Bing Qin
Ting Liu
LRM
AI4CE
37
153
0
27 Sep 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
28
133
0
14 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
36
25
0
08 Sep 2023
Interpretable Visual Question Answering via Reasoning Supervision
Maria Parelli
Dimitrios Mallis
Markos Diomataris
Vassilis Pitsikalis
LRM
30
2
0
07 Sep 2023
A Survey on Interpretable Cross-modal Reasoning
Dizhan Xue
Shengsheng Qian
Zuyi Zhou
Changsheng Xu
LRM
29
4
0
05 Sep 2023
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
Joshua Forster Feinglass
Yezhou Yang
27
2
0
01 Sep 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
32
5
0
30 Aug 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
39
9
0
29 Aug 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Chi Chen
Ruoyu Qin
Fuwen Luo
Xiaoyue Mi
Peng Li
Maosong Sun
Yang Liu
MLLM
VLM
24
45
0
25 Aug 2023
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Zichao Dong
Weikun Zhang
Xufeng Huang
Hang Ji
Xin Zhan
Junbo Chen
VLM
21
4
0
24 Aug 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLM
MLLM
24
56
0
23 Aug 2023
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen
Yao Zhang
Denis Krompass
Jindong Gu
Volker Tresp
FedML
67
39
0
21 Aug 2023
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
Fawaz Sammani
Nikos Deligiannis
13
5
0
17 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
31
77
0
12 Aug 2023
Explainable AI applications in the Medical Domain: a systematic review
Nicoletta Prentzas
A. Kakas
Constantinos S. Pattichis
21
11
0
10 Aug 2023
Previous
1
2
3
4
5
...
10
11
12
Next