1811.10830
From Recognition to Cognition: Visual Commonsense Reasoning
27 November 2018
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
Papers citing "From Recognition to Cognition: Visual Commonsense Reasoning"
50 / 587 papers shown
Title
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
23
1
0
13 Oct 2022
CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm
Hongming Zhang
Yintong Huo
Yanai Elazar
Yangqiu Song
Yoav Goldberg
Dan Roth
LRM
33
3
0
12 Oct 2022
Understanding Embodied Reference with Touch-Line Transformer
Yong Li
Xiaoxue Chen
Hao Zhao
Jiangtao Gong
Guyue Zhou
Federico Rossano
Yixin Zhu
160
16
0
11 Oct 2022
ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities
Terry Yue Zhuo
Yaqing Liao
Yuecheng Lei
Lizhen Qu
Gerard de Melo
Xiaojun Chang
Yazhou Ren
Zenglin Xu
42
2
0
11 Oct 2022
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
118
79
0
11 Oct 2022
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
Meera Hahn
James M. Rehg
LM&Ro
40
4
0
10 Oct 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Baoxiong Jia
Ting Lei
Song-Chun Zhu
Siyuan Huang
EgoV
30
62
0
08 Oct 2022
Domain-Unified Prompt Representations for Source-Free Domain Generalization
Hongjing Niu
Hanting Li
Feng Zhao
Bin Li
VLM
67
18
0
29 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Shikhar Singh
Ehsan Qasemi
Muhao Chen
46
6
0
15 Sep 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
74
7
0
14 Sep 2022
MaXM: Towards Multilingual Visual Question Answering
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
33
8
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
18
62
0
07 Sep 2022
ILLUME: Rationalizing Vision-Language Models through Human Interactions
Manuel Brack
P. Schramowski
Bjorn Deiseroth
Kristian Kersting
VLM
MLLM
27
3
0
17 Aug 2022
Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
VLM
34
50
0
17 Aug 2022
Self-Contained Entity Discovery from Captioned Videos
M. Ayoughi
P. Mettes
Paul T. Groth
28
2
0
13 Aug 2022
Visual Recognition by Request
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
16
15
0
28 Jul 2022
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Yang Liu
Guanbin Li
Liang Lin
LRM
36
80
0
26 Jul 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Joey Tianyi Zhou
Gabriel Stanovsky
Roy Schwartz
25
19
0
25 Jul 2022
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
Qian Yang
Yunxin Li
Baotian Hu
Lin Ma
Yuxin Ding
Min Zhang
30
10
0
23 Jul 2022
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
Xiaoping Han
Licheng Yu
Xiatian Zhu
Li Zhang
Yi-Zhe Song
Tao Xiang
AI4TS
16
49
0
17 Jul 2022
Reasoning about Actions over Visual and Linguistic Modalities: A Survey
Shailaja Keyur Sampat
Maitreya Patel
Subhasish Das
Yezhou Yang
Chitta Baral
ReLM
LM&Ro
LRM
27
12
0
15 Jul 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination
Hyounghun Kim
Abhaysinh Zala
Joey Tianyi Zhou
22
6
0
08 Jul 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
45
91
0
01 Jul 2022
Modern Question Answering Datasets and Benchmarks: A Survey
Zhen Wang
44
23
0
30 Jun 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLM
CLL
25
73
0
18 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
74
393
0
17 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
20
81
0
14 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
24
111
0
07 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
16
506
0
03 Jun 2022
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
Bingqian Lin
Yi Zhu
Zicong Chen
Xiwen Liang
Jian-zhuo Liu
Xiaodan Liang
LM&Ro
33
51
0
31 May 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Jiangtong Li
Li Niu
Liqing Zhang
20
49
0
30 May 2022
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
Xintong Yu
Hongming Zhang
Ruixin Hong
Yangqiu Song
Changshui Zhang
17
13
0
29 May 2022
Visual Superordinate Abstraction for Robust Concept Learning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
Dacheng Tao
VLM
25
2
0
28 May 2022
Effective Abstract Reasoning with Dual-Contrast Network
Tao Zhuo
Mohan S. Kankanhalli
16
40
0
27 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
21
12
0
25 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
31
4
0
24 May 2022
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang
Michihiro Yasunaga
Hongyu Ren
Shinya Wada
J. Leskovec
29
17
0
23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
29
38
0
23 May 2022
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Kant
Arun Ramachandran
Sriram Yenamandra
Igor Gilitschenski
Dhruv Batra
Andrew Szot
Harsh Agrawal
LM&Ro
LRM
160
73
0
22 May 2022
What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge
Lovisa Hagström
Richard Johansson
VLM
32
4
0
14 May 2022
Visual Commonsense in Pretrained Unimodal and Multimodal Models
Chenyu Zhang
Benjamin Van Durme
Zhuowan Li
Elias Stengel-Eskin
VLM
SSL
31
39
0
04 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
42
159
0
30 Apr 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
35
2
0
28 Apr 2022
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks
Navid Rezaei
Marek Reformat
VLM
17
2
0
25 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Y. Heo
Eun-Sol Kim
Woo Suk Choi
Byoung-Tak Zhang
29
27
0
22 Apr 2022
Attention in Reasoning: Dataset, Analysis, and Modeling
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
36
3
0
20 Apr 2022
End-to-end Dense Video Captioning as Sequence Generation
Wanrong Zhu
Bo Pang
Ashish V. Thapliyal
William Yang Wang
Radu Soricut
DiffM
19
32
0
18 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
30
0
0
17 Apr 2022