From Recognition to Cognition: Visual Commonsense Reasoning

27 November 2018
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
    LRM
    BDL
    OCL
    ReLM
arXiv: 1811.10830

Papers citing "From Recognition to Cognition: Visual Commonsense Reasoning"

50 / 587 papers shown
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
23
1
0
13 Oct 2022
CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm
Hongming Zhang
Yintong Huo
Yanai Elazar
Yangqiu Song
Yoav Goldberg
Dan Roth
LRM
33
3
0
12 Oct 2022
Understanding Embodied Reference with Touch-Line Transformer
Yong Li
Xiaoxue Chen
Hao Zhao
Jiangtao Gong
Guyue Zhou
Federico Rossano
Yixin Zhu
160
16
0
11 Oct 2022
ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities
Terry Yue Zhuo
Yaqing Liao
Yuecheng Lei
Lizhen Qu
Gerard de Melo
Xiaojun Chang
Yazhou Ren
Zenglin Xu
42
2
0
11 Oct 2022
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
118
79
0
11 Oct 2022
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
Meera Hahn
James M. Rehg
LM&Ro
40
4
0
10 Oct 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Baoxiong Jia
Ting Lei
Song-Chun Zhu
Siyuan Huang
EgoV
30
62
0
08 Oct 2022
Domain-Unified Prompt Representations for Source-Free Domain Generalization
Hongjing Niu
Hanting Li
Feng Zhao
Bin Li
VLM
67
18
0
29 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Shikhar Singh
Ehsan Qasemi
Muhao Chen
46
6
0
15 Sep 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
74
7
0
14 Sep 2022
MaXM: Towards Multilingual Visual Question Answering
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
33
8
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
18
62
0
07 Sep 2022
ILLUME: Rationalizing Vision-Language Models through Human Interactions
Manuel Brack
P. Schramowski
Bjorn Deiseroth
Kristian Kersting
VLM
MLLM
27
3
0
17 Aug 2022
Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
VLM
34
50
0
17 Aug 2022
Self-Contained Entity Discovery from Captioned Videos
M. Ayoughi
P. Mettes
Paul T. Groth
28
2
0
13 Aug 2022
Visual Recognition by Request
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
16
15
0
28 Jul 2022
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Yang Liu
Guanbin Li
Liang Lin
LRM
36
80
0
26 Jul 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Mohit Bansal
Gabriel Stanovsky
Roy Schwartz
25
19
0
25 Jul 2022
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
Qian Yang
Yunxin Li
Baotian Hu
Lin Ma
Yuxin Ding
Min Zhang
30
10
0
23 Jul 2022
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
Xiaoping Han
Licheng Yu
Xiatian Zhu
Li Zhang
Yi-Zhe Song
Tao Xiang
AI4TS
16
49
0
17 Jul 2022
Reasoning about Actions over Visual and Linguistic Modalities: A Survey
Shailaja Keyur Sampat
Maitreya Patel
Subhasish Das
Yezhou Yang
Chitta Baral
ReLM
LM&Ro
LRM
27
12
0
15 Jul 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination
Hyounghun Kim
Abhaysinh Zala
Mohit Bansal
22
6
0
08 Jul 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
45
91
0
01 Jul 2022
Modern Question Answering Datasets and Benchmarks: A Survey
Zhen Wang
44
23
0
30 Jun 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLM
CLL
25
73
0
18 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
74
393
0
17 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
20
81
0
14 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Mohit Bansal
24
111
0
07 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
16
506
0
03 Jun 2022
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
Bingqian Lin
Yi Zhu
Zicong Chen
Xiwen Liang
Jian-zhuo Liu
Xiaodan Liang
LM&Ro
33
51
0
31 May 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Jiangtong Li
Li Niu
Liqing Zhang
20
49
0
30 May 2022
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
Xintong Yu
Hongming Zhang
Ruixin Hong
Yangqiu Song
Changshui Zhang
17
13
0
29 May 2022
Visual Superordinate Abstraction for Robust Concept Learning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
Dacheng Tao
VLM
25
2
0
28 May 2022
Effective Abstract Reasoning with Dual-Contrast Network
Tao Zhuo
Mohan S. Kankanhalli
16
40
0
27 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
21
12
0
25 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
31
4
0
24 May 2022
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang
Michihiro Yasunaga
Hongyu Ren
Shinya Wada
J. Leskovec
29
17
0
23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
29
38
0
23 May 2022
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Kant
Arun Ramachandran
Sriram Yenamandra
Igor Gilitschenski
Dhruv Batra
Andrew Szot
Harsh Agrawal
LM&Ro
LRM
160
73
0
22 May 2022
What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge
Lovisa Hagström
Richard Johansson
VLM
32
4
0
14 May 2022
Visual Commonsense in Pretrained Unimodal and Multimodal Models
Chenyu Zhang
Benjamin Van Durme
Zhuowan Li
Elias Stengel-Eskin
VLM
SSL
31
39
0
04 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
42
159
0
30 Apr 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
35
2
0
28 Apr 2022
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks
Navid Rezaei
Marek Reformat
VLM
17
2
0
25 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Y. Heo
Eun-Sol Kim
Woo Suk Choi
Byoung-Tak Zhang
29
27
0
22 Apr 2022
Attention in Reasoning: Dataset, Analysis, and Modeling
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
36
3
0
20 Apr 2022
End-to-end Dense Video Captioning as Sequence Generation
Wanrong Zhu
Bo Pang
Ashish V. Thapliyal
William Yang Wang
Radu Soricut
DiffM
19
32
0
18 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
30
0
0
17 Apr 2022