ResearchTrend.AI

From Recognition to Cognition: Visual Commonsense Reasoning

27 November 2018
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
    LRM
    BDL
    OCL
    ReLM

Papers citing "From Recognition to Cognition: Visual Commonsense Reasoning"

50 / 587 papers shown
Plug-and-Play Regulators for Image-Text Matching
Haiwen Diao
Wenjie Qu
Wei Liu
Xiang Ruan
Huchuan Lu
35
20
0
23 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
47
5
0
18 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
29
3
0
18 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
71
14
0
14 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
121
66
0
13 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
32
1
0
13 Mar 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
30
0
0
10 Mar 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLM
LRM
53
614
0
08 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
31
1
0
05 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
32
4
0
04 Mar 2023
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSL
VLM
21
28
0
23 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
45
204
0
20 Feb 2023
Learning by Asking for Embodied Visual Navigation and Task Completion
Ying Shen
Ismini Lourentzou
34
1
0
09 Feb 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
E. Davis
ELM
LRM
24
58
0
09 Feb 2023
Learning to Agree on Vision Attention for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Fan Liu
Liqiang Nie
Mohan S. Kankanhalli
40
10
0
04 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
34
26
0
01 Feb 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
Pengzhen Ren
Changlin Li
Hang Xu
Yi Zhu
Guangrun Wang
Jian-zhuo Liu
Xiaojun Chang
Xiaodan Liang
42
43
0
31 Jan 2023
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning
Jian Zhu
Hanli Wang
Miaojing Shi
LRM
24
4
0
30 Jan 2023
Learning the Effects of Physical Actions in a Multi-modal Environment
Gautier Dagan
Frank Keller
A. Lascarides
LM&Ro
38
3
0
27 Jan 2023
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering
Chenxi Whitehouse
Tillman Weyde
Pranava Madhyastha
LRM
44
3
0
25 Jan 2023
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
Xiaofeng Yang
Fayao Liu
Guosheng Lin
VLM
26
7
0
18 Jan 2023
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Da Yin
Feng Gao
Govind Thattai
Michael F. Johnston
Kai-Wei Chang
VLM
34
15
0
05 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
33
16
0
26 Dec 2022
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
32
110
0
21 Dec 2022
Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models
Lingjun Zhao
Khanh Nguyen
Hal Daumé
ELM
35
6
0
21 Dec 2022
Are Deep Neural Networks SMARTer than Second Graders?
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Kevin A. Smith
J. Tenenbaum
AAML
LRM
ReLM
35
29
0
20 Dec 2022
Reasoning with Language Model Prompting: A Survey
Shuofei Qiao
Yixin Ou
Ningyu Zhang
Xiang Chen
Yunzhi Yao
Shumin Deng
Chuanqi Tan
Fei Huang
Huajun Chen
ReLM
ELM
LRM
71
311
0
19 Dec 2022
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Haoxuan You
Rui Sun
Zhecan Wang
Kai-Wei Chang
Shih-Fu Chang
16
4
0
14 Dec 2022
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Chengzhi Mao
Revant Teotia
Amrutha Sundar
Sachit Menon
Junfeng Yang
Xin Eric Wang
Carl Vondrick
18
29
0
12 Dec 2022
VASR: Visual Analogies of Situation Recognition
Yonatan Bitton
Ron Yosef
Eli Strugo
Dafna Shahaf
Roy Schwartz
Gabriel Stanovsky
25
21
0
08 Dec 2022
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Björn Plüster
Jakob Ambsdorf
Lukas Braach
Jae Hee Lee
S. Wermter
25
6
0
08 Dec 2022
Going Beyond XAI: A Systematic Survey for Explanation-Guided Learning
Yuyang Gao
Siyi Gu
Junji Jiang
S. Hong
Dazhou Yu
Liang Zhao
29
39
0
07 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
19
1
0
02 Dec 2022
What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes
Shivam Sharma
Siddhant Agarwal
Tharun Suresh
Preslav Nakov
Md. Shad Akhtar
Tanmoy Chakraborty
VLM
28
18
0
01 Dec 2022
A Probabilistic-Logic based Commonsense Representation Framework for Modelling Inferences with Multiple Antecedents and Varying Likelihoods
Shantanu Jaiswal
Liu Yan
Dongkyu Choi
Kenneth Kwok
22
0
0
30 Nov 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
27
11
0
29 Nov 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
40
5
0
28 Nov 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
41
14
0
19 Nov 2022
Multi-VQG: Generating Engaging Questions for Multiple Images
Min-Hsuan Yeh
Vincent Chen
Ting-Hao Huang
Lun-Wei Ku
CoGe
18
7
0
14 Nov 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Zhecan Wang
Haoxuan You
Yicheng He
Wenhao Li
Kai-Wei Chang
Shih-Fu Chang
23
5
0
10 Nov 2022
Towards Reasoning-Aware Explainable VQA
Rakesh Vaideeswaran
Feng Gao
Abhinav Mathur
Govind Thattai
LRM
46
3
0
09 Nov 2022
Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
Michele Cafagna
Kees van Deemter
Albert Gatt
CoGe
16
3
0
09 Nov 2022
Going for GOAL: A Resource for Grounded Football Commentaries
Alessandro Suglia
José Lopes
E. Bastianelli
Andrea Vanzo
Shubham Agarwal
Malvina Nikandrou
Lu Yu
Ioannis Konstas
Verena Rieser
33
5
0
08 Nov 2022
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
Maitreya Patel
Tejas Gokhale
Chitta Baral
Yezhou Yang
49
9
0
07 Nov 2022
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge
Sahithya Ravi
Aditya Chinchure
Leonid Sigal
Renjie Liao
Vered Shwartz
37
26
0
24 Oct 2022
DiscoSense: Commonsense Reasoning with Discourse Connectives
Prajjwal Bhargava
Vincent Ng
LRM
164
4
0
22 Oct 2022
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation
Pengfei Li
Beiwen Tian
Yongliang Shi
Xiaoxue Chen
Hao Zhao
Guyue Zhou
Ya Zhang
39
20
0
19 Oct 2022
SafeText: A Benchmark for Exploring Physical Safety in Language Models
Sharon Levy
Emily Allaway
Melanie Subbiah
Lydia B. Chilton
D. Patton
Kathleen McKeown
William Yang Wang
59
40
0
18 Oct 2022
COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti
A. S. Penamakuri
Revant Teotia
Anand Mishra
Shubhashis Sengupta
Roshni Ramnani
ReLM
LRM
25
4
0
16 Oct 2022
SQA3D: Situated Question Answering in 3D Scenes
Xiaojian Ma
Silong Yong
Zilong Zheng
Qing Li
Yitao Liang
Song-Chun Zhu
Siyuan Huang
LM&Ro
33
132
0
14 Oct 2022