ResearchTrend.AI
From Recognition to Cognition: Visual Commonsense Reasoning

27 November 2018
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
    LRM
    BDL
    OCL
    ReLM

Papers citing "From Recognition to Cognition: Visual Commonsense Reasoning"

50 / 587 papers shown
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao
Yutao Hu
Peng Gao
Meng Lei
Kaipeng Zhang
...
Peng Xu
Siyuan Huang
Hongsheng Li
Yuning Qiao
Ping Luo
VLM
MLLM
32
2
0
07 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
60
615
0
04 Aug 2023
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
34
4
0
01 Aug 2023
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Zhihong Chen
Ruifei Zhang
Yibing Song
Xiang Wan
Guanbin Li
24
15
0
21 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
38
3
0
17 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
85
224
0
07 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
28
5
0
06 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
32
3
0
03 Jul 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
54
556
0
23 Jun 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
25
138
0
22 Jun 2023
Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Fabian Paischer
Thomas Adler
M. Hofmarcher
Sepp Hochreiter
26
10
0
15 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
36
159
0
15 Jun 2023
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning
Hengli Li
Songchun Zhu
Zilong Zheng
11
8
0
15 Jun 2023
Toward Grounded Commonsense Reasoning
Minae Kwon
Hengyuan Hu
Vivek Myers
Siddharth Karamcheti
Anca Dragan
Dorsa Sadigh
LM&Ro
ReLM
LRM
42
9
0
14 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
38
7
0
14 Jun 2023
FLamE: Few-shot Learning from Natural Language Explanations
Yangqiaoyu Zhou
Yiming Zhang
Chenhao Tan
LRM
FAtt
30
9
0
13 Jun 2023
V-LoL: A Diagnostic Dataset for Visual Logical Learning
Lukas Helff
Wolfgang Stammer
Hikaru Shindo
Devendra Singh Dhami
Kristian Kersting
NAI
27
3
0
13 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
24
172
0
11 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
25
22
0
09 Jun 2023
Object Detection with Transformers: A Review
Tahira Shehzadi
K. Hashmi
D. Stricker
Muhammad Zeshan Afzal
ViT
MU
23
28
0
07 Jun 2023
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
27
115
0
07 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
46
1
0
04 Jun 2023
Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
A. Vosoughi
Shijian Deng
Songyang Zhang
Yapeng Tian
Chenliang Xu
Jiebo Luo
CML
53
3
0
31 May 2023
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Xingyu Fu
Shenmin Zhang
Gukyeong Kwon
Pramuditha Perera
Henghui Zhu
...
Zhiguo Wang
Vittorio Castelli
Patrick K. L. Ng
Dan Roth
Bing Xiang
29
19
0
30 May 2023
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
Sanjoy Kundu
Shubham Trehan
Sathyanarayanan N. Aakur
LRM
LM&Ro
27
1
0
26 May 2023
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization
Shivam Sharma
S Ramaneswaran
Udit Arora
Md. Shad Akhtar
Tanmoy Chakraborty
38
9
0
25 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
33
8
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
52
43
0
24 May 2023
Preconditioned Visual Language Inference with Weak Supervision
Ehsan Qasemi
Amani Maina-Kilaas
Devadutta Dash
Khalid Alsaggaf
Muhao Chen
25
0
0
22 May 2023
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM
VLM
33
38
0
20 May 2023
An Empirical Study on the Language Modal in Visual Question Answering
Daowan Peng
Wei Wei
Xian-Ling Mao
Yuanyuan Fu
Dangyang Chen
39
4
0
17 May 2023
Explaining black box text modules in natural language with language models
Chandan Singh
Aliyah R. Hsu
Richard Antonello
Shailee Jain
Alexander G. Huth
Bin-Xia Yu
Jianfeng Gao
MILM
34
47
0
17 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
36
22
0
12 May 2023
Egocentric Hierarchical Visual Semantics
L. Erculiani
A. Bontempelli
Andrea Passerini
Fausto Giunchiglia
OCL
24
2
0
09 May 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Yunxin Li
Baotian Hu
Xinyu Chen
Yuxin Ding
Lin Ma
Min Zhang
LRM
48
14
0
08 May 2023
LMEye: An Interactive Perception Network for Large Language Models
Yunxin Li
Baotian Hu
Xinyu Chen
Lin Ma
Yong-mei Xu
Hao Fei
MLLM
VLM
33
24
0
05 May 2023
Fashionpedia-Taste: A Dataset towards Explaining Human Fashion Taste
Mengyun Shi
Serge Belongie
Claire Cardie
29
2
0
03 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
J. Guo
Xueqi Cheng
LRM
64
1
0
03 May 2023
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
J. Guo
Xueqi Cheng
LRM
19
4
0
02 May 2023
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Michele Cafagna
L. Rojas-Barahona
Kees van Deemter
Albert Gatt
FAtt
VLM
17
1
0
28 Apr 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM
MLLM
32
140
0
13 Apr 2023
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
Maria Parelli
Alexandros Delitzas
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
20
50
0
12 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
16
1
0
10 Apr 2023
What's in a Name? Beyond Class Indices for Image Recognition
Kai Han
Yandong Li
S. Vaze
Jie Li
Xuhui Jia
VLM
26
7
0
05 Apr 2023
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu
Xiangqing Shen
Rui Xia
26
5
0
05 Apr 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
24
43
0
31 Mar 2023
IRFL: Image Recognition of Figurative Language
Ron Yosef
Yonatan Bitton
Dafna Shahaf
43
18
0
27 Mar 2023
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu
Jing Li
VLM
26
5
0
27 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
46
44
0
25 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
27
1
0
24 Mar 2023