VCD: A Dataset for Visual Commonsense Discovery in Images

27 February 2024
Xiangqing Shen
Yurun Song
Siwei Wu
Rui Xia

Papers citing "VCD: A Dataset for Visual Commonsense Discovery in Images"

50 / 53 papers shown
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLM, VLM
95
230
0
13 Nov 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM, MLLM
119
88
0
03 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
76
27
0
03 Aug 2023
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
Liang Zhao
En Yu
Zheng Ge
Jinrong Yang
Hao-Ran Wei
...
Jian‐Yuan Sun
Yuang Peng
Runpei Dong
Chunrui Han
Xiangyu Zhang
MLLM, LRM
69
54
0
18 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
419
12,091
0
18 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM, VLM
158
238
0
07 Jul 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
108
651
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM, ObjD, VLM
123
764
0
26 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM, MLLM
132
859
0
23 Jun 2023
ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
Heming Xia
Qingxiu Dong
Lei Li
Jingjing Xu
Tianyu Liu
Ziwei Qin
Zhifang Sui
MLLM, VLM
47
3
0
24 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
152
2,099
0
11 May 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
65
1
0
13 Apr 2023
Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
Deunsol Jung
Sanghyun Kim
Wonhui Kim
Minsu Cho
3DPC, GNN
79
34
0
07 Apr 2023
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu
Xiangqing Shen
Rui Xia
108
4
0
05 Apr 2023
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
Morris Alper
Michael Fiman
Hadar Averbuch-Elor
VLM, LRM
56
15
0
21 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,761
0
15 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM, LRM, ReLM
123
467
0
14 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
191
70
0
13 Mar 2023
Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
Qu Tang
Xiangyu Zhu
Zhen Lei
Zhaoxiang Zhang
OCL
109
8
0
03 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,490
0
27 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
432
4,663
0
30 Jan 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
118
120
0
21 Dec 2022
Iterative Scene Graph Generation with Generative Transformers
Sanjoy Kundu
Sathyanarayanan N. Aakur
ViT
77
28
0
30 Nov 2022
Visually Grounded Commonsense Knowledge Acquisition
Yuan Yao
Tianyu Yu
Ao Zhang
Mengdi Li
Ruobing Xie
...
Zhiyuan Liu
Haitao Zheng
S. Wermter
Tat-Seng Chua
Maosong Sun
SSL
70
6
0
22 Nov 2022
Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths
Xiangqing Shen
Siwei Wu
Rui Xia
67
13
0
14 Oct 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
119
36
0
07 Oct 2022
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang
Michihiro Yasunaga
Hongyu Ren
Shinya Wada
J. Leskovec
73
18
0
23 May 2022
Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
Xiang Chen
Ningyu Zhang
Lei Li
Shumin Deng
Chuanqi Tan
Changliang Xu
Fei Huang
Luo Si
Huajun Chen
100
135
0
04 May 2022
Visual Abductive Reasoning
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
83
40
0
26 Mar 2022
Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
Xiao Liu
Da Yin
Yansong Feng
Dongyan Zhao
LRM
77
46
0
15 Mar 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
211
51
0
10 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM, ObjD
162
884
0
07 Feb 2022
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM, MLLM, MoE
102
559
0
03 Nov 2021
The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color
Cory Paik
Stéphane Aroca-Ouellette
Alessandro Roncone
Katharina Kann
60
34
0
15 Oct 2021
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West
Chandrasekhar Bhagavatula
Jack Hessel
Jena D. Hwang
Liwei Jiang
Ronan Le Bras
Ximing Lu
Sean Welleck
Yejin Choi
SyDa
114
332
0
14 Oct 2021
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
Rowan Zellers
Ari Holtzman
Matthew E. Peters
Roozbeh Mottaghi
Aniruddha Kembhavi
Ali Farhadi
Yejin Choi
97
69
0
01 Jun 2021
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Swaroop Mishra
Daniel Khashabi
Chitta Baral
Hannaneh Hajishirzi
LRM
176
753
0
18 Apr 2021
Visual Distant Supervision for Scene Graph Generation
Yuan Yao
Ao Zhang
Xu Han
Mengdi Li
C. Weber
Zhiyuan Liu
S. Wermter
Maosong Sun
65
39
0
29 Mar 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM, CLIP
144
1,763
0
05 Feb 2021
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
908
42,520
0
28 May 2020
Structured Query-Based Image Retrieval Using Scene Graphs
Brigit Schroeder
Subarna Tripathi
GNN
102
71
0
13 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
160
1,948
0
13 Apr 2020
Unbiased Scene Graph Generation from Biased Training
Kaihua Tang
Yulei Niu
Jianqiang Huang
Jiaxin Shi
Hanwang Zhang
CML
83
701
0
27 Feb 2020
Evaluating Commonsense in Pre-trained Language Models
Xuhui Zhou
Yue Zhang
Leyang Cui
Dandan Huang
AI4MH, LRM
78
185
0
27 Nov 2019
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
Antoine Bosselut
Hannah Rashkin
Maarten Sap
Chaitanya Malaviya
Asli Celikyilmaz
Yejin Choi
82
914
0
12 Jun 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM, BDL, OCL, ReLM
186
883
0
27 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,324
0
11 Oct 2018
AllenNLP: A Deep Semantic Natural Language Processing Platform
Matt Gardner
Joel Grus
Mark Neumann
Oyvind Tafjord
Pradeep Dasigi
Nelson F. Liu
Matthew E. Peters
Michael Schmitz
Luke Zettlemoyer
VLM
97
1,283
0
20 Mar 2018
Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates
Guillem Collell
Luc Van Gool
Marie-Francine Moens
43
42
0
18 Nov 2017
Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation
Ruichi Yu
Ang Li
Vlad I. Morariu
L. Davis
64
312
0
28 Jul 2017