Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
When Brain-inspired AI Meets AGI
Lin Zhao
Lu Zhang
Zihao Wu
Yuzhong Chen
Haixing Dai
...
Xi Jiang
Xiang Li
Dajiang Zhu
Dinggang Shen
Tianming Liu
AI4CE
76
91
0
28 Mar 2023
Zero-Shot Composed Image Retrieval with Textual Inversion
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
A. Bimbo
104
118
0
27 Mar 2023
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu
Jing Li
VLM
62
5
0
27 Mar 2023
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Yuxiao Chen
Jianbo Yuan
Yu Tian
Shijie Geng
Xinyu Li
Ding Zhou
Dimitris N. Metaxas
Hongxia Yang
86
37
0
27 Mar 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
78
8
0
26 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
85
51
0
25 Mar 2023
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun
S. Park
Paul Hongsuck Seo
Jinwoo Shin
VLM
MLLM
111
14
0
25 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
81
1
0
24 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
118
10
0
24 Mar 2023
Feature Separation and Recalibration for Adversarial Robustness
Woo Jae Kim
Y. Cho
Junsik Jung
Sung-eui Yoon
AAML
119
22
0
24 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
94
13
0
23 Mar 2023
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning
Wenqing Wang
Yawei Luo
Zhiqin Chen
Tao Jiang
Lei Chen
Yi Yang
Jun Xiao
78
8
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
88
32
0
23 Mar 2023
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
T. M. Thai
Son T. Luu
86
0
0
22 Mar 2023
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
Ding Jiang
Mang Ye
106
157
0
22 Mar 2023
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
Song Park
Sanghyuk Chun
Byeongho Heo
Wonjae Kim
Sangdoo Yun
VLM
ViT
97
8
0
20 Mar 2023
Location-Free Scene Graph Generation
Ege Özsoy
Felix Holm
Tobias Czempiel
Tobias Czempiel
Benjamin Busam
Nassir Navab
Benjamin Busam
138
4
0
20 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
98
6
0
18 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
105
28
0
16 Mar 2023
Logical Implications for Visual Question Answering Consistency
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
81
9
0
16 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
164
15
0
14 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
74
2
0
14 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
100
106
0
12 Mar 2023
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
Juncheng Li
Minghe Gao
Longhui Wei
Siliang Tang
Wenqiao Zhang
Meng Li
Wei Ji
Qi Tian
Tat-Seng Chua
Yueting Zhuang
VLM
VPVLM
107
21
0
12 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
131
24
0
11 Mar 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
87
0
0
10 Mar 2023
Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
LM&Ro
64
12
0
10 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
89
2
0
09 Mar 2023
VQA-based Robotic State Recognition Optimized with Genetic Algorithm
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
48
16
0
09 Mar 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLM
LRM
150
649
0
08 Mar 2023
Interpretable Visual Question Answering Referring to Outside Knowledge
He Zhu
Ren Togo
Takahiro Ogawa
Miki Haseyama
65
0
0
08 Mar 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Minyoung Hwang
Jaeyeon Jeong
Minsoo Kim
Yoonseon Oh
Songhwai Oh
91
21
0
07 Mar 2023
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions
Bill Noble
N. Ilinykh
114
0
0
07 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
97
21
0
07 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP
VLM
72
46
0
06 Mar 2023
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kan Chen
Xiangqian Wu
CoGe
57
9
0
05 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
80
1
0
05 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
142
25
0
04 Mar 2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang
Xiangfei Hu
Bohao Li
Siyuan Huang
Hanqiu Deng
Hongsheng Li
Yu Qiao
Peng Gao
VLM
MLLM
116
181
0
03 Mar 2023
Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
Qu Tang
Xiangyu Zhu
Zhen Lei
Zhaoxiang Zhang
OCL
109
8
0
03 Mar 2023
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang
Nanning Zheng
MoE
122
6
0
02 Mar 2023
ESceme: Vision-and-Language Navigation with Episodic Scene Memory
Qinjie Zheng
Daqing Liu
Chaoyue Wang
Jing Zhang
Dadong Wang
Dacheng Tao
LM&Ro
88
5
0
02 Mar 2023
VQA with Cascade of Self- and Co-Attention Blocks
Aakansha Mishra
Ashish Anand
Prithwijit Guha
44
1
0
28 Feb 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
55
1
0
28 Feb 2023
Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
Chaohui Yu
Qiang-feng Zhou
Jingliang Li
Jia-Chao Yuan
Zhibin Wang
Fan Wang
VLM
CLL
73
14
0
28 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
115
37
0
27 Feb 2023
Medical visual question answering using joint self-supervised learning
Yuan Zhou
Jing Mei
Yiqin Yu
Tanveer Syeda-Mahmood
MedIm
52
1
0
25 Feb 2023
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSL
VLM
124
28
0
23 Feb 2023
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework
Paul Pu Liang
Yun Cheng
Xiang Fan
Chun Kai Ling
Suzanne Nie
...
Nicholas B. Allen
Randy P. Auerbach
Faisal Mahmood
Ruslan Salakhutdinov
Louis-Philippe Morency
116
37
0
23 Feb 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Mengde Xu
Zheng Zhang
Fangyun Wei
Han Hu
Xiang Bai
VLM
89
273
0
23 Feb 2023
Previous
1
2
3
...
22
23
24
...
58
59
60
Next