Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
Discover and Mitigate Unknown Biases with Debiasing Alternate Networks
Zhiheng Li
A. Hoogs
Chenliang Xu
83
56
0
20 Jul 2022
Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification
Renrui Zhang
Zhang Wei
Rongyao Fang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
145
321
0
19 Jul 2022
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Shiyu Zhao
Zhixing Zhang
S. Schulter
Long Zhao
Vijay Kumar B.G
Anastasis Stathopoulos
Manmohan Chandraker
Dimitris N. Metaxas
VLM
ObjD
89
102
0
18 Jul 2022
Rethinking Data Augmentation for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Jun Xiao
OOD
88
43
0
18 Jul 2022
Towards the Human Global Context: Does the Vision-Language Model Really Judge Like a Human Being?
Sangmyeong Woh
Jaemin Lee
Hoki Kim
Jinsuk Lee
48
0
0
18 Jul 2022
Zero-Shot Temporal Action Detection via Vision-Language Prompting
Sauradip Nag
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
VLM
79
68
0
17 Jul 2022
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
Xiaoping Han
Licheng Yu
Xiatian Zhu
Li Zhang
Yi-Zhe Song
Tao Xiang
AI4TS
54
49
0
17 Jul 2022
Scene Graph for Embodied Exploration in Cluttered Scenario
Yuhong Deng
Qie Sima
Di Guo
Huaping Liu
Yi Wang
Gang Hua
LM&Ro
103
2
0
16 Jul 2022
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang
F. Xia
Ted Xiao
Harris Chan
Jacky Liang
...
Tomas Jackson
Linda Luu
Sergey Levine
Karol Hausman
Brian Ichter
LLMAG
LM&Ro
LRM
211
927
0
12 Jul 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang
Jiannan Guo
Meng Li
Haochen Shi
Shengyu Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
88
6
0
09 Jul 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination
Hyounghun Kim
Abhaysinh Zala
Joey Tianyi Zhou
60
6
0
08 Jul 2022
Crake: Causal-Enhanced Table-Filler for Question Answering over Large Scale Knowledge Base
Minhao Zhang
Ruoyu Zhang
Yanzeng Li
Lei Zou
83
10
0
08 Jul 2022
Adversarial Robustness of Visual Dialog
Lu Yu
Verena Rieser
AAML
83
0
0
06 Jul 2022
Chairs Can be Stood on: Overcoming Object Bias in Human-Object Interaction Detection
Guangzhi Wang
Yangyang Guo
Yongkang Wong
Mohan S. Kankanhalli
90
11
0
06 Jul 2022
Distance Matters in Human-Object Interaction Detection
Guangzhi Wang
Yangyang Guo
Yongkang Wong
Mohan S. Kankanhalli
105
13
0
05 Jul 2022
Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation
Bin Li
Yixuan Weng
Ziyu Ma
Bin Sun
Shutao Li
VLM
36
2
0
05 Jul 2022
ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy
D. Zeng
Tailin Wu
J. Leskovec
GNN
110
1
0
04 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
79
2
0
02 Jul 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Keyu Wen
Zhenshan Tan
Qingrong Cheng
Cheng Chen
X. Gu
VLM
78
0
0
02 Jul 2022
Modern Question Answering Datasets and Benchmarks: A Survey
Zhen Wang
85
24
0
30 Jun 2022
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
Yangyang Guo
Liqiang Nie
Yongkang Wong
Yebin Liu
Zhiyong Cheng
Mohan S. Kankanhalli
121
40
0
30 Jun 2022
Technical Report for CVPR 2022 LOVEU AQTC Challenge
Hyeonyu Kim
Jongeun Kim
Jeonghun Kang
S. Park
Dongchan Park
Taehwan Kim
43
0
0
29 Jun 2022
Fair Machine Learning in Healthcare: A Review
Qizhang Feng
Mengnan Du
Na Zou
Helen Zhou
FaML
83
0
0
29 Jun 2022
EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering
Violetta Shevchenko
Ehsan Abbasnejad
A. Dick
Anton Van Den Hengel
Damien Teney
73
0
0
29 Jun 2022
Consistency-preserving Visual Question Answering in Medical Imaging
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
MedIm
92
12
0
27 Jun 2022
Automatic Generation of Product-Image Sequence in E-commerce
Xiaochuan Fan
Chi Zhang
Yong-Jie Yang
Yue Shang
Xueying Zhang
Zhen He
Yun Xiao
Bo Long
Lingfei Wu
60
4
0
26 Jun 2022
CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose
Xu Zhang
Wen Wang
Zhe Chen
Yufei Xu
Jing Zhang
Dacheng Tao
CLIP
VLM
73
28
0
23 Jun 2022
VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
Zhuofan Ying
Peter Hase
Joey Tianyi Zhou
LRM
87
13
0
22 Jun 2022
Winning the CVPR'2022 AQTC Challenge: A Two-stage Function-centric Approach
Shiwei Wu
Weidong He
Tong Xu
Hao Wang
Enhong Chen
EgoV
64
3
0
20 Jun 2022
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
ObjD
107
18
0
20 Jun 2022
Interactive Visual Reasoning under Uncertainty
Manjie Xu
Guangyuan Jiang
Wei Liang
Song-Chun Zhu
Yixin Zhu
LRM
108
5
0
18 Jun 2022
VReBERT: A Simple and Flexible Transformer for Visual Relationship Detection
Yunbo Cui
M. Farazi
ViT
95
1
0
18 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
173
412
0
17 Jun 2022
FD-CAM: Improving Faithfulness and Discriminability of Visual Explanation for CNNs
Hui Li
Zihao Li
Rui Ma
Tieru Wu
FAtt
47
9
0
17 Jun 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Kai Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Xinze Wang
LM&Ro
CoGe
95
63
0
17 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
152
240
0
16 Jun 2022
Multimodal Dialogue State Tracking
Hung Le
Nancy F. Chen
Guosheng Lin
75
9
0
16 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
117
130
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
Peng Xu
Xiatian Zhu
David Clifton
ViT
241
579
0
13 Jun 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
94
0
0
13 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
106
304
0
12 Jun 2022
A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
Zhihao Fan
Zhongyu Wei
Jingjing Chen
Siyuan Wang
Zejun Li
Jiarong Xu
Xuanjing Huang
CLL
59
6
0
11 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
96
115
0
07 Jun 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Kshitij Gupta
Devansh Gautam
R. Mamidi
VLM
74
4
0
07 Jun 2022
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Xudong Lin
Simran Tiwari
Shiyuan Huang
Manling Li
Mike Zheng Shou
Heng Ji
Shih-Fu Chang
138
21
0
05 Jun 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
72
62
0
04 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
100
166
0
03 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
77
556
0
03 Jun 2022
Pruning for Feature-Preserving Circuits in CNNs
Christopher Hamblin
Talia Konkle
G. Alvarez
80
2
0
03 Jun 2022
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Yuanze Lin
Yujia Xie
Dongdong Chen
Yichong Xu
Chenguang Zhu
Lu Yuan
93
75
0
02 Jun 2022
Previous
1
2
3
...
27
28
29
...
58
59
60
Next