Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2001.03615
Cited By
In Defense of Grid Features for Visual Question Answering
10 January 2020
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"In Defense of Grid Features for Visual Question Answering"
23 / 23 papers shown
Title
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
Andrzej D. Dobrzycki
Ana M. Bernardos
Luca Bergesio
Andrzej Pomirski
Daniel Sáez-Trigueros
3DH
74
3
0
13 Jan 2025
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
Jintao Sun
Zhedong Zheng
Gangyi Ding
Gangyi Ding
57
8
0
16 Apr 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
81
1
0
06 Feb 2024
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
88
4
0
15 Dec 2023
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering
Zijie Song
Zhenzhen Hu
Richang Hong
SSL
61
0
0
27 Oct 2023
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
60
447
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
313
930
0
24 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
97
1,657
0
22 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
92
1,939
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
162
3,659
0
06 Aug 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
29
1,174
0
18 Apr 2019
Deformable ConvNets v2: More Deformable, Better Results
Xizhou Zhu
Han Hu
Stephen Lin
Jifeng Dai
ObjD
71
1,998
0
27 Nov 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Kexin Yi
Jiajun Wu
Chuang Gan
Antonio Torralba
Pushmeet Kohli
J. Tenenbaum
NAI
57
606
0
04 Oct 2018
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
61
831
0
22 Feb 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
90
4,201
0
25 Jul 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
250
2,346
0
20 Dec 2016
Pyramid Scene Parsing Network
Hengshuang Zhao
Jianping Shi
Xiaojuan Qi
Xiaogang Wang
Jiaya Jia
VOS
SSeg
208
11,941
0
04 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
276
3,187
0
02 Dec 2016
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
392
10,281
0
16 Nov 2016
Stacked Attention Networks for Image Question Answering
Zichao Yang
Xiaodong He
Jianfeng Gao
Li Deng
Alex Smola
BDL
74
1,875
0
07 Nov 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
129
2,461
0
01 Apr 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
233
10,034
0
10 Feb 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
177
4,451
0
20 Nov 2014
1