1707.07998
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Detecting Hate Speech in Multi-modal Memes
Abhishek Das
Japsimar Singh Wahi
Siyao Li
64
61
0
29 Dec 2020
Image-to-Image Retrieval by Learning Similarity between Scene Graphs
Sangwoong Yoon
Woo-Young Kang
Sungwook Jeon
SeongEun Lee
C. Han
Jonghun Park
Eun-Sol Kim
3DH
93
45
0
29 Dec 2020
Detecting Hateful Memes Using a Multimodal Deep Ensemble
Vlad Sandulescu
VLM
74
44
0
24 Dec 2020
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
Riza Velioglu
J. Rose
VLM
50
87
0
23 Dec 2020
Open Set Domain Adaptation by Extreme Value Theory
Yiming Xu
Diego Klabjan
VLM
58
3
0
22 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
82
16
0
21 Dec 2020
Learning content and context with language bias for Visual Question Answering
Chao Yang
Su Feng
Dongsheng Li
Huawei Shen
Guoqing Wang
Bin Jiang
68
21
0
21 Dec 2020
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
128
188
0
20 Dec 2020
On Modality Bias in the TVQA Dataset
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
87
35
0
18 Dec 2020
Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
Xi Zhu
Zhendong Mao
Chunxiao Liu
Peng Zhang
Bin Wang
Yongdong Zhang
SSL
58
117
0
17 Dec 2020
AutoCaption: Image Captioning with Neural Architecture Search
Xinxin Zhu
Weining Wang
Longteng Guo
Jing Liu
102
9
0
16 Dec 2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
96
44
0
15 Dec 2020
Vilio: State-of-the-art Visio-Linguistic Models applied to Hateful Memes
Niklas Muennighoff
85
64
0
14 Dec 2020
TDAF: Top-Down Attention Framework for Vision Tasks
Bo Pang
Yizhuo Li
Jiefeng Li
Muchen Li
Hanwen Cao
Cewu Lu
83
10
0
14 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
94
36
0
14 Dec 2020
Demystifying Deep Neural Networks Through Interpretation: A Survey
Giang Dao
Minwoo Lee
FaML
FAtt
66
1
0
13 Dec 2020
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network
Jiayi Ji
Yunpeng Luo
Xiaoshuai Sun
Fuhai Chen
Gen Luo
Yongjian Wu
Yue Gao
Rongrong Ji
ViT
113
178
0
13 Dec 2020
MiniVLM: A Smaller and Faster Vision-Language Model
Jianfeng Wang
Xiaowei Hu
Pengchuan Zhang
Xiujun Li
Lijuan Wang
Lefei Zhang
Jianfeng Gao
Zicheng Liu
VLM
MLLM
133
60
0
13 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
99
67
0
10 Dec 2020
Image Captioning with Context-Aware Auxiliary Guidance
Zeliang Song
Xiaofei Zhou
Zhendong Mao
Jianlong Tan
88
31
0
10 Dec 2020
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Qi Zhu
Chenyu Gao
Peng Wang
Qi Wu
92
54
0
09 Dec 2020
Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning
Aozhu Chen
Xinyi Huang
Hailan Lin
Xirong Li
120
5
0
09 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
107
144
0
08 Dec 2020
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
102
14
0
08 Dec 2020
Edited Media Understanding Frames: Reasoning About the Intent and Implications of Visual Misinformation
Jeff Da
Maxwell Forbes
Rowan Zellers
Anthony Zheng
Jena D. Hwang
Antoine Bosselut
Yejin Choi
DiffM
85
13
0
08 Dec 2020
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
Zhaokai Wang
Renda Bao
Qi Wu
Si Liu
138
26
0
07 Dec 2020
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding
Maryam Rahnemoonfar
Tashnim Chowdhury
Argho Sarkar
D. Varshney
M. Yari
Robin Murphy
94
258
0
05 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
110
36
0
04 Dec 2020
Understanding Guided Image Captioning Performance across Domains
Edwin G. Ng
Bo Pang
P. Sharma
Radu Soricut
118
25
0
04 Dec 2020
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Dave Zhenyu Chen
A. Gholami
Matthias Nießner
Angel X. Chang
3DPC
181
176
0
03 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
102
120
0
30 Nov 2020
Language-Driven Region Pointer Advancement for Controllable Image Captioning
Annika Lindh
R. Ross
John D. Kelleher
43
14
0
30 Nov 2020
Point and Ask: Incorporating Pointing into Visual Question Answering
Arjun Mani
Nobline Yoo
William Fu-Hinthorn
Olga Russakovsky
3DPC
82
38
0
27 Nov 2020
Learning from Lexical Perturbations for Consistent Visual Question Answering
Spencer Whitehead
Hui Wu
Yi R. Fung
Heng Ji
Rogerio Feris
Kate Saenko
68
11
0
26 Nov 2020
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
128
303
0
26 Nov 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
87
61
0
25 Nov 2020
XTQA: Span-Level Explanations of the Textbook Question Answering
Jie Ma
Q. Zheng
Jun Liu
Qingyu Yin
Jianlong Zhou
Y. Huang
34
13
0
25 Nov 2020
Interpretable Visual Reasoning via Induced Symbolic Space
Zhonghao Wang
Kai Wang
Mo Yu
Jinjun Xiong
Wen-mei W. Hwu
M. Hasegawa-Johnson
Humphrey Shi
LRM
OCL
63
20
0
23 Nov 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
92
126
0
23 Nov 2020
Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning
Weixia Zhang
Chao Ma
Qi Wu
Xiaokang Yang
102
46
0
22 Nov 2020
SuperOCR: A Conversion from Optical Character Recognition to Image Captioning
Baohua Sun
Michael Lin
Hao Sha
Lin Yang
37
5
0
21 Nov 2020
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering
Weixin Liang
Fei Niu
Aishwarya N. Reganti
Govind Thattai
Gokhan Tur
73
17
0
21 Nov 2020
Using Text to Teach Image Retrieval
Haoyu Dong
Ze Wang
Qiang Qiu
Guillermo Sapiro
3DV
75
4
0
19 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
103
3
0
18 Nov 2020
Towards Improved and Interpretable Deep Metric Learning via Attentive Grouping
Xinyi Xu
Zhangyang Wang
Cheng Deng
Hao Yuan
Shuiwang Ji
FedML
73
14
0
17 Nov 2020
Structural and Functional Decomposition for Personality Image Captioning in a Communication Game
Minh-Thu Nguyen
Duy Phung
Minh Hoai
Thien Huu Nguyen
65
4
0
17 Nov 2020
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha
Gurneet Arora
Navpreet Kaloty
66
37
0
16 Nov 2020
Reinforced Medical Report Generation with X-Linear Attention and Repetition Penalty
Wenting Xu
Chang Qi
Zhenghua Xu
Thomas Lukasiewicz
MedIm
25
4
0
16 Nov 2020
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions
Jianan Wang
Boyang Albert Li
Xiangyu Fan
Jing-Hua Lin
Yanwei Fu
49
2
0
15 Nov 2020
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Moloud Abdar
Farhad Pourpanah
Sadiq Hussain
Dana Rezazadegan
Li Liu
...
Xiaochun Cao
Abbas Khosravi
U. Acharya
V. Makarenkov
S. Nahavandi
BDL
UQCV
373
1,951
0
12 Nov 2020