1707.07998
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
D. Guo
Ruiying Lu
Bo Chen
Zequn Zeng
Mingyuan Zhou
VLM
89
9
0
10 May 2021
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
Maxime Kayser
Oana-Maria Camburu
Leonard Salewski
Cornelius Emde
Virginie Do
Zeynep Akata
Thomas Lukasiewicz
VLM
112
101
0
08 May 2021
Exploring Explicit and Implicit Visual Relationships for Image Captioning
Zeliang Song
Xiaofei Zhou
26
8
0
06 May 2021
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Feng Ji
Ji Zhang
A. Bimbo
71
35
0
05 May 2021
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
81
18
0
02 May 2021
End-to-End Attention-based Image Captioning
Carola Sundaramoorthy
Lin Ziwen Kelvin
Mahak Sarin
Shubham Gupta
ViT
57
6
0
30 Apr 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
97
19
0
29 Apr 2021
Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning
Ukyo Honda
Yoshitaka Ushiku
Atsushi Hashimoto
Taro Watanabe
Yuji Matsumoto
68
23
0
28 Apr 2021
SGNet: A Super-class Guided Network for Image Classification and Object Detection
Kaidong Li
Ningning Wang
Yiju Yang
Guanghui Wang
153
22
0
26 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
249
897
0
26 Apr 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
112
242
0
26 Apr 2021
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
116
37
0
24 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
142
56
0
23 Apr 2021
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Guanghui Xu
Shuaicheng Niu
Mingkui Tan
Yucheng Luo
Qing Du
Qi Wu
DiffM
84
58
0
23 Apr 2021
Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching
Shiyang Yan
Li Yu
Yuan Xie
88
34
0
21 Apr 2021
BM-NAS: Bilevel Multimodal Neural Architecture Search
Yihang Yin
Siyu Huang
Xiang Zhang
84
27
0
19 Apr 2021
Concadia: Towards Image-Based Text Generation with a Purpose
Elisa Kreiss
Fei Fang
Noah D. Goodman
Christopher Potts
136
23
0
16 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki
Akiko Aizawa
VLM
60
20
0
16 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
99
163
0
13 Apr 2021
Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
Jae-Won Cho
Dong-Jin Kim
Jinsoo Choi
Yunjae Jung
In So Kweon
VLM
57
17
0
13 Apr 2021
Visual Goal-Step Inference using wikiHow
Yue Yang
Artemis Panagopoulou
Qing Lyu
Li Zhang
Mark Yatskar
Chris Callison-Burch
85
46
0
12 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
82
69
0
09 Apr 2021
Exploiting Natural Language for Efficient Risk-Aware Multi-robot SaR Planning
Vikram Shree
B. Asfora
Rachel Zheng
Samantha Hong
Jacopo Banfi
M. Campbell
46
10
0
08 Apr 2021
Video Question Answering with Phrases via Semantic Roles
Arka Sadhu
Kan Chen
Ram Nevatia
51
16
0
08 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
59
28
0
08 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
94
78
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
158
274
0
07 Apr 2021
Differentiable Patch Selection for Image Recognition
Jean-Baptiste Cordonnier
Aravindh Mahendran
Alexey Dosovitskiy
Dirk Weissenborn
Jakob Uszkoreit
Thomas Unterthiner
110
96
0
07 Apr 2021
Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning
Jianfeng Dong
Zhe Ma
Xiaofeng Mao
Xun Yang
Yuan He
Richang Hong
S. Ji
OOD
71
32
0
06 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
116
99
0
05 Apr 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
90
88
0
05 Apr 2021
FixMyPose: Pose Correctional Captioning and Retrieval
Hyounghun Kim
Abhaysinh Zala
Graham Burri
Joey Tianyi Zhou
51
16
0
04 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
64
26
0
02 Apr 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
100
53
0
01 Apr 2021
Improved and efficient inter-vehicle distance estimation using road gradients of both ego and target vehicles
Robik Shrestha
Jinkyu Lee
Kushal Kafle
S. Hwang
Il Yong Chun
79
1
0
01 Apr 2021
Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation
Aviral Joshi
Chengzhi Huang
H. Singh
48
0
0
31 Mar 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia
Esther Luna Colombini
HAI
128
198
0
31 Mar 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
99
121
0
30 Mar 2021
Domain-robust VQA with diverse datasets and methods but no target labels
Ruotong Wang
Tristan D. Maidment
Ahmad Diab
Adriana Kovashka
R. Hwa
OOD
129
23
0
29 Mar 2021
On Hallucination and Predictive Uncertainty in Conditional Language Generation
Yijun Xiao
Wenjie Wang
HILM
175
192
0
28 Mar 2021
Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models
Arijit Ray
Michael Cogswell
Xiaoyu Lin
Kamran Alipour
Ajay Divakaran
Yi Yao
Giedrius Burachas
FAtt
36
4
0
26 Mar 2021
Describing and Localizing Multiple Changes with Transformers
Yue Qiu
Shintaro Yamamoto
Kodai Nakashima
Ryota Suzuki
K. Iwata
Hirokatsu Kataoka
Y. Satoh
91
59
0
25 Mar 2021
Projection: A Mechanism for Human-like Reasoning in Artificial Intelligence
Frank Guerin
82
6
0
24 Mar 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
68
26
0
24 Mar 2021
VLGrammar: Grounded Grammar Induction of Vision and Language
Yining Hong
Qing Li
Song-Chun Zhu
Siyuan Huang
VLM
89
25
0
24 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
60
16
0
23 Mar 2021
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Long Chen
Zhihong Jiang
Jun Xiao
Wei Liu
97
77
0
22 Mar 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
104
60
0
22 Mar 2021
An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information
Zejun Li
Zhongyu Wei
Zhihao Fan
Haijun Shan
Xuanjing Huang
52
5
0
21 Mar 2021
3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model
Chengxi Li
Brent Harrison
125
6
0
20 Mar 2021