Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.07998
Cited By
v1
v2
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
CapWAP: Captioning with a Purpose
Adam Fisch
Kenton Lee
Ming-Wei Chang
J. Clark
Regina Barzilay
53
11
0
09 Nov 2020
Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
Alessandro Suglia
Antonio Vergari
Ioannis Konstas
Yonatan Bisk
E. Bastianelli
Andrea Vanzo
Oliver Lemon
OCL
43
10
0
05 Nov 2020
DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation
Zhenxing Zhang
Lambert Schomaker
GAN
67
35
0
05 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
66
7
0
05 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
55
45
0
04 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
Michael R. Lyu
Irwin King
75
16
0
03 Nov 2020
Dual Attention on Pyramid Feature Maps for Image Captioning
Litao Yu
Jian Zhang
Qiang Wu
108
50
0
02 Nov 2020
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan
Stefan Roth
64
42
0
02 Nov 2020
Boost Image Captioning with Knowledge Reasoning
Feicheng Huang
Zhixin Li
Haiyang Wei
Canlong Zhang
Huifang Ma
38
25
0
02 Nov 2020
Exploring Dynamic Context for Multi-path Trajectory Prediction
Hao Cheng
Wentong Liao
Xuejiao Tang
M. Yang
Monika Sester
Bodo Rosenhahn
105
33
0
30 Oct 2020
Generating Radiology Reports via Memory-driven Transformer
Zhihong Chen
Yan Song
Tsung-Hui Chang
Xiang Wan
MedIm
76
486
0
30 Oct 2020
Loss re-scaling VQA: Revisiting the LanguagePrior Problem from a Class-imbalance View
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Q. Tian
Min Zhang
116
70
0
30 Oct 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
97
57
0
27 Oct 2020
Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks
Jing Xu
Fangwei Zhong
Yizhou Wang
83
50
0
25 Oct 2020
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
88
31
0
24 Oct 2020
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Radhika Dua
Sai Srinivas Kancheti
V. Balasubramanian
LRM
88
22
0
24 Oct 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL
VLM
101
12
0
24 Oct 2020
Can images help recognize entities? A study of the role of images for Multimodal NER
Shuguang Chen
Gustavo Aguilar
Leonardo Neves
Thamar Solorio
EgoV
90
37
0
23 Oct 2020
Show and Speak: Directly Synthesize Spoken Description of Images
Xinsheng Wang
Siyuan Feng
Jihua Zhu
M. Hasegawa-Johnson
O. Scharenborg
152
4
0
23 Oct 2020
Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization
Li Ren
Keqin Li
Liqiang Wang
K. Hua
54
4
0
23 Oct 2020
Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Simon Stepputtis
Joseph Campbell
Mariano Phielipp
Stefan Lee
Chitta Baral
H. B. Amor
LM&Ro
200
205
0
22 Oct 2020
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
Keyu Wen
Xiaodong Gu
Qingrong Cheng
76
97
0
22 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
Alex Schwing
Tamir Hazan
106
92
0
21 Oct 2020
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
183
62
0
20 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
101
6
0
19 Oct 2020
Image Captioning with Visual Object Representations Grounded in the Textual Modality
Duvsan Varivs
Katsuhito Sudoh
Satoshi Nakamura
35
1
0
19 Oct 2020
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Yuankai Qi
Qi Wu
Stephen Gould
LM&Ro
226
134
0
19 Oct 2020
Unsupervised Foveal Vision Neural Networks with Top-Down Attention
Ryan Burt
Nina N. Thigpen
A. Keil
José C. Príncipe
56
2
0
18 Oct 2020
Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
BDL
138
23
0
18 Oct 2020
Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
28
4
0
17 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
169
33
0
16 Oct 2020
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović
Chandra Bhagavatula
J. S. Park
Ronan Le Bras
Noah A. Smith
Yejin Choi
ReLM
LRM
99
62
0
15 Oct 2020
The Benefit of Distraction: Denoising Remote Vitals Measurements using Inverse Attention
E. Nowara
Daniel J. McDuff
Ashok Veeraraghavan
53
13
0
14 Oct 2020
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Jack Hessel
Lillian Lee
108
76
0
13 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Basura Fernando
Hongdong Li
Stephen Gould
192
10
0
13 Oct 2020
Contrast and Classify: Training Robust VQA Models
Yash Kant
A. Moudgil
Dhruv Batra
Devi Parikh
Harsh Agrawal
55
5
0
13 Oct 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Dongxu Li
Chenchen Xu
Xin Yu
Kaihao Zhang
Ben Swift
H. Suominen
Hongdong Li
SLR
60
124
0
12 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
147
11
0
12 Oct 2020
Interpretable Neural Computation for Real-World Compositional Visual Question Answering
Ruixue Tang
Chao Ma
CoGe
26
2
0
10 Oct 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
115
27
0
08 Oct 2020
Visual News: Benchmark and Challenges in News Image Captioning
Fuxiao Liu
Yinghan Wang
Tianlu Wang
Vicente Ordonez
VLM
86
116
0
08 Oct 2020
Universal Weighting Metric Learning for Cross-Modal Matching
Jiwei Wei
Xing Xu
Yang Yang
Yanli Ji
Zheng Wang
Heng Tao Shen
70
89
0
07 Oct 2020
Vision Skills Needed to Answer Visual Questions
Xiaoyu Zeng
Yanan Wang
Tai-Yin Chiu
Nilavra Bhattacharya
Danna Gurari
66
18
0
07 Oct 2020
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang
Hexiang Hu
Vihan Jain
Eugene Ie
Fei Sha
78
22
0
06 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
76
11
0
05 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
43
2
0
05 Oct 2020
UNISON: Unpaired Cross-lingual Image Captioning
Jiahui Gao
Yi Zhou
Philip L. H. Yu
Shafiq Joty
Jiuxiang Gu
82
17
0
03 Oct 2020
Taking Modality-free Human Identification as Zero-shot Learning
Zhizhe Liu
Xingxing Zhang
Zhenfeng Zhu
Shuai Zheng
Yao Zhao
Jian Cheng
56
4
0
02 Oct 2020
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
José Manuél Gómez-Pérez
Raúl Ortega
61
24
0
01 Oct 2020
Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue
Zipeng Xu
Fangxiang Feng
Xiaojie Wang
Yushu Yang
Huixing Jiang
Zhongyuan Ouyang
47
7
0
01 Oct 2020
Previous
1
2
3
...
25
26
27
...
36
37
38
Next