Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.07998
Cited By
v1
v2
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching
Tianlang Chen
Jiebo Luo
72
69
0
20 Feb 2020
VQA-LOL: Visual Question Answering under the Lens of Logic
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
CoGe
71
75
0
19 Feb 2020
CQ-VQA: Visual Question Answering on Categorized Questions
Aakansha Mishra
A. Anand
Prithwijit Guha
145
6
0
17 Feb 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
116
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
52
17
0
15 Feb 2020
Sparse and Structured Visual Attention
Pedro Henrique Martins
S. Becker
Zita Marinho
Michael Arens
78
8
0
13 Feb 2020
Component Analysis for Visual Question Answering Architectures
Camila Kolling
Jonatas Wehrmann
Rodrigo C. Barros
CoGe
36
2
0
12 Feb 2020
Object Detection as a Positive-Unlabeled Problem
Yuewei Yang
Kevin J. Liang
Lawrence Carin
82
39
0
11 Feb 2020
Vision-based Fight Detection from Surveillance Cameras
Seymanur Akti
G. A. Tataroglu
H. K. Ekenel
49
78
0
11 Feb 2020
Multimodal Matching Transformer for Live Commenting
Chaoqun Duan
Lei Cui
Shuming Ma
Furu Wei
Conghui Zhu
Tiejun Zhao
26
12
0
07 Feb 2020
iCap: Interactive Image Captioning with Predictive Text
Zhengxiong Jia
Xirong Li
30
8
0
31 Jan 2020
Evaluating the Progress of Deep Learning for Visual Relational Concepts
Sebastian Stabinger
Peer David
J. Piater
A. Rodríguez-Sánchez
79
19
0
29 Jan 2020
Explaining with Counter Visual Attributes and Examples
Sadaf Gulshad
A. Smeulders
XAI
FAtt
AAML
77
15
0
27 Jan 2020
aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption
C. Sur
40
8
0
27 Jan 2020
Uncertainty based Class Activation Maps for Visual Question Answering
Badri N. Patro
Mayank Lunayach
Vinay P. Namboodiri
FAtt
UQCV
32
1
0
23 Jan 2020
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Darryl Hannan
Akshay Jain
Joey Tianyi Zhou
AAML
85
60
0
22 Jan 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
132
263
0
22 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
79
18
0
20 Jan 2020
Modality-Balanced Models for Visual Dialogue
Hyounghun Kim
Hao Tan
Joey Tianyi Zhou
61
27
0
17 Jan 2020
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Li Wang
Zechen Bai
Yonghua Zhang
Hongtao Lu
75
67
0
15 Jan 2020
Ensemble based discriminative models for Visual Dialog Challenge 2018
Shubham Agarwal
Raghav Goyal
32
1
0
15 Jan 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
88
320
0
10 Jan 2020
Visual Question Answering on 360° Images
Shih-Han Chou
Wei-Lun Chao
Wei-Sheng Lai
Min Sun
Ming-Hsuan Yang
52
22
0
10 Jan 2020
Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models
Jiamei Sun
Sebastian Lapuschkin
Wojciech Samek
Alexander Binder
FAtt
98
30
0
04 Jan 2020
Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering
Lei Shi
Shijie Geng
Kai Shuang
Chiori Hori
Songxiang Liu
Peng Gao
Sen Su
85
11
0
03 Jan 2020
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Xinjie Fan
Yizhe Zhang
Zhendong Wang
Mingyuan Zhou
BDL
72
4
0
31 Dec 2019
Visual Agreement Regularized Training for Multi-Modal Machine Translation
Pengcheng Yang
Boxing Chen
Pei Zhang
Xu Sun
154
31
0
27 Dec 2019
Vision and Language: from Visual Perception to Content Creation
Tao Mei
Wei Zhang
Ting Yao
VLM
76
8
0
26 Dec 2019
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
79
113
0
25 Dec 2019
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning
Huaizheng Zhang
Yong Luo
Qiming Ai
Yonggang Wen
113
15
0
21 Dec 2019
CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis
Jiadong Liang
Wenjie Pei
Feng Lu
GAN
60
19
0
18 Dec 2019
DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
Feilong Chen
Fandong Meng
Jiaming Xu
Peng Li
Bo Xu
Jie Zhou
93
34
0
18 Dec 2019
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
110
890
0
17 Dec 2019
MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
Sonia Baee
Erfan Pakdamanian
Inki Kim
Lu Feng
Vicente Ordonez
Laura E. Barnes
99
47
0
17 Dec 2019
SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes
Pulak Purkait
Christopher Zach
Ian Reid
3DV
DRL
33
1
0
10 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
85
64
0
07 Dec 2019
Connecting Vision and Language with Localized Narratives
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
143
252
0
06 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
60
15
0
06 Dec 2019
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
111
117
0
05 Dec 2019
12-in-1: Multi-Task Vision and Language Representation Learning
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM
ObjD
131
481
0
05 Dec 2019
Knowledge-Enriched Visual Storytelling
Chao-Chun Hsu
Zi-Yuan Chen
Chi-Yang Hsu
Chih-Chia Li
Tzu-Yuan Lin
Ting-Hao 'Kenneth' Huang
Lun-Wei Ku
DiffM
90
47
0
03 Dec 2019
Deep Bayesian Active Learning for Multiple Correct Outputs
Khaled Jedoui
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
BDL
OOD
UQCV
93
14
0
02 Dec 2019
Learning to Relate from Captions and Bounding Boxes
Sarthak Garg
Joel Ruben Antony Moniz
Anshu Aviral
Priyatham Bollimpalli
38
3
0
01 Dec 2019
Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Federico Landi
Lorenzo Baraldi
Marcella Cornia
M. Corsini
Rita Cucchiara
LM&Ro
87
29
0
27 Nov 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
107
7
0
26 Nov 2019
Two Causal Principles for Improving Visual Dialog
Jiaxin Qi
Yulei Niu
Jianqiang Huang
Hanwang Zhang
CML
110
149
0
24 Nov 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
75
33
0
24 Nov 2019
Unsupervised Keyword Extraction for Full-sentence VQA
Kohei Uehara
Tatsuya Harada
32
1
0
23 Nov 2019
CRUR: Coupled-Recurrent Unit for Unification, Conceptualization and Context Capture for Language Representation -- A Generalization of Bi Directional LSTM
C. Sur
BDL
49
6
0
22 Nov 2019
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning
C. Sur
79
13
0
22 Nov 2019
Previous
1
2
3
...
30
31
32
...
36
37
38
Next