1707.07998
Cited By
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 769 papers shown
Title
Better Captioning with Sequence-Level Exploration
Jia Chen
Qin Jin
37
12
0
08 Mar 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
181
68
0
07 Mar 2020
Show, Edit and Tell: A Framework for Editing Image Captions
Fawaz Sammani
Luke Melas-Kyriazi
KELM
DiffM
48
59
0
06 Mar 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM
VLM
25
74
0
03 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
36
215
0
01 Mar 2020
GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering
Chuang Niu
Jun Zhang
Ge Wang
Jimin Liang
SSL
27
70
0
27 Feb 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
18
274
0
25 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
22
181
0
20 Feb 2020
CQ-VQA: Visual Question Answering on Categorized Questions
Aakansha Mishra
A. Anand
Prithwijit Guha
33
6
0
17 Feb 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
27
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
25
16
0
15 Feb 2020
Object Detection as a Positive-Unlabeled Problem
Yuewei Yang
Kevin J Liang
Lawrence Carin
21
38
0
11 Feb 2020
Vision-based Fight Detection from Surveillance Cameras
Seymanur Akti
G. A. Tataroglu
H. K. Ekenel
27
77
0
11 Feb 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
40
259
0
22 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
23
17
0
20 Jan 2020
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Li Wang
Zechen Bai
Yonghua Zhang
Hongtao Lu
27
67
0
15 Jan 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
23
318
0
10 Jan 2020
Visual Agreement Regularized Training for Multi-Modal Machine Translation
Pengcheng Yang
Boxing Chen
Pei Zhang
Xu Sun
82
30
0
27 Dec 2019
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
22
108
0
25 Dec 2019
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning
Huaizheng Zhang
Yong Luo
Qiming Ai
Yonggang Wen
25
15
0
21 Dec 2019
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
14
868
0
17 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
21
15
0
06 Dec 2019
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
23
115
0
05 Dec 2019
Two Causal Principles for Improving Visual Dialog
Jiaxin Qi
Yulei Niu
Jianqiang Huang
Hanwang Zhang
CML
16
146
0
24 Nov 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
22
33
0
24 Nov 2019
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning
C. Sur
27
13
0
22 Nov 2019
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Badri N. Patro
Anupriy
Vinay P. Namboodiri
AAML
FAtt
48
26
0
19 Nov 2019
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
X. Jiang
Jiahao Yu
Zengchang Qin
Yingying Zhuang
Xingxing Zhang
Yue Hu
Qi Wu
23
70
0
17 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
Alex Schwing
LRM
ReLM
37
9
0
31 Oct 2019
KnowIT VQA: Answering Knowledge-Based Questions about Videos
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
30
77
0
23 Oct 2019
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style
Hongwei Ge
Zehang Yan
Kai Zhang
Mingde Zhao
Liang Sun
30
24
0
15 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
33
208
0
11 Oct 2019
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Wang
Hongzhi Li
29
38
0
11 Oct 2019
Semantic-aware Image Deblurring
Fuhai Chen
Rongrong Ji
Chengpeng Dai
Xiaoshuai Sun
Chia-Wen Lin
Jiayi Ji
Baochang Zhang
Feiyue Huang
Liujuan Cao
BDL
VLM
25
6
0
09 Oct 2019
Modulated Self-attention Convolutional Network for VQA
Jean-Benoit Delbrouck
Antoine Maiorca
Nathan Hubens
Stéphane Dupont
23
1
0
08 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
31
295
0
06 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
30
25
0
30 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
6D Pose Estimation with Correlation Fusion
Yi Cheng
Erik Cambria
Ying Sun
C. Acar
Wei Jing
Yan Wu
Liyuan Li
Cheston Tan
Joo-Hwee Lim
45
15
0
24 Sep 2019
Adaptively Aligned Image Captioning via Adaptive Attention Time
Lun Huang
Wenmin Wang
Yaxian Xia
Jie Chen
8
60
0
19 Sep 2019
Graph Neural Networks for Image Understanding Based on Multiple Cues: Group Emotion Recognition and Event Recognition as Use Cases
Xin Guo
Luisa F. Polanía
Bin Zhu
C. Boncelet
Kenneth Barner
24
28
0
19 Sep 2019
Part-Guided Attention Learning for Vehicle Instance Retrieval
Xinyu Zhang
Rufeng Zhang
Jiewei Cao
Dong Gong
Mingyu You
Chunhua Shen
29
37
0
13 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
25
299
0
12 Sep 2019
PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression
Sicheng Zhao
Zizhou Jia
Hui Chen
Leida Li
Guiguang Ding
Kurt Keutzer
36
62
0
11 Sep 2019
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
30
13
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
27
65
0
10 Sep 2019
Compositional Generalization in Image Captioning
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
27
49
0
10 Sep 2019
Picture What you Read
I. Gallo
Shah Nawaz
Alessandro Calefati
Riccardo La Grassa
Nicola Landro
DiffM
29
0
0
09 Sep 2019
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Haoran Chen
Ke Lin
A. Maye
Jianmin Li
Xiaoling Hu
25
47
0
31 Aug 2019