Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.07998
Cited By
v1
v2
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Saying the Unseen: Video Descriptions via Dialog Agents
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
69
6
0
26 Jun 2021
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Trung Bui
Kyomin Jung
VLM
138
44
0
26 Jun 2021
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
144
48
0
26 Jun 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
94
89
0
25 Jun 2021
A Picture May Be Worth a Hundred Words for Visual Question Answering
Yusuke Hirota
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Ittetsu Taniguchi
Takao Onoye
ViT
35
4
0
25 Jun 2021
Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers
Apratim Bhattacharyya
Daniel Olmeda Reino
Mario Fritz
Bernt Schiele
73
25
0
22 Jun 2021
TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning
Zhihao Fan
Zhongyu Wei
Siyuan Wang
Ruize Wang
Zejun Li
Haijun Shan
Xuanjing Huang
56
26
0
21 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
82
54
0
19 Jun 2021
Learning to Predict Visual Attributes in the Wild
Khoi Pham
Kushal Kafle
Zhe Lin
Zhi Ding
Scott D. Cohen
Q. Tran
Abhinav Shrivastava
52
113
0
17 Jun 2021
Semi-Autoregressive Transformer for Image Captioning
Yuanen Zhou
Yong Zhang
Zhenzhen Hu
Meng Wang
VLM
78
25
0
17 Jun 2021
Understanding and Evaluating Racial Biases in Image Captioning
Dora Zhao
Angelina Wang
Olga Russakovsky
71
138
0
16 Jun 2021
Vision-Language Navigation with Random Environmental Mixup
Chong Liu
Fengda Zhu
Xiaojun Chang
Xiaodan Liang
Zongyuan Ge
Yi-Dong Shen
LM&Ro
129
87
0
15 Jun 2021
Step-Wise Hierarchical Alignment Network for Image-Text Matching
Zhong Ji
Kexin Chen
Haoran Wang
80
94
0
11 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
79
11
0
10 Jun 2021
Data augmentation to improve robustness of image captioning solutions
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
17
2
0
10 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
88
36
0
08 Jun 2021
Check It Again: Progressive Visual Question Answering via Visual Entailment
Q. Si
Zheng Lin
Mingyu Zheng
Peng Fu
Weiping Wang
79
48
0
08 Jun 2021
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
Daniel Rosenberg
Itai Gat
Amir Feder
Roi Reichart
AAML
91
16
0
08 Jun 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
64
15
0
08 Jun 2021
BERTGEN: Multi-task Generation through BERT
Faidon Mitzalis
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
VLM
48
7
0
07 Jun 2021
Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction
Wei Wei
Jiayi Liu
Xian-Ling Mao
G. Guo
Feida Zhu
Pan Zhou
Yuchong Hu
97
56
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
104
384
0
04 Jun 2021
Human-Adversarial Visual Question Answering
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OOD
EgoV
AAML
58
63
0
04 Jun 2021
Visual Question Rewriting for Increasing Response Rate
Jiayi Wei
Xilian Li
Yi Zhang
Xin Eric Wang
53
3
0
04 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
113
119
0
03 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
93
45
0
03 Jun 2021
Learning to Select: A Fully Attentive Approach for Novel Object Captioning
Marco Cagrandi
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
67
9
0
02 Jun 2021
Container: Context Aggregation Network
Peng Gao
Jiasen Lu
Hongsheng Li
Roozbeh Mottaghi
Aniruddha Kembhavi
ViT
106
72
0
02 Jun 2021
Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
Claudio Gennaro
Stéphane Marchand-Maillet
37
7
0
01 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAML
VLM
112
75
0
01 Jun 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
66
38
0
30 May 2021
Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing
Jianning Wu
Zhuqing Jiang
S. Wen
Aidong Men
Haiying Wang
84
1
0
30 May 2021
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering
Zujie Liang
Haifeng Hu
Jiaying Zhu
99
38
0
29 May 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren
Junyang Lin
Guangxiang Zhao
Rui Men
An Yang
Jingren Zhou
Xu Sun
Hongxia Yang
82
38
0
28 May 2021
Recent advances and clinical applications of deep learning in medical image analysis
Xuxin Chen
Ximing Wang
Kecheng Zhang
K. Fung
T. Thai
K. Moore
Robert S. Mannel
Hong Liu
B. Zheng
Y. Qiu
OOD
136
612
0
27 May 2021
Maria: A Visual Experience Powered Conversational Agent
Zujie Liang
Huang Hu
Can Xu
Chongyang Tao
Xiubo Geng
Yining Chen
Fan Liang
Daxin Jiang
91
32
0
27 May 2021
Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation
Xingyi Yang
Muchao Ye
Quanzeng You
Fenglong Ma
MedIm
57
38
0
25 May 2021
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon
HyunGyung Lee
W. Shin
Young-Hak Kim
Edward Choi
MedIm
110
161
0
24 May 2021
Human-centric Relation Segmentation: Dataset and Solution
Si Liu
Zitian Wang
Yulu Gao
Lejian Ren
Yue Liao
Guanghui Ren
Bo Li
Shuicheng Yan
38
12
0
24 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
235
59
0
24 May 2021
VTNet: Visual Transformer Network for Object Goal Navigation
Heming Du
Xin Yu
Liang Zheng
ViT
91
93
0
20 May 2021
Dependent Multi-Task Learning with Causal Intervention for Image Captioning
Wenqing Chen
Jidong Tian
Caoyun Fan
Hao He
Yaohui Jin
CML
121
6
0
18 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
166
507
0
18 May 2021
Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching
Bofeng Wu
Guocheng Niu
Jun Yu
Xinyan Xiao
Jian Zhang
Hua Wu
54
8
0
18 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
45
4
0
16 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
157
25
0
12 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
83
4
0
12 May 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
Aisha Urooj Khan
Hilde Kuehne
Kevin Duarte
Chuang Gan
N. Lobo
M. Shah
71
36
0
11 May 2021
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
78
11
0
11 May 2021
T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska
Mikolaj Wieczorek
Jacek Dąbrowski
59
0
0
10 May 2021
Previous
1
2
3
...
21
22
23
...
36
37
38
Next