1707.07998
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye
Ruotong Wang
Adriana Kovashka
Wei Li
Danfeng Qin
Jesse Berent
150
60
0
23 Jul 2019
Bilinear Graph Networks for Visual Question Answering
Dalu Guo
Chang Xu
Dacheng Tao
GNN
83
53
0
23 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
141
136
0
22 Jul 2019
VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions
Pranava Madhyastha
Josiah Wang
Lucia Specia
66
33
0
22 Jul 2019
Watch It Twice: Video Captioning with a Refocused Video Encoder
Xiangxi Shi
Jianfei Cai
Shafiq Joty
Jiuxiang Gu
70
28
0
21 Jul 2019
OmniNet: A unified architecture for multi-modal multi-task learning
Subhojeet Pramanik
Priyanka Agrawal
A. Hussain
64
41
0
17 Jul 2019
Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos
Shizhe Chen
Yuqing Song
Yida Zhao
Qin Jin
Zhaoyang Zeng
Bei Liu
Jianlong Fu
Alexander G. Hauptmann
58
12
0
11 Jul 2019
Two-stream Spatiotemporal Feature for Video QA Task
Chiwan Song
Woobin Im
Sung-eui Yoon
21
0
0
11 Jul 2019
A Survey of Deep Learning-based Object Detection
L. Jiao
Fan Zhang
Fang Liu
Shuyuan Yang
Lingling Li
Zhixi Feng
Rong Qu
ObjD
131
973
0
11 Jul 2019
Aesthetic Attributes Assessment of Images
Xin Jin
Le Wu
Geng Zhao
Xiaodong Li
Xiaokun Zhang
Shiming Ge
Dongqing Zou
Bin Zhou
Xinghui Zhou
78
40
0
11 Jul 2019
Neural Reasoning, Fast and Slow, for Video Question Answering
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
44
14
0
10 Jul 2019
Learning by Abstraction: The Neural State Machine
Drew A. Hudson
Christopher D. Manning
NAI
OCL
131
262
0
09 Jul 2019
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi
Lorenzo Baraldi
M. Corsini
Rita Cucchiara
LM&Ro
98
26
0
05 Jul 2019
Neural Image Captioning
E. Tan
Lakshay Sharma
VLM
55
3
0
02 Jul 2019
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
63
112
0
02 Jul 2019
ICDAR 2019 Competition on Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
74
76
0
30 Jun 2019
Localizing Unseen Activities in Video via Image Query
Zhu Zhang
Zhou Zhao
Zhijie Lin
Jingkuan Song
Deng Cai
ViT
49
13
0
28 Jun 2019
Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks
Zhu Zhang
Zhou Zhao
Zhijie Lin
Jingkuan Song
Xiaofei He
BDL
44
14
0
28 Jun 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
99
811
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
104
374
0
24 Jun 2019
Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments
K. Niu
Y. Huang
Wanli Ouyang
Liang Wang
58
143
0
23 Jun 2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019
Xiaohan Wang
Yu Wu
Linchao Zhu
Yi Yang
77
19
0
22 Jun 2019
Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
Gabriel Grand
Yonatan Belinkov
108
68
0
20 Jun 2019
Expressing Visual Relationships via Language
Hao Tan
Franck Dernoncourt
Zhe Lin
Trung Bui
Joey Tianyi Zhou
85
68
0
18 Jun 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Yaxian Xia
Lun Huang
Wenmin Wang
Xiao-Yong Wei
Jie Chen
121
1
0
17 Jun 2019
Structured Pruning of Recurrent Neural Networks through Neuron Selection
Liangjiang Wen
Xuanyang Zhang
Haoli Bai
Zenglin Xu
65
38
0
17 Jun 2019
Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding
Jian Zheng
S. Krishnamurthy
Ruxin Chen
Min-Hung Chen
Zhenhao Ge
Xiaohua Li
72
4
0
16 Jun 2019
Generating Diverse and Informative Natural Language Fashion Feedback
Gil Sadeh
L. Fritz
Gabi Shalev
Eduard Oks
47
5
0
15 Jun 2019
Comparison of Diverse Decoding Methods from Conditional Language Models
Daphne Ippolito
Reno Kriz
M. Kustikova
João Sedoc
Chris Callison-Burch
AI4CE
85
114
0
14 Jun 2019
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Hyounghun Kim
Joey Tianyi Zhou
CoGe
50
20
0
14 Jun 2019
Image Captioning: Transforming Objects into Words
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
142
476
0
14 Jun 2019
Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors
Huy Pham
Trung Le
OOD
AI4CE
22
4
0
12 Jun 2019
Relationship-Embedded Representation Learning for Grounding Referring Expressions
Sibei Yang
Guanbin Li
Yizhou Yu
ObjD
93
55
0
11 Jun 2019
Improving Neural Language Modeling via Adversarial Training
Dilin Wang
Chengyue Gong
Qiang Liu
AAML
115
119
0
10 Jun 2019
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos
Zhu Zhang
Zhijie Lin
Zhou Zhao
Zhenxin Xiao
65
213
0
06 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
141
477
0
06 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
85
14
0
04 Jun 2019
Masked Non-Autoregressive Image Captioning
Junlong Gao
Xi Meng
Shiqi Wang
Xia Li
Shanshe Wang
Siwei Ma
Wen Gao
80
39
0
03 Jun 2019
Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma
Yannis Kalantidis
Ghassan AlRegib
Peter Vajda
Marcus Rohrbach
Z. Kira
SSL
43
10
0
01 Jun 2019
Efficient Object Embedding for Spliced Image Retrieval
Bor-Chun Chen
Zuxuan Wu
L. Davis
Ser-Nam Lim
68
8
0
28 May 2019
Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering
Junyeong Kim
Minuk Ma
Kyungsu Kim
Sungjin Kim
Chang D. Yoo
62
27
0
28 May 2019
Demand Forecasting from Spatiotemporal Data with Graph Networks and Temporal-Guided Embedding
Doyup Lee
Suehun Jung
Yeongjae Cheon
Dongil Kim
Seungil You
AI4TS
51
6
0
26 May 2019
DIANet: Dense-and-Implicit Attention Network
Zhongzhan Huang
Senwei Liang
Mingfu Liang
Haizhao Yang
CVBM
82
57
0
25 May 2019
SuperCaptioning: Image Captioning Using Two-dimensional Word Embedding
Baohua Sun
Ling Yang
Michael Lin
Charles Young
Patrick Dong
Wenhan Zhang
Jason Dong
VLM
42
8
0
25 May 2019
Deep Reason: A Strong Baseline for Real-World Visual Reasoning
Chenfei Wu
Yanzhao Zhou
Gen Li
Nan Duan
Duyu Tang
Xiaojie Wang
LRM
NAI
ReLM
18
2
0
24 May 2019
Image Captioning based on Deep Learning Methods: A Survey
Yiyu Wang
Jungang Xu
Yingfei Sun
Xianpei Han
VLM
31
7
0
20 May 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
65
387
0
20 May 2019
Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks
Karan Sikka
Lucas Van Bramer
Ajay Divakaran
85
2
0
17 May 2019
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Yangyang Guo
Zhiyong Cheng
Liqiang Nie
Yebin Liu
Yinglong Wang
Mohan Kankanhalli
57
37
0
13 May 2019
Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection
M. M. K. Moghaddam
Ehsan Abbasnejad
Javen Qinfeng Shi
36
2
0
11 May 2019