1707.07998
Cited By
v1
v2
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents
Zhanzhan Cheng
Peng Zhang
Can Li
Qiao Liang
Yunlu Xu
Pengfei Li
Shiliang Pu
Yi Niu
Fei Wu
42
10
0
14 Jul 2022
Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network
Hao Xing
Darius Burschka
GNN
3DH
56
7
0
12 Jul 2022
A Baseline for Detecting Out-of-Distribution Examples in Image Captioning
Gabi Shalev
Gal-Lev Shalev
Joseph Keshet
OODD
65
7
0
12 Jul 2022
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
229
78
0
12 Jul 2022
Cross-modal Prototype Driven Network for Radiology Report Generation
Jun Wang
A. Bhalerao
Yulan He
MedIm
171
77
0
11 Jul 2022
Horizontal and Vertical Attention in Transformers
Litao Yu
Shuai Liu
ViT
49
1
0
10 Jul 2022
Towards Multimodal Vision-Language Models Generating Non-Generic Text
Wes Robbins
Zanyar Zohourianshahzadi
Jugal Kalita
51
1
0
09 Jul 2022
Predicting Word Learning in Children from the Performance of Computer Vision Systems
Sunayana Rane
Mira L. Nencheva
Zeyu Wang
C. Lew‐Williams
Olga Russakovsky
Thomas Griffiths
99
3
0
07 Jul 2022
Exploring the sequence length bottleneck in the Transformer for Image Captioning
Jiapeng Hu
Roberto Cavicchioli
Alessandro Capotondi
ViT
55
3
0
07 Jul 2022
Towards Realistic Semi-Supervised Learning
Mamshad Nayeem Rizve
Navid Kardan
M. Shah
102
47
0
05 Jul 2022
Vision-and-Language Pretraining
Thong Nguyen
Cong-Duy Nguyen
Xiaobao Wu
See-Kiong Ng
Anh Tuan Luu
VLM
CLIP
63
2
0
05 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics
Othón González-Chávez
Guillermo Ruiz
Daniela Moctezuma
Tania A. Ramirez-delreal
73
9
0
04 Jul 2022
Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation
Sixing Yan
William K. Cheung
Keith W H Chiu
Terence M. Tong
Charles K. Cheung
Simon See
MedIm
87
17
0
04 Jul 2022
Divert More Attention to Vision-Language Tracking
Mingzhe Guo
Zhipeng Zhang
Heng Fan
Li Jing
85
56
0
03 Jul 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Keyu Wen
Zhenshan Tan
Qingrong Cheng
Cheng Chen
X. Gu
VLM
56
0
0
02 Jul 2022
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
Ziyan Yang
Kushal Kafle
Franck Dernoncourt
Vicente Ordónez Román
VLM
90
25
0
30 Jun 2022
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
Yangyang Guo
Liqiang Nie
Yongkang Wong
Yebin Liu
Zhiyong Cheng
Mohan S. Kankanhalli
119
40
0
30 Jun 2022
EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering
Violetta Shevchenko
Ehsan Abbasnejad
A. Dick
Anton Van Den Hengel
Damien Teney
71
0
0
29 Jun 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Jugal Kalita
91
0
0
28 Jun 2022
From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
Zihao Zhu
NAI
ReLM
GNN
120
3
0
25 Jun 2022
VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
Zhuofan Ying
Peter Hase
Joey Tianyi Zhou
LRM
87
13
0
22 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
120
1
0
21 Jun 2022
Winning the CVPR'2022 AQTC Challenge: A Two-stage Function-centric Approach
Shiwei Wu
Weidong He
Tong Xu
Hao Wang
Enhong Chen
EgoV
64
3
0
20 Jun 2022
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
ObjD
105
18
0
20 Jun 2022
VReBERT: A Simple and Flexible Transformer for Visual Relationship Detection
Yunbo Cui
M. Farazi
ViT
88
1
0
18 Jun 2022
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval
Xiao Dong
Xunlin Zhan
Yunchao Wei
Xiaoyong Wei
Yaowei Wang
Minlong Lu
Xiaochun Cao
Xiaodan Liang
74
11
0
17 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding
G. Alabduljabbar
Hafida Benhidour
Said Kerrache
3DV
24
3
0
16 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
107
49
0
15 Jun 2022
Measuring Representational Harms in Image Captioning
Angelina Wang
Solon Barocas
Kristen Laird
Hanna M. Wallach
116
54
0
14 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
82
92
0
14 Jun 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
107
57
0
14 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
97
302
0
12 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
96
115
0
07 Jun 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Kshitij Gupta
Devansh Gautam
R. Mamidi
VLM
70
4
0
07 Jun 2022
Improving Image Captioning with Control Signal of Sentence Quality
Zhangzi Zhu
Hong Qu
83
0
0
07 Jun 2022
Dual Decomposition of Convex Optimization Layers for Consistent Attention in Medical Images
Tom Ron
M. Weiler-Sagie
Tamir Hazan
FAtt
MedIm
81
6
0
06 Jun 2022
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Xudong Lin
Simran Tiwari
Shiyuan Huang
Manling Li
Mike Zheng Shou
Heng Ji
Shih-Fu Chang
127
21
0
05 Jun 2022
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
Mingjie Li
Wenjia Cai
Karin Verspoor
Shirui Pan
Xiaodan Liang
Xiaojun Chang
MedIm
88
38
0
04 Jun 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
72
62
0
04 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
74
556
0
03 Jun 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
97
13
0
30 May 2022
Controllable Text Generation with Neurally-Decomposed Oracle
Tao Meng
Sidi Lu
Nanyun Peng
Kai-Wei Chang
BDL
103
37
0
27 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
172
562
0
27 May 2022
Fine-grained Image Captioning with CLIP Reward
Jaemin Cho
Seunghyun Yoon
Ajinkya Kale
Franck Dernoncourt
Trung Bui
Joey Tianyi Zhou
CLIP
228
79
0
26 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
64
12
0
25 May 2022
Guiding Visual Question Answering with Attention Priors
T. Le
Vuong Le
Sunil R. Gupta
Svetha Venkatesh
T. Tran
66
6
0
25 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
83
10
0
25 May 2022
Face2Text revisited: Improved data set and baseline results
Marc Tanti
Shaun Abdilla
A. Muscat
Claudia Borg
R. Farrugia
Albert Gatt
CVBM
32
3
0
24 May 2022
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Aishwarya Agrawal
Ivana Kajić
Emanuele Bugliarello
Elnaz Davoodi
Anita Gergely
Phil Blunsom
Aida Nematzadeh
OOD
88
18
0
24 May 2022
HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval
Feilong Chen
Xiuyi Chen
Jiaxin Shi
Duzhen Zhang
Jianlong Chang
Qi Tian
VLM
CLIP
93
6
0
24 May 2022