1707.07998
Cited By
v1
v2
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Cross Modification Attention Based Deliberation Model for Image Captioning
Zheng Lian
Yanan Zhang
Haichang Li
Rui Wang
Xiaohui Hu
64
5
0
17 Sep 2021
An Interpretable Framework for Drug-Target Interaction with Gated Cross Attention
Yeachan Kim
Bonggun Shin
95
9
0
17 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
123
47
0
16 Sep 2021
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning
Shikha Dubey
Farrukh Olimov
M. Rafique
Joonmo Kim
M. Jeon
ViT
82
42
0
16 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
92
61
0
15 Sep 2021
What Vision-Language Models 'See' when they See Scenes
Michele Cafagna
Kees van Deemter
Albert Gatt
VLM
97
13
0
15 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
114
20
0
13 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
117
62
0
13 Sep 2021
Learning to Ground Visual Objects for Visual Dialog
Feilong Chen
Xiuyi Chen
Can Xu
Daxin Jiang
OOD
86
18
0
13 Sep 2021
UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation
Zhengkun Zhang
Xiaojun Meng
Yasheng Wang
Xin Jiang
Qun Liu
Zhenglu Yang
89
47
0
13 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
Zechen Bai
Yuta Nakashima
Noa Garcia
110
44
0
13 Sep 2021
Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval
Zhihao Fan
Zhongyu Wei
Zejun Li
Siyuan Wang
Haijun Shan
Xuanjing Huang
Jianqing Fan
CLIP
45
12
0
12 Sep 2021
COSMic: A Coherence-Aware Generation Metric for Image Descriptions
Mert Inan
P. Sharma
Baber Khalid
Radu Soricut
Matthew Stone
Malihe Alikhani
EGVM
50
13
0
11 Sep 2021
Partially-Supervised Novel Object Captioning Leveraging Context from Paired Data
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
104
1
0
10 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
64
0
0
10 Sep 2021
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiangdong Zhou
82
3
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
57
14
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank
Emanuele Bugliarello
Desmond Elliott
74
82
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
29
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
116
38
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
79
13
0
08 Sep 2021
RefineCap: Concept-Aware Refinement for Image Captioning
Yekun Chai
Shuo Jin
Junliang Xing
VLM
25
1
0
08 Sep 2021
Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects
Bing Wei
Yudi Zhao
K. Hao
Lei Gao
74
5
0
08 Sep 2021
Journalistic Guidelines Aware News Image Captioning
Xuewen Yang
Svebor Karaman
Joel R. Tetreault
Alex Jaimes
79
27
0
07 Sep 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
72
1
0
06 Sep 2021
Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser
Duo Zheng
Zipeng Xu
Fandong Meng
Xiaojie Wang
Jiaan Wang
Jie Zhou
45
13
0
06 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
46
1
0
04 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
85
19
0
04 Sep 2021
Stimuli-Aware Visual Emotion Analysis
Jingyuan Yang
Jie Li
Xiumei Wang
Yuxuan Ding
Xinbo Gao
55
55
0
04 Sep 2021
IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System
Daniel Fernando Campos
Heng Ji
64
13
0
03 Sep 2021
Point-of-Interest Type Prediction using Text and Images
Danae Sánchez Villegas
Nikolaos Aletras
116
14
0
01 Sep 2021
Working Memory Connections for LSTM
Federico Landi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
KELM
74
170
0
31 Aug 2021
From General to Specific: Informative Scene Graph Generation via Balance Adjustment
Yuyu Guo
Lianli Gao
Xuanhan Wang
Yuxuan Hu
Xing Xu
Xu Lu
Heng Tao Shen
Jingkuan Song
103
88
0
30 Aug 2021
Zero-shot Natural Language Video Localization
Jinwoo Nam
Daechul Ahn
Dongyeop Kang
S. Ha
Jonghyun Choi
176
43
0
29 Aug 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
70
0
0
28 Aug 2021
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee
Thomas Scialom
Seunghyun Yoon
Franck Dernoncourt
Kyomin Jung
CoGe
87
19
0
28 Aug 2021
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
Muhammad Zubair Irshad
Niluthpol Chowdhury Mithun
Zachary Seymour
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
LM&Ro
84
51
0
26 Aug 2021
Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning
Guodun Li
Yuchen Zhai
Zehao Lin
Yin Zhang
104
21
0
26 Aug 2021
Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction
Zhao-Heng Zheng
Arka Sadhu
Ramkant Nevatia
27
2
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
155
799
0
24 Aug 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
117
37
0
24 Aug 2021
EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA
Arka Ujjal Dey
Ernest Valveny
Gaurav Harit
43
3
0
22 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
62
10
0
21 Aug 2021
Group-based Distinctive Image Captioning with Memory Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
100
18
0
20 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
59
144
0
20 Aug 2021
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
Yongming Rao
Guangyi Chen
Jiwen Lu
Jie Zhou
CML
OOD
88
247
0
19 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
88
31
0
18 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
85
21
0
16 Aug 2021
MMChat: Multi-Modal Chat Dataset on Social Media
Yinhe Zheng
Guanyi Chen
Xin Liu
K. Lin
83
33
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
59
56
0
16 Aug 2021