ResearchTrend.AI
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Learning Object Detection from Captions via Textual Scene Attributes
Achiya Jerbi
Roei Herzig
Jonathan Berant
Gal Chechik
Amir Globerson
79
21
0
30 Sep 2020
Teacher-Critical Training Strategies for Image Captioning
Yiqing Huang
Jiansheng Chen
VLM
55
9
0
30 Sep 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
Xiangxi Shi
Xu Yang
Jiuxiang Gu
Shafiq Joty
Jianfei Cai
71
53
0
30 Sep 2020
Spatial Attention as an Interface for Image Captioning Models
P. Sadler
53
0
0
29 Sep 2020
Where is the Model Looking At?--Concentrate and Explain the Network Attention
Wenjia Xu
Jiuniu Wang
Yang Wang
Guangluan Xu
Wei Dai
Yirong Wu
XAI
85
17
0
29 Sep 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
Lefei Zhang
Jianfeng Gao
Zicheng Liu
VLM
105
57
0
28 Sep 2020
Neural Twins Talk
Zanyar Zohourianshahzadi
Jugal Kalita
49
1
0
26 Sep 2020
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
Doyup Lee
Yeongjae Cheon
Wook-Shin Han
AAML
OOD
44
16
0
21 Sep 2020
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval
Andrés Mafla
S. Dey
Ali Furkan Biten
Lluís Gómez
Dimosthenis Karatzas
80
25
0
21 Sep 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Yu Liu
Luc Van Gool
Matthew Blaschko
Tinne Tuytelaars
Marie-Francine Moens
68
6
0
18 Sep 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
OOD
56
142
0
18 Sep 2020
Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News
Reuben Tan
Bryan A. Plummer
Kate Saenko
AAML
70
72
0
16 Sep 2020
Simultaneous Machine Translation with Visual Context
Ozan Caglayan
Julia Ive
Veneta Haralampieva
Pranava Madhyastha
Loïc Barrault
Lucia Specia
45
30
0
15 Sep 2020
Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product
Tiangang Zhu
Yue Wang
Haoran Li
Youzheng Wu
Xiaodong He
Bowen Zhou
58
71
0
15 Sep 2020
AttnGrounder: Talking to Cars with Attention
Vivek Mittal
ViT
95
11
0
11 Sep 2020
Systematic Generalization on gSCAN with Language Conditioned Embedding
Tong Gao
Qi Huang
Raymond J. Mooney
68
22
0
11 Sep 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
Khyathi Chandu
Piyush Sharma
Soravit Changpinyo
Ashish V. Thapliyal
Radu Soricut
DiffM
VLM
84
3
0
10 Sep 2020
Towards Unique and Informative Captioning of Images
Zeyu Wang
Berthy Feng
Karthik Narasimhan
Olga Russakovsky
69
37
0
08 Sep 2020
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
Huan Lin
Fandong Meng
Jinsong Su
Yongjing Yin
Zhengyuan Yang
Yubin Ge
Jie Zhou
Jiebo Luo
81
81
0
04 Sep 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
70
67
0
03 Sep 2020
Video Captioning Using Weak Annotation
Jingyi Hou
Yunde Jia
Xinxiao Wu
Yayun Qi
49
2
0
02 Sep 2020
Zero-Shot Human-Object Interaction Recognition via Affordance Graphs
Alessio Sarullo
Tingting Mu
14
4
0
02 Sep 2020
Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering
Jiahao Yu
Zihao Zhu
Yujing Wang
Weifeng Zhang
Yue Hu
Jianlong Tan
74
100
0
31 Aug 2020
Person-in-Context Synthesis with Compositional Structural Space
Weidong Yin
Ziwei Liu
Leonid Sigal
36
2
0
28 Aug 2020
A Dataset and Baselines for Visual Question Answering on Art
Noa Garcia
Chentao Ye
Zihua Liu
Qingtao Hu
Mayu Otani
Chenhui Chu
Yuta Nakashima
Teruko Mitamura
CoGe
57
56
0
28 Aug 2020
Visual Question Answering on Image Sets
Ankan Bansal
Yuting Zhang
Rama Chellappa
CoGe
156
44
0
27 Aug 2020
Attr2Style: A Transfer Learning Approach for Inferring Fashion Styles via Apparel Attributes
Rajdeep H. Banerjee
Abhinav Ravi
U. Dutta
41
5
0
26 Aug 2020
Protect, Show, Attend and Tell: Empowering Image Captioning Models with Ownership Protection
Jian Han Lim
Chee Seng Chan
Kam Woh Ng
Lixin Fan
Qiang Yang
179
32
0
25 Aug 2020
Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment
Geeticka Chauhan
Ruizhi Liao
W. Wells
Jacob Andreas
Xin Wang
Seth Berkowitz
Steven Horng
Peter Szolovits
Polina Golland
MedIm
74
53
0
22 Aug 2020
Attribute Prototype Network for Zero-Shot Learning
Wenjia Xu
Yongqin Xian
Jiuniu Wang
Bernt Schiele
Zeynep Akata
99
295
0
19 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
63
13
0
18 Aug 2020
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
82
13
0
18 Aug 2020
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang
Tan Jiang
Tan Wang
Kun Kuang
Zhou Zhao
Jianke Zhu
Jin Yu
Hongxia Yang
Leilei Gan
OOD
81
88
0
16 Aug 2020
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
60
7
0
14 Aug 2020
PAM:Point-wise Attention Module for 6D Object Pose Estimation
Myoungha Song
Jeongho Lee
Donghwan Kim
3DPC
75
3
0
12 Aug 2020
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Mingkui Tan
Chuang Gan
GNN
BDL
107
175
0
07 Aug 2020
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards
Xuewen Yang
Heming Zhang
Di Jin
Yingru Liu
Chi-Hao Wu
Jianchao Tan
Dongliang Xie
Jue Wang
Xin Wang
100
68
0
06 Aug 2020
Learning Transition Models with Time-delayed Causal Relations
Junchi Liang
Abdeslam Boularias
OffRL
42
3
0
04 Aug 2020
Describing Textures using Natural Language
Chenyun Wu
Mikayla Timm
Subhransu Maji
3DV
58
10
0
03 Aug 2020
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
Liu Yang
VLM
60
5
0
02 Aug 2020
Eigen-CAM: Class Activation Map using Principal Components
Mohammed Bany Muhammad
M. Yeasin
78
346
0
01 Aug 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
56
36
0
28 Jul 2020
Decomposing Generation Networks with Structure Prediction for Recipe Generation
Hao Wang
Guosheng Lin
Chunyan Miao
38
1
0
27 Jul 2020
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering
Siwen Luo
S. Han
Kaiyuan Sun
Josiah Poon
CoGe
LRM
ReLM
83
4
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
105
29
0
26 Jul 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Michael Cogswell
Jiasen Lu
Rishabh Jain
Stefan Lee
Devi Parikh
Dhruv Batra
VLM
EgoV
78
15
0
24 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
100
86
0
23 Jul 2020
Comprehensive Image Captioning via Scene Graph Decomposition
Yiwu Zhong
Liwei Wang
Jianshu Chen
Dong Yu
Yin Li
135
128
0
23 Jul 2020
Integrating Image Captioning with Rule-based Entity Masking
Aditya Mogadala
Xiaoyu Shen
Dietrich Klakow
32
7
0
22 Jul 2020
Fine-Grained Image Captioning with Global-Local Discriminative Objective
Jie Wu
Tianshui Chen
Hefeng Wu
Zhi Yang
Guangchun Luo
Liang Lin
70
59
0
21 Jul 2020