ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1707.07998
  4. Cited By
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
v1v2v3 (latest)

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat
ArXiv (abs)PDFHTML

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Title
Injecting Prior Knowledge into Image Caption Generation
Injecting Prior Knowledge into Image Caption Generation
A. Goel
Basura Fernando
Thanh-Son Nguyen
Hakan Bilen
33
0
0
22 Nov 2019
Reinforcing an Image Caption Generator Using Off-Line Human Feedback
Reinforcing an Image Caption Generator Using Off-Line Human Feedback
Paul Hongsuck Seo
Piyush Sharma
Tomer Levinboim
Bohyung Han
Radu Soricut
OffRL
72
22
0
21 Nov 2019
Temporal Reasoning via Audio Question Answering
Temporal Reasoning via Audio Question Answering
Haytham M. Fayek
Justin Johnson
65
54
0
21 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding
Learning Cross-modal Context Graph for Visual Grounding
Yongfei Liu
Bo Wan
Xiao-Dan Zhu
Xuming He
94
91
0
20 Nov 2019
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Badri N. Patro
Anupriy
Vinay P. Namboodiri
AAMLFAtt
85
26
0
19 Nov 2019
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning
  Tasks
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
115
244
0
18 Nov 2019
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in
  Visual Dialogue
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
X. Jiang
Jiahao Yu
Zengchang Qin
Yingying Zhuang
Xingxing Zhang
Yue Hu
Qi Wu
90
70
0
17 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal
  Transformers for TextVQA
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
96
197
0
14 Nov 2019
Visual Dialogue State Tracking for Question Generation
Visual Dialogue State Tracking for Question Generation
Wei Pang
Xiaojie Wang
75
33
0
12 Nov 2019
Conditionally Learn to Pay Attention for Sequential Visual Task
Conditionally Learn to Pay Attention for Sequential Visual Task
Jun He
Quan-Jie Cao
Lei Zhang
44
0
0
11 Nov 2019
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Yiming Xu
Lin Chen
Zhongwei Cheng
Lixin Duan
Jiebo Luo
OOD
86
24
0
11 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion,
  and Applications
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAIAI4TS
122
338
0
10 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural
  Language Queries
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Fuwen Tan
Paola Cascante-Bonilla
Xiaoxiao Guo
Hui Wu
Song Feng
Vicente Ordonez
66
30
0
10 Nov 2019
Contextual Grounding of Natural Language Entities in Images
Contextual Grounding of Natural Language Entities in Images
Farley Lai
Ning Xie
Derek Doran
Asim Kadav
ObjD
55
6
0
05 Nov 2019
Predicting the Politics of an Image Using Webly Supervised Data
Predicting the Politics of an Image Using Webly Supervised Data
Christopher Thomas
Adriana Kovashka
SSL
84
21
0
31 Oct 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning
  Baselines
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
Alex Schwing
LRMReLM
102
9
0
31 Oct 2019
Learning Rich Image Region Representation for Visual Question Answering
Learning Rich Image Region Representation for Visual Question Answering
Bei Liu
Zhicheng Huang
Zhaoyang Zeng
Zheyu Chen
Jianlong Fu
60
9
0
29 Oct 2019
Heterogeneous Graph Learning for Visual Commonsense Reasoning
Heterogeneous Graph Learning for Visual Commonsense Reasoning
Weijiang Yu
Jingwen Zhou
Weihao Yu
Xiaodan Liang
Nong Xiao
LRM
79
47
0
25 Oct 2019
KnowIT VQA: Answering Knowledge-Based Questions about Videos
KnowIT VQA: Answering Knowledge-Based Questions about Videos
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
152
80
0
23 Oct 2019
Enforcing Reasoning in Visual Commonsense Reasoning
Enforcing Reasoning in Visual Commonsense Reasoning
Hammad A. Ayyubi
Md. Mehrab Tanjim
D. Kriegman
ReLMOOD
57
2
0
21 Oct 2019
Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid
  Reward Strategies for Video Captioning
Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid Reward Strategies for Video Captioning
Xinxin Zhu
A. Gorban
V. A. Makarov
Shichen Lu
I. Tyukin
Hanqing Lu
35
2
0
17 Oct 2019
Integrating Temporal and Spatial Attentions for VATEX Video Captioning
  Challenge 2019
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
Shizhe Chen
Yida Zhao
Yuqing Song
Qin Jin
Qi Wu
30
0
0
15 Oct 2019
Understanding Misclassifications by Attributes
Understanding Misclassifications by Attributes
Sadaf Gulshad
Zeynep Akata
J. H. Metzen
A. Smeulders
AAML
95
0
0
15 Oct 2019
Exploring Overall Contextual Information for Image Captioning in
  Human-Like Cognitive Style
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style
Hongwei Ge
Zehang Yan
Kai Zhang
Mingde Zhao
Liang Sun
54
25
0
15 Oct 2019
VATEX Captioning Challenge 2019: Multi-modal Information Fusion and
  Multi-stage Training Strategy for Video Captioning
VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning
Ziqi Zhang
Yaya Shi
Jiutong Wei
Chunfen Yuan
Bing Li
Weiming Hu
47
0
0
13 Oct 2019
Granular Multimodal Attention Networks for Visual Dialog
Granular Multimodal Attention Networks for Visual Dialog
Badri N. Patro
Shivansh Patel
Vinay P. Namboodiri
114
1
0
13 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text
  Retrieval
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
88
213
0
11 Oct 2019
Multi-modal Deep Analysis for Multimedia
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
74
43
0
11 Oct 2019
Semantic-aware Image Deblurring
Semantic-aware Image Deblurring
Fuhai Chen
Rongrong Ji
Chengpeng Dai
Xiaoshuai Sun
Chia-Wen Lin
Jiayi Ji
Baochang Zhang
Feiyue Huang
Liujuan Cao
BDLVLM
111
6
0
09 Oct 2019
Modulated Self-attention Convolutional Network for VQA
Modulated Self-attention Convolutional Network for VQA
Jean-Benoit Delbrouck
Antoine Maiorca
Nathan Hubens
Stéphane Dupont
25
1
0
08 Oct 2019
Meta Module Network for Compositional Visual Reasoning
Meta Module Network for Compositional Visual Reasoning
Wenhu Chen
Zhe Gan
Linjie Li
Yu Cheng
Wenjie Wang
Jingjing Liu
LRM
93
71
0
08 Oct 2019
SMArT: Training Shallow Memory-aware Transformers for Robotic
  Explainability
SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
162
29
0
07 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
155
303
0
06 Oct 2019
Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention
  and Spatial Memory
Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
A. Vasudevan
Ahmed K. Farahat
Chetan Gupta
LM&Ro
67
2
0
04 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual
  Multimodal Representations
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
138
25
0
30 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLMOT
134
449
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLMVLM
365
948
0
24 Sep 2019
6D Pose Estimation with Correlation Fusion
6D Pose Estimation with Correlation Fusion
Yi Cheng
Erik Cambria
Ying Sun
C. Acar
Wei Jing
Yan Wu
Liyuan Li
Cheston Tan
Joo-Hwee Lim
87
15
0
24 Sep 2019
Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable
  Visual Question Answering
Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering
Heather Riley
Mohan Sridharan
NAI
54
0
0
23 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and
  Knowledge-routed Network
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
57
13
0
23 Sep 2019
Adaptively Aligned Image Captioning via Adaptive Attention Time
Adaptively Aligned Image Captioning via Adaptive Attention Time
Lun Huang
Wenmin Wang
Yaxian Xia
Jie Chen
74
63
0
19 Sep 2019
Large-scale representation learning from visually grounded untranscribed
  speech
Large-scale representation learning from visually grounded untranscribed speech
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
87
61
0
19 Sep 2019
Graph Neural Networks for Image Understanding Based on Multiple Cues:
  Group Emotion Recognition and Event Recognition as Use Cases
Graph Neural Networks for Image Understanding Based on Multiple Cues: Group Emotion Recognition and Event Recognition as Use Cases
Xin Guo
Luisa F. Polanía
Bin Zhu
C. Boncelet
Kenneth Barner
83
28
0
19 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
35
1
0
17 Sep 2019
Part-Guided Attention Learning for Vehicle Instance Retrieval
Part-Guided Attention Learning for Vehicle Instance Retrieval
Xinyu Zhang
Rufeng Zhang
Jiewei Cao
Dong Gong
Mingyu You
Chunhua Shen
97
37
0
13 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
110
310
0
12 Sep 2019
PDANet: Polarity-consistent Deep Attention Network for Fine-grained
  Visual Emotion Regression
PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression
Sicheng Zhao
Zizhou Jia
Hui Chen
Leida Li
Guiguang Ding
Kurt Keutzer
89
62
0
11 Sep 2019
Probabilistic framework for solving Visual Dialog
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
141
13
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through
  Entailed Question Generation
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
83
65
0
10 Sep 2019
Compositional Generalization in Image Captioning
Compositional Generalization in Image Captioning
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
89
49
0
10 Sep 2019
Previous
123...313233...363738
Next