ResearchTrend.AI
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 782 papers shown
Generating Diverse and Informative Natural Language Fashion Feedback
Gil Sadeh
L. Fritz
Gabi Shalev
Eduard Oks
11
5
0
15 Jun 2019
Image Captioning: Transforming Objects into Words
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
45
462
0
14 Jun 2019
Improving Neural Language Modeling via Adversarial Training
Dilin Wang
Chengyue Gong
Qiang Liu
AAML
43
115
0
10 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
24
439
0
06 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
17
14
0
04 Jun 2019
Masked Non-Autoregressive Image Captioning
Junlong Gao
Xi Meng
Shiqi Wang
Xia Li
Shanshe Wang
Siwei Ma
Wen Gao
19
36
0
03 Jun 2019
Efficient Object Embedding for Spliced Image Retrieval
Bor-Chun Chen
Zuxuan Wu
L. Davis
Ser-Nam Lim
32
8
0
28 May 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
27
377
0
20 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
31
171
0
10 May 2019
Pointing Novel Objects in Image Captioning
Yehao Li
Ting Yao
Yingwei Pan
Hongyang Chao
Tao Mei
33
69
0
25 Apr 2019
HAR-Net: Joint Learning of Hybrid Attention for Single-stage Object Detection
Yali Li
Shengjin Wang
22
33
0
25 Apr 2019
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
20
94
0
21 Apr 2019
Attentive Single-Tasking of Multiple Tasks
Kevis-Kokitsi Maninis
Ilija Radosavovic
Iasonas Kokkinos
77
245
0
18 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
25
77
0
18 Apr 2019
Progressive Attention Memory Network for Movie Story Question Answering
Junyeong Kim
Minuk Ma
Kyungsu Kim
Sungjin Kim
Chang D. Yoo
13
76
0
18 Apr 2019
Question Guided Modular Routing Networks for Visual Question Answering
Yanze Wu
Qiang Sun
Jianqi Ma
Bin Li
Yanwei Fu
Yao Peng
Xiangyang Xue
23
1
0
17 Apr 2019
Self-critical n-step Training for Image Captioning
Junlong Gao
Shiqi Wang
Shanshe Wang
Siwei Ma
Wen Gao
22
55
0
15 Apr 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
Alex Schwing
30
110
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
39
117
0
11 Apr 2019
The Steep Road to Happily Ever After: An Analysis of Current Visual Storytelling Models
Yatri Modi
Natalie Parde
21
16
0
06 Apr 2019
Good News, Everyone! Context driven entity-aware captioning for news images
Ali Furkan Biten
Lluís Gómez
Marçal Rusiñol
Dimosthenis Karatzas
27
139
0
02 Apr 2019
Context and Attribute Grounded Dense Captioning
Guojun Yin
Lu Sheng
Bin Liu
Nenghai Yu
Xiaogang Wang
Jing Shao
16
75
0
02 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
22
103
0
27 Mar 2019
Attention Based Glaucoma Detection: A Large-scale Database and CNN Model
Liu Li
Mai Xu
Xiaofei Wang
Lai Jiang
Hanruo Liu
37
202
0
26 Mar 2019
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
Dong-Jin Kim
Jinsoo Choi
Tae-Hyun Oh
In So Kweon
14
84
0
14 Mar 2019
MirrorGAN: Learning Text-to-image Generation by Redescription
Tingting Qiao
Jing Zhang
Duanqing Xu
Dacheng Tao
VLM
GAN
33
538
0
14 Mar 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
25
82
0
01 Mar 2019
Image-Question-Answer Synergistic Network for Visual Dialog
Dalu Guo
Chang Xu
Dacheng Tao
19
74
0
26 Feb 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
19
271
0
25 Feb 2019
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang
Jaeseo Lim
Byoung-Tak Zhang
22
72
0
25 Feb 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
Ramprasaath R. Selvaraju
Stefan Lee
Yilin Shen
Hongxia Jin
Shalini Ghosh
Larry Heck
Dhruv Batra
Devi Parikh
FAtt
VLM
25
252
0
11 Feb 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Zhe Gan
Yu Cheng
Ahmed El Kholy
Linjie Li
Jingjing Liu
Jianfeng Gao
13
104
0
01 Feb 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
53
322
0
20 Jan 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
Laurens van der Maaten
24
22
0
19 Jan 2019
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering
Zhuoqian Yang
Zengchang Qin
Jing Yu
Yue Hu
GNN
25
16
0
23 Dec 2018
A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization
Guoyun Tu
Yanwei Fu
Boyang Albert Li
Jiarui Gao
Yu-Gang Jiang
Xiangyang Xue
17
29
0
21 Dec 2018
nocaps: novel object captioning at scale
Harsh Agrawal
Karan Desai
Yufei Wang
Xinlei Chen
Rishabh Jain
Mark Johnson
Dhruv Batra
Devi Parikh
Stefan Lee
Peter Anderson
VLM
21
470
0
20 Dec 2018
Grounded Video Description
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
29
191
0
17 Dec 2018
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Peng Gao
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Hongsheng Li
AIMat
30
363
0
13 Dec 2018
Long-Term Feature Banks for Detailed Video Understanding
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
62
477
0
12 Dec 2018
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
Anton Van Den Hengel
ObjD
22
252
0
12 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Recursive Visual Attention in Visual Dialog
Yulei Niu
Hanwang Zhang
Manli Zhang
Jianhong Zhang
Zhiwu Lu
Ji-Rong Wen
28
118
0
06 Dec 2018
Auto-Encoding Scene Graphs for Image Captioning
Xu Yang
Kaihua Tang
Hanwang Zhang
Jianfei Cai
30
693
0
06 Dec 2018
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
169
230
0
05 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
28
51
0
03 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
58
868
0
27 Nov 2018
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation
Matteo Tomei
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
DiffM
17
76
0
26 Nov 2018
Scene Graph Generation via Conditional Random Fields
Weilin Cong
Wenjie Wang
Wang-Chien Lee
GNN
27
22
0
20 Nov 2018