ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat
arXiv:1707.07998 (v3, latest)

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
CapWAP: Captioning with a Purpose
Adam Fisch
Kenton Lee
Ming-Wei Chang
J. Clark
Regina Barzilay
53
11
0
09 Nov 2020
Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
Alessandro Suglia
Antonio Vergari
Ioannis Konstas
Yonatan Bisk
E. Bastianelli
Andrea Vanzo
Oliver Lemon
OCL
43
10
0
05 Nov 2020
DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation
Zhenxing Zhang
Lambert Schomaker
GAN
67
35
0
05 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
66
7
0
05 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
55
45
0
04 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
Michael R. Lyu
Irwin King
75
16
0
03 Nov 2020
Dual Attention on Pyramid Feature Maps for Image Captioning
Litao Yu
Jian Zhang
Qiang Wu
108
50
0
02 Nov 2020
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan
Stefan Roth
64
42
0
02 Nov 2020
Boost Image Captioning with Knowledge Reasoning
Feicheng Huang
Zhixin Li
Haiyang Wei
Canlong Zhang
Huifang Ma
38
25
0
02 Nov 2020
Exploring Dynamic Context for Multi-path Trajectory Prediction
Hao Cheng
Wentong Liao
Xuejiao Tang
M. Yang
Monika Sester
Bodo Rosenhahn
105
33
0
30 Oct 2020
Generating Radiology Reports via Memory-driven Transformer
Zhihong Chen
Yan Song
Tsung-Hui Chang
Xiang Wan
MedIm
76
486
0
30 Oct 2020
Loss re-scaling VQA: Revisiting the LanguagePrior Problem from a Class-imbalance View
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Q. Tian
Min Zhang
116
70
0
30 Oct 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
97
57
0
27 Oct 2020
Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks
Jing Xu
Fangwei Zhong
Yizhou Wang
83
50
0
25 Oct 2020
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
88
31
0
24 Oct 2020
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Radhika Dua
Sai Srinivas Kancheti
V. Balasubramanian
LRM
88
22
0
24 Oct 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSLVLM
101
12
0
24 Oct 2020
Can images help recognize entities? A study of the role of images for Multimodal NER
Shuguang Chen
Gustavo Aguilar
Leonardo Neves
Thamar Solorio
EgoV
90
37
0
23 Oct 2020
Show and Speak: Directly Synthesize Spoken Description of Images
Xinsheng Wang
Siyuan Feng
Jihua Zhu
M. Hasegawa-Johnson
O. Scharenborg
152
4
0
23 Oct 2020
Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization
Li Ren
Keqin Li
Liqiang Wang
K. Hua
54
4
0
23 Oct 2020
Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Simon Stepputtis
Joseph Campbell
Mariano Phielipp
Stefan Lee
Chitta Baral
H. B. Amor
LM&Ro
200
205
0
22 Oct 2020
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
Keyu Wen
Xiaodong Gu
Qingrong Cheng
76
97
0
22 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
Alex Schwing
Tamir Hazan
106
92
0
21 Oct 2020
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
183
62
0
20 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
101
6
0
19 Oct 2020
Image Captioning with Visual Object Representations Grounded in the Textual Modality
Dušan Variš
Katsuhito Sudoh
Satoshi Nakamura
35
1
0
19 Oct 2020
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Yuankai Qi
Qi Wu
Stephen Gould
LM&Ro
226
135
0
19 Oct 2020
Unsupervised Foveal Vision Neural Networks with Top-Down Attention
Ryan Burt
Nina N. Thigpen
A. Keil
José C. Príncipe
56
2
0
18 Oct 2020
Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
BDL
138
23
0
18 Oct 2020
Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
28
4
0
17 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
169
33
0
16 Oct 2020
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović
Chandra Bhagavatula
J. S. Park
Ronan Le Bras
Noah A. Smith
Yejin Choi
ReLMLRM
99
62
0
15 Oct 2020
The Benefit of Distraction: Denoising Remote Vitals Measurements using Inverse Attention
E. Nowara
Daniel J. McDuff
Ashok Veeraraghavan
53
13
0
14 Oct 2020
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Jack Hessel
Lillian Lee
108
76
0
13 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Basura Fernando
Hongdong Li
Stephen Gould
192
10
0
13 Oct 2020
Contrast and Classify: Training Robust VQA Models
Yash Kant
A. Moudgil
Dhruv Batra
Devi Parikh
Harsh Agrawal
55
5
0
13 Oct 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Dongxu Li
Chenchen Xu
Xin Yu
Kaihao Zhang
Ben Swift
H. Suominen
Hongdong Li
SLR
60
124
0
12 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
147
11
0
12 Oct 2020
Interpretable Neural Computation for Real-World Compositional Visual Question Answering
Ruixue Tang
Chao Ma
CoGe
26
2
0
10 Oct 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
115
27
0
08 Oct 2020
Visual News: Benchmark and Challenges in News Image Captioning
Fuxiao Liu
Yinghan Wang
Tianlu Wang
Vicente Ordonez
VLM
86
116
0
08 Oct 2020
Universal Weighting Metric Learning for Cross-Modal Matching
Jiwei Wei
Xing Xu
Yang Yang
Yanli Ji
Zheng Wang
Heng Tao Shen
70
89
0
07 Oct 2020
Vision Skills Needed to Answer Visual Questions
Xiaoyu Zeng
Yanan Wang
Tai-Yin Chiu
Nilavra Bhattacharya
Danna Gurari
66
18
0
07 Oct 2020
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang
Hexiang Hu
Vihan Jain
Eugene Ie
Fei Sha
78
22
0
06 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
76
11
0
05 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
43
2
0
05 Oct 2020
UNISON: Unpaired Cross-lingual Image Captioning
Jiahui Gao
Yi Zhou
Philip L. H. Yu
Shafiq Joty
Jiuxiang Gu
82
17
0
03 Oct 2020
Taking Modality-free Human Identification as Zero-shot Learning
Zhizhe Liu
Xingxing Zhang
Zhenfeng Zhu
Shuai Zheng
Yao Zhao
Jian Cheng
56
4
0
02 Oct 2020
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
José Manuél Gómez-Pérez
Raúl Ortega
61
24
0
01 Oct 2020
Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue
Zipeng Xu
Fangxiang Feng
Xiaojie Wang
Yushu Yang
Huixing Jiang
Zhongyuan Ouyang
47
7
0
01 Oct 2020