ResearchTrend.AI

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Ruixue Tang
Chao Ma
W. Zhang
Qi Wu
Xiaokang Yang
OOD
72
49
0
19 Jul 2020
Length-Controllable Image Captioning
Chaorui Deng
Ning Ding
Mingkui Tan
Qi Wu
VLM
81
57
0
19 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
86
74
0
17 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
128
41
0
16 Jul 2020
Explore and Explain: Self-supervised Navigation and Recounting
Roberto Bigazzi
Federico Landi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
EgoV, LM&Ro
78
17
0
14 Jul 2020
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
70
45
0
14 Jul 2020
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Riccardo Del Chiaro
Bartlomiej Twardowski
Andrew D. Bagdanov
Joost van de Weijer
CLL, VLM
77
41
0
13 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
98
79
0
13 Jul 2020
Image Captioning with Compositional Neural Module Networks
Junjiao Tian
Jean Oh
44
11
0
10 Jul 2020
DCANet: Learning Connected Attentions for Convolutional Neural Networks
Xu Ma
Jingda Guo
Sihai Tang
Zhinan Qiao
Qi Chen
Qing Yang
Song Fu
41
15
0
09 Jul 2020
Learning to Reweight with Deep Interactions
Yang Fan
Yingce Xia
Lijun Wu
Shufang Xie
Weiqing Liu
Jiang Bian
Tao Qin
Xiang-Yang Li
76
9
0
09 Jul 2020
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel
Mohit Chandak
A. Anand
Prithwijit Guha
64
5
0
08 Jul 2020
SmaAt-UNet: Precipitation Nowcasting using a Small Attention-UNet Architecture
Kevin Trebing
Tomasz Stanczyk
S. Mehrkanoon
109
341
0
08 Jul 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
101
11
0
08 Jul 2020
Diverse and Styled Image Captioning Using SVD-Based Mixture of Recurrent Experts
Marzi Heidari
M. Ghatee
A. Nickabadi
Arash Pourhasan Nezhad
DiffM, MoE
84
1
0
07 Jul 2020
EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition
Yingnan Fu
Tingting Liu
Ming Gao
Aoying Zhou
100
7
0
06 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
100
59
0
05 Jul 2020
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang D. Yoo
84
72
0
04 Jul 2020
Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition
Bin-Bin Gao
Hong-Yu Zhou
71
115
0
03 Jul 2020
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
Qing Li
Siyuan Huang
Yining Hong
Song-Chun Zhu
119
29
0
03 Jul 2020
Scene Graph Reasoning for Visual Question Answering
Marcel Hildebrandt
Hang Li
Rajat Koner
Volker Tresp
Stephan Günnemann
GNN
79
64
0
02 Jul 2020
The Impact of Explanations on AI Competency Prediction in VQA
Kamran Alipour
Arijit Ray
Xiaoyu Lin
J. Schulze
Yi Yao
Giedrius Burachas
51
9
0
02 Jul 2020
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
169
748
0
01 Jul 2020
Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering
Ben Bogin
Sanjay Subramanian
Matt Gardner
Jonathan Berant
ReLM, OOD, BDL, LRM
57
28
0
01 Jul 2020
A Transformer-based Audio Captioning Model with Keyword Estimation
Yuma Koizumi
Ryo Masumura
Kyosuke Nishida
Masahiro Yasuda
Shoichiro Saito
116
54
0
01 Jul 2020
Modality-Agnostic Attention Fusion for visual search with text feedback
Eric Dodds
Jack Culpepper
Simão Herdade
Yang Zhang
K. Boakye
EgoV
100
74
0
30 Jun 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
128
382
0
30 Jun 2020
Graph Optimal Transport for Cross-Domain Alignment
Liqun Chen
Zhe Gan
Yu Cheng
Linjie Li
Lawrence Carin
Jingjing Liu
OT
115
152
0
26 Jun 2020
Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering
C. Sur
21
9
0
25 Jun 2020
Improving Image Captioning with Better Use of Captions
Zhan Shi
Xu Zhou
Xipeng Qiu
Xiao-Dan Zhu
66
128
0
21 Jun 2020
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation
Shiyang Yan
Yang Hua
N. Robertson
OffRL
42
0
0
21 Jun 2020
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Saeed Amizadeh
Hamid Palangi
Oleksandr Polozov
Yichen Huang
K. Koishida
NAI, LRM
121
60
0
20 Jun 2020
Neural Parameter Allocation Search
Bryan A. Plummer
Nikoli Dryden
Julius Frost
Torsten Hoefler
Kate Saenko
122
16
0
18 Jun 2020
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Corentin Dancette
Rémi Cadène
Xinlei Chen
Matthieu Cord
36
3
0
17 Jun 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta
Arash Vahdat
Gal Chechik
Xiaodong Yang
Jan Kautz
Derek Hoiem
ObjD, SSL
168
144
0
17 Jun 2020
Foreground-Background Imbalance Problem in Deep Object Detectors: A Review
Joya Chen
Qi Wu
Dong Liu
Tong Xu
ObjD
52
25
0
16 Jun 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
Jiahao Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
107
129
0
16 Jun 2020
Exploiting Visual Semantic Reasoning for Video-Text Retrieval
Zerun Feng
Zhimin Zeng
Caili Guo
Zheng Li
79
36
0
16 Jun 2020
ORD: Object Relationship Discovery for Visual Dialogue Generation
Ziwei Wang
Zi Huang
Yadan Luo
Huimin Lu
49
4
0
15 Jun 2020
Mitigating Gender Bias in Captioning Systems
Ruixiang Tang
Mengnan Du
Yuening Li
Zirui Liu
Na Zou
Helen Zhou
FaML
124
66
0
15 Jun 2020
AMENet: Attentive Maps Encoder Network for Trajectory Prediction
Hao Cheng
Wentong Liao
M. Yang
Bodo Rosenhahn
Monika Sester
88
46
0
15 Jun 2020
Sparse and Continuous Attention Mechanisms
André F. T. Martins
António Farinhas
Marcos Vinícius Treviso
Vlad Niculae
P. Aguiar
Mário A. T. Figueiredo
77
41
0
12 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL, VLM
173
437
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD, VLM
133
501
0
11 Jun 2020
Estimating semantic structure for the VQA answer space
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
47
4
0
10 Jun 2020
Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
OOD
90
90
0
09 Jun 2020
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu
Kaihua Tang
Hanwang Zhang
Zhiwu Lu
Xiansheng Hua
Ji-Rong Wen
CML
147
403
0
08 Jun 2020
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation
Mingjie Li
Fuyu Wang
Xiaojun Chang
Xiaodan Liang
MedIm
86
107
0
06 Jun 2020
A Dataset and Benchmarks for Multimedia Social Analysis
Bofan Xue
David M. Chan
John F. Canny
VGen
44
0
0
05 Jun 2020
Explaining Autonomous Driving by Learning End-to-End Visual Attention
Luca Cultrera
Lorenzo Seidenari
Federico Becattini
P. Pala
A. Bimbo
65
49
0
05 Jun 2020