Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Title
An Entropy Clustering Approach for Assessing Visual Question Difficulty
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
Shun'ichi Satoh
OOD AAML
58
1
0
12 Apr 2020
YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos
Shizhe Chen
Weiying Wang
Ludan Ruan
Linli Yao
Qin Jin
37
3
0
12 Apr 2020
Rephrasing visual questions by specifying the entropy of the answer distribution
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
S. Satoh
OOD
44
2
0
10 Apr 2020
Learning to Scale Multilingual Representations for Vision-Language Tasks
Andrea Burns
Donghyun Kim
Derry Wijaya
Kate Saenko
Bryan A. Plummer
50
35
0
09 Apr 2020
e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations
Virginie Do
Oana-Maria Camburu
Zeynep Akata
Thomas Lukasiewicz
LRM
99
30
0
07 Apr 2020
Context-Aware Group Captioning via Self-Attention and Contrastive Features
Zhuowan Li
Quan Hung Tran
Long Mai
Zhe Lin
Alan Yuille
VLM
81
44
0
07 Apr 2020
Sub-Instruction Aware Vision-and-Language Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Qi Wu
Stephen Gould
129
72
0
06 Apr 2020
B-SCST: Bayesian Self-Critical Sequence Training for Image Captioning
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
BDL UQCV
25
10
0
06 Apr 2020
Iterative Context-Aware Graph Inference for Visual Dialog
Dan Guo
Haibo Wang
Hanwang Zhang
Zhengjun Zha
Meng Wang
79
49
0
05 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
197
440
0
02 Apr 2020
Consistent Multiple Sequence Decoding
Bicheng Xu
Leonid Sigal
57
0
0
02 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
90
126
0
01 Apr 2020
X-Linear Attention Networks for Image Captioning
Yingwei Pan
Ting Yao
Yehao Li
Tao Mei
134
519
0
31 Mar 2020
Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters
İlker Kesen
Ozan Arkan Can
Erkut Erdem
Aykut Erdem
Deniz Yuret
VLM
53
1
0
28 Mar 2020
Assessing Image Quality Issues for Real-World Problems
Tai-Yin Chiu
Yinan Zhao
Danna Gurari
137
54
0
27 Mar 2020
Grounded Situation Recognition
Sarah M Pratt
Mark Yatskar
Luca Weihs
Ali Farhadi
Aniruddha Kembhavi
99
112
0
26 Mar 2020
P ≈ NP, at least in Visual Question Answering
Shailza Jolly
Sebastián M. Palacio
Joachim Folz
Federico Raue
Jörn Hees
Andreas Dengel
24
0
0
26 Mar 2020
Neural encoding and interpretation for high-level visual cortices based on fMRI using image caption features
Kai Qiao
Chi Zhang
Jian Chen
Linyuan Wang
Li Tong
Bin Yan
21
3
0
26 Mar 2020
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models
Pranav Agarwal
Alejandro Betancourt
V. Panagiotou
Natalia Díaz Rodríguez
EGVM
82
10
0
26 Mar 2020
Learning Compact Reward for Image Captioning
Nannan Li
Zhenzhong Chen
66
3
0
24 Mar 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
103
418
0
24 Mar 2020
Video Object Grounding using Semantic Roles in Language Description
Arka Sadhu
Kan Chen
Ram Nevatia
143
48
0
24 Mar 2020
Linguistically Driven Graph Capsule Network for Visual Question Reasoning
Qingxing Cao
Xiaodan Liang
Keze Wang
Liang Lin
GNN
47
3
0
23 Mar 2020
A Better Variant of Self-Critical Sequence Training
Ruotian Luo
BDL
73
37
0
22 Mar 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
479
24
0
22 Mar 2020
Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection
Zhonghua Wu
Qingyi Tao
Guosheng Lin
Jianfei Cai
ObjD
70
14
0
22 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
201
192
0
19 Mar 2020
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
126
223
0
16 Mar 2020
Vision-Dialog Navigation by Exploring Cross-modal Memory
Yi Zhu
Fengda Zhu
Zhaohuan Zhan
Bingqian Lin
Jianbin Jiao
Xiaojun Chang
Xiaodan Liang
VLM
91
49
0
15 Mar 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD AAML
219
294
0
14 Mar 2020
Analyzing Visual Representations in Embodied Navigation Tasks
Erik Wijmans
Julian Straub
Dhruv Batra
Irfan Essa
Judy Hoffman
Ari S. Morcos
75
2
0
12 Mar 2020
MQA: Answering the Question via Robotic Manipulation
Yuhong Deng
Di Guo
F. Sun
Naifu Zhang
Huaping Liu
Chen Pang
76
22
0
10 Mar 2020
Deconfounded Image Captioning: A Causal Retrospect
Xu Yang
Hanwang Zhang
Jianfei Cai
CML
79
127
0
09 Mar 2020
Better Captioning with Sequence-Level Exploration
Jia Chen
Qin Jin
61
12
0
08 Mar 2020
OVC-Net: Object-Oriented Video Captioning with Temporal Graph and Detail Enhancement
Fangyi Zhu
Lei Li
Zhanyu Ma
Guang Chen
Jun Guo
36
1
0
08 Mar 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
234
70
0
07 Mar 2020
Captioning Images with Novel Objects via Online Vocabulary Expansion
Mikihiro Tanaka
Tatsuya Harada
3DV
77
2
0
06 Mar 2020
Show, Edit and Tell: A Framework for Editing Image Captions
Fawaz Sammani
Luke Melas-Kyriazi
KELM DiffM
108
59
0
06 Mar 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM VLM
103
76
0
03 Mar 2020
Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions
Yitong Li
Dianqi Li
Sushant Prakash
Peng Wang
66
2
0
02 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
131
219
0
01 Mar 2020
Visual Commonsense R-CNN
Tan Wang
Jianqiang Huang
Hanwang Zhang
Qianru Sun
SSL ObjD CML
86
252
0
27 Feb 2020
Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation
Yunhan Zhao
Shu Kong
Daeyun Shin
Charless C. Fowlkes
MDE
76
44
0
27 Feb 2020
GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering
Chuang Niu
Jun Zhang
Ge Wang
Jimin Liang
SSL
90
70
0
27 Feb 2020
Analysis of diversity-accuracy tradeoff in image captioning
Ruotian Luo
Gregory Shakhnarovich
65
13
0
27 Feb 2020
What BERT Sees: Cross-Modal Transfer for Visual Question Generation
Thomas Scialom
Patrick Bordes
Paul-Alexis Dray
Jacopo Staiano
Patrick Gallinari
59
6
0
25 Feb 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
122
283
0
25 Feb 2020
Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
Karim Huesmann
Soeren Klemm
Lars Linsen
Benjamin Risse
27
2
0
21 Feb 2020
A Convolutional Baseline for Person Re-Identification Using Vision and Language Descriptions
Ammarah Farooq
Muhammad Awais
F. Yan
J. Kittler
A. Akbari
S. S. Khalid
114
8
0
20 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
105
184
0
20 Feb 2020