ResearchTrend.AI

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching
Tianlang Chen
Jiebo Luo
72
69
0
20 Feb 2020
VQA-LOL: Visual Question Answering under the Lens of Logic
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
CoGe
71
75
0
19 Feb 2020
CQ-VQA: Visual Question Answering on Categorized Questions
Aakansha Mishra
A. Anand
Prithwijit Guha
145
6
0
17 Feb 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
116
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
52
17
0
15 Feb 2020
Sparse and Structured Visual Attention
Pedro Henrique Martins
S. Becker
Zita Marinho
Michael Arens
78
8
0
13 Feb 2020
Component Analysis for Visual Question Answering Architectures
Camila Kolling
Jonatas Wehrmann
Rodrigo C. Barros
CoGe
36
2
0
12 Feb 2020
Object Detection as a Positive-Unlabeled Problem
Yuewei Yang
Kevin J. Liang
Lawrence Carin
82
39
0
11 Feb 2020
Vision-based Fight Detection from Surveillance Cameras
Seymanur Akti
G. A. Tataroglu
H. K. Ekenel
49
78
0
11 Feb 2020
Multimodal Matching Transformer for Live Commenting
Chaoqun Duan
Lei Cui
Shuming Ma
Furu Wei
Conghui Zhu
Tiejun Zhao
26
12
0
07 Feb 2020
iCap: Interactive Image Captioning with Predictive Text
Zhengxiong Jia
Xirong Li
30
8
0
31 Jan 2020
Evaluating the Progress of Deep Learning for Visual Relational Concepts
Sebastian Stabinger
Peer David
J. Piater
A. Rodríguez-Sánchez
79
19
0
29 Jan 2020
Explaining with Counter Visual Attributes and Examples
Sadaf Gulshad
A. Smeulders
XAI, FAtt, AAML
77
15
0
27 Jan 2020
aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption
C. Sur
40
8
0
27 Jan 2020
Uncertainty based Class Activation Maps for Visual Question Answering
Badri N. Patro
Mayank Lunayach
Vinay P. Namboodiri
FAtt, UQCV
32
1
0
23 Jan 2020
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Darryl Hannan
Akshay Jain
Joey Tianyi Zhou
AAML
85
60
0
22 Jan 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
132
263
0
22 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
79
18
0
20 Jan 2020
Modality-Balanced Models for Visual Dialogue
Hyounghun Kim
Hao Tan
Joey Tianyi Zhou
61
27
0
17 Jan 2020
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Li Wang
Zechen Bai
Yonghua Zhang
Hongtao Lu
75
67
0
15 Jan 2020
Ensemble based discriminative models for Visual Dialog Challenge 2018
Shubham Agarwal
Raghav Goyal
32
1
0
15 Jan 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD, ObjD
88
320
0
10 Jan 2020
Visual Question Answering on 360° Images
Shih-Han Chou
Wei-Lun Chao
Wei-Sheng Lai
Min Sun
Ming-Hsuan Yang
52
22
0
10 Jan 2020
Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models
Jiamei Sun
Sebastian Lapuschkin
Wojciech Samek
Alexander Binder
FAtt
98
30
0
04 Jan 2020
Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering
Lei Shi
Shijie Geng
Kai Shuang
Chiori Hori
Songxiang Liu
Peng Gao
Sen Su
85
11
0
03 Jan 2020
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Xinjie Fan
Yizhe Zhang
Zhendong Wang
Mingyuan Zhou
BDL
72
4
0
31 Dec 2019
Visual Agreement Regularized Training for Multi-Modal Machine Translation
Pengcheng Yang
Boxing Chen
Pei Zhang
Xu Sun
154
31
0
27 Dec 2019
Vision and Language: from Visual Perception to Content Creation
Tao Mei
Wei Zhang
Ting Yao
VLM
76
8
0
26 Dec 2019
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
79
113
0
25 Dec 2019
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning
Huaizheng Zhang
Yong Luo
Qiming Ai
Yonggang Wen
113
15
0
21 Dec 2019
CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis
Jiadong Liang
Wenjie Pei
Feng Lu
GAN
60
19
0
18 Dec 2019
DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
Feilong Chen
Fandong Meng
Jiaming Xu
Peng Li
Bo Xu
Jie Zhou
93
34
0
18 Dec 2019
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
110
890
0
17 Dec 2019
MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
Sonia Baee
Erfan Pakdamanian
Inki Kim
Lu Feng
Vicente Ordonez
Laura E. Barnes
99
47
0
17 Dec 2019
SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes
Pulak Purkait
Christopher Zach
Ian Reid
3DV, DRL
33
1
0
10 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
85
64
0
07 Dec 2019
Connecting Vision and Language with Localized Narratives
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
143
252
0
06 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
60
15
0
06 Dec 2019
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
111
117
0
05 Dec 2019
12-in-1: Multi-Task Vision and Language Representation Learning
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM, ObjD
131
481
0
05 Dec 2019
Knowledge-Enriched Visual Storytelling
Chao-Chun Hsu
Zi-Yuan Chen
Chi-Yang Hsu
Chih-Chia Li
Tzu-Yuan Lin
Ting-Hao 'Kenneth' Huang
Lun-Wei Ku
DiffM
90
47
0
03 Dec 2019
Deep Bayesian Active Learning for Multiple Correct Outputs
Khaled Jedoui
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
BDL, OOD, UQCV
93
14
0
02 Dec 2019
Learning to Relate from Captions and Bounding Boxes
Sarthak Garg
Joel Ruben Antony Moniz
Anshu Aviral
Priyatham Bollimpalli
38
3
0
01 Dec 2019
Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Federico Landi
Lorenzo Baraldi
Marcella Cornia
M. Corsini
Rita Cucchiara
LM&Ro
87
29
0
27 Nov 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
107
7
0
26 Nov 2019
Two Causal Principles for Improving Visual Dialog
Jiaxin Qi
Yulei Niu
Jianqiang Huang
Hanwang Zhang
CML
110
149
0
24 Nov 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
75
33
0
24 Nov 2019
Unsupervised Keyword Extraction for Full-sentence VQA
Kohei Uehara
Tatsuya Harada
32
1
0
23 Nov 2019
CRUR: Coupled-Recurrent Unit for Unification, Conceptualization and Context Capture for Language Representation -- A Generalization of Bi Directional LSTM
C. Sur
BDL
49
6
0
22 Nov 2019
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning
C. Sur
79
13
0
22 Nov 2019