Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
FindIt: Generalized Localization with Natural Language Queries
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
88
17
0
31 Mar 2022
SimVQA: Exploring Simulated Environments for Visual Question Answering
Paola Cascante-Bonilla
Hui Wu
Letao Wang
Rogerio Feris
Vicente Ordonez
84
7
0
31 Mar 2022
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
Simin Chen
Zihe Song
Mirazul Haque
Cong Liu
Wei Yang
66
42
0
29 Mar 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
88
66
0
29 Mar 2022
Quantifying Societal Bias Amplification in Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
76
48
0
29 Mar 2022
End-to-End Transformer Based Model for Image Captioning
Yiyu Wang
Jungang Xu
Yingfei Sun
VLM, ViT
64
125
0
29 Mar 2022
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge
D. Vo
Hong Chen
Akihiro Sugimoto
Hideki Nakayama
132
14
0
28 Mar 2022
A General Survey on Attention Mechanisms in Deep Learning
Gianni Brauwers
Flavius Frasincar
104
334
0
27 Mar 2022
Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering
Chengyang Fang
Gangyan Zeng
Yu Zhou
Daiqing Wu
Can Ma
Dayong Hu
Weiping Wang
58
8
0
24 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
60
4
0
24 Mar 2022
PACS: A Dataset for Physical Audiovisual CommonSense Reasoning
Samuel Yu
Peter Wu
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
LRM
117
16
0
21 Mar 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
58
64
0
17 Mar 2022
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
MLLM
51
22
0
17 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
107
54
0
16 Mar 2022
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene
Duo Zheng
Fandong Meng
Q. Si
Hairun Fan
Zipeng Xu
Jie Zhou
Fangxiang Feng
Xiaojie Wang
73
0
0
16 Mar 2022
K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara
Tatsuya Harada
98
10
0
15 Mar 2022
Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval
Guanyu Cai
Yixiao Ge
Binjie Zhang
Alex Jinpeng Wang
Rui Yan
...
Ying Shan
Lianghua He
Xiaohu Qie
Jianping Wu
Mike Zheng Shou
VLM
51
6
0
15 Mar 2022
Can you even tell left from right? Presenting a new challenge for VQA
Sairaam Venkatraman
Rishi Rao
S. Balasubramanian
C. Vorugunti
R. R. Sarma
CoGe
81
0
0
15 Mar 2022
Extracting associations and meanings of objects depicted in artworks through bi-modal deep networks
Gregory Kell
Ryan-Rhys Griffiths
Anthony Bourached
D. Stork
42
3
0
14 Mar 2022
Pruned Graph Neural Network for Short Story Ordering
Melika Golestani
Zeinab Borhanifard
Farnaz Tahmasebian
Heshaam Faili
41
0
0
13 Mar 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
79
1
0
13 Mar 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
92
94
0
12 Mar 2022
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
89
18
0
11 Mar 2022
Two-stream Hierarchical Similarity Reasoning for Image-text Matching
Ran Chen
Hanli Wang
Lei Wang
Sam Kwong
57
9
0
10 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
88
10
0
10 Mar 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yu-Gang Jiang
3DPC
115
48
0
10 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM, ELM, LRM
138
68
0
09 Mar 2022
Towards Inadequately Pre-trained Models in Transfer Learning
Andong Deng
Xingjian Li
Di Hu
Tianyang Wang
Haoyi Xiong
Chengzhong Xu
23
6
0
09 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
74
29
0
08 Mar 2022
Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval
Jun Rao
Fei Wang
Liang Ding
Shuhan Qi
Yibing Zhan
Weifeng Liu
Dacheng Tao
OOD
89
30
0
08 Mar 2022
GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction
Kareem M. Metwaly
Aerin Kim
E. Branson
V. Monga
86
7
0
07 Mar 2022
Modeling Coreference Relations in Visual Dialog
Mingxiao Li
Marie-Francine Moens
51
10
0
06 Mar 2022
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering
Mingxiao Li
Marie-Francine Moens
82
13
0
06 Mar 2022
Important Object Identification with Semi-Supervised Learning for Autonomous Driving
Jiachen Li
Haiming Gang
Hengbo Ma
Masayoshi Tomizuka
Chiho Choi
93
12
0
05 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS, VLM
79
37
0
03 Mar 2022
A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism
Rashid Khan
Shujah Islam
Khadija Kanwal
Mansoor Iqbal
Md. Imran Hossain
Z. Ye
3DV
32
18
0
03 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
124
92
0
02 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
82
31
0
01 Mar 2022
Interactive Machine Learning for Image Captioning
Mareike Hartmann
Aliki Anagnostopoulou
Daniel Sonntag
VLM
45
4
0
28 Feb 2022
SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following
Ruinian Xu
Hongyi Chen
Yunzhi Lin
Patricio A. Vela
66
6
0
25 Feb 2022
On Modality Bias Recognition and Reduction
Yangyang Guo
Liqiang Nie
Harry Cheng
Zhiyong Cheng
Mohan S. Kankanhalli
A. Bimbo
75
28
0
25 Feb 2022
Joint Answering and Explanation for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Yin-wei Wei
Liqiang Nie
Mohan S. Kankanhalli
67
17
0
25 Feb 2022
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
LM&Ro
92
149
0
23 Feb 2022
Relation Regularized Scene Graph Generation
Yuyu Guo
Lianli Gao
Jingkuan Song
Peng Wang
N. Sebe
Heng Tao Shen
Xuelong Li
68
15
0
22 Feb 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT, VLM
78
30
0
21 Feb 2022
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
A. Cherian
Chiori Hori
Tim K. Marks
Jonathan Le Roux
108
38
0
18 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM, HAI
73
101
0
18 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
181
227
0
18 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
55
3
0
16 Feb 2022
XFBoost: Improving Text Generation with Controllable Decoders
Xiangyu Peng
Michael Sollami
70
1
0
16 Feb 2022