ResearchTrend.AI
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat
arXiv:1707.07998

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Cross Modification Attention Based Deliberation Model for Image Captioning
Zheng Lian
Yanan Zhang
Haichang Li
Rui Wang
Xiaohui Hu
64
5
0
17 Sep 2021
An Interpretable Framework for Drug-Target Interaction with Gated Cross Attention
Yeachan Kim
Bonggun Shin
95
9
0
17 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
123
47
0
16 Sep 2021
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning
Shikha Dubey
Farrukh Olimov
M. Rafique
Joonmo Kim
M. Jeon
ViT
82
42
0
16 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
92
61
0
15 Sep 2021
What Vision-Language Models 'See' when they See Scenes
Michele Cafagna
Kees van Deemter
Albert Gatt
VLM
97
13
0
15 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
114
20
0
13 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
117
62
0
13 Sep 2021
Learning to Ground Visual Objects for Visual Dialog
Feilong Chen
Xiuyi Chen
Can Xu
Daxin Jiang
OOD
86
18
0
13 Sep 2021
UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation
Zhengkun Zhang
Xiaojun Meng
Yasheng Wang
Xin Jiang
Qun Liu
Zhenglu Yang
89
47
0
13 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
Zechen Bai
Yuta Nakashima
Noa Garcia
110
44
0
13 Sep 2021
Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval
Zhihao Fan
Zhongyu Wei
Zejun Li
Siyuan Wang
Haijun Shan
Xuanjing Huang
Jianqing Fan
CLIP
45
12
0
12 Sep 2021
COSMic: A Coherence-Aware Generation Metric for Image Descriptions
Mert Inan
P. Sharma
Baber Khalid
Radu Soricut
Matthew Stone
Malihe Alikhani
EGVM
50
13
0
11 Sep 2021
Partially-Supervised Novel Object Captioning Leveraging Context from Paired Data
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
104
1
0
10 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
64
0
0
10 Sep 2021
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiangdong Zhou
82
3
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
57
14
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank
Emanuele Bugliarello
Desmond Elliott
74
82
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
29
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
116
38
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
79
13
0
08 Sep 2021
RefineCap: Concept-Aware Refinement for Image Captioning
Yekun Chai
Shuo Jin
Junliang Xing
VLM
25
1
0
08 Sep 2021
Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects
Bing Wei
Yudi Zhao
K. Hao
Lei Gao
74
5
0
08 Sep 2021
Journalistic Guidelines Aware News Image Captioning
Xuewen Yang
Svebor Karaman
Joel R. Tetreault
Alex Jaimes
79
27
0
07 Sep 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
72
1
0
06 Sep 2021
Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser
Duo Zheng
Zipeng Xu
Fandong Meng
Xiaojie Wang
Jiaan Wang
Jie Zhou
45
13
0
06 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
46
1
0
04 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
85
19
0
04 Sep 2021
Stimuli-Aware Visual Emotion Analysis
Jingyuan Yang
Jie Li
Xiumei Wang
Yuxuan Ding
Xinbo Gao
55
55
0
04 Sep 2021
IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System
Daniel Fernando Campos
Heng Ji
64
13
0
03 Sep 2021
Point-of-Interest Type Prediction using Text and Images
Danae Sánchez Villegas
Nikolaos Aletras
116
14
0
01 Sep 2021
Working Memory Connections for LSTM
Federico Landi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
KELM
74
170
0
31 Aug 2021
From General to Specific: Informative Scene Graph Generation via Balance Adjustment
Yuyu Guo
Lianli Gao
Xuanhan Wang
Yuxuan Hu
Xing Xu
Xu Lu
Heng Tao Shen
Jingkuan Song
103
88
0
30 Aug 2021
Zero-shot Natural Language Video Localization
Jinwoo Nam
Daechul Ahn
Dongyeop Kang
S. Ha
Jonghyun Choi
176
43
0
29 Aug 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
70
0
0
28 Aug 2021
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee
Thomas Scialom
Seunghyun Yoon
Franck Dernoncourt
Kyomin Jung
CoGe
87
19
0
28 Aug 2021
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
Muhammad Zubair Irshad
Niluthpol Chowdhury Mithun
Zachary Seymour
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
LM&Ro
84
51
0
26 Aug 2021
Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning
Guodun Li
Yuchen Zhai
Zehao Lin
Yin Zhang
104
21
0
26 Aug 2021
Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction
Zhao-Heng Zheng
Arka Sadhu
Ramkant Nevatia
27
2
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
155
799
0
24 Aug 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
117
37
0
24 Aug 2021
EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA
Arka Ujjal Dey
Ernest Valveny
Gaurav Harit
43
3
0
22 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
62
10
0
21 Aug 2021
Group-based Distinctive Image Captioning with Memory Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
100
18
0
20 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
59
144
0
20 Aug 2021
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
Yongming Rao
Guangyi Chen
Jiwen Lu
Jie Zhou
CML
OOD
88
247
0
19 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
88
31
0
18 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
85
21
0
16 Aug 2021
MMChat: Multi-Modal Chat Dataset on Social Media
Yinhe Zheng
Guanyi Chen
Xin Liu
K. Lin
83
33
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
59
56
0
16 Aug 2021