
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Local Interpretations for Explainable Natural Language Processing: A Survey
Siwen Luo
Hamish Ivison
S. Han
Josiah Poon
MILM
120
51
0
20 Mar 2021
Let Your Heart Speak in its Mother Tongue: Multilingual Captioning of Cardiac Signals
Dani Kiyasseh
T. Zhu
David Clifton
122
0
0
19 Mar 2021
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation
Chen Liang
Yu Wu
Yawei Luo
Yi Yang
VOS
101
30
0
19 Mar 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
89
84
0
16 Mar 2021
Visual Cues and Error Correction for Translation Robustness
Zhenhao Li
Marek Rei
Lucia Specia
57
3
0
12 Mar 2021
Relationship-based Neural Baby Talk
Fan Fu
Tingting Xie
Ioannis Patras
Sepehr Jalali
32
0
0
08 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
101
158
0
05 Mar 2021
Visual Question Answering: which investigated applications?
Silvio Barra
Carmen Bisogni
M. De Marsico
S. Ricciardi
80
38
0
04 Mar 2021
Learning With Context Feedback Loop for Robust Medical Image Segmentation
K. Girum
G. Créhange
A. Lalande
MedIm
56
39
0
04 Mar 2021
Learning Asynchronous and Sparse Human-Object Interaction in Videos
Romero Morais
Vuong Le
Svetha Venkatesh
T. Tran
46
30
0
03 Mar 2021
Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query
Guanyu Cai
Jun Zhang
Xinyang Jiang
Yifei Gong
Lianghua He
Fufu Yu
Pai Peng
Xiaowei Guo
Feiyue Huang
Xing Sun
77
13
0
02 Mar 2021
A Universal Model for Cross Modality Mapping by Relational Reasoning
Zun Li
Congyan Lang
Liqian Liang
Tao Wang
Songhe Feng
Jun Wu
Yidong Li
56
2
0
26 Feb 2021
Enhanced Modality Transition for Image Captioning
Ziwei Wang
Yadan Luo
Zi Huang
28
0
0
23 Feb 2021
Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation
Julia Ive
A. Li
Yishu Miao
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
70
12
0
22 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
106
301
0
22 Feb 2021
Learning Compositional Representation for Few-shot Visual Question Answering
Dalu Guo
Dacheng Tao
OOD CoGe
64
4
0
21 Feb 2021
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
VLM
164
227
0
20 Feb 2021
Progressive Transformer-Based Generation of Radiology Reports
Farhad Nooralahzadeh
Nicolas Andres Perez Gonzalez
T. Frauenfelder
Koji Fujimoto
Michael Krauthammer
ViT MedIm
108
89
0
19 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
541
1,143
0
17 Feb 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
63
16
0
16 Feb 2021
The MSR-Video to Text Dataset with Clean Annotations
Haoran Chen
Jianmin Li
Simone Frintrop
Xiaolin Hu
85
18
0
12 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
179
665
0
11 Feb 2021
A Metamodel and Framework for Artificial General Intelligence From Theory to Practice
Hugo Latapie
Özkan Kiliç
Gaowen Liu
Yan Yan
Ramana Rao Kompella
Pei Wang
K. Thórisson
Adam Lawrence
Yuhong Sun
Jayanth Srinivasa
AI4CE
61
9
0
11 Feb 2021
In Defense of Scene Graphs for Image Captioning
Kien Nguyen
Subarna Tripathi
Bang Du
T. Guha
Truong Thao Nguyen
81
46
0
09 Feb 2021
Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval
Soravit Changpinyo
Jordi Pont-Tuset
V. Ferrari
Radu Soricut
66
26
0
09 Feb 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM CLIP
190
1,775
0
05 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
392
547
0
04 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Yebin Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
70
30
0
03 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
148
117
0
31 Jan 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
102
67
0
28 Jan 2021
The Role of Syntactic Planning in Compositional Image Captioning
Emanuele Bugliarello
Desmond Elliott
CoGe
68
14
0
28 Jan 2021
DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents
Tsu-Jui Fu
Wenjie Wang
Daniel J. McDuff
Yale Song
90
53
0
28 Jan 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network
Yehao Li
Yingwei Pan
Ting Yao
Jingwen Chen
Tao Mei
VLM
95
53
0
27 Jan 2021
CPTR: Full Transformer Network for Image Captioning
Wei Liu
Sihan Chen
Longteng Guo
Xinxin Zhu
Jing Liu
ViT
57
143
0
26 Jan 2021
Weakly Supervised Thoracic Disease Localization via Disease Masks
Hyun-woo Kim
Hong G Jung
Seong-Whan Lee
49
9
0
25 Jan 2021
ECOL-R: Encouraging Copying in Novel Object Captioning with Reinforcement Learning
Yufei Wang
Ian D. Wood
Stephen Wan
Mark Johnson
43
8
0
25 Jan 2021
Fast Sequence Generation with Multi-Agent Reinforcement Learning
Longteng Guo
Jing Liu
Xinxin Zhu
Hanqing Lu
LRM
98
6
0
24 Jan 2021
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Jungjun Kim
Dong-Gyu Lee
Jialin Wu
Hong G Jung
Seong-Whan Lee
ObjD
91
22
0
22 Jan 2021
Macroscopic Control of Text Generation for Image Captioning
Zhangzi Zhu
Tianlei Wang
Hong Qu
79
4
0
20 Jan 2021
ArtEmis: Affective Language for Visual Art
Panos Achlioptas
M. Ovsjanikov
Kilichbek Haydarov
Mohamed Elhoseiny
Leonidas Guibas
72
121
0
19 Jan 2021
Dual-Level Collaborative Transformer for Image Captioning
Yunpeng Luo
Jiayi Ji
Xiaoshuai Sun
Liujuan Cao
Yongjian Wu
Feiyue Huang
Chia-Wen Lin
Rongrong Ji
ViT
86
283
0
16 Jan 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
137
5
0
16 Jan 2021
Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge
Violetta Shevchenko
Damien Teney
A. Dick
Anton Van Den Hengel
83
28
0
15 Jan 2021
Understanding the Role of Scene Graphs in Visual Question Answering
Vinay Damodaran
Sharanya Chakravarthy
Akshay Kumar
Anjana Umapathy
Teruko Mitamura
Yuta Nakashima
Noa Garcia
Chenhui Chu
GNN
161
33
0
14 Jan 2021
Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition
Fuyu Wang
Xiaodan Liang
Lin Xu
Liang Lin
MedIm
77
27
0
09 Jan 2021
MSD: Saliency-aware Knowledge Distillation for Multimodal Understanding
Woojeong Jin
Maziar Sanjabi
Shaoliang Nie
L Tan
Xiang Ren
Hamed Firooz
30
6
0
06 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
307
348
0
05 Jan 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD VLM
347
158
0
02 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
90
67
0
31 Dec 2020
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Leilei Gan
Rui Yan
Jiwei Li
93
30
0
30 Dec 2020