1707.07998
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
MTG: A Benchmark Suite for Multilingual Text Generation
Yiran Chen
Zhenqiao Song
Xianze Wu
Danqing Wang
Jingjing Xu
Jiaze Chen
Hao Zhou
Lei Li
LRM
VLM
82
22
0
13 Aug 2021
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
Xiaoshi Wu
Hadar Averbuch-Elor
J. Sun
Noah Snavely
84
19
0
12 Aug 2021
A Better Loss for Visual-Textual Grounding
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
60
3
0
11 Aug 2021
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning
Guangyi Liu
Yinghong Liao
Fuyu Wang
Bin Zhang
Lu Zhang
...
Xiang Wan
Shaolin Li
Zhen Li
Shuixing Zhang
Shuguang Cui
114
59
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MA
MedIm
60
31
0
10 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
79
207
0
09 Aug 2021
Discriminative Latent Semantic Graph for Video Captioning
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
Maurice Pagnucco
Yu Guan
86
31
0
08 Aug 2021
Interpretable Visual Understanding with Cognitive Attention Network
Xuejiao Tang
Wenbin Zhang
Yi Yu
Kea Turner
Hanyu Wang
Mengyu Wang
Eirini Ntoutsi
136
12
0
06 Aug 2021
Neural Twins Talk & Alternative Calculations
Zanyar Zohourianshahzadi
Jugal Kalita
52
0
0
05 Aug 2021
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval
Xuri Ge
Fuhai Chen
J. Jose
Zhilong Ji
Zhongqin Wu
Xiao-Chang Liu
72
57
0
05 Aug 2021
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding
Dailan He
Yusheng Zhao
Junyu Luo
Tianrui Hui
Shaofei Huang
Aixi Zhang
Si Liu
ViT
67
95
0
05 Aug 2021
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
Xinzhi Dong
Chengjiang Long
Wenju Xu
Chunxia Xiao
ViT
147
68
0
05 Aug 2021
Ordered Attention for Coherent Visual Storytelling
Tom Braude
Idan Schwartz
Alex Schwing
Ariel Shamir
61
9
0
04 Aug 2021
Question-controlled Text-aware Image Captioning
Anwen Hu
Shizhe Chen
Qin Jin
76
15
0
04 Aug 2021
ICECAP: Information Concentrated Entity-aware Image Captioning
Anwen Hu
Shizhe Chen
Qin Jin
61
20
0
04 Aug 2021
Sparse Continuous Distributions and Fenchel-Young Losses
André F. T. Martins
Marcos Vinícius Treviso
António Farinhas
P. Aguiar
Mário A. T. Figueiredo
Mathieu Blondel
Vlad Niculae
76
12
0
04 Aug 2021
RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting
Jiachen Li
Fan Yang
Hengbo Ma
Srikanth Malla
Masayoshi Tomizuka
Chiho Choi
91
42
0
03 Aug 2021
Distributed Attention for Grounded Image Captioning
Nenglun Chen
Xingjia Pan
Runnan Chen
Lei Yang
Zhiwen Lin
Yuqiang Ren
Haolei Yuan
Xiaowei Guo
Feiyue Huang
Wenping Wang
71
21
0
02 Aug 2021
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection
Jiajun Deng
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
3DPC
83
76
0
30 Jul 2021
ReFormer: The Relational Transformer for Image Captioning
Xuewen Yang
Yingru Liu
Xin Wang
ViT
103
57
0
29 Jul 2021
Bridging Gap between Image Pixels and Semantics via Supervision: A Survey
Jiali Duan
C.-C. Jay Kuo
96
8
0
29 Jul 2021
Greedy Gradient Ensemble for Robust Visual Question Answering
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Q. Tian
65
78
0
27 Jul 2021
Image Scene Graph Generation (SGG) Benchmark
Xiao Han
Jianwei Yang
Houdong Hu
Lei Zhang
Jianfeng Gao
Pengchuan Zhang
65
38
0
27 Jul 2021
Language Grounding with 3D Objects
Jesse Thomason
Mohit Shridhar
Yonatan Bisk
Chris Paxton
Luke Zettlemoyer
LM&Ro
88
53
0
26 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
72
128
0
26 Jul 2021
Language Models as Zero-shot Visual Semantic Learners
Yue Jiao
Jonathon S. Hare
Adam Prugel-Bennett
VLM
36
0
0
26 Jul 2021
Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph
Wentian Zhao
Yao Hu
Heda Wang
Xinxiao Wu
Jiebo Luo
55
49
0
26 Jul 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Zhixiong Nan
N. Zheng
OOD
81
19
0
24 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
47
0
0
20 Jul 2021
Separating Skills and Concepts for Novel Visual Question Answering
Spencer Whitehead
Hui Wu
Heng Ji
Rogerio Feris
Kate Saenko
CoGe
95
34
0
19 Jul 2021
Variational Topic Inference for Chest X-Ray Report Generation
Ivona Najdenkoska
Xiantong Zhen
M. Worring
Ling Shao
MedIm
88
29
0
15 Jul 2021
Surgical Instruction Generation with Transformers
Jinglu Zhang
Y. Nie
Jian Chang
Jiangning Zhang
MedIm
94
13
0
14 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
153
270
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
270
412
0
13 Jul 2021
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
Rajat Koner
Hang Li
Marcel Hildebrandt
Deepan Das
Volker Tresp
Stephan Günnemann
63
34
0
13 Jul 2021
Human Attention during Goal-directed Reading Comprehension Relies on Task Optimization
Jiajie Zou
Yuran Zhang
Jialu Li
Xing Tian
Nai Ding
AIMat
92
2
0
13 Jul 2021
Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen
Jiaoyan Chen
Yuxia Geng
Jeff Z. Pan
Zonggang Yuan
Huajun Chen
87
70
0
12 Jul 2021
Modeling Explicit Concerning States for Reinforcement Learning in Visual Dialogue
Zipeng Xu
Fandong Meng
Xiaojie Wang
Duo Zheng
Chenxu Lv
Jie Zhou
OffRL
72
6
0
12 Jul 2021
Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision
Gaurav Bhatt
Shivam Chandhok
V. Balasubramanian
56
1
0
11 Jul 2021
MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering
Haiwei Pan
Shuning He
Kejia Zhang
Bo Qu
Chunling Chen
Kun Shi
56
11
0
07 Jul 2021
Deep Learning for Embodied Vision Navigation: A Survey
Fengda Zhu
Yi Zhu
Vincent CS Lee
Xiaodan Liang
Xiaojun Chang
EgoV
LM&Ro
101
0
0
07 Jul 2021
PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling
Xiaoxue Zang
Lijuan Liu
Maria Wang
Yang Song
Hao Zhang
Jindong Chen
VLM
99
60
0
06 Jul 2021
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti
Ranjay Krishna
Li Fei-Fei
Christopher D. Manning
96
92
0
06 Jul 2021
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words
Chuan Tang
Xi Yang
Bojian Wu
Zhizhong Han
Yi Chang
3DPC
91
13
0
05 Jul 2021
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Xuejiao Tang
Xin Huang
Wenbin Zhang
T. Child
Qiong Hu
Zhen Liu
Ji Zhang
LRM
81
19
0
04 Jul 2021
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
Motonari Kambara
K. Sugiura
ViT
62
6
0
02 Jul 2021
Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring
Jianing Qiu
Frank P.-W. Lo
Xiao Gu
M. Jobarteh
Wenyan Jia
...
M. McCrory
Edward Sazonov
Mingui Sun
Gary Frost
Benny Lo
EgoV
64
19
0
01 Jul 2021
Deep auxiliary learning for visual localization using colorization task
Mi Tian
Qiong Nie
Hao Shen
Xiahua Xia
SSL
27
1
0
01 Jul 2021
Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder
Chao Zeng
Tiesong Zhao
Sam Kwong
92
2
0
29 Jun 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
58
2
0
28 Jun 2021