1707.07998
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
72
4
0
04 Jan 2024
Detection-based Intermediate Supervision for Visual Question Answering
Yuhang Liu
Daowan Peng
Wei Wei
Yuanyuan Fu
Wenfeng Xie
Dangyang Chen
57
2
0
26 Dec 2023
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li
Mingxu Zhang
Yiran Geng
Haoran Geng
Yuxing Long
Yan Shen
Renrui Zhang
Jiaming Liu
Hao Dong
LM&Ro
LRM
119
99
0
24 Dec 2023
Semantic Draw Engineering for Text-to-Image Creation
Yang Li
Huaqiang Jiang
Yangkai Wu
51
1
0
23 Dec 2023
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
93
8
0
23 Dec 2023
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang
Jiaming Liu
Ray Zhang
Mingjie Pan
Zoey Guo
Xiaoqi Li
Zehui Chen
Peng Gao
Yandong Guo
Shanghang Zhang
3DV
108
71
0
21 Dec 2023
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
105
8
0
21 Dec 2023
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai
Shengli Song
Shiqi Meng
Jingyang Li
Sitong Yan
Guangneng Hu
57
5
0
21 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
64
1
0
21 Dec 2023
Object Attribute Matters in Visual Question Answering
Peize Li
Q. Si
Peng Fu
Zheng Lin
Yan Wang
78
0
0
20 Dec 2023
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Jian Tang
56
0
0
20 Dec 2023
Dual Branch Network Towards Accurate Printed Mathematical Expression Recognition
Yuqing Wang
Zhenyu Weng
Zhaokun Zhou
Shuaijian Ji
Zhongjie Ye
Yuesheng Zhu
57
2
0
14 Dec 2023
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
78
12
0
14 Dec 2023
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations
Yuichi Inoue
Yuki Yada
Kotaro Tanahashi
Yu Yamaguchi
71
23
0
11 Dec 2023
RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning
Jiashuo Fan
Yaoyuan Liang
Leyao Liu
Shao-Lun Huang
Lei Zhang
117
2
0
11 Dec 2023
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang
Guanhong Wang
Wenhao Chai
Jiayu Zhou
Gaoang Wang
153
6
0
08 Dec 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
Cong Yang
Zuchao Li
Lefei Zhang
72
26
0
02 Dec 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
151
0
0
28 Nov 2023
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
Christel Chappuis
Eliot Walt
Vincent Mendez
Sylvain Lobry
B. L. Saux
D. Tuia
98
4
0
28 Nov 2023
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu
Jiawang Bai
Xin Li
Zeyu Xiao
Xinchao Wang
VLM
76
12
0
28 Nov 2023
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li
D. Vo
Akihiro Sugimoto
Hideki Nakayama
KELM
VLM
102
25
0
27 Nov 2023
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
82
17
0
25 Nov 2023
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Zhen Wang
Xinyun Jiang
Jun Xiao
Tao Chen
Long Chen
DiffM
52
1
0
25 Nov 2023
A Systematic Review of Deep Learning-based Research on Radiology Report Generation
Chang Liu
Yuanhe Tian
Yan Song
MedIm
115
16
0
23 Nov 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
100
10
0
23 Nov 2023
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang
Xiaoqi Zhao
Jiaming Zuo
Lihe Zhang
Huchuan Lu
VLM
ObjD
100
6
0
19 Nov 2023
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation
Nurbanu Aksoy
Serge Sharoff
Selçuk Başer
Nishant Ravikumar
Alejandro F Frangi
MedIm
59
5
0
18 Nov 2023
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder
Abdelrahman Mohamed
Fakhraddin Alwajih
El Moatez Billah Nagoudi
Alcides Alcoba Inciarte
Muhammad Abdul-Mageed
VLM
MLLM
65
7
0
15 Nov 2023
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
84
8
0
14 Nov 2023
Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning
Jingbiao Mei
Jinghong Chen
Weizhe Lin
Bill Byrne
Marcus Tomalin
VLM
62
8
0
14 Nov 2023
Active Mining Sample Pair Semantics for Image-text Matching
Yongfeng Chen
Jin Liu
Zhijing Yang
Ruihan Chen
Junpeng Tan
VLM
50
0
0
09 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
59
6
0
09 Nov 2023
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Zhenfang Chen
Rui Sun
Wenjun Liu
Yining Hong
Chuang Gan
LRM
113
15
0
08 Nov 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
57
4
0
07 Nov 2023
Complex Organ Mask Guided Radiology Report Generation
Tiancheng Gu
Dongnan Liu
Zhiyuan Li
Weidong Cai
MedIm
75
14
0
04 Nov 2023
A New Fine-grained Alignment Method for Image-text Matching
Yang Zhang
36
1
0
03 Nov 2023
Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation
Lijian Xu
Ziyu Ni
Xinglong Liu
Xiaosong Wang
Hongsheng Li
Shaoting Zhang
MedIm
LM&MA
63
4
0
02 Nov 2023
Enhanced Knowledge Injection for Radiology Report Generation
Qingqiu Li
Jilan Xu
Runtian Yuan
Mohan Chen
Yuejie Zhang
Rui Feng
Xiaobo Zhang
Shang Gao
MedIm
83
7
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
151
44
0
01 Nov 2023
Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender
Ahmed Sabir
Lluís Padró
67
0
0
29 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan Yuille
CoGe
98
14
0
27 Oct 2023
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Mohammad Akbari
Saeed Ranjbar Alvar
Behnam Kamranian
Amin Banitalebi-Dehkordi
Yong Zhang
AI4CE
31
0
0
26 Oct 2023
Cross-modal Active Complementary Learning with Self-refining Correspondence
Yang Qin
Yuan Sun
Dezhong Peng
Qiufeng Wang
Xiaocui Peng
Peng Hu
100
21
0
26 Oct 2023
Hallucination Detection for Grounded Instruction Generation
Lingjun Zhao
Khanh Nguyen
Hal Daumé
HILM
81
7
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
105
3
0
23 Oct 2023
RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning
Wenjun Hou
Yi Cheng
Kaishuai Xu
Wenjie Li
Jiangming Liu
76
16
0
21 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
61
0
0
20 Oct 2023
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Junghyun Kim
Gi-Cheon Kang
Jaein Kim
Seoyun Yang
Minjoon Jung
Byoung-Tak Zhang
74
0
0
19 Oct 2023
EXMODD: An EXplanatory Multimodal Open-Domain Dialogue dataset
Hang Yin
Pinren Lu
Ziang Li
Bin Sun
Kan Li
96
0
0
17 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
81
7
0
17 Oct 2023