ResearchTrend.AI
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
72
4
0
04 Jan 2024
Detection-based Intermediate Supervision for Visual Question Answering
Yuhang Liu
Daowan Peng
Wei Wei
Yuanyuan Fu
Wenfeng Xie
Dangyang Chen
57
2
0
26 Dec 2023
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li
Mingxu Zhang
Yiran Geng
Haoran Geng
Yuxing Long
Yan Shen
Renrui Zhang
Jiaming Liu
Hao Dong
LM&Ro
LRM
119
99
0
24 Dec 2023
Semantic Draw Engineering for Text-to-Image Creation
Yang Li
Huaqiang Jiang
Yangkai Wu
51
1
0
23 Dec 2023
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
93
8
0
23 Dec 2023
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang
Jiaming Liu
Ray Zhang
Mingjie Pan
Zoey Guo
Xiaoqi Li
Zehui Chen
Peng Gao
Yandong Guo
Shanghang Zhang
3DV
108
71
0
21 Dec 2023
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
105
8
0
21 Dec 2023
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai
Shengli Song
Shiqi Meng
Jingyang Li
Sitong Yan
Guangneng Hu
57
5
0
21 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
64
1
0
21 Dec 2023
Object Attribute Matters in Visual Question Answering
Peize Li
Q. Si
Peng Fu
Zheng Lin
Yan Wang
78
0
0
20 Dec 2023
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Jian Tang
56
0
0
20 Dec 2023
Dual Branch Network Towards Accurate Printed Mathematical Expression Recognition
Yuqing Wang
Zhenyu Weng
Zhaokun Zhou
Shuaijian Ji
Zhongjie Ye
Yuesheng Zhu
57
2
0
14 Dec 2023
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
78
12
0
14 Dec 2023
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations
Yuichi Inoue
Yuki Yada
Kotaro Tanahashi
Yu Yamaguchi
71
23
0
11 Dec 2023
RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning
Jiashuo Fan
Yaoyuan Liang
Leyao Liu
Shao-Lun Huang
Lei Zhang
117
2
0
11 Dec 2023
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang
Guanhong Wang
Wenhao Chai
Jiayu Zhou
Gaoang Wang
153
6
0
08 Dec 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
Cong Yang
Zuchao Li
Lefei Zhang
72
26
0
02 Dec 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
151
0
0
28 Nov 2023
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
Christel Chappuis
Eliot Walt
Vincent Mendez
Sylvain Lobry
B. L. Saux
D. Tuia
98
4
0
28 Nov 2023
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu
Jiawang Bai
Xin Li
Zeyu Xiao
Xinchao Wang
VLM
76
12
0
28 Nov 2023
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li
D. Vo
Akihiro Sugimoto
Hideki Nakayama
KELM
VLM
102
25
0
27 Nov 2023
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
82
17
0
25 Nov 2023
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Zhen Wang
Xinyun Jiang
Jun Xiao
Tao Chen
Long Chen
DiffM
52
1
0
25 Nov 2023
A Systematic Review of Deep Learning-based Research on Radiology Report Generation
Chang Liu
Yuanhe Tian
Yan Song
MedIm
115
16
0
23 Nov 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
100
10
0
23 Nov 2023
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang
Xiaoqi Zhao
Jiaming Zuo
Lihe Zhang
Huchuan Lu
VLM
ObjD
100
6
0
19 Nov 2023
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation
Nurbanu Aksoy
Serge Sharoff
Selçuk Başer
Nishant Ravikumar
Alejandro F Frangi
MedIm
59
5
0
18 Nov 2023
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder
Abdelrahman Mohamed
Fakhraddin Alwajih
El Moatez Billah Nagoudi
Alcides Alcoba Inciarte
Muhammad Abdul-Mageed
VLM
MLLM
65
7
0
15 Nov 2023
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
84
8
0
14 Nov 2023
Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning
Jingbiao Mei
Jinghong Chen
Weizhe Lin
Bill Byrne
Marcus Tomalin
VLM
62
8
0
14 Nov 2023
Active Mining Sample Pair Semantics for Image-text Matching
Yongfeng Chen
Jin Liu
Zhijing Yang
Ruihan Chen
Junpeng Tan
VLM
50
0
0
09 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
59
6
0
09 Nov 2023
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Zhenfang Chen
Rui Sun
Wenjun Liu
Yining Hong
Chuang Gan
LRM
113
15
0
08 Nov 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
57
4
0
07 Nov 2023
Complex Organ Mask Guided Radiology Report Generation
Tiancheng Gu
Dongnan Liu
Zhiyuan Li
Weidong Cai
MedIm
75
14
0
04 Nov 2023
A New Fine-grained Alignment Method for Image-text Matching
Yang Zhang
36
1
0
03 Nov 2023
Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation
Lijian Xu
Ziyu Ni
Xinglong Liu
Xiaosong Wang
Hongsheng Li
Shaoting Zhang
MedIm
LM&MA
63
4
0
02 Nov 2023
Enhanced Knowledge Injection for Radiology Report Generation
Qingqiu Li
Jilan Xu
Runtian Yuan
Mohan Chen
Yuejie Zhang
Rui Feng
Xiaobo Zhang
Shang Gao
MedIm
83
7
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
151
44
0
01 Nov 2023
Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender
Ahmed Sabir
Lluís Padró
67
0
0
29 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan Yuille
CoGe
98
14
0
27 Oct 2023
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Mohammad Akbari
Saeed Ranjbar Alvar
Behnam Kamranian
Amin Banitalebi-Dehkordi
Yong Zhang
AI4CE
31
0
0
26 Oct 2023
Cross-modal Active Complementary Learning with Self-refining Correspondence
Yang Qin
Yuan Sun
Dezhong Peng
Qiufeng Wang
Xiaocui Peng
Peng Hu
100
21
0
26 Oct 2023
Hallucination Detection for Grounded Instruction Generation
Lingjun Zhao
Khanh Nguyen
Hal Daumé
HILM
81
7
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
105
3
0
23 Oct 2023
RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning
Wenjun Hou
Yi Cheng
Kaishuai Xu
Wenjie Li
Jiangming Liu
76
16
0
21 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
61
0
0
20 Oct 2023
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Junghyun Kim
Gi-Cheon Kang
Jaein Kim
Seoyun Yang
Minjoon Jung
Byoung-Tak Zhang
74
0
0
19 Oct 2023
EXMODD: An EXplanatory Multimodal Open-Domain Dialogue dataset
Hang Yin
Pinren Lu
Ziang Li
Bin Sun
Kan Li
96
0
0
17 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
81
7
0
17 Oct 2023