1809.07041
Cited By
Exploring Visual Relationship for Image Captioning
19 September 2018
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
Papers citing
"Exploring Visual Relationship for Image Captioning"
50 / 321 papers shown
Title
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
27
0
0
03 Jun 2025
Locality-Aware Zero-Shot Human-Object Interaction Detection
Sanghyun Kim
Deunsol Jung
Minsu Cho
VLM
183
0
0
26 May 2025
LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study
Dongil Yang
Minjin Kim
Sunghwan Kim
Beong-woo Kwak
Minjun Park
Jinseok Hong
Woontack Woo
Jinyoung Yeo
58
0
0
26 May 2025
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Lakshita Agarwal
Bindu Verma
ViT
129
0
0
23 Apr 2025
PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks
Abdelrahman Elskhawy
Mengze Li
Nassir Navab
Benjamin Busam
VLM
95
1
0
01 Apr 2025
An Image-like Diffusion Method for Human-Object Interaction Detection
Xiaofei Hui
Haoxuan Qu
Hossein Rahmani
Jun Liu
DiffM
126
0
0
23 Mar 2025
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li
Tanzhe Li
Xiaobin Hu
Donghao Luo
Taisong Jin
93
1
0
19 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
108
0
0
11 Mar 2025
Predicate Hierarchies Improve Few-Shot State Classification
Emily Jin
Joy Hsu
Jiajun Wu
OffRL
146
0
0
18 Feb 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
139
0
0
03 Jan 2025
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM
VLM
148
0
0
27 Nov 2024
EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection
Qinqian Lei
Bo Wang
Robby T. Tan
VLM
94
5
0
31 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Devank
Jayateja Kalla
Soma Biswas
67
2
0
06 Oct 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
86
1
0
28 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
95
3
0
26 Aug 2024
Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models
Kening Zheng
Junkai Chen
Yibo Yan
Xin Zou
Xuming Hu
229
7
0
18 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
71
0
0
09 Aug 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma
Zhibin Wang
Xiaoshuai Sun
Weihuang Lin
Qiang-feng Zhou
Jiayi Ji
Rongrong Ji
MLLM
VLM
105
2
0
23 Jul 2024
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens
Xikang Yang
Xuehai Tang
Fuqing Zhu
Jizhong Han
Songlin Hu
VLM
AAML
69
1
0
19 Jun 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
77
6
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
72
0
0
01 Jun 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
75
12
0
21 May 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
89
0
0
26 Mar 2024
An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models
Haochen Luo
Jindong Gu
Fengyuan Liu
Philip Torr
VLM
VPVLM
AAML
84
24
0
14 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
88
10
0
12 Mar 2024
Scene Graph Aided Radiology Report Generation
Jun Wang
Lixing Zhu
A. Bhalerao
Yulan He
MedIm
78
1
0
08 Mar 2024
Rule-driven News Captioning
Ning Xu
Tingting Zhang
Hongshuo Tian
An-An Liu
95
0
0
08 Mar 2024
Video Relationship Detection Using Mixture of Experts
A. Shaabana
Zahra Gharaee
Paul Fieguth
69
1
0
06 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
86
15
0
06 Mar 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
72
4
0
04 Jan 2024
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
79
8
0
14 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
59
6
0
09 Nov 2023
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
54
4
0
25 Oct 2023
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou
Dan Guo
Jia Li
Xun Yang
Ming Wang
88
14
0
13 Oct 2023
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation
Rashid Khan
Bingding Huang
Haseeb Hassan
Asim Zaman
Z. Ye
39
2
0
11 Oct 2023
Predicate Classification Using Optimal Transport Loss in Scene Graph Generation
Sorachi Kurita
Satoshi Oyama
Itsuki Noda
OT
64
0
0
19 Sep 2023
RepSGG: Novel Representations of Entities and Relationships for Scene Graph Generation
Hengyue Liu
B. Bhanu
106
3
0
06 Sep 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
90
20
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
65
13
0
23 Aug 2023
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
Hangjie Yuan
Shiwei Zhang
Xiang Wang
Samuel Albanie
Yining Pan
Tao Feng
Jianwen Jiang
Dong Ni
Yingya Zhang
Deli Zhao
VLM
79
40
0
18 Aug 2023
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Bin Liu
Jing Wang
80
1
0
13 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
66
2
0
05 Aug 2023
Balanced Classification: A Unified Framework for Long-Tailed Object Detection
Tianhao Qi
Hongtao Xie
P. Li
Jiannan Ge
Yongdong Zhang
139
13
0
04 Aug 2023
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering
Xinyue Hu
Lin Gu
Qi A. An
Mengliang Zhang
Liangchen Liu
Kazuma Kobayashi
Tatsuya Harada
Ronald M. Summers
Yingying Zhu
MedIm
78
31
0
22 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
53
3
0
19 Jul 2023
Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection
Yingxuan Li
Kiyoharu Aizawa
Yusuke Matsui
58
14
0
30 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
72
9
0
25 Jun 2023
Single-Stage Visual Relationship Learning using Conditional Queries
Alakh Desai
Tz-Ying Wu
Subarna Tripathi
Nuno Vasconcelos
86
7
0
09 Jun 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
100
5
0
23 May 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
73
7
0
20 May 2023