arXiv:1607.08822
SPICE: Semantic Propositional Image Caption Evaluation
29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"
50 / 949 papers shown
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
77
35
0
29 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Ouguz
Yasher Mehdad
Joey Tianyi Zhou
3DV
87
54
0
29 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
81
60
0
21 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
79
239
0
21 Mar 2023
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
Yangqiaoyu Zhou
Kai-Lang Yao
Wusuo Li
MedIm
39
1
0
17 Mar 2023
Lana: A Language-Capable Navigator for Instruction Following and Generation
Xiaohan Wang
Wenguan Wang
Jiayi Shao
Yi Yang
LLMAG
LM&Ro
98
41
0
15 Mar 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Yongil Kim
Yerin Hwang
Hyeongu Yun
Seunghyun Yoon
Trung Bui
Kyomin Jung
56
6
0
15 Mar 2023
FactReranker: Fact-guided Reranker for Faithful Radiology Report Summarization
Qianqian Xie
Jiayu Zhou
Yifan Peng
Fei Wang
HILM
MedIm
103
11
0
15 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David Clifton
88
9
0
11 Mar 2023
Learning Combinatorial Prompts for Universal Controllable Image Captioning
Zhen Wang
Jun Xiao
Yueting Zhuang
Fei Gao
Jian Shao
Long Chen
106
5
0
11 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
52
1
0
09 Mar 2023
Interpretable Visual Question Answering Referring to Outside Knowledge
He Zhu
Ren Togo
Takahiro Ogawa
Miki Haseyama
56
0
0
08 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
78
21
0
07 Mar 2023
Neighborhood Contrastive Transformer for Change Captioning
Yunbin Tu
Liang Li
Li Su
Kelvin Lu
Qin Huang
ViT
81
17
0
06 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
107
89
0
06 Mar 2023
Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning
Pranav Dandwate
Chaitanya Shahane
V. Jagtap
Shridevi C. Karande
96
9
0
05 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL
DiffM
59
33
0
04 Mar 2023
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLM
LRM
MLLM
135
566
0
27 Feb 2023
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSL
VLM
110
28
0
23 Feb 2023
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
Yi Zhou
Juntao Ren
Fengyu Li
Ramin Zabih
Ser-Nam Lim
VLM
96
15
0
22 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
80
29
0
16 Feb 2023
Towards Local Visual Modeling for Image Captioning
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Rongrong Ji
ViT
92
78
0
13 Feb 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Mozhgan Pourkeshavarz
Shahabedin Nabavi
Mohsen Moghaddam
M. Shamsfard
79
4
0
08 Feb 2023
KENGIC: KEyword-driven and N-Gram Graph based Image Captioning
Brandon Birmingham
A. Muscat
49
1
0
07 Feb 2023
DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
Dongsheng Xu
Qingbao Huang
Shuang Feng
Yiru Cai
Feng Shuang
Yi Cai
ViT
VLM
93
1
0
03 Feb 2023
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Yucheng Zhou
Guodong Long
61
23
0
26 Jan 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL
VLM
38
4
0
26 Jan 2023
Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering
Chenxi Whitehouse
Tillman Weyde
Pranava Madhyastha
LRM
91
3
0
25 Jan 2023
Visual Semantic Relatedness Dataset for Image Captioning
Ahmed Sabir
Francesc Moreno-Noguer
Lluís Padró
CoGe
VLM
56
3
0
20 Jan 2023
Embodied Agents for Efficient Exploration and Smart Scene Description
Roberto Bigazzi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
66
7
0
17 Jan 2023
Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review
Reza Azad
Amirhossein Kazerouni
Moein Heidari
Ehsan Khodapanah Aghdam
Amir Molaei
Yiwei Jia
Abin Jose
Rijo Roy
Dorit Merhof
MedIm
ViT
111
186
0
09 Jan 2023
Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang
Xu Yang
Hanwang Zhang
Haiyang Xu
Mingshi Yan
Feisi Huang
Yu Zhang
VLM
163
0
0
05 Jan 2023
Do DALL-E and Flamingo Understand Each Other?
Hang Li
Jindong Gu
Rajat Koner
Sahand Sharifzadeh
Volker Tresp
MLLM
75
12
0
23 Dec 2022
Benchmarking Spatial Relationships in Text-to-Image Generation
Tejas Gokhale
Hamid Palangi
Besmira Nushi
Vibhav Vineet
Eric Horvitz
Ece Kamar
Chitta Baral
Yezhou Yang
EGVM
114
72
0
20 Dec 2022
MetaCLUE: Towards Comprehensive Visual Metaphors Research
Arjun Reddy Akula
Brenda S. Driscoll
P. Narayana
Soravit Changpinyo
Zhi-xuan Jia
...
Sugato Basu
Leonidas Guibas
William T. Freeman
Yuanzhen Li
Varun Jampani
CLIP
VLM
46
26
0
19 Dec 2022
Efficient Image Captioning for Edge Devices
Ning Wang
Jiangrong Xie
Hangzai Luo
Qinglin Cheng
Jihao Wu
Mingbo Jia
Linlin Li
VLM
CLIP
79
22
0
18 Dec 2022
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Björn Plüster
Jakob Ambsdorf
Lukas Braach
Jae Hee Lee
S. Wermter
60
6
0
08 Dec 2022
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
51
9
0
06 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
84
73
0
06 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
75
2
0
05 Dec 2022
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
54
24
0
04 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
69
12
0
30 Nov 2022
CLID: Controlled-Length Image Descriptions with Limited Data
Elad Hirsch
A. Tal
VLM
3DV
51
4
0
27 Nov 2022
Aesthetically Relevant Image Captioning
Zhipeng Zhong
Fei Zhou
Guoping Qiu
62
9
0
25 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
93
25
0
22 Nov 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
92
23
0
21 Nov 2022
VER: Unifying Verbalizing Entities and Relations
Jie Huang
Kevin Chen-Chuan Chang
78
1
0
20 Nov 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
153
15
0
19 Nov 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
52
1
0
18 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
64
26
0
17 Nov 2022