Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1607.08822
Cited By
SPICE: Semantic Propositional Image Caption Evaluation
29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"SPICE: Semantic Propositional Image Caption Evaluation"
50 / 949 papers shown
Title
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
97
62
0
31 Oct 2023
Video-Helpful Multimodal Machine Translation
Yihang Li
Shuichiro Shimizu
Chenhui Chu
Sadao Kurohashi
Wei Li
67
2
0
31 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
63
3
0
30 Oct 2023
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Lixing Zhu
Runcong Zhao
Lin Gui
Yulan He
76
5
0
28 Oct 2023
An Early Evaluation of GPT-4V(ision)
Yang Wu
Shilong Wang
Hao Yang
Tian Zheng
Hongbo Zhang
Yanyan Zhao
Bing Qin
MLLM
ELM
53
39
0
25 Oct 2023
Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models
Xiang Chen
Xiaojun Wan
53
0
0
25 Oct 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
93
12
0
24 Oct 2023
CLAIR: Evaluating Image Captions with Large Language Models
David M. Chan
Suzanne Petryk
Joseph E. Gonzalez
Trevor Darrell
John F. Canny
94
20
0
19 Oct 2023
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision
Junaid Ali
Matthäus Kleindessner
F. Wenzel
Kailash Budhathoki
Volkan Cevher
Chris Russell
VLM
109
12
0
18 Oct 2023
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
Zheng Ma
Changxin Wang
Bo Huang
Zi-Yue Zhu
Jianbing Zhang
55
1
0
15 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Yiyang Zhou
Chenhang Cui
Jaehong Yoon
Linjun Zhang
Zhun Deng
Chelsea Finn
Mohit Bansal
Huaxiu Yao
MLLM
164
186
0
01 Oct 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning
Yunbin Tu
Liang Li
Filippos Christianos
Zheng-Jun Zha
Zhibin Li
Qingming Huang
SSL
73
26
0
28 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
54
0
0
27 Sep 2023
MindGPT: Interpreting What You See with Non-invasive Brain Recordings
Jiaxuan Chen
Yu Qi
Yueming Wang
Gang Pan
85
6
0
27 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
77
7
0
21 Sep 2023
ContextRef: Evaluating Referenceless Metrics For Image Description Generation
Elisa Kreiss
E. Zelikman
Christopher Potts
Nick Haber
73
5
0
21 Sep 2023
Toward Unified Controllable Text Generation via Regular Expression Instruction
Xin Zheng
Hongyu Lin
Xianpei Han
Le Sun
80
5
0
19 Sep 2023
Predicate Classification Using Optimal Transport Loss in Scene Graph Generation
Sorachi Kurita
Satoshi Oyama
Itsuki Noda
OT
64
0
0
19 Sep 2023
Synth-AC: Enhancing Audio Captioning with Synthetic Supervision
Feiyang Xiao
Qiaoxi Zhu
Jian Guan
Xubo Liu
Haohe Liu
Kejia Zhang
Wenwu Wang
52
2
0
18 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
40
2
0
15 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
83
6
0
15 Sep 2023
Learning to Predict Concept Ordering for Common Sense Generation
Tianhui Zhang
Danushka Bollegala
Bei Peng
LRM
46
3
0
12 Sep 2023
Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning
Guisheng Liu
Yi Li
Zhengcong Fei
Haiyan Fu
Xiangyang Luo
Yanqing Guo
VLM
DiffM
81
8
0
10 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
60
7
0
05 Sep 2023
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
123
6
0
05 Sep 2023
DeViL: Decoding Vision features into Language
Meghal Dani
Isabel Rio-Torto
Stephan Alaniz
Zeynep Akata
VLM
75
8
0
04 Sep 2023
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Etienne Labbé
Thomas Pellegrini
J. Pinquier
74
14
0
01 Sep 2023
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
Joshua Forster Feinglass
Yezhou Yang
46
2
0
01 Sep 2023
Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?
Etienne Labbé
Thomas Pellegrini
J. Pinquier
66
5
0
29 Aug 2023
Explaining Vision and Language through Graphs of Events in Space and Time
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
VLM
92
2
0
29 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
78
13
0
25 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
80
20
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
65
13
0
23 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
73
7
0
23 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
71
2
0
21 Aug 2023
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
Fawaz Sammani
Nikos Deligiannis
39
5
0
17 Aug 2023
Informative Scene Graph Generation via Debiasing
Lianli Gao
Xinyu Lyu
Yuyu Guo
Yuxuan Hu
Yuanyou Li
Lu Xu
Hengtao Shen
Jingkuan Song
63
5
0
10 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
130
88
0
03 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
76
36
0
31 Jul 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
59
0
0
27 Jul 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
118
76
0
26 Jul 2023
Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation
Haitian Zeng
Xiaohan Wang
Wenguan Wang
Yi Yang
78
7
0
25 Jul 2023
Improving Multimodal Datasets with Image Captioning
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
99
74
0
19 Jul 2023
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments
R. Liu
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ke Cao
Yufan Chen
Kailun Yang
Rainer Stiefelhagen
64
17
0
15 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
170
0
0
10 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
96
36
0
30 Jun 2023
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Haoqin Tu
Bowen Yang
Xianfeng Zhao
61
6
0
29 Jun 2023
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM
SSL
73
2
0
26 Jun 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM
MLLM
153
287
0
26 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
67
9
0
25 Jun 2023
Previous
1
2
3
...
5
6
7
...
17
18
19
Next