1607.08822
SPICE: Semantic Propositional Image Caption Evaluation
29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
Papers citing
"SPICE: Semantic Propositional Image Caption Evaluation"
50 / 949 papers shown
Title
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Chun-Yi Kuan
Wei-Ping Huang
Hung-yi Lee
AuLLM
68
11
0
12 Jun 2024
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh
R. Tamburo
Shen Zheng
Juan R. Alvarez-Padilla
Hailiang Zhu
Michael Cardei
Nicholas Dunn
Christoph Mertz
Srinivasa G. Narasimhan
88
1
0
11 Jun 2024
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
Renjie Pi
Jianshu Zhang
Jipeng Zhang
Boyao Wang
Zhekai Chen
Tong Zhang
3DV
82
24
0
11 Jun 2024
Zero-Shot Audio Captioning Using Soft and Hard Prompts
Yiming Zhang
Xuenan Xu
Ruoyi Du
Haohe Liu
Yuan Dong
Zheng-Hua Tan
Wenwu Wang
Zhanyu Ma
VLM
72
4
0
10 Jun 2024
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
Yebin Lee
Imseong Park
Myungjoo Kang
73
18
0
10 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
102
4
0
10 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
129
13
0
08 Jun 2024
MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description
Cong Yang
Zuchao Li
Lefei Zhang
69
2
0
07 Jun 2024
Multi-layer Learnable Attention Mask for Multimodal Tasks
Wayner Barrios
SouYoung Jin
66
1
0
04 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
60
0
0
01 Jun 2024
Artemis: Towards Referential Understanding in Complex Videos
Jihao Qiu
Yuan Zhang
Xi Tang
Lingxi Xie
Tianren Ma
Pengyu Yan
David Doermann
Qixiang Ye
Yunjie Tian
VLM
VGen
83
10
0
01 Jun 2024
Context-aware Difference Distilling for Multi-change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Zheng-Jun Zha
Chenggang Yan
Qin Huang
81
9
0
31 May 2024
Faithful Chart Summarization with ChaTS-Pi
Syrine Krichene
Francesco Piccinno
Fangyu Liu
Julian Martin Eisenschlos
113
2
0
29 May 2024
Benchmarking and Improving Detail Image Caption
Hongyuan Dong
Jiawen Li
Bohong Wu
Jiacong Wang
Yuan Zhang
Haoyuan Guo
VLM
MLLM
101
31
0
29 May 2024
MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model
Ziqi Ren
Jie Li
Xuetong Xue
Xin Li
Fan Yang
Zhicheng Jiao
Xinbo Gao
94
3
0
29 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
227
5
0
29 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
109
0
0
27 May 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker
Jan Philip Wahle
Bela Gipp
Terry Ruas
115
11
0
24 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
73
12
0
21 May 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
72
4
0
19 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
127
21
0
16 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
101
57
0
14 May 2024
The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective
Andrew Shin
Yusuke Mori
Kunitake Kaneko
VGen
EGVM
51
2
0
13 May 2024
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
Kiyoon Jeong
Woojun Lee
Woongchan Nam
Minjeong Ma
Pilsung Kang
60
2
0
02 May 2024
Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models
Yuhang Huang
Zihan Wu
Chongyang Gao
Jiawei Peng
Xu Yang
73
2
0
26 Apr 2024
Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning
Tianhui Zhang
Bei Peng
Danushka Bollegala
LRM
53
10
0
25 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
128
22
0
22 Apr 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang
Shuhuai Ren
Rundong Gao
Linli Yao
Qingyan Guo
Kaikai An
Jianhong Bai
Xu Sun
DiffM
VLM
106
9
0
16 Apr 2024
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
Kai Chen
Yanze Li
Wenhua Zhang
Yanxin Liu
Pengxiang Li
...
Xinhai Zhao
Zhenguo Li
Dit-Yan Yeung
Huchuan Lu
Xu Jia
ELM
MLLM
116
37
0
16 Apr 2024
AIGeN: An Adversarial Approach for Instruction Generation in VLN
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
GAN
79
4
0
15 Apr 2024
Bridging Vision and Language Spaces with Assignment Prediction
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
VLM
92
7
0
15 Apr 2024
UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia
Raoul de Charette
Cengiz Öztireli
Jing-Hao Xue
74
9
0
10 Apr 2024
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina
Massimiliano Mancini
Elia Cunegatti
Gaowen Liu
Giovanni Iacca
Elisa Ricci
VLM
71
2
0
08 Apr 2024
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen
Yusuke Hirota
Mayu Otani
Noa Garcia
Yuta Nakashima
77
15
0
04 Apr 2024
ALOHa: A New Measure for Hallucination in Captioning Models
Suzanne Petryk
David M. Chan
Anish Kachinthaya
Haodi Zou
John F. Canny
Joseph E. Gonzalez
Trevor Darrell
HILM
104
17
0
03 Apr 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
89
7
0
01 Apr 2024
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li
Songyang Zhang
Dahua Lin
Kai-xiang Chen
Xuming He
VLM
111
19
0
01 Apr 2024
Semantic Map-based Generation of Navigation Instructions
Chengzu Li
Chao Zhang
Simone Teufel
R. Doddipatla
Svetlana Stoyanchev
68
2
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
71
0
0
28 Mar 2024
ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds
Gijs Wijngaard
Elia Formisano
Bruno L. Giordano
M. Dumontier
91
3
0
27 Mar 2024
Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study
Atsushi Teramoto
Ayano Michiba
Yuka Kiriyama
Tetsuya Tsukamoto
K. Imaizumi
H. Fujita
MedIm
47
1
0
26 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
87
0
0
26 Mar 2024
Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People
Ricardo E Gonzalez Penuela
Jazmin Collins
Shiri Azenkot
Cynthia L. Bennett
77
26
0
22 Mar 2024
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination
Dingchen Yang
Bowen Cao
Guang Chen
Changjun Jiang
87
11
0
21 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
98
13
0
20 Mar 2024
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao
Xiaojun Jia
Xuhong Ren
Ivor Tsang
Qing Guo
AAML
86
19
0
19 Mar 2024
TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling
Weiran Chen
Xin Li
Jiaqi Su
Guiqian Zhu
Ying Li
Yi Ji
Chunping Liu
60
1
0
18 Mar 2024
Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction
Jiyuan Fu
Zhaoyu Chen
Kaixun Jiang
Haijing Guo
Jiafeng Wang
Shuyong Gao
Wenqiang Zhang
VLM
AAML
81
4
0
16 Mar 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Hongsheng Li
Bernt Schiele
Liwei Wang
VLM
99
13
0
14 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
86
10
0
12 Mar 2024