ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1607.08822
  4. Cited By
SPICE: Semantic Propositional Image Caption Evaluation

SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM
ArXiv (abs)PDFHTML

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 949 papers shown
Title
MeaCap: Memory-Augmented Zero-shot Image Captioning
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
86
15
0
06 Mar 2024
Neural Image Compression with Text-guided Encoding for both Pixel-level
  and Perceptual Fidelity
Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity
Hagyeong Lee
Minkyu Kim
Jun-Hyuk Kim
Seungeon Kim
Dokwan Oh
Jaeho Lee
DiffM
90
6
0
05 Mar 2024
DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
Chen Xu
Tian Lan
Changlong Yu
Wei Wang
Jun Gao
...
Qunxi Dong
Kun Qian
Piji Li
Wei Bi
Bin Hu
82
1
0
04 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image
  Captioning
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
89
30
0
28 Feb 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual
  Context Extraction
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Koki Maeda
Shuhei Kurita
Taiki Miyanishi
Naoaki Okazaki
56
2
0
28 Feb 2024
EDTC: enhance depth of text comprehension in automated audio captioning
EDTC: enhance depth of text comprehension in automated audio captioning
Liwen Tan
Yin Cao
Yi Zhou
70
0
0
27 Feb 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLMVLM
133
47
0
26 Feb 2024
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D
  Talking Face Generation
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
Yasheng Sun
Wenqing Chu
Hang Zhou
Kaisiyuan Wang
Hideki Koike
79
4
0
25 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
102
7
0
25 Feb 2024
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP
  Guided Reinforcement Learning
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning
Antoine Chaffin
Ewa Kijak
Vincent Claveau
74
0
0
21 Feb 2024
MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning
MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning
Wanqing Cui
Keping Bi
Jiafeng Guo
Xueqi Cheng
SyDaReLMRALMLRM
97
9
0
21 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
85
3
0
19 Feb 2024
Cobra Effect in Reference-Free Image Captioning Metrics
Cobra Effect in Reference-Free Image Captioning Metrics
Zheng Ma
Changxin Wang
Yawen Ouyang
Fei Zhao
Jianbing Zhang
Shujian Huang
Jiajun Chen
80
2
0
18 Feb 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
105
16
0
15 Feb 2024
A Systematic Review of Data-to-Text NLG
A Systematic Review of Data-to-Text NLG
Chinonso Osuji
Thiago Castro Ferreira
Brian Davis
58
2
0
13 Feb 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and
  Instruction Tuning
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Hang Zhao
Yifei Xin
Zhesong Yu
Bilei Zhu
Lu Lu
Zejun Ma
AuLLM
93
4
0
12 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting
  Classification datasets and their semantic hierarchy
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
Simon Ging
M. A. Bravo
Thomas Brox
VLM
148
12
0
11 Feb 2024
CIC: A Framework for Culturally-Aware Image Captioning
CIC: A Framework for Culturally-Aware Image Captioning
Youngsik Yun
Jihie Kim
VLM
130
6
0
08 Feb 2024
Multimodal Rationales for Explainable Visual Question Answering
Multimodal Rationales for Explainable Visual Question Answering
Kun Li
G. Vosselman
Michael Ying Yang
130
2
0
06 Feb 2024
SymbolicAI: A framework for logic-based approaches combining generative
  models and solvers
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Marius-Constantin Dinu
Claudiu Leoveanu-Condrei
Markus Holzleitner
Werner Zellinger
Sepp Hochreiter
82
11
0
01 Feb 2024
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual
  Storytelling
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
Eileen Wang
S. Han
Josiah Poon
52
5
0
01 Feb 2024
Common Sense Reasoning for Deepfake Detection
Common Sense Reasoning for Deepfake Detection
Yue Zhang
Ben Colman
Xiao Guo
Ali Shahriyari
Gaurav Bharaj
143
35
0
31 Jan 2024
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for
  Automated Audio Captioning
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
CLIPVLM
72
25
0
31 Jan 2024
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth
  observation data
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Chenhui Zhang
Sherrie Wang
101
25
0
31 Jan 2024
A Survey on Data Augmentation in Large Model Era
A Survey on Data Augmentation in Large Model Era
Yue Zhou
Chenlu Guo
Xu Wang
Yi-Ju Chang
Yuan Wu
LM&MAVLM
128
27
0
27 Jan 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal
  Data
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
Yuhui Zhang
Elaine Sui
Serena Yeung-Levy
85
10
0
16 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via
  Text-Only Training
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
70
4
0
04 Jan 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
213
100
0
29 Dec 2023
Towards Consistent Language Models Using Declarative Constraints
Towards Consistent Language Models Using Declarative Constraints
Jasmin Mousavi
Arash Termehchy
HILMALM
62
2
0
24 Dec 2023
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image
  Captioning
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIPVLM
78
12
0
14 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
68
15
0
13 Dec 2023
OT-Attack: Enhancing Adversarial Transferability of Vision-Language
  Models via Optimal Transport Optimization
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Dongchen Han
Xiaojun Jia
Yang Bai
Jindong Gu
Yang Liu
Xiaochun Cao
VLM
86
26
0
07 Dec 2023
Towards Knowledge-driven Autonomous Driving
Towards Knowledge-driven Autonomous Driving
Xin Li
Yeqi Bai
Pinlong Cai
Licheng Wen
Daocheng Fu
...
Yikang Li
Botian Shi
Yong-Jin Liu
Liang He
Yu Qiao
113
28
0
07 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLMVLM
115
6
0
06 Dec 2023
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon
Yonatan Bitton
Yonatan Shafir
Roopal Garg
Xi Chen
Dani Lischinski
Daniel Cohen-Or
Idan Szpektor
82
12
0
05 Dec 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image
  Captioning
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
Cong Yang
Zuchao Li
Lefei Zhang
62
25
0
02 Dec 2023
Segment and Caption Anything
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLMVLM
87
21
0
01 Dec 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context
  Learning
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
99
32
0
29 Nov 2023
StyleCap: Automatic Speaking-Style Captioning from Speech Based on
  Speech and Language Self-supervised Learning Models
StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models
Kazuki Yamauchi
Yusuke Ijima
Yuki Saito
66
9
0
28 Nov 2023
DECap: Towards Generalized Explicit Caption Editing via Diffusion
  Mechanism
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Zhen Wang
Xinyun Jiang
Jun Xiao
Tao Chen
Long Chen
DiffM
45
1
0
25 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language
  Explanation
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
94
4
0
21 Nov 2023
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal
  Large Language Models
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Xiaotian Han
Quanzeng You
Yongfei Liu
Wentao Chen
Huangjie Zheng
...
Yiqi Wang
Bohan Zhai
Jianbo Yuan
Heng Wang
Hongxia Yang
ReLMLRMELM
157
10
0
20 Nov 2023
Trustworthy Large Models in Vision: A Survey
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
126
0
0
16 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio
  context keywords
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
49
11
0
14 Nov 2023
Improving Image Captioning via Predicting Structured Concepts
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
79
8
0
14 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified
  Large-Scale Audio-Language Models
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
139
352
0
14 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural
  Language
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
63
2
0
08 Nov 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures
  for Image Captioning Models
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
55
4
0
07 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task
  Completion
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
72
7
0
07 Nov 2023
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Zhenjie Yang
Xiaosong Jia
Hongyang Li
Junchi Yan
ELM
134
120
0
02 Nov 2023
Previous
123456...171819
Next