ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 949 papers shown
HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding
Fan Yuan
Chi Qin
Xiaogang Xu
Piji Li
VLM MLLM
71
5
0
30 Sep 2024
Decoding the Echoes of Vision from fMRI: Memory Disentangling for Past Semantic Information
Runze Xia
Congchi Yin
Piji Li
72
1
0
30 Sep 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Joshua Forster Feinglass
Yezhou Yang
58
0
0
30 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
52
1
0
28 Sep 2024
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Yanyuan Qiao
Wenqi Lyu
Hui Wang
Zixu Wang
Zerui Li
Yuan Zhang
Mingkui Tan
Qi Wu
LRM
96
6
0
27 Sep 2024
Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review
Emma Croxford
Yanjun Gao
Nicholas Pellegrino
Karen K. Wong
Graham Wills
Elliot First
Frank J. Liao
Cherodeep Goswami
Brian Patterson
Majid Afshar
HILM ELM LM&MA
124
1
0
26 Sep 2024
Inferring Alt-text For UI Icons With Large Language Models During App Development
Sabrina Haque
Christoph Csallner
VLM
65
0
0
26 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP VLM
54
0
0
26 Sep 2024
Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data
Kota Dohi
Aoi Ito
Harsh Purohit
Tomoya Nishida
Takashi Endo
Yohei Kawaguchi
50
3
0
25 Sep 2024
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions
Tsung-Han Wu
Joseph E. Gonzalez
Trevor Darrell
David M. Chan
127
2
0
19 Sep 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
Yanbei Jiang
Krista A. Ehinger
Jey Han Lau
SLR
68
1
0
17 Sep 2024
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
Bingchen Liu
Ehsan Akhgari
Alexander Visheratin
Aleks Kamko
Linmiao Xu
Shivam Shrirao
Joao Souza
Suhail Doshi
Daiqing Li
DiffM MLLM
105
60
0
16 Sep 2024
Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving
Yunsheng Ma
Amr Abdelraouf
Rohit Gupta
Ziran Wang
Kyungtae Han
104
3
0
16 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
91
2
0
14 Sep 2024
Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
Muraleekrishna Gopinathan
Martin Masek
Jumana Abu-Khalaf
David Suter
LM&Ro
72
2
0
09 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
456
1
0
04 Sep 2024
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
Su Hyeon Lim
Minkuk Kim
Hyeon Bae Kim
Seong Tae Kim
ReLM LRM
71
0
0
30 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV VLM
81
1
0
28 Aug 2024
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models
Fanglong Yao
Yuanchang Yue
Youzhi Liu
Xian Sun
Kun Fu
VGen EgoV
64
8
0
28 Aug 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
130
7
0
26 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
90
3
0
26 Aug 2024
One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs
Jianren Wang
Kangni Liu
Dingkun Guo
Xian Zhou
Christopher G Atkeson
64
0
0
22 Aug 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
Qingyuan Zeng
Zhenzhong Wang
Yiu-ming Cheung
Min Jiang
AAML
78
2
0
16 Aug 2024
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
Jun-Hyung Park
Hyuntae Park
Youjin Kang
Eojin Jeon
SangKeun Lee
54
0
0
15 Aug 2024
IIU: Independent Inference Units for Knowledge-based Visual Question Answering
Yili Li
Jing Yu
Keke Gai
Gang Xiong
51
0
0
15 Aug 2024
Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning
Yingjin Song
Denis Paperno
Albert Gatt
61
0
0
12 Aug 2024
Hyperbolic Learning with Multimodal Large Language Models
Paolo Mandica
Luca Franco
Konstantinos Kallidromitis
Suzanne Petryk
Fabio Galasso
82
3
0
09 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
64
0
0
09 Aug 2024
UNMuTe: Unifying Navigation and Multimodal Dialogue-like Text Generation
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
94
1
0
08 Aug 2024
A Novel Evaluation Framework for Image2Text Generation
Jia-Hong Huang
Hongyi Zhu
Yixian Shen
Stevan Rudinac
A. M. Pacces
Evangelos Kanoulas
70
9
0
03 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
100
6
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
156
13
0
31 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
68
7
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
97
3
0
26 Jul 2024
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong
Yuki Kawana
Rajat Saini
Ashutosh Kumar
Jingjing Pan
...
Yohei Ozao
Balázs Opra
D. Anastasiu
Yoichi Sato
Norimasa Kobori
VGen
54
10
0
22 Jul 2024
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan
Rui Liu
Wenguan Wang
Yi Yang
89
9
0
21 Jul 2024
ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map
Yilin Ye
Shishi Xiao
Xingchen Zeng
Wei Zeng
109
5
0
17 Jul 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
99
6
0
16 Jul 2024
Controllable Navigation Instruction Generation with Chain of Thought Prompting
Xianghao Kong
Jinyu Chen
Wenguan Wang
Hang Su
Xiaolin Hu
Yi Yang
Si Liu
LRM
94
9
0
10 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
112
8
0
08 Jul 2024
Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention
Rishi Mohan
Sanjay Sureshkumar
Vignesh Sivasubramaniam
43
2
0
28 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
64
3
0
26 Jun 2024
A Refer-and-Ground Multimodal Large Language Model for Biomedicine
Xiaoshuang Huang
Haifeng Huang
Lingdong Shen
Yehui Yang
Fangxin Shang
Junwei Liu
Jia Liu
LM&MA
135
7
0
26 Jun 2024
RaTEScore: A Metric for Radiology Report Generation
W. Zhao
Chaoyi Wu
Xiechi Zhang
Ya Zhang
Yanfeng Wang
Weidi Xie
102
12
0
24 Jun 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
63
1
0
24 Jun 2024
LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models
Mengdan Zhu
Raasikh Kanjiani
Jiahui Lu
Andrew Choi
Qirui Ye
Liang Zhao
DiffM
90
1
0
21 Jun 2024
Adaptable Logical Control for Large Language Models
Honghua Zhang
Po-Nien Kung
Masahiro Yoshida
Guy Van den Broeck
Nanyun Peng
71
10
0
19 Jun 2024
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Jizhong Liu
Gang Li
Junbo Zhang
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Yujun Wang
Bin Wang
AuLLM
133
5
0
19 Jun 2024
RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding
Linrui Xu
Ling Zhao
Wang Guo
Qiujun Li
Kewang Long
Kaiqi Zou
Yuhan Wang
Haifeng Li
AI4TS
77
7
0
18 Jun 2024
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA AI4MH ELM
138
8
0
14 Jun 2024