ResearchTrend.AI
SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM
arXiv: 1607.08822

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 949 papers shown
Title
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye
Xianhan Zeng
Fu Li
Chong Li
Haoqi Fan
CoGe
116
5
0
10 Mar 2025
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao
Wang Lu
Jie Ji
Ruimeng Ye
Gen Li
Xiaolong Ma
Bo Hui
OT
97
0
0
09 Mar 2025
Group Relative Policy Optimization for Image Captioning
Xu Liang
73
1
0
03 Mar 2025
Natural Language Generation from Visual Events: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
510
0
0
18 Feb 2025
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Weikang Qiu
Zheng Huang
Haoyu Hu
Aosong Feng
Yujun Yan
Rex Ying
97
0
0
18 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
344
7
0
12 Feb 2025
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
Manh Luong
Khai Nguyen
Dinh Q. Phung
Gholamreza Haffari
Zhuang Li
79
0
0
08 Feb 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
129
0
0
30 Jan 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
200
0
0
28 Jan 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
198
6
0
28 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
285
207
0
17 Jan 2025
Classifier-Guided Captioning Across Modalities
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
84
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
139
0
0
03 Jan 2025
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
155
10
0
31 Dec 2024
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
Chao Fan
Qipei Mei
Xiaonan Wang
Xinming Li
73
3
0
31 Dec 2024
Multi-Agent Planning Using Visual Language Models
Michele Brienza
F. Argenziano
Vincenzo Suriani
D. Bloisi
Daniele Nardi
LM&Ro
LLMAG
134
5
0
31 Dec 2024
From Hallucinations to Facts: Enhancing Language Models with Curated Knowledge Graphs
Ratnesh Kumar Joshi
Sagnik Sengupta
Asif Ekbal
HILM
KELM
79
0
0
24 Dec 2024
SCBench: A Sports Commentary Benchmark for Video LLMs
Kuangzhi Ge
Lawrence Yunliang Chen
Kevin Zhang
Yulin Luo
Tianyu Shi
Liaoyuan Fan
Xiang Li
Guanqun Wang
Shanghang Zhang
79
1
0
23 Dec 2024
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track
D. Gupta
Dina Demner-Fushman
LM&MA
103
1
0
15 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
168
8
0
12 Dec 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong
Chengyao Wang
Yuqi Liu
Senqiao Yang
Longxiang Tang
...
Shaozuo Yu
Sitong Wu
Eric Lo
Shu Liu
Jiaya Jia
AuLLM
162
7
0
12 Dec 2024
CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs
Abhas Kumar
Kapil Pathak
Rajesh Kavuru
Prabhakar Srinivasan
124
0
0
03 Dec 2024
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
Hao Wu
Zhihang Zhong
Xiao Sun
DiffM
106
0
0
02 Dec 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
155
2
0
25 Nov 2024
EVQAScore: A Fine-grained Metric for Video Question Answering Data Quality Evaluation
Hao Liang
Zirong Chen
Wentao Zhang
108
1
0
11 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
87
0
0
09 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
37
4
0
07 Nov 2024
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Xiaojun Jia
Sensen Gao
Qing Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang
Xiaochun Cao
AAML
82
3
0
04 Nov 2024
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models
Georgia Gabriela Sampaio
Ruixiang Zhang
Shuangfei Zhai
Jiatao Gu
J. Susskind
Navdeep Jaitly
Yizhe Zhang
DiffM
CLIP
65
1
0
02 Nov 2024
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
59
1
0
01 Nov 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
97
1
0
29 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Wenqiang Chen
Jiaxuan Cheng
Leyao Wang
Wei Zhao
Wojciech Matusik
116
2
0
26 Oct 2024
SceneGraMMi: Scene Graph-boosted Hybrid-fusion for Multi-Modal Misinformation Veracity Prediction
Swarang Joshi
Siddharth Mavani
Joel Alex
Arnav Negi
Rahul Mishra
Ponnurangam Kumaraguru
98
0
0
20 Oct 2024
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Yu Zhao
Hao Fei
Xiangtai Li
L. Qin
Jiayi Ji
Erik Cambria
Meishan Zhang
Jianguo Wei
DiffM
90
1
0
20 Oct 2024
EVA: An Embodied World Model for Future Video Anticipation
Xiaowei Chi
Hengyuan Zhang
Chun-Kai Fan
Xingqun Qi
Rongyu Zhang
...
Chi-Min Chan
Wei Xue
Wenhan Luo
Shanghang Zhang
Yike Guo
VGen
88
8
0
20 Oct 2024
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
Mithun Manivannan
Vignesh Nethrapalli
Mark Cartwright
62
1
0
15 Oct 2024
SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing
Zhiyuan Zhang
Dongdong Chen
J. Liao
DiffM
122
3
0
15 Oct 2024
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
Fan Yang
Yihao Huang
Kaidi Wang
Ling Shi
G. Pu
Yang Liu
Haoran Wang
AAML
VLM
80
2
0
15 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
134
5
0
14 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
97
7
0
12 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
118
6
0
12 Oct 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
59
1
0
11 Oct 2024
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
Hoin Jung
T. Jang
Xiaoqian Wang
VLM
56
3
0
10 Oct 2024
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
Jun Yu
Yifan Zhang
Badrinadh Aila
V. Namboodiri
106
1
0
08 Oct 2024
The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning
Xiyan Fu
Anette Frank
LRM
116
0
0
08 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
61
0
0
08 Oct 2024
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
Chunyi Li
Junxuan Zhang
Zicheng Zhang
H. Wu
Yuan Tian
...
Guo Lu
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
AAML
92
4
0
07 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Devank
Jayateja Kalla
Soma Biswas
62
2
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
200
37
0
04 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
162
11
0
03 Oct 2024