ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.08718
  4. Cited By
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
v1v2v3 (latest)

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
    CLIP
ArXiv (abs)PDFHTML

Papers citing "CLIPScore: A Reference-free Evaluation Metric for Image Captioning"

50 / 156 papers shown
Title
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi
Davide Bucciarelli
Federico Betti
Marcella Cornia
Lorenzo Baraldi
N. Sebe
Rita Cucchiara
207
0
0
26 May 2025
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
C. Wang
Xiaoran Pan
Zihao Pan
Haofan Wang
Yiren Song
LRM
99
0
0
24 May 2025
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song
Cheng Liu
Mike Zheng Shou
DiffM
173
2
0
24 May 2025
Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation
Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation
Wenchao Zhang
Jiahe Tian
Runze He
Jizhong Han
Jiao Dai
Miaomiao Feng
Wei Mi
Xiaodan Zhang
90
0
0
24 May 2025
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
Weihan Xu
Yimeng Ma
Jingyue Huang
Yang Li
Wenye Ma
Taylor Berg-Kirkpatrick
Julian McAuley
Paul Pu Liang
Hao-Wen Dong
DiffMVGen
177
0
0
24 May 2025
Scaling Image and Video Generation via Test-Time Evolutionary Search
Haoran He
Jiajun Liang
X. Wang
Pengfei Wan
Di Zhang
Kun Gai
Ling Pan
DiffM
227
0
0
23 May 2025
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
268
0
0
22 May 2025
When Are Concepts Erased From Diffusion Models?
When Are Concepts Erased From Diffusion Models?
Kevin Lu
Nicky Kriplani
Rohit Gandikota
Minh Pham
David Bau
Chinmay Hegde
Niv Cohen
66
0
0
22 May 2025
Responsible Diffusion Models via Constraining Text Embeddings within Safe Regions
Responsible Diffusion Models via Constraining Text Embeddings within Safe Regions
Zhiwen Li
Die Chen
Mingyuan Fan
Cen Chen
Yaliang Li
Yanhao Wang
Wenmeng Zhou
DiffM
68
2
0
21 May 2025
AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
Yilin Ye
Junchao Huang
Xingchen Zeng
Jiazhi Xia
Wei Zeng
137
0
0
20 May 2025
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
Feiran Li
Qianqian Xu
Shilong Bao
Zhiyong Yang
Xiaochun Cao
Qingming Huang
DiffM
100
0
0
16 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
101
0
0
01 May 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
160
4
0
24 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
212
0
0
22 Apr 2025
Generating Fine Details of Entity Interactions
Generating Fine Details of Entity Interactions
Xinyi Gu
Jiayuan Mao
148
0
0
11 Apr 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Xiangyu Zhao
Peiyuan Zhang
Kexian Tang
Hao Li
Zicheng Zhang
...
Guangtao Zhai
Junchi Yan
Hua Yang
Xue Yang
Haodong Duan
VLMLRM
127
6
0
03 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
168
0
0
03 Apr 2025
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Yunhong Min
Daehyeon Choi
Kyeongmin Yeo
Jihyun Lee
Minhyuk Sung
96
0
0
28 Mar 2025
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
Yize Zhang
Mengchen Zhang
Tong Wu
Tengfei Wang
Gordon Wetzstein
Dahua Lin
Ziwei Liu
ELM
156
1
0
27 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
154
1
0
26 Mar 2025
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Ketan Suhaas Saichandran
Xavier Thomas
Prakhar Kaushik
Deepti Ghadiyaram
DiffM
129
1
0
22 Mar 2025
CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models
CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models
Yuyang Xue
Edward Moroshko
Feng Chen
Jingyu Sun
Steven McDonagh
Sotirios A. Tsaftaris
90
2
0
18 Mar 2025
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-Jui Fu
Yusu Qian
Chen Chen
Wenze Hu
Zhe Gan
Yue Yang
195
2
0
16 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
Margret Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
148
0
0
14 Mar 2025
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Chen Chen
Rui Qian
Wenze Hu
Tsu-Jui Fu
Jialing Tong
...
Lezhi Li
Bowen Zhang
Alex Schwing
Wei Liu
Yue Yang
136
0
0
13 Mar 2025
WonderVerse: Extendable 3D Scene Generation with Video Generative Models
WonderVerse: Extendable 3D Scene Generation with Video Generative Models
Hao Feng
Zhi Zuo
Jia-Hui Pan
Ka-Hei Hui
Yihua Shao
Qi Dou
Wei Xie
Zhengzhe Liu
VGen
121
1
0
12 Mar 2025
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Jiacheng Liu
Chang Zou
Yuanhuiyi Lyu
Junjie Chen
Linfeng Zhang
DiffM
131
4
0
10 Mar 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Yuwei Niu
Munan Ning
Mengren Zheng
Weiyang Jin
Bin Lin
...
Jiaqi Liao
Chaoran Feng
Kunpeng Ning
Bin Zhu
Li Yuan
EGVM
116
26
0
10 Mar 2025
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim
Byeongsu Sim
DiffMVLM
140
0
0
10 Mar 2025
VACT: A Video Automatic Causal Testing System and a Benchmark
VACT: A Video Automatic Causal Testing System and a Benchmark
Haotong Yang
Qingyuan Zheng
Yunjian Gao
Yongkun Yang
Yangbo He
Zhouchen Lin
Muhan Zhang
VGenCML
122
0
0
08 Mar 2025
Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali
Nikolos Gurney
David Pynadath
AI4TS
174
14
0
05 Mar 2025
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
Shuai Yang
Jing Tan
Mengchen Zhang
Tong Wu
Yongqian Li
Gordon Wetzstein
Ziwei Liu
Dahua Lin
MDEVGen
150
9
0
24 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
511
0
0
21 Feb 2025
Accelerating Diffusion Transformers with Token-wise Feature Caching
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou
Xuyang Liu
Ting Liu
Siteng Huang
Linfeng Zhang
143
24
0
20 Feb 2025
MoVer: Motion Verification for Motion Graphics Animations
MoVer: Motion Verification for Motion Graphics Animations
Jiaju Ma
Maneesh Agrawala
VGen
98
0
0
19 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
314
7
0
12 Feb 2025
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park
Doohyuk Jang
Sungyub Kim
Souvik Kundu
Eunho Yang
123
0
0
10 Feb 2025
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
Yongfan Chen
Xiuwen Zhu
Tianyu Li
EGVMVGen
142
3
0
08 Feb 2025
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Xiaowen Qiu
Jincheng Yang
Yian Wang
Zhehuan Chen
Yufei Wang
Tsun-Hsuan Wang
Zhou Xian
Chuang Gan
187
5
0
04 Feb 2025
Accelerate High-Quality Diffusion Models with Inner Loop Feedback
Accelerate High-Quality Diffusion Models with Inner Loop Feedback
M. Gwilliam
Han Cai
Di Wu
Abhinav Shrivastava
Zhiyu Cheng
184
1
0
22 Jan 2025
Modality Interactive Mixture-of-Experts for Fake News Detection
Modality Interactive Mixture-of-Experts for Fake News Detection
Yifan Liu
Y. Liu
Zehan Li
Ruichen Yao
Yang Zhang
Dong Wang
MoE
82
0
0
21 Jan 2025
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
Sungbin Kim
Hyunwuk Lee
Wonho Cho
Mincheol Park
Won Woo Ro
133
1
0
20 Jan 2025
Lossy Compression with Pretrained Diffusion Models
Jeremy Vonderfecht
Feng Liu
DiffM
145
2
0
20 Jan 2025
Geometric Median (GM) Matching for Robust Data Pruning
Geometric Median (GM) Matching for Robust Data Pruning
Anish Acharya
Inderjit S Dhillon
Sujay Sanghavi
AAML
119
0
0
20 Jan 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park
Sangdoo Yun
Jin-Hwa Kim
Junho Kim
Geonhui Jang
Yonghyun Jeong
Junghyo Jo
Gayoung Lee
137
19
0
17 Jan 2025
Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation
Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation
Minxing Luo
Zixun Xia
L. Chen
Zhenhang Li
Weichao Zeng
Jinqiao Wang
Wentao Cheng
Yaxing Wang
Yu Zhou
Jian Yang
DiffM
132
1
0
10 Jan 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
168
3
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffMVLM
120
0
0
03 Jan 2025
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions
Vriksha Srihari
R. Bhavya
Shruti Jayaraman
V. Mary Anita Rajam
DiffMVGen
91
0
0
02 Jan 2025
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
103
8
0
31 Dec 2024
1234
Next