2104.08718
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
18 April 2021
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
Papers citing
"CLIPScore: A Reference-free Evaluation Metric for Image Captioning"
50 / 156 papers shown
Title
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi
Davide Bucciarelli
Federico Betti
Marcella Cornia
Lorenzo Baraldi
N. Sebe
Rita Cucchiara
207
0
0
26 May 2025
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
C. Wang
Xiaoran Pan
Zihao Pan
Haofan Wang
Yiren Song
LRM
99
0
0
24 May 2025
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song
Cheng Liu
Mike Zheng Shou
DiffM
173
2
0
24 May 2025
Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation
Wenchao Zhang
Jiahe Tian
Runze He
Jizhong Han
Jiao Dai
Miaomiao Feng
Wei Mi
Xiaodan Zhang
90
0
0
24 May 2025
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
Weihan Xu
Yimeng Ma
Jingyue Huang
Yang Li
Wenye Ma
Taylor Berg-Kirkpatrick
Julian McAuley
Paul Pu Liang
Hao-Wen Dong
DiffM
VGen
177
0
0
24 May 2025
Scaling Image and Video Generation via Test-Time Evolutionary Search
Haoran He
Jiajun Liang
X. Wang
Pengfei Wan
Di Zhang
Kun Gai
Ling Pan
DiffM
227
0
0
23 May 2025
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
268
0
0
22 May 2025
When Are Concepts Erased From Diffusion Models?
Kevin Lu
Nicky Kriplani
Rohit Gandikota
Minh Pham
David Bau
Chinmay Hegde
Niv Cohen
66
0
0
22 May 2025
Responsible Diffusion Models via Constraining Text Embeddings within Safe Regions
Zhiwen Li
Die Chen
Mingyuan Fan
Cen Chen
Yaliang Li
Yanhao Wang
Wenmeng Zhou
DiffM
68
2
0
21 May 2025
AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
Yilin Ye
Junchao Huang
Xingchen Zeng
Jiazhi Xia
Wei Zeng
137
0
0
20 May 2025
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
Feiran Li
Qianqian Xu
Shilong Bao
Zhiyong Yang
Xiaochun Cao
Qingming Huang
DiffM
100
0
0
16 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
101
0
0
01 May 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
160
4
0
24 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
212
0
0
22 Apr 2025
Generating Fine Details of Entity Interactions
Xinyi Gu
Jiayuan Mao
148
0
0
11 Apr 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Xiangyu Zhao
Peiyuan Zhang
Kexian Tang
Hao Li
Zicheng Zhang
...
Guangtao Zhai
Junchi Yan
Hua Yang
Xue Yang
Haodong Duan
VLM
LRM
127
6
0
03 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
168
0
0
03 Apr 2025
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Yunhong Min
Daehyeon Choi
Kyeongmin Yeo
Jihyun Lee
Minhyuk Sung
96
0
0
28 Mar 2025
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
Yize Zhang
Mengchen Zhang
Tong Wu
Tengfei Wang
Gordon Wetzstein
Dahua Lin
Ziwei Liu
ELM
156
1
0
27 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
154
1
0
26 Mar 2025
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Ketan Suhaas Saichandran
Xavier Thomas
Prakhar Kaushik
Deepti Ghadiyaram
DiffM
129
1
0
22 Mar 2025
CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models
Yuyang Xue
Edward Moroshko
Feng Chen
Jingyu Sun
Steven McDonagh
Sotirios A. Tsaftaris
90
2
0
18 Mar 2025
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-Jui Fu
Yusu Qian
Chen Chen
Wenze Hu
Zhe Gan
Yue Yang
195
2
0
16 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
Margret Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
148
0
0
14 Mar 2025
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Chen Chen
Rui Qian
Wenze Hu
Tsu-Jui Fu
Jialing Tong
...
Lezhi Li
Bowen Zhang
Alex Schwing
Wei Liu
Yue Yang
136
0
0
13 Mar 2025
WonderVerse: Extendable 3D Scene Generation with Video Generative Models
Hao Feng
Zhi Zuo
Jia-Hui Pan
Ka-Hei Hui
Yihua Shao
Qi Dou
Wei Xie
Zhengzhe Liu
VGen
121
1
0
12 Mar 2025
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Jiacheng Liu
Chang Zou
Yuanhuiyi Lyu
Junjie Chen
Linfeng Zhang
DiffM
131
4
0
10 Mar 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Yuwei Niu
Munan Ning
Mengren Zheng
Weiyang Jin
Bin Lin
...
Jiaqi Liao
Chaoran Feng
Kunpeng Ning
Bin Zhu
Li Yuan
EGVM
116
26
0
10 Mar 2025
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim
Byeongsu Sim
DiffM
VLM
140
0
0
10 Mar 2025
VACT: A Video Automatic Causal Testing System and a Benchmark
Haotong Yang
Qingyuan Zheng
Yunjian Gao
Yongkun Yang
Yangbo He
Zhouchen Lin
Muhan Zhang
VGen
CML
122
0
0
08 Mar 2025
Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali
Nikolos Gurney
David Pynadath
AI4TS
174
14
0
05 Mar 2025
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
Shuai Yang
Jing Tan
Mengchen Zhang
Tong Wu
Yongqian Li
Gordon Wetzstein
Ziwei Liu
Dahua Lin
MDE
VGen
150
9
0
24 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
511
0
0
21 Feb 2025
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou
Xuyang Liu
Ting Liu
Siteng Huang
Linfeng Zhang
143
24
0
20 Feb 2025
MoVer: Motion Verification for Motion Graphics Animations
Jiaju Ma
Maneesh Agrawala
VGen
98
0
0
19 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
314
7
0
12 Feb 2025
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park
Doohyuk Jang
Sungyub Kim
Souvik Kundu
Eunho Yang
123
0
0
10 Feb 2025
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
Yongfan Chen
Xiuwen Zhu
Tianyu Li
EGVM
VGen
142
3
0
08 Feb 2025
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Xiaowen Qiu
Jincheng Yang
Yian Wang
Zhehuan Chen
Yufei Wang
Tsun-Hsuan Wang
Zhou Xian
Chuang Gan
187
5
0
04 Feb 2025
Accelerate High-Quality Diffusion Models with Inner Loop Feedback
M. Gwilliam
Han Cai
Di Wu
Abhinav Shrivastava
Zhiyu Cheng
184
1
0
22 Jan 2025
Modality Interactive Mixture-of-Experts for Fake News Detection
Yifan Liu
Y. Liu
Zehan Li
Ruichen Yao
Yang Zhang
Dong Wang
MoE
82
0
0
21 Jan 2025
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
Sungbin Kim
Hyunwuk Lee
Wonho Cho
Mincheol Park
Won Woo Ro
133
1
0
20 Jan 2025
Lossy Compression with Pretrained Diffusion Models
Jeremy Vonderfecht
Feng Liu
DiffM
145
2
0
20 Jan 2025
Geometric Median (GM) Matching for Robust Data Pruning
Anish Acharya
Inderjit S Dhillon
Sujay Sanghavi
AAML
119
0
0
20 Jan 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park
Sangdoo Yun
Jin-Hwa Kim
Junho Kim
Geonhui Jang
Yonghyun Jeong
Junghyo Jo
Gayoung Lee
137
19
0
17 Jan 2025
Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation
Minxing Luo
Zixun Xia
L. Chen
Zhenhang Li
Weichao Zeng
Jinqiao Wang
Wentao Cheng
Yaxing Wang
Yu Zhou
Jian Yang
DiffM
132
1
0
10 Jan 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
168
3
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
120
0
0
03 Jan 2025
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions
Vriksha Srihari
R. Bhavya
Shruti Jayaraman
V. Mary Anita Rajam
DiffM
VGen
91
0
0
02 Jan 2025
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
103
8
0
31 Dec 2024