2304.03391
Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval
6 April 2023
Jae Myung Kim
A. Sophia Koepke
Cordelia Schmid
Zeynep Akata
Papers citing
"Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval"
17 / 17 papers shown
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
H. Shen
63
1
0
11 Mar 2025
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
Tianyi Shang
Zhenyu Li
Pengjie Xu
Jinwei Qiao
Gang Chen
Zihan Ruan
Weijun Hu
59
0
0
20 Feb 2025
Deep Reversible Consistency Learning for Cross-modal Retrieval
Ruitao Pu
Yang Qin
Dezhong Peng
Xiaomin Song
Huiming Zheng
38
1
0
10 Jan 2025
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
140
3
0
18 Dec 2024
Tree of Attributes Prompt Learning for Vision-Language Models
Tong Ding
Wanhua Li
Zhongqi Miao
Hanspeter Pfister
VLM
52
1
0
15 Oct 2024
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou
Yizhou Wang
Yibo Yan
Yuanhuiyi Lyu
Kening Zheng
...
Junkai Chen
Peijie Jiang
J. Liu
Chang Tang
Xuming Hu
86
7
0
04 Oct 2024
Debiasing Multimodal Large Language Models
Yi-Fan Zhang
Weichen Yu
Qingsong Wen
Xue Wang
Zhang Zhang
Liang Wang
Rong Jin
Tien-Ping Tan
39
4
0
08 Mar 2024
Effectiveness Assessment of Recent Large Vision-Language Models
Yao Jiang
Xinyu Yan
Ge-Peng Ji
Keren Fu
Meijun Sun
Huan Xiong
Deng-Ping Fan
Fahad Shahbaz Khan
31
14
0
07 Mar 2024
What does a platypus look like? Generating customized prompts for zero-shot image classification
Sarah M Pratt
Ian Covert
Rosanne Liu
Ali Farhadi
VLM
125
212
0
07 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,125
0
28 Jan 2022
Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search
Jialu Wang
Yang Liu
X. Wang
FaML
157
95
0
12 Sep 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
317
780
0
18 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
401
200
0
13 Jan 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
415
595
0
21 Jul 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD
AAML
154
290
0
14 Mar 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
181
68
0
07 Mar 2020