Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.17510
Cited By
Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning
27 February 2024
Maurits J. R. Bleeker
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning"
42 / 42 papers shown
Title
Probing Mechanical Reasoning in Large Vision Language Models
Haoran Sun
Qingying Gao
Haiyun Lyu
Dezhi Luo
Yijiang Li
Hokin Deng
LRM
78
2
0
01 Oct 2024
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Zou
VLM
CoGe
76
393
0
04 Oct 2022
Rethinking Minimal Sufficient Representation in Contrastive Learning
Haoqing Wang
Xun Guo
Zhiwei Deng
Yan Lu
SSL
54
76
0
14 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
542
4,398
0
28 Jan 2022
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
143
490
0
23 Dec 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
132
906
0
22 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM
CLIP
74
307
0
16 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM
CLIP
105
639
0
09 Nov 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM
MLLM
MoE
74
558
0
03 Nov 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
150
457
0
11 Oct 2021
Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective
Luca Scimeca
Seong Joon Oh
Sanghyuk Chun
Michael Poli
Sangdoo Yun
OOD
497
54
0
06 Oct 2021
Compressive Visual Representations
Kuang-Huei Lee
Anurag Arnab
S. Guadarrama
John F. Canny
Ian S. Fischer
SSL
108
48
0
27 Sep 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
212
1,970
0
16 Jul 2021
Can contrastive learning avoid shortcut solutions?
Joshua Robinson
Li Sun
Ke Yu
Kayhan Batmanghelich
Stefanie Jegelka
S. Sra
SSL
74
145
0
21 Jun 2021
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Adrien Bardes
Jean Ponce
Yann LeCun
SSL
DML
153
944
0
11 May 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
445
3,887
0
11 Feb 2021
Intriguing Properties of Contrastive Losses
Ting Chen
Calvin Luo
Lala Li
69
177
0
05 Nov 2020
What Should Not Be Contrastive in Contrastive Learning
Tete Xiao
Xiaolong Wang
Alexei A. Efros
Trevor Darrell
SSL
DRL
79
303
0
13 Aug 2020
What Makes for Good Views for Contrastive Learning?
Yonglong Tian
Chen Sun
Ben Poole
Dilip Krishnan
Cordelia Schmid
Phillip Isola
SSL
114
1,335
0
20 May 2020
MPNet: Masked and Permuted Pre-training for Language Understanding
Kaitao Song
Xu Tan
Tao Qin
Jianfeng Lu
Tie-Yan Liu
102
1,121
0
20 Apr 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
211
2,056
0
16 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
121
1,944
0
13 Apr 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting-Li Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
372
18,778
0
13 Feb 2020
Visual Semantic Reasoning for Image-Text Matching
Kunpeng Li
Yulun Zhang
Keqin Li
Yuanyuan Li
Y. Fu
VLM
87
505
0
06 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,295
0
27 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
163
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
247
2,488
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
207
905
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
62
173
0
14 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
144
1,962
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
231
3,693
0
06 Aug 2019
On Mutual Information Maximization for Representation Learning
Michael Tschannen
Josip Djolonga
Paul Kishan Rubenstein
Sylvain Gelly
Mario Lucic
SSL
179
500
0
31 Jul 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
171
2,409
0
13 Jun 2019
Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman
R. Devon Hjelm
William Buchwalter
SSL
195
1,476
0
03 Jun 2019
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
93
1,154
0
21 Mar 2018
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
106
2,936
0
26 May 2017
Semi-Supervised Classification with Graph Convolutional Networks
Thomas Kipf
Max Welling
GNN
SSL
644
29,076
0
09 Sep 2016
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
215
2,489
0
01 Apr 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
140
5,590
0
07 Dec 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.7K
100,479
0
04 Sep 2014
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
1.0K
23,370
0
03 Jun 2014
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
413
43,777
0
01 May 2014
1