ResearchTrend.AI

Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning

27 February 2024
Maurits J. R. Bleeker
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
VLM
arXiv: 2402.17510

Papers citing "Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning"

42 / 42 papers shown
Probing Mechanical Reasoning in Large Vision Language Models
Haoran Sun
Qingying Gao
Haiyun Lyu
Dezhi Luo
Yijiang Li
Hokin Deng
LRM
78
2
0
01 Oct 2024
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Zou
VLM CoGe
76
393
0
04 Oct 2022
Rethinking Minimal Sufficient Representation in Contrastive Learning
Haoqing Wang
Xun Guo
Zhiwei Deng
Yan Lu
SSL
54
76
0
14 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven C. H. Hoi
MLLM BDL VLM CLIP
542
4,398
0
28 Jan 2022
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM CLIP
143
490
0
23 Dec 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
132
906
0
22 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM CLIP
74
307
0
16 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhui Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM CLIP
105
639
0
09 Nov 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM MLLM MoE
74
558
0
03 Nov 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM CLIP
150
457
0
11 Oct 2021
Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective
Luca Scimeca
Seong Joon Oh
Sanghyuk Chun
Michael Poli
Sangdoo Yun
OOD
497
54
0
06 Oct 2021
Compressive Visual Representations
Kuang-Huei Lee
Anurag Arnab
S. Guadarrama
John F. Canny
Ian S. Fischer
SSL
108
48
0
27 Sep 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven C. H. Hoi
FaML
212
1,970
0
16 Jul 2021
Can contrastive learning avoid shortcut solutions?
Joshua Robinson
Li Sun
Ke Yu
Kayhan Batmanghelich
Stefanie Jegelka
S. Sra
SSL
74
145
0
21 Jun 2021
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Adrien Bardes
Jean Ponce
Yann LeCun
SSL DML
153
944
0
11 May 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM CLIP
445
3,887
0
11 Feb 2021
Intriguing Properties of Contrastive Losses
Ting Chen
Calvin Luo
Lala Li
69
177
0
05 Nov 2020
What Should Not Be Contrastive in Contrastive Learning
Tete Xiao
Xiaolong Wang
Alexei A. Efros
Trevor Darrell
SSL DRL
79
303
0
13 Aug 2020
What Makes for Good Views for Contrastive Learning?
Yonglong Tian
Chen Sun
Ben Poole
Dilip Krishnan
Cordelia Schmid
Phillip Isola
SSL
114
1,335
0
20 May 2020
MPNet: Masked and Permuted Pre-training for Language Understanding
Kaitao Song
Xu Tan
Tao Qin
Jianfeng Lu
Tie-Yan Liu
102
1,121
0
20 Apr 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
211
2,056
0
16 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
121
1,944
0
13 Apr 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
372
18,778
0
13 Feb 2020
Visual Semantic Reasoning for Image-Text Matching
Kunpeng Li
Yulun Zhang
Kai Li
Yuanyuan Li
Y. Fu
VLM
87
505
0
06 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,295
0
27 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM MLLM SSL
163
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM MLLM
247
2,488
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL VLM MLLM
207
905
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
62
173
0
14 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
144
1,962
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL VLM
231
3,693
0
06 Aug 2019
On Mutual Information Maximization for Representation Learning
Michael Tschannen
Josip Djolonga
Paul Kishan Rubenstein
Sylvain Gelly
Mario Lucic
SSL
179
500
0
31 Jul 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
171
2,409
0
13 Jun 2019
Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman
R. Devon Hjelm
William Buchwalter
SSL
195
1,476
0
03 Jun 2019
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
93
1,154
0
21 Mar 2018
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
106
2,936
0
26 May 2017
Semi-Supervised Classification with Graph Convolutional Networks
Thomas Kipf
Max Welling
GNN SSL
644
29,076
0
09 Sep 2016
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. L. Zitnick
215
2,489
0
01 Apr 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
140
5,590
0
07 Dec 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt MDE
1.7K
100,479
0
04 Sep 2014
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
1.0K
23,370
0
03 Jun 2014
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
413
43,777
0
01 May 2014