Object-centric Binding in Contrastive Language-Image Pretraining

19 February 2025

Adriana Romero Soriano

Papers citing "Object-centric Binding in Contrastive Language-Image Pretraining"

20 / 20 papers shown

Title
An Introduction to Vision-Language Modeling Florian Bordes Richard Yuanzhe Pang Anurag Ajay Alexander C. Li Adrien Bardes ... Karen Ullrich Aishwarya Agrawal Kate Saenko Asli Celikyilmaz Vikas Chandra VLM 48 77 0 27 May 2024
SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models Ziyi Wu Jingyu Hu Wuyue Lu Igor Gilitschenski Animesh Garg DiffM OCL 54 47 0 18 May 2023
Sigmoid Loss for Language Image Pre-Training Xiaohua Zhai Basil Mustafa Alexander Kolesnikov Lucas Beyer CLIP VLM 83 1,076 0 27 Mar 2023
When and why vision-language models behave like bags-of-words, and what to do about it? Mert Yuksekgonul Federico Bianchi Pratyusha Kalluri Dan Jurafsky James Zou VLM CoGe 54 384 0 04 Oct 2022
Bridging the Gap to Real-World Object-Centric Learning Maximilian Seitzer Max Horn Andrii Zadaianchuk Dominik Zietlow Tianjun Xiao ... Tong He Zheng Zhang Bernhard Schölkopf Thomas Brox Francesco Locatello OCL 71 145 0 29 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model Xi Chen Tianlin Li Soravit Changpinyo A. Piergiovanni Piotr Padlewski ... Andreas Steiner A. Angelova Xiaohua Zhai N. Houlsby Radu Soricut MLLM VLM 63 704 0 14 Sep 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations Tiancheng Zhao Tianqi Zhang Mingwei Zhu Haozhan Shen Kyusong Lee Xiaopeng Lu Jianwei Yin VLM CoGe MLLM 83 93 0 01 Jul 2022
SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos Gamaleldin F. Elsayed Aravindh Mahendran Sjoerd van Steenkiste Klaus Greff Michael C. Mozer Thomas Kipf VOS OCL 98 142 0 15 Jun 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents Aditya A. Ramesh Prafulla Dhariwal Alex Nichol Casey Chu Mark Chen VLM DiffM 282 6,768 0 13 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality Tristan Thrush Ryan Jiang Max Bartolo Amanpreet Singh Adina Williams Douwe Kiela Candace Ross CoGe 82 415 0 07 Apr 2022
SLIP: Self-supervision meets Language-Image Pre-training Norman Mu Alexander Kirillov David Wagner Saining Xie VLM CLIP 105 485 0 23 Dec 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters Peng Gao Shijie Geng Renrui Zhang Teli Ma Rongyao Fang Yongfeng Zhang Hongsheng Li Yu Qiao VLM CLIP 204 1,011 0 09 Oct 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Jack Hessel Ari Holtzman Maxwell Forbes Ronan Le Bras Yejin Choi CLIP 82 1,512 0 18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 681 28,659 0 26 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Soravit Changpinyo P. Sharma Nan Ding Radu Soricut VLM 386 1,103 0 17 Feb 2021
On the Binding Problem in Artificial Neural Networks Klaus Greff Sjoerd van Steenkiste Jürgen Schmidhuber OCL 268 259 0 09 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 400 40,217 0 22 Oct 2020
Hard Negative Mixing for Contrastive Learning Yannis Kalantidis Mert Bulent Sariyildiz Noé Pion Philippe Weinzaepfel Diane Larlus SSL 93 636 0 02 Oct 2020
Object-Centric Learning with Slot Attention Francesco Locatello Dirk Weissenborn Thomas Unterthiner Aravindh Mahendran G. Heigold Jakob Uszkoreit Alexey Dosovitskiy Thomas Kipf OCL 178 832 0 26 Jun 2020
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Yash Goyal Tejas Khot D. Summers-Stay Dhruv Batra Devi Parikh CoGe 291 3,187 0 02 Dec 2016