Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
arXiv: 2012.12352
22 December 2020
Letitia Parcalabescu, Albert Gatt, Anette Frank, Iacer Calixto
Papers citing "Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks" (7 papers)
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky, William Rudman, Vedant Palit, Ritambhara Singh, Carsten Eickhoff
24 Jun 2024
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman, Peter West, Luke Zettlemoyer
31 Jul 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath, Jack Hessel, Kai-Wei Chang
24 May 2023
Controlling for Stereotypes in Multimodal Language Model Evaluation
Manuj Malik, Richard Johansson
03 Feb 2023
Finding Structural Knowledge in Multimodal-BERT
Victor Milewski, Miryam de Lhoneux, Marie-Francine Moens
17 Mar 2022
Recent Advances of Continual Learning in Computer Vision: An Overview
Haoxuan Qu, Hossein Rahmani, Li Xu, Bryan M. Williams, Jun Liu
23 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank, Emanuele Bugliarello, Desmond Elliott
09 Sep 2021