Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.01359
Cited By
FOIL it! Find One mismatch between Image and Language caption
3 May 2017
Ravi Shekhar
Sandro Pezzelle
Yauhen Klimovich
Aurélie Herbelot
Moin Nabi
E. Sangineto
Raffaella Bernardi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FOIL it! Find One mismatch between Image and Language caption"
28 / 28 papers shown
Title
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
103
3
0
19 Mar 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
32
0
0
20 Jan 2025
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
33
1
0
24 Jun 2024
Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
A. Bavaresco
A. Testoni
Raquel Fernández
36
2
0
31 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
84
3
0
29 Apr 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
34
24
0
28 Feb 2024
Data Augmentations for Improved (Large) Language Model Generalization
Amir Feder
Yoav Wald
Claudia Shi
S. Saria
David M. Blei
OOD
CML
32
7
0
19 Oct 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
35
1
0
30 May 2023
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
30
6
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
30
5
0
23 May 2023
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
36
32
0
25 May 2022
Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge
Brielen Madureira
David Schlangen
25
4
0
14 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
34
401
0
07 Apr 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
27
123
0
13 Jan 2022
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
39
33
0
17 Nov 2021
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
Amir Feder
Katherine A. Keith
Emaad A. Manzoor
Reid Pryzant
Dhanya Sridhar
...
Roi Reichart
Margaret E. Roberts
Brandon M Stewart
Victor Veitch
Diyi Yang
CML
41
234
0
02 Sep 2021
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee
Thomas Scialom
Seunghyun Yoon
Franck Dernoncourt
Kyomin Jung
CoGe
25
18
0
28 Aug 2021
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks
Aida Nematzadeh
30
114
0
16 Jun 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
17
1,442
0
18 Apr 2021
Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case
Adam Dahlgren Lindström
Suna Bensch
Johanna Björklund
F. Drewes
24
20
0
22 Feb 2021
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu
Albert Gatt
Anette Frank
Iacer Calixto
LRM
33
48
0
22 Dec 2020
Evaluating Models' Local Decision Boundaries via Contrast Sets
Matt Gardner
Yoav Artzi
Victoria Basmova
Jonathan Berant
Ben Bogin
...
Sanjay Subramanian
Reut Tsarfaty
Eric Wallace
Ally Zhang
Ben Zhou
ELM
43
84
0
06 Apr 2020
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
70
3,624
0
06 Aug 2019
Evaluating the Representational Hub of Language and Vision Models
Ravi Shekhar
Ece Takmaz
Raquel Fernández
Raffaella Bernardi
30
11
0
12 Apr 2019
Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti
Albert Gatt
K. Camilleri
26
2
0
12 Oct 2018
How clever is the FiLM model, and how clever can it be?
A. Kuhnle
Huiyuan Xie
Ann A. Copestake
22
6
0
09 Sep 2018
Targeted Syntactic Evaluation of Language Models
Rebecca Marvin
Tal Linzen
31
406
0
27 Aug 2018
Defoiling Foiled Image Captions
Pranava Madhyastha
Josiah Wang
Lucia Specia
27
9
0
16 May 2018
1