CREPE: Can Vision-Language Foundation Models Reason Compositionally?

13 December 2022
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
    CoGe

Papers citing "CREPE: Can Vision-Language Foundation Models Reason Compositionally?"

36 / 36 papers shown
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
Dahun Kim
A. Piergiovanni
Ganesh Mallya
A. Angelova
CoGe
41
0
0
04 Apr 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
73
0
0
03 Mar 2025
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi
Ali Nazari
Aminreza Sefid
Mohammadali Banayeeanzade
M. Rohban
M. Baghshah
VLM
81
1
0
27 Feb 2025
MoFM: A Large-Scale Human Motion Foundation Model
Mohammadreza Baharani
Ghazal Alinezhad Noghre
Armin Danesh Pazho
Gabriel Maldonado
Hamed Tabkhi
AI4CE
137
0
0
08 Feb 2025
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
45
0
0
19 Jan 2025
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
36
7
0
31 Dec 2024
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki
Valentin Gerard
Amaia Cardiel
Eric Gaussier
Matthieu Cord
Eduardo Valle
79
0
0
23 Nov 2024
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Zheyuan Zhang
Fengyuan Hu
Jayjun Lee
Freda Shi
Parisa Kordjamshidi
Joyce Chai
Ziqiao Ma
53
11
0
22 Oct 2024
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Xiangru Zhu
Penglei Sun
Yaoxian Song
Yanghua Xiao
Zhixu Li
Chengyu Wang
Jun Huang
Bei Yang
Xiaoxiao Xu
EGVM
173
1
0
14 Oct 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
56
3
0
23 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
30
0
0
12 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
29
0
0
02 Sep 2024
Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi
M. Rohban
M. Baghshah
CoGe
38
5
0
08 Jul 2024
When does compositional structure yield compositional generalization? A kernel theory
Samuel Lippl
Kim Stachenfeld
NAI
CoGe
73
5
0
26 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
Lotem Elber-Dorozko
AI4CE
73
3
0
24 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
95
139
0
29 Apr 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
82
3
0
29 Apr 2024
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch
Narunas Vaskevicius
Mirco Colosi
Pedro Hermosilla
Timo Ropinski
3DPC
28
25
0
19 Feb 2024
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
VLM
CoGe
35
21
0
12 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
38
20
0
01 Oct 2023
An Examination of the Compositionality of Large Generative Vision-Language Models
Teli Ma
Rong Li
Junwei Liang
CoGe
26
2
0
21 Aug 2023
Compositional diversity in visual concept learning
Yanli Zhou
Reuben Feinman
Brenden Lake
CoGe
OCL
32
8
0
30 May 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
27
1
0
30 May 2023
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
27
6
0
24 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
38
34
0
05 May 2023
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan A. Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
173
132
0
28 May 2022
On Guiding Visual Attention with Language Specification
Suzanne Petryk
Lisa Dunlap
Keyan Nasseri
Joseph E. Gonzalez
Trevor Darrell
Anna Rohrbach
VLM
191
31
1
17 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,125
0
28 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
259
558
0
28 Sep 2021
COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin
Shivanshu Gupta
Matt Gardner
Jonathan Berant
CoGe
34
29
0
22 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
362
0
17 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
226
404
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
275
1,081
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Out of Order: How Important Is The Sequential Order of Words in a Sentence in Natural Language Understanding Tasks?
Thang M. Pham
Trung Bui
Long Mai
Anh Totti Nguyen
214
122
0
30 Dec 2020