ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

12 June 2024
Irene Huang
Wei Lin
M. Jehanzeb Mirza
Jacob A. Hansen
Sivan Doveh
V. Butoi
Roei Herzig
Assaf Arbelle
Hilde Kuehne
Trevor Darrell
Chuang Gan
Aude Oliva
Rogerio Feris
Leonid Karlinsky
    CoGe
    LRM

Papers citing "ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs"

14 / 14 papers shown
Title
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
72
10
0
28 Jan 2025
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh
Nimrod Shabtay
Wei Lin
Eli Schwartz
Hilde Kuehne
...
Leonid Karlinsky
James Glass
Assaf Arbelle
S. Ullman
Muhammad Jehanzeb Mirza
VLM
100
1
0
20 Nov 2024
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
46
5
0
14 Oct 2024
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Muhammad Jehanzeb Mirza
Mengjie Zhao
Zhuoyuan Mao
Sivan Doveh
Wei Lin
...
Yuki Mitsufuji
Horst Possegger
Rogerio Feris
Leonid Karlinsky
James Glass
VLM
84
1
0
08 Oct 2024
Towards Multimodal In-Context Learning for Vision & Language Models
Sivan Doveh
Shaked Perek
M. Jehanzeb Mirza
Wei Lin
Amit Alfassy
Assaf Arbelle
S. Ullman
Leonid Karlinsky
VLM
110
14
0
19 Mar 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
79
244
0
29 Jan 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
440
0
14 Oct 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziński
Rameswar Panda
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
102
38
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,229
0
30 Jan 2023
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
F. Khan
VPVLM
VLM
191
530
0
06 Oct 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan A. Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
173
132
0
28 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,125
0
28 Jan 2022
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
325
2,263
0
02 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021