Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
1811.00491
Cited By
v1
v2
v3 (latest)
A Corpus for Reasoning About Natural Language Grounded in Photographs
1 November 2018
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A Corpus for Reasoning About Natural Language Grounded in Photographs"
50 / 419 papers shown
Title
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE
VLM
49
0
0
13 Jun 2025
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Yingjin Song
Yupei Du
Denis Paperno
Albert Gatt
MLLM
140
0
0
12 Jun 2025
Language-Vision Planner and Executor for Text-to-Visual Reasoning
Yichang Xu
Gaowen Liu
Ramana Rao Kompella
Sihao Hu
Tiansheng Huang
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Ling Liu
LRM
VLM
32
0
0
09 Jun 2025
Coordinated Robustness Evaluation Framework for Vision-Language Models
Ashwin Ramesh Babu
Sajad Mousavi
Vineet Gundecha
Sahand Ghorbanpour
Avisek Naug
Antonio Guillen
Ricardo Luna Gutierrez
Soumyendu Sarkar
AAML
40
0
0
05 Jun 2025
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Akash Dhasade
Divyansh Jhunjhunwala
Milos Vujasinovic
Gauri Joshi
Anne-Marie Kermarrec
MoMe
87
0
0
29 May 2025
Efficiently Enhancing General Agents With Hierarchical-categorical Memory
Changze Qiao
Mingming Lu
LLMAG
45
0
0
28 May 2025
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Eric Xing
Pranavi Kolouju
Robert Pless
Abby Stylianou
Nathan Jacobs
28
0
0
27 May 2025
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval
Rong-Cheng Tu
Zhao Jin
Jingyi Liao
Xiao Luo
Yingjie Wang
Li Shen
Dacheng Tao
123
0
0
26 May 2025
DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval
Yuxin Yang
Yinan Zhou
Yuxin Chen
Ziqi Zhang
Zongyang Ma
...
Bing Li
Lin Song
Jun Gao
Peng Li
Weiming Hu
201
0
0
23 May 2025
Knot So Simple: A Minimalistic Environment for Spatial Reasoning
Zizhao Chen
Yoav Artzi
LRM
298
0
0
23 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
84
0
0
16 May 2025
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Andrei-Alexandru Manea
Jindřich Libovický
VLM
128
0
0
30 Apr 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
Jingyun Zhang
Chuanqi Cheng
Yang Liu
Wen Liu
Jian Luan
Rui Yan
70
4
0
28 Apr 2025
FLIP Reasoning Challenge
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
VLM
LRM
194
0
0
16 Apr 2025
TMCIR: Token Merge Benefits Composed Image Retrieval
Chaoyang Wang
Zeyu Zhang
Long Teng
Zijun Li
Shichao Kan
109
0
0
15 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
98
1
0
14 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
220
0
0
10 Apr 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
214
1
0
28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
256
2
0
26 Mar 2025
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao
Wei Chen
Qiang Qiu
151
2
0
24 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
115
3
0
18 Mar 2025
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Lijie Fan
Luming Tang
Siyang Qin
Tianhong Li
Xuan S. Yang
...
Tao Zhu
Michael Rubinstein
Michalis Raptis
Deqing Sun
Radu Soricut
138
8
0
17 Mar 2025
Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform
Chenyu Huang
Peng Ye
Xinyu Wang
Shenghe Zheng
Biqing Qi
Lei Bai
Wanli Ouyang
Tao Chen
70
2
0
09 Mar 2025
Composed Multi-modal Retrieval: A Survey of Approaches and Applications
Kun Zhang
Jingyu Li
Zhiyu Li
Jingjing Zhang
F. Li
...
Nan Chen
Lei Zhang
Yongdong Zhang
Zhendong Mao
S.Kevin Zhou
101
0
0
03 Mar 2025
Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models
Peter Carragher
Abhinand Jha
R Raghav
Kathleen M. Carley
RALM
158
0
0
19 Feb 2025
A Comprehensive Survey on Composed Image Retrieval
Xuemeng Song
Haoqiang Lin
Haokun Wen
Bohan Hou
Mingzhu Xu
Liqiang Nie
139
3
0
19 Feb 2025
Natural Language Generation from Visual Events: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
534
0
0
18 Feb 2025
Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation
Kenta Uesugi
Naoki Saito
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
89
0
0
22 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
158
0
0
13 Jan 2025
SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval
Bhavin Jawade
JOÃO-BRUNO Soares
K. Thadani
D. Mohan
Amir Erfan Eshratifar
Benjamin Culpepper
Paloma de Juan
S. Setlur
V. Govindaraju
104
0
0
12 Jan 2025
AI Benchmarks and Datasets for LLM Evaluation
Todor Ivanov
Valeri Penchev
174
2
0
02 Dec 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
289
20
0
25 Nov 2024
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
Yuchen Li
Fan Ma
Yi Yang
201
3
0
24 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
151
2
0
20 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
73
0
0
11 Nov 2024
MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval
Haiwen Li
Fei Su
Zhicheng Zhao
81
0
0
31 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
140
6
0
28 Oct 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Haodong Duan
Zeang Sheng
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
115
12
0
23 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
87
0
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
127
9
0
16 Oct 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
143
5
0
16 Oct 2024
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
Hong Li
Zhiquan Tan
Xingyu Li
Weiran Huang
CLL
MoMe
78
1
0
14 Oct 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao
Ming Xu
Kartik Gupta
Akshay Asthana
Liang Zheng
Stephen Gould
145
0
0
14 Oct 2024
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Dianzhi Yu
Xinni Zhang
Yankai Chen
Aiwei Liu
Yifei Zhang
Philip S. Yu
Irwin King
VLM
CLL
115
13
0
07 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
143
41
1
30 Sep 2024
The Hard Positive Truth about Vision-Language Compositionality
Amita Kamath
Cheng-Yu Hsieh
Kai-Wei Chang
Ranjay Krishna
CLIP
CoGe
VLM
91
8
0
26 Sep 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
Xiaohu Yang
Weiwei Li
Peng Wang
ObjD
156
5
0
23 Sep 2024
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
Yuan-Hong Liao
Rafid Mahmood
Sanja Fidler
David Acuna
ReLM
LRM
98
16
0
15 Sep 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
140
78
0
22 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
117
139
0
09 Aug 2024
1
2
3
4
5
6
7
8
9
Next