1811.00491
A Corpus for Reasoning About Natural Language Grounded in Photographs
1 November 2018
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
Papers citing
"A Corpus for Reasoning About Natural Language Grounded in Photographs"
50 / 162 papers shown
Title
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
22
0
0
16 May 2025
FLIP Reasoning Challenge
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
VLM
LRM
83
0
0
16 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
47
0
0
10 Apr 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
56
0
0
28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
95
0
0
26 Mar 2025
Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models
Peter Carragher
Abhinand Jha
R Raghav
Kathleen M. Carley
RALM
75
0
0
20 Feb 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
246
0
0
18 Feb 2025
Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation
Kenta Uesugi
Naoki Saito
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
44
0
0
22 Jan 2025
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
96
6
0
25 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
79
2
0
20 Nov 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
Hong Li
Zhiquan Tan
Xingyu Li
Weiran Huang
CLL
MoMe
43
1
0
14 Oct 2024
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Dianzhi Yu
Xinni Zhang
Yankai Chen
Aiwei Liu
Yifei Zhang
Philip S. Yu
Irwin King
VLM
CLL
44
9
0
07 Oct 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
56
3
0
23 Sep 2024
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
Yuan-Hong Liao
Rafid Mahmood
Sanja Fidler
David Acuna
ReLM
LRM
39
10
0
15 Sep 2024
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models
Yiyuan Li
Shichao Sun
Pengfei Liu
LRM
62
0
0
01 Jul 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
45
3
0
21 Jun 2024
Multi-Head Mixture-of-Experts
Xun Wu
Shaohan Huang
Wenhui Wang
Furu Wei
MoE
39
12
0
23 Apr 2024
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
MoE
46
2
0
14 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
100
6
0
03 Mar 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
Ziyue Wang
Chi Chen
Yiqi Zhu
Fuwen Luo
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Maosong Sun
Yang Liu
46
5
0
19 Feb 2024
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
48
29
0
19 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
27
4
0
14 Dec 2023
What's left can't be right -- The remaining positional incompetence of contrastive vision-language models
Nils Hoehing
Ellen Rushe
Anthony Ventresque
VLM
28
2
0
20 Nov 2023
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng
Hadi Amiri
MU
44
7
0
18 Nov 2023
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals
Yanai Elazar
Bhargavi Paranjape
Hao Peng
Sarah Wiegreffe
Khyathi Raghavi
Vivek Srikumar
Sameer Singh
Noah A. Smith
AAML
OOD
34
0
0
16 Nov 2023
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
VLM
45
15
0
13 Nov 2023
Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Laura Cabello
Emanuele Bugliarello
Stephanie Brandl
Desmond Elliott
23
7
0
26 Oct 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG
LRM
37
3
0
12 Oct 2023
Sentence-level Prompts Benefit Composed Image Retrieval
Yang Bai
Xinxing Xu
Yong-Jin Liu
Salman Khan
Fahad Khan
Wangmeng Zuo
Rick Siow Mong Goh
Chun-Mei Feng
38
26
0
09 Oct 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
28
2
0
06 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
31
77
0
12 Aug 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG
SyDa
46
41
0
01 Aug 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
25
2
0
31 Jul 2023
MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
Yong Li
Tingwei Lu
Hai-Tao Zheng
Tianyu Yu
Shulin Huang
Haitao Zheng
Rui Zhang
Jun Yuan
53
10
0
27 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
28
5
0
06 Jul 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
34
42
0
03 Jul 2023
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Shuo Chen
Jindong Gu
Zhen Han
Yunpu Ma
Philip Torr
Volker Tresp
VPVLM
VLM
39
17
0
03 Jun 2023
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
Qingqing Cao
Bhargavi Paranjape
Hannaneh Hajishirzi
MLLM
VLM
13
21
0
27 May 2023
Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer
Elinor Poole-Dayan
Vikram S. Voleti
Christopher Pal
Siva Reddy
45
13
0
25 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe
CLIP
VLM
30
19
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
33
8
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
30
5
0
23 May 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Sherzod Hakimov
David Schlangen
VLM
36
5
0
23 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Jiaheng Liu
15
1
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou
Yunlong Liang
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
VLM
23
4
0
13 May 2023
Scene Text Recognition with Image-Text Matching-guided Dictionary
Jiajun Wei
Hongjian Zhan
X. Tu
Yue Lu
Umapada Pal
VLM
17
0
0
08 May 2023