Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
1811.00491
Cited By
v1
v2
v3 (latest)
A Corpus for Reasoning About Natural Language Grounded in Photographs
1 November 2018
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A Corpus for Reasoning About Natural Language Grounded in Photographs"
50 / 419 papers shown
Title
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM
MLLM
123
32
0
13 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
66
5
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
Saurabh Pahune
Manoj Chandrasekharan
AILaw
69
17
0
05 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
113
79
0
05 Jul 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
126
48
0
03 Jul 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
78
7
0
29 Jun 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
116
3
0
28 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Yiyi Zhou
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLM
VPVLM
42
2
0
27 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
76
9
0
25 Jun 2023
Zero-shot Composed Text-Image Retrieval
Yikun Liu
Jiangchao Yao
Ya Zhang
Yanfeng Wang
Weidi Xie
83
25
0
12 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
100
4
0
12 Jun 2023
Modular Visual Question Answering via Code Generation
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
77
51
0
08 Jun 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
78
4
0
07 Jun 2023
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Shuo Chen
Jindong Gu
Zhen Han
Yunpu Ma
Philip Torr
Volker Tresp
VPVLM
VLM
129
21
0
03 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
110
0
0
02 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM
VLM
57
2
0
01 Jun 2023
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Xiao Xu
Bei Li
Chenfei Wu
Shao-Yen Tseng
Anahita Bhiwandiwalla
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
AIFin
VLM
83
4
0
31 May 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
93
24
0
31 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
99
7
0
28 May 2023
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
Qingqing Cao
Bhargavi Paranjape
Hannaneh Hajishirzi
MLLM
VLM
83
27
0
27 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
138
23
0
27 May 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation
Jaewoo Ahn
Yeda Song
Sangdoo Yun
Gunhee Kim
60
22
0
27 May 2023
Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer
Elinor Poole-Dayan
Vikram S. Voleti
Christopher Pal
Siva Reddy
111
14
0
25 May 2023
Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
Zheyuan Liu
Weixuan Sun
Damien Teney
Stephen Gould
96
19
0
25 May 2023
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Chi Chen
Peng Li
Maosong Sun
Yang Liu
75
2
0
24 May 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
VLM
104
5
0
24 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe
CLIP
VLM
99
21
0
24 May 2023
Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu
Frank Keller
VLM
92
2
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
121
8
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
115
5
0
23 May 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Sherzod Hakimov
David Schlangen
VLM
73
5
0
23 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
87
1
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
151
122
0
18 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindrich Libovický
65
8
0
17 May 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou
Yunlong Liang
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
VLM
71
4
0
13 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
99
25
0
12 May 2023
Egocentric Hierarchical Visual Semantics
L. Erculiani
A. Bontempelli
Andrea Passerini
Fausto Giunchiglia
OCL
45
2
0
09 May 2023
Scene Text Recognition with Image-Text Matching-guided Dictionary
Jiajun Wei
Hongjian Zhan
X. Tu
Yue Lu
Umapada Pal
VLM
53
0
0
08 May 2023
Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation
Chaoya Jiang
Wei Ye
Haiyang Xu
Miang yan
Shikun Zhang
Jie Zhang
Fei Huang
VLM
106
16
0
08 May 2023
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
69
4
0
02 May 2023
An Empirical Study of Multimodal Model Merging
Yi-Lin Sung
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Joey Tianyi Zhou
Lijuan Wang
MoMe
125
42
0
28 Apr 2023
Multi-Modal Representation Learning with Text-Driven Soft Masks
Jaeyoo Park
Bohyung Han
SSL
68
4
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
169
553
0
03 Apr 2023
Bi-directional Training for Composed Image Retrieval via Text Prompt Learning
Zheyuan Liu
Weixuan Sun
Yicong Hong
Damien Teney
Stephen Gould
132
34
0
29 Mar 2023
Natural Language Reasoning, A Survey
Fei Yu
Hongbo Zhang
Prayag Tiwari
Benyou Wang
ReLM
LRM
182
64
0
26 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
CLL
83
12
0
25 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
133
10
0
24 Mar 2023
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
Geonmo Gu
Sanghyuk Chun
Wonjae Kim
HeeJae Jun
Yoohoon Kang
Sangdoo Yun
DiffM
149
59
0
21 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
107
28
0
16 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM
MoE
83
68
0
13 Mar 2023
Previous
1
2
3
4
5
6
7
8
9
Next