Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
1811.00491
Cited By
v1
v2
v3 (latest)
A Corpus for Reasoning About Natural Language Grounded in Photographs
1 November 2018
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A Corpus for Reasoning About Natural Language Grounded in Photographs"
50 / 419 papers shown
Title
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
95
11
0
19 Aug 2022
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training
Jaeseok Byun
Taebaek Hwang
Jianlong Fu
Taesup Moon
VLM
95
11
0
08 Aug 2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding
Bingning Wang
Feiya Lv
Ting Yao
Yiming Yuan
Jin Ma
Yu Luo
Haijin Liang
73
3
0
05 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
92
68
0
03 Aug 2022
Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics
Xiaoyuan Guo
Jiali Duan
C.-C. Jay Kuo
J. Gichoya
Imon Banerjee
VLM
63
1
0
31 Jul 2022
NewsStories: Illustrating articles with visual summaries
Reuben Tan
Bryan A. Plummer
Kate Saenko
J. P. Lewis
Avneesh Sud
Thomas Leung
VLM
SSL
152
5
0
26 Jul 2022
Rethinking the Reference-based Distinctive Image Captioning
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
99
22
0
22 Jul 2022
Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models
Yi Zhang
Junyan Wang
Jitao Sang
104
28
0
03 Jul 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLM
CLL
114
76
0
18 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
94
44
0
17 Jun 2022
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Xiao Xu
Chenfei Wu
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
105
70
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
136
90
0
16 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLM
AI4CE
108
17
0
15 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
132
130
0
15 Jun 2022
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
113
246
0
13 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
80
102
0
13 Jun 2022
VL-BEiT: Generative Vision-Language Pretraining
Hangbo Bao
Wenhui Wang
Li Dong
Furu Wei
VLM
86
45
0
02 Jun 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
VLM
110
32
0
01 Jun 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
113
13
0
30 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM
MLLM
109
224
0
24 May 2022
Training Vision-Language Transformers from Captions
Liangke Gui
Yingshan Chang
Qiuyuan Huang
Subhojit Som
Alexander G. Hauptmann
Jianfeng Gao
Yonatan Bisk
VLM
ViT
220
11
0
19 May 2022
Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation
Kevin Yang
Olivia Deng
Charles C. Chen
Richard Shin
Subhro Roy
Benjamin Van Durme
104
10
0
18 May 2022
What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning
Jae Hee Lee
Matthias Kerzel
Kyra Ahrens
C. Weber
S. Wermter
85
9
0
05 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
362
1,315
0
04 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
139
185
0
30 Apr 2022
Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval
Siyu Ren
Kenny Q. Zhu
VLM
44
7
0
29 Apr 2022
Image Retrieval from Contextual Descriptions
Benno Krojer
Vaibhav Adlakha
Vibhav Vineet
Yash Goyal
Edoardo Ponti
Siva Reddy
97
32
0
29 Mar 2022
PACS: A Dataset for Physical Audiovisual CommonSense Reasoning
Samuel Yu
Peter Wu
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
LRM
132
16
0
21 Mar 2022
Finding Structural Knowledge in Multimodal-BERT
Victor Milewski
Miryam de Lhoneux
Marie-Francine Moens
75
10
0
17 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
83
37
0
03 Mar 2022
There is a Time and Place for Reasoning Beyond the Image
Xingyu Fu
Ben Zhou
I. Chandratreya
Carl Vondrick
Dan Roth
168
22
0
01 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
82
31
0
01 Mar 2022
Multi-modal Alignment using Representation Codebook
Jiali Duan
Liqun Chen
Son Tran
Jinyu Yang
Yi Xu
Belinda Zeng
Trishul Chilimbi
160
68
0
28 Feb 2022
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul Chilimbi
Junzhou Huang
VLM
161
300
0
21 Feb 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning
Mikolaj Malkiñski
Jacek Mańdziuk
121
41
0
21 Feb 2022
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
185
190
0
18 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
68
3
0
16 Feb 2022
XFBoost: Improving Text Generation with Controllable Decoders
Xiangyu Peng
Michael Sollami
102
1
0
16 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
610
4,444
0
28 Jan 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
Edoardo Ponti
Ivan Vulić
MLLM
VLM
ELM
135
64
0
27 Jan 2022
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revanth Reddy Gangi Reddy
Xilin Rui
Pengfei Yu
Xudong Lin
Haoyang Wen
...
Joey Tianyi Zhou
Avirup Sil
Shih-Fu Chang
Alex Schwing
Heng Ji
82
32
0
20 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
61
34
0
17 Dec 2021
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang
Wenhui Wang
Haichao Zhu
Ming Liu
Bing Qin
Furu Wei
VLM
FedML
95
33
0
16 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLM
LRM
75
33
0
16 Dec 2021
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
Letitia Parcalabescu
Michele Cafagna
Lilitta Muradjan
Anette Frank
Iacer Calixto
Albert Gatt
CoGe
112
118
0
14 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
VPVLM
139
361
0
13 Dec 2021
ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition
Xinyu Wang
Min Gui
Yong Jiang
Zixia Jia
Nguyen Bach
Tao Wang
Zhongqiang Huang
Fei Huang
Kewei Tu
117
55
0
13 Dec 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
Yining Hong
Li Yi
J. Tenenbaum
Antonio Torralba
Chuang Gan
76
40
0
09 Dec 2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Yi-Liang Nie
Linjie Li
Zhe Gan
Shuohang Wang
Chenguang Zhu
Michael Zeng
Zicheng Liu
Joey Tianyi Zhou
Lijuan Wang
66
6
0
08 Dec 2021
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
Christopher Clark
Jordi Salvador
Dustin Schwenk
Derrick Bonafilia
Mark Yatskar
...
Aaron Sarnat
Hannaneh Hajishirzi
Aniruddha Kembhavi
Oren Etzioni
Ali Farhadi
MLLM
65
5
0
01 Dec 2021
Previous
1
2
3
4
5
6
7
8
9
Next