1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,119 papers shown
Title
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
111
13
0
30 May 2022
Prompt-aligned Gradient for Prompt Tuning
Beier Zhu
Yulei Niu
Yucheng Han
Yuehua Wu
Hanwang Zhang
VLM
327
294
0
30 May 2022
UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and Graph Convolutional Networks for Multimedia Automatic Misogyny Identification
Andrei Paraschiv
M. Dascalu
Dumitru-Clementin Cercel
97
4
0
29 May 2022
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
Xintong Yu
Hongming Zhang
Ruixin Hong
Yangqiu Song
Changshui Zhang
72
13
0
29 May 2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
Jun Rao
Xv Meng
Liang Ding
Shuhan Qi
Dacheng Tao
99
51
0
28 May 2022
Multimodal Fake News Detection via CLIP-Guided Learning
Yangming Zhou
Qichao Ying
Zhenxing Qian
Sheng Li
Xinpeng Zhang
99
61
0
28 May 2022
Multimodal Masked Autoencoders Learn Transferable Representations
Xinyang Geng
Hao Liu
Lisa Lee
Dale Schuurmans
Sergey Levine
Pieter Abbeel
105
119
0
27 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
180
564
0
27 May 2022
AANG: Automating Auxiliary Learning
Lucio Dery
Paul Michel
M. Khodak
Graham Neubig
Ameet Talwalkar
122
9
0
27 May 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
Jinho Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
163
37
0
25 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
77
12
0
25 May 2022
Guiding Visual Question Answering with Attention Priors
T. Le
Vuong Le
Sunil R. Gupta
Svetha Venkatesh
T. Tran
68
6
0
25 May 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
111
35
0
25 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
101
10
0
25 May 2022
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Aishwarya Agrawal
Ivana Kajić
Emanuele Bugliarello
Elnaz Davoodi
Anita Gergely
Phil Blunsom
Aida Nematzadeh
OOD
92
18
0
24 May 2022
HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval
Feilong Chen
Xiuyi Chen
Jiaxin Shi
Duzhen Zhang
Jianlong Chang
Qi Tian
VLM
CLIP
110
6
0
24 May 2022
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution
Georgios Tziafas
S. Kasaei
119
2
0
24 May 2022
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
O. R. Terrades
VLM
99
31
0
24 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM
MLLM
102
224
0
24 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
90
4
0
24 May 2022
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang
Michihiro Yasunaga
Hongyu Ren
Shinya Wada
J. Leskovec
85
18
0
23 May 2022
Markedness in Visual Semantic AI
Robert Wolfe
Aylin Caliskan
VLM
112
36
0
23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
100
38
0
23 May 2022
Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt
Jiangmeng Li
Wenyi Mo
Jingyao Wang
Fuchun Sun
Changwen Zheng
Hui Xiong
Ji-Rong Wen
VLM
93
0
0
23 May 2022
Evidence for Hypodescent in Visual Semantic AI
Robert Wolfe
M. Banaji
Aylin Caliskan
VLM
96
38
0
22 May 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLM
VLM
230
142
0
22 May 2022
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Kant
Arun Ramachandran
Sriram Yenamandra
Igor Gilitschenski
Dhruv Batra
Andrew Szot
Harsh Agrawal
LM&Ro
LRM
231
73
0
22 May 2022
Visually-Augmented Language Modeling
Weizhi Wang
Li Dong
Hao Cheng
Haoyu Song
Xiaodong Liu
Xifeng Yan
Jianfeng Gao
Furu Wei
VLM
89
18
0
20 May 2022
Voxel-informed Language Grounding
Rodolfo Corona
Shizhan Zhu
Dan Klein
Trevor Darrell
183
12
0
19 May 2022
Training Vision-Language Transformers from Captions
Liangke Gui
Yingshan Chang
Qiuyuan Huang
Subhojit Som
Alexander G. Hauptmann
Jianfeng Gao
Yonatan Bisk
VLM
ViT
205
11
0
19 May 2022
Gender and Racial Bias in Visual Question Answering Datasets
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
187
55
0
17 May 2022
MATrIX -- Modality-Aware Transformer for Information eXtraction
Thomas Delteil
Edouard Belval
Lei Chen
Luis Goncalves
Vijay Mahadevan
91
3
0
17 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
102
30
0
13 May 2022
Localized Vision-Language Matching for Open-vocabulary Object Detection
M. A. Bravo
Sudhanshu Mittal
Thomas Brox
VLM
ObjD
59
25
0
12 May 2022
A Computational Acquisition Model for Multimodal Word Categorization
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
48
9
0
12 May 2022
DISARM: Detecting the Victims Targeted by Harmful Memes
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
71
32
0
11 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
102
35
0
10 May 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Chia-Wen Kuo
Z. Kira
97
56
0
09 May 2022
Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection
Wei Feng
Xingyuan Bu
Chenchen Zhang
Xubin Li
VLM
46
4
0
09 May 2022
Masked Co-attentional Transformer reconstructs 100x ultra-fast/low-dose whole-body PET from longitudinal images and anatomically guided MRI
Yan-Ran Wang
Liangqiong Qu
N. Sheybani
Xiaolong Luo
...
S. Gatidis
Xuerong Xiao
Allison Pribnow
D. Rubin
H. Daldrup-Link
ViT
MedIm
30
0
0
09 May 2022
CCMB: A Large-scale Chinese Cross-modal Benchmark
Chunyu Xie
Heng Cai
Jincheng Li
Fanjing Kong
Xiaoyu Wu
...
Xiangzheng Zhang
Dawei Leng
Baochang Zhang
Xiangyang Ji
Yafeng Deng
MLLM
VLM
88
12
0
08 May 2022
RoViST: Learning Robust Metrics for Visual Storytelling
Eileen Wang
S. Han
Josiah Poon
49
10
0
08 May 2022
Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction
Xiang Chen
Ningyu Zhang
Lei Li
Yunzhi Yao
Shumin Deng
Chuanqi Tan
Fei Huang
Luo Si
Huajun Chen
53
34
0
07 May 2022
Declaration-based Prompt Tuning for Visual Question Answering
Yuhang Liu
Wei Wei
Daowan Peng
Feida Zhu
MLLM
VLM
58
19
0
05 May 2022
Cross-modal Contrastive Learning for Speech Translation
Rong Ye
Mingxuan Wang
Lei Li
SSL
99
91
0
05 May 2022
Subverting Fair Image Search with Generative Adversarial Perturbations
A. Ghosh
Matthew Jagielski
Chris L. Wilson
91
7
0
05 May 2022
Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
Xiang Chen
Ningyu Zhang
Lei Li
Shumin Deng
Chuanqi Tan
Changliang Xu
Fei Huang
Luo Si
Huajun Chen
121
138
0
04 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
110
46
0
04 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
101
76
0
04 May 2022
Visual Commonsense in Pretrained Unimodal and Multimodal Models
Chenyu Zhang
Benjamin Van Durme
Zhuowan Li
Elias Stengel-Eskin
VLM
SSL
79
41
0
04 May 2022