Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.03557
Cited By
VisualBERT: A Simple and Performant Baseline for Vision and Language
9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VisualBERT: A Simple and Performant Baseline for Vision and Language"
50 / 1,178 papers shown
Title
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
41
39,330
0
22 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
23
6
0
19 Oct 2020
Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
18
2
0
17 Oct 2020
Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning
Wanyun Cui
Guangyu Zheng
Wei Wang
SSL
18
21
0
16 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Joey Tianyi Zhou
CLIP
14
120
0
14 Oct 2020
CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
Fuli Luo
Pengcheng Yang
Shicheng Li
Xuancheng Ren
Xu Sun
VLM
SSL
18
16
0
13 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
47
11
0
12 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
19
5
0
10 Oct 2020
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang
Hexiang Hu
Vihan Jain
Eugene Ie
Fei Sha
14
21
0
06 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
22
244
0
06 Oct 2020
Pathological Visual Question Answering
Xuehai He
Zhuo Cai
Wenlan Wei
Yichen Zhang
Luntian Mou
Eric P. Xing
P. Xie
72
24
0
06 Oct 2020
Multi-Modal Open-Domain Dialogue
Kurt Shuster
Eric Michael Smith
Da Ju
Jason Weston
AI4CE
38
42
0
02 Oct 2020
A Multimodal Memes Classification: A Survey and Open Research Issues
Tariq Habib Afridi
A. Alam
Muhammad Numan Khan
Jawad Khan
Young-Koo Lee
29
35
0
17 Sep 2020
Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations
Meng-Jiun Chiou
Roger Zimmermann
Jiashi Feng
21
1
0
10 Sep 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
19
63
0
03 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
24
8
0
31 Aug 2020
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang
Tan Jiang
Tan Wang
Kun Kuang
Zhou Zhao
Jianke Zhu
Jin Yu
Hongxia Yang
Fei Wu
OOD
20
85
0
16 Aug 2020
Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea
Qiaozhu Mei
45
30
0
31 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
33
29
0
26 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
A. Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
17
85
0
23 Jul 2020
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Wanrong Zhu
Qing Guo
Tsu-jui Fu
An Yan
P. Narayana
Kazoo Sone
Sugato Basu
Luu Anh Tuan
29
33
0
01 Jul 2020
Modality-Agnostic Attention Fusion for visual search with text feedback
Eric Dodds
Jack Culpepper
Simão Herdade
Yang Zhang
K. Boakye
EgoV
18
71
0
30 Jun 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
31
376
0
30 Jun 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta
Arash Vahdat
Gal Chechik
Xiaodong Yang
Jan Kautz
Derek Hoiem
ObjD
SSL
42
140
0
17 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
432
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
35
488
0
11 Jun 2020
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding
Peng Zhang
Yunlu Xu
Zhanzhan Cheng
Shiliang Pu
Jing Lu
Liang Qiao
Yi Niu
Fei Wu
SyDa
27
102
0
27 May 2020
Adaptive Transformers for Learning Multimodal Representations
Prajjwal Bhargava
19
4
0
15 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
22
127
0
15 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
43
36
0
12 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
37
580
0
10 May 2020
MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis
Devamanyu Hazarika
Roger Zimmermann
Soujanya Poria
21
669
0
07 May 2020
Cross-media Structured Common Space for Multimedia Event Extraction
Manling Li
Alireza Zareian
Qi Zeng
Spencer Whitehead
Di Lu
Heng Ji
Shih-Fu Chang
10
103
0
05 May 2020
Visuo-Linguistic Question Answering (VLQA) Challenge
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
CoGe
13
1
0
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
43
493
0
01 May 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
47
230
0
30 Apr 2020
VD-BERT: A Unified Vision and Dialog Transformer with BERT
Yue Wang
Chenyu You
Michael R. Lyu
Irwin King
Caiming Xiong
Guosheng Lin
24
102
0
28 Apr 2020
Deep Multimodal Neural Architecture Search
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
16
98
0
25 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
21
351
0
21 Apr 2020
Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Amanpreet Singh
Vedanuj Goswami
Devi Parikh
VLM
40
48
0
19 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
17
1,917
0
13 Apr 2020
Multimodal Categorization of Crisis Events in Social Media
Mahdi Abavisani
Liwei Wu
Shengli Hu
Joel R. Tetreault
A. Jaimes
29
87
0
10 Apr 2020
Learning to Scale Multilingual Representations for Vision-Language Tasks
Andrea Burns
Donghyun Kim
Derry Wijaya
Kate Saenko
Bryan A. Plummer
15
35
0
09 Apr 2020
Context-Aware Group Captioning via Self-Attention and Contrastive Features
Zhuowan Li
Quan Hung Tran
Long Mai
Zhe-nan Lin
Alan Yuille
VLM
14
44
0
07 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
50
436
0
02 Apr 2020
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
243
1,452
0
18 Mar 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM
VLM
25
74
0
03 Mar 2020
What BERT Sees: Cross-Modal Transfer for Visual Question Generation
Thomas Scialom
Patrick Bordes
Paul-Alexis Dray
Jacopo Staiano
Patrick Gallinari
25
6
0
25 Feb 2020
Measuring Social Biases in Grounded Vision and Language Embeddings
Candace Ross
Boris Katz
Andrei Barbu
19
63
0
20 Feb 2020
Robustness Verification for Transformers
Zhouxing Shi
Huan Zhang
Kai-Wei Chang
Minlie Huang
Cho-Jui Hsieh
AAML
24
104
0
16 Feb 2020
Previous
1
2
3
...
22
23
24
Next