arXiv:1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,118 papers shown
Hierarchical Similarity Learning for Language-based Product Image Retrieval
Zhe Ma
Fenghao Liu
Jianfeng Dong
Xiaoye Qu
Yuan He
S. Ji
VLM
53
4
0
18 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
570
1,143
0
17 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
359
181
0
17 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
191
666
0
11 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
562
3,917
0
11 Feb 2021
Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval
Soravit Changpinyo
Jordi Pont-Tuset
V. Ferrari
Radu Soricut
66
26
0
09 Feb 2021
Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network
Linwei Ye
Mrigank Rochan
Zhi Liu
Xiaoqin Zhang
Yang Wang
VOS
EgoV
66
57
0
09 Feb 2021
Iconographic Image Captioning for Artworks
E. Cetinic
66
24
0
07 Feb 2021
CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models
Yusheng Su
Xu Han
Yankai Lin
Zhengyan Zhang
Zhiyuan Liu
Peng Li
Jie Zhou
Maosong Sun
73
10
0
07 Feb 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
216
1,775
0
05 Feb 2021
RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER
Lin Sun
Jiquan Wang
Kai Zhang
Yindu Su
Fangsheng Weng
82
141
0
05 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
404
547
0
04 Feb 2021
Inferring spatial relations from textual descriptions of images
A. Elu
Gorka Azkune
Oier López de Lacalle
Ignacio Arganda-Carreras
Aitor Soroa Etxabe
Eneko Agirre
45
2
0
01 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
150
117
0
31 Jan 2021
An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games
Alessandro Suglia
Yonatan Bisk
Ioannis Konstas
Antonio Vergari
E. Bastianelli
Andrea Vanzo
Oliver Lemon
40
8
0
31 Jan 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
102
67
0
28 Jan 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Nayeon Lee
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
402
999
0
27 Jan 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network
Yehao Li
Yingwei Pan
Ting Yao
Jingwen Chen
Tao Mei
VLM
95
53
0
27 Jan 2021
Cross-lingual Visual Pre-training for Multimodal Machine Translation
Ozan Caglayan
Menekse Kuyu
Mustafa Sercan Amac
Pranava Madhyastha
Erkut Erdem
Aykut Erdem
Lucia Specia
VLM
77
46
0
25 Jan 2021
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov
Tobias Hinz
Federico Raue
Jörn Hees
Andreas Dengel
EGVM
86
178
0
25 Jan 2021
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation
Jungjun Kim
Dong-Gyu Lee
Jialin Wu
Hong G Jung
Seong-Whan Lee
ObjD
91
22
0
22 Jan 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
70
167
0
21 Jan 2021
Learning rich touch representations through cross-modal self-supervision
Martina Zambelli
Y. Aytar
Francesco Visin
Yuxiang Zhou
R. Hadsell
SSL
82
16
0
21 Jan 2021
Understanding in Artificial Intelligence
S. Maetschke
D. M. Iraola
Pieter Barnard
Elaheh Shafieibavani
Peter Zhong
Ying Xu
Antonio Jimeno Yepes
ELM
VLM
49
0
0
17 Jan 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
139
5
0
16 Jan 2021
Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge
Violetta Shevchenko
Damien Teney
A. Dick
Anton Van Den Hengel
87
29
0
15 Jan 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
534
210
0
13 Jan 2021
Trear: Transformer-based RGB-D Egocentric Action Recognition
Xiangyu Li
Yonghong Hou
Pichao Wang
Zhimin Gao
Mingliang Xu
Wanqing Li
ViT
233
88
0
05 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
409
2,570
0
04 Jan 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
351
158
0
02 Jan 2021
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
Yiran Xing
Z. Shi
Zhao Meng
Gerhard Lakemeyer
Yunpu Ma
Roger Wattenhofer
VLM
128
40
0
02 Jan 2021
CDLM: Cross-Document Language Modeling
Avi Caciularu
Arman Cohan
Iz Beltagy
Matthew E. Peters
Arie Cattan
Ido Dagan
VLM
75
33
0
02 Jan 2021
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
148
382
0
31 Dec 2020
Accurate Word Representations with Universal Visual Guidance
Zhuosheng Zhang
Haojie Yu
Hai Zhao
Rui Wang
Masao Utiyama
55
0
0
30 Dec 2020
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Leilei Gan
Rui Yan
Jiwei Li
93
30
0
30 Dec 2020
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
248
523
0
29 Dec 2020
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
Riza Velioglu
J. Rose
VLM
50
87
0
23 Dec 2020
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
409
6,858
0
23 Dec 2020
A Multimodal Framework for the Detection of Hateful Memes
Phillip Lippe
Nithin Holla
Shantanu Chandra
S. Rajamanickam
Georgios Antoniou
Ekaterina Shutova
H. Yannakoudakis
60
74
0
23 Dec 2020
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu
Albert Gatt
Anette Frank
Iacer Calixto
LRM
101
49
0
22 Dec 2020
ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces
Zecheng He
Srinivas Sunkara
Xiaoxue Zang
Ying Xu
Lijuan Liu
Nevan Wichers
Gabriel Schubiner
Ruby B. Lee
Jindong Chen
Blaise Agüera y Arcas
107
80
0
22 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
82
16
0
21 Dec 2020
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
128
188
0
20 Dec 2020
Transformer Interpretability Beyond Attention Visualization
Hila Chefer
Shir Gur
Lior Wolf
145
681
0
17 Dec 2020
MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification
Te-Lin Wu
Shikhar Singh
S. Paul
Gully A. Burns
Nanyun Peng
43
18
0
16 Dec 2020
ReINTEL: A Multimodal Data Challenge for Responsible Information Identification on Social Network Sites
Duc-Trong Le
Xuan-Son Vu
Nhu-Dung To
Huu Nguyen
Thuy-Trinh Nguyen
...
A. Nguyen
Minh-Duc Hoang
Nghia T. Le
Huyen Thi Minh Nguyen
Hoang D. Nguyen
79
15
0
16 Dec 2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
101
44
0
15 Dec 2020
Attention over learned object embeddings enables complex visual reasoning
David Ding
Felix Hill
Adam Santoro
Malcolm Reynolds
M. Botvinick
OCL
114
71
0
15 Dec 2020
Vilio: State-of-the-art Visio-Linguistic Models applied to Hateful Memes
Niklas Muennighoff
85
64
0
14 Dec 2020
KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning
Dandan Song
S. Ma
Zhanchen Sun
Sicheng Yang
L. Liao
SSL
LRM
89
39
0
13 Dec 2020