1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 of 2,094 papers shown
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
21
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
35
37
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
45
13
0
08 Sep 2021
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Chenyu You
Nuo Chen
Yuexian Zou
SSL
27
63
0
08 Sep 2021
Learning grounded word meaning representations on similarity graphs
Mariella Dimiccoli
H. Wendt
Pau Batlle
18
1
0
07 Sep 2021
CTRL-C: Camera calibration TRansformer with Line-Classification
Jinwoo Lee
Hyun-Young Go
Hyunjoon Lee
Sunghyun Cho
Minhyuk Sung
Junho Kim
ViT
10
35
0
06 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
44
77
0
06 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
11
20
0
05 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
22
1
0
04 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
30
18
0
04 Sep 2021
Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic
Wenjia Zhang
Lin Gui
Yulan He
33
32
0
04 Sep 2021
Multimodal Conditionality for Natural Language Generation
Michael Sollami
Aashish Jain
24
10
0
02 Sep 2021
Point-of-Interest Type Prediction using Text and Images
Danae Sánchez Villegas
Nikolaos Aletras
10
14
0
01 Sep 2021
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
18
78
0
01 Sep 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
41
17
0
01 Sep 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
45
0
0
28 Aug 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
31
18
0
25 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
29
63
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
51
782
0
24 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
27
137
0
23 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
58
152
0
22 Aug 2021
Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks
Hung Q. Vo
Pengyu Yuan
T. He
Stephen T. C. Wong
H. Nguyen
18
1
0
21 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
36
10
0
21 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
27
135
0
20 Aug 2021
Knowledge Perceived Multi-modal Pretraining in E-commerce
Yushan Zhu
Huaixiao Tou
Wen Zhang
Ganqiang Ye
Hui Chen
Ningyu Zhang
Huajun Chen
33
32
0
20 Aug 2021
Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach
Chuanbo Hu
Minglei Yin
Bin Liu
Xin Li
Yanfang Ye
18
15
0
19 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
19
31
0
18 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
31
21
0
16 Aug 2021
MMChat: Multi-Modal Chat Dataset on Social Media
Yinhe Zheng
Guanyi Chen
Xin Liu
K. Lin
27
33
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
27
53
0
16 Aug 2021
Video Transformer for Deepfake Detection with Incremental Learning
Sohail Ahmed Khan
Hang Dai
ViT
24
63
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MA
MedIm
32
30
0
10 Aug 2021
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia
Qiaozi Gao
Jesse Thomason
Govind Thattai
Gaurav Sukhatme
LM&Ro
37
78
0
10 Aug 2021
Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair Recognition
Ziwei Xu
Guangzhi Wang
Yongkang Wong
Mohan S. Kankanhalli
56
26
0
10 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
22
191
0
09 Aug 2021
Disentangling Hate in Online Memes
Rui Cao
Ziqing Fan
Roy Ka-wei Lee
Wen-Haw Chong
Jing Jiang
26
76
0
09 Aug 2021
Detecting Propaganda Techniques in Memes
Dimitar Dimitrov
Bishr Bin Ali
Shaden Shaar
Firoj Alam
Fabrizio Silvestri
Hamed Firooz
Preslav Nakov
Giovanni Da San Martino
48
93
0
07 Aug 2021
Interpretable Visual Understanding with Cognitive Attention Network
Xuejiao Tang
Wenbin Zhang
Yi Yu
Kea Turner
Tyler Derr
Mengyu Wang
Eirini Ntoutsi
55
12
0
06 Aug 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
Yulin Li
Yuxi Qian
Yuchen Yu
Xiameng Qin
Chengquan Zhang
Yan Liu
Kun Yao
Junyu Han
Jingtuo Liu
Errui Ding
37
114
0
06 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
Peng Gao
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Hongsheng Li
ViT
30
305
0
05 Aug 2021
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation
Zaid Khan
Y. Fu
43
132
0
03 Aug 2021
Representation learning for neural population activity with Neural Data Transformers
Joel Ye
C. Pandarinath
AI4TS
AI4CE
11
53
0
02 Aug 2021
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal
Or Patashnik
Haggai Maron
Gal Chechik
Daniel Cohen-Or
CLIP
VLM
44
222
0
02 Aug 2021
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
ObjD
21
23
0
31 Jul 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
34
64
0
30 Jul 2021
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
Anil Rahate
Rahee Walambe
S. Ramanna
K. Kotecha
27
135
0
29 Jul 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
102
3,858
0
28 Jul 2021
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe
Keld T. Lundgaard
VLM
45
2
0
27 Jul 2021
Language Grounding with 3D Objects
Jesse Thomason
Mohit Shridhar
Yonatan Bisk
Chris Paxton
Luke Zettlemoyer
LM&Ro
33
53
0
26 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
24
122
0
26 Jul 2021