Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2302.10035
Cited By
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
20 February 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey"
46 / 96 papers shown
Title
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
289
581
0
22 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
339
4,873
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
418
1,103
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
407
3,778
0
11 Feb 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
263
2,463
0
04 Jan 2021
A Survey on Visual Transformer
Kai Han
Yunhe Wang
Hanting Chen
Xinghao Chen
Jianyuan Guo
...
Chunjing Xu
Yixing Xu
Zhaohui Yang
Yiman Zhang
Dacheng Tao
ViT
130
2,174
0
23 Dec 2020
KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning
Dandan Song
S. Ma
Zhanchen Sun
Sicheng Yang
L. Liao
SSL
LRM
52
38
0
13 Dec 2020
Pre-Trained Image Processing Transformer
Hanting Chen
Yunhe Wang
Tianyu Guo
Chang Xu
Yiping Deng
Zhenhua Liu
Siwei Ma
Chunjing Xu
Chao Xu
Wen Gao
VLM
ViT
125
1,659
0
01 Dec 2020
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
111
419
0
14 Nov 2020
Do Syntax Trees Help Pre-trained Transformers Extract Information?
Devendra Singh Sachan
Yuhao Zhang
Peng Qi
William L. Hamilton
39
78
0
20 Aug 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
54
377
0
30 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
206
5,734
0
20 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
52
493
0
11 Jun 2020
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
D. Gao
Linbo Jin
Ben Chen
Minghui Qiu
Peng Li
Yi Wei
Yitao Hu
Haozhe Jasper Wang
OOD
60
133
0
20 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
82
1,927
0
13 Apr 2020
A Survey on Contextual Embeddings
Qi Liu
Matt J. Kusner
Phil Blunsom
234
146
0
16 Mar 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
61
276
0
25 Feb 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Adam Roberts
Colin Raffel
Noam M. Shazeer
KELM
84
886
0
10 Feb 2020
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
56
115
0
05 Dec 2019
Composition-based Multi-Relational Graph Convolutional Networks
Shikhar Vashishth
Soumya Sanyal
Vikram Nitin
Partha P. Talukdar
GNN
111
827
0
08 Nov 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
327
933
0
24 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
130
1,657
0
22 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
122
1,939
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
210
3,659
0
06 Aug 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
103
1,192
0
07 Jun 2019
Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs
Deepak Nathani
Jatin Chauhan
Charu Sharma
Manohar Kaul
81
484
0
04 Jun 2019
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
191
1,475
0
24 May 2019
ERNIE: Enhanced Language Representation with Informative Entities
Zhengyan Zhang
Xu Han
Zhiyuan Liu
Xin Jiang
Maosong Sun
Qun Liu
84
1,390
0
17 May 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu Wang
Jianfeng Gao
M. Zhou
H. Hon
ELM
AI4CE
174
1,553
0
08 May 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
92
322
0
20 Jan 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
138
873
0
27 Nov 2018
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne
Andreas Vlachos
Christos Christodoulopoulos
Arpit Mittal
HILM
118
1,633
0
14 Mar 2018
Computational Optimal Transport
Gabriel Peyré
Marco Cuturi
OT
166
2,133
0
01 Mar 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
147
11,520
0
15 Feb 2018
Graph Attention Networks
Petar Velickovic
Guillem Cucurull
Arantxa Casanova
Adriana Romero
Pietro Lio
Yoshua Bengio
GNN
388
19,991
0
30 Oct 2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Chen Sun
Abhinav Shrivastava
Saurabh Singh
Abhinav Gupta
VLM
135
2,386
0
10 Jul 2017
Modeling Relational Data with Graph Convolutional Networks
Michael Schlichtkrull
Thomas Kipf
Peter Bloem
Rianne van den Berg
Ivan Titov
Max Welling
GNN
172
4,772
0
17 Mar 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
309
3,187
0
02 Dec 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
687
36,599
0
25 Aug 2016
FVQA: Fact-based Visual Question Answering
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
A. Dick
CoGe
75
455
0
17 Jun 2016
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Christian Szegedy
Sergey Ioffe
Vincent Vanhoucke
Alexander A. Alemi
332
14,196
0
23 Feb 2016
Explicit Knowledge-based Reasoning for Visual Question Answering
Peng Wang
Qi Wu
Chunhua Shen
Anton Van Den Hengel
A. Dick
71
258
0
09 Nov 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
180
2,461
0
01 Apr 2015
Embedding Entities and Relations for Learning and Inference in Knowledge Bases
Bishan Yang
Wen-tau Yih
Xiaodong He
Jianfeng Gao
Li Deng
NAI
87
3,174
0
20 Dec 2014
Spectral Networks and Locally Connected Networks on Graphs
Joan Bruna
Wojciech Zaremba
Arthur Szlam
Yann LeCun
GNN
174
4,856
0
21 Dec 2013
A Semantic Matching Energy Function for Learning with Multi-relational Data
Xavier Glorot
Antoine Bordes
Jason Weston
Yoshua Bengio
88
689
0
15 Jan 2013
Previous
1
2