Survey on Self-Supervised Multimodal Representation Learning and Foundation Models

29 November 2022

Papers citing "Survey on Self-Supervised Multimodal Representation Learning and Foundation Models"

27 / 27 papers shown

Title
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text Hassan Akbari Liangzhe Yuan Rui Qian Wei-Hong Chuang Shih-Fu Chang Huayu Chen Boqing Gong ViT 333 590 0 22 Apr 2021
CvT: Introducing Convolutions to Vision Transformers Haiping Wu Bin Xiao Noel Codella Mengchen Liu Xiyang Dai Lu Yuan Lei Zhang ViT 156 1,921 0 29 Mar 2021
Perceiver: General Perception with Iterative Attention Andrew Jaegle Felix Gimeno Andrew Brock Andrew Zisserman Oriol Vinyals João Carreira VLM ViT MDE 210 1,024 0 04 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 1.0K 29,926 0 26 Feb 2021
Reinforcement Learning with Prototypical Representations Denis Yarats Rob Fergus A. Lazaric Lerrel Pinto SSL 75 226 0 22 Feb 2021
Generative Spoken Language Modeling from Raw Audio Kushal Lakhotia Evgeny Kharitonov Wei-Ning Hsu Yossi Adi Adam Polyak ... Tu Nguyen Jade Copet Alexei Baevski A. Mohamed Emmanuel Dupoux AuLLM 270 365 0 01 Feb 2021
A Survey on Contrastive Self-supervised Learning Ashish Jaiswal Ashwin Ramesh Babu Mohammad Zaki Zadeh Debapriya Banerjee F. Makedon SSL 134 1,403 0 31 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 684 41,483 0 22 Oct 2020
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments Mathilde Caron Ishan Misra Julien Mairal Priya Goyal Piotr Bojanowski Armand Joulin OCL SSL 278 4,101 0 17 Jun 2020
Linformer: Self-Attention with Linear Complexity Sinong Wang Belinda Z. Li Madian Khabsa Han Fang Hao Ma 219 1,719 0 08 Jun 2020
Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation Po-Han Chi Pei-Hung Chung Tsung-Han Wu Chun-Cheng Hsieh Yen-Hao Chen Shang-Wen Li Hung-yi Lee SSL 76 148 0 18 May 2020
A Simple Framework for Contrastive Learning of Visual Representations Ting-Li Chen Simon Kornblith Mohammad Norouzi Geoffrey E. Hinton SSL 390 18,897 0 13 Feb 2020
Momentum Contrast for Unsupervised Visual Representation Learning Kaiming He Haoqi Fan Yuxin Wu Saining Xie Ross B. Girshick SSL 216 12,136 0 13 Nov 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations Zhenzhong Lan Mingda Chen Sebastian Goodman Kevin Gimpel Piyush Sharma Radu Soricut SSL AIMat 373 6,469 0 26 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations Weijie Su Xizhou Zhu Yue Cao Bin Li Lewei Lu Furu Wei Jifeng Dai VLM MLLM SSL 184 1,668 0 22 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language Liunian Harold Li Mark Yatskar Da Yin Cho-Jui Hsieh Kai-Wei Chang VLM 155 1,967 0 09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks Jiasen Lu Dhruv Batra Devi Parikh Stefan Lee SSL VLM 255 3,699 0 06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis Luke Zettlemoyer Veselin Stoyanov AIMat 697 24,572 0 26 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding Zhilin Yang Zihang Dai Yiming Yang J. Carbonell Ruslan Salakhutdinov Quoc V. Le AI4CE 236 8,451 0 19 Jun 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Zihang Dai Zhilin Yang Yiming Yang J. Carbonell Quoc V. Le Ruslan Salakhutdinov VLM 260 3,747 0 09 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,324 0 11 Oct 2018
Deep Clustering for Unsupervised Learning of Visual Features Mathilde Caron Piotr Bojanowski Armand Joulin Matthijs Douze SSL 90 1,902 0 15 Jul 2018
Deep contextualized word representations Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee Luke Zettlemoyer NAI 233 11,565 0 15 Feb 2018
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 805 132,725 0 12 Jun 2017
Sequence to Sequence Learning with Neural Networks Ilya Sutskever Oriol Vinyals Quoc V. Le AIMat 450 20,606 0 10 Sep 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 580 27,338 0 01 Sep 2014
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 404 33,573 0 16 Oct 2013