Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.01008
Cited By
v1
v2 (latest)
Self-Supervised Multimodal Learning: A Survey
31 March 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
Re-assign community
ArXiv (abs)
PDF
HTML
Github (254★)
Papers citing
"Self-Supervised Multimodal Learning: A Survey"
27 / 127 papers shown
Title
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
82
1,250
0
03 Apr 2019
Unpaired Image Captioning via Scene Graph Alignments
Jiuxiang Gu
Shafiq Joty
Jianfei Cai
Handong Zhao
Xu Yang
G. Wang
GNN
73
174
0
26 Mar 2019
The Missing Ingredient in Zero-Shot Neural Machine Translation
N. Arivazhagan
Ankur Bapna
Orhan Firat
Roee Aharoni
Melvin Johnson
Wolfgang Macherey
83
117
0
17 Mar 2019
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Longlong Jing
Yingli Tian
SSL
171
1,701
0
16 Feb 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
183
883
0
27 Nov 2018
Learning Latent Dynamics for Planning from Pixels
Danijar Hafner
Timothy Lillicrap
Ian S. Fischer
Ruben Villegas
David R Ha
Honglak Lee
James Davidson
BDL
92
1,448
0
12 Nov 2018
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
Michelle A. Lee
Yuke Zhu
K. Srinivasan
Parth Shah
Silvio Savarese
Li Fei-Fei
Animesh Garg
Jeannette Bohg
SSL
91
370
0
24 Oct 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Bruno Korbar
Du Tran
Lorenzo Torresani
99
476
0
30 Jun 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
100
754
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
102
537
0
09 Apr 2018
Scene Graph Parsing as Dependency Parsing
Yu-Siang Wang
Chenxi Liu
Fangyin Wei
Alan Yuille
GNN
3DV
45
53
0
25 Mar 2018
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
114
862
0
22 Feb 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
113
530
0
18 Dec 2017
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
155
586
0
01 Dec 2017
Unsupervised Machine Translation Using Monolingual Corpora Only
Guillaume Lample
Alexis Conneau
Ludovic Denoyer
MarcÁurelio Ranzato
SSL
146
1,097
0
31 Oct 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
125
949
0
04 Aug 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
116
2,939
0
26 May 2017
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
127
906
0
23 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
144
1,250
0
02 May 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
87
561
0
14 Apr 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
75
831
0
28 Mar 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
352
3,273
0
02 Dec 2016
Conditional Image Generation with PixelCNN Decoders
Aaron van den Oord
Nal Kalchbrenner
Oriol Vinyals
L. Espeholt
Alex Graves
Koray Kavukcuoglu
VLM
214
2,519
0
16 Jun 2016
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
M. Noroozi
Paolo Favaro
SSL
177
2,986
0
30 Mar 2016
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
208
2,074
0
19 May 2015
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
226
5,509
0
03 May 2015
Understanding image representations by measuring their equivariance and equivalence
Karel Lenc
Andrea Vedaldi
SSL
FAtt
116
537
0
21 Nov 2014
Previous
1
2
3