Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1912.06430
Cited By
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
13 December 2019
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"End-to-End Learning of Visual Representations from Uncurated Instructional Videos"
39 / 189 papers shown
Title
Divide and Contrast: Self-supervised Learning from Uncurated Data
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
56
96
0
17 May 2021
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky
E. Holderness
Jingxuan Tu
Parker Glenn
Kyeongmin Rim
Kelley Lynch
R. Brutti
18
5
0
12 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Erik Cambria
54
268
0
10 May 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
164
170
0
20 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
31
96
0
05 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
22
49
0
01 Apr 2021
Composable Augmentation Encoding for Video Representation Learning
Chen Sun
Arsha Nagrani
Yonglong Tian
Cordelia Schmid
SSL
AI4TS
24
17
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
23
127
0
30 Mar 2021
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
Mengmeng Xu
Juan-Manuel Perez-Rua
Xiatian Zhu
Bernard Ghanem
Brais Martinez
15
27
0
28 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
24
128
0
19 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
21
33
0
18 Mar 2021
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
23
66
0
18 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
24
56
0
16 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
103
27,682
0
26 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
256
525
0
04 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
79
110
0
31 Jan 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
33
14
0
07 Jan 2021
Learning the Predictability of the Future
Dídac Surís
Ruoshi Liu
Carl Vondrick
21
71
0
01 Jan 2021
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
184
0
11 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
21
66
0
10 Dec 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
Zehua Zhang
David J. Crandall
AI4TS
SSL
28
23
0
23 Nov 2020
Game Plan: What AI can do for Football, and What Football can do for AI
K. Tuyls
Shayegan Omidshafiei
Paul Muller
Zhe Wang
Jerome T. Connor
...
Simon Bouton
Nathalie Beauguerlange
Jackson Broshear
T. Graepel
Demis Hassabis
38
78
0
18 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
13
81
0
10 Nov 2020
Center-wise Local Image Mixture For Contrastive Representation Learning
Hao Li
Xiaopeng Zhang
H. Xiong
18
8
0
05 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
13
168
0
01 Nov 2020
SHERLock: Self-Supervised Hierarchical Event Representation Learning
Sumegh Roychowdhury
S. Sontakke
Nikaash Puri
Mausoom Sarkar
Milan Aggarwal
Pinkesh Badjatiya
Balaji Krishnamurthy
Laurent Itti
SSL
DRL
30
1
0
06 Oct 2020
Hard Negative Mixing for Contrastive Learning
Yannis Kalantidis
Mert Bulent Sariyildiz
Noé Pion
Philippe Weinzaepfel
Diane Larlus
SSL
32
628
0
02 Oct 2020
Spatiotemporal Contrastive Video Representation Learning
Rui Qian
Tianjian Meng
Boqing Gong
Ming-Hsuan Yang
H. Wang
Serge J. Belongie
Yin Cui
SSL
AI4TS
22
491
0
09 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
32
48
0
29 Jul 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
424
596
0
21 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
What Makes for Good Views for Contrastive Learning?
Yonglong Tian
Chen Sun
Ben Poole
Dilip Krishnan
Cordelia Schmid
Phillip Isola
SSL
22
1,305
0
20 May 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
33
100
0
08 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
41
493
0
01 May 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
23
428
0
28 Nov 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
59
2,360
0
13 Jun 2019
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
71
584
0
18 Dec 2012
Previous
1
2
3
4