2405.15613
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
24 May 2024
Huy V. Vo
Vasil Khalidov
Timothée Darcet
Théo Moutakanni
Nikita Smetanin
Marc Szafraniec
Hugo Touvron
Camille Couprie
Maxime Oquab
Armand Joulin
Hervé Jégou
Patrick Labatut
Piotr Bojanowski
SSL
Papers citing
"Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach"
50 / 56 papers shown
Title
Demystifying CLIP Data
Hu Xu
Saining Xie
Xiaoqing Ellen Tan
Po-Yao (Bernie) Huang
Russell Howes
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
CLIP
87
121
0
31 Dec 2024
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
David Grangier
Simin Fan
Skyler Seto
Pierre Ablin
165
4
0
30 Sep 2024
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
372
2,377
0
09 Nov 2022
Open Long-Tailed Recognition in a Dynamic World
Ziwei Liu
Zhongqi Miao
Xiaohang Zhan
Jiayun Wang
Boqing Gong
Stella X. Yu
VLM
78
21
0
17 Aug 2022
Active Learning Strategies for Weakly-supervised Object Detection
Huy V. Vo
Oriane Siméoni
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Jean Ponce
89
19
0
25 Jul 2022
Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher
Robert Geirhos
Shashank Shekhar
Surya Ganguli
Ari S. Morcos
85
439
0
29 Jun 2022
Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran
Mathilde Caron
Ishan Misra
Piotr Bojanowski
Florian Bordes
Pascal Vincent
Armand Joulin
Michael G. Rabbat
Nicolas Ballas
SSL
81
320
0
14 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
439
6,222
0
05 Apr 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
183
1,944
0
29 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
780
12,893
0
04 Mar 2022
A Self-Supervised Descriptor for Image Copy Detection
Ed Pizzi
Sreya Dutta Roy
Sugosh Nagavara Ravindra
Priya Goyal
Matthijs Douze
SSL
64
125
0
21 Feb 2022
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal
Quentin Duval
Isaac Seessel
Mathilde Caron
Ishan Misra
Levent Sagun
Armand Joulin
Piotr Bojanowski
VLM
SSL
72
111
0
16 Feb 2022
Fairness Indicators for Systematic Assessments of Visual Feature Extractors
Priya Goyal
Adriana Romero Soriano
C. Hazirbas
Levent Sagun
Nicolas Usunier
EGVM
48
31
0
15 Feb 2022
PASS: An ImageNet replacement for self-supervised pretraining without humans
Yuki M. Asano
Christian Rupprecht
Andrew Zisserman
Andrea Vedaldi
VLM
SSL
77
58
0
27 Sep 2021
Deep Learning on a Data Diet: Finding Important Examples Early in Training
Mansheej Paul
Surya Ganguli
Gintare Karolina Dziugaite
105
457
0
15 Jul 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
110
100
0
17 May 2021
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Adrien Bardes
Jean Ponce
Yann LeCun
SSL
DML
149
932
0
11 May 2021
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
P. Sermanet
Andrew Zisserman
SSL
228
467
0
29 Apr 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
613
6,059
0
29 Apr 2021
Benchmarking Representation Learning for Natural World Image Collections
Grant Van Horn
Elijah Cole
Sara Beery
Kimberly Wilber
Serge J. Belongie
Oisin Mac Aodha
SSL
VLM
62
173
0
30 Mar 2021
Active Learning for Deep Object Detection via Probabilistic Modeling
Jiwoong Choi
Ismail Elezi
Hyuk-Jae Lee
C. Farabet
J. Álvarez
50
122
0
30 Mar 2021
Vision Transformers for Dense Prediction
René Ranftl
Alexey Bochkovskiy
V. Koltun
ViT
MDE
125
1,729
0
24 Mar 2021
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Jure Zbontar
Li Jing
Ishan Misra
Yann LeCun
Stéphane Deny
SSL
300
2,343
0
04 Mar 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
385
4,937
0
24 Feb 2021
BYOL works even without batch statistics
Pierre Harvey Richemond
Jean-Bastien Grill
Florent Altché
Corentin Tallec
Florian Strub
...
Samuel L. Smith
Soham De
Razvan Pascanu
Bilal Piot
Michal Valko
SSL
286
115
0
20 Oct 2020
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
Vitaly Feldman
Chiyuan Zhang
TDI
140
462
0
09 Aug 2020
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Dan Hendrycks
Steven Basart
Norman Mu
Saurav Kadavath
Frank Wang
...
Samyak Parajuli
Mike Guo
D. Song
Jacob Steinhardt
Justin Gilmer
OOD
318
1,732
0
29 Jun 2020
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCL
SSL
215
4,073
0
17 Jun 2020
Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill
Florian Strub
Florent Altché
Corentin Tallec
Pierre Harvey Richemond
...
M. G. Azar
Bilal Piot
Koray Kavukcuoglu
Rémi Munos
Michal Valko
SSL
351
6,792
0
13 Jun 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
343
18,739
0
13 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
550
4,797
0
23 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
408
42,393
0
03 Dec 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
183
12,073
0
13 Nov 2019
Self-labelling via simultaneous clustering and representation learning
Yuki M. Asano
Christian Rupprecht
Andrea Vedaldi
SSL
112
770
0
13 Nov 2019
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave
81
654
0
01 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
394
20,114
0
23 Oct 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.2K
12,181
0
27 Aug 2019
Natural Adversarial Examples
Dan Hendrycks
Kevin Zhao
Steven Basart
Jacob Steinhardt
D. Song
OODD
193
1,469
0
16 Jul 2019
Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
Jordan T. Ash
Chicheng Zhang
A. Krishnamurthy
John Langford
Alekh Agarwal
BDL
UQCV
85
772
0
09 Jun 2019
Learning Robust Global Representations by Penalizing Local Predictive Power
Haohan Wang
Songwei Ge
Eric Xing
Zachary Chase Lipton
OOD
112
957
0
29 May 2019
Billion-scale semi-supervised learning for image classification
I. Z. Yalniz
Hervé Jégou
Kan Chen
Manohar Paluri
D. Mahajan
SSL
88
463
0
02 May 2019
Large-Scale Long-Tailed Recognition in an Open World
Ziwei Liu
Zhongqi Miao
Xiaohang Zhan
Jiayun Wang
Boqing Gong
Stella X. Yu
145
1,158
0
10 Apr 2019
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Dan Hendrycks
Thomas G. Dietterich
OOD
VLM
167
3,429
0
28 Mar 2019
Do ImageNet Classifiers Generalize to ImageNet?
Benjamin Recht
Rebecca Roelofs
Ludwig Schmidt
Vaishaal Shankar
OOD
SSeg
VLM
109
1,714
0
13 Feb 2019
Diverse mini-batch Active Learning
Fedor Zhdanov
55
155
0
17 Jan 2019
An Empirical Study of Example Forgetting during Deep Neural Network Learning
Mariya Toneva
Alessandro Sordoni
Rémi Tachet des Combes
Adam Trischler
Yoshua Bengio
Geoffrey J. Gordon
107
733
0
12 Dec 2018
Active Learning for Deep Object Detection
C. Brust
Christoph Käding
Joachim Denzler
VLM
ObjD
48
118
0
26 Sep 2018
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
Zhirong Wu
Yuanjun Xiong
Stella X. Yu
Dahua Lin
SSL
170
3,452
0
05 May 2018
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking
Filip Radenovic
Ahmet Iscen
Giorgos Tolias
Yannis Avrithis
Ondřej Chum
49
379
0
29 Mar 2018
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
195
2,643
0
09 May 2017