2502.06604
Do we really have to filter out random noise in pre-training data for language models?
10 February 2025
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
Papers citing
"Do we really have to filter out random noise in pre-training data for language models?"
47 / 97 papers shown
Title
mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko
Alena Fenogenova
Maria Tikhonova
Vladislav Mikhailov
Anastasia Kozlova
Tatiana Shavrina
71
150
0
15 Apr 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Eldar Kurtic
Daniel Fernando Campos
Tuan Nguyen
Elias Frantar
Mark Kurtz
Ben Fineran
Michael Goin
Dan Alistarh
VLM
MQ
MedIm
62
122
0
14 Mar 2022
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
Yang Zhao
Hao Zhang
Xiuyuan Hu
112
119
0
08 Feb 2022
Pure Noise to the Rescue of Insufficient Data: Improving Imbalanced Classification by Training on Random Noise Images
Shiran Zada
Itay Benou
Michal Irani
78
27
0
16 Dec 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
254
390
0
06 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
178
1,794
0
26 Oct 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
443
2,340
0
02 Sep 2021
Noise Stability Regularization for Improving BERT Fine-tuning
Hang Hua
Xingjian Li
Dejing Dou
Chengzhong Xu
Jiebo Luo
36
44
0
10 Jul 2021
Unveiling the structure of wide flat minima in neural networks
Carlo Baldassi
Clarissa Lauditi
Enrico M. Malatesta
Gabriele Perugini
R. Zecchina
47
33
0
02 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
264
10,099
0
17 Jun 2021
Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
Colin Wei
Sang Michael Xie
Tengyu Ma
104
100
0
17 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
86
360
0
13 Jun 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
275
692
0
22 Apr 2021
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie
Li-xin Yuan
Zhanxing Zhu
Masashi Sugiyama
50
29
0
31 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
339
21,175
0
25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
692
28,659
0
26 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
386
2,051
0
31 Dec 2020
On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
Zeke Xie
Zhiqiang Xu
Jingzhao Zhang
Issei Sato
Masashi Sugiyama
32
24
0
23 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
408
40,217
0
22 Oct 2020
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
Nikunj Saunshi
Sadhika Malladi
Sanjeev Arora
53
87
0
07 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
169
1,323
0
03 Oct 2020
Implicit Gradient Regularization
David Barrett
Benoit Dherin
53
149
0
23 Sep 2020
Learning from Noisy Labels with Deep Neural Networks: A Survey
Hwanjun Song
Minseok Kim
Dongmin Park
Yooju Shin
Jae-Gil Lee
NoLa
86
979
0
16 Jul 2020
News Sentiment Analysis
Antony Samuels
John Mcgonical
21
10
0
05 Jul 2020
Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Sang Michael Xie
Tengyu Ma
Percy Liang
87
13
0
29 Jun 2020
Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum
Zeke Xie
Xinrui Wang
Huishuai Zhang
Issei Sato
Masashi Sugiyama
ODL
52
47
0
29 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
510
41,106
0
28 May 2020
Finding Universal Grammatical Relations in Multilingual BERT
Ethan A. Chi
John Hewitt
Christopher D. Manning
38
151
0
09 May 2020
Does label smoothing mitigate label noise?
Michal Lukasik
Srinadh Bhojanapalli
A. Menon
Surinder Kumar
NoLa
112
348
0
05 Mar 2020
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
T. Zhao
65
560
0
08 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
296
19,824
0
23 Oct 2019
Structured Pruning of Large Language Models
Ziheng Wang
Jeremy Wohlwend
Tao Lei
38
283
0
10 Oct 2019
Mitigating Uncertainty in Document Classification
Xuchao Zhang
Fanglan Chen
Chang-Tien Lu
Naren Ramakrishnan
42
43
0
17 Jul 2019
How multilingual is Multilingual BERT?
Telmo Pires
Eva Schlinger
Dan Garrette
LRM
VLM
133
1,392
0
04 Jun 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
123
17,950
0
28 May 2019
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
Mohammad Taher Pilehvar
Jose Camacho-Collados
125
478
0
28 Aug 2018
Rotation Equivariant CNNs for Digital Pathology
Bastiaan S. Veeling
J. Linmans
Jim Winkens
Taco S. Cohen
Max Welling
92
573
0
08 Jun 2018
EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
P. Helber
B. Bischke
Andreas Dengel
Damian Borth
106
1,790
0
31 Aug 2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Chen Sun
Abhinav Shrivastava
Saurabh Singh
Abhinav Gupta
VLM
110
2,386
0
10 Jul 2017
SmoothGrad: removing noise by adding noise
D. Smilkov
Nikhil Thorat
Been Kim
F. Viégas
Martin Wattenberg
FAtt
ODL
187
2,215
0
12 Jun 2017
Remote Sensing Image Scene Classification: Benchmark and State of the Art
Gong Cheng
Junwei Han
Xiaoqiang Lu
79
2,237
0
01 Mar 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.4K
192,638
0
10 Dec 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
246
19,523
0
09 Mar 2015
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
295
20,491
0
10 Sep 2014
Describing Textures in the Wild
Mircea Cimpoi
Subhransu Maji
Iasonas Kokkinos
S. Mohamed
Andrea Vedaldi
3DV
85
2,632
0
14 Nov 2013
Fine-Grained Visual Classification of Aircraft
Subhransu Maji
Esa Rahtu
Arno Solin
Matthew Blaschko
Andrea Vedaldi
91
2,227
0
21 Jun 2013
Optimistic Rates for Learning with a Smooth Loss
Nathan Srebro
Karthik Sridharan
Ambuj Tewari
145
282
0
20 Sep 2010