2305.04971
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
8 May 2023
Peng Lu
Ahmad Rashid
I. Kobyzev
Mehdi Rezagholizadeh
Philippe Langlais
Papers citing
"LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization"
31 / 31 papers shown
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu
I. Kobyzev
Mehdi Rezagholizadeh
Ahmad Rashid
A. Ghodsi
Philippe Langlais
MoMe
74
11
0
12 Dec 2022
Do we need Label Regularization to Fine-tune Pre-trained Language Models?
I. Kobyzev
A. Jafari
Mehdi Rezagholizadeh
Tianda Li
Alan Do-Omri
Peng Lu
Pascal Poupart
A. Ghodsi
71
2
0
25 May 2022
Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation
Liang Chen
Runxin Xu
Baobao Chang
27
6
0
06 Mar 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
265
1,898
0
26 Oct 2021
Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
Zhiqiang Shen
Zechun Liu
Dejia Xu
Zitian Chen
Kwang-Ting Cheng
Marios Savvides
54
76
0
01 Apr 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,430
0
22 Oct 2020
On Long-Tailed Phenomena in Neural Machine Translation
Vikas Raunak
Siddharth Dalmia
Vivek Gupta
Florian Metze
51
30
0
10 Oct 2020
Self-Knowledge Distillation with Progressive Refinement of Targets
Kyungyul Kim
Byeongmoon Ji
Doyoung Yoon
Sangheum Hwang
ODL
81
182
0
22 Jun 2020
Self-Distillation as Instance-Specific Label Smoothing
Zhilu Zhang
M. Sabuncu
69
119
0
09 Jun 2020
Stolen Probability: A Structural Weakness of Neural Language Models
David Demeter
Gregory J. Kimmel
Doug Downey
51
33
0
05 May 2020
Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing
Clara Meister
Elizabeth Salesky
Ryan Cotterell
UQCV
44
61
0
02 May 2020
Regularizing Class-wise Predictions via Self-knowledge Distillation
Sukmin Yun
Jongjin Park
Kimin Lee
Jinwoo Shin
68
281
0
31 Mar 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
255
7,547
0
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
113
1,869
0
23 Sep 2019
When Does Label Smoothing Help?
Rafael Müller
Simon Kornblith
Geoffrey E. Hinton
UQCV
207
1,953
0
06 Jun 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
622
4,802
0
13 May 2019
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
199
3,210
0
22 Apr 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
293
1,421
0
04 Dec 2018
SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation
Xinyi Wang
Hieu H. Pham
Zihang Dai
Graham Neubig
70
197
0
22 Aug 2018
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler
Andrew G. Howard
Menglong Zhu
A. Zhmoginov
Liang-Chieh Chen
204
19,333
0
13 Jan 2018
Improving Lexical Choice in Neural Machine Translation
Toan Q. Nguyen
David Chiang
58
86
0
03 Oct 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
786
132,363
0
12 Jun 2017
Convolutional Sequence to Sequence Learning
Jonas Gehring
Michael Auli
David Grangier
Denis Yarats
Yann N. Dauphin
AIMat
171
3,289
0
08 May 2017
Regularizing Neural Networks by Penalizing Confident Output Distributions
Gabriel Pereyra
George Tucker
J. Chorowski
Lukasz Kaiser
Geoffrey E. Hinton
NoLa
165
1,141
0
23 Jan 2017
Towards better decoding and language model integration in sequence to sequence models
J. Chorowski
Navdeep Jaitly
78
370
0
08 Dec 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
432
10,531
0
21 Jul 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
886
27,416
0
02 Dec 2015
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
228
7,757
0
31 Aug 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
364
19,733
0
09 Mar 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
465
43,341
0
11 Feb 2015