
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
Peng Lu, Ahmad Rashid, I. Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
8 May 2023, arXiv:2305.04971 (latest version: v2)

Papers cited by "LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization" (31 papers)

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, Philippe Langlais
12 Dec 2022, 11 citations

Do we need Label Regularization to Fine-tune Pre-trained Language Models?
I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi
25 May 2022, 2 citations

Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation
Liang Chen, Runxin Xu, Baobao Chang
06 Mar 2022, 6 citations

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, ..., Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei
26 Oct 2021, 1,898 citations

Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
Zhiqiang Shen, Zechun Liu, Dejia Xu, Zitian Chen, Kwang-Ting Cheng, Marios Savvides
01 Apr 2021, 76 citations

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
22 Oct 2020, 41,430 citations

On Long-Tailed Phenomena in Neural Machine Translation
Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze
10 Oct 2020, 30 citations

Self-Knowledge Distillation with Progressive Refinement of Targets
Kyungyul Kim, Byeongmoon Ji, Doyoung Yoon, Sangheum Hwang
22 Jun 2020, 182 citations

Self-Distillation as Instance-Specific Label Smoothing
Zhilu Zhang, M. Sabuncu
09 Jun 2020, 119 citations

Stolen Probability: A Structural Weakness of Neural Language Models
David Demeter, Gregory J. Kimmel, Doug Downey
05 May 2020, 33 citations

Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing
Clara Meister, Elizabeth Salesky, Ryan Cotterell
02 May 2020, 61 citations

Regularizing Class-wise Predictions via Self-knowledge Distillation
Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin
31 Mar 2020, 281 citations

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
02 Oct 2019, 7,547 citations

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
23 Sep 2019, 1,869 citations

When Does Label Smoothing Help?
Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
06 Jun 2019, 1,953 citations

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Y. Yoo
13 May 2019, 4,802 citations

The Curious Case of Neural Text Degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi
22 Apr 2019, 3,210 citations

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
04 Dec 2018, 1,421 citations

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation
Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig
22 Aug 2018, 197 citations

MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler, Andrew G. Howard, Menglong Zhu, A. Zhmoginov, Liang-Chieh Chen
13 Jan 2018, 19,333 citations

Improving Lexical Choice in Neural Machine Translation
Toan Q. Nguyen, David Chiang
03 Oct 2017, 86 citations

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017, 132,363 citations

Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin
08 May 2017, 3,289 citations

Regularizing Neural Networks by Penalizing Confident Output Distributions
Gabriel Pereyra, George Tucker, J. Chorowski, Lukasz Kaiser, Geoffrey E. Hinton
23 Jan 2017, 1,141 citations

Towards better decoding and language model integration in sequence to sequence models
J. Chorowski, Navdeep Jaitly
08 Dec 2016, 370 citations

Layer Normalization
Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton
21 Jul 2016, 10,531 citations

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
10 Dec 2015, 194,426 citations

Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Z. Wojna
02 Dec 2015, 27,416 citations

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch
31 Aug 2015, 7,757 citations

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
09 Mar 2015, 19,733 citations

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
11 Feb 2015, 43,341 citations