2306.07052
Cited By
Gradient Ascent Post-training Enhances Language Model Generalization
12 June 2023
Dongkeun Yoon
Joel Jang
Sungdong Kim
Minjoon Seo
VLM
AI4CE
Papers citing
"Gradient Ascent Post-training Enhances Language Model Generalization"
11 / 11 papers shown
Title
RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models
Bichen Wang
Yuzhe Zi
Yixin Sun
Yanyan Zhao
Bing Qin
MU
75
8
0
04 Jun 2024
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Alberto Blanco-Justicia
N. Jebreel
Benet Manzanares-Salor
David Sánchez
Josep Domingo-Ferrer
Guillem Collell
Kuan Eeik Tan
KELM
MU
60
17
0
02 Apr 2024
Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty
I. Timiryasov
J. Tastet
21
47
0
03 Aug 2023
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Joel Jang
Dongkeun Yoon
Sohee Yang
Sungmin Cha
Moontae Lee
Lajanugen Logeswaran
Minjoon Seo
KELM
PILM
MU
147
193
0
04 Oct 2022
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
133
98
0
16 Oct 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
215
1,661
0
15 Oct 2021
Internet-Augmented Dialogue Generation
M. Komeili
Kurt Shuster
Jason Weston
RALM
244
281
0
15 Jul 2021
SWAD: Domain Generalization by Seeking Flat Minima
Junbum Cha
Sanghyuk Chun
Kyungjae Lee
Han-Cheol Cho
Seunghyun Park
Yunsung Lee
Sungrae Park
MoMe
216
424
0
17 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,000
0
31 Dec 2020
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
243
815
0
13 Sep 2019
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
308
2,892
0
15 Sep 2016