arXiv 2308.04014
Continual Pre-Training of Large Language Models: How to (re)warm your model?
8 August 2023
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats L. Richter
Quentin G. Anthony
Eugene Belilovsky
Irina Rish
Timothée Lesort
KELM
Papers citing
"Continual Pre-Training of Large Language Models: How to (re)warm your model?"
35 / 85 papers shown
InstructionCP: A fast approach to transfer Large Language Models into target language
Kuang-Ming Chen
Hung-yi Lee
CLL
43
2
0
30 May 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
38
34
0
28 May 2024
Empowering Character-level Text Infilling by Eliminating Sub-Tokens
Houxing Ren
Mingjie Zhan
Zhongyuan Wu
Hongsheng Li
AI4CE
27
1
0
27 May 2024
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
30
45
0
26 May 2024
LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language
Cagri Toraman
VLM
38
5
0
13 May 2024
Towards Incremental Learning in Large Language Models: A Critical Review
M. Jovanovic
Peter Voss
ELM
CLL
KELM
37
5
0
28 Apr 2024
Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities
Kazuki Fujii
Taishi Nakamura
Mengsay Loem
Hiroki Iida
Masanari Ohi
Kakeru Hattori
Hirai Shota
Sakae Mizuki
Rio Yokota
Naoaki Okazaki
CLL
41
53
0
27 Apr 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
52
64
0
25 Apr 2024
SambaLingo: Teaching Large Language Models New Languages
Zoltan Csaki
Bo Li
Jonathan Li
Qiantong Xu
Pian Pawakapan
Leon Zhang
Yun Du
Hengyu Zhao
Changran Hu
Urmish Thakker
37
6
0
08 Apr 2024
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
Osvaldo Luamba Quinjica
David Ifeoluwa Adelani
32
0
0
03 Apr 2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Taishi Nakamura
Mayank Mishra
Simone Tedeschi
Yekun Chai
Jason T Stillerman
...
Virendra Mehta
Matthew Blumberg
Victor May
Huu Nguyen
S. Pyysalo
LRM
31
7
0
30 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
52
0
13 Mar 2024
Investigating Continual Pretraining in Large Language Models: Insights and Implications
Çağatay Yıldız
Nishaanth Kanna Ravichandran
Prishruit Punia
Matthias Bethge
B. Ermiş
CLL
KELM
LRM
58
25
0
27 Feb 2024
Me LLaMA: Foundation Large Language Models for Medical Applications
Qianqian Xie
Qingyu Chen
Aokun Chen
C.A.I. Peng
Yan Hu
...
Huan He
Lucila Ohno-Machado
Yonghui Wu
Hua Xu
Jiang Bian
LM&MA
AI4MH
70
4
0
20 Feb 2024
Parallel Structures in Pre-training Data Yield In-Context Learning
Yanda Chen
Chen Zhao
Zhou Yu
Kathleen McKeown
He He
29
12
0
19 Feb 2024
BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models
Haotian Sun
Yuchen Zhuang
Wei Wei
Chao Zhang
Bo Dai
18
3
0
13 Feb 2024
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models
Jianhui Pang
Fanghua Ye
Longyue Wang
Dian Yu
Derek F. Wong
Shuming Shi
Zhaopeng Tu
ALM
38
6
0
16 Jan 2024
Examining Forgetting in Continual Pre-training of Aligned Large Language Models
Chen An Li
Hung-Yi Lee
CLL
KELM
31
8
0
06 Jan 2024
LLaMA Pro: Progressive LLaMA with Block Expansion
Chengyue Wu
Yukang Gan
Yixiao Ge
Zeyu Lu
Jiahao Wang
Ye Feng
Ying Shan
Ping Luo
CLL
37
61
0
04 Jan 2024
EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data
Shirong Ma
Shen Huang
Shulin Huang
Xiaobin Wang
Yangning Li
Hai-Tao Zheng
Pengjun Xie
Fei Huang
Yong-jia Jiang
38
6
0
25 Dec 2023
DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content
Wentao Wang
Xuanyao Huang
Tianyang Wang
S. K. Roy
EGVM
48
0
0
16 Dec 2023
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Zeming Chen
Alejandro Hernández Cano
Angelika Romanou
Antoine Bonnet
Kyle Matoba
...
Axel Marmet
Syrielle Montariol
Mary-Anne Hartley
Martin Jaggi
Antoine Bosselut
LM&MA
AI4MH
MedIm
37
179
0
27 Nov 2023
AcademicGPT: Empowering Academic Research
Shufa Wei
Xiaolong Xu
Xianbiao Qi
Xi Yin
Jun Xia
...
Chihao Dai
Lihua Wang
Xiaohui Liu
Lei Zhang
Yutao Xie
LM&MA
39
3
0
21 Nov 2023
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Yihong Liu
Peiqin Lin
Mingyang Wang
Hinrich Schütze
26
21
0
15 Nov 2023
DiLoCo: Distributed Low-Communication Training of Language Models
Arthur Douillard
Qixuan Feng
Andrei A. Rusu
Rachita Chhaparia
Yani Donchev
A. Kuncoro
Marc'Aurelio Ranzato
Arthur Szlam
Jiajun Shen
58
31
0
14 Nov 2023
Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering
Zhen Guo
Yining Hua
LM&MA
CLL
ALM
AI4MH
30
5
0
01 Nov 2023
LEMON: Lossless model expansion
Yite Wang
Jiahao Su
Hanlin Lu
Cong Xie
Tianyi Liu
Jianbo Yuan
Yanghua Peng
Ruoyu Sun
Hongxia Yang
17
12
0
12 Oct 2023
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances
Zihan Zhang
Meng Fang
Lingxi Chen
Mohammad-Reza Namazi-Rad
Jun Wang
KELM
24
21
0
11 Oct 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
40
267
0
10 Oct 2023
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Baolin Peng
Linfeng Song
Ye Tian
Lifeng Jin
Haitao Mi
Dong Yu
35
17
0
18 Sep 2023
Continual evaluation for lifelong learning: Identifying the stability gap
Matthias De Lange
Gido M. van de Ven
Tinne Tuytelaars
CLL
94
41
0
26 May 2022
Fine-tuned Language Models are Continual Learners
Thomas Scialom
Tuhin Chakrabarty
Smaranda Muresan
CLL
LRM
145
117
0
24 May 2022
Representational Continuity for Unsupervised Continual Learning
Divyam Madaan
Jaehong Yoon
Yuanchun Li
Yunxin Liu
Sung Ju Hwang
CLL
SSL
66
111
0
13 Oct 2021
Towards Continual Knowledge Learning of Language Models
Joel Jang
Seonghyeon Ye
Sohee Yang
Joongbo Shin
Janghoon Han
Gyeonghun Kim
Stanley Jungkyu Choi
Minjoon Seo
CLL
KELM
230
151
0
07 Oct 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
261
1,996
0
31 Dec 2020