Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2001.08361
Cited By
Scaling Laws for Neural Language Models
23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Laws for Neural Language Models"
50 / 359 papers shown
Title
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning
Pengjie Ren
Chengshun Shi
Shiguang Wu
Mengqi Zhang
Zhaochun Ren
Maarten de Rijke
Zhumin Chen
Jiahuan Pei
MoE
160
14
0
27 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
188
406
0
09 Feb 2024
Is Adversarial Training with Compressed Datasets Effective?
Tong Chen
Raghavendra Selvan
AAML
121
0
0
08 Feb 2024
Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
Lanqing Li
Hai Zhang
Xinyu Zhang
Shatong Zhu
Junqiao Zhao
Junqiao Zhao
Pheng-Ann Heng
OffRL
76
8
0
04 Feb 2024
Decoding Speculative Decoding
Minghao Yan
Saurabh Agarwal
Shivaram Venkataraman
LRM
71
10
0
02 Feb 2024
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse
Patrick Fernandes
Nuno M. Guerreiro
António Loison
Duarte M. Alves
...
François Yvon
André F.T. Martins
Gautier Viaud
C´eline Hudelot
Pierre Colombo
97
36
0
01 Feb 2024
Always-Sparse Training by Growing Connections with Guided Stochastic Exploration
Mike Heddes
Narayan Srinivasa
T. Givargis
Alexandru Nicolau
242
0
0
12 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
224
269
0
05 Jan 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
293
84
0
31 Dec 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
98
2
0
29 Nov 2023
Large Language Models as Topological Structure Enhancers for Text-Attributed Graphs
Shengyin Sun
Yuxiang Ren
Chen Ma
Xuecang Zhang
185
21
0
24 Nov 2023
An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records
Fabio Azzalini
Tommaso Dolci
Marco Vagaggini
OOD
133
1
0
16 Oct 2023
ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization
Philipp Scholl
Katharina Bieker
Hillary Hauger
Gitta Kutyniok
77
5
0
09 Oct 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM
LRM
160
743
0
19 Sep 2023
FLM-101B: An Open LLM and How to Train It with
100
K
B
u
d
g
e
t
100K Budget
100
K
B
u
d
g
e
t
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng Zhang
Aixin Sun
Yequan Wang
95
22
0
07 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
91
4
0
28 Aug 2023
Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Walter Hernandez Cruz
K. Tylinski
Alastair Moore
Niall Roche
Nikhil Vadgama
Horst Treiblmaier
J. Shangguan
Paolo Tasca
Jiahua Xu
70
2
0
23 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
117
18
0
23 Aug 2023
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Hiroki Naganuma
Ryuichiro Hataya
Kotaro Yoshida
Ioannis Mitliagkas
OODD
158
3
0
17 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
95
123
0
01 Jul 2023
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models
David X. Wu
A. Sahai
55
3
0
23 Jun 2023
RedMotion: Motion Prediction via Redundancy Reduction
Royden Wagner
Omer Sahin Tas
Marvin Klemp
Carlos Fernandez Lopez
Christoph Stiller
103
8
0
19 Jun 2023
Local Grammar-Based Coding Revisited
L. Debowski
40
0
0
27 Sep 2022
Co-domain Symmetry for Complex-Valued Deep Learning
Utkarsh Singhal
Yifei Xing
Stella X. Yu
80
12
0
02 Dec 2021
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
419
20,127
0
23 Oct 2019
A Constructive Prediction of the Generalization Error Across Scales
Jonathan S. Rosenfeld
Amir Rosenfeld
Yonatan Belinkov
Nir Shavit
91
211
0
27 Sep 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
356
6,449
0
26 Sep 2019
Beyond Human-Level Accuracy: Computational Challenges in Deep Learning
Joel Hestness
Newsha Ardalani
G. Diamos
44
67
0
03 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
615
24,431
0
26 Jul 2019
Growing a Brain: Fine-Tuning by Increasing Model Capacity
Yu-Xiong Wang
Deva Ramanan
M. Hebert
CLL
59
149
0
18 Jul 2019
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Guodong Zhang
Lala Li
Zachary Nado
James Martens
Sushant Sachdeva
George E. Dahl
Christopher J. Shallue
Roger C. Grosse
93
152
0
09 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
230
8,426
0
19 Jun 2019
One Epoch Is All You Need
Aran Komatsuzaki
51
51
0
16 Jun 2019
AutoGrow: Automatic Layer Growing in Deep Convolutional Networks
W. Wen
Feng Yan
Yiran Chen
H. Li
60
39
0
07 Jun 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
137
18,115
0
28 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
256
2,312
0
02 May 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
112
1,896
0
23 Apr 2019
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee
Lechao Xiao
S. Schoenholz
Yasaman Bahri
Roman Novak
Jascha Narain Sohl-Dickstein
Jeffrey Pennington
211
1,101
0
18 Feb 2019
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani
Shankar Krishnan
Ying Xiao
ODL
62
323
0
29 Jan 2019
Scaling description of generalization with number of parameters in deep learning
Mario Geiger
Arthur Jacot
S. Spigler
Franck Gabriel
Levent Sagun
Stéphane dÁscoli
Giulio Biroli
Clément Hongler
Matthieu Wyart
76
195
0
06 Jan 2019
An Empirical Model of Large-Batch Training
Sam McCandlish
Jared Kaplan
Dario Amodei
OpenAI Dota Team
65
277
0
14 Dec 2018
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
93
233
0
12 Dec 2018
The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size
Vardan Papyan
53
31
0
16 Nov 2018
Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue
Jaehoon Lee
J. Antognini
J. Mamou
J. Ketterling
Yao Wang
82
410
0
08 Nov 2018
Mesh-TensorFlow: Deep Learning for Supercomputers
Noam M. Shazeer
Youlong Cheng
Niki Parmar
Dustin Tran
Ashish Vaswani
...
HyoukJoong Lee
O. Milenkovic
C. Young
Ryan Sepassi
Blake Hechtman
GNN
MoE
AI4CE
84
389
0
05 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.7K
94,770
0
11 Oct 2018
Universal Transformers
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
83
752
0
10 Jul 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot
Franck Gabriel
Clément Hongler
267
3,195
0
20 Jun 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
76
1,047
0
11 Apr 2018
Generating Wikipedia by Summarizing Long Sequences
Peter J. Liu
Mohammad Saleh
Etienne Pot
Ben Goodrich
Ryan Sepassi
Lukasz Kaiser
Noam M. Shazeer
CVBM
185
798
0
30 Jan 2018
Previous
1
2
3
4
5
6
7
8
Next