Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.15936
Cited By
v1
v2 (latest)
A Theory for Emergence of Complex Skills in Language Models
29 July 2023
Sanjeev Arora
Anirudh Goyal
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A Theory for Emergence of Complex Skills in Language Models"
14 / 14 papers shown
Title
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet
Francesco dÁngelo
Andrew Kyle Lampinen
Stephanie C. Y. Chan
206
1
0
23 May 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Ziyi Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
189
5
0
01 Apr 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
188
1
0
28 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
209
7
0
06 Feb 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
150
10
0
31 Dec 2024
Can Models Learn Skill Composition from Examples?
Haoyu Zhao
Simran Kaur
Dingli Yu
Anirudh Goyal
Sanjeev Arora
CoGe
MoE
97
8
0
29 Sep 2024
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
290
2,518
0
15 Jun 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
533
6,301
0
05 Apr 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
208
1,987
0
29 Mar 2022
Explaining Neural Scaling Laws
Yasaman Bahri
Ethan Dyer
Jared Kaplan
Jaehoon Lee
Utkarsh Sharma
78
269
0
12 Feb 2021
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
Nikunj Saunshi
Sadhika Malladi
Sanjeev Arora
85
89
0
07 Oct 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
645
4,921
0
23 Jan 2020
Deep Learning Scaling is Predictable, Empirically
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou
112
744
0
01 Dec 2017
Mathematical Foundations for a Compositional Distributional Model of Meaning
B. Coecke
M. Sadrzadeh
S. Clark
CoGe
117
569
0
23 Mar 2010
1