2405.01964
Cited By
Position: Understanding LLMs Requires More Than Statistical Generalization
3 May 2024
Patrik Reizinger
Szilvia Ujváry
Anna Mészáros
A. Kerekes
Wieland Brendel
Ferenc Huszár
Papers citing
"Position: Understanding LLMs Requires More Than Statistical Generalization"
37 / 37 papers shown
Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Patrik Reizinger
Randall Balestriero
David Klindt
Wieland Brendel
132
0
0
17 Apr 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
128
10
0
31 Dec 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
Emanuele Marconato
Sébastien Lachapelle
Sebastian Weichwald
Luigi Gresele
88
4
0
30 Oct 2024
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk
Ishita Mediratta
Christoforos Nalmpantis
Jelena Luketina
Eric Hambro
Edward Grefenstette
Roberta Raileanu
AI4CE
ALM
150
145
0
10 Oct 2023
Linearity of Relation Decoding in Transformer Language Models
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
56
98
0
17 Aug 2023
Predicting Ordinary Differential Equations with Transformers
Soren Becker
M. Klein
Alexander Neitz
Giambattista Parascandolo
Niki Kilbertus
65
15
0
24 Jul 2023
Transformers learn in-context by gradient descent
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
MLT
91
487
0
15 Dec 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu
Sang Michael Xie
Zhiyuan Li
Tengyu Ma
AI4CE
100
52
0
25 Oct 2022
A General framework for PAC-Bayes Bounds for Meta-Learning
A. Rezazadeh
AI4CE
74
4
0
11 Jun 2022
Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data
Siyuan Guo
V. Tóth
Bernhard Schölkopf
Ferenc Huszár
CML
35
37
0
29 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
738
9,267
0
28 Jan 2022
An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie
Aditi Raghunathan
Percy Liang
Tengyu Ma
ReLM
BDL
VPVLM
LRM
175
746
0
03 Nov 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay
Mostafa Dehghani
J. Rao
W. Fedus
Samira Abnar
Hyung Won Chung
Sharan Narang
Dani Yogatama
Ashish Vaswani
Donald Metzler
224
113
0
22 Sep 2021
Independent mechanism analysis, a new concept?
Luigi Gresele
Julius von Kügelgen
Vincent Stimper
Bernhard Schölkopf
M. Besserve
CML
45
102
0
09 Jun 2021
Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels
Hao Wang
Rui Gao
Flavio du Pin Calmon
60
17
0
05 Feb 2021
Toward Better Generalization Bounds with Locally Elastic Stability
Zhun Deng
Hangfeng He
Weijie J. Su
35
44
0
27 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
184
1,344
0
03 Oct 2020
Measuring Systematic Generalization in Neural Proof Generation with Transformers
Nicolas Angelard-Gontier
Koustuv Sinha
Siva Reddy
C. Pal
LRM
81
64
0
30 Sep 2020
Systematic Generalization on gSCAN with Language Conditioned Embedding
Tong Gao
Qi Huang
Raymond J. Mooney
45
22
0
11 Sep 2020
Object-Centric Learning with Slot Attention
Francesco Locatello
Dirk Weissenborn
Thomas Unterthiner
Aravindh Mahendran
G. Heigold
Jakob Uszkoreit
Alexey Dosovitskiy
Thomas Kipf
OCL
212
844
0
26 Jun 2020
A Survey of Neural Networks and Formal Languages
Joshua Ackerman
G. Cybenko
AI4CE
52
18
0
02 Jun 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
119
988
0
12 Feb 2020
Fantastic Generalization Measures and Where to Find Them
Yiding Jiang
Behnam Neyshabur
H. Mobahi
Dilip Krishnan
Samy Bengio
AI4CE
117
606
0
04 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
379
42,299
0
03 Dec 2019
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
42
87
0
10 Nov 2019
Learning Neural Causal Models from Unknown Interventions
Nan Rosemary Ke
O. Bilaniuk
Anirudh Goyal
Stefan Bauer
Hugo Larochelle
Bernhard Schölkopf
Michael C. Mozer
C. Pal
Yoshua Bengio
CML
OOD
94
168
0
02 Oct 2019
Gradient-Based Neural DAG Learning
Sébastien Lachapelle
P. Brouillard
T. Deleu
Simon Lacoste-Julien
BDL
CML
50
273
0
05 Jun 2019
Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora
Nadav Cohen
Wei Hu
Yuping Luo
AI4CE
74
503
0
31 May 2019
On the Spectral Bias of Neural Networks
Nasim Rahaman
A. Baratin
Devansh Arpit
Felix Dräxler
Min Lin
Fred Hamprecht
Yoshua Bengio
Aaron Courville
126
1,432
0
22 Jun 2018
Implicit Bias of Gradient Descent on Linear Convolutional Networks
Suriya Gunasekar
Jason D. Lee
Daniel Soudry
Nathan Srebro
MDE
99
410
0
01 Jun 2018
Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle Pérez
Chico Q. Camargo
A. Louis
MLT
AI4CE
75
231
0
22 May 2018
A Closer Look at Memorization in Deep Networks
Devansh Arpit
Stanislaw Jastrzebski
Nicolas Ballas
David M. Krueger
Emmanuel Bengio
...
Tegan Maharaj
Asja Fischer
Aaron Courville
Yoshua Bengio
Simon Lacoste-Julien
TDI
120
1,814
0
16 Jun 2017
Information-theoretic analysis of generalization capability of learning algorithms
Aolin Xu
Maxim Raginsky
149
445
0
22 May 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
Gintare Karolina Dziugaite
Daniel M. Roy
106
812
0
31 Mar 2017
Understanding deep learning requires rethinking generalization
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
HAI
320
4,624
0
10 Nov 2016
A New PAC-Bayesian Perspective on Domain Adaptation
Pascal Germain
Amaury Habrard
François Laviolette
Emilie Morvant
58
64
0
15 Jun 2015
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Andrew M. Saxe
James L. McClelland
Surya Ganguli
ODL
162
1,844
0
20 Dec 2013