Position: Understanding LLMs Requires More Than Statistical Generalization

3 May 2024

Wieland Brendel

Papers citing "Position: Understanding LLMs Requires More Than Statistical Generalization"


Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research Patrik Reizinger Randall Balestriero David Klindt Wieland Brendel 132 0 0 17 Apr 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers Jiajun Song Zhuoyan Xu Yiqiao Zhong 128 10 0 31 Dec 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling Emanuele Marconato Sébastien Lachapelle Sebastian Weichwald Luigi Gresele 88 4 0 30 Oct 2024
Understanding the Effects of RLHF on LLM Generalisation and Diversity Robert Kirk Ishita Mediratta Christoforos Nalmpantis Jelena Luketina Eric Hambro Edward Grefenstette Roberta Raileanu AI4CE ALM 150 145 0 10 Oct 2023
Linearity of Relation Decoding in Transformer Language Models Evan Hernandez Arnab Sen Sharma Tal Haklay Kevin Meng Martin Wattenberg Jacob Andreas Yonatan Belinkov David Bau KELM 56 98 0 17 Aug 2023
Predicting Ordinary Differential Equations with Transformers Soren Becker M. Klein Alexander Neitz Giambattista Parascandolo Niki Kilbertus 65 15 0 24 Jul 2023
Transformers learn in-context by gradient descent J. Oswald Eyvind Niklasson E. Randazzo João Sacramento A. Mordvintsev A. Zhmoginov Max Vladymyrov MLT 91 487 0 15 Dec 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models Hong Liu Sang Michael Xie Zhiyuan Li Tengyu Ma AI4CE 100 52 0 25 Oct 2022
A General framework for PAC-Bayes Bounds for Meta-Learning A. Rezazadeh AI4CE 74 4 0 11 Jun 2022
Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data Siyuan Guo V. Tóth Bernhard Schölkopf Ferenc Huszár CML 35 37 0 29 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 738 9,267 0 28 Jan 2022
An Explanation of In-context Learning as Implicit Bayesian Inference Sang Michael Xie Aditi Raghunathan Percy Liang Tengyu Ma ReLM BDL VPVLM LRM 175 746 0 03 Nov 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers Yi Tay Mostafa Dehghani J. Rao W. Fedus Samira Abnar Hyung Won Chung Sharan Narang Dani Yogatama Ashish Vaswani Donald Metzler 224 113 0 22 Sep 2021
Independent mechanism analysis, a new concept? Luigi Gresele Julius von Kügelgen Vincent Stimper Bernhard Schölkopf M. Besserve CML 45 102 0 09 Jun 2021
Generalization Bounds for Noisy Iterative Algorithms Using Properties of Additive Noise Channels Hao Wang Rui Gao Flavio du Pin Calmon 60 17 0 05 Feb 2021
Toward Better Generalization Bounds with Locally Elastic Stability Zhun Deng Hangfeng He Weijie J. Su 35 44 0 27 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization Pierre Foret Ariel Kleiner H. Mobahi Behnam Neyshabur AAML 184 1,344 0 03 Oct 2020
Measuring Systematic Generalization in Neural Proof Generation with Transformers Nicolas Angelard-Gontier Koustuv Sinha Siva Reddy C. Pal LRM 81 64 0 30 Sep 2020
Systematic Generalization on gSCAN with Language Conditioned Embedding Tong Gao Qi Huang Raymond J. Mooney 45 22 0 11 Sep 2020
Object-Centric Learning with Slot Attention Francesco Locatello Dirk Weissenborn Thomas Unterthiner Aravindh Mahendran G. Heigold Jakob Uszkoreit Alexey Dosovitskiy Thomas Kipf OCL 212 844 0 26 Jun 2020
A Survey of Neural Networks and Formal Languages Joshua Ackerman G. Cybenko AI4CE 52 18 0 02 Jun 2020
On Layer Normalization in the Transformer Architecture Ruibin Xiong Yunchang Yang Di He Kai Zheng Shuxin Zheng Chen Xing Huishuai Zhang Yanyan Lan Liwei Wang Tie-Yan Liu AI4CE 119 988 0 12 Feb 2020
Fantastic Generalization Measures and Where to Find Them Yiding Jiang Behnam Neyshabur H. Mobahi Dilip Krishnan Samy Bengio AI4CE 117 606 0 04 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury ... Sasank Chilamkurthy Benoit Steiner Lu Fang Junjie Bai Soumith Chintala ODL 379 42,299 0 03 Dec 2019
Improving Transformer Models by Reordering their Sublayers Ofir Press Noah A. Smith Omer Levy 42 87 0 10 Nov 2019
Learning Neural Causal Models from Unknown Interventions Nan Rosemary Ke O. Bilaniuk Anirudh Goyal Stefan Bauer Hugo Larochelle Bernhard Schölkopf Michael C. Mozer C. Pal Yoshua Bengio CML OOD 94 168 0 02 Oct 2019
Gradient-Based Neural DAG Learning Sébastien Lachapelle P. Brouillard T. Deleu Simon Lacoste-Julien BDL CML 50 273 0 05 Jun 2019
Implicit Regularization in Deep Matrix Factorization Sanjeev Arora Nadav Cohen Wei Hu Yuping Luo AI4CE 74 503 0 31 May 2019
On the Spectral Bias of Neural Networks Nasim Rahaman A. Baratin Devansh Arpit Felix Dräxler Min Lin Fred Hamprecht Yoshua Bengio Aaron Courville 126 1,432 0 22 Jun 2018
Implicit Bias of Gradient Descent on Linear Convolutional Networks Suriya Gunasekar Jason D. Lee Daniel Soudry Nathan Srebro MDE 99 410 0 01 Jun 2018
Deep learning generalizes because the parameter-function map is biased towards simple functions Guillermo Valle Pérez Chico Q. Camargo A. Louis MLT AI4CE 75 231 0 22 May 2018
A Closer Look at Memorization in Deep Networks Devansh Arpit Stanislaw Jastrzebski Nicolas Ballas David M. Krueger Emmanuel Bengio ... Tegan Maharaj Asja Fischer Aaron Courville Yoshua Bengio Simon Lacoste-Julien TDI 120 1,814 0 16 Jun 2017
Information-theoretic analysis of generalization capability of learning algorithms Aolin Xu Maxim Raginsky 149 445 0 22 May 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data Gintare Karolina Dziugaite Daniel M. Roy 106 812 0 31 Mar 2017
Understanding deep learning requires rethinking generalization Chiyuan Zhang Samy Bengio Moritz Hardt Benjamin Recht Oriol Vinyals HAI 320 4,624 0 10 Nov 2016
A New PAC-Bayesian Perspective on Domain Adaptation Pascal Germain Amaury Habrard François Laviolette Emilie Morvant 58 64 0 15 Jun 2015
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Andrew M. Saxe James L. McClelland Surya Ganguli ODL 162 1,844 0 20 Dec 2013