Towards a theory of how the structure of language is acquired by deep neural networks

28 May 2024

Papers citing "Towards a theory of how the structure of language is acquired by deep neural networks"

Title | Authors | Tags | Counts | Date
Learning curves theory for hierarchically compositional data with power-law distributed features | Francesco Cagnetta, Hyunmo Kang, Matthieu Wyart | | 98 1 0 | 11 May 2025
A distributional simplicity bias in the learning dynamics of transformers | Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt | | 107 8 0 | 17 Feb 2025
Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens | Vittorio Erba, Emanuele Troiani, Luca Biggio, Antoine Maillard, Lenka Zdeborová | | 156 1 0 | 24 Oct 2024
Probing the Latent Hierarchical Structure of Data via Diffusion Models | Antonio Sclocchi, Alessandro Favero, Noam Itzhak Levi, Matthieu Wyart | DiffM | 82 5 0 | 17 Oct 2024
A Dynamical Model of Neural Scaling Laws | Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan | | 89 41 0 | 02 Feb 2024
Large Language Models | Michael R Douglas | LLMAG, LM&MA | 135 625 0 | 11 Jul 2023
Autocorrelations Decay in Texts and Applicability Limits of Language Models | N. Mikhaylovskiy, I. Churilov | | 21 6 0 | 11 May 2023
Emergent Abilities of Large Language Models | Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus | ELM, ReLM, LRM | 277 2,474 0 | 15 Jun 2022
Explaining Neural Scaling Laws | Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma | | 62 261 0 | 12 Feb 2021
Feature Learning in Infinite-Width Neural Networks | Greg Yang, J. E. Hu | MLT | 75 153 0 | 30 Nov 2020
Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | | 602 4,801 0 | 23 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library | Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala | ODL | 493 42,407 0 | 03 Dec 2019
BERT Rediscovers the Classical NLP Pipeline | Ian Tenney, Dipanjan Das, Ellie Pavlick | MILM, SSeg | 135 1,471 0 | 15 May 2019
Emergence of order in random languages | Eric De Giuli | LRM | 20 10 0 | 20 Feb 2019
Dissecting Contextual Word Embeddings: Architecture and Representation | Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih | | 96 429 0 | 27 Aug 2018
Pointer Sentinel Mixture Models | Stephen Merity, Caiming Xiong, James Bradbury, R. Socher | RALM | 308 2,859 0 | 26 Sep 2016