Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective

28 February 2025
Yuko Nakagi
Keigo Tada
Sota Yoshino
Shinji Nishimoto
Yu Takagi
    LRM
arXiv: 2502.20779

Papers citing "Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective"

31 / 31 papers shown
Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models
Emily Cheng
Richard Antonello
87
4
0
09 Sep 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
73
12
0
27 Jun 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
64
134
0
22 Apr 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV
LRM
99
51
0
23 Mar 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
155
377
0
01 Feb 2024
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
Willie Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric Xing
83
75
0
11 Dec 2023
Instruction-tuning Aligns LLMs to the Human Brain
Khai Loong Aw
Syrielle Montariol
Badr AlKhamissi
Martin Schrimpf
Antoine Bosselut
105
20
0
01 Dec 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen
Ravid Schwartz-Ziv
Kyunghyun Cho
Matthew L. Leavitt
Naomi Saphra
41
66
0
13 Sep 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
57
626
0
22 May 2023
Scaling laws for language encoding models in fMRI
Richard Antonello
Aditya R. Vaidya
Alexander G. Huth
MedIm
46
63
0
19 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
178
203
0
02 May 2023
Training language models to summarize narratives improves brain alignment
Khai Loong Aw
Mariya Toneva
60
27
0
21 Dec 2022
Joint processing of linguistic properties in brains and language models
Subba Reddy Oota
Manish Gupta
Mariya Toneva
35
29
0
15 Dec 2022
Finding Skill Neurons in Pre-trained Transformer-based Language Models
Xiaozhi Wang
Kaiyue Wen
Zhengyan Zhang
Lei Hou
Zhiyuan Liu
Juanzi Li
MILM
MoE
42
51
0
14 Nov 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
55
75
0
26 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
298
494
0
24 Sep 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
170
2,428
0
15 Jun 2022
Toward a realistic model of speech processing in the brain with self-supervised learning
Juliette Millet
Charlotte Caucheteux
Pierre Orhan
Yves Boubenec
Alexandre Gramfort
Ewan Dunbar
Christophe Pallier
J. King
45
95
0
03 Jun 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
123
1,915
0
29 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
582
9,009
0
28 Jan 2022
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
139
2,307
0
20 Apr 2021
Knowledge Neurons in Pretrained Transformers
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
KELM
MU
60
440
0
18 Apr 2021
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
111
792
0
29 Dec 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
137
4,222
0
07 Sep 2020
GLU Variants Improve Transformer
Noam M. Shazeer
107
968
0
12 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
451
4,662
0
23 Jan 2020
What do you learn from context? Probing for sentence structure in contextualized word representations
Ian Tenney
Patrick Xia
Berlin Chen
Alex Jinpeng Wang
Adam Poliak
...
Najoung Kim
Benjamin Van Durme
Samuel R. Bowman
Dipanjan Das
Ellie Pavlick
159
853
0
15 May 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor
Jonathan Herzig
Nicholas Lourie
Jonathan Berant
RALM
115
1,677
0
02 Nov 2018
Estimating the intrinsic dimension of datasets by a minimal neighborhood information
Elena Facco
M. d’Errico
Alex Rodriguez
Alessandro Laio
36
320
0
19 Mar 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
74
2,474
0
14 Mar 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
453
129,831
0
12 Jun 2017