Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

13 September 2023 · arXiv:2309.07311
Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra

Papers citing "Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs"

19 citing papers shown:

(How) Do Language Models Track State?
Belinda Z. Li, Zifan Carl Guo, Jacob Andreas · LRM · 0 citations · 04 Mar 2025

Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Yuko Nakagi, Keigo Tada, Sota Yoshino, Shinji Nishimoto, Yu Takagi · LRM · 0 citations · 28 Feb 2025

Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao, Tian Qin, David Alvarez-Melis, Sham Kakade, Naomi Saphra · LRM · 0 citations · 24 Feb 2025

Can Transformers Learn $n$-gram Language Models?
Anej Svete, Nadav Borenstein, M. Zhou, Isabelle Augenstein, Ryan Cotterell · 6 citations · 03 Oct 2024

Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng · CoGe · 0 citations · 02 Oct 2024

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao · 19 citations · 02 Jul 2024

How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo · KELM · 30 citations · 17 Jun 2024

Survival of the Fittest Representation: A Case Study with Modular Addition
Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud, Ziming Liu, Max Tegmark · 3 citations · 27 May 2024

From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin, Sam Whitman McGrath, Danielle J. Williams, Lotem Elber-Dorozko · AI4CE · 3 citations · 24 May 2024

Multiple Realizability and the Rise of Deep Learning
Sam Whitman McGrath, Jacob Russin · AI4CE · 2 citations · 21 May 2024

Natural Language Processing RELIES on Linguistics
Juri Opitz, Shira Wein, Nathan Schneider · AI4CE · 7 citations · 09 May 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller · 112 citations · 28 Mar 2024

Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu, Eric J. Michaud, Max Tegmark · 76 citations · 03 Oct 2022

Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation
Verna Dankers, Christopher G. Lucas, Ivan Titov · 36 citations · 30 May 2022

The paradox of the compositionality of natural language: a neural machine translation case study
Verna Dankers, Elia Bruni, Dieuwke Hupkes · CoGe · 75 citations · 12 Aug 2021

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · ODL · 234 citations · 04 Mar 2020

Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs
Matthew L. Leavitt, Ari S. Morcos · 33 citations · 03 Mar 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 4,469 citations · 23 Jan 2020

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 6,959 citations · 20 Apr 2018