The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
arXiv:2312.03002 · 3 December 2023
Papers citing "The mechanistic basis of data dependence and abrupt learning in an in-context classification task" (45 papers shown)
- Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning — Yuanzhao Zhang, William Gilpin [AI4TS] (16 May 2025)
- Understanding In-context Learning of Addition via Activation Subspaces — Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen (08 May 2025)
- ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers — Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi (28 Apr 2025)
- In-Context Learning can distort the relationship between sequence likelihoods and biological fitness — Pranav Kantroo, Günter P. Wagner, Benjamin B. Machta (23 Apr 2025)
- How do language models learn facts? Dynamics, curricula and hallucinations — Nicolas Zucchet, J. Bornschein, Stephanie C. Y. Chan, Andrew Kyle Lampinen, Razvan Pascanu, Soham De [KELM, HILM, LRM] (27 Mar 2025)
- Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding — Shunqi Mao, Chaoyi Zhang, Weidong Cai [MLLM] (13 Mar 2025)
- Strategy Coopetition Explains the Emergence and Transience of In-Context Learning — Aaditya K. Singh, Ted Moskovitz, Sara Dragutinovic, Felix Hill, Stephanie C. Y. Chan, Andrew Saxe (07 Mar 2025)
- Training Dynamics of In-Context Learning in Linear Attention — Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe [MLT] (28 Jan 2025)
- What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning — Jelena Bratulić, Sudhanshu Mittal, Christian Rupprecht, Thomas Brox (09 Jan 2025)
- A ghost mechanism: An analytical model of abrupt learning — Fatih Dinc, Ege Cirakman, Yiqi Jiang, Mert Yuksekgonul, Mark J. Schnitzer, Hidenori Tanaka (04 Jan 2025)
- Out-of-distribution generalization via composition: a lens through induction heads in Transformers — Jiajun Song, Zhuoyan Xu, Yiqiao Zhong (31 Dec 2024)
- Quantifying artificial intelligence through algebraic generalization — Takuya Ito, Murray Campbell, L. Horesh, Tim Klinger, Parikshit Ram [ELM] (08 Nov 2024)
- Toward Understanding In-context vs. In-weight Learning — Bryan Chan, Xinyi Chen, András Gyorgy, Dale Schuurmans (30 Oct 2024)
- Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs — Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei (17 Oct 2024)
- Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent — Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao-quan Song (15 Oct 2024)
- Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition — Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, ..., Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos [LRM] (08 Oct 2024)
- Transformers learn variable-order Markov chains in-context — Ruida Zhou, C. Tian, Suhas Diggavi (07 Oct 2024)
- Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning — Qingyu Yin, Xuzheng He, Luoao Deng, Chak Tou Leong, Fan Wang, Yanzhao Yan, Xiaoyu Shen, Qiang Zhang (07 Oct 2024)
- Task Diversity Shortens the ICL Plateau — Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu [MoMe] (07 Oct 2024)
- GAMformer: In-Context Learning for Generalized Additive Models — Andreas Mueller, Julien N. Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter [AI4CE] (06 Oct 2024)
- Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information — Yongheng Zhang, Qiguang Chen, Jingxuan Zhou, Peng Wang, Jiasheng Si, Jin Wang, Wenpeng Lu, Libo Qin [LRM] (06 Oct 2024)
- Attention Prompting on Image for Large Vision-Language Models — Runpeng Yu, Weihao Yu, Xinchao Wang [VLM] (25 Sep 2024)
- Attention Heads of Large Language Models: A Survey — Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li [LRM] (05 Sep 2024)
- Representing Rule-based Chatbots with Transformers — Dan Friedman, Abhishek Panigrahi, Danqi Chen (15 Jul 2024)
- A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models — Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao (02 Jul 2024)
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning — Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Ji-Rong Wen (20 Jun 2024)
- Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning — Hui Liu, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li (14 Jun 2024)
- Talking Heads: Understanding Inter-layer Communication in Transformer Language Models — Jack Merullo, Carsten Eickhoff, Ellie Pavlick (13 Jun 2024)
- Iteration Head: A Mechanistic Study of Chain-of-Thought — Vivien A. Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe [LRM] (04 Jun 2024)
- Why Larger Language Models Do In-context Learning Differently? — Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang (30 May 2024)
- IM-Context: In-Context Learning for Imbalanced Regression Tasks — Ismail Nejjar, Faez Ahmed, Olga Fink (28 May 2024)
- Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting — Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick [CLL] (28 May 2024)
- Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification — Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li [UQCV] (24 May 2024)
- Linking In-context Learning in Transformers to Human Episodic Memory — Ji-An Li, Corey Y. Zhou, M. Benna, Marcelo G. Mattar (23 May 2024)
- Asymptotic theory of in-context learning by linear attention — Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, C. Pehlevan (20 May 2024)
- Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models — Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan [KELM] (28 Mar 2024)
- The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains — Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis [BDL] (16 Feb 2024)
- A phase transition between positional and semantic learning in a solvable model of dot-product attention — Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová [MLT] (06 Feb 2024)
- Is Mamba Capable of In-Context Learning? — Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter (05 Feb 2024)
- On Catastrophic Inheritance of Large Foundation Models — Hao Chen, Bhiksha Raj, Xing Xie, Jindong Wang [AI4CE] (02 Feb 2024)
- Anchor function: a type of benchmark functions for studying language models — Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, E. Weinan, Z. Xu (16 Jan 2024)
- The Transient Nature of Emergent In-Context Learning in Transformers — Aaditya K. Singh, Stephanie C. Y. Chan, Ted Moskovitz, Erin Grant, Andrew M. Saxe, Felix Hill (14 Nov 2023)
- Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems — David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox (19 Oct 2023)
- Breaking through the learning plateaus of in-context learning in Transformer — Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng (12 Sep 2023)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small — Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt (01 Nov 2022)