ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.11895
  4. Cited By
In-context Learning and Induction Heads

In-context Learning and Induction Heads

24 September 2022
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
T. Henighan
Benjamin Mann
Amanda Askell
Yuntao Bai
Anna Chen
Tom Conerly
Dawn Drain
Deep Ganguli
Zac Hatfield-Dodds
Danny Hernandez
Scott R. Johnston
Andy Jones
John Kernion
Liane Lovitt
Kamal Ndousse
Dario Amodei
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
ArXiv (abs)PDFHTML

Papers citing "In-context Learning and Induction Heads"

50 / 434 papers shown
Title
Schema-learning and rebinding as mechanisms of in-context learning and
  emergence
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Siva K. Swaminathan
Antoine Dedieu
Rajkumar Vasudeva Raju
Murray Shanahan
Miguel Lazaro-Gredilla
Dileep George
97
14
0
16 Jun 2023
Inverse Scaling: When Bigger Isn't Better
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
96
140
0
15 Jun 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
79
4
0
14 Jun 2023
FLamE: Few-shot Learning from Natural Language Explanations
FLamE: Few-shot Learning from Natural Language Explanations
Yangqiaoyu Zhou
Yiming Zhang
Chenhao Tan
LRMFAtt
95
11
0
13 Jun 2023
Inductive reasoning in humans and large language models
Inductive reasoning in humans and large language models
Simon J. Han
Keith Ransom
Andrew Perfors
Charles Kemp
ELMReLMLRM
79
39
0
11 Jun 2023
Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT
  and GPT-4 for Mining Insights at Scale
Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Mining Insights at Scale
Jonas Oppenlaender
Joonas Hamalainen
86
6
0
08 Jun 2023
In-Context Learning through the Bayesian Prism
In-Context Learning through the Bayesian Prism
Madhuri Panwar
Kabir Ahuja
Navin Goyal
BDL
89
48
0
08 Jun 2023
Causal interventions expose implicit situation models for commonsense
  language understanding
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
104
6
0
06 Jun 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight
  Compression
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Tim Dettmers
Ruslan Svirschevski
Vage Egiazarian
Denis Kuznedelev
Elias Frantar
Saleh Ashkboos
Alexander Borzunov
Torsten Hoefler
Dan Alistarh
MQ
78
257
0
05 Jun 2023
Learning Transformer Programs
Learning Transformer Programs
Dan Friedman
Alexander Wettig
Danqi Chen
89
36
0
01 Jun 2023
Birth of a Transformer: A Memory Viewpoint
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
112
96
0
01 Jun 2023
Transformers learn to implement preconditioned gradient descent for
  in-context learning
Transformers learn to implement preconditioned gradient descent for in-context learning
Kwangjun Ahn
Xiang Cheng
Hadi Daneshmand
S. Sra
ODL
95
176
0
01 Jun 2023
Neuron to Graph: Interpreting Language Model Neurons at Scale
Neuron to Graph: Interpreting Language Model Neurons at Scale
Alex Foote
Neel Nanda
Esben Kran
Ioannis Konstas
Shay B. Cohen
Fazl Barez
MILM
78
26
0
31 May 2023
Feature-Learning Networks Are Consistent Across Widths At Realistic
  Scales
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales
Nikhil Vyas
Alexander B. Atanasov
Blake Bordelon
Depen Morwani
Sabarish Sainathan
Cengiz Pehlevan
131
26
0
28 May 2023
Neural Sculpting: Uncovering hierarchically modular task structure in
  neural networks through pruning and network analysis
Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis
S. M. Patil
Loizos Michael
C. Dovrolis
70
0
0
28 May 2023
A Mechanism for Sample-Efficient In-Context Learning for Sparse
  Retrieval Tasks
A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks
Jacob D. Abernethy
Alekh Agarwal
T. V. Marinov
Manfred K. Warmuth
85
21
0
26 May 2023
Backpack Language Models
Backpack Language Models
John Hewitt
John Thickstun
Christopher D. Manning
Percy Liang
KELM
101
16
0
26 May 2023
A Closer Look at In-Context Learning under Distribution Shifts
A Closer Look at In-Context Learning under Distribution Shifts
Kartik Ahuja
David Lopez-Paz
81
18
0
26 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in
  1-layer Transformer
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
109
79
0
25 May 2023
Language Models Implement Simple Word2Vec-style Vector Arithmetic
Language Models Implement Simple Word2Vec-style Vector Arithmetic
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
KELM
95
66
0
25 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical
  Perspective
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng
Bohang Zhang
Yuntian Gu
Haotian Ye
Di He
Liwei Wang
LRM
141
261
0
24 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models
  using Causal Mediation Analysis
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILMKELMLRM
101
54
0
24 May 2023
Label Words are Anchors: An Information Flow Perspective for
  Understanding In-Context Learning
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Lean Wang
Lei Li
Damai Dai
Deli Chen
Hao Zhou
Fandong Meng
Jie Zhou
Xu Sun
177
193
0
23 May 2023
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Zeyuan Allen-Zhu
Yuanzhi Li
112
21
0
23 May 2023
Look-back Decoding for Open-Ended Text Generation
Look-back Decoding for Open-Ended Text Generation
Nan Xu
Chunting Zhou
Asli Celikyilmaz
Xuezhe Ma
63
9
0
22 May 2023
Explaining Emergent In-Context Learning as Kernel Regression
Explaining Emergent In-Context Learning as Kernel Regression
Chi Han
Ziqi Wang
Haiying Zhao
Heng Ji
LRM
94
15
0
22 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large
  Language Models
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
136
6
0
21 May 2023
Explaining How Transformers Use Context to Build Predictions
Explaining How Transformers Use Context to Build Predictions
Javier Ferrando
Gerard I. Gállego
Ioannis Tsiamas
Marta R. Costa-jussá
65
37
0
21 May 2023
Token-wise Decomposition of Autoregressive Language Model Hidden States
  for Analyzing Model Predictions
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
Byung-Doh Oh
William Schuler
65
2
0
17 May 2023
What In-Context Learning "Learns" In-Context: Disentangling Task
  Recognition and Task Learning
What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
Jane Pan
Tianyu Gao
Howard Chen
Danqi Chen
84
126
0
16 May 2023
Pre-Training to Learn in Context
Pre-Training to Learn in Context
Yuxian Gu
Li Dong
Furu Wei
Minlie Huang
CLIPLRMReLM
167
38
0
16 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Zhengxuan Wu
Atticus Geiger
Thomas Icard
Christopher Potts
Noah D. Goodman
MILM
85
93
0
15 May 2023
StarCoder: may the source be with you!
StarCoder: may the source be with you!
Raymond Li
Loubna Ben Allal
Yangtian Zi
Niklas Muennighoff
Denis Kocetkov
...
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
H. D. Vries
146
798
0
09 May 2023
A technical note on bilinear layers for interpretability
A technical note on bilinear layers for interpretability
Lee D. Sharkey
FAtt
26
6
0
05 May 2023
AttentionViz: A Global View of Transformer Attention
AttentionViz: A Global View of Transformer Attention
Catherine Yeh
Yida Chen
Aoyu Wu
Cynthia Chen
Fernanda Viégas
Martin Wattenberg
ViT
79
55
0
04 May 2023
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic
  Interpretability
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
Ziming Liu
Eric Gan
Max Tegmark
82
40
0
04 May 2023
The System Model and the User Model: Exploring AI Dashboard Design
The System Model and the User Model: Exploring AI Dashboard Design
Fernanda Viégas
Martin Wattenberg
56
6
0
04 May 2023
Few-shot In-context Learning for Knowledge Base Question Answering
Few-shot In-context Learning for Knowledge Base Question Answering
Tianle Li
Xueguang Ma
Alex Zhuang
Yu Gu
Yu-Chuan Su
Wenhu Chen
182
84
0
02 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
282
218
0
02 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
68
319
0
28 Apr 2023
The Quantization Model of Neural Scaling
The Quantization Model of Neural Scaling
Eric J. Michaud
Ziming Liu
Uzay Girit
Max Tegmark
MILM
125
89
0
23 Mar 2023
The Representational Status of Deep Learning Models
The Representational Status of Deep Learning Models
Eamon Duede
103
0
0
21 Mar 2023
Language Model Behavior: A Comprehensive Survey
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLMLRMLM&MA
111
109
0
20 Mar 2023
Towards the Scalable Evaluation of Cooperativeness in Language Models
Towards the Scalable Evaluation of Cooperativeness in Language Models
Alan Chan
Maxime Riché
Jesse Clifton
LLMAG
81
7
0
16 Mar 2023
Do Transformers Parse while Predicting the Masked Word?
Do Transformers Parse while Predicting the Masked Word?
Haoyu Zhao
A. Panigrahi
Rong Ge
Sanjeev Arora
146
35
0
14 Mar 2023
The Learnability of In-Context Learning
The Learnability of In-Context Learning
Noam Wies
Yoav Levine
Amnon Shashua
186
109
0
14 Mar 2023
Finding Alignments Between Interpretable Causal Variables and
  Distributed Neural Representations
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas Icard
Noah D. Goodman
CML
182
112
0
05 Mar 2023
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural
  Networks
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu
Qihang Zhao
Guoqi Li
Jason K. Eshraghian
BDLVLM
111
89
0
27 Feb 2023
Analyzing And Editing Inner Mechanisms Of Backdoored Language Models
Analyzing And Editing Inner Mechanisms Of Backdoored Language Models
Max Lamparth
Anka Reuel
KELM
76
11
0
24 Feb 2023
One Fits All:Power General Time Series Analysis by Pretrained LM
One Fits All:Power General Time Series Analysis by Pretrained LM
Tian Zhou
Peisong Niu
Xue Wang
Liang Sun
Rong Jin
AI4TS
156
440
0
23 Feb 2023
Previous
123456789
Next