In-context Learning and Induction Heads
arXiv:2209.11895 · 24 September 2022
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, T. Henighan, Benjamin Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah

Papers citing "In-context Learning and Induction Heads"

38 / 88 papers shown

Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova, James Zou · AAML · 26 Apr 2024

Generative Retrieval as Multi-Vector Dense Retrieval
Shiguang Wu, Wenda Wei, Mengqi Zhang, Zhumin Chen, Jun Ma, Zhaochun Ren, Maarten de Rijke, Pengjie Ren · 3DV · 31 Mar 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller · 28 Mar 2024

The pitfalls of next-token prediction
Gregor Bachmann, Vaishnavh Nagarajan · 11 Mar 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi · VLM · 28 Feb 2024

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge · 21 Feb 2024

Opening the AI black box: program synthesis via mechanistic interpretability
Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark · 07 Feb 2024

Carrying over algorithm in transformers
J. Kruthoff · 15 Jan 2024

Forbidden Facts: An Investigation of Competing Objectives in Llama-2
Tony T. Wang, Miles Wang, Kaivu Hariharan, Nir Shavit · 14 Dec 2023

FlexModel: A Framework for Interpretability of Distributed Large Language Models
Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson · AI4CE, ALM · 05 Dec 2023

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, K. Lin, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang · VGen · 29 Nov 2023

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka · CoGe · 21 Nov 2023

Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori, Thomas Serre, Ellie Pavlick · 07 Nov 2023

The Importance of Prompt Tuning for Automated Neuron Explanations
Justin Lee, Tuomas P. Oikarinen, Arjun Chatha, Keng-Chi Chang, Yilan Chen, Tsui-Wei Weng · LRM · 09 Oct 2023

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang, Neel Nanda · LLMSV · 27 Sep 2023

GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
Yonggan Fu, Yongan Zhang, Zhongzhi Yu, Sixu Li, Zhifan Ye, Chaojian Li, Cheng Wan, Ying Lin · 19 Sep 2023

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang · 27 Aug 2023

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman, Peter West, Luke Zettlemoyer · AI4CE · 31 Jul 2023

Schema-learning and rebinding as mechanisms of in-context learning and emergence
Siva K. Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lazaro-Gredilla, Dileep George · 16 Jun 2023

Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi, James L. McClelland, A. Goldberg, Robert D. Hawkins · 06 Jun 2023

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks
Jacob D. Abernethy, Alekh Agarwal, T. V. Marinov, Manfred K. Warmuth · 26 May 2023

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun · 23 May 2023

Look-back Decoding for Open-Ended Text Generation
Nan Xu, Chunting Zhou, Asli Celikyilmaz, Xuezhe Ma · 22 May 2023

Explaining How Transformers Use Context to Build Predictions
Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà · 21 May 2023

Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
Byung-Doh Oh, William Schuler · 17 May 2023

Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
Ziming Liu, Eric Gan, Max Tegmark · 04 May 2023

The Representational Status of Deep Learning Models
Eamon Duede · 21 Mar 2023

Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu, Elliot L. Epstein, Eric N. D. Nguyen, A. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré · 13 Feb 2023

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Bilal Chughtai, Lawrence Chan, Neel Nanda · 06 Feb 2023

Learning Functional Transduction
Mathieu Chalvidal, Thomas Serre, Rufin VanRullen · AI4CE · 01 Feb 2023

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré · 28 Dec 2022

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
Damai Dai, Yutao Sun, Li Dong, Y. Hao, Shuming Ma, Zhifang Sui, Furu Wei · LRM · 20 Dec 2022

Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi · VLM · 19 Dec 2022

The Debate Over Understanding in AI's Large Language Models
Melanie Mitchell, D. Krakauer · ELM · 14 Oct 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant · 01 Aug 2022

Emergent Abilities of Large Language Models
Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus · ELM, ReLM, LRM · 15 Jun 2022

Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang · 19 Oct 2021

Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao, Adam Fisch, Danqi Chen · 31 Dec 2020