Uncovering mesa-optimization algorithms in Transformers

11 September 2023
J. Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

Papers citing "Uncovering mesa-optimization algorithms in Transformers"

47 papers shown

Rethinking Invariance in In-context Learning
Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yitong Wang
08 May 2025

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni
17 Apr 2025

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak
06 Apr 2025

Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Duc Nguyen, Toan M. Tran, David Hall, Cheongwoong Kang, Jaesik Choi
03 Mar 2025

In-context learning of evolving data streams with tabular foundational models
Afonso Lourenço, João Gama, Eric P. Xing, Goreti Marreiros
24 Feb 2025

ICLR: In-Context Learning of Representations
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Maya Okawa, Kento Nishi, Martin Wattenberg, Hidenori Tanaka
29 Dec 2024 · AIFin

On the Role of Depth and Looping for In-Context Learning with Task Diversity
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
29 Oct 2024

Multi-agent cooperation through learning-aware policy gradients
Alexander Meulemans, Seijin Kobayashi, J. Oswald, Nino Scherrer, Eric Elmoznino, Blake A. Richards, Guillaume Lajoie, Blaise Agüera y Arcas, João Sacramento
24 Oct 2024

In-context learning and Occam's razor
Eric Elmoznino, Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie
17 Oct 2024

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang
17 Oct 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
10 Oct 2024

Task Diversity Shortens the ICL Plateau
Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu
07 Oct 2024 · MoMe

Towards Understanding the Universality of Transformers for Next-Token Prediction
Michael E. Sander, Gabriel Peyré
03 Oct 2024 · CML

Learning Randomized Algorithms with Transformers
J. Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger
20 Aug 2024 · AAML

Transformers are Universal In-context Learners
Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré
02 Aug 2024

Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qian Liu
19 Jul 2024

When can transformers compositionally generalize in-context?
Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, J. Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento
17 Jul 2024 · ViT

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
15 Jul 2024

Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li, A. S. Rawat, Samet Oymak
13 Jul 2024

Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks
Dan S. Nielsen, Kenneth Enevoldsen, Peter Schneider-Kamp
19 Jun 2024 · ELM

Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
09 Jun 2024 · GNN

Universal In-Context Approximation By Prompting Fully Recurrent Models
Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip Torr, Adel Bibi
03 Jun 2024 · LRM

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li
27 May 2024

From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin, Sam Whitman McGrath, Danielle J. Williams, Lotem Elber-Dorozko
24 May 2024 · AI4CE

Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska, E. Gavves
22 Apr 2024 · AI4CE

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
11 Apr 2024

How Well Can Transformers Emulate In-context Newton's Method?
Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee
05 Mar 2024

Investigating the Effectiveness of HyperTuning via Gisting
Jason Phang
26 Feb 2024

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge
21 Feb 2024

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
06 Feb 2024

Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing
30 Jan 2024

In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas
23 Jan 2024 · LRM, ReLM

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, S. Sra
11 Dec 2023

Structured World Representations in Maze-Solving Transformers
Michael Ivanitskiy, Alex F Spies, Tilman Rauker, Guillaume Corlouer, Chris Mathwin, ..., Rusheb Shah, Dan Valentine, Cecilia G. Diniz Behn, Katsumi Inoue, Samy Wu Fung
05 Dec 2023

In-context Learning and Gradient Descent Revisited
Gilad Deutch, Nadav Magar, Tomer Bar Natan, Guy Dar
13 Nov 2023

In-Context Learning for MIMO Equalization Using Transformer-Based Sequence Models
Matteo Zecchin, Kai Yu, Osvaldo Simeone
10 Nov 2023

The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis
Yuxiang Zhou, Jiazheng Li, Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He
01 Nov 2023

A Meta-Learning Perspective on Transformers for Causal Language Modeling
Xinbo Wu, L. Varshney
09 Oct 2023

Trainable Transformer in Transformer
A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora
03 Jul 2023 · VLM

Meta-Learned Models of Cognition
Marcel Binz, Ishita Dasgupta, A. Jagadish, M. Botvinick, Jane X. Wang, Eric Schulz
12 Apr 2023

Transformers generalize differently from information stored in context vs in weights
Stephanie C. Y. Chan, Ishita Dasgupta, Junkyung Kim, D. Kumaran, Andrew Kyle Lampinen, Felix Hill
11 Oct 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022 · LM&Ro, LRM, AI4CE, ReLM

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant
18 Apr 2021 · VPVLM

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
31 Dec 2020 · AIMat

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine
09 Mar 2017 · OOD