arXiv:2405.16845
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
27 May 2024
Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li
Papers citing "On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability" (8 papers)
How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
08 Feb 2024
Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing
30 Jan 2024
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
12 Oct 2023
The Learnability of In-Context Learning
Noam Wies, Yoav Levine, Amnon Shashua
14 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li, Yuan-Fang Li, Andrej Risteski
07 Mar 2023
Autoregressive Image Generation using Residual Quantization (VGen)
Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han
03 Mar 2022
Zero-Shot Text-to-Image Generation (VLM)
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
24 Feb 2021
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi, J. Nutini, Mark W. Schmidt
16 Aug 2016