Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.04245
Cited By
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
7 March 2023
Yuchen Li
Yuan-Fang Li
Andrej Risteski
Re-assign community
ArXiv
PDF
HTML
Papers citing
"How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding"
13 / 13 papers shown
Title
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
M. Wang
Sijia Liu
Pin-Yu Chen
MoMe
66
2
0
15 Apr 2025
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu
Y. Wang
Hao Wang
108
1
0
23 Dec 2024
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
86
19
0
15 Oct 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen
Huaqing Zhang
Hongzhou Lin
Jingzhao Zhang
MoE
LRM
61
2
0
07 Oct 2024
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Xinhao Yao
Hongjin Qian
Xiaolin Hu
Gengze Xu
Wei Liu
Jian Luan
B. Wang
Y. Liu
48
0
0
03 Oct 2024
Attention layers provably solve single-location regression
P. Marion
Raphael Berthier
Gérard Biau
Claire Boyer
117
2
0
02 Oct 2024
Attention Meets Post-hoc Interpretability: A Mathematical Perspective
Gianluigi Lopardo
F. Precioso
Damien Garreau
16
4
0
05 Feb 2024
An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon
Jason D. Lee
Qi Lei
Benjamin Van Roy
22
18
0
28 Jan 2024
Learning to forecast diagnostic parameters using pre-trained weather embedding
Peetak Mitra
Vivek Ramavajjala
32
1
0
01 Dec 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
28
97
0
27 Sep 2023
Learning threshold neurons via the "edge of stability"
Kwangjun Ahn
Sébastien Bubeck
Sinho Chewi
Y. Lee
Felipe Suarez
Yi Zhang
MLT
31
36
0
14 Dec 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
226
404
0
24 Feb 2021
Topic Modeling with Contextualized Word Representation Clusters
Laure Thompson
David M. Mimno
102
83
0
23 Oct 2020
1