Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2504.10957
Cited By
v1
v2
v3 (latest)
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
15 April 2025
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers"
50 / 68 papers shown
Title
Cross-Model Transfer of Task Vectors via Few-Shot Orthogonal Alignment
Kazuhiko Kawamoto
Atsuhiro Endo
Hiroshi Kera
66
0
0
17 May 2025
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
Yifei He
Siqi Zeng
Yuzheng Hu
Rui Yang
Tong Zhang
Han Zhao
MoMe
ALM
108
0
0
16 May 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
157
0
0
27 Apr 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Yifei He
Yuzheng Hu
Yong Lin
Tong Zhang
Han Zhao
FedML
MoMe
127
25
0
08 Jan 2025
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li
Songtao Lu
Pin-Yu Chen
Xiaodong Cui
Meng Wang
LRM
75
6
0
03 Oct 2024
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu
Diego Klabjan
MU
136
5
0
15 Sep 2024
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi
Jaechan Lee
Yangsibo Huang
Sadhika Malladi
Jieyu Zhao
Ari Holtzman
Daogao Liu
Luke Zettlemoyer
Noah A. Smith
Chiyuan Zhang
MU
ELM
98
84
0
08 Jul 2024
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Hongkang Li
Meng Wang
Shuai Zhang
Sijia Liu
Pin-Yu Chen
97
7
0
24 Jun 2024
Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples
Dake Bu
Wei Huang
Taiji Suzuki
Ji Cheng
Qingfu Zhang
Zhiqiang Xu
Hau-San Wong
MLT
AAML
37
2
0
06 Jun 2024
What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
Hongkang Li
Meng Wang
Tengfei Ma
Sijia Liu
Zaixi Zhang
Pin-Yu Chen
MLT
AI4CE
123
11
0
04 Jun 2024
How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li
Meng Wang
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
MLT
100
18
0
23 Feb 2024
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
M. E. Ildiz
Yixiao Huang
Yingcong Li
A. S. Rawat
Samet Oymak
83
23
0
21 Feb 2024
LoRA Training in the NTK Regime has No Spurious Local Minima
Uijeong Jang
Jason D. Lee
Ernest K. Ryu
87
17
0
19 Feb 2024
TOFU: A Task of Fictitious Unlearning for LLMs
Pratyush Maini
Zhili Feng
Avi Schwarzschild
Zachary Chase Lipton
J. Zico Kolter
MU
CLL
128
193
0
11 Jan 2024
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
125
117
0
11 Nov 2023
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu
Yu Bowen
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
114
336
0
06 Nov 2023
Task Arithmetic with LoRA for Continual Learning
Rajas Chitale
Ankit Vaidya
Aditya Kane
Archana Ghotkar
93
17
0
04 Nov 2023
In-Context Learning Creates Task Vectors
Roee Hendel
Mor Geva
Amir Globerson
107
167
0
24 Oct 2023
Function Vectors in Large Language Models
Eric Todd
Millicent Li
Arnab Sen Sharma
Aaron Mueller
Byron C. Wallace
David Bau
57
124
0
23 Oct 2023
In-Context Convergence of Transformers
Yu Huang
Yuan Cheng
Yingbin Liang
MLT
106
73
0
08 Oct 2023
Textbooks Are All You Need II: phi-1.5 technical report
Yuan-Fang Li
Sébastien Bubeck
Ronen Eldan
Allison Del Giorno
Suriya Gunasekar
Yin Tat Lee
ALM
LRM
171
482
0
11 Sep 2023
Enhancing Graph Transformers with Hierarchical Distance Structural Encoding
Yuan Luo
Hongkang Li
Lei Shi
Xiao-Ming Wu
76
8
0
22 Aug 2023
Textbooks Are All You Need
Suriya Gunasekar
Yi Zhang
J. Aneja
C. C. T. Mendes
Allison Del Giorno
...
Sébastien Bubeck
Ronen Eldan
Adam Tauman Kalai
Y. Lee
Yuan-Fang Li
AI4CE
ALM
SyDa
95
410
0
20 Jun 2023
Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang
Spencer Frei
Peter L. Bartlett
93
201
0
16 Jun 2023
Transformers learn through gradual rank increase
Enric Boix-Adserà
Etai Littwin
Emmanuel Abbe
Samy Bengio
J. Susskind
83
37
0
12 Jun 2023
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Yu Bai
Fan Chen
Haiquan Wang
Caiming Xiong
Song Mei
52
198
0
07 Jun 2023
On the Role of Attention in Prompt-tuning
Samet Oymak
A. S. Rawat
Mahdi Soltanolkotabi
Christos Thrampoulidis
MLT
LRM
70
47
0
06 Jun 2023
TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav
Derek Tam
Leshem Choshen
Colin Raffel
Joey Tianyi Zhou
MoMe
137
317
0
02 Jun 2023
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez
Alessandro Favero
P. Frossard
MoMe
154
125
0
22 May 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,789
0
15 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuan-Fang Li
Andrej Risteski
163
65
0
07 Mar 2023
Benign Overfitting for Two-layer ReLU Convolutional Neural Networks
Yiwen Kou
Zi-Yuan Chen
Yuanzhou Chen
Quanquan Gu
MLT
92
17
0
07 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.6K
13,490
0
27 Feb 2023
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
Emmanuel Abbe
Enric Boix-Adserà
Theodor Misiakiewicz
FedML
MLT
157
86
0
21 Feb 2023
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li
Ming Wang
Sijia Liu
Pin-Yu Chen
ViT
MLT
135
64
0
12 Feb 2023
Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks
Shuai Zhang
Ming Wang
Pin-Yu Chen
Sijia Liu
Songtao Lu
Miaoyuan Liu
MLT
100
17
0
06 Feb 2023
Transformers as Algorithms: Generalization and Stability in In-context Learning
Yingcong Li
M. E. Ildiz
Dimitris Papailiopoulos
Samet Oymak
98
174
0
17 Jan 2023
SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings
Jan Engler
Sandipan Sikdar
Marlene Lutz
M. Strohmaier
79
7
0
11 Jan 2023
Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Alexandre Ramé
Kartik Ahuja
Jianyu Zhang
Matthieu Cord
Léon Bottou
David Lopez-Paz
MoMe
OODD
116
86
0
20 Dec 2022
Dataless Knowledge Fusion by Merging Weights of Language Models
Xisen Jin
Xiang Ren
Daniel Preoţiuc-Pietro
Pengxiang Cheng
FedML
MoMe
99
250
0
19 Dec 2022
Transformers learn in-context by gradient descent
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
MLT
121
496
0
15 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
203
521
0
08 Dec 2022
What learning algorithm is in-context learning? Investigations with linear models
Ekin Akyürek
Dale Schuurmans
Jacob Andreas
Tengyu Ma
Denny Zhou
123
493
0
28 Nov 2022
Vision Transformers provably learn spatial structure
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT
MLT
91
82
0
13 Oct 2022
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM
KELM
117
176
0
10 Aug 2022
Neural Networks can Learn Representations with Gradient Descent
Alexandru Damian
Jason D. Lee
Mahdi Soltanolkotabi
SSL
MLT
98
123
0
30 Jun 2022
Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé
Matthieu Kirchmeyer
Thibaud Rahier
A. Rakotomamonjy
Patrick Gallinari
Matthieu Cord
OOD
256
138
0
19 May 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
540
6,304
0
05 Apr 2022
On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks
Hongru Yang
Zhangyang Wang
MLT
97
8
0
27 Mar 2022
Visual Prompt Tuning
Menglin Jia
Luming Tang
Bor-Chun Chen
Claire Cardie
Serge Belongie
Bharath Hariharan
Ser-Nam Lim
VLM
VPVLM
175
1,647
0
23 Mar 2022
1
2
Next