On the Role of Attention in Prompt-tuning

Samet Oymak, A. S. Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis
6 June 2023 · arXiv:2306.03435 · MLT, LRM
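
The paper analyzes prompt-tuning, in which a pretrained attention model is kept frozen and only a small set of soft prompt vectors prepended to the input sequence is trained. The sketch below illustrates that setup on a toy one-layer softmax-attention model in PyTorch; the architecture, dimensions, and toy task are illustrative assumptions for exposition, not the paper's exact construction.

# Minimal sketch of soft prompt-tuning on a one-layer softmax-attention model.
# Everything here (dimensions, toy task, architecture details) is an
# illustrative assumption, not the construction analyzed in the paper.
import torch
import torch.nn as nn

d, n, p = 16, 8, 4  # embedding dim, input tokens, trainable prompt tokens

class OneLayerAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.W_q = nn.Linear(d, d, bias=False)   # query projection
        self.W_k = nn.Linear(d, d, bias=False)   # key projection
        self.W_v = nn.Linear(d, d, bias=False)   # value projection
        self.head = nn.Linear(d, 1, bias=False)  # scalar readout

    def forward(self, x):  # x: (batch, seq, d)
        scores = self.W_q(x) @ self.W_k(x).transpose(-2, -1) / d ** 0.5
        z = torch.softmax(scores, dim=-1) @ self.W_v(x)
        return self.head(z.mean(dim=1)).squeeze(-1)  # pooled scalar logit

model = OneLayerAttention(d)
for param in model.parameters():  # the "pretrained" weights stay frozen
    param.requires_grad_(False)

prompt = nn.Parameter(0.02 * torch.randn(p, d))  # the only trainable parameters
opt = torch.optim.Adam([prompt], lr=1e-2)

x = torch.randn(32, n, d)               # toy input embeddings
y = torch.randint(0, 2, (32,)).float()  # toy binary labels

for _ in range(100):
    # Prepend the shared soft prompt to every sequence, then train the prompt only.
    x_aug = torch.cat([prompt.expand(x.size(0), -1, -1), x], dim=1)
    loss = nn.functional.binary_cross_entropy_with_logits(model(x_aug), y)
    opt.zero_grad()
    loss.backward()
    opt.step()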

Papers citing "On the Role of Attention in Prompt-tuning" (33 of 33 shown)
 1. When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
    Hongkang Li, Yihua Zhang, Shuai Zhang, Hao Wu, Sijia Liu, Pin-Yu Chen · MoMe · 15 Apr 2025
 2. Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
    Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak · 06 Apr 2025
 3. Learning Linear Attention in Polynomial Time
    Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas · 14 Oct 2024
 4. Parameter-Efficient Fine-Tuning of State Space Models
    Kevin Galim, Wonjun Kang, Yuchen Zeng, H. Koo, Kangwook Lee · 11 Oct 2024
 5. Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
    Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen · LRM · 03 Oct 2024
 6. Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models
    Shuai Fu, Xiequn Wang, Qiushi Huang, Yu Zhang · VLM · 26 Aug 2024
 7. Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks
    Jy-yong Sohn, Dohyun Kwon, Seoyeon An, Kangwook Lee · 01 Aug 2024
 8. Transformers on Markov Data: Constant Depth Suffices
    Nived Rajaraman, Marco Bondaschi, Kannan Ramchandran, Michael C. Gastpar, Ashok Vardhan Makkuva · 25 Jul 2024
 9. On the Power of Convolution Augmented Transformer
    Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak · 08 Jul 2024
10. Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
    Hongkang Li, Meng Wang, Shuai Zhang, Sijia Liu, Pin-Yu Chen · 24 Jun 2024
11. What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
    Hongkang Li, Meng Wang, Tengfei Ma, Sijia Liu, Zaixi Zhang, Pin-Yu Chen · MLT, AI4CE · 04 Jun 2024
12. Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
    Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou · MLT · 13 Mar 2024
13. Mechanics of Next Token Prediction with Self-Attention
    Yingcong Li, Yixiao Huang, M. E. Ildiz, A. S. Rawat, Samet Oymak · 12 Mar 2024
14. How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
    Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen · MLT · 23 Feb 2024
15. From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
    M. E. Ildiz, Yixiao Huang, Yingcong Li, A. S. Rawat, Samet Oymak · 21 Feb 2024
16. Implicit Bias and Fast Convergence Rates for Self-attention
    Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis · 08 Feb 2024
17. Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
    Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael C. Gastpar · OffRL · 06 Feb 2024
18. Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
    Simone Bombari, Marco Mondelli · 05 Feb 2024
19. Superiority of Multi-Head Attention in In-Context Linear Regression
    Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing · 30 Jan 2024
20. Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?
    Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu · VLM · 23 Jan 2024
21. The Expressive Power of Low-Rank Adaptation
    Yuchen Zeng, Kangwook Lee · 26 Oct 2023
22. On the Optimization and Generalization of Multi-head Attention
    Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis · MLT · 19 Oct 2023
23. Visual Attention Prompted Prediction and Learning
    Yifei Zhang, Siyi Gu, Bo Pan, Guangji Bai, Meikang Qiu, Xiaofeng Yang, Liang Zhao · LRM, VLM · 12 Oct 2023
24. Linear attention is (maybe) all you need (to understand transformer optimization)
    Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, S. Sra · 02 Oct 2023
25. JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
    Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du · 01 Oct 2023
26. Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
    Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Fabian Ritter Gutierrez, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, E. Chng, B. Ma · VLM · 18 Sep 2023
27. Transformers as Support Vector Machines
    Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak · 31 Aug 2023
28. What can a Single Attention Layer Learn? A Study Through the Random Features Lens
    Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei · MLT · 21 Jul 2023
29. Max-Margin Token Selection in Attention Mechanism
    Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak · 23 Jun 2023
30. Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
    Yuandong Tian, Yiping Wang, Beidi Chen, S. Du · MLT · 25 May 2023
31. SCENIC: A JAX Library for Computer Vision Research and Beyond
    Mostafa Dehghani, A. Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay · 18 Oct 2021
32. The Power of Scale for Parameter-Efficient Prompt Tuning
    Brian Lester, Rami Al-Rfou, Noah Constant · VPVLM · 18 Apr 2021
33. ImageNet Large Scale Visual Recognition Challenge
    Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei · VLM, ObjD · 01 Sep 2014