A Kernel-Based View of Language Model Fine-Tuning

11 October 2022

Sadhika Malladi

Alexander Wettig

Dingli Yu

Papers citing "A Kernel-Based View of Language Model Fine-Tuning"

20 / 20 papers shown

Title
Tensor Product Attention Is All You Need Yifan Zhang Yifeng Liu Huizhuo Yuan Zhen Qin Yang Yuan Q. Gu Andrew Chi-Chih Yao 77 9 0 11 Jan 2025
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models Jinxu Lin Linwei Tao Minjing Dong Chang Xu TDI 41 2 0 24 Oct 2024
Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning H. Fernando Han Shen Parikshit Ram Yi Zhou Horst Samulowitz Nathalie Baracaldo Tianyi Chen CLL 59 2 0 20 Oct 2024
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization Noam Razin Sadhika Malladi Adithya Bhaskar Danqi Chen Sanjeev Arora Boris Hanin 96 16 0 11 Oct 2024
Investigating the Impact of Model Complexity in Large Language Models Jing Luo Huiyuan Wang Weiran Huang 39 0 0 01 Oct 2024
When does compositional structure yield compositional generalization? A kernel theory Samuel Lippl Kim Stachenfeld NAI CoGe 73 5 0 26 May 2024
Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs Siyuan Guo Aniket Didolkar Nan Rosemary Ke Anirudh Goyal Ferenc Huszár Bernhard Schölkopf 52 3 0 24 May 2024
Active Few-Shot Fine-Tuning Jonas Hübotter Bhavya Sukhija Lenart Treven Yarden As Andreas Krause 45 1 0 13 Feb 2024
Scalable Neural Network Kernels Arijit Sehanobish Krzysztof Choromanski Yunfan Zhao Kumar Avinava Dubey Valerii Likhosherstov 41 4 0 20 Oct 2023
Fine-Tuning Language Models with Just Forward Passes Sadhika Malladi Tianyu Gao Eshaan Nichani Alexandru Damian Jason D. Lee Danqi Chen Sanjeev Arora 29 177 0 27 May 2023
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models Guillermo Ortiz-Jiménez Alessandro Favero P. Frossard MoMe 51 110 0 22 May 2023
Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! Shiwei Liu Tianlong Chen Zhenyu (Allen) Zhang Xuxi Chen Tianjin Huang Ajay Jaiswal Zhangyang Wang 32 29 0 03 Mar 2023
Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation Noel Loo Ramin Hasani Mathias Lechner Alexander Amini Daniela Rus DD 42 5 0 02 Feb 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models Yufeng Zhang Boyi Liu Qi Cai Lingxiao Wang Zhaoran Wang 53 11 0 30 Dec 2022
A picture of the space of typical learnable tasks Rahul Ramesh Jialin Mao Itay Griniasty Rubing Yang H. Teoh Mark K. Transtrum James P. Sethna Pratik Chaudhari SSL DRL 36 5 0 31 Oct 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms Sadhika Malladi Kaifeng Lyu A. Panigrahi Sanjeev Arora 92 40 0 20 May 2022
The Power of Scale for Parameter-Efficient Prompt Tuning Brian Lester Rami Al-Rfou Noah Constant VPVLM 280 3,858 0 18 Apr 2021
Making Pre-trained Language Models Better Few-shot Learners Tianyu Gao Adam Fisch Danqi Chen 243 1,924 0 31 Dec 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference Timo Schick Hinrich Schütze 258 1,589 0 21 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 297 6,959 0 20 Apr 2018